The Resource Index is the Fedora module that provides the infrastructure for indexing relationships among objects and their components. Examples of relationships between digital objects include well-known management relationships such as the part-whole links between individual chapters and a book, and semantic relationships useful in digital library organization such as those expressed within the Functional Requirements for Bibliographic Records (FRBR).
Fedora expresses relationships by defining a base relationship ontology [RELS-EXT] using RDFS and provides a slot in the digital object abstraction for RDF expression of relationships based on this ontology. Assertions from other ontologies may also be included along with the base Fedora relationships. All relationships are represented as a graph that can be queried using an RDF query language. The query interface to the Resource Index is exposed as a web service [risearch], [Fedora Relationships].
With release 2.2, Fedora's Resource Index has improved in several significant ways:
Through the syncUpdates module parameter, Fedora can be configured
to guarantee that the triples associated with a given
API-M operation have been flushed before the API-M call returns.
See the Fedora 2.2 API-M Performance Report for more details on how these changes and configuration options affect performance.
The Fedora object model can be abstractly viewed as a directed graph, consisting of internal arcs that relate digital object nodes to their dissemination nodes and external arcs between digital objects. The Resource Index is a Fedora service that allows storage and query of this graph. The Resource Index is automatically updated whenever an object is added or modified.
The Resource Index builds on the RDF primitives developed within the semantic web community. Fedora supplies a base relationship ontology [RELS-EXT] (defining a core set of internal and external relationships) that can co-exist with domain-specific relationship ontologies from other namespaces. Each digital object's external relationships to other digital objects are expressed in RDF/XML within a reserved datastream in the respective object. A relationship graph over the digital objects in the repository can then be derived by merging the internal relationships implied by the Fedora object model with the external relationships explicitly stated in their relationship datastreams.
The Fedora base ontology describes such relations and properties as the behavior definition implemented, behavior mechanism used, creation date, state, and mime-type.
In the figure below, the graph (abbreviated for clarity) represents three
objects in the repository. demo:SmileyStuff uses the behavior
mechanism demo:dualResImageCollection, which in turn implements the
behavior definition demo:Collection.
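As a rough illustration of how such a graph can be derived by merging, the sketch below treats each relationship as a (subject, predicate, object) tuple. The predicate names (usesBMech, implementsBDef, isMemberOf) are shortened placeholders rather than the exact URIs of the base ontology, and demo:SmileyCollection is a hypothetical object added only to show an explicitly stated relationship.

```python
# Relationships implied by the object model (placeholder predicate names).
implied = {
    ("demo:SmileyStuff", "usesBMech", "demo:dualResImageCollection"),
    ("demo:dualResImageCollection", "implementsBDef", "demo:Collection"),
}

# A relationship stated explicitly in an object's relationship datastream.
# demo:SmileyCollection is hypothetical and used only for illustration.
stated = {
    ("demo:SmileyStuff", "isMemberOf", "demo:SmileyCollection"),
}


def merge_graph(*triple_sets):
    """Union several sets of triples into a single graph."""
    graph = set()
    for triples in triple_sets:
        graph |= triples
    return graph


def objects_of(graph, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


graph = merge_graph(implied, stated)
mechanisms = objects_of(graph, "demo:SmileyStuff", "usesBMech")
```

Representing the graph as a set of tuples keeps the merge a plain set union, so a statement that appears in both sources is stored once.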
Dublin Core statements are automatically extracted from an object's DC datastream and inserted into the Resource Index, as shown in the figure below:
The Resource Index will automatically index object-to-object relationships defined in the RELS-EXT datastream. Please consult Fedora Metadata for Object-to-Object Relationships for more information.
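To make the indexing of RELS-EXT more concrete, here is a minimal sketch of reading a relationship datastream into triples. It is not the Resource Index's own code. The sample RDF/XML, the rel: namespace URI, the isMemberOf property, and demo:SmileyCollection are assumptions chosen for illustration; consult the relationship metadata documentation for the actual ontology.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical RELS-EXT content (namespace and property are assumptions).
RELS_EXT = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rel="info:fedora/fedora-system:def/relations-external#">
  <rdf:Description rdf:about="info:fedora/demo:SmileyStuff">
    <rel:isMemberOf rdf:resource="info:fedora/demo:SmileyCollection"/>
  </rdf:Description>
</rdf:RDF>
"""


def rels_ext_triples(rdf_xml):
    """Return (subject, predicate, object) URI triples found in RDF/XML."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree renders a qualified tag as "{namespace}localname".
            namespace, local = prop.tag[1:].split("}", 1)
            target = prop.get(f"{{{RDF_NS}}}resource")
            if target is not None:  # skip literal-valued properties
                triples.append((subject, namespace + local, target))
    return triples


triples = rels_ext_triples(RELS_EXT)
```

Only properties that point at another resource are collected here, since those are the object-to-object relationships.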
Please note that many configuration changes require a full rebuild of the Resource Index to ensure consistency. For example, turning the Resource Index Module off and on again will result in an inconsistent state, as the Resource Index will know nothing about the digital objects created or modified while the module was not loaded. Similarly, enabling full-text indexing [Full-Text] after the repository has already been populated will only add new objects to the full-text model. In general, the only safe configuration changes to make on a running repository are limited to the performance-related pool, buffer and flush parameters. In all cases, configuration changes require a restart of the Fedora server before taking effect.
The Resource Index is configured within two sections of fedora.fcfg, module and datastore.
The Resource Index module is configured in fedora.fcfg.
Here's an example of a Resource Index module configuration that uses Kowari with delayed updates:
<module role="fedora.server.resourceIndex.ResourceIndex"
class="fedora.server.resourceIndex.ResourceIndexModule">
<param name="level" value="2"/>
<param name="datastore" value="localKowariTriplestore"/>
<param name="syncUpdates" value="false"/>
</module>
Here's another example, this time using MPTStore with immediate updates:
<module role="fedora.server.resourceIndex.ResourceIndex"
class="fedora.server.resourceIndex.ResourceIndexModule">
<param name="level" value="2"/>
<param name="datastore" value="localPostgresMPTTriplestore"/>
<param name="syncUpdates" value="true"/>
</module>
An explanation of the parameters and their possible values:
getImage with parameter size,
whose domain is {small, large}).
Because calculating method parameters may result in a
combinatorial explosion of statements in the Resource Index
(depending on the design of a particular repository's
Behavior Definition Objects), this level of indexing must be
explicitly set.
connectorClassName
parameter with a valid Trippi Connector class.
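Because the module's datastore parameter must name a datastore section, a quick consistency check can catch a mismatch before the server is restarted. The sketch below is an assumption-laden illustration: it parses a simplified stand-in for fedora.fcfg (the bare server wrapper element and the omission of any XML namespace are simplifications, and a real file contains many more modules), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RI_ROLE = "fedora.server.resourceIndex.ResourceIndex"

# Simplified stand-in for fedora.fcfg (wrapper element is an assumption).
FCFG = """\
<server>
  <module role="fedora.server.resourceIndex.ResourceIndex"
          class="fedora.server.resourceIndex.ResourceIndexModule">
    <param name="level" value="2"/>
    <param name="datastore" value="localKowariTriplestore"/>
    <param name="syncUpdates" value="false"/>
  </module>
  <datastore id="localKowariTriplestore">
    <param name="connectorClassName"
           value="org.trippi.impl.kowari.KowariConnector"/>
  </datastore>
</server>
"""


def params(element):
    """Collect the <param> children of an element into a dict."""
    return {p.get("name"): p.get("value") for p in element.findall("param")}


def resource_index_datastore(fcfg_xml):
    """Return (datastore id, datastore params) used by the Resource Index."""
    root = ET.fromstring(fcfg_xml)
    for module in root.findall("module"):
        if module.get("role") == RI_ROLE:
            wanted = params(module).get("datastore")
            for datastore in root.findall("datastore"):
                if datastore.get("id") == wanted:
                    return wanted, params(datastore)
            raise ValueError(f"no datastore with id {wanted!r} in configuration")
    raise ValueError("Resource Index module is not configured")


datastore_id, datastore_params = resource_index_datastore(FCFG)
```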
The example datastore configuration below (with the path parameter modified for the installation environment) would provide a local Kowari triplestore that buffers up to 20,000 triples in memory at a time or waits for 5 seconds of buffer inactivity before flushing them to disk. Because writing triples to disk is a relatively expensive operation, the buffer takes advantage of Kowari's bulk update handler to ingest a mass of triples at a time. The performance gain is significant during a bulk ingest of objects. The size or inactivity interval of the buffer may be adjusted according to performance needs and physical memory capacity.
<datastore id="localKowariTriplestore">
<param name="connectorClassName" value="org.trippi.impl.kowari.KowariConnector"/>
<param name="remote" value="false"/>
<param name="path" value="/opt/fedora/store/resourceIndex"/>
<param name="serverName" value="fedora"/>
<param name="modelName" value="ri"/>
<param name="poolInitialSize" value="3"/>
<param name="poolMaxGrowth" value="-1"/>
<param name="readOnly" value="false"/>
<param name="autoCreate" value="true"/>
<param name="autoTextIndex" value="false"/>
<param name="memoryBuffer" value="true"/>
<param name="autoFlushDormantSeconds" value="5"/>
<param name="autoFlushBufferSize" value="20000"/>
<param name="bufferSafeCapacity" value="40000"/>
<param name="bufferFlushBatchSize" value="20000"/>
</datastore>
An explanation of the parameters and their possible values follows. Certain parameters require other parameters, as indicated in the hierarchy below. Optional parameters are also indicated below. As noted previously, many of these parameters, with the exception of the pool, buffer, and flush parameters, cannot be changed on a running repository without a full rebuild of the Resource Index.
jdbc:mysql://localhost/mydb
would use the local database named
mydb. For McKoi,
jdbc:mckoi://localhost:9157/
would use the local database at port
9157. For Oracle,
jdbc:oracle:thin:@localhost:1521:mydb
would use the thin driver to connect
to the local database named mydb at
port 1521.
num (a large numeric type),
action (char(1)),
subject,
predicate, and
object (all
large varchar or text types).
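The interaction of the autoFlushBufferSize and autoFlushDormantSeconds parameters from the Kowari example can be pictured with a toy buffer that flushes either when it fills up or after a period of inactivity. This is only a conceptual sketch, not how Trippi implements buffering; the class, its method names, and the explicit tick() call are hypothetical.

```python
import time


class TripleBuffer:
    """Toy write buffer that flushes on size or on inactivity."""

    def __init__(self, flush, max_size=20000, dormant_seconds=5,
                 clock=time.monotonic):
        self._flush = flush              # callable receiving a list of triples
        self._max_size = max_size        # analogous to autoFlushBufferSize
        self._dormant = dormant_seconds  # analogous to autoFlushDormantSeconds
        self._clock = clock
        self._pending = []
        self._last_add = clock()

    def add(self, triple):
        """Buffer a triple; flush at once if the size threshold is reached."""
        self._pending.append(triple)
        self._last_add = self._clock()
        if len(self._pending) >= self._max_size:
            self.flush()

    def tick(self):
        """Call periodically; flushes once the buffer has been idle long enough."""
        if self._pending and self._clock() - self._last_add >= self._dormant:
            self.flush()

    def flush(self):
        """Hand all pending triples to the flush callable as one batch."""
        batch, self._pending = self._pending, []
        if batch:
            self._flush(batch)
```

Writing in batches is what makes bulk ingest cheaper, while the inactivity timeout bounds how long a small number of triples can remain unwritten.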
The example datastore configuration below would provide a local MPTStore triplestore backed by Postgres.
<datastore id="localPostgresMPTTriplestore">
<comment>
Example local MPTStore backed by Postgres.
To use this triplestore for the Resource Index:
1) In fedora.fcfg, change the "datastore" parameter of the
ResourceIndex module to localPostgresMPTTriplestore.
2) Login to your Postgres server as an administrative user and
run the following commands:
CREATE ROLE "fedoraAdmin" LOGIN PASSWORD 'fedoraAdmin'
NOINHERIT CREATEDB
VALID UNTIL 'infinity';
CREATE DATABASE "riTriples"
WITH ENCODING='SQL_ASCII'
OWNER="fedoraAdmin";
3) Make sure you can login to your Postgres server as fedoraAdmin.
4) Download the appropriate Postgres JDBC 3 driver from
http://jdbc.postgresql.org/download.html
and make sure it's accessible to your servlet container.
If you're running Tomcat, putting it in common/lib/ will work.
</comment>
<param name="connectorClassName" value="org.trippi.impl.mpt.MPTConnector"/>
<param name="ddlGenerator" value="org.nsdl.mptstore.impl.postgres.PostgresDDLGenerator"/>
<param name="jdbcDriver" value="org.postgresql.Driver"/>
<param name="jdbcURL" value="jdbc:postgresql://localhost/riTriples"/>
<param name="username" value="fedoraAdmin"/>
<param name="password" value="fedoraAdmin"/>
<param name="poolInitialSize" value="3"/>
<param name="poolMaxSize" value="10"/>
<param name="backslashIsEscape" value="true"/>
<param name="fetchSize" value="1000"/>
<param name="autoFlushDormantSeconds" value="5"/>
<param name="autoFlushBufferSize" value="1000"/>
<param name="bufferFlushBatchSize" value="1000"/>
<param name="bufferSafeCapacity" value="2000"/>
</datastore>
The Resource Index Search interface is exposed in a REST architectural style, providing a stateless query interface that accepts queries by value or by reference [Fedora Relationships].
The query interface to the Resource Index currently supports three RDF query languages: RDQL (Kowari-only), iTQL (Kowari-only), and SPO (Kowari and MPTStore). NOTE: As of Fedora 2.1, RDQL support is deprecated and may be entirely absent in future releases. Support for SPARQL is planned for a future release.
Please consult the Resource Index Search documentation for more information.
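As a hedged illustration of a by-value query, the sketch below builds a request URL for an SPO query. The endpoint location, the request parameter names (type, lang, format, query), and the N-Triples format value are assumptions that should be checked against the Resource Index Search documentation; the helper function is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def risearch_url(base, query, lang="spo", result_type="triples",
                 fmt="N-Triples"):
    """Build a by-value query URL (parameter names are assumptions)."""
    params = {
        "type": result_type,
        "lang": lang,
        "format": fmt,
        "query": query,
    }
    # urlencode percent-escapes the angle brackets, asterisks and spaces.
    return f"{base}?{urlencode(params)}"


url = risearch_url(
    "http://localhost:8080/fedora/risearch",  # assumed endpoint location
    "<info:fedora/demo:SmileyStuff> * *",     # all triples about one object
)
decoded = parse_qs(urlsplit(url).query)
```

Because the interface is stateless, the whole query travels in the request, so the URL alone is enough to repeat it.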
Demonstration objects that utilize the Resource Index are included in the Fedora distribution. Please see the Demo documentation for more information.
To understand which triples are indexed in the Fedora Resource Index, and to estimate the size of your Kowari triplestore, consult the document named Triples in the Resource Index.