Table of Contents

  1. About This Service
  2. Installation
  3. Demonstration / Test Setup
  4. Configuration

About This Service

The Fedora OAI Provider Service is part of the Fedora Service Framework and is a substantial improvement over the simple OAI provider used in earlier versions of Fedora. The new provider is based on Proai, an open source caching, polling OAI provider. It has the following features:

Installation

To install the service:

  1. Make sure you have a suitable database installed (MySQL, Oracle, or McKoi) and a database user account that can create tables in the database.
  2. Make sure your Fedora repository is running with the ResourceIndex turned ON. This is necessary because the OAI provider periodically queries the resource index to discover which records of interest have changed.
  3. Deploy the oaiprovider.war file into your servlet container.
  4. Configure the OAI provider as described in the Configuration section below.
  5. Restart the webapp (this is often done by restarting the servlet container itself).

Demonstration / Test Setup

The source distribution of the OAI Provider Service includes several test Fedora objects. You can use these objects and the default proai.properties configuration file to quickly get an idea of how the service works.
  1. Complete installation steps 1-4 above, using most of the default values in the proai.properties configuration file, but making sure that the following properties are set according to your own Fedora installation:
    • driver.fedora.baseURL
    • driver.fedora.user
    • driver.fedora.pass
  2. Make sure your Fedora installation is configured to retain (rather than re-generate) PIDs of objects in the "demo" PID namespace on ingest. You can check this in your fedora.fcfg file: if one of the values of "retainPIDs" is "demo" or "*" (asterisk), Fedora is configured correctly. Otherwise, add this value and restart Fedora.
  3. Use the fedora-admin GUI or the fedora-ingest command-line utility to ingest all FOXML objects in the src/test/foxml directory of the Fedora OAI Provider source distribution.
  4. Start the webapp.
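
The check described in step 2 can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes the "retainPIDs" value is a whitespace-separated list of namespaces that you have already copied out of fedora.fcfg.

```python
def retains_demo_pids(retain_pids_value):
    """Return True if the given retainPIDs value covers the "demo" namespace.

    Assumption: the value is a whitespace-separated list of PID namespaces,
    where "*" stands for all namespaces.
    """
    namespaces = retain_pids_value.split()
    return "demo" in namespaces or "*" in namespaces


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(retains_demo_pids("demo test changeme"))
    print(retains_demo_pids("test changeme"))
    print(retains_demo_pids("*"))
```
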
Upon starting the webapp, the service will poll Fedora for objects that provide OAI record content and have changed since its last update. It will find the objects you just ingested, request the appropriate disseminations of each, and save them in its cache. Once it has successfully completed a cache update cycle, you should be able to hit the front-end (where you installed the service) with OAI-PMH verbs. Here are a couple of examples:

http://localhost/oaiprovider/?verb=Identify
http://localhost/oaiprovider/?verb=ListRecords&metadataPrefix=oai_dc
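
Request URLs like the two above can also be built programmatically. The following is a minimal Python sketch; the helper name oai_request_url is hypothetical, and the base URL is assumed to match the location used in the examples above.

```python
from urllib.parse import urlencode

# Assumption: the webapp is reachable at the address used in the examples above.
BASE_URL = "http://localhost/oaiprovider/"


def oai_request_url(verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and its arguments."""
    params = {"verb": verb}
    params.update(arguments)
    # urlencode keeps insertion order, so "verb" always comes first.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(oai_request_url("Identify"))
    print(oai_request_url("ListRecords", metadataPrefix="oai_dc"))
```

The resulting strings can be pasted into a browser or passed to any HTTP client.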

Now try using fedora-admin to edit one of the datastreams of one of the demo objects you just ingested. The next time the oaiprovider service polls Fedora for modified records (the poll frequency is 60 seconds by default; see proai.driverPollSeconds below), it should pick up this change and make it available via the front-end.
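
One way to confirm that the change was picked up is to compare record datestamps from two ListRecords responses, one taken before the edit and one after the next poll. The Python sketch below illustrates the idea with the standard library only. The sample XML is a hypothetical, heavily trimmed response, and the identifiers, datestamps, and function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, trimmed ListRecords response used only for illustration;
# a real response also carries responseDate, request, and metadata elements.
SAMPLE_TEMPLATE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:demo:1</identifier>
        <datestamp>{first}</datestamp>
      </header>
    </record>
    <record>
      <header>
        <identifier>oai:example.org:demo:2</identifier>
        <datestamp>2006-01-01T00:00:00Z</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""


def record_datestamps(response_xml):
    """Map each record identifier in a ListRecords response to its datestamp."""
    root = ET.fromstring(response_xml)
    result = {}
    for header in root.iterfind(".//oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", namespaces=OAI_NS)
        result[identifier] = datestamp
    return result


def changed_records(before, after):
    """Return identifiers that are new or whose datestamp differs in 'after'."""
    return sorted(i for i, stamp in after.items() if before.get(i) != stamp)


if __name__ == "__main__":
    before = record_datestamps(SAMPLE_TEMPLATE.format(first="2006-01-01T00:00:00Z"))
    after = record_datestamps(SAMPLE_TEMPLATE.format(first="2006-01-02T09:30:00Z"))
    print(changed_records(before, after))
```

In practice the two XML strings would come from requests against your own deployment rather than from a template.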

Configuration

To configure the OAI provider, edit the WEB-INF/classes/proai.properties file. Start from the example configuration values and change them to match your environment.

The example configuration follows.


#
# Proai Configuration
# ================================

#######################
# Required Properties #
#######################

proai.sessionBaseDir = /tmp/proai/sessions
proai.secondsBetweenRequests = 300
proai.incompleteRecordListSize = 2
proai.incompleteSetListSize = 2
proai.incompleteIdentifierListSize = 2

# [proai.readLockWait]
# The number of milliseconds to wait for a read lock on the cache
# before giving up. A higher value means that OAI harvesters won't
# have to deal with "Service Unavailable" responses as often, whereas
# a lower value will result in quicker response.

proai.readLockWait = 5000

# [proai.writeLockWait]
# The number of milliseconds to wait for a write lock on the cache
# before giving up. If this amount of time passes while waiting
# to commit a change to the cache, the update will be canceled and
# attempted again after proai.cache.RecordCache.pollSeconds

proai.writeLockWait = 60000

# [proai.cacheBaseDir]
# The directory where cache files should be stored. This will be created
# if it doesn't exist.

proai.cacheBaseDir = /tmp/proai/cache


# [proai.driverClassName]
# The class name of the proai.driver.OAIDriver implementation. This should be
# in the classpath.

proai.driverClassName = fedora.services.oaiprovider.FedoraOAIDriver


# [proai.driverPollSeconds]
# How often to poll the driver for updates.

proai.driverPollSeconds = 60


# [proai.db.url]
# The JDBC connection URL for the database that will be used by the cache.
# [proai.db.driverClassName]
# The class name of the JDBC driver appropriate for use with the
# connection url. This class should be in the classpath.
# [proai.db.username]
# The database user. This user should already exist in the database
# and must have permission to create, modify, and query tables.
# [proai.db.password]
# The password for the database user.

# MySQL
proai.db.url = jdbc:mysql://localhost/proai?useUnicode=true&characterEncoding=UTF-8&autoReconnect=true
proai.db.driverClassName = com.mysql.jdbc.Driver

# McKoi
#proai.db.url = jdbc:mckoi:local://build/test/mckoi/mckoi.conf?create_or_boot=true
#proai.db.driverClassName = com.mckoi.JDBCDriver

proai.db.username = proai
proai.db.password = proai

# [JDBC Driver-specific DDL Converters]
# A ddlConverter class is used to generate the commands necessary for
# creating the tables required by Proai's record cache. The name of the
# property should be the driverClassName of the JDBC driver you're using
# plus ".ddlConverter".
com.mckoi.JDBCDriver.ddlConverter = proai.util.McKoiDDLConverter
com.mysql.jdbc.Driver.ddlConverter = proai.util.MySQLDDLConverter
oracle.jdbc.driver.OracleDriver.ddlConverter = proai.util.OracleDDLConverter

#################################
# OAIDriver-Specific Properties #
#################################

driver.fedora.baseURL = http://localhost:8080/fedora/
driver.fedora.user = fedoraAdmin
driver.fedora.pass = fedoraAdmin

# [driver.fedora.queryConnectionTimeout]
# When querying the resource index, the maximum number of seconds to
# wait for the http connection to be established before giving up.
driver.fedora.queryConnectionTimeout = 30

# [driver.fedora.querySocketTimeout]
# When querying the resource index, the maximum number of seconds of
# socket inactivity to allow before giving up.
driver.fedora.querySocketTimeout = 600

# [driver.fedora.disseminationConnectionTimeout]
# When getting xml data from Fedora, the maximum number of seconds to
# wait for the http connection to be established before giving up.
driver.fedora.disseminationConnectionTimeout = 30

# [driver.fedora.disseminationSocketTimeout]
# When getting xml data from Fedora, the maximum number of seconds of
# socket inactivity to allow before giving up.
driver.fedora.disseminationSocketTimeout = 120

driver.fedora.queryFactory = fedora.services.oaiprovider.ITQLQueryFactory
driver.fedora.identify = http://localhost:8080/fedora/get/demo:MyRepository/Identify.xml

# space-separated list of metadata formats
driver.fedora.md.formats = oai_dc test_format formatX formatY

driver.fedora.md.format.oai_dc.loc = http://www.openarchives.org/OAI/2.0/oai_dc.xsd
driver.fedora.md.format.oai_dc.uri = http://www.openarchives.org/OAI/2.0/oai_dc/
driver.fedora.md.format.oai_dc.dissType = info:fedora/*/oai_dc
# optional
driver.fedora.md.format.oai_dc.about.dissType = info:fedora/*/about_oai_dc
driver.fedora.md.format.formatX.about.dissType = info:fedora/*/demo:XYFormatsBDef/getMetadataAbout?format=x
driver.fedora.md.format.formatY.about.dissType = info:fedora/*/demo:XYFormatsBDef/getMetadataAbout?format=y

driver.fedora.md.format.test_format.loc = http://example.org/testFormat.xsd
driver.fedora.md.format.test_format.uri = http://example.org/testFormat/
driver.fedora.md.format.test_format.dissType = info:fedora/*/test_format

driver.fedora.md.format.formatX.loc = http://example.org/formatX.xsd
driver.fedora.md.format.formatX.uri = http://example.org/formatX/
driver.fedora.md.format.formatX.dissType = info:fedora/*/demo:XYFormatsBDef/getMetadata?format=x

driver.fedora.md.format.formatY.loc = http://example.org/formatY.xsd
driver.fedora.md.format.formatY.uri = http://example.org/formatY/
driver.fedora.md.format.formatY.dissType = info:fedora/*/demo:XYFormatsBDef/getMetadata?format=y

driver.fedora.itemID = http://www.openarchives.org/OAI/2.0/itemID
driver.fedora.setSpec = http://www.openarchives.org/OAI/2.0/setSpec
driver.fedora.setSpec.name = http://www.openarchives.org/OAI/2.0/setName
driver.fedora.setSpec.desc.dissType = info:fedora/*/SetInfo.xml

driver.fedora.itemSetSpecPath = $item <fedora-rels-ext:isMemberOf> $set $set <http://www.openarchives.org/OAI/2.0/setSpec> $setSpec

# optional oai-deleted property
#driver.fedora.deleted = info:fedora/fedora-system:def/model#state

# optional volatile property
#driver.fedora.volatile = true

#################################
# Advanced, Optional Properties #
#################################

# [Advanced connection pool configuration]
# These properties map to those defined by the Apache commons-DBCP project,
# documented at http://jakarta.apache.org/commons/dbcp/configuration.html
#
# proai.db.defaultAutoCommit =
# proai.db.defaultReadOnly =
# proai.db.defaultTransactionIsolation =
# proai.db.defaultCatalog =
# proai.db.maxActive =
# proai.db.maxIdle =
# proai.db.minIdle =
# proai.db.initialSize =
# proai.db.maxWait =
# proai.db.testOnBorrow =
# proai.db.testOnReturn =
# proai.db.timeBetweenEvictionRunsMillis =
# proai.db.numTestsPerEvictionRun =
# proai.db.minEvictableIdleTimeMillis =
# proai.db.testWhileIdle =
# proai.db.validationQuery =
# proai.db.accessToUnderlyingConnectionAllowed =
# proai.db.removeAbandoned =
# proai.db.removeAbandonedTimeout =
# proai.db.logAbandoned =
# proai.db.poolPreparedStatements =
# proai.db.maxOpenPreparedStatements =
#
# [Connection-specific properties]
# These properties can be anything you want. Two examples follow.
# The name after "db.connection." should be the name of any
# connection-specific property.
#
# proai.db.connection.anyConnectionProperty =
# proai.db.connection.anyConnectionProperty2 =
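
After editing proai.properties, a quick sanity check can catch missing values before the webapp is restarted. The Python sketch below is only an illustration and is not part of the service. The function names are hypothetical, the parser handles only simple "key = value" lines, and the list of keys it checks is a selection taken from the example configuration above.

```python
# A selection of keys from the example configuration; adjust as needed.
CHECKED_KEYS = [
    "proai.cacheBaseDir",
    "proai.driverClassName",
    "proai.db.url",
    "proai.db.driverClassName",
    "proai.db.username",
    "proai.db.password",
    "driver.fedora.baseURL",
    "driver.fedora.user",
    "driver.fedora.pass",
]


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and '#' comments.

    Simplification: line continuations and escape sequences, which Java
    properties files allow, are not handled.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values such as JDBC URLs stay intact.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_settings(props):
    """Return the checked keys that are absent, plus a missing ddlConverter."""
    missing = [key for key in CHECKED_KEYS if key not in props]
    driver = props.get("proai.db.driverClassName")
    # The ddlConverter property is named after the JDBC driver class.
    if driver and driver + ".ddlConverter" not in props:
        missing.append(driver + ".ddlConverter")
    return missing


if __name__ == "__main__":
    sample = """
    # MySQL
    proai.db.url = jdbc:mysql://localhost/proai?useUnicode=true
    proai.db.driverClassName = com.mysql.jdbc.Driver
    """
    for key in missing_settings(parse_properties(sample)):
        print("missing:", key)
```

To check a real file, read its contents and pass the text to parse_properties.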