$Id: installation.dbx,v 1.25 2003/08/29 15:56:49 cwilper Exp $
Copyright © 2003 The Rector and Visitors of The University of Virginia and Cornell University
This is the installation guide for Fedora. It includes instructions for installing the server and client distributions, as well as instructions for installing and compiling the complete source code distribution.
Required:
Sun's Java Software Development Kit, v1.4. Whether you are installing a binary or source distribution, you must have the J2SDK v1.4 or above. It should be installed on the machine you intend to use as the Fedora server. You can download it at http://java.sun.com/
Optional:
MySQL v3.23.x, MySQL 4.x, or Oracle 9i. The Fedora server is backed in part by a relational database. If you decide not to use the included pure Java database, McKoi v0.94 (more about that later), you should download and install one of the known, working databases for your platform (or see the "Other Databases" section below). You can download MySQL at http://www.mysql.com/. (Please note that MySQL v3.23.56 will not work with Fedora in a Windows implementation. For more information, please see http://www.fedora.info/wiki/bin/view/Fedora/PoolWasntFoundMessage)
The latest version of the software can always be found at http://www.fedora.info/release/
A release contains multiple distribution packages: the server binaries, the client binaries, and the entire source code. Each of these packages is distributed in .tar.gz (for Unix) and .zip (for Windows) archives.
Server binary distribution:
Decide where you want the Fedora software to reside. For example, C:\ or /usr/local.
Unpack the archive in that directory. It will create its own directory, named according to the version... like fedora-1.1.1. This will be the home directory of Fedora.
If you're in Unix, you must change the permissions of the files in the server/bin directory so that you can execute them. For instance, if you installed in /usr/local, this can be done with the command chmod 755 /usr/local/fedora-1.1.1/server/bin/*
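On Unix, the three steps above might be scripted roughly as follows. This is a minimal sketch: the helper name install_fedora_server and the archive file name in the usage line are hypothetical, and fedora-1.1.1 is simply the example version directory used above.

```shell
# Sketch: unpack a Fedora server archive and make its scripts executable.
# install_fedora_server is a hypothetical helper, not part of the distribution.
install_fedora_server() {
    archive=$1      # path to the server .tar.gz archive
    target=$2       # directory to unpack into, e.g. /usr/local
    version=$3      # directory the archive creates, e.g. fedora-1.1.1

    mkdir -p "$target" || return 1
    # gzip -dc piped into tar also works where tar lacks the -z option
    gzip -dc "$archive" | (cd "$target" && tar -xf -) || return 1
    # Make the scripts in server/bin executable, as described above
    chmod 755 "$target/$version"/server/bin/*
}

# Example (the archive name is an assumption):
# install_fedora_server /tmp/fedora-1.1.1-server.tar.gz /usr/local fedora-1.1.1
```

The same approach should apply to the client distribution, with client/bin in place of server/bin.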
Client binary distribution:
If a server is already unpacked on the machine, put the client in the same place. That is, if your server is unpacked into C:\fedora-1.1.1\server\, go to C:\ and unpack the client from there. It will go into C:\fedora-1.1.1\client, so that now both the client and server are under one common directory.
If the machine you're installing on doesn't have a server, decide where you want the client to reside. For example, C:\ or /usr/local. Then unpack the archive in that directory. It will create its own directory, named according to the version... like fedora-1.1.1. This will be the home directory of Fedora.
If you're in Unix, you must change the permissions of the files in the client/bin directory so that you can execute them. For instance, if you installed in /usr/local, this can be done with the command chmod 755 /usr/local/fedora-1.1.1/client/bin/*
Source Code:
Decide where you want to work with the source code and unpack it there. It will create its own directory, named according to the version... like fedora-1.1.1-src. This will be your Fedora development directory.
For server, client, and source code installations:
JAVA_HOME must point to the base directory of your J2SDK installation. On Windows, this will usually be something like C:\j2sdk1.4.x\. On Unix, it could be in several places. The /usr and /usr/local directories are commonly used for Java installations.
Note: If you would rather not set your JAVA_HOME to point to the version required by Fedora (for instance, if you run other applications using java1.3) you can instead set the FEDORA_JAVA_HOME environment variable. This way, Fedora's java installation won't conflict with your other java applications.
PATH must contain the executable (bin) directory of the J2SDK installation. This is usually just under JAVA_HOME.
For server and client installations:
FEDORA_HOME must point to the base directory of your Fedora installation. You will choose this directory when you unpack a binary distribution.
For server installations:
PATH must contain the executable directory of the Fedora server software installation. This will be $FEDORA_HOME/server/bin on Unix, or %FEDORA_HOME%\server\bin in Windows.
For client installations:
PATH must contain the executable directory of the Fedora client software installation. This will be $FEDORA_HOME/client/bin on Unix, or %FEDORA_HOME%\client\bin in Windows.
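For a binary installation on Unix, the variables described so far could be set in a shell startup file along these lines. The two directory values are assumptions chosen for illustration (a J2SDK under /usr/local and Fedora unpacked in /usr/local); substitute the locations on your own machine.

```shell
# Assumed install locations; adjust to match your system.
JAVA_HOME=/usr/local/j2sdk1.4.2
FEDORA_HOME=/usr/local/fedora-1.1.1
export JAVA_HOME FEDORA_HOME

# Put the J2SDK tools and the Fedora server and client scripts on the PATH.
PATH="$JAVA_HOME/bin:$FEDORA_HOME/server/bin:$FEDORA_HOME/client/bin:$PATH"
export PATH
```

If only the server or only the client is installed, the other bin directory can simply be left out of PATH.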
For source code installations:
FEDORA_DEV should contain the directory under which the source was unpacked, including the version part of the path. For instance, if you installed the source in C:\ and it unpacked into C:\fedora-src-1.1.1\, FEDORA_DEV should be C:\fedora-src-1.1.1
PATH should contain the bin directory of Ant. If you don't already have Ant installed, or you have a version before v1.4.1, you can use the version that comes with the Fedora source distribution under %FEDORA_DEV%\res\ant\bin (in Windows) or $FEDORA_DEV/res/ant/bin (in Unix). IMPORTANT: If you use your own version of Ant, be sure it's v1.4.1+ or the build may not work. You can verify this by typing "ant -version" at the command prompt. Note: If you opt to use the ant binaries included with the source, and you're in Unix, be sure to change the permissions on the files in the bin directory so that you can execute ant. This can be done with the command: chmod 755 $FEDORA_DEV/res/ant/bin/*
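The Ant version check can also be automated. The sketch below uses two hypothetical helper functions and assumes that "ant -version" prints a line of the form "Apache Ant version 1.4.1 compiled on ..."; if your Ant prints something different, the extraction pattern will need adjusting.

```shell
# Extract the dotted version number from "ant -version" output read on stdin.
# Assumes a line such as: Apache Ant version 1.4.1 compiled on ...
ant_version_from_output() {
    sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p'
}

# Succeed if dotted version $1 is at least dotted version $2.
# Compares up to three numeric components; missing components count as 0.
version_at_least() {
    have=$1
    need=$2
    set -- $(echo "$have" | tr '.' ' ')
    h1=${1:-0}; h2=${2:-0}; h3=${3:-0}
    set -- $(echo "$need" | tr '.' ' ')
    n1=${1:-0}; n2=${2:-0}; n3=${3:-0}
    if [ "$h1" -ne "$n1" ]; then [ "$h1" -gt "$n1" ]; return; fi
    if [ "$h2" -ne "$n2" ]; then [ "$h2" -gt "$n2" ]; return; fi
    [ "$h3" -ge "$n3" ]
}

# Example:
# version_at_least "$(ant -version | ant_version_from_output)" 1.4.1 \
#     || echo "Ant is older than 1.4.1; use the copy bundled with the source"
```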
Provided that you've followed all the appropriate instructions above, you can now go to the $FEDORA_DEV directory and type:
ant serverdist
and everything necessary for a server and client distribution will be put in $FEDORA_DEV/dist
If you want to run the server and client straight from where it was compiled, be sure to set your FEDORA_HOME to $FEDORA_DEV/dist, and add $FEDORA_HOME/server/bin and $FEDORA_HOME/client/bin to your PATH environment variable.
For other targets, see the build.xml file or type ant -projecthelp for a brief description of each.
To erase all temporary and compiled files created by a build, first ensure that your database and Fedora server are stopped, and type ant clean. If you're not using McKoi, you should also drop the FedoraObjects (or whatever you named it) database from your RDBMS if you want to start fresh.
Fedora is designed to be RDBMS-independent, but comes with a pure java database called McKoi. It has been tested and works with McKoi, MySQL and Oracle 9i. If you choose to use any database but McKoi, we assume here that it is already installed.
Follow the instructions below for the RDBMS of your choice in order to create the user and tablespace so that the Fedora software can use the database.
Execute the command:
mckoi-init fedoraDBUser fedoraDBPass
...where the first parameter is the username you'd like to use for the database, and the second parameter is the password you'd like to use. Remember this information, as it will be needed later when configuring Fedora.
Note: This command resolves to a batch file (mckoi-init.bat) in Windows, and a shell script (mckoi-init.sh) in Unix. These reside in the server's bin directory.
Execute the command:
mysql-config installDir dbaUser dbaPass fedoraDBUser fedoraDBPass dbName
Where:
installDir is the location where MySQL is installed. In Unix, this might be /usr/local/mysql. In Windows, it might be C:\mysql
dbaUser is the name of the MySQL user with dba privileges. In a default MySQL installation, this will be root.
dbaPass is the password of the MySQL user with dba privileges. In a default MySQL installation, if you're running MySQL on the same machine as Fedora, you can usually pass "" for this argument.
fedoraDBUser is the username you'd like to use for the Fedora software's database access. (e.g., fedoraAdmin)
fedoraDBPass is the password you'd like to use for the above user. (e.g., fedoraAdmin)
dbName is the name you'd like to use for the Fedora database. (e.g., FedoraObjects)
Note: This command resolves to a batch file (mysql-config.bat) in Windows, and a shell script (mysql-config.sh) in Unix. These reside in the server's bin directory.
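As an illustration, a call using the example values given above might be wrapped as follows. This is a sketch only: the helper name create_fedora_mysql_db is hypothetical, the default MySQL location is the Unix example from the text, and the empty dba password only applies to a default local MySQL installation. On Unix the command resolves to mysql-config.sh, as noted above.

```shell
# Sketch: run mysql-config with the example values from this guide.
# create_fedora_mysql_db is a hypothetical helper, not part of Fedora.
create_fedora_mysql_db() {
    fedora_pass=$1                        # password for the new Fedora db user
    install_dir=${2:-/usr/local/mysql}    # installDir (example location)

    if [ -z "$fedora_pass" ]; then
        echo "usage: create_fedora_mysql_db fedoraDBPass [installDir]" >&2
        return 2
    fi

    # Arguments: installDir dbaUser dbaPass fedoraDBUser fedoraDBPass dbName
    mysql-config "$install_dir" root "" fedoraAdmin "$fedora_pass" FedoraObjects
}

# Example:
# create_fedora_mysql_db 'choose-a-password'
```

Whatever user name, password, and database name you pass here must be used again later in the datastore settings of the configuration file.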
Important: Fedora comes installed with a JDBC driver that works with MySQL v3.23.x. If you wish to use MySQL 4.*, you must replace that driver. Detailed instructions can be found on this page of the Fedora wiki.
With Oracle 9i, you will need to use an administrative account to manually create 1) the fedora user on the db, and 2) the tablespace on which that user has complete control.
Then you'll need to make sure the Fedora server has the JDBC driver necessary to connect to the database. You can download the appropriate Oracle JDBC driver from http://technet.oracle.com/software/tech/java/sqlj_jdbc/content.html. This will come with a file called ojdbc14.jar. You will need to put that file on your Fedora server, in the directory: FEDORA_HOME/server/tomcat41/common/lib.
Then you should modify the FEDORA_HOME/server/config/fedora.fcfg file and ensure that the datastore entry (near the bottom) for Oracle points to the right host and database. The fedora.fcfg file that comes with the distribution uses a database on the local host called "Fedora".
When you start fedora, be sure to use the parameter "oracle", as in fedora-start oracle. Starting the server is more fully explained later.
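Copying the driver into place could be scripted as shown below. This is a sketch under the assumption that FEDORA_HOME is already set; install_jdbc_driver is a hypothetical helper name and /tmp/ojdbc14.jar is just an example download location.

```shell
# Sketch: copy a JDBC driver jar into the directory where Fedora picks it up.
# install_jdbc_driver is a hypothetical helper, not part of Fedora.
install_jdbc_driver() {
    jar=$1
    lib_dir="${FEDORA_HOME:-}/server/tomcat41/common/lib"

    if [ ! -f "$jar" ]; then
        echo "no such file: $jar" >&2
        return 1
    fi
    if [ ! -d "$lib_dir" ]; then
        echo "directory not found: $lib_dir (is FEDORA_HOME set?)" >&2
        return 1
    fi
    cp "$jar" "$lib_dir/"
}

# Example:
# install_jdbc_driver /tmp/ojdbc14.jar && fedora-start oracle
```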
If you want to use a different database, it's an option, but it will take some extra work. Here's how it can be done:
The database needs to be JDBC-compliant and must support common SQL-92 syntax. The driver will need to be put in the server's tomcat41/common/lib directory so that Fedora picks it up, and the JDBC url will need to be configured appropriately in the server's config/fedora.fcfg file.
Upon startup, Fedora checks the database for all required tables. If they don't exist, it creates them. Creation of tables is a much less standardized task across RDBMSs than regular SQL queries. Thus, you must either
create the tables and indexes and auto-increments yourself in your own database (see the file: src/dbspec/server/fedora/server/storage/resources/DefaultDOManager.dbspec in the source distribution for the RDBMS-neutral table specifications), or
write your own subclass of fedora.utilities.DDLConverter for your database software, include it in a jar file in server/tomcat41/common/lib, and associate it with the JDBC driver inside the server/config/fedora.fcfg file (see how it's done by looking at the MySQLDDLConverter and McKoiDDLConverter associations with their respective drivers in the fedora.fcfg file, and the classes' implementations in the source distribution).
If you choose option #2, please tell us about it, as it will be useful for other users of Fedora! Option 2 is harder, but it will make future installations of new versions of Fedora (where the db schema will likely change) much easier for you if you plan on using that database later.
Before starting for the first time, you will want to make some changes in Fedora's main configuration file, server/config/fedora.fcfg.
The configuration file has a simple schema. It starts with a server element, under which a series of param elements occur, followed by a series of module elements, followed by a series of datastore elements.
The param elements directly following the root server element control generic server functionality. Examples include the level of logging to do, the port on which the server should be exposed, and the superuser password.
The module elements are used to configure specific parts of Fedora. For instance, the module whose role attribute is “fedora.server.search.FieldSearch” configures the field-searching component of the server. Inside the module element, several param elements are included. These are specific to that module’s implementation. Descriptions of each parameter can currently be found in the configuration file itself. Important ones will be listed below.
The datastore elements are used to configure various databases that might be used by the system. Although the sample configuration file holds several, you will typically only need one. The datastore elements are associated with the modules by means of a parameter inside the associated module. In the sample configuration file, for example, the poolNames param of the fedora.server.storage.ConnectionPoolManager module refers to one of the datastore elements in its value.
You will probably want to change the values of the following parameters for your own installation. Items in bold must be changed prior to running the server for the first time. Other items may require changes depending on your environment. For instance, object_store_base should be changed if you're running the server in Unix, as the default value is Windows-specific. All configurable items are described in detail in the fedora.fcfg file.
This is where you set the password of the administrative user, “fedoraAdmin”. This password is used when authenticating remote requests for performing administrative functions.
A directory where XML serializations of the digital objects will be stored. This will be created if it doesn’t already exist. This parameter must be given as an absolute path.
A temporary directory. This will be created if it doesn’t already exist. This parameter must be given as an absolute path.
A directory where datastreams of digital objects will be stored. This will be created if it doesn't already exist. This parameter must be given as an absolute path.
The port on which the server should run. 8080 is often used, but this can be anything your OS / user privileges will allow.
The fully qualified host name of the machine on which Fedora runs. If the machine has aliases, use the alias that will be used by people connecting to the server (be it via a web browser or one of the SOAP API exposures).
Since you will be changing this value before running Fedora for the first time, you must also change the hostname (and possibly the port) in the URLs inside the demo objects, if you want to use them. So, after making changes to the configuration file, but before you ingest the demo objects, run the fedora-convert-demos script. See the Client Command-Line Utilities document for instructions on using this and other scripts that come with Fedora.
This is the namespace id part of newly generated PIDs for objects. This should be a short string consisting of the characters [a-z][0-9]. When objects are first ingested or created in the repository, this will be the first part of the identifier used for them. (Note: the pid namespace won’t be used for objects that are ingested with a “demo:” pid)
This specifies the database connection pool to be used for the storage subsystem. Normally this will just identify ConnectionPoolManager’s default connection pool.
A comma-separated list of IP ranges (for example, “200.200.0.0-255.255.0.0,100.0.0.0-180.0.0.0”) that the client’s address is compared to for Management API access. If this is specified, the remote address must match for any Management request in order to be accepted. If this is not specified, all requests will be accepted unless the remote address matches a deny pattern (see below). Important: The fedora.fcfg file that comes with Fedora allows access to 127.0.0.1 (this is the loopback IP, meaning that the only host that can connect for write operations is the server). You also need to put your server's real IP address in this value, to take care of cases where your server, when acting as a client to itself, identifies itself with its real IP address. If you don't do this, you may run into access restriction errors when trying to connect to the server from the same machine.
IP Ranges to deny. If specified, the remote address must not match for any Management request in order to be accepted. If this is not specified, request acceptance is governed solely by the allowHosts parameter.
Same as Management/allowHosts, except this controls access to the Access API.
Same as Management/denyHosts, except this controls access to the Access API.
This is for additional repository security. If true, all datastreams (even those that are referenced) will be piped through Fedora when they are sent to behavior services. See the description of this parameter inside fedora.fcfg for more detail on this option.
Specifies the database connection pool to be used for the fielded search functionality. Normally this will be ConnectionPoolManager’s default pool.
How you want your repository to be named in the OAI-PMH interface’s Identify request.
This will be used in conjunction with the pidNamespace to uniquely identify the items in your repository to OAI harvesters. The default value is example.org, but you'll want to change it to the domain name where the repository is hosted.
Who the OAI-PMH interface identifies as administrators to harvesters. This is a space-separated list.
A space separated list of OAI-PMH provider endpoints identifying other providers that you associate with. OAI harvesters use this for discovery purposes.
The pool to be provided to modules that request the default connection pool.
A comma delimited list of pools to make available. These should be identified by the id attribute of one of the datastore elements.
The username for database access. This should match the username used previously when setting up the database for Fedora.
The password for the database user.
A JDBC URL that can be used to connect to your database. The syntax must match that required by the driver. Working examples for MySQL and McKoi are included.
Identifies the driver to use for connecting to the database. This is RDBMS-specific. If you need to use a driver that isn’t already included, put the .jar file in server/tomcat41/common/lib/
Datastream Mediation is an optional feature of the Fedora repository system that offers a higher level of access restriction for the physical location of datastreams in Fedora objects. It is controlled through the boolean parameter named doMediateDatastreams in the fedora.fcfg configuration file. By default, the value of the doMediateDatastreams parameter is set to false which disables datastream mediation. The following section describes the advantages and disadvantages of using Datastream Mediation.
Content is stored in Fedora digital objects as one or more datastreams. Each datastream contains a pointer to the physical location of its content. For Referenced External Content and Redirected datastreams, this is a pointer to remote content that is managed outside the custodianship of the repository. For Managed Content and XML Metadata datastreams, this is a pointer to internal content that is managed by the repository.
An object’s content is accessed through its disseminators that interact with external mechanisms associated through the disseminator’s behavior mechanism object. In order for the external mechanisms to function, they must have access to the datastream locations. With Datastream Mediation disabled, Fedora passes the actual physical location of datastreams to external mechanisms as part of the dissemination process. If the external mechanisms are trusted, the risk of exposing the actual datastream locations to these mechanisms may be negligible. However, if the mechanisms are not trusted or reside completely outside the control of the repository administrator, the risk of exposure may be higher. The worst-case scenario would be where an untrusted mechanism captures the actual datastream locations and then accesses the datastreams directly rather than going through the Fedora repository.
The Datastream Mediation option was added to give Fedora repository administrators more control over how datastream locations are exposed. Using Datastream Mediation does affect the functioning of the repository, so the administrator needs to weigh the advantages and disadvantages to determine which option best suits their needs.
When Datastream Mediation is enabled, it essentially activates an additional layer of proxying within the repository. When dissemination requests are processed, the repository will proxy all datastream requests to the external mechanisms, providing an internal reference in place of the actual datastream locations. The external mechanisms then must communicate with the Fedora repository to resolve the proxy requests and receive the contents of the datastream in the form of a bytestream. This additional layer of proxying effectively hides the actual datastream locations from the external mechanisms and ensures that Managed Content and XML Metadata datastreams can only be directly accessed through the Fedora repository. Datastream Mediation will also proxy Referenced External Content datastreams, but since these types of datastreams by definition are remotely available on the internet, Fedora cannot guarantee that their actual locations cannot be discovered by other means. Redirected datastreams are never proxied.
Advantages
Security – Datastream Mediation ensures that Managed Content and XML Metadata datastreams can only be directly accessed through the Fedora repository. It provides an extra layer of security if you have external mechanisms that may not be trusted or if you want to maintain tighter control over the exposure of datastream locations.
Disadvantages
Performance – Datastream Mediation adds an additional layer of proxying. It does degrade performance slightly since it adds one more network hop necessary to resolve a datastream location by the external mechanism.
Firewall Issues – If your Fedora repository is installed behind a network security firewall, Datastream Mediation will not function unless you have configured the firewall to allow access to the Fedora repository port number (default is port 8080). This is because external mechanisms must make HTTP read requests to the Fedora repository over the server’s port number to resolve the datastream locations. If you have ready access to the firewall’s configuration, this may be an easy task to accomplish. If your firewall is provided by a 3rd party, then accomplishing the necessary configuration may be more difficult or may even not be possible.
Ultimately, it is up to the Fedora repository administrator to decide if Datastream Mediation should be used. Here are some sample scenarios to provide some additional guidance.
Scenario 1 – Consider a Fedora repository where all objects use Managed Content datastreams and the administrator has no control over external mechanisms. The administrator also wants to ensure that the only access to Fedora datastreams is through the repository. In this scenario, the Fedora administrator will want to enable Datastream Mediation, since the repository manages all the datastreams and the administrator wants tight control over how datastreams may potentially be accessed.
Scenario 2 – Consider a Fedora repository where all objects use Referenced External Content datastreams and some mechanisms may be untrusted. Since all datastreams are Referenced External Content, their actual locations could be discovered through alternate means on the web. In this scenario, the administrator may want to disable Datastream Mediation, because it offers no significant improvement in controlling datastream location exposure when all of the datastreams in the repository are remote content.
Scenario 3 – Consider a Fedora repository where objects use a mix of datastreams and all external mechanisms are untrusted. In this scenario, the administrator might decide to enable or disable Datastream Mediation depending on the percentage of remote datastreams versus Managed Content datastreams and on the importance of controlling datastream exposure.
Scenario 4 – Consider a Fedora repository where all objects use Managed Content datastreams and the administrator has no control over external mechanisms. The Fedora repository is also behind a firewall and the administrator cannot alter the firewall configuration. The administrator has no choice in this scenario and must disable Datastream Mediation for the repository to function.
If the database isn't already started, start it now.
If you've chosen to use the included mckoi database, you can simply type "mckoi-start" and the database will start.
Fedora is normally started with the command:
fedora-start
The configuration file, fedora.fcfg, can be written so that it takes a parameter which has the effect of using an alternate value for some of the <param...> elements, if an alternate parameter exists. The default fedora.fcfg assumes that you're using MySQL, but has been written to use the mckoi database if you start it with:
fedora-start mckoi
...so if you're using mckoi, start it with the above command instead.
Read the fedora.fcfg file and look for the attributes named "mckoivalue" to see how this is done, if you're curious or would like to define your own parameter. You can also change the fedora.fcfg so that it doesn't have to take the "mckoi" parameter in order to run with mckoi (simply set the value of the parameters to that of the mckoivalue attributes so that these values are taken by default).
Note for Unix users: The commands fedora-start and fedora-start mckoi do not have to be run by root.
You can stop the Fedora server with the command:
fedora-stop
The database should be shut down only after the Fedora server has been stopped. If you're using mckoi, you can stop it with the command:
mckoi-stop [dbUser] [dbPassword]
Other databases will have their own shutdown scripts or commands.
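The ordering described above (database up before Fedora, Fedora down before the database) can be captured in a pair of small wrappers. This is a sketch for the McKoi case only: fedora_up and fedora_down are hypothetical names, and it assumes each start command returns once its service has been launched. If mckoi-start stays in the foreground on your system, run it in a separate terminal instead.

```shell
# Sketch: start and stop McKoi and Fedora in the order this guide describes.
# fedora_up and fedora_down are hypothetical helpers, not part of Fedora.

# Start the database first, then the Fedora server with the mckoi parameter.
fedora_up() {
    mckoi-start || return 1
    fedora-start mckoi
}

# Stop the Fedora server first, and shut the database down only afterwards.
# $1 and $2 are the McKoi database user and password.
fedora_down() {
    fedora-stop || return 1
    mckoi-stop "$1" "$2"
}

# Example:
# fedora_up
# fedora_down fedoraDBUser fedoraDBPass
```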
The Fedora server now has internal support for avoiding breakage of self-referential URLs (i.e., URLs that contain the hostname and port of the Fedora server itself). This change allows the Fedora server host and port to be re-configured without breaking digital objects that contained URLs that referred to a previous host and port configuration.
The host name and port number on which the Fedora server runs are controlled by the fedoraServerHost and fedoraServerPort parameters in the fedora.fcfg configuration file. By default, these are set to the host name localhost and the port number 8080. The steps you need to take in order to change the host name and/or port number vary slightly depending on whether you are making the change for the first time or making the change on a pre-existing Fedora server.
If you are installing Fedora 1.1.1 for the first time and want to run using a host or port other than the defaults, you need to do the following:
Change the values of fedoraServerHost and fedoraServerPort to the values you want to use in the fedora.fcfg file.
Run the demo object converter utility script to change the host and/or port in the demo objects from localhost and 8080 to the values you specified in the fedora.fcfg file. Refer to the Command-line Utilities documentation for additional details on running the demo object converter.
If you plan to use the Soap Client which provides a sample interface to API-A, you will also need to update the parameter fedoraEndpoint with the new host name and/or port number in the soapclient.properties file. On Windows, this file is located in the distribution in:
%fedora_home%\server\tomcat41\webapps\soapclient\WEB-INF\soapclient.properties
On Unix, this file is located in the distribution in:
$FEDORA_HOME/server/tomcat41/webapps/soapclient/WEB-INF/soapclient.properties
Refer to the Soap Client documentation for additional information regarding configuration of the Soap Client.
Start the Fedora server.
Ingest the demo objects.
When invoking the Fedora Admin Client, remember to specify the new host name and/or port number, e.g., fedora-admin hostname port fedora-username fedora-password
If you have an existing Fedora 1.1.1 server already running and you need to change the host name and/or port number on which the server runs, you need to do the following:
Stop the Fedora server.
Change the values of fedoraServerHost and fedoraServerPort to the values you want to use in the fedora.fcfg file.
If you plan to use the Soap Client which provides an example interface to API-A, you will also need to update the parameter fedoraEndpoint with the new host name and/or port number in the soapclient.properties file. On Windows, this file is located in the distribution in:
%fedora_home%\server\tomcat41\webapps\soapclient\WEB-INF\soapclient.properties
On Unix, this file is located in the distribution in:
$FEDORA_HOME/server/tomcat41/webapps/soapclient/WEB-INF/soapclient.properties
Refer to the Soap Client documentation for additional information regarding configuration of the Soap Client.
Restart the Fedora server.
When invoking the Fedora Admin Client, remember to specify the new host name and/or port number, e.g., fedora-admin hostname port fedora-username fedora-password.
There are some minor backward compatibility issues related to the host/port changes in Fedora 1.1.1. Please note that if you have an existing Fedora 1.0 server and want to upgrade to version 1.1.1 of the software, but leave your existing objects as they were ingested under the Fedora 1.0 server (i.e., choose not to purge and then re-ingest the objects), this is possible with the following caveats. Any demo objects ingested under Fedora 1.0 will have the host and port of localhost and 8080 embedded in the demo objects as well as any user-defined objects created using Fedora 1.0 that reference the default host and port of the Fedora 1.0 server. Provided the host and port do not change for the Fedora server, these objects will continue to function correctly under the Fedora 1.1.1 server. However, if the host and/or port changes in the future, these pre-existing objects will no longer function correctly. The only way to fix them after a host/port change is to purge and then re-ingest the objects.
While the server is running, it writes logs to FEDORA_HOME/server/logs. These are timestamped files, written in a simple XML format. You can use these logs to get more extensive information about what the server is doing or to diagnose problems.
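When diagnosing a problem, it is often handy to jump straight to the most recent log file. The sketch below picks the newest file by modification time rather than by name, since the exact file naming is not described here; latest_fedora_log is a hypothetical helper name, and FEDORA_HOME is assumed to be set.

```shell
# Sketch: print the path of the most recently modified Fedora server log.
# latest_fedora_log is a hypothetical helper, not part of Fedora.
latest_fedora_log() {
    if [ -z "${FEDORA_HOME:-}" ]; then
        echo "FEDORA_HOME is not set" >&2
        return 1
    fi
    log_dir="$FEDORA_HOME/server/logs"
    # ls -t lists newest first; keep only the first entry
    newest=$(ls -t "$log_dir" 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no log files found in $log_dir" >&2
        return 1
    fi
    echo "$log_dir/$newest"
}

# Example:
# tail -f "$(latest_fedora_log)"
```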
If you've just got your Fedora server up and running, it's a good idea to check out the demonstration objects to get an idea of how Fedora works. See the Demo Manual for complete descriptions of the demos.
The demos may be ingested via the Fedora Admin GUI tool or from the command line. If you ingest them from the Admin GUI, you must be sure to ingest bdef objects first, then bmech objects, then regular objects.
To access the Admin GUI use the command:
fedora-admin [hostname] [port] [username] [password]
To ingest the demos from the command line (recommended), use the command:
fedora-ingest-demos [hostname] [port] [username] [password]
Please note that the demo objects must be ingested before they can be discovered using the default search interface.