-
The Fedora Digital Object
Fedora defines a generic digital object model that can be used to persist and deliver the essential characteristics of many kinds of digital content, including documents, images, electronic books, multi-media learning objects, datasets, metadata, and many others. This digital object model is a fundamental building block of the Content Model Architecture and all other Fedora-provided functionality.
-
The Fedora Digital Object Model
Fedora uses a "compound digital object" design which aggregates one or more content items into the same digital object. Content items can be of any format and can either be stored locally in the repository, or stored externally and just referenced by the digital object. The Fedora digital object model is simple and flexible so that many different kinds of digital objects can be created, yet the generic nature of the Fedora model allows all objects to be managed in a consistent manner in a Fedora repository.
A good discussion of the Fedora object model (Fedora 2 and prior versions) exists in a recent paper (draft) published in the International Journal of Digital Libraries. While some details of this paper have been made obsolete by the CMA (e.g. Disseminators), the core principles of the model still are part of the CMA. The Fedora object model is defined in XML schema language (see The Fedora Object XML - FOXML). For more information, also see the Introduction to FOXML in the Fedora System Documentation.
The basic components of a Fedora digital object are:
- PID: a persistent, unique identifier for the object.
- Object Properties: a set of system-defined descriptive properties that are necessary to manage and track the object in the repository.
- Datastream(s): the element in a Fedora object that represents a content item. An object can have one or more Datastreams. Each Datastream records useful attributes about the content such as the MIME-type (for Web compatibility) and, optionally, the URI identifying the content's format (from a format registry). The content of a Datastream is treated as an opaque bit stream. It is up to the user to determine how to interpret the content (i.e. data or metadata). The content can either be stored internally in the Fedora repository, or stored remotely (in which case Fedora holds a pointer to the content in the form of a URL). The Fedora object model also supports versioning of Datastream content (see the Fedora Versioning Guide for more information). In the current implementation, every Fedora digital object has one Dublin Core Datastream by default which is used to contain metadata about the object.
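The three components listed above can be pictured as a small data structure. The following is only an illustrative sketch, not Fedora's implementation: the class and field names (`DigitalObject`, `Datastream`, `dsid`, `format_uri`) and the PID `demo:1` are hypothetical, chosen to mirror the description.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Datastream:
    """One content item; the content itself is treated as opaque bytes."""
    dsid: str             # identifier, unique within the object
    mime_type: str        # MIME type of the content
    content: bytes = b""  # opaque bytes (left empty if held by reference)
    format_uri: str = ""  # optional URI from a format registry


@dataclass
class DigitalObject:
    """A compound object: PID, object properties, and datastreams."""
    pid: str
    properties: Dict[str, str] = field(default_factory=dict)
    datastreams: Dict[str, Datastream] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Mirror the documented default: every object has a Dublin Core
        # datastream, created here if the caller did not supply one.
        if "DC" not in self.datastreams:
            self.add(Datastream("DC", "text/xml"))

    def add(self, ds: Datastream) -> None:
        self.datastreams[ds.dsid] = ds


# Hypothetical example object with one extra datastream.
obj = DigitalObject(pid="demo:1", properties={"state": "Active"})
obj.add(Datastream("THUMB", "image/jpeg", b"\xff\xd8"))
print(sorted(obj.datastreams))
```

Keying the datastreams by identifier reflects the rule that a datastream identifier only needs to be unique within its own object.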
-
Datastreams
A datastream is a component of a digital object that represents a data source. Every object will have a reserved Dublin Core datastream (that will be created by the Fedora repository service automatically if one is not provided). The Fedora repository service will also maintain a special datastream that records an audit trail of all changes made to the object. This datastream cannot be edited, since only the system controls it. In addition to these special datastreams, a digital object may have any number of additional custom datastreams. Each datastream can be any MIME-typed data or metadata, and its content can either be managed locally in the Fedora repository or held by some external data source (and referenced by a URL).
The basic properties that the Fedora object model defines for a datastream are as follows:
- Datastream Identifier: an identifier for the datastream that is unique within the digital object (but not necessarily globally unique)
- State: the datastream state of Active, Inactive, or Deleted
- Created Date: the date/time that the datastream was created (assigned by the repository service)
- Modified Date: the date/time that the datastream was modified (assigned by the repository service)
- Versionable: an indicator (true/false) as to whether the repository service should version the datastream. By default the repository versions all datastreams.
- Label: a descriptive label for the datastream
- MIME Type: the MIME type of the datastream (required)
- Format Identifier: an optional format identifier for the datastream. Examples of emerging schemes are PRONOM and the Global Digital Format Registry (GDFR).
- Alternate Identifiers: one or more alternate identifiers for the datastream. Such identifiers could be local identifiers or global identifiers such as Handles or DOIs.
- Checksum: an integrity stamp for the datastream which can be calculated using one of many standard algorithms (MD5, SHA-1, etc.)
- Bytestream Content: the "stuff" that the datastream is about (such as a document, digital image, video, or metadata record)
- Control Group: pertaining to the bytestream content, a new datastream can be defined as one of four types, or control groups, as follows:
- Internal XML Metadata - In this case, the datastream will be XML that is stored inline within the digital object XML file. The user may enter text directly into the editing window, or data may be imported from a file by clicking Import and selecting or browsing to the location of the XML metadata file.
- Managed Content - In this case, the datastream content will be stored in the Fedora repository and the digital object XML file will store an internal identifier to that datastream. To get content, click Import and select or browse to the file location of the import file. Once import is complete, you will see the imported file in a preview box on the screen.
- External Referenced Content - In this case, the datastream content will be stored outside of the Fedora repository, and the digital object will store a URL to that datastream. The datastream is "by reference" since it is not actually stored inside the Fedora repository. While the datastream content is stored outside of the Fedora repository, at runtime, when an access request for this type of datastream is made, the Fedora repository will use this URL to get the content from its remote location, and the Fedora repository will mediate access to the content. This means that behind the scenes, Fedora will grab the content and stream it out to the client requesting the content as if it were served up directly by Fedora. This is a good way to create digital objects that point to distributed content, but still have the repository in charge of serving it up. To create this type of datastream, specify the URL for the datastream content in the Location URL text box.
- Redirect Referenced Content - In this case, the datastream content is also stored outside the repository and the digital object points to its URL ("by-reference"). However, unlike the External Referenced Content scenario, the Redirect scenario signals the repository to redirect to the URL when access requests are made for this datastream. This means that the datastream will not be streamed through the Fedora repository when it is served up. This is beneficial when you want a digital object to have a datastream that is stored and served up by some external service, and you want the repository to get out of the way when it comes time to serve the content up. A good example is when you want a datastream to be content that is stored and served by a streaming media server. In such a case, you would want to pass control to the media server to actually stream the content to a client (e.g., video streaming), rather than have Fedora in the middle re-streaming the content out. To create a Redirect datastream, specify the URL for the content in the Location text box.
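Two of the properties above lend themselves to a short sketch: the checksum and the control group. The code below is an illustration under stated assumptions, not repository code. The function names `checksum` and `plan_delivery` are hypothetical, and the single-letter control group values are assumed to match the codes used in FOXML, which should be checked against the schema.

```python
import hashlib
from enum import Enum


class ControlGroup(Enum):
    INTERNAL_XML = "X"  # Internal XML Metadata, stored inline in the object
    MANAGED = "M"       # Managed Content, stored by the repository
    EXTERNAL = "E"      # External Referenced Content, fetched and relayed
    REDIRECT = "R"      # Redirect Referenced Content, client is redirected


def checksum(content: bytes, algorithm: str = "MD5") -> str:
    """Compute an integrity stamp over the opaque datastream bytes."""
    names = {"MD5": "md5", "SHA-1": "sha1"}
    return hashlib.new(names[algorithm], content).hexdigest()


def plan_delivery(group: ControlGroup, location: str = "") -> str:
    """Describe in words how a datastream of this group would be served."""
    if group in (ControlGroup.INTERNAL_XML, ControlGroup.MANAGED):
        return "serve bytes held by the repository"
    if not location:
        raise ValueError("by-reference datastreams need a location URL")
    if group is ControlGroup.EXTERNAL:
        return f"fetch {location} and stream it through the repository"
    return f"redirect the client to {location}"


print(checksum(b"hello"))
print(plan_delivery(ControlGroup.EXTERNAL, "http://example.org/scan.tif"))
print(plan_delivery(ControlGroup.REDIRECT, "http://example.org/video"))
```

The last two calls show the practical difference between the two by-reference groups: in the first the repository stays in the middle of the transfer, in the second it steps aside.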
Decisions about what to include in a digital object and how to configure its datastreams are basic modeling choices as you develop your repository. The examples in this tutorial demonstrate some common models that you may find useful as you develop your application. Different patterns of datastreams designed around a particular "genre" of digital object (e.g., article, book, dataset, museum image, learning object) are generally known as "content models" in Fedora.
-
Digital Object Model - Access Perspective
Below is an alternative view of a Fedora digital object that shows the object from an access perspective. The object contains both datastream and disseminator components. Only a few of the object properties are depicted for simplicity. The diagram shows how these components map to various access points on the digital object, known as "representations" of the object. Each representation is identified by a URI that conforms to the Fedora "info" URI scheme. These URIs can be easily converted to the URL syntax for the Fedora REST-based access service (API-A-LITE).
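As a rough illustration of that conversion, the sketch below rewrites an "info" URI into an access URL. It assumes the `/get/` path pattern of API-A-LITE and a hypothetical base URL (`http://localhost:8080/fedora`); the function name `info_uri_to_url` is made up for this example, so check the pattern against your own installation.

```python
def info_uri_to_url(uri: str, base: str = "http://localhost:8080/fedora") -> str:
    """Turn info:fedora/<pid>[/<rest>] into <base>/get/<pid>[/<rest>]."""
    prefix = "info:fedora/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Fedora info URI: {uri!r}")
    path = uri[len(prefix):]
    if not path:
        raise ValueError("the URI does not name an object")
    return f"{base.rstrip('/')}/get/{path}"


# Hypothetical object demo:11 and its Dublin Core datastream.
print(info_uri_to_url("info:fedora/demo:11/DC"))
```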
In the diagram, the object aggregates three datastreams: a Dublin Core metadata record, a thumbnail image, and a high resolution image. From a management perspective each datastream component stores key information including MIME type, creation dates, alternate identifiers, state, and more. From an access perspective, each datastream constitutes a direct representation of the object's content, meaning whatever bytestream is associated with the datastream component is what is accessible (it is a direct transcription of datastream content).
In the diagram there is one disseminator. A disseminator is an optional component used to extend the access points on the digital object. Behind the scenes the disseminator points to a set of service methods that are called upon by the repository to produce "virtual representations" of the object. A "virtual representation" is content that is not explicitly stored in a digital object; instead, it is produced at runtime. A disseminator defines a service-mediated view of the object. In this example, there are two service methods associated with the disseminator, one for producing zoomable images and one for producing grayscale images. These service methods both require a JPEG image as input, therefore the datastream labeled "HIGH" is associated with this disseminator as a runtime parameter. The net effect is that the disseminator produces two extra views of the object's content. The disseminator contains enough information so that a Fedora repository can automatically mediate all interactions with the associated service. To enable this, each disseminator is linked to a special object that contains a service description encoded in the Web Service Description Language (WSDL). The Fedora repository uses this information to make appropriate service calls at run time to produce virtual representations. From a client perspective this is transparent, and the client just requests the virtual representation with the appropriate Fedora identifier.
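The difference between direct and virtual representations can be sketched in a few lines. This is a toy model only: the `Disseminator` class, the method names `getZoomable` and `getGrayscale`, and the byte-wrapping lambdas that stand in for real image services are all hypothetical, and no WSDL is involved.

```python
from typing import Callable, Dict, List

ServiceMethod = Callable[[bytes], bytes]


class Disseminator:
    """Binds one input datastream to a set of service methods."""

    def __init__(self, input_dsid: str, methods: Dict[str, ServiceMethod]) -> None:
        self.input_dsid = input_dsid
        self.methods = methods

    def method_names(self) -> List[str]:
        return sorted(self.methods)

    def disseminate(self, datastreams: Dict[str, bytes], method: str) -> bytes:
        # The virtual representation is computed on request, never stored.
        source = datastreams[self.input_dsid]
        return self.methods[method](source)


# Three stored datastreams, as in the diagram described above.
datastreams = {"DC": b"<dc/>", "THUMB": b"small-jpeg", "HIGH": b"large-jpeg"}

# Stand-in services: real ones would transform the image.
image_services = Disseminator(
    input_dsid="HIGH",
    methods={
        "getZoomable": lambda jpeg: b"zoomable(" + jpeg + b")",
        "getGrayscale": lambda jpeg: b"grayscale(" + jpeg + b")",
    },
)

# Direct representations plus the two views added by the disseminator.
print(sorted(datastreams) + image_services.method_names())
print(image_services.disseminate(datastreams, "getGrayscale"))
```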
-
Four Types of Fedora Digital Objects
Although every Fedora digital object conforms to the Fedora object model, as described above, there are four distinct types of Fedora digital objects that can be stored in a Fedora repository. The distinction between these four types is fundamental to how the Fedora repository system works. In Fedora, there are objects that store digital content entities, objects that store service descriptions, objects used to deploy services, and objects used to organize other objects.
Data Object
In Fedora, a Data object is the type of object used to represent a digital content entity. Data objects are what we normally think of when we imagine a repository storing digital collections. Data objects can represent such varied entities as images, books, electronic texts, learning objects, publications, datasets, and many other entities. One or more Datastreams are used to represent the parts of the digital content. A Datastream is an XML element that describes the raw content (a bitstream or external content). In the CMA, Disseminators, a metadata construct used to represent services, are eliminated though their functionality is still provided in other ways.
The Data object, indeed all Fedora digital objects, now consists of the FOXML digital object encapsulation (foxml:digitalObject) and two fundamental XML elements: Object Properties (foxml:objectProperties) and Datastreams (foxml:datastream). The Data object is the simplest of all the specialized object types and is identical to the digital object described in the Fedora Digital Object Model section above.
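To make the nesting of these three elements concrete, the sketch below builds a bare skeleton with Python's standard XML library. It is deliberately incomplete and would not validate against the FOXML schema. The namespace URI and the attribute names (`PID`, `ID`, `CONTROL_GROUP`) are assumptions to verify against the schema, and `demo:1` is a hypothetical PID.

```python
import xml.etree.ElementTree as ET

# Assumed FOXML namespace URI; confirm against the FOXML schema.
FOXML_NS = "info:fedora/fedora-system:def/foxml#"
ET.register_namespace("foxml", FOXML_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the FOXML namespace."""
    return f"{{{FOXML_NS}}}{tag}"


# The encapsulating element, then the two fundamental child elements.
root = ET.Element(q("digitalObject"), {"PID": "demo:1"})
ET.SubElement(root, q("objectProperties"))
ET.SubElement(root, q("datastream"), {"ID": "DC", "CONTROL_GROUP": "X"})

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```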
Data objects can now be freely shared between Fedora repositories. If a federated identifier-resolver system, such as the Handle System™, or any authoritative name registry system is used, the Data object will have the same identifier for each copy of itself in each participating repository. Sharing Data objects while keeping the same identifier in each copy greatly simplifies replication, and enables many business processes and services that are needed for large scale repository installations integrated within the Fedora Framework. Data objects can still be shared between repositories by including both the original identifier and alternate identifiers as part of the object's metadata.
Behavior Definition Object
In Fedora, a Behavior Definition object or BDef (Service Definition object or SDef) is a special type of control object used to store a model of a Service. A Service contains an integrated set of Operations that a Data object supports. In object-oriented programming terms, the BDef (SDef) defines an "interface" which lists the operations that are supported but does not define exactly how each operation is performed. This is also similar to approaches used in Web (REST) programming and in SOAP Web services. In order to execute an operation you need to identify the Data object, the BDef (SDef), and the name of the Operation. Some Operations use content from Datastreams (supplied by the Data object) and, possibly, additional parameters supplied by the client program or browser requesting the execution.
Conceptually an Operation is called using the following form (the specifics vary with the actual Fedora interface being used but all will contain some form of this information):
Repository : Get : Data object PID : BDef PID : Operation Name : Optional Parameters
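One concrete rendering of this conceptual form is a REST-style URL. The sketch below assumes the `/get/` path pattern of the API-A-LITE access service. The base URL, the PIDs, the operation name, and the function name `dissemination_url` are all hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode


def dissemination_url(
    base: str,
    data_pid: str,
    sdef_pid: str,
    operation: str,
    params: Optional[Dict[str, str]] = None,
) -> str:
    """Assemble Repository : Get : PID : BDef PID : Operation : Parameters."""
    # Keep the colon inside PIDs readable; escape anything else unsafe.
    parts = (data_pid, sdef_pid, operation)
    path = "/".join(quote(part, safe=":") for part in parts)
    url = f"{base.rstrip('/')}/get/{path}"
    if params:
        url += "?" + urlencode(params)
    return url


print(dissemination_url(
    "http://localhost:8080/fedora",
    "demo:image-1",      # Data object PID (hypothetical)
    "demo:SDefZoom",     # BDef (SDef) PID (hypothetical)
    "getZoomable",       # Operation name (hypothetical)
    {"scale": "2"},      # optional parameters
))
```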
A BDef (SDef) is a building block in the CMA that enables adding customized functionality for Data objects. Using a BDef (SDef) is a way of saying "this Data object supports these operations." Essentially, a BDef (SDef) defines a "behavior contract" to which one or more Data objects may "subscribe." In repositories, we usually create a large number of similar Data objects and want them all to have the same functionality. To make this approach flexible and easier to use, the CMA uses the Content Model (CModel) object (described below) to contain the model for similar Data objects. Instead of associating the BDef (SDef) directly with each Data object, the relation fedora-model:hasBDef (providesService) is asserted to the CModel object. By following the relation from the Data object to the CModel object, and then from the CModel object to the BDef (SDef) object, we can determine what Operations the Data object can perform. Also note that a Data object (through its CModel object) may support more than one Service (by having multiple BDef (SDef) relations).
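The two-step lookup described above (Data object to CModel, then CModel to BDef) can be sketched as a walk over a small relation table. Everything here is illustrative: the dictionaries stand in for the repository's relationship store, and the PIDs and operation names are hypothetical.

```python
from typing import Dict, List

HAS_CMODEL = "hasContentModel"  # asserted by the Data object
HAS_BDEF = "hasBDef"            # asserted by the CModel object

# subject PID -> relation name -> list of target PIDs
relations: Dict[str, Dict[str, List[str]]] = {
    "demo:image-1": {HAS_CMODEL: ["demo:CModelImage"]},
    "demo:CModelImage": {HAS_BDEF: ["demo:SDefZoom", "demo:SDefGray"]},
}

# Operations declared by each BDef (SDef).
operations: Dict[str, List[str]] = {
    "demo:SDefZoom": ["getZoomable"],
    "demo:SDefGray": ["getGrayscale"],
}


def supported_operations(data_pid: str) -> Dict[str, List[str]]:
    """Follow Data object -> CModel -> BDef and collect the Operations."""
    found: Dict[str, List[str]] = {}
    for cmodel in relations.get(data_pid, {}).get(HAS_CMODEL, []):
        for sdef in relations.get(cmodel, {}).get(HAS_BDEF, []):
            found[sdef] = operations.get(sdef, [])
    return found


print(supported_operations("demo:image-1"))
```

Because the BDef relations hang off the CModel rather than each Data object, adding a Service to the class means editing one object instead of many.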
BDef (SDef) objects can now be freely shared between Fedora repositories. If a federated identifier-resolver system, such as the Handle System™, or any authoritative name registry system is used, the BDef (SDef) object will have the same identifier for each copy of itself in each participating repository. Sharing BDef (SDef) objects while keeping the same identifier in each copy greatly simplifies replication, and enables many business processes and services that are needed for large scale repository installations integrated within the Fedora Framework. BDef (SDef) objects can still be shared between repositories by including both the original identifier and alternate identifiers as part of the object's metadata. The best results will be gained by sharing the Data object, BDef (SDef) objects, and Content Model object as a group maintaining the same original identifiers. By using the CMA in this fashion, you transfer a significant unit of the data and metadata that documents the expression pattern for your intellectual work. While this is, by itself, not everything needed, it is a big step forward for creating a durable content repository.
It is worth noting that Behavior Definition objects (Service Definition objects) conform to the basic Fedora object model. Also, they are stored in a Fedora repository just like other Fedora objects. As such, a collection of BDef (SDef) objects in a repository constitutes a "registry" of Service Definitions.
Behavior Mechanism Object
The Behavior Mechanism object (Service Deployment Mechanism) is a special type of control object that describes how a specific repository will deliver the Service Operations described in a BDef (SDef) for a class of Data objects described in a CModel. In the CMA, the BMech (SMech) acts as a deployment object only for the specific repository in which it is ingested; each repository is free to provide functionality in a different way. For example, one Fedora repository may choose to use a Servlet and another may use a SOAP Web service to perform the same function. As another example, individual repository implementations may need to provide the functionality at different end points. Or perhaps, a specific installation may use a dynamic end point resolution mechanism to permit failover to different service providers.
Since the BMech (SMech) operates only within the scope of an individual repository, the operators of that repository are free to make changes to the BMech (SMech) or the functionality it represents at any time (except for temporarily making the object's services unavailable while the change is being made). This approach permits the system operators to control access to services called by the Fedora repository to institute security or policies as their organization determines. It enables Fedora-called services to be managed using the same principles and tools for the deployment of any distributed system. It also enables the system operators to reconfigure their systems quickly without having to change any part of their content except the BMech (SMech) deployment object.
The BMech (SMech) stores concrete service binding metadata. A BMech (SMech) uses a fedora-model:hasBDef (deploysService) relation to a BDef (SDef) as its way of saying "I am able to perform the service methods described by that BDef (SDef)." A BMech (SMech) object is related to a BDef (SDef) in the sense that it defines a particular concrete implementation of the abstract operations defined in a BDef (SDef) object. The BMech (SMech) also uses a fedora-model:isContractor (deploysServiceFor) relation to a CModel as a way of saying "Use me to do the service operations for any Data objects conforming to that CModel."
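The two relations carried by a BMech (SMech) are enough to select a deployment for a given pair of BDef (SDef) and CModel. The sketch below shows that selection under simplifying assumptions: the `Deployment` tuple, the function `find_deployment`, and all PIDs are hypothetical, and a real repository would resolve these relations from its own index.

```python
from typing import List, NamedTuple, Optional


class Deployment(NamedTuple):
    pid: str
    deploys_service: str      # target of hasBDef (deploysService)
    deploys_service_for: str  # target of isContractor (deploysServiceFor)


def find_deployment(
    deployments: List[Deployment], sdef_pid: str, cmodel_pid: str
) -> Optional[Deployment]:
    """Pick the local BMech (SMech) that serves this BDef for this CModel."""
    for candidate in deployments:
        if (candidate.deploys_service == sdef_pid
                and candidate.deploys_service_for == cmodel_pid):
            return candidate
    return None


local_deployments = [
    Deployment("demo:SMechZoomJpeg", "demo:SDefZoom", "demo:CModelImage"),
    Deployment("demo:SMechZoomTiff", "demo:SDefZoom", "demo:CModelScan"),
]

match = find_deployment(local_deployments, "demo:SDefZoom", "demo:CModelScan")
print(match.pid if match else "no deployment in this repository")
```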
A BMech (SMech) Object stores several forms of metadata that describe the runtime bindings for invoking service methods. The most significant of these metadata formats is service binding information encoded in the Web Services Description Language (WSDL). The Fedora repository system uses the WSDL at runtime to dispatch service method requests in fulfilling client requests for "virtual representations" of a Data object (i.e., via its Operations). This enables Fedora to talk to a variety of different services in a predictable and standard manner. A BMech (SMech) also contains metadata that defines a "data contract" between the service and a class of Fedora Data objects as defined in the CModel. For the initial deployment of the CMA a simple data contract mechanism was chosen. Since the Datastream IDs are specified in the CModel and the BMech (SMech) is now a deployment control object only for a specific repository, the BMech (SMech) is able to uniformly bind directly to these IDs. In the future a more abstract binding mechanism may be used but this approach is simple and clear, though it may require the creation of a small number of additional BMech (SMech) objects.
A major aspect of the CMA redesign is that there is no requirement that conformance to a Content Model, or referential integrity between objects, be checked at ingest time. This may result in a runtime error if the repository cannot find referenced objects or interpret the Content Model, or if there are any conformance problems.
It is worth noting that BMech (SMech) objects conform to the basic Fedora object model. Also, they are stored in a Fedora repository just like other Fedora objects. As such, a collection of BMech (SMech) objects in a repository constitutes a "registry" of service deployments that can be used with Fedora objects. In the CMA, BMech (SMech) objects are not freely sharable across repositories. They represent how a specific repository implements a service. However, BMech (SMech) objects can be shared if the operator of the system modifies them for local deployment. Because of this, BMech (SMech) objects should not be automatically replicated between repositories without considering the effect.
Content Model Object
The Content Model object or CModel is a new specialized control object introduced as part of the CMA. It acts as a container for the Content Model document which is a formal model that characterizes a class of digital objects. It can also provide a model of the relationships which are permitted, excluded, or required between groups of digital objects. All digital objects in Fedora including Data, BDef (SDef), BMech (SMech), and CModel objects are organized into classes by the CModel object. In this section, we will primarily discuss the relationship between the Data and CModel objects.
To create a class of Data objects, create a CModel object. Each Data object belonging to the class asserts the relation fedora-model:hasContentModel (conformsTo) using the identifier of the CModel as the object of the assertion. The current CModel object contains a structural model of the Data object. Over time there will be additional elements to the Content Model document but this initial implementation is sufficient to describe the Datastreams which are required to be present in each Data object in the class. The other key relation is to the BDef (SDef) objects. You can assert zero or more fedora-model:hasBDef (providesService) relations in the CModel to BDef (SDef) objects. Regardless of whether any relations are asserted to BDef (SDef) objects, the repository provides access to the Datastreams using a "Default Datastream Service." In effect the default service is inherited from the CModel's parent CModel (remember all digital objects have a CModel even if it is not explicitly declared as a relation). This is the only kind of inheritance permitted in the initial implementation of the CMA.
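Since the initial Content Model document describes which Datastreams must be present, a conformance check reduces to a set comparison. The sketch below is a simplified stand-in for such a check: the table `content_models`, the function `missing_datastreams`, and the PIDs and datastream identifiers are hypothetical.

```python
from typing import Dict, Iterable, Set

# CModel PID -> datastream identifiers that every member object must carry.
content_models: Dict[str, Set[str]] = {
    "demo:CModelImage": {"DC", "THUMB", "HIGH"},
}


def missing_datastreams(cmodel_pid: str, present: Iterable[str]) -> Set[str]:
    """Return the required datastream identifiers the object lacks."""
    return content_models[cmodel_pid] - set(present)


# A Data object that asserts hasContentModel demo:CModelImage
# but was created without its thumbnail.
gaps = missing_datastreams("demo:CModelImage", ["DC", "HIGH"])
print(sorted(gaps))
```

An empty result means the object satisfies the structural part of its Content Model.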
You do not have to explicitly assert a relation from a Data object to a CModel object if all you want to do is access the Datastreams. Fedora will use its implicit default Content Model and its Default Datastream Service for these Data objects. However, without an explicit Content Model you cannot validate whether the Data object is correctly formed. In the CMA, if the repository cannot find and interpret all the control objects related to a Data object, or cannot interpret the Content Model, it will issue a runtime error when the Data object is accessed. Other than basic conformance to the rules for a properly formed digital object, there is no warning or error issued on ingest or modification of an object in the CMA.
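The behavior described above, where ingest accepts an object without checking its references and problems surface only on access, can be modeled in a few lines. This is a toy model rather than Fedora's logic: the `Repository` class, the placeholder name `default-content-model`, and the PIDs are hypothetical.

```python
from typing import Dict, Optional

DEFAULT_CMODEL = "default-content-model"  # placeholder for the implicit default


class Repository:
    def __init__(self) -> None:
        self._cmodel_of: Dict[str, Optional[str]] = {}

    def ingest(self, pid: str, cmodel: Optional[str] = None) -> None:
        # No referential integrity check here: the CModel may not exist yet.
        self._cmodel_of[pid] = cmodel

    def access(self, pid: str) -> str:
        """Resolve the governing Content Model at access time."""
        cmodel = self._cmodel_of[pid]
        if cmodel is None:
            return DEFAULT_CMODEL  # implicit default Content Model
        if cmodel not in self._cmodel_of:
            raise LookupError(f"{pid} refers to missing CModel {cmodel}")
        return cmodel


repo = Repository()
repo.ingest("demo:plain")                                 # no explicit CModel
repo.ingest("demo:broken", cmodel="demo:MissingCModel")   # accepted at ingest

print(repo.access("demo:plain"))
try:
    repo.access("demo:broken")
except LookupError as err:
    print("runtime error:", err)
```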
CModel objects can now be freely shared between Fedora repositories. If a federated identifier-resolver system, such as the Handle System™, or any authoritative name registry system is used, the CModel object will have the same identifier for each copy of itself in each participating repository. Sharing CModel objects while keeping the same identifier in each copy greatly simplifies replication, and enables many business processes and services that are needed for large scale repository installations integrated within the Fedora Framework. CModel objects can still be shared between repositories by including both the original identifier and alternate identifiers as part of the object's metadata. The best results will be gained by sharing the Data object, BDef (SDef) objects, and CModel object as a group maintaining the same original identifiers. By using the CMA in this fashion, you transfer a significant unit of the data and metadata that documents the expression pattern for your intellectual work. While this is, by itself, not everything needed, it is a big step forward for creating a durable content repository. Over time, Content Model languages can be developed that permit describing an ever larger portion of the essential characteristics of the content and its behaviors.
It is worth noting that Content Model Objects conform to the basic Fedora object model. Also, they are stored in a Fedora repository just like other Fedora objects. As such, a collection of Content Model objects in a repository constitutes a "registry" of Content Models.