The Smithsonian Institution: SIdora
The Smithsonian Institution Office of Research Information Services (ORIS), within the Office of the Chief Information Officer (OCIO), addresses the need to manage the digital results of the institution's research activities, including outputs from 19 museums, 9 research centers, 8 advanced study centers, 22 libraries, 2 archives and a zoo. ORIS has built a pilot system called “SIdora,” a general information architecture and software environment based on Islandora and Fedora. It is designed to let Smithsonian researchers capture and organize digital “evidence” as they create it in their research processes, and to use it directly in their analysis and dissemination activities. Analysis is supported by integrating workflow management systems such as Taverna and Galaxy, and by Dropbox-like desktop filesystem integration of repository content.
The SIdora goal is to actively support the research process as it unfolds, throughout the information lifecycle, leaving behind a coherent expression of the digital content for a complete research project that can permanently stand alongside related publications.
The digital content for each research project is captured in a Fedora repository as a related network of digital objects. The architecture is based on the idea that there are two types of digital objects: concept objects, which organize the digital artifacts and provide a meaningful context to “tell the story” of the research; and resource objects, such as documents, images and data sets, which contain the digital artifacts that make up the evidence. At a minimum, concept objects can be created with just a title, resulting in a context that provides the same kind of container structure as directories in a file system. With a bit more effort, the concept objects can be fleshed out to provide a rich descriptive framework for the digital artifacts, supplying the all-important contextual information that captures the motivation and high-level analysis of the research project.
The SIdora system is designed to allow researchers to create any combination of concept objects they need to express the organization of their project. Beginning with one concept object that is the formal description of the project as a whole, they can add child concepts, arbitrarily and recursively, to build their project context. Each concept that is added is a Fedora object with one main content datastream, which contains an XML file in the schema supported for the chosen type of concept. When the object is created, an RDF statement is added to its relationship datastream that links the new object back to the parent from which it was added.
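The parent-linking behavior described above can be illustrated with a minimal sketch. It is not SIdora code and does not talk to a Fedora repository: the class name, the "demo:" identifiers, and the predicate URI are all hypothetical stand-ins, since the actual schema and relationship vocabulary are not specified here. It only shows the idea that each new child concept records one subject-predicate-object statement pointing back to its parent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical predicate URI, used for illustration only; the actual
# relationship vocabulary used by SIdora is not given in the text.
PARENT_PREDICATE = "http://example.org/sidora-sketch#hasParentConcept"

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class ConceptObject:
    pid: str                        # made-up repository identifier
    title: str                      # the minimum a concept needs
    concept_type: str = "generic"   # stands in for the chosen XML schema
    relationships: List[Triple] = field(default_factory=list)
    children: List["ConceptObject"] = field(default_factory=list)

    def add_child(self, pid: str, title: str,
                  concept_type: str = "generic") -> "ConceptObject":
        """Create a child concept and record its link back to this parent."""
        child = ConceptObject(pid, title, concept_type)
        # Models the RDF statement added to the child's relationship datastream.
        child.relationships.append((child.pid, PARENT_PREDICATE, self.pid))
        self.children.append(child)
        return child


def outline(concept: ConceptObject, depth: int = 0) -> List[str]:
    """Return the concept tree as indented titles, like a directory listing."""
    lines = ["  " * depth + concept.title]
    for child in concept.children:
        lines.extend(outline(child, depth + 1))
    return lines


# One project-level concept, with child concepts added recursively.
project = ConceptObject("demo:1", "Example project", "project")
site = project.add_child("demo:2", "Excavation site")
trench = site.add_child("demo:3", "Trench A")
print("\n".join(outline(project)))
```

In this sketch a concept with only a title behaves like a directory, and the stored triples are enough to rebuild the hierarchy from the children upward, which is the property the relationship datastream provides in the description above.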
Smithsonian Institution researchers will become directly involved in describing and organizing the information that results from their activities. Getting researchers to do these kinds of things has often been compared to herding cats. It has also been said that the only way to herd cats is to tilt the floor. Funding agencies around the world have begun to tilt the floor with requirements for data management plans as part of research grant applications. It remains to be seen whether or not researchers can be enticed into creating their own metadata and organizing their own content to build the durable research data corpus.
This project aims to sit somewhere between the metaphor of a library and a layer of RDF-based linked data on top of the World Wide Web: it provides trusted, durable digital information from the initial stage of the research data lifecycle, and grows it into part of a web-based scholarly record that is truly interoperable.
The SIdora architecture is designed to manage research output as if it were part of a network of information. The first version of the software demonstrates managing the excavation evidence of a complete archaeological site in Panama, as well as the data for an international study of mammal populations. The demonstration will show how the system enables researchers to manage and describe their own data, use it in Taverna workflows for analysis, and expose sets of durable resources to be cited in publications.