FAIR digital information
The three building blocks of the technical infrastructure:
The infrastructure will provide community services to discover, consume and interact with the federated Digital Collection and Digital Specimen data. Some of these services will be provided in collaboration with other research infrastructures to enable innovative services for multi-disciplinary science.
Digital Object Infrastructure
The data will be linked through a Digital Object (DO) infrastructure in which Digital Collection (DC) objects and Digital Specimen (DS) objects are the principal object types. For this, it is planned to use CORDRA software as the basis for a Natural Sciences Identifier Registry (NSIDR). A sandbox, nsidr.org, is presently being used to demonstrate and develop this. The DO infrastructure will include tools for federation and linkage as well as services to support annotation and enrichment of the data by the scientific community. It will draw upon common services, e.g. authentication and authorization, provided at the global level by the European Open Science Cloud (EOSC).
Repositories with data provided by the DiSSCo participating institutions
DiSSCo will connect data provided by its participating institutions in trusted repositories. These can include local institutional repositories as well as global thematic repositories such as GBIF. It will also connect data in third-party repositories such as genetic sequence and literature databases. All data that can be linked to collection objects (specimens) are in scope.
Combining historic collection data with data from new technologies
The DiSSCo infrastructure will connect historical collection data with specimen-derived data emerging from new technologies, e.g. DNA barcodes, whole genome sequences, proteomics and metabolomics data, chemical data, trait data, and imaging data (computer-assisted tomography (CT) and synchrotron data). This integration will combine earlier investments in data interoperability practices with technological advancements in digitisation, cloud services and semantic linking.
Species interactions data
An advanced, state-of-the-art infrastructure is needed to facilitate novel approaches and technologies for accelerated field species identification, regular environmental monitoring, trend analysis and future prediction. Machine readability and actionability will enable integration of quality-assured FAIR data into analytical workflows and tools.
The DiSSCo technical infrastructure will provide services tailored to researchers’ needs. These needs have been identified collaboratively through user discussions and user stories, and are reinforced by an agile development process that integrates testing and feedback after each development round. All user stories (160+) are collected in the DiSSCo GitHub repository. Software components for the technical infrastructure are developed as open source.
A provisional Data Management Plan (DMP), which can be found in the DiSSCo Knowledge Base, has been provided as deliverable D6.6 by the ICEDIG project. It is a living document that will be updated in the DiSSCo Prepare project. The DiSSCo DMP describes the main DiSSCo data management principles and requirements, using Digital Object Architecture (DOA) and FAIR Digital Objects (FDO) as its foundation. The DiSSCo DMP offers unified policies for data providers, managers and users, and guidance on technical standards to be applied.
Digital object infrastructure in more detail
Digital Specimens (DS)
A Digital Specimen (DS) contains the data or links to data about a physical specimen in a natural sciences collection, and as such acts as its digital representation (or surrogate) on the Internet. Digital Specimens provide an anchoring function for data locked up in physical specimens and released through digitization and other computational practices. However, they are more than just digital representations. Philosophically, digital objects (of which DS are a kind) represent a new category of industrial object sitting alongside natural objects (such as rocks, plants and animals) and tools (hammers, drills, screwdrivers). This opens many new and exciting possibilities for digital manipulation and computation that can lead to new working practices and a digital transformation in collections-based science.
Digital Collections (DC)
Digital Collections (DC) are another kind of digital object supported by DiSSCo. They are used to provide descriptions of distinctly identifiable collections of natural sciences specimens, such as a specific herbarium collection or a collection of insects. In the digital world, there can be many Digital Collections, reflecting different themes with the possibility that a Digital Specimen can belong to one or many Digital Collections simultaneously.
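The relationship between the two object types can be sketched in a few lines of Python. This is only an illustration of the idea that one Digital Specimen may belong to several Digital Collections at once; the class names, field names and identifiers below are assumptions made for the example and are not taken from any DiSSCo specification.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalSpecimen:
    """Digital surrogate for one physical specimen (illustrative fields only)."""
    identifier: str            # placeholder identifier, not a real NSId
    physical_specimen_id: str  # e.g. the institution's catalogue number
    links: dict[str, str] = field(default_factory=dict)  # label -> URL of related data


@dataclass
class DigitalCollection:
    """Description of an identifiable collection and the specimens it groups."""
    identifier: str
    title: str
    member_ids: set[str] = field(default_factory=set)

    def add(self, specimen: DigitalSpecimen) -> None:
        # Membership is stored by identifier, so the specimen object is shared,
        # not copied, between collections.
        self.member_ids.add(specimen.identifier)


def collections_of(specimen: DigitalSpecimen,
                   collections: list[DigitalCollection]) -> list[str]:
    """Return the titles of all collections that contain the specimen, sorted."""
    return sorted(c.title for c in collections
                  if specimen.identifier in c.member_ids)


# Hypothetical example data.
specimen = DigitalSpecimen("example/ds-001", "HERB-12345")
herbarium = DigitalCollection("example/dc-herbarium", "Example herbarium")
alpine = DigitalCollection("example/dc-alpine", "Alpine flora (thematic)")

herbarium.add(specimen)
alpine.add(specimen)

memberships = collections_of(specimen, [herbarium, alpine])
print(memberships)
```

Storing membership as a set of identifiers, rather than nesting specimens inside a collection, is what lets the same specimen appear in an institutional collection and in any number of thematic ones.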
Natural Sciences Identifiers (NSId)
A Natural Sciences Identifier (NSId) is a kind of universal and stable persistent identifier – a long-lasting reference to a digital resource – that is used to unambiguously, uniquely and globally identify a Digital Specimen or a Digital Collection.
The notion of Natural Sciences Identifiers is central to museums’ ambitions for widening access, and to proposed notions of Extended Specimens. NSIds act as a digital doorway that allows more than just finding, accessing and re-using specimen data. A wide variety of novel first- and third-party services become possible, including for example: harmonizing the arrangement of loans and visits through the European Loans and Visits System (ELViS), finding specimens related to one another (think: ‘customers who viewed this also viewed these’), linking to third-party information, and providing support to the Nagoya Protocol on Access and Benefit Sharing.
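The essential property of a persistent identifier, that the reference stays the same while the thing it points to may move, can be shown with a toy registry. The sketch below is an in-memory stand-in written for illustration only: it is not the NSIDR or CORDRA API, and the `example` prefix, the class name and the URLs are all hypothetical.

```python
import uuid


class IdentifierRegistry:
    """Toy in-memory registry; NOT the NSIDR or CORDRA API."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._records: dict[str, dict] = {}

    def mint(self, object_type: str, location: str) -> str:
        """Create a new, unique identifier and bind it to a record."""
        identifier = f"{self.prefix}/{uuid.uuid4()}"
        self._records[identifier] = {"type": object_type, "location": location}
        return identifier

    def update_location(self, identifier: str, location: str) -> None:
        """Record that the object has moved; the identifier itself never changes."""
        self._records[identifier]["location"] = location

    def resolve(self, identifier: str) -> dict:
        """Return a copy of the record the identifier currently points to."""
        return dict(self._records[identifier])


# Hypothetical usage: the specimen data moves from one repository to another.
registry = IdentifierRegistry(prefix="example")
nsid = registry.mint("DigitalSpecimen", "https://repo-a.example/specimens/1")
registry.update_location(nsid, "https://repo-b.example/specimens/1")

record = registry.resolve(nsid)
print(nsid, "->", record["location"])
```

Anything that cited the identifier before the move still resolves afterwards, which is what makes services such as loans, cross-linking and annotation practical to build on top of it.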
FAIR Digital Objects
Technically, Digital Specimens, Digital Collections and other kinds of DiSSCo digital object are specific kinds of ‘FAIR Digital Objects’. A FAIR Digital Object combines the attributes of digital objects generally with the FAIR Guiding Principles. Formally, it is defined as ‘a unit composed of data and/or metadata regulated by structures or schemas, and with an assigned globally unique and persistent identifier (PID), which is findable, accessible, interoperable and reusable both by humans and computers for the reliable interpretation and processing of the data represented by the object.’ As structured data, these specific kinds of FAIR Digital Object are stored in and reliably accessible from data storage repositories according to a schema that is being standardized as the ‘specification for open Digital Specimens and other objects’ (‘openDS’ for short). Such a specification ensures not only that Digital Specimens, Digital Collections and other objects are findable and accessible but also that they are interoperable and reusable across a wide range of different software applications, services and systems. The openDS standard is intended to bring a harmonizing effect across the research infrastructure subsystems contributing to DiSSCo and also at the global level.
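As a rough illustration of what ‘data and/or metadata regulated by structures or schemas’ with an assigned PID can mean in practice, the sketch below checks an object against a minimal schema and serializes it to JSON. The attribute names, the schema and the sample values are placeholders invented for this example; they are not taken from openDS, which defines its own structure.

```python
import json

# Hypothetical schema: required attribute name -> expected Python type.
# These names are placeholders and are NOT taken from openDS.
SPECIMEN_SCHEMA = {"pid": str, "type": str, "metadata": dict, "data_links": list}


def validate(obj: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the object conforms."""
    problems = []
    for name, expected in schema.items():
        if name not in obj:
            problems.append(f"missing attribute: {name}")
        elif not isinstance(obj[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    return problems


# Made-up example object: identifier, type, metadata and links to data.
fdo = {
    "pid": "example/ds-001",
    "type": "DigitalSpecimen",
    "metadata": {"scientificName": "Bellis perennis"},
    "data_links": ["https://images.example/ds-001.jpg"],
}

errors = validate(fdo, SPECIMEN_SCHEMA)

# A machine-readable serialization that another system can parse back.
serialized = json.dumps(fdo, sort_keys=True)
restored = json.loads(serialized)

# An object that carries only an identifier does not conform.
incomplete_errors = validate({"pid": "example/ds-002"}, SPECIMEN_SCHEMA)

print(errors)
print(incomplete_errors)
```

Because every conforming object has the same attributes and types, software that has never seen a particular specimen can still interpret it, which is the interoperability benefit a shared schema is meant to deliver.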
DiSSCo will be based on an emerging new generation of relevant standards. To make it possible to accumulate data about a specimen from the first point of creation of digital data, a new specification is being created in TDWG to share Minimum Information about a Digital Specimen (MIDS). Providing the specimen collections in Europe as one virtual collection requires a new standard for describing the collections in a uniform way. For this, a Collection Description (CD) specification is being created in TDWG. Digital Specimens will be created as FAIR Digital Objects. For the object definitions of the Digital Specimens and their related objects, an open Digital Specimen (openDS) specification is being created. OpenDS defines the structure and content of each object type, and the operations that can act upon them. The specification will also describe the serialization and packaging for transfer between systems.
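The idea of accumulating information about a specimen over time, and of stating how much is available at any given moment, can be sketched as a cumulative completeness check. The number of levels and the field names below are invented for illustration; the actual MIDS specification defines its own levels and information elements.

```python
# Hypothetical levels, lowest first. Each lists fields that must be filled in
# addition to those of all lower levels. NOT the real MIDS definitions.
HYPOTHETICAL_LEVELS = [
    {"identifier", "institution"},     # level 0
    {"name", "object_type"},           # level 1
    {"collection_date", "locality"},   # level 2
]


def completeness_level(record: dict) -> int:
    """Return the highest level fully satisfied by the record, or -1 if none."""
    level = -1
    for index, required in enumerate(HYPOTHETICAL_LEVELS):
        if all(record.get(name) for name in required):
            level = index
        else:
            break  # levels are cumulative, so stop at the first gap
    return level


# A record as it might look shortly after a specimen is first catalogued.
record = {
    "identifier": "example/ds-001",
    "institution": "Example Museum",
    "name": "Bellis perennis",
    "object_type": "herbarium sheet",
}
level_before = completeness_level(record)

# Later digitization adds more detail, and the level rises accordingly.
record.update({"collection_date": "1901-05-12", "locality": "Example locality"})
level_after = completeness_level(record)

print(level_before, level_after)
```

A check of this kind lets a record be published as soon as the lowest level is met and then enriched incrementally, rather than waiting until everything about the specimen has been digitized.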