Technical Infrastructure

FAIR digital information

First, let’s do the math

The three building blocks of the technical infrastructure:

Community Services

The infrastructure will provide community services to discover, consume and interact with the federated Digital Specimen data. Examples are DiSSCover, with its annotation services, and ELViS. DiSSCo will work with other Research Infrastructures to implement the Digital Specimen DOIs it provides, enabling innovative services for multi-disciplinary science.


Digital Object Infrastructure

The data will be linked through a Digital Object (DO) infrastructure in which Digital Collection (DC) objects and their Digital Media objects are the principal object types.

A sandbox environment is provided for demonstration and testing purposes. Developers of Machine Annotation Services (MAS) can use this environment to test their service.

The DO infrastructure includes services for creation of dynamic digital specimen objects based on data from collection management systems, as well as annotation services to support quality enhancement and enrichment of the data by the scientific community. A tight integration with collection management systems is planned for bi-directional data exchange between these systems and the digital specimens, transforming the digital specimen into the authoritative source for data aggregators like GBIF, ENA, GGBN, GeoCase, COL and others.

Part of the digital specimen data cannot be stored in the collection management systems in use. DiSSCo will provide central repository and indexing services for the full digital specimen objects, but part of this data will be stored as a link (entity relationship) when it is already stored in other trustworthy infrastructures. Source data such as raw TIFFs made in digitisation projects is a national responsibility and will not be stored centrally in DiSSCo.

Combining Historic collection data with data from new technologies

The DiSSCo infrastructure will connect historical collection data with specimen-derived data emerging from new technologies, e.g., DNA barcodes, whole genome sequences, proteomics and metabolomics data, chemical data, trait data, and imaging data (computer-assisted tomography (CT) and Synchrotron data). This integration will combine earlier investments in data interoperability practices with technological advancements in digitisation, cloud services and semantic linking.

Genomic data

Biochemical data

Morphological data

Geographical data

Taxonomic information

Species interactions data

Ecological data

An advanced, state-of-the-art infrastructure is needed to facilitate novel approaches and technologies for accelerated field species identification, regular environmental monitoring, trend analysis and future prediction. Machine readability and actionability will enable integration of quality assured FAIR data into analytical workflows and tools.

The DiSSCo technical infrastructure will provide services tailored to researchers' needs. These needs have been identified collaboratively through user discussions and user stories, and are reinforced by an agile development process that integrates testing and feedback after each development round. All user stories (160+) are collected in the DiSSCo GitHub repository. Software components for the technical infrastructure are developed as open source.

A provisional Data Management Plan (DMP), which can be found in the DiSSCo Knowledge Base, has been provided as deliverable D6.6 by the ICEDIG project. It is a living document updated during the DiSSCo Prepare project. The DiSSCo DMP describes the main DiSSCo data management principles and requirements, using Digital Object Architecture (DOA) and FAIR Digital Objects (FDO) as its foundation. The DiSSCo DMP offers unified policies for data providers, managers and users, and guidance on technical standards to be applied.

DiSSCo offers different technical knowledge platforms at the scientific community’s disposal:

DiSSCoTech

Get the latest technical posts about the design of DiSSCo’s Infrastructure

DiSSCo Labs

A preview of experimental services and demonstrators by the DiSSCo community

DiSSCo GitHub

Code hosting for DiSSCo software, version control and collaboration

Modelling FWK

A WikiBase tool that is configured to create an abstraction of the DiSSCo data model.

Digital object infrastructure in more detail


Digital Specimens (DS)

A Digital Specimen (DS) contains the data or links to data about a physical specimen in a natural sciences collection, and as such acts as its digital representation (or surrogate) on the Internet. Digital Specimens provide an anchoring function for data locked up in physical specimens and released through digitization and other computational practices. However, they are more than just digital representations. Philosophically, digital objects (of which DS are a kind) represent a new category of industrial object sitting alongside natural objects (such as rocks, plants and animals) and tools (hammers, drills, screwdrivers). This opens many new and exciting possibilities for digital manipulation and computation that can lead to new working practices and a digital transformation in collections-based science.

Digital Collections (DC)

Digital Collections (DC) are another kind of digital object supported by DiSSCo. They are used to provide descriptions of distinctly identifiable collections of natural sciences specimens, such as a specific herbarium collection or a collection of insects. In the digital world, there can be many Digital Collections, reflecting different themes with the possibility that a Digital Specimen can belong to one or many Digital Collections simultaneously.
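The many-to-many relationship described above can be illustrated with a minimal sketch. The class, field names and identifiers below are hypothetical stand-ins chosen for illustration; they are not taken from the DiSSCo data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DigitalCollection:
    """Hypothetical stand-in for a Digital Collection (not the DiSSCo schema)."""
    pid: str                      # placeholder identifier
    title: str
    specimen_pids: set[str] = field(default_factory=set)

    def add(self, specimen_pid: str) -> None:
        """Record that a Digital Specimen belongs to this collection."""
        self.specimen_pids.add(specimen_pid)


def collections_containing(
    specimen_pid: str, collections: list[DigitalCollection]
) -> list[str]:
    """Return the titles of every collection that lists the given specimen."""
    return [c.title for c in collections if specimen_pid in c.specimen_pids]


# All identifiers here are made up for the example.
herbarium = DigitalCollection(pid="example/dc-1", title="Example herbarium")
alpine = DigitalCollection(pid="example/dc-2", title="Alpine flora (thematic)")
insects = DigitalCollection(pid="example/dc-3", title="Example insect collection")

# One specimen is a member of two collections at the same time.
herbarium.add("example/ds-42")
alpine.add("example/ds-42")
insects.add("example/ds-7")

memberships = collections_containing("example/ds-42", [herbarium, alpine, insects])
```

Here the collections hold references to specimen identifiers rather than copies of the specimens, so adding a new thematic collection does not require changing the specimen itself.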

DiSSCo DOI Infrastructure

Bringing together collections data from hundreds of sources requires sophisticated coordination. Globally resolvable Persistent Identifiers, or PIDs, allow for tracking provenance, annotating, and referencing Digital Specimens.

So far, DiSSCo has used Handles as PIDs for all digital objects, but Handles alone aren’t persistent. DiSSCo needs DOIs, or Digital Object Identifiers. DOIs are Handles with guaranteed persistence, which makes them a reliable tool for citation and provenance.
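Because a DOI is a Handle, both share the same `prefix/suffix` shape and differ mainly in prefix and resolver. The sketch below illustrates this under two assumptions: that DOI prefixes start with `10.`, and that the public proxies `https://doi.org/` and `https://hdl.handle.net/` are used for resolution. The identifiers are placeholders, not real DiSSCo PIDs, and the helper names are hypothetical.

```python
from typing import NamedTuple


class Pid(NamedTuple):
    prefix: str
    suffix: str


def parse_pid(identifier: str) -> Pid:
    """Split a Handle-style identifier into prefix and suffix at the first '/'."""
    prefix, sep, suffix = identifier.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix identifier: {identifier!r}")
    return Pid(prefix, suffix)


def is_doi(pid: Pid) -> bool:
    """Assumption: DOI names are Handles whose prefix begins with '10.'."""
    return pid.prefix.startswith("10.")


def resolver_url(pid: Pid) -> str:
    """Build a resolver URL, choosing the proxy by identifier kind."""
    base = "https://doi.org/" if is_doi(pid) else "https://hdl.handle.net/"
    return f"{base}{pid.prefix}/{pid.suffix}"


# Placeholder identifiers for illustration only.
doi = parse_pid("10.99999/EXAMPLE-1")
handle = parse_pid("20.500.99999/EXAMPLE-1")

doi_url = resolver_url(doi)
handle_url = resolver_url(handle)
```

The sketch only builds URLs and performs no network requests; whether an identifier actually resolves depends on it being registered.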

DiSSCo's recently developed test setup has demonstrated that DiSSCo is capable of developing and maintaining a reliable DOI infrastructure, a step towards implementing FAIR and FAIR Digital Objects. As DiSSCo's infrastructure and data model mature, the future RI is getting closer to linking and annotating collections data across its partners.

FAIR Digital Objects

Technically, Digital Specimens, Digital Collections and other kinds of DiSSCo digital objects are specific kinds of ‘FAIR Digital Objects’. A FAIR Digital Object combines the attributes of digital objects generally with the FAIR Guiding Principles. Formally, it is defined as ‘a unit composed of data and/or metadata regulated by structures or schemas, and with an assigned globally unique and persistent identifier (PID), which is findable, accessible, interoperable and reusable both by humans and computers for the reliable interpretation and processing of the data represented by the object.’ As structured data, these specific kinds of FAIR Digital Objects are stored in, and reliably accessible from, data storage repositories according to a schema that is being standardized as the ‘specification for open Digital Specimens and other objects’, or ‘openDS’ for short. Such a specification ensures not only that Digital Specimens, Digital Collections and other objects are findable and accessible but also that they are interoperable and reusable across a wide range of different software applications, services and systems. The openDS standard is intended to have a harmonizing effect across the Research Infrastructure subsystems contributing to DiSSCo and also at the global level.
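The formal definition above (data and/or metadata, regulated by a schema, with an assigned PID) can be sketched as a small data structure. This is only an illustration: the field names, the object type label and the required-field list are hypothetical and do not reproduce the openDS specification.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class FairDigitalObject:
    """Illustrative FAIR Digital Object: a PID plus typed metadata and data."""
    pid: str                                   # globally unique, persistent identifier
    object_type: str                           # names the schema that regulates the object
    metadata: dict[str, Any] = field(default_factory=dict)
    data: dict[str, Any] = field(default_factory=dict)


# Hypothetical schema: fields each object type must carry in its metadata.
REQUIRED_METADATA: dict[str, set[str]] = {
    "DigitalSpecimen": {"physicalSpecimenId", "institution"},
}


def missing_fields(obj: FairDigitalObject) -> set[str]:
    """Return required metadata fields that the object does not provide."""
    required = REQUIRED_METADATA.get(obj.object_type, set())
    return required - set(obj.metadata)


# Placeholder values for illustration only.
specimen = FairDigitalObject(
    pid="10.99999/EXAMPLE-1",
    object_type="DigitalSpecimen",
    metadata={"physicalSpecimenId": "HERB-0001"},
)

gaps = missing_fields(specimen)
```

Checking an object against the schema named by its type is what lets independent applications interpret the same object consistently, which is the interoperability point made in the paragraph above.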

New Standards

DiSSCo will be based on an emerging new generation of relevant standards. To make it possible to accumulate data about a specimen from the first point of creation of digital data, a new specification is being created in TDWG to share Minimum Information about a Digital Specimen (MIDS). Providing the specimen collections in Europe as one virtual collection requires a new standard for describing the collections in a uniform way. For this, a Collection Description specification (CD) is being created in TDWG. Digital Specimens will be created as FAIR Digital Objects. For the object definitions of the Digital Specimens and their related objects, an open Digital Specimen (openDS) specification is being created. OpenDS defines the structure and content of each object type, and the operations that can act upon them. The specification will also describe the serialization and packaging for transfer between systems.