Technical Infrastructure

Open access to digital information

A robust infrastructure is needed

With approximately 1.5 billion objects to be digitised, bringing natural science collections into the information age is expected to generate 90 petabytes of new data over the coming decades, used on average by 5,000–15,000 unique users every day. A robust technical infrastructure is required to support work with large digital datasets across their entire research data life cycle and to provide unified open access to the digital information.
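To put these estimates in perspective, the short calculation below divides the projected data volume by the number of objects. It assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and yields only a mean. Real collections will be far from uniform: a CT or synchrotron scan is typically much larger than a label image.

```python
# Back-of-the-envelope average, derived only from the estimates quoted above.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 MB = 10**6 bytes).
TOTAL_OBJECTS = 1_500_000_000      # ~1.5 billion objects to be digitised
TOTAL_DATA_BYTES = 90 * 10**15     # ~90 petabytes of new data


def average_bytes_per_object(total_bytes: int, objects: int) -> float:
    """Mean data volume per object; an average, not the size of any one object."""
    return total_bytes / objects


avg = average_bytes_per_object(TOTAL_DATA_BYTES, TOTAL_OBJECTS)
print(f"Average per object: {avg / 10**6:.0f} MB")  # prints "Average per object: 60 MB"
```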

How the infrastructure collects the data

Historical data combined with data from new techniques

The infrastructure will combine earlier investments in data interoperability practices with technological advances in digitisation, cloud services and semantic linking. It will connect historical collection data with data from new techniques, which are derived from the specimen but are not necessarily linked to species names. These new data include DNA barcodes, whole genome sequences, proteomics and metabolomics data, chemical data, trait data, and imaging data, e.g. computed tomography (CT) and synchrotron data.
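The key point is that derived data hang off the specimen, not off a species name. A minimal sketch of that relationship, assuming hypothetical class names, field names and identifiers (none of these come from a DiSSCo specification), might look like this:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class DataType(Enum):
    """Kinds of specimen-derived data listed above."""
    DNA_BARCODE = "dna_barcode"
    WHOLE_GENOME = "whole_genome_sequence"
    PROTEOMICS = "proteomics"
    METABOLOMICS = "metabolomics"
    CHEMICAL = "chemical"
    TRAIT = "trait"
    CT_IMAGE = "ct_image"
    SYNCHROTRON_IMAGE = "synchrotron_image"


@dataclass
class DerivedData:
    specimen_id: str                    # always tied to the physical specimen
    data_type: DataType
    location: str                       # hypothetical URL of the stored dataset
    species_name: Optional[str] = None  # may be absent: not every dataset is named


@dataclass
class SpecimenRecord:
    specimen_id: str
    collection_data: dict               # historical catalogue/label data
    derived: list = field(default_factory=list)

    def attach(self, item: DerivedData) -> None:
        """Link a derived dataset to this specimen, rejecting mismatched IDs."""
        if item.specimen_id != self.specimen_id:
            raise ValueError("derived data belongs to a different specimen")
        self.derived.append(item)


# Hypothetical usage: a barcode linked to a specimen without any species name.
record = SpecimenRecord("SPEC-0001", {"collector": "unknown", "year": 1898})
record.attach(DerivedData("SPEC-0001", DataType.DNA_BARCODE,
                          "https://example.org/data/barcode-1"))
```

Keeping `species_name` optional mirrors the text: the link to the specimen is mandatory, while a taxonomic name may be added later or never.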

A novel, advanced infrastructure is needed to deliver the diagnostic information required by new approaches and technologies for accelerated field identification of species, regular environmental monitoring, trend analysis and prediction. Machine readability will enable quality-assured FAIR (Findable, Accessible, Interoperable, Reusable) data to be integrated into analytical workflows and tools.

The DiSSCo technical infrastructure will provide eServices tailored to researchers' actual needs. This is achieved through discussions with users, an inventory of user stories and an agile development process that allows for testing and feedback after each development round (sprint). All user stories are collected in the DiSSCo GitHub repository. Software components for the technical infrastructure are developed as open source.

A provisional Data Management Plan (DMP) is provided as deliverable D6.6 in ICEDIG. It is designed as a living document and will be updated in the DiSSCo Prepare project. The DiSSCo DMP describes the main DiSSCo data management principles and requirements. It offers unified policies for data providers and users, and guidance on the technical standards to be applied.

The three building blocks of the technical infrastructure:

Repositories with data provided by the DiSSCo Facilities

The infrastructure will link data provided by the DiSSCo facilities in trusted repositories. These may include local institutional repositories as well as global thematic repositories such as GBIF. All data that can be linked to collection objects (specimens) are in scope.
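The scoping rule in the last sentence can be illustrated with a small grouping step. This sketch assumes every record already carries a shared, hypothetical `specimen_id` field; in practice records in GBIF and institutional systems use their own identifiers, and establishing a common identifier is exactly what the Digital Object infrastructure is meant to provide. Exact string matching here stands in for real identifier reconciliation.

```python
from collections import defaultdict


def link_by_specimen(records):
    """Group records from several repositories under the specimen they describe.

    Records without a specimen identifier cannot be linked to a collection
    object, so they fall outside the scope and are skipped.
    """
    linked = defaultdict(list)
    for rec in records:
        specimen_id = rec.get("specimen_id")
        if specimen_id:
            linked[specimen_id].append(rec)
    return dict(linked)


# Hypothetical records; repository names are labels only, not real data models.
records = [
    {"repository": "institutional", "specimen_id": "SPEC-0001", "kind": "catalogue entry"},
    {"repository": "GBIF", "specimen_id": "SPEC-0001", "kind": "occurrence"},
    {"repository": "sequence archive", "specimen_id": "SPEC-0002", "kind": "DNA barcode"},
    {"repository": "GBIF", "kind": "observation"},  # no specimen: out of scope
]
linked = link_by_specimen(records)
for specimen_id, recs in linked.items():
    print(specimen_id, [r["repository"] for r in recs])
```

Running it prints `SPEC-0001 ['institutional', 'GBIF']` followed by `SPEC-0002 ['sequence archive']`; the observation without a specimen is dropped.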

Digital Object Infrastructure

The data will be linked through a Digital Object (DO) infrastructure. The plan is to use CORDRA as a Natural Specimen Identifier Repository (NSIDR); nsidr.org is being used as a sandbox to demonstrate and develop it. The DO infrastructure will include tools for federation and linkage, as well as services that support annotation and enrichment of the data by the scientific community. It will draw on existing common services, e.g. global and European Open Science Cloud (EOSC) services for authentication and authorization.
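To make the roles concrete, the sketch below models a registry that mints persistent identifiers, resolves them to digital specimens, and lets the community attach annotations. It is a toy, in-memory illustration under stated assumptions: it does not use or mimic the actual CORDRA API, the identifier prefix is a placeholder rather than a registered one, and authentication (which the text assigns to shared services such as EOSC) is assumed to happen elsewhere, so `author` is just a string.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

HYPOTHETICAL_PREFIX = "20.5000.EXAMPLE"  # placeholder, not a registered prefix


@dataclass
class Annotation:
    author: str       # assumed to come from an external authentication service
    motivation: str   # e.g. "commenting" or "editing" (W3C Web Annotation terms)
    body: str
    created: str


@dataclass
class DigitalSpecimen:
    pid: str
    links: dict                                  # repository label -> source record URL
    annotations: list = field(default_factory=list)

    def annotate(self, author: str, motivation: str, body: str) -> Annotation:
        """Record a community annotation with a UTC timestamp."""
        ann = Annotation(author, motivation, body,
                         datetime.now(timezone.utc).isoformat())
        self.annotations.append(ann)
        return ann


class InMemoryRegistry:
    """Toy stand-in for an identifier repository; this is NOT the CORDRA API."""

    def __init__(self, prefix: str = HYPOTHETICAL_PREFIX):
        self.prefix = prefix
        self._objects = {}

    def register(self, links: dict) -> DigitalSpecimen:
        """Mint a prefix/suffix identifier and store a new digital specimen."""
        pid = f"{self.prefix}/{uuid.uuid4()}"
        obj = DigitalSpecimen(pid, dict(links))
        self._objects[pid] = obj
        return obj

    def resolve(self, pid: str) -> DigitalSpecimen:
        """Look up a digital specimen by its identifier (KeyError if unknown)."""
        return self._objects[pid]


registry = InMemoryRegistry()
specimen = registry.register({"institutional": "https://example.org/specimens/1"})
specimen.annotate("researcher-42", "commenting", "Locality georeferenced from label.")
```

The design choice worth noting is that annotations attach to the persistent identifier rather than to any single source repository, so enrichment survives even if the underlying record moves.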

Community Services

The infrastructure will provide community services to discover, consume and interact with the federated Digital Specimen data. Part of these services will be provided in collaboration with other research infrastructures to enable innovative services for multi-disciplinary science.