Open access to digital information
A robust infrastructure, compliant with FAIR principles, is needed
With approximately 1.5 billion objects to be digitised, bringing natural science collections into the information age is expected to generate 90 petabytes of new data over the coming decades, with an average of 5,000 to 15,000 unique users every day. A robust technical infrastructure is required to support work with digital specimens and collections over their entire research data life cycle and to provide unified open access to the digital information, ensuring that it is Findable, Accessible, Interoperable and Reusable (FAIR).
The three building blocks of the technical infrastructure:
Repositories with data provided by the DiSSCo Facilities
The infrastructure will connect data that the DiSSCo facilities provide in trusted repositories. These can include local institutional repositories as well as global thematic repositories such as GBIF. It will also connect data held in third-party repositories, such as genetic sequence and literature databases. All data that can be linked to collection objects (specimens) are in scope.
Digital Object Infrastructure
The data will be linked through a Digital Object (DO) infrastructure in which Digital Collection (DC) objects and Digital Specimen (DS) objects are the principal object types. The plan is to use CORDRA software as the basis for a Natural Sciences Identifier Registry (NSIDR). A sandbox, nsidr.org, is presently (early 2020) being used to demonstrate and develop this. The DO infrastructure will include tools for federation and linkage, as well as services to support annotation and enrichment of the data by the scientific community. It will draw upon common services, for example authentication and authorization, provided at the global level or by the European Open Science Cloud (EOSC).
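To make the idea of linked Digital Objects more concrete, the following is a minimal in-memory sketch in Python. It is not the CORDRA API and not the NSIDR data model: the class names, the relation names ("partOf", "hasSequence") and all identifiers are hypothetical and chosen only for illustration. It shows a Digital Specimen that is linked to a Digital Collection and to a record in a third-party repository, and that carries a community annotation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DigitalObject:
    """A hypothetical digital object: an identifier, a type, links and annotations."""
    pid: str                      # illustrative identifier, not a real NSIDR handle
    object_type: str              # e.g. "DigitalSpecimen" or "DigitalCollection"
    links: Dict[str, List[str]] = field(default_factory=dict)
    annotations: List[str] = field(default_factory=list)

    def link(self, relation: str, target_pid: str) -> None:
        """Record a typed link from this object to another identifier."""
        self.links.setdefault(relation, []).append(target_pid)


class Registry:
    """A toy identifier registry that resolves identifiers to objects."""

    def __init__(self) -> None:
        self._objects: Dict[str, DigitalObject] = {}

    def register(self, obj: DigitalObject) -> None:
        if obj.pid in self._objects:
            raise ValueError(f"identifier already registered: {obj.pid}")
        self._objects[obj.pid] = obj

    def resolve(self, pid: str) -> DigitalObject:
        return self._objects[pid]

    def members_of(self, collection_pid: str) -> List[DigitalObject]:
        """Return all registered objects that declare a 'partOf' link to the collection."""
        return [
            obj for obj in self._objects.values()
            if collection_pid in obj.links.get("partOf", [])
        ]


def build_demo() -> Registry:
    """Build a tiny example registry with one collection and one specimen."""
    registry = Registry()
    collection = DigitalObject(pid="example/dc-1", object_type="DigitalCollection")
    specimen = DigitalObject(pid="example/ds-1", object_type="DigitalSpecimen")

    # Link the specimen to its collection and to a made-up third-party record.
    specimen.link("partOf", collection.pid)
    specimen.link("hasSequence", "example-sequence-db/ABC123")

    # A community annotation attached to the specimen (illustrative text).
    specimen.annotations.append("Determination checked (example annotation)")

    registry.register(collection)
    registry.register(specimen)
    return registry
```

In this sketch the link to the sequence record is stored only as an identifier, since that record would live in a third-party repository rather than in the registry itself.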
The infrastructure will provide community services to discover, consume and interact with the federated Digital Collection and Digital Specimen data. Some of these services will be provided in collaboration with other research infrastructures to enable innovative services for multi-disciplinary science.