IN51C-1700: Rolling Deck to Repository (R2R): Linking and Integrating Data for Oceanographic Research (Invited)
Authors: Robert A Arko (1), Cynthia L Chandler (2), Paul D Clark (3), Adam Shepherd (2), Carla Moore (4)
Author Institutions: 1. Lamont-Doherty Earth Observatory, Palisades, NY, USA; 2. Woods Hole Oceanographic Institution, Woods Hole, MA, USA; 3. Scripps Institution of Oceanography, La Jolla, CA, USA; 4. National Geophysical Data Center, Boulder, CO, USA
The Rolling Deck to Repository (R2R) program is developing infrastructure to ensure the underway sensor data from NSF-supported oceanographic research vessels are routinely and consistently documented, preserved in long-term archives, and disseminated to the science community. We have published the entire R2R Catalog as a Linked Data collection, making it easily accessible to encourage linking and integration with data at other repositories. We are developing the R2R Linked Data collection with specific goals in mind:
1. We facilitate data access and reuse by providing the richest possible collection of resources to describe vessels, cruises, instruments, and datasets from the U.S. academic fleet, including data quality assessment results and clean trackline navigation. We are leveraging or adopting existing community-standard concepts and vocabularies, particularly concepts from the Biological and Chemical Oceanography Data Management Office (BCO-DMO) ontology and terms from the pan-European SeaDataNet vocabularies, and continually re-publish resources as new concepts and terms are mapped.
2. We facilitate data citation through the entire data lifecycle from field acquisition to shoreside archiving to (ultimately) global syntheses and journal articles. We are implementing globally unique and persistent identifiers at the collection, dataset, and granule levels, and encoding these citable identifiers directly into the Linked Data resources.
3. We facilitate linking and integration with other repositories that publish Linked Data collections for the U.S. academic fleet, such as BCO-DMO and the Index to Marine and Lacustrine Geological Samples (IMLGS). We are initially mapping datasets at the resource level, and plan to eventually implement rule-based mapping at the concept level. We work collaboratively with partner repositories to develop best practices for URI patterns and consensus on shared vocabularies.
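As an illustration of the resource-level mapping described in the third goal, the following is a minimal sketch of how two catalogs keyed by a shared cruise identifier might be linked. It is not the R2R implementation: the choice of owl:sameAs as the linking predicate, the function name `link_by_cruise_id`, the cruise identifiers, and all example.org URIs are assumptions made for this example.

```python
# Linking predicate; owl:sameAs is an assumption, not confirmed by the abstract.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def link_by_cruise_id(local, partner):
    """Return N-Triples lines linking resources that share a cruise identifier.

    Both arguments map a cruise identifier to that catalog's resource URI.
    """
    shared = sorted(set(local) & set(partner))
    return [f"<{local[key]}> <{OWL_SAME_AS}> <{partner[key]}> ." for key in shared]


# Hypothetical catalogs; identifiers and URIs are placeholders.
local_catalog = {
    "CRUISE001": "http://example.org/r2r/cruise/CRUISE001",
    "CRUISE002": "http://example.org/r2r/cruise/CRUISE002",
}
partner_catalog = {
    "CRUISE001": "http://example.org/partner/deployment/42",
}

links = link_by_cruise_id(local_catalog, partner_catalog)
for line in links:
    print(line)
```

Only cruises present in both catalogs produce a link, so the mapping can be regenerated whenever either catalog is re-published.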
The R2R Linked Data collection is implemented as a lightweight “virtual RDF graph” generated on-the-fly from our SQL database using the D2RQ (http://d2rq.org) package. In addition to the default SPARQL endpoint for programmatic access, we are developing a Web-based interface from open-source software components that offers user-friendly browse and search.
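Programmatic access through a SPARQL endpoint might look like the sketch below, which follows the standard SPARQL 1.1 Protocol (query sent as a `query` parameter over HTTP GET, results requested as SPARQL JSON). The endpoint URL is a placeholder, not the actual R2R address, and the query uses only the generic rdfs:label property because the abstract does not specify the R2R vocabulary; the helper names are likewise hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder endpoint; the real R2R endpoint URL is not given in the abstract.
ENDPOINT = "http://example.org/sparql"

# Generic query: any five resources that carry an rdfs:label.
QUERY = """
SELECT ?resource ?label WHERE {
  ?resource <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 5
"""


def build_request(endpoint, query):
    """Build an HTTP GET request carrying the query, asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(doc):
    """Flatten a SPARQL JSON results document into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


def run_query(endpoint, query):
    """Send the query and return the flattened rows (requires network access)."""
    with urlopen(build_request(endpoint, query), timeout=30) as response:
        return rows_from_results(json.load(response))
```

Calling `run_query(ENDPOINT, QUERY)` against a live endpoint would return one dictionary per result row; `rows_from_results` can also be exercised offline on a saved results document.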