Integrated Earth Data Applications: Enhancing Reliable Data Services Through the Use of Persistent Identifiers

Slides from the One NOAA Science Seminar in May 2013, given by Kerstin Lehnert.

Page 1: Integrated Earth Data Applications: Enhancing Reliable Data Services Through the Use of Persistent Identifiers

Page 2

Outline

• Data services @ IEDA
• Types of Unique Identifiers @ IEDA
• Use of Unique Identifiers @ IEDA
  • Data Publication
  • Linking Data, Samples, & Literature
  • Data Compliance Support
  • Interoperability

Page 3

Thanks to the IEDA Team

R. Arko, V. Ferrini, S. Carbotte, A. Goodwillie, L. Hsu, A. Johansson, K. Lehnert, J. Morton, W. Ryan, S. O’Hara, L. Song, R. Weissel, T. Rivera, S. Chan, B. Chen, D. Walker, J. Ash, E. Bohl, K. McLain, J. Zampas

Page 4

IEDA: Integrated Earth Data Applications

www.iedadata.org

“… a community-based facility that serves to support, sustain, and advance the geosciences by providing a centralized location for the registry of and access to data essential for research in the solid-earth and polar sciences.”

Page 5

IEDA Scope: Solid Earth Observational Data

(diagram labels: Field Data, Derived Data, Sensor-based, Sample-based)

Page 6

IEDA Data Types

Sensor-based (MGDS)
• Field data, e.g., sonar ping files, seismic reflection shot data, side-scan sonar, photographs, gravity field data, temperature (>70 data types)
• Derived data, e.g., bathymetric grids, side-scan sonar grids, micro-seismicity catalogs, migrated seismic reflection profiles, gravity MBA grids, magnetization grids (>65 data types)

Sample-based (EarthChem)
• Sample metadata profiles: rocks, sediments, liquids, soils
• Analytical lab data, e.g., major & trace element compositions, isotopic ratios, mineralogy, geochronology, age models, P/T model data, calculated end-member compositions (>500 measured properties)

Page 7

IEDA hosts diverse data

• Derived Geophysical Data
• Analytical geochemistry data
• Geochronological data
• Sample metadata
• Seismic Reflection Data
• Photos and images

Featured: multibeam bathymetry data, Marine Geoscience Data System (Soule et al., 2008)

Page 8

IEDA hosts diverse data: analytical geochemistry data

Featured: major element geochemical analyses, PetDB: The Petrological Database (Standish et al., 2008)

Page 9

IEDA hosts diverse data: photos and images

Featured: web galleries for images, videos, maps, and photos (MGDS and IEDA MediaBank) (Soule et al., 2008)

Page 10

IEDA Data Holdings

• nearly 24 terabytes, >320,000 files in MGDS
• 19 million geochemical values from 36,000 publications accessible at EarthChem
• ca. 3.8 million samples registered in SESAR

(map: EarthChem Portal sample locations)

Page 11

IEDA Systems

Repositories & registries
• Marine Geoscience Data System
• EarthChem Library
• System for Earth Sample Registration

Data syntheses & products
• GMRT, PetDB, SedDB, Geochron

Software tools for data discovery, access, visualization and analysis
• GeoMapApp, Virtual Ocean, EarthChem

Portals to complementary data held in other repositories
• ASP, EarthChem, USAP-DCC

(screenshots: MGDS, Virtual Ocean, EarthChem TAS plots)

Page 12

IEDA Foci
• Data Discovery & Access
• Data Preservation & Curation
• Data Analysis
• Investigator Support

Data Preservation & Curation:
• QA/QC, documentation
• Persistent identification (DOI)
• Long-term archiving

Page 13

IEDA Foci: Data Discovery & Access
• Web-based user interfaces
• Programmatic access interfaces
• GeoMapApp, Google Earth, etc.
• Links to the literature

Page 14

IEDA Foci: Data Analysis
• Visualization tools (GeoMapApp, Virtual Ocean, Earth Observer)
• Syntheses & products

Page 15

IEDA Foci: Investigator Support
• Web-based data submission
• Data Management Plan tool
• Data Compliance Report tool
• Community

Page 16

IEDA Services & Architecture

(diagram labels: IEDA Repository (EarthChem, MGDS, SESAR); Metadata Catalogs; datasets; remote data; Data submission; DOI registration (datasets); IGSN registration (samples); Long-term Archiving; Synthesis (GMRT, PetDB, SedDB); Data Discovery & Access; Data Compliance Support)

Page 17

IEDA needs persistent identifiers

Persistent identifiers help IEDA achieve greater…
• Accessibility: by navigating diverse but related data in the IEDA systems
• Reliability: by maintaining links between IEDA and outside systems that persist through time
• Citability: by enabling proper attribution to research with long-lived, citable identifiers

Page 18

What objects need to be identified?

IDs assigned by IEDA
• People • Samples • Datasets / Datafiles / Software • Cruises/Expeditions

Externally assigned IDs, used in IEDA systems
• Publications • Funding Awards • Platforms • Cruises • Organization IDs • Country, State, Language codes

Page 19

What identifiers are used?

IDs assigned by IEDA
• People • Samples • Datasets / Datafiles / Software • Cruises/Expeditions
Identifiers: IGSN, DOI (DataCite)

Externally assigned IDs, used in IEDA systems
• People • Publications • Funding Awards • Platforms • Cruises • Organizations • Country, State, Language
Identifiers: DOI (Publishers), NSF Award Numbers, ICES Platform Code, R2R Cruise ID, IANA, ISO codes, ORCID (coming soon)

Page 20

DOI: Digital Object Identifier

“DOI system provides a technical and social infrastructure for the registration and use of persistent interoperable identifiers for use on digital networks. The DOI system implements the Handle System and the indecs Framework.”

www.doi.org
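A DOI is a Handle: a prefix identifying the registrant (it always begins with "10.") and a registrant-assigned suffix, separated by the first "/", resolved through the doi.org proxy. The Python sketch below normalizes a DOI string and builds a resolver URL. The function names and the loose `DOI_PATTERN` check are assumptions for illustration; the check only tests syntax, not whether a DOI is actually registered.

```python
import re
from typing import Tuple
from urllib.parse import quote

# Loose syntax check (assumption): "10." + registrant code, "/", non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(?:\.\d+)*/\S+$")

# Forms commonly written in front of a DOI; stripped before parsing.
_STRIP_PREFIXES = ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(raw: str) -> str:
    """Remove whitespace and a leading 'doi:' or resolver URL, then check syntax."""
    s = raw.strip()
    for prefix in _STRIP_PREFIXES:
        if s.lower().startswith(prefix):
            s = s[len(prefix):]
            break
    if not DOI_PATTERN.match(s):
        raise ValueError(f"not a DOI: {raw!r}")
    return s


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI at the first '/' into its Handle prefix and suffix."""
    prefix, suffix = normalize_doi(doi).split("/", 1)
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build a doi.org proxy URL, percent-encoding the suffix but keeping '/'."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"
```

For the publication DOI 10.1016/j.epsl.2006.09.012, `split_doi` returns `("10.1016", "j.epsl.2006.09.012")`; the same functions accept data DOIs such as 10.1594/IEDA/100041.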

Page 21

Publication DOIs

Externally assigned publication DOIs are used to link to the electronic article and to capture IEDA data related to a published article.

Example: 10.1016/j.epsl.2006.09.012

Page 22

Linking Data & Publications

EarthChem ‘Landing Page’


Page 23

Data DOI

• establish easier access to research data on the Internet
• increase acceptance of research data as legitimate, citable contributions to the scholarly record
• support data archiving that will permit results to be verified and re-purposed for future study

Page 24

Data DOIs

Data DOIs are assigned to digital resources (datasets, technical reports, and software) in the IEDA repository:
• help ensure proper attribution to the author
• provide open access
• allow versioning
• long-term archiving in Columbia University Libraries

Example: 10.1594/IEDA/100041
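Because a data DOI is meant to be cited, a repository can generate a citation string from the dataset's metadata, including the version when one exists. The sketch below follows the general "Creator (Year): Title. Version. Publisher. Identifier" shape that DataCite recommends; the `DatasetRecord` fields, `format_citation`, and all example values (including the placeholder DOI) are hypothetical, not IEDA's actual records.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical subset of dataset metadata needed for a citation."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str
    version: Optional[str] = None


def format_citation(rec: DatasetRecord) -> str:
    """Build 'Creators (Year): Title. [Version X.] Publisher. https://doi.org/DOI'."""
    parts = [f"{'; '.join(rec.creators)} ({rec.year}): {rec.title}."]
    if rec.version:
        # Citing a specific version keeps attribution stable as datasets are updated.
        parts.append(f"Version {rec.version}.")
    parts.append(f"{rec.publisher}.")
    parts.append(f"https://doi.org/{rec.doi}")
    return " ".join(parts)
```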

Page 25

EarthChem Library Data Publication

Investigator:
1. Create dataset (guidelines & data templates provided)
2. Create ECL record (enter cataloging metadata)
3. Upload file (set release date)

Automatic notification to ECL manager

EarthChem Data Manager:
4. QC metadata & data
5. Approve dataset
6. Register dataset with DOI (release dataset)
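The publication steps above behave like a small state machine in which each action is reserved for one role. A minimal Python sketch, assuming hypothetical state, action, and role names (`upload`, `approve`, `register_doi`, `investigator`, `data_manager`); it records the release date but does not enforce an embargo, and the automatic notification is modeled as a log entry only.

```python
from datetime import date
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    DRAFT = auto()      # investigator creates the dataset and the ECL record
    SUBMITTED = auto()  # file uploaded with a release date; ECL manager notified
    APPROVED = auto()   # data manager has QC'd metadata and data
    RELEASED = auto()   # dataset registered with a DOI and released


# (current state, action) -> (next state, role permitted to perform the action)
TRANSITIONS = {
    (State.DRAFT, "upload"): (State.SUBMITTED, "investigator"),
    (State.SUBMITTED, "approve"): (State.APPROVED, "data_manager"),
    (State.APPROVED, "register_doi"): (State.RELEASED, "data_manager"),
}


class Submission:
    def __init__(self, title: str, release_date: Optional[date] = None) -> None:
        self.title = title
        self.release_date = release_date  # recorded only; not enforced in this sketch
        self.state = State.DRAFT
        self.log: List[str] = []

    def act(self, action: str, role: str) -> State:
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action!r} from {self.state.name}")
        next_state, permitted = TRANSITIONS[key]
        if role != permitted:
            raise PermissionError(f"{role!r} may not {action!r}")
        self.state = next_state
        self.log.append(f"{role}:{action}")
        if next_state is State.SUBMITTED:
            # Stands in for the automatic notification to the ECL manager.
            self.log.append("notify:ECL manager")
        return self.state
```

Keeping the allowed transitions in one table makes the division of labor explicit: only the data manager can move a dataset past QC or trigger DOI registration.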

Page 26

QC/Review by Data Managers

• development of metadata for new data sets
  • extract from publications
  • extract from secondary literature
  • contact authors
• continued development of metadata schemas and vocabularies to align with evolving community standards
• ongoing evaluation to ensure completeness of metadata for existing data holdings
• data verification: ensuring that data files are readable

Page 27

Samples: IGSN (International Geo Sample Number)

Provides persistent unique identification for physical samples
• URN-type syntax
• centralized registration via the international governance organization IGSN e.V. (DataCite model)

Ensures access to ‘virtual representations’ of samples
• standardized ‘core’ metadata profiles (ISO 19115, GeoSciML)
• extended metadata profiles at allocating agents (community-specific)

Example: MGD000973
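An IGSN can be written in a URN-like form (`igsn:MGD000973`) and, because it is registered in the Handle System, resolved through a handle proxy. The sketch assumes that IGSNs are case-insensitive alphanumeric strings and that the IGSN handle prefix is 10273; both are assumptions to check against current IGSN e.V. documentation, and the function names are hypothetical.

```python
import re

# Assumption: IGSNs are case-insensitive alphanumeric strings, optionally
# written with a URN-style "igsn:" prefix.
IGSN_RE = re.compile(r"^(?:igsn:)?([A-Z0-9]+)$", re.IGNORECASE)

# Assumption: Handle System prefix used for IGSN resolution; verify before use.
IGSN_HANDLE_PREFIX = "10273"


def parse_igsn(text: str) -> str:
    """Return the bare IGSN in upper case, or raise ValueError."""
    m = IGSN_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not an IGSN: {text!r}")
    return m.group(1).upper()


def igsn_urn(text: str) -> str:
    """Canonical URN-style form, e.g. 'igsn:MGD000973'."""
    return f"igsn:{parse_igsn(text)}"


def igsn_handle_url(text: str) -> str:
    """Resolvable URL via the public handle proxy (prefix is an assumption)."""
    return f"https://hdl.handle.net/{IGSN_HANDLE_PREFIX}/{parse_igsn(text)}"
```

A free-text sample name such as "M-1" is rejected by `parse_igsn`, which is the point: the identifier, not the local name, is what gets resolved.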

Page 28

IGSN Attributes
• persistent
• resolvable (via handle service)
• broad application
• compliant with international standards
• internationally governed
• does not replace personal or institutional naming protocols
• tracks sample genealogies
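Tracking sample genealogies means recording, for each IGSN, the sample it was derived from (for example, a section from a core, a split from a section). A minimal in-memory sketch, assuming each sample has at most one parent and parents are registered before their children; the `SampleRegistry` class and the IGSN values used with it are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class SampleRegistry:
    """Hypothetical in-memory record of parent/child links between IGSNs."""

    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {}
        self.children: Dict[str, List[str]] = defaultdict(list)

    def register(self, igsn: str, parent: Optional[str] = None) -> None:
        if igsn in self.parent:
            raise ValueError(f"{igsn} is already registered")
        if parent is not None and parent not in self.parent:
            raise KeyError(f"parent {parent} is not registered")
        self.parent[igsn] = parent
        if parent is not None:
            self.children[parent].append(igsn)

    def lineage(self, igsn: str) -> List[str]:
        """Chain from a sample up to its root ancestor, e.g. split, section, core."""
        chain = [igsn]
        while self.parent[chain[-1]] is not None:
            chain.append(self.parent[chain[-1]])
        return chain

    def descendants(self, igsn: str) -> List[str]:
        """All samples derived from igsn at any depth (depth-first order)."""
        found: List[str] = []
        stack = list(self.children.get(igsn, []))
        while stack:
            child = stack.pop()
            found.append(child)
            stack.extend(self.children.get(child, []))
        return found
```

Requiring the parent to exist before a child is registered keeps the genealogy acyclic, so `lineage` always terminates.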

Page 29

Need for Unique Sample Identifiers

• The EarthChem Portal shows 75 publications with geochemical data referenced to a sample with the name M1 (or M-1). (www.earthchem.org)
• Names of dredge sample 3 of the Amphitrite cruise (PetDB database, www.petdb.org)

Page 30

IGSN Metadata Profile
• User-submitted metadata
• QC by IGSN Allocating Agent
• Access via IGSN handle or UI search
• QR code with URL
• Long-term preserved

Page 31

A Scalable IGSN Architecture

(diagram tiers: IGSN Registry: IGSN e.V., Metadata Clearinghouse; Allocating Agents: SESAR, ICDP, Geoscience Australia, USGS, LDEO, GFZ, plus the invented examples Near Space Observatory and ExoPlanet; Registrants: Repository, Analytical Lab, Investigator)
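One way to picture the tiers is a registry that assigns non-overlapping namespaces to allocating agents and their registrants, so that any IGSN can be traced back to the agent responsible for it. This is a toy model under stated assumptions (prefix-based namespaces, a hypothetical `IGSNRegistry` class, invented namespace codes), not IGSN e.V.'s implementation; actual resolution goes through the Handle System.

```python
from typing import Dict, Tuple


class IGSNRegistry:
    """Toy top tier: maps namespace prefixes to (allocating agent, registrant)."""

    def __init__(self) -> None:
        self.namespaces: Dict[str, Tuple[str, str]] = {}

    def assign_namespace(self, prefix: str, agent: str, registrant: str) -> None:
        prefix = prefix.upper()
        # Refuse prefixes that overlap an existing one, so lookups stay unambiguous.
        for existing in self.namespaces:
            if existing.startswith(prefix) or prefix.startswith(existing):
                raise ValueError(f"{prefix} overlaps namespace {existing}")
        self.namespaces[prefix] = (agent, registrant)

    def lookup(self, igsn: str) -> Tuple[str, str]:
        """Return the (allocating agent, registrant) responsible for an IGSN."""
        key = igsn.upper()
        for prefix, owner in self.namespaces.items():
            if key.startswith(prefix):
                return owner
        raise KeyError(f"no namespace covers {igsn}")
```

Because namespaces never overlap, new allocating agents can be added without coordinating individual identifiers with each other, which is what lets such a scheme scale.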

Page 32

IGSN Applications
• Unambiguously cite physical samples (link to data and publications)
• Find, link, & integrate distributed data for a single sample
• Build a catalog of available specimens, cores, etc. to find and access these objects and their metadata

Example:
Publication doi:10.1029/2011GC003804
Dataset doi:10.1594/IEDA/100050
Sample igsn:OSU0056FT
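The example above ties a publication, a dataset, and a sample together by their identifiers. If such links are stored as an undirected graph, a breadth-first search from one sample finds every record connected to it. A minimal sketch with a hypothetical `LinkGraph` class; only the three identifiers come from the slide, and the link structure used below is assumed.

```python
from collections import defaultdict, deque
from typing import Dict, Set


class LinkGraph:
    """Hypothetical undirected graph of links between identifiers."""

    def __init__(self) -> None:
        self.edges: Dict[str, Set[str]] = defaultdict(set)

    def link(self, a: str, b: str) -> None:
        self.edges[a].add(b)
        self.edges[b].add(a)

    def connected(self, start: str) -> Set[str]:
        """Every identifier reachable from start (breadth-first), excluding start."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in self.edges.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        seen.discard(start)
        return seen
```

For example, after `g.link("igsn:OSU0056FT", "doi:10.1594/IEDA/100050")` and `g.link("doi:10.1594/IEDA/100050", "doi:10.1029/2011GC003804")`, `g.connected("igsn:OSU0056FT")` returns both DOIs, even though the sample is linked to the publication only through the dataset.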

Page 33

• Author highlights/mentions the IGSN of their sample in the text of the paper: "… igsn:HRV0035F0 …"
• Elsevier creates a text link to http://www.geosamples.org/profile? (for igsn:HRV0035F0)
• Researchers can link through to the sample at SESAR in one click, which is more efficient

(slide courtesy of Bethan Keall, Elsevier)
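The text-linking step can be sketched as a regular-expression pass over article text that wraps each `igsn:` mention in a hyperlink. The mention pattern, the `link_igsn_mentions` name, and the URL template are assumptions; the template used below points at a placeholder domain, not at SESAR or at Elsevier's actual implementation.

```python
import html
import re

# Assumption: a mention is "igsn:" (any case) followed by alphanumeric characters.
MENTION_RE = re.compile(r"\bigsn:([A-Za-z0-9]+)", re.IGNORECASE)


def link_igsn_mentions(text: str, url_template: str) -> str:
    """Return HTML in which every igsn:XXXX mention links to url_template."""
    # Escape the article text first; mentions contain no HTML special characters,
    # so they still match after escaping.
    escaped = html.escape(text)

    def to_link(match) -> str:
        igsn = match.group(1).upper()
        url = url_template.format(igsn=igsn)
        return f'<a href="{html.escape(url)}">{match.group(0)}</a>'

    return MENTION_RE.sub(to_link, escaped)
```

Usage: `link_igsn_mentions("Basalt igsn:HRV0035F0 was analysed.", "https://example.org/samples/{igsn}")` wraps only the mention and leaves the surrounding text unchanged.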

Page 34

People – GeoPass ID

GeoPass IDs identify users across multiple IEDA systems (single sign-on).

Log in allows saved content:
• data management plans
• database search results
• sample metadata profiles
• submitted content

Page 35

Coming Soon: ORCID IDs
• registry of unique researcher identifiers
• transparent method of linking research activities and outputs to these identifiers
• ability to reach across disciplines, research sectors, and national boundaries
• open, non-profit, community-based effort
• cooperation with other identifier systems
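ORCID iDs carry a built-in checksum: the last of the 16 characters is computed from the first 15 digits with ISO 7064 MOD 11-2, and a value of 10 is written as "X". This lets a system such as IEDA catch mistyped iDs before storing them. A short validation sketch; the function names are hypothetical, and it accepts only the hyphenated `0000-0000-0000-0000` form, not full `https://orcid.org/` URLs.

```python
import re

# Four groups of four characters; only the final character may be 'X'.
ORCID_RE = re.compile(r"^(\d{4})-(\d{4})-(\d{4})-(\d{3}[\dX])$")


def orcid_check_digit(base15: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the string is well-formed and its check character matches."""
    m = ORCID_RE.match(orcid.strip().upper())
    if m is None:
        return False
    digits = "".join(m.groups())
    return orcid_check_digit(digits[:15]) == digits[15]
```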

Page 36

Cruises & Expeditions

Cruise IDs group and link documents, sensor data, sample data, and information across IEDA.

Example: AT15-17
• Cruise personnel and instruments
• Geologic interpretation
• Photographs
• Bathymetry
• Pressure and Temperature
• Magnetic
• Navigation
• Seismic
• Samples
• Fluid Geochemistry

Page 37

R2R Cruise IDs

“The Rolling Deck to Repository (R2R) program aims to develop comprehensive fleet-wide management of underway data to ensure preservation of and access to our national oceanographic research data resources.”

Page 38

Platform IDs

Page 39

Award numbers

Award IDs in the Data Compliance Reporting Tool group all data related to a funding award and generate a dynamic report for funding agencies.

Example: 0527053 (Data Compliance Report)
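Grouping by award ID amounts to filtering linked records on the award number and summarizing them by type; because the report is recomputed from the current holdings on each call, it stays up to date as new data are linked. A minimal sketch with a hypothetical `Item` record type, a hypothetical `compliance_report` function, and placeholder identifiers; it is not the actual Data Compliance Reporting Tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Item:
    kind: str        # e.g. "dataset", "sample", "cruise"
    identifier: str  # e.g. a DOI, an IGSN, or a cruise ID
    award: str       # funding award number the item is linked to


def compliance_report(items: Iterable[Item], award: str) -> str:
    """Summarize every item linked to one award, grouped and counted by kind."""
    by_kind: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        if item.award == award:
            by_kind[item.kind].append(item.identifier)
    lines = [f"Data Compliance Report for award {award}"]
    for kind in sorted(by_kind):
        lines.append(f"{kind}: {len(by_kind[kind])}")
        lines.extend(f"  {ident}" for ident in sorted(by_kind[kind]))
    return "\n".join(lines)
```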

Page 40

Award numbers 0527053

Data Compliance Report

Page 41

Identifier challenges and future work

Challenges:
• Maintaining identifiers with growing content: needs active management
• Incorporating legacy identifiers: listen to community feedback
• Updating of content by users: allow versioning

Page 42

IEDA identifiers in a research workflow

1. Background research
2. Data management plan
3. Sample management
4. Dataset publication
5. Article publication
6. Funding agency report

(diagram labels: Researchers; identifiers used across these steps: GeoPass ID, Cruise ID, Publication DOI, IGSN, Dataset DOI, NSF Award #)