Data Management and Representations in Ecce and CMCS

Theresa L. Windus
Pacific Northwest National Laboratory
Environmental Molecular Sciences Laboratory
Molecular Science Software Group


Outline

Some “definitions”
Data and task representations
Ecce
CMCS
Summary
Acknowledgement


Data and metadata (one scientist’s data is another scientist’s metadata)

$$\Delta H^{\circ}_{\mathrm{atomiz}}(\mathrm{CH_3OOH}) = 522.09 \pm 2.02\ \mathrm{kcal/mol}$$

value and uncertainty: data
units: kcal/mol
quantity: enthalpy of atomization
species: methyl hydroperoxide, CAS# 3031-73-0
temperature: 0 K
calculated: G3//B3LYP
creator: T. Windus using Ecce
more info: http://avatar.emsl.pnl.gov:8080/Ecce/.../CH3OOH/.../GxEnergy

[calculated, G3//B3LYP, T. Windus, more at http://...]
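The split between data and metadata can be made concrete by holding the value and its uncertainty separately from a dictionary of descriptive properties, and deriving the compact bracketed annotation from a chosen subset of them. This is a minimal sketch, assuming a hypothetical `AnnotatedValue` class and hypothetical metadata key names; it is not the actual Ecce or CMCS representation.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedValue:
    """A value with its uncertainty (the data) plus descriptive metadata (hypothetical)."""
    value: float
    uncertainty: float
    metadata: dict = field(default_factory=dict)

    def annotation(self, keys):
        """Compact '[a, b, ...]' string built from the selected metadata keys, in order."""
        return "[" + ", ".join(str(self.metadata[k]) for k in keys if k in self.metadata) + "]"


h_atomiz = AnnotatedValue(
    value=522.09,
    uncertainty=2.02,
    metadata={
        "units": "kcal/mol",
        "quantity": "enthalpy of atomization",
        "species": "methyl hydroperoxide",
        "cas": "3031-73-0",
        "temperature": "0 K",
        "provenance": "calculated",
        "method": "G3//B3LYP",
        "creator": "T. Windus",
    },
)

print(f"{h_atomiz.value} ± {h_atomiz.uncertainty} {h_atomiz.metadata['units']}")
print(h_atomiz.annotation(["provenance", "method", "creator"]))
```

Keys missing from the metadata are skipped rather than raising, so the same annotation request works on sparsely documented values.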


Metadata Converts Scientific Data into Knowledge

Metadata provides identification and documentation for scientific data. Examples: attaching an owner, creation date, abstract, and type to data; tracking data to the program versions that produced it, and possibly to bugs in those versions.

Metadata documents the context and value of the data. Example: the theoretical atomization energy of methyl hydroperoxide (and its uncertainty) from Ecce, used as input to ATcT (Active Thermochemical Tables), carries information identifying the species and the quantity, the units, the theoretical method used, vibrational frequencies and geometry, a reference to the source file, the creator, etc.

Metadata facilitates cross-scale transfer of data. Examples: it can show a chain of inputs, including input parameters and configuration files, across scales; it can retrieve literature references that describe the data.

Metadata allows users to comment on the data and its quality. Example: it can be used for scientific peer review of data.

Metadata is necessary for effective collaboration. Example: scientific data becomes more usable to others when it is documented.

Annotation is another term for metadata. Annotations can be added either by the data owner or by a third party.


Data Pedigree: A Special Kind of Metadata

Data pedigree or data provenance is a relationship which provides a “line of ancestors”.

Pedigree allows for the categorization and tracing of the scientific data, and for the identification of the data’s ultimate origin, possibly across scales.

Pedigree includes the series of steps necessary to reproduce the data.

Data is linked, for example, to projects, references, inputs, and outputs.
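Pedigree can be modeled as a directed graph of "has input" links, so identifying a datum's ultimate origin becomes a graph traversal. The sketch below assumes a hypothetical in-memory dictionary of links with illustrative item names; a real system would read these links from stored metadata.

```python
from collections import deque

# Hypothetical "hasInput" links: each item maps to the items it was derived from.
HAS_INPUT = {
    "heat_of_formation": ["energy_high_level", "energy_mp2", "frequencies"],
    "energy_high_level": ["optimized_geometry"],
    "energy_mp2": ["optimized_geometry"],
    "frequencies": ["optimized_geometry"],
    "optimized_geometry": ["input_parameters", "starting_structure"],
}


def ancestors(item, has_input):
    """All items reachable through input links, breadth-first, each listed once."""
    order, seen = [], set()
    queue = deque(has_input.get(item, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue  # shared inputs (and any cycles) are visited only once
        seen.add(parent)
        order.append(parent)
        queue.extend(has_input.get(parent, []))
    return order


def origins(item, has_input):
    """Ancestors with no recorded inputs: the data's ultimate origin."""
    return [a for a in ancestors(item, has_input) if not has_input.get(a)]


print(origins("heat_of_formation", HAS_INPUT))
```

The ancestor list is also the set of steps that would have to be rerun to reproduce the final item.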


Knowledge Grid

A set of scalable tools, middleware, and services

For the creation, analysis, dissemination, evaluation, and use

Of data, information, and knowledge

By individuals, groups, and communities

…A digital place for performing ‘all’ aspects of science


Ecce & NWChem

Ecce – Extensible Computational Chemistry Environment

comprehensive problem-solving environment
common graphical user interfaces
scientific modeling management
seamless transfer of information between applications
persistent data storage through DAV
integrated scientific data management
tools for ensuring efficient use of computing resources across a distributed network
visualization of multi-dimensional data structures

http://ecce.emsl.pnl.gov

NWChem – massively parallel computational chemistry program

Energetics, geometries, frequencies, etc. at various levels of theory

http://www.emsl.pnl.gov/docs/nwchem


Ecce is… (cont.)


Ecce Architecture


Distributed Authoring and Versioning (DAV)

An early web service (XML commands over HTTP)
A widely adopted standard for metadata/data transport

Put/Get data with arbitrary properties (dynamic)
Properties can be discovered and accessed independently
Extensions: DASL (DAV Searching and Locating), versioning, transactions, …
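Because properties travel as XML over HTTP, a client can ask for just the properties it needs (PROPFIND) or attach new ones (PROPPATCH), both defined in the WebDAV specification. A minimal sketch using only the Python standard library follows; the Ecce namespace is taken from the property names shown later in this talk, while the chosen properties and the server URL are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

DAV = "DAV:"
ECCE = "http://www.emsl.pnl.gov/ecce"  # namespace used by Ecce property names
ET.register_namespace("D", DAV)
ET.register_namespace("ecce", ECCE)


def propfind_body(props):
    """PROPFIND body requesting only the given (namespace, name) properties."""
    root = ET.Element(f"{{{DAV}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV}}}prop")
    for ns, name in props:
        ET.SubElement(prop, f"{{{ns}}}{name}")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def proppatch_body(values):
    """PROPPATCH body setting arbitrary (namespace, name) -> text properties."""
    root = ET.Element(f"{{{DAV}}}propertyupdate")
    prop = ET.SubElement(ET.SubElement(root, f"{{{DAV}}}set"), f"{{{DAV}}}prop")
    for (ns, name), text in values.items():
        ET.SubElement(prop, f"{{{ns}}}{name}").text = text
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def dav_request(url, method, body, depth=None):
    """Prepare (but do not send) a WebDAV request; urllib accepts custom methods."""
    headers = {"Content-Type": "application/xml; charset=utf-8"}
    if depth is not None:
        headers["Depth"] = depth  # "0" = this resource only, "1" = plus its members
    return urllib.request.Request(url, data=body, method=method, headers=headers)


body = propfind_body([(ECCE, "theory"), (ECCE, "state")])
req = dav_request("http://dav.example.org/ecce/calc1/", "PROPFIND", body, depth="0")
```

Passing `req` to `urllib.request.urlopen` would send it; the host in the usage line is a placeholder.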


What does the WebDAV protocol provide?

[Figure: applications exchange data with a DAV server using WebDAV over HTTP; the server is backed by a data storage provider. Data is organized as collections that contain resources and other collections, and both collections and resources carry properties.]
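A PROPFIND reply is a `multistatus` document with one `response` per collection or resource, each listing its properties. The sketch below flattens such a reply into a dictionary keyed by `href`; the sample reply, its paths, and its 200/404 split are illustrative assumptions rather than captured Ecce traffic.

```python
import xml.etree.ElementTree as ET

D = "{DAV:}"


def split_tag(tag):
    """'{namespace}name' -> ('namespace', 'name'); un-namespaced tags get ''."""
    if tag.startswith("{"):
        ns, _, name = tag[1:].partition("}")
        return ns, name
    return "", tag


def parse_multistatus(xml_bytes):
    """Map each href to {(namespace, name): text} for properties returned with 200."""
    result = {}
    for response in ET.fromstring(xml_bytes).findall(f"{D}response"):
        props = result.setdefault(response.findtext(f"{D}href"), {})
        for propstat in response.findall(f"{D}propstat"):
            if " 200 " not in propstat.findtext(f"{D}status", default=""):
                continue  # e.g. 404 for properties this resource does not have
            for prop in propstat.findall(f"{D}prop/*"):
                props[split_tag(prop.tag)] = prop.text or ""
    return result


SAMPLE_RESPONSE = b"""<?xml version="1.0" encoding="utf-8"?>
<D:multistatus xmlns:D="DAV:" xmlns:E="http://www.emsl.pnl.gov/ecce">
  <D:response>
    <D:href>/ecce/d39974/tracebug/esp/</D:href>
    <D:propstat>
      <D:prop><E:theory>SCF/RHF</E:theory><E:state>Complete</E:state></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
    <D:propstat>
      <D:prop><E:charge/></D:prop>
      <D:status>HTTP/1.1 404 Not Found</D:status>
    </D:propstat>
  </D:response>
  <D:response>
    <D:href>/ecce/d39974/tracebug/esp/molecule.mvm</D:href>
    <D:propstat>
      <D:prop><E:empiricalFormula>H4C</E:empiricalFormula></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""

for href, props in parse_multistatus(SAMPLE_RESPONSE).items():
    print(href, props)
```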


Accessing WebDAV Server from Windows 2000


Accessing WebDAV Server Using Browser


Accessing WebDAV Server Using Ecce

[Figure: a calculation viewed through Ecce, with its properties, files, basis set, and chemical system stored on the WebDAV server.]


Ecce Physical Model

[Figure: a Project contains Calculations and other Projects; a Calculation is composed of setup data/logs, properties, files, a chemical system, and a basis set.]

Calculations are referred to as a “virtual document” because we distribute the structure across many physical objects.

Physical collections and resources are URI addressable.

Collections are unordered and allow mixed content.
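One way to picture a virtual document: the calculation is a collection URI, and each logical part lives in its own URI-addressable member. The member names and server address below are hypothetical, since this talk does not give Ecce's storage layout; because collections are unordered, the listing is sorted only for display.

```python
from urllib.parse import urljoin

# Hypothetical server location and member names, for illustration only.
CALC_URI = "http://dav.example.org/ecce/project1/calc1/"
MEMBERS = {
    "chemical system": "molecule.mvm",
    "basis set": "basis",
    "setup data/logs": "setup/",
    "properties": "properties/",
    "files": "files/",
}


def resolve_virtual_document(calc_uri, members):
    """Map each logical part of a calculation to the URI of the object holding it."""
    if not calc_uri.endswith("/"):
        calc_uri += "/"  # so urljoin resolves members inside the collection
    return {part: urljoin(calc_uri, rel) for part, rel in members.items()}


for part, uri in sorted(resolve_virtual_document(CALC_URI, MEMBERS).items()):
    print(f"{part}: {uri}")
```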


Calculation Setup

[Figure: calculation setup flow. The Calculation Editor, Builder, and Basis Set Tool produce an .edml file (theory details, runtype details, parameters), the geometry, ESP, and basis set data, and ai.input; Python and Perl scripts combine these with a template file, and a Perl basis set reformatting script, to generate the application input deck.]
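The template step can be sketched with Python's `string.Template`: setup values collected by the editors are substituted into a text template to produce the input deck. The template and the setup dictionary are hypothetical stand-ins for Ecce's template files and .edml data (the actual generation uses Python and Perl scripts not shown here); the methane geometry reuses the first GEOMTRACE step from the XML properties example in this talk.

```python
from string import Template

# Hypothetical NWChem-style template; not Ecce's actual template file.
NWCHEM_TEMPLATE = Template("""\
title "$title"
geometry units angstroms
$geometry
end
basis
  * library $basis
end
task $theory $runtype
""")


def format_geometry(atoms):
    """One 'symbol x y z' line per atom, indented for the geometry block."""
    return "\n".join(f"  {sym} {x:.5f} {y:.5f} {z:.5f}" for sym, x, y, z in atoms)


def build_input_deck(setup):
    """Fill the template from a setup dictionary (hypothetical keys)."""
    return NWCHEM_TEMPLATE.substitute(
        title=setup["title"],
        geometry=format_geometry(setup["geometry"]),
        basis=setup["basis"],
        theory=setup["theory"],
        runtype=setup["runtype"],
    )


setup = {
    "title": "methane",
    "geometry": [
        ("C", 0.0, 0.0, 0.0),
        ("H", -0.6755, -0.6755, 0.6755),
        ("H", 0.6755, 0.6755, 0.6755),
        ("H", 0.6755, -0.6755, -0.6755),
        ("H", -0.6755, 0.6755, -0.6755),
    ],
    "basis": "6-31G*",
    "theory": "scf",
    "runtype": "energy",
}
print(build_input_deck(setup))
```

`substitute` raises `KeyError` if a placeholder has no value, which surfaces incomplete setups before anything is submitted.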


Output Parsing

[Figure: the Job Monitor reads application output and uses a parse descriptor to split it into text blocks 1…N; each block is handled by a corresponding Perl parse script (1…N), and the results are stored in the Ecce database and displayed in the Calculation Viewer.]
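A parse descriptor can be thought of as a table pairing each property with a rule that locates its text block and a converter for the matched text. In this sketch, regular expressions stand in for Ecce's Perl parse scripts, and both the patterns and the sample output are invented for illustration.

```python
import re

# Hypothetical parse descriptor: (property name, pattern locating its text, converter).
PARSE_DESCRIPTOR = [
    ("total_energy", re.compile(r"Total SCF energy\s*=\s*(-?\d+\.\d+)"), float),
    ("cpu_seconds", re.compile(r"Total times\s+cpu:\s*([\d.]+)s"), float),
]


def parse_output(text, descriptor):
    """Apply each descriptor entry; the last match wins (e.g. the final iteration)."""
    properties = {}
    for name, pattern, convert in descriptor:
        matches = pattern.findall(text)
        if matches:
            properties[name] = convert(matches[-1])
    return properties


# Invented output fragment, loosely modeled on a quantum chemistry log.
SAMPLE_OUTPUT = """
         Total SCF energy =    -40.186840000000
         Total SCF energy =    -40.198712345678
 Total times  cpu:        0.9s     wall:        1.1s
"""

print(parse_output(SAMPLE_OUTPUT, PARSE_DESCRIPTOR))
```

Keeping the descriptor as data means support for a new property or application version can be added without changing the parsing loop.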


Example metadata

On the calculation:
http://www.emsl.pnl.gov/ecce:contenttype=ecceCalculation
http://www.emsl.pnl.gov/ecce:resourcetype=VIRTUAL_DOCUMENT
http://www.emsl.pnl.gov/ecce:createdWith=v3.2
http://www.emsl.pnl.gov/ecce:owner=d39974
http://www.emsl.pnl.gov/ecce:application=NWChem
http://www.emsl.pnl.gov/ecce:theory=SCF/RHF
http://www.emsl.pnl.gov/ecce:spinmultiplicity=Singlet
http://www.emsl.pnl.gov/ecce:currentVersion=v3.2
http://www.emsl.pnl.gov/ecce:creationdate=Mon, 22 Mar 2004 17:24:00 GMT
http://www.emsl.pnl.gov/ecce:reviewed=false
http://www.emsl.pnl.gov/ecce:runtype=ESP
http://www.emsl.pnl.gov/ecce:launch_machine=arunta
http://www.emsl.pnl.gov/ecce:launch_nodes=1
http://www.emsl.pnl.gov/ecce:launch_rundir=/home/d39974/ecceruns
http://www.emsl.pnl.gov/ecce:launch_totalprocs=1
http://www.emsl.pnl.gov/ecce:launch_user=d39974
http://www.emsl.pnl.gov/ecce:launch_maxmemory=0
http://www.emsl.pnl.gov/ecce:launch_remoteShell=ssh
http://www.emsl.pnl.gov/ecce:job_jobid=13858
http://www.emsl.pnl.gov/ecce:job_path=/home/d39974/ecceruns/tracebug/esp
http://www.emsl.pnl.gov/ecce:job_clienthost=arunta
http://www.emsl.pnl.gov/ecce:startdate=Mon, 22 Mar 2004 17:25:11 GMT
http://www.emsl.pnl.gov/ecce:version=Thu May  8 13:16:51 PDT 2003 Version 4.5
http://www.emsl.pnl.gov/ecce:state=Complete
http://www.emsl.pnl.gov/ecce:completiondate=Mon, 22 Mar 2004 17:25:14 GMT
DAV:resourcetype=<D:collection/>
DAV:creationdate=2004-03-22T17:24:38Z
DAV:getlastmodified=Mon, 22 Mar 2004 17:24:38 GMT
DAV:getetag="b2805d-1000-926a8180"
DAV:supportedlock=
DAV:getcontenttype=httpd/unix-directory

On the molecule:
http://www.emsl.pnl.gov/ecce:empiricalFormula=H4C
http://www.emsl.pnl.gov/ecce:charge=0.000000
http://www.emsl.pnl.gov/ecce:useSymmetry=false
http://www.emsl.pnl.gov/ecce:symmetrygroup=C1
DAV:creationdate=2004-03-22T17:24:38Z
DAV:getcontentlength=386
DAV:getlastmodified=Mon, 22 Mar 2004 17:24:38 GMT
DAV:getetag="b28064-182-926a8180"
DAV:executable=F
DAV:supportedlock=
DAV:getcontenttype=chemical/x-ecce-mvm
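Each property in this listing is a qualified name (a namespace, then the local name after the last colon) followed by `=` and a text value. A minimal parser under that reading, using a subset of the molecule properties as sample input; the function names are hypothetical.

```python
ECCE = "http://www.emsl.pnl.gov/ecce"


def parse_property_line(line):
    """Split 'namespace:name=value' into ((namespace, name), value)."""
    key, sep, value = line.partition("=")  # names never contain '=', values may
    if not sep:
        raise ValueError(f"not a property line: {line!r}")
    namespace, _, name = key.rpartition(":")  # 'http:' inside the URL is kept
    return (namespace, name), value


def parse_properties(text):
    """Parse one property per non-blank line into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line:
            key, value = parse_property_line(line)
            props[key] = value
    return props


MOLECULE = """
http://www.emsl.pnl.gov/ecce:empiricalFormula=H4C
http://www.emsl.pnl.gov/ecce:charge=0.000000
http://www.emsl.pnl.gov/ecce:symmetrygroup=C1
DAV:getcontentlength=386
DAV:getcontenttype=chemical/x-ecce-mvm
"""

props = parse_properties(MOLECULE)
# Values are stored as text; converting them to numbers is up to the reader.
charge = float(props[(ECCE, "charge")])
```

Note that the listing writes the DAV namespace as the bare prefix `DAV`, so it comes out here as `"DAV"` rather than the namespace URI `DAV:` used on the wire.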


Example MVM file

title: demo
type: molecule
num_atoms: 1065
atom_info: symbol cart
atom_list:
O -2.37400 -3.09100 13.5210
H -1.91600 -2.20200 14.0480
...
pdb_list:
H O5* RC 1 157D A
H H5T RC 1 157D A
…
attr_list:
-0.622300 1 1 0 0
0.429500 1 1 0 0
…
atom_type_list:
OH
HO
…
num_bonds: 1028
bond_list:
2 1 1.00000
1 3 1.00000
…
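A property such as the empirical formula can be derived from the atom list. The sketch assumes each atom_list line is a symbol followed by Cartesian x, y, z (as `atom_info: symbol cart` suggests) and uses the Hill convention (carbon first, then hydrogen, then other elements alphabetically). Ecce's own ordering evidently differs, since the molecule metadata stores methane as H4C.

```python
from collections import Counter


def read_atom_list(lines):
    """Parse 'symbol x y z' lines into (symbol, x, y, z) tuples."""
    atoms = []
    for line in lines:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    return atoms


def hill_formula(symbols):
    """Empirical formula in Hill order: C, then H, then the rest alphabetically."""
    counts = Counter(symbols)
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)  # no carbon: everything alphabetical, H included
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


atoms = read_atom_list(["O -2.37400 -3.09100 13.5210", "H -1.91600 -2.20200 14.0480"])
print(hill_formula(symbol for symbol, *_ in atoms))
print(hill_formula(["C", "H", "H", "H", "H"]))
```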


XML format for Properties

<?xml version="1.0" encoding="utf-8" ?>
<value name="CPUSEC" units="second">9.60000000000000e-01</value>

<?xml version="1.0" encoding="utf-8" ?>
<vector name="MLKNSHELL" rows="7" units="e" rowLabel="Unknown" rowLabels="1 2 3 4 5 6 7">
1.99199825923126e+00 1.18803456337004e+00 3.08260463820159e+00 9.34340637068915e-01
9.34340635555820e-01 9.34340634042729e-01 9.34340632529639e-01
</vector>

<?xml version="1.0" encoding="utf-8" ?>
<tsvectable name="GEOMTRACE" rows="5" units="Angstrom" columns="3" vectors="1" rowLabel="Atom,Coordinate" rowLabels="0 1 2 3 4" columnLabel="Coordinate" vectorLabel="Coordinate" columnLabels="X Y Z">
<step number="1">
0.000000000000000e+00 0.000000000000000e+00 0.000000000000000e+00
-6.755000000000000e-01 -6.755000000000000e-01 6.755000000000000e-01
6.755000000000000e-01 6.755000000000000e-01 6.755000000000000e-01
6.755000000000000e-01 -6.755000000000000e-01 -6.755000000000000e-01
-6.755000000000000e-01 6.755000000000000e-01 -6.755000000000000e-01
</step>
<step number="2">
6.767628142309400e-15 -6.950100046595310e-09 1.390021315920880e-08
-6.239857395114590e-01 -6.239857464615680e-01 6.239857534116811e-01
6.239857568867110e-01 6.239857499366001e-01 6.239857707869190e-01
6.239857742619920e-01 -6.239857812120860e-01 -6.239857603617700e-01
-6.239857916372510e-01 6.239857846871540e-01 -6.239857777370440e-01
</step>
<step number="3">
6.549446678833860e-15 1.124467050187860e-09 -2.248938851918010e-09
-6.252750669032320e-01 -6.252750631744280e-01 6.252750594456050e-01
6.252750588833910e-01 6.252750626121890e-01 6.252750514257610e-01
6.252750508635410e-01 -6.252750471347340e-01 -6.252750583211300e-01
-6.252750428437061e-01 6.252750465725070e-01 -6.252750503012980e-01
</step>
</tsvectable>
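These three shapes (a scalar, a vector, and a time series of tables) can be read back with the standard library. The sketch assumes that `rows` and `columns` give the table dimensions and that each `step` lists its numbers row by row. That reading is consistent with the GEOMTRACE sample (5 atoms × 3 coordinates per step) but is not a documented schema.

```python
import xml.etree.ElementTree as ET


def _floats(text):
    return [float(tok) for tok in (text or "").split()]


def parse_property(xml_text):
    """Return (name, units, data) for a <value>, <vector>, or <tsvectable> element."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    name, units = root.get("name"), root.get("units")
    if root.tag == "value":
        return name, units, float(root.text)
    if root.tag == "vector":
        values = _floats(root.text)
        if len(values) != int(root.get("rows")):
            raise ValueError(f"{name}: expected {root.get('rows')} values, got {len(values)}")
        return name, units, values
    if root.tag == "tsvectable":
        rows, cols = int(root.get("rows")), int(root.get("columns"))
        steps = []
        for step in root.findall("step"):
            flat = _floats(step.text)
            if len(flat) != rows * cols:
                raise ValueError(f"{name}: step {step.get('number')} has {len(flat)} values")
            # One row per atom, one column per Cartesian coordinate.
            steps.append([flat[r * cols:(r + 1) * cols] for r in range(rows)])
        return name, units, steps
    raise ValueError(f"unsupported property element: {root.tag}")


CPUSEC = '<?xml version="1.0" encoding="utf-8" ?><value name="CPUSEC" units="second">9.60000000000000e-01</value>'
MLKNSHELL = ('<?xml version="1.0" encoding="utf-8" ?><vector name="MLKNSHELL" rows="7" units="e">'
             "1.99199825923126e+00 1.18803456337004e+00 3.08260463820159e+00 9.34340637068915e-01 "
             "9.34340635555820e-01 9.34340634042729e-01 9.34340632529639e-01</vector>")
GEOMTRACE = ('<?xml version="1.0" encoding="utf-8" ?><tsvectable name="GEOMTRACE" rows="5" '
             'units="Angstrom" columns="3" vectors="1"><step number="1">0 0 0 -0.6755 -0.6755 0.6755 '
             "0.6755 0.6755 0.6755 0.6755 -0.6755 -0.6755 -0.6755 0.6755 -0.6755</step></tsvectable>")

print(parse_property(CPUSEC))
print(sum(parse_property(MLKNSHELL)[2]))
```

The Mulliken shell populations sum to 10 electrons, as expected for the methane (H4C) molecule in this example.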


Crossing the Molecular to Thermodynamic Scales Data Model

[Figure: calculations on vinoxy (B3LYP/6-31G* optimization and frequencies; QCISD(T,FC)/6-31G* energy; MP2(FC)/G3MP2large energy) are run in NWChem and Gaussian. Each calculation has its input parameters, input and output files, and extracted properties (including an animated GIF of a vibrational mode), and together they feed a G3(MP2)B3LYP heat of formation (Hf) for vinoxy in a NASA file. Legend: NWChem, Ecce, CMCS, Active Tables, Gaussian; pedigree links hasInput and hasOutput.]

Pedigree is imperative to moving data across scales.


Ecce publishing


The Multi-scale Challenge for Chemical Science

The impact of chemical science relies on the flow of information across physical scales
  Data from smaller scales supports models at larger scales
  Critical science lies at scale interfaces: molecular properties and transport; mechanism validation and reduction; chemistry–fluid interactions
The pedigree of information matters
  The propagation of data pedigree across scales is difficult
  Validation and data reliability are often a post-publication process
Multi-scale science faces barriers
  The normal publication route is slow
  Numerous sub-disciplines employ different applications, formats, and models
  Centers of excellence are geographically distributed


Multi-scale Chemical Science Data

Unique terascale reacting flow simulation databases – collection of files @ N x t, and experimental data

Chemical mechanisms – k, MB files in various formats containing collections of reaction rates and transport coefficients. Modeled using theory, validated against experiments

Kinetic rates – by measurement and computation. Tables collected, reviewed, and annotated. NIST WebBook, publications

Thermochemistry – tables of ‘constant’ properties of all molecules (of interest, with data) derived from many experiments, computations, and extrapolations

Quantum chemistry computations of molecular properties – data ranging from a single number to large potential energy surfaces; input to thermochemistry and reaction rate computations


CMCS Spans Scales & Geography

The biggest barrier is “language” and informatics


Adaptive Informatics Infrastructure

Infrastructure – a well-designed, scalable, reusable, flexible set of tools, middleware, and services

Informatics – the emerging use of semi-automated means to derive new knowledge from the analysis of (large amounts of) heterogeneous data, annotating existing data with its newly discovered meaning

Adaptive – able to change dynamically to incorporate new knowledge and support new activities

Low barriers
  Many access points
  Storage of data in original formats with dynamic metadata extraction and translation
Powerful
  Arbitrary formats (binary, ASCII, XML)
  Integrated data, metadata, and pedigree across internal and external tools
Evolvable
  Schema can be changed/extended as needed
  Metadata, translations, viewers, portal, etc. can be dynamically configured


CMCS Technical Choices Enable Adaptive, Long-lived Infrastructure

CMCS data/metadata services
  SAM: translation, annotation
  WebDAV implementation
  Notification (JMS, NED)
  Search
  Pedigree browsing
  Core XML schema
  Security (JAAS)

Chemical science portal
  Jetspeed (CHEF)
  CMCS Explorer
  Application portlets
  Community services

Application integration
  Web services
  WebDAV API
  Multi-scale data, including NIST access

[Figure: the Multi-scale Chemical Science Portal (community tools, knowledge management tools, research support tools) serves quantum chemistry, thermochemistry, kinetics, chemical mechanism, and reacting flow scientists and chemistry applications. Beneath it, a Shared Data Service uses the Scientific Annotation Middleware (parsers, translators, annotators, WebDAV) to manage XML, binary, and text data sets and their annotations, on top of local services/grid fabric (storage, security, event services, directory services).]

A diagram representing the major conceptual elements of the CMCS Informatics Infrastructure.


How Metadata is Populated in CMCS

SAM metadata services layer
  When data is put into WebDAV, SAM executes XSLTs, selected by MIME type, to extract metadata from XML files.
  Similarly, the Binary File Descriptor (BFD) provides an interface for extracting metadata from binary files.
  Other translators can be used as well.

CMCS data management/pedigree API, which facilitates insertion and modification of metadata in the proper XML format
  Java code that allows software developers and scientists to easily write programs that add and edit metadata.
  Scientists can use these APIs to integrate existing or new chemical science applications.
  Uses open-source DAV and XML libraries.

Any WebDAV client application
  DAVExplorer: Java application
  CMCSExplorer: integrated in the CMCS portal
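The MIME-type dispatch described above can be sketched as a registry of extractor functions consulted whenever data is stored. SAM executes XSLT stylesheets; the Python standard library has no XSLT processor, so plain ElementTree functions stand in for them here, and the MIME type `application/x-ecce-property+xml` and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

EXTRACTORS = {}  # MIME type -> function(bytes) -> dict of text-valued metadata


def extractor(mime_type):
    """Decorator that registers a metadata extractor for one MIME type."""
    def register(func):
        EXTRACTORS[mime_type] = func
        return func
    return register


@extractor("application/x-ecce-property+xml")  # hypothetical MIME type
def property_xml(data):
    root = ET.fromstring(data)
    return {"name": root.get("name"), "units": root.get("units")}


@extractor("text/plain")
def plain_text(data):
    return {"lines": str(len(data.decode("utf-8", errors="replace").splitlines()))}


def on_store(data, mime_type):
    """Store hook: extract metadata if an extractor is registered, else return nothing."""
    func = EXTRACTORS.get(mime_type)
    return func(data) if func is not None else {}


print(on_store(b'<value name="CPUSEC" units="second">9.6e-01</value>',
               "application/x-ecce-property+xml"))
```

New file types are supported by registering another extractor, without touching the store hook.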


CMCS Metadata, Annotations, and Pedigree

Dublin Core is used for some basic pedigree properties of electronic publication: creator, dates, publisher, is-referenced-by, references, etc. Dublin Core is a digital library standard for metadata (http://www.dublincore.org).

CMCS properties for chemical science enable searching: species name, CAS number, chemical properties, and chemical formula.

CMCS properties for defining scientific data: inputs, outputs, and is-part-of-project.

CMCS properties for scientific publication and peer review annotations: is-sanctioned-by.

More than 35 elements are currently defined in the core CMCS pedigree.

The infrastructure is flexible about adding new metadata: as new metadata is added, current applications will not break!

CMCS metadata is strongly encouraged, though not required, for all CMCS data, and CMCS metadata is highly extensible.
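The promise that new metadata will not break current applications rests on readers ignoring properties they do not recognize, and on tolerating absent ones. A sketch of such a tolerant reader, keyed by (namespace, name): the two Dublin Core namespaces are the published ones, while the CMCS namespace URI and its property names are hypothetical placeholders.

```python
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
CMCS = "http://cmcs.example.org/pedigree#"  # hypothetical placeholder namespace

# Properties this (hypothetical) application understands, with local labels.
KNOWN = {
    (DC, "creator"): "creator",
    (DC, "date"): "date",
    (DCTERMS, "isReferencedBy"): "is_referenced_by",
    (CMCS, "species"): "species",
    (CMCS, "hasInput"): "inputs",
}


def read_known(properties):
    """Return only recognized properties; unknown or missing ones are simply skipped."""
    return {label: properties[key] for key, label in KNOWN.items() if key in properties}


stored = {
    (DC, "creator"): "T. Windus",
    (CMCS, "species"): "methyl hydroperoxide",
    (CMCS, "isSanctionedBy"): "added after this application was written",
}
print(read_known(stored))
```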


Pedigree Browser Shows Input and Output Relationships


Pedigree Browsing

The Browser enables metadata editing.

Data is linked to projects, references, inputs, and outputs


Automatic Translation and Metadata Extraction

Data translations are provided automatically by SAM, using XSLTs previously registered for the file type.
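Registered translations could also be chained when no single translator covers a request. The sketch below keeps a registry keyed by (source, target) MIME type and finds the shortest chain by breadth-first search. Plain Python functions stand in for SAM's XSLTs, and both the chaining behavior and the example translators are assumptions for illustration, not documented SAM features.

```python
import csv
import io
import json
from collections import deque

TRANSLATORS = {}  # (source MIME type, target MIME type) -> function


def register(src, dst):
    """Decorator that records a translator for one (src, dst) pair."""
    def wrap(func):
        TRANSLATORS[(src, dst)] = func
        return func
    return wrap


@register("text/csv", "application/json")
def csv_to_json(text):
    # Two-column "name,value" rows become one JSON object.
    return json.dumps({name: value for name, value in csv.reader(io.StringIO(text))})


@register("application/json", "text/plain")
def json_to_text(text):
    return "\n".join(f"{key} = {value}" for key, value in json.loads(text).items())


def find_path(src, dst):
    """Shortest chain of MIME types from src to dst, or None if unreachable."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for a, b in TRANSLATORS:
            if a == path[-1] and b not in seen:
                seen.add(b)
                queue.append(path + [b])
    return None


def translate(data, src, dst):
    """Apply each registered translator along the shortest chain."""
    path = find_path(src, dst)
    if path is None:
        raise LookupError(f"no translation from {src} to {dst}")
    for a, b in zip(path, path[1:]):
        data = TRANSLATORS[(a, b)](data)
    return data


print(translate("CPUSEC,0.96\nMLKNSHELL,7\n", "text/csv", "text/plain"))
```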


Adaptive Infrastructure Enables Application Integration

[Figure: applications including Active Tables, Fitdat, ELN 5.0, Ecce (which launches NWChem on grid resources), REACTIONLAB, and the NIST Kinetics DB (Federation ML) connect through the CMCS/DAV and notification APIs, portlets, and web services (DAV, DAV+SAM, NS) to the portal, a shared data repository, and the grid fabric. SAM provides MIME-type assignment, metadata extraction, translation, and pedigree relationships; notifications reach users through the browser and e-mail.]


Initial “Automatic Reasoning” Capability


Summary

Users want ease of use and flexibility in viewing output, which calls for an adaptive informatics infrastructure

“Standards” are useful, but it is necessary to be able to translate between diverse “schema” and “ontologies”

Metadata converts scientific data into knowledge


Multi-disciplinary Ecce Development Team

Gary Black -- Project lead
Karen Schuchardt -- Software architect lead
Bruce Palmer -- Chemist architect
Todd Elsethagen -- Data management lead
Erich Vorpagel -- Chemist consultant
Michael Peterson -- Operations support
Mahin Hackler -- Operations support
Sue Havre -- Application development
Brett Didier -- Application development
Carina Lansing -- Application development
Steve Matsumoto -- Online help lead
Colleen Winters -- Online help
Doug Rice -- Online help


Multi-disciplinary CMCS Team

Chemical Science and Computer/Information Science

Larry Rahn*, SNL
Sandra Bittner, ANL
Brett Didier, PNL
Karen Schuchardt, PNL
James D. Myers, PNL
Theresa Windus*, PNL
Renata McCoy, SNL
Michael Lee, SNL
David Leahy, SNL
Carmen Pancerella, SNL
Christine Yang, SNL
Reinhardt Pinzon, ANL
Gregor von Laszewski, ANL
Michael Minkoff, ANL
Branko Ruscic, ANL
Al Wagner*, ANL
Carina Lansing, PNL
Eric Stephan, PNL
David Montoya*, LANL
Lili Xu, LANL
Yen-Ling Ho, LANL
Thomas C. Allison*, NIST
William H. Green, Jr.*, MIT
William Pitz*, LLNL
Baoshan Wang, ANL
Kaizar Amin, ANL
Sandeep Nijsure, ANL
Michael Frenklach*, UCB

SAM

National Collaboratory Program

Wendy Koegler, SNL
John Hewson, SNL
Ed Walsh, SNL
Elena Mendoza, PNL


Acknowledgements

This research was performed in part using the Molecular Science Computing Facility (MSCF) in the William R. Wiley Environmental Molecular Sciences Laboratory at the Pacific Northwest National Laboratory (PNNL). The MSCF is funded by the Office of Biological and Environmental Research in the U.S. Department of Energy (DOE). PNNL is operated by Battelle for the U.S. Department of Energy under contract DE-AC06-76RLO 1830. Funding is also provided by the Mathematics, Information and Computer Science and Basic Energy Sciences Divisions of DOE.


End