Aug. 20, 2009 EDRN @ JPL, SoCalBSI '09 1
The power of bioinformatics tools in cancer research
Early Detection Research Network, JPL
Mentors: Dr. Chris Mattmann, Andrew Hart
Andrew Clark
Southern California Bioinformatics Summer Institute, 2009
Agenda
Introduction
  Biomarkers and cancer research
  Early Detection Research Network (EDRN)
  The NCI & JPL EDRN Infrastructure
Project objective
  eCAS Curator additions
The eCAS Catalog and Archive Service
  Data curation
Architectural & design considerations
  Software engineering
  Meta-data processing
Results & conclusions
Acknowledgements
Introduction
Biomarkers and cancer research
Constant research is underway to discover and identify reliable biomarkers of cancer in the human body.
What is a biomarker?
“A biological molecule found in blood, other body fluids, or tissues that is a sign of a normal or abnormal process, or of a condition or disease.”
source: http://www.cancer.gov/dictionary/?searchTxt=biomarker
Biomarker research
The more information that is collected and shared between research sites and medical laboratories:
  the more effective diagnosis will become;
  the more specialized the treatments that can be devised to minimize the devastating effects of cancer on its host.
The Early Detection Research Network
The National Cancer Institute (NCI) is concerned with managing biomarker research data and disseminating information to the public.
The NCI formed the EDRN in 1999 “to provide up-to-date information on biomarker research” to the scientific and medical communities and to the general public.
source: http://edrn.nci.nih.gov/about-edrn
The Jet Propulsion Laboratory
A federally funded research and development center (FFRDC), operated by Caltech for NASA.
JPL’s technology for cataloging and managing extremely large sets of data provided the underlying infrastructure needed by the EDRN to accomplish its own mission.
The EDRN Infrastructure
My mentors, Dr. Chris Mattmann and Andrew Hart, and their team continue to develop the underlying software grid.
JPL software engineers work with bioinformatics experts to develop the EDRN's public interface, a web-based portal available to the general public:
http://edrn.nci.nih.gov
Project objective
Overall:
  To participate as a bioinformatics software engineer at JPL.
  To contribute to the EDRN software infrastructure.
Specifically:
  Improve the functionality of the eCAS Curator.
EDRN Catalog and Archive Service
JPL software customized for cataloging and archiving biomarker data, including specimen details, specimen images and related information.
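[Figure: eCAS workflow. Research sites A, B and C send research data to the EDRN staging server (1. Data ingestion). A curator works on the pre-release data (2. Curation: meta-data edits, pub. survey & cross reference, expert review). Released data is then published to the EDRN public portal on the WWW (3. Product release). The diagram also labels the dataset meta-data as XML.]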
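The three numbered stages in the workflow diagram can be read as a simple state machine. The Python sketch below only illustrates that reading; the `Stage` and `Dataset` names and the `advance` helper are hypothetical and are not taken from the eCAS software.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Hypothetical labels for the three stages shown in the diagram."""
    INGESTION = 1  # 1. Data ingestion: research data reaches the staging server
    CURATION = 2   # 2. Curation: meta-data edits, cross reference, expert review
    RELEASE = 3    # 3. Product release: data is visible on the public portal


@dataclass
class Dataset:
    site: str                                   # originating research site, e.g. "A"
    stage: Stage = Stage.INGESTION
    metadata: dict = field(default_factory=dict)


def advance(dataset: Dataset) -> Dataset:
    """Move a dataset to the next stage; a released dataset stays released."""
    if dataset.stage is not Stage.RELEASE:
        dataset.stage = Stage(dataset.stage.value + 1)
    return dataset


ds = Dataset(site="A")
advance(ds)  # pre-release data, now in curation
advance(ds)  # released data
print(ds.stage.name)
```

Modelling the stages as an ordered enumeration keeps the sketch from skipping curation: a dataset can only reach release by passing through stage 2.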
eCAS data curation
Data ingested from research sites undergoes a curation phase before its publication to the public portal.
eCAS Curator
The curation activities would benefit from additional software tools as part of the overall eCAS workflow.
Architectural & design considerations
Software engineering:
  EDRN tools are primarily web applications.
  Design and integrate modular components.
Meta-data management:
  Meta-data: information that describes the content of other information.
  Meta-data management is crucial to data curation and to the operation of the EDRN system.
Data curation with eCAS
Step 1, data ingestion: Internal EDRN policy files contain meta-data definitions and configuration details that describe the dataset expected from each research site.
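To make the idea of a policy file concrete, here is a minimal Python sketch that reads meta-data definitions from a small XML document. The XML layout, the element and attribute names, and the `load_definitions` helper are all assumptions made for illustration; the actual EDRN policy schema is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical policy snippet; the real EDRN policy schema is not shown here.
POLICY_XML = """
<policy site="A">
  <element name="SpecimenType" required="true">Kind of specimen collected</element>
  <element name="CollectionDate" required="false">Date the specimen was taken</element>
</policy>
"""


def load_definitions(xml_text: str) -> dict:
    """Map each meta-data element name to its description and required flag."""
    root = ET.fromstring(xml_text)
    return {
        el.get("name"): {
            "description": (el.text or "").strip(),
            "required": el.get("required") == "true",
        }
        for el in root.findall("element")
    }


definitions = load_definitions(POLICY_XML)
for name, info in definitions.items():
    print(name, info["required"], info["description"])
```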
Step 2, curation: Curators edit and revise dataset meta-data to make the final product records complete and accurate.
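A curation edit of this kind could be sketched as follows, assuming a record is a plain dictionary and the set of permitted fields comes from the dataset's meta-data definitions. The `apply_edit` and `missing_fields` helpers and the field names are hypothetical and do not describe the Curator tool's real interface.

```python
def apply_edit(record: dict, allowed_fields: set, field_name: str, value: str) -> dict:
    """Return a copy of record with one field changed, rejecting unknown fields."""
    if field_name not in allowed_fields:
        raise KeyError(f"{field_name!r} is not defined for this dataset")
    cleaned = value.strip()
    if not cleaned:
        raise ValueError(f"{field_name!r} must not be empty")
    updated = dict(record)          # leave the original record untouched
    updated[field_name] = cleaned
    return updated


def missing_fields(record: dict, required_fields: set) -> list:
    """List required fields that are still absent or blank, sorted by name."""
    return sorted(f for f in required_fields if not record.get(f, "").strip())


record = {"SpecimenType": "serum"}
allowed = {"SpecimenType", "CollectionDate"}
record = apply_edit(record, allowed, "CollectionDate", " 2009-08-20 ")
print(record)
print(missing_fields(record, allowed))
```

Checking edits against the defined fields is one way to keep records "complete and accurate": unknown fields are rejected, and `missing_fields` reports what still needs a curator's attention.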
Step 3, product release: Accepted data is made available through the web portal. Meta-data definitions provide searchable fields and descriptions of dataset contents to portal users.
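How meta-data fields become searchable can be illustrated with a small Python sketch. It assumes released records are dictionaries keyed by meta-data field name; the `search` helper and the sample records are hypothetical and say nothing about how the portal's search is actually implemented.

```python
def search(records: list, field_name: str, term: str) -> list:
    """Return records whose given meta-data field contains term (case-insensitive)."""
    needle = term.lower()
    return [r for r in records if needle in str(r.get(field_name, "")).lower()]


# Hypothetical released records, invented for this example.
released = [
    {"Title": "Serum panel", "SpecimenType": "Serum", "Site": "A"},
    {"Title": "Tissue images", "SpecimenType": "Tissue", "Site": "B"},
]
hits = search(released, "SpecimenType", "serum")
print([r["Title"] for r in hits])
```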
A dataset policy file
Dataset meta-data configuration
Curator tool
Browser-based meta-data editor.
Curator tool
Selecting datasets for meta-data editing.
Meta-data items are retrieved from the backend.
Results and conclusions
Final result
A meta-data management tool integrated with the eCAS, with curation functionality incorporated into the workflow.
Conclusion
The goal of software engineering in bioinformatics should be to:
  support scientists’ activities
  facilitate better research and collaboration
  simplify/bring clarity to complex tasks
Conclusion
The combined effectiveness of software tools and expert curation makes the EDRN a more powerful scientific resource that helps drive progress in biomarker research.
Acknowledgements
Thanks to my mentors and supporters at JPL:
  Chris Mattmann, Andrew Hart
Thanks to the SoCalBSI faculty and staff:
  Dr. Momand, Drs. Johnston, Dr. Sharp, Dr. Warter-Perez, Ronnie Cheng
Thanks to the SoCalBSI funding sources:
  The National Science Foundation
  The National Institutes of Health
  Economic and Workforce Development