Collaborative Data Management for Longitudinal Studies
Stephen Brehm [coauthors: L. Philip Schumm & Ronald A. Thisted]
University of Chicago
(Supported by National Institute on Aging Grant P01 AG18911-01A1)
Agenda
1. Background on Study
2. Problem – Data Management Deficiencies
3. Solution – Collaborative Data Management
4. STATA Programs – maketest & makedata
Background on Study
• NIH-funded Longitudinal Study
• Loneliness & Health
• Thousands of Measures
– Loneliness
– Depression
• 230 subjects
• Repeated Yearly
Problem – Data Management Deficiencies
• Code Not Modular
…Difficult to manage the data cleaning code
…Limited code reuse from year to year
…Difficult to collaborate among interns
• No Established Set of Data Cleaning Steps
…Difficult for research assistants (turn-over)
…Inconsistent data cleaning techniques
…Data cleaning code difficult to read
Problem – Data Management Deficiencies
[Diagram: five Research Assistants and a single Core File Set]
Solution – Collaborative Data Management
• Process
– Established Steps
– File System Layout
– Automated Tests
– Collaboration
• Concepts
– Module Ex: loneliness
– Batch Ex: yr1, yr2, yr3
– “Data Certification”
• STATA Programs
– maketest
– makedata
Solution – Collaborative Data Management
Set of Files for Each Module
• Acquire & Fix: acquire-[module].do & fix-[module].do
• Derive: derive-[module].do
• Test: test-[module].do
• Label: label-[module].do
Year-Specific
60% Code Reuse – Files Shared Between Years
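The per-module file set can be pictured as a small driver that runs each step's do-file in turn. The sketch below is only an illustration using built-in Stata commands: the module name is taken from the earlier example, while the step order and the skip-if-missing behavior are assumptions, not something the slides specify.

```stata
* Hypothetical driver: run each step's do-file for one module.
* The step order below is an assumption made for illustration.
local module "loneliness"

foreach step in acquire fix derive test label {
    local dofile "`step'-`module'.do"
    * Only run the file if it exists in the current directory.
    capture confirm file "`dofile'"
    if _rc == 0 {
        display as text "running `dofile'"
        do "`dofile'"
    }
    else {
        display as error "`dofile' not found; skipping"
    }
}
```

Because every module follows the same naming convention, the same loop would work for any module by changing one local.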
STATA Program – maketest
• Purpose:
– Auto-generation of Data Certifying Tests
• Functionality:
– Tests Variable Type
– Checks Consistency of Value Labels
– Verifies Existence of Variable
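The three kinds of checks listed above can each be written by hand with built-in Stata commands. The following is a minimal sketch on toy data; the variable name lonely1 and the value label freq3 are hypothetical, and the output that maketest actually generates may look quite different.

```stata
* Toy data standing in for one module's variables (hypothetical names).
clear
set obs 3
generate byte lonely1 = _n
label define freq3 1 "Never" 2 "Sometimes" 3 "Often"
label values lonely1 freq3

* 1. Verify that the variable exists.
confirm variable lonely1

* 2. Test the variable type (numeric here).
confirm numeric variable lonely1

* 3. Check that the expected value label is attached
*    and that a given code maps to the expected text.
local attached : value label lonely1
assert "`attached'" == "freq3"
local text2 : label freq3 2
assert "`text2'" == "Sometimes"
```

Each command exits with an error if its check fails, so a do-file made of such lines stops at the first variable that does not match expectations.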
STATA Program – maketest
• Syntax:
– maketest [varlist] using filename, [REQuire(varlist) append replace]
• Example:
– maketest using filename.do, replace
• Options:
– using: specifies file to write
– REQ: requires presence of variables in list
– append: add to existing test .do file
– replace: overwrite existing .do file
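To make the idea of writing tests to a do-file concrete, here is a minimal sketch that uses Stata's built-in file commands to emit one check per variable and then runs the result. It is not maketest's implementation: the output file name test-example.do is hypothetical, and the shipped auto dataset stands in for study data.

```stata
* Write one type check per variable into a test do-file, then run it.
* The file name is hypothetical; "replace" overwrites any existing copy.
sysuse auto, clear
tempname fh
file open `fh' using "test-example.do", write text replace
foreach v of varlist price mpg {
    file write `fh' "confirm numeric variable `v'" _n
}
file close `fh'

* Running the generated file re-checks the data currently in memory.
do "test-example.do"
```

Opening the file with append in place of replace would add checks to an existing test file, which mirrors the append and replace options listed above.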
STATA Program – makedata
“Bringing it all together”
STATA Program – makedata
• Syntax:
– makedata [namelist], Pattern(string) [replace clear Noisily Batch(namelist) TESTonly]
• Example:
– makedata ats, p("acquire-*.do") b(yr1) clear replace
• Options:
– p: pattern – file naming convention
– replace: overwrite existing data file
– clear: clear current data in memory
– Noisily: full output (default = summary)
– b: batch – year, wave, center
– TESTonly: only run tests step
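The pattern option relies on the file naming convention, and the idea of matching do-files by pattern can be sketched with Stata's built-in dir macro function. This only illustrates the concept in the current directory; how makedata itself resolves patterns and batches is not shown in the slides.

```stata
* List the do-files in the current directory that match a naming pattern.
* This only illustrates the pattern idea; it is not makedata's actual code.
local pattern "acquire-*.do"
local files : dir "." files "`pattern'"

foreach f of local files {
    display as text "matched: `f'"
}
```

If no file matches, the list is empty and the loop simply does nothing.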
Other Applications
• Beyond Longitudinal Data
• Teaching Data Cleaning with STATA
• Contact Information
– Stephen Brehm: [email protected]
– L. Philip Schumm: [email protected]
– Ronald A. Thisted:
• Supported by National Institute on Aging Grant P01 AG18911-01A1