Page 1

Collaborative Data Management for Longitudinal Studies

Stephen Brehm [coauthors: L. Philip Schumm & Ronald A. Thisted]

University of Chicago (Supported by National Institute on Aging Grant P01 AG18911-01A1)

Page 2

Agenda

1. Background on Study

2. Problem – Data Management Deficiencies

3. Solution – Collaborative Data Management

4. STATA Programs – maketest & makedata

Page 3

Background on Study

• NIH-funded Longitudinal Study
• Loneliness & Health
• Thousands of Measures
  – Loneliness
  – Depression
• 230 subjects
• Repeated Yearly

Page 4

Problem – Data Management Deficiencies

• Code Not Modular

…Difficult to manage the data cleaning code

…Limited code reuse from year to year

…Difficult to collaborate among interns

• No Established Set of Data Cleaning Steps

…Difficult for research assistants (turn-over)

…Inconsistent data cleaning techniques

…Data cleaning code difficult to read

Page 5

Problem – Data Management Deficiencies

[Diagram: five boxes labeled "Research Assistant" and one labeled "Core File Set"]

Page 6

Solution – Collaborative Data Management

• Process
  – Established Steps
  – File System Layout
  – Automated Tests
  – Collaboration

• Concepts
  – Module
  – Batch
  – “Data Certification”

• STATA Programs
  – maketest
  – makedata

Page 7

Solution – Collaborative Data Management

• Process
  – Established Steps
  – File System Layout
  – Automated Tests
  – Collaboration

• Concepts
  – Module (Ex: loneliness)
  – Batch
  – “Data Certification”

• STATA Programs
  – maketest
  – makedata

Page 8

Solution – Collaborative Data Management

• Process
  – Established Steps
  – File System Layout
  – Automated Tests
  – Collaboration

• Concepts
  – Module (Ex: loneliness)
  – Batch (Ex: yr1, yr2, yr3)
  – “Data Certification”

• STATA Programs
  – maketest
  – makedata

Page 9

Solution – Collaborative Data Management

Set of Files for Each Module

• Acquire & Fix: acquire-[module].do & fix-[module].do
• Test: test-[module].do
• Derive: derive-[module].do
• Label: label-[module].do

• Year-Specific
• 60% Code Reuse – Files Shared Between Years
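
The per-module file set can be pictured as a small driver loop. This is only a sketch, not the authors' code: the module name `loneliness` is taken from the slides' own example, while the step order (acquire, fix, test, derive, label) and the decision to stop at the first missing file are assumptions made for illustration.

```stata
* Hypothetical driver for one module. The step order below is an assumption;
* the slides name the files but do not state the order in which they run.
local module "loneliness"

foreach step in acquire fix test derive label {
    local dofile "`step'-`module'.do"

    * Stop with Stata's "file not found" code if a step file is missing.
    capture confirm file "`dofile'"
    if _rc {
        display as error "`dofile' not found; stopping"
        exit 601
    }

    display as text "running `dofile'"
    do "`dofile'"
}
```

Because every module follows the same naming convention, the same loop could serve any module by changing one local macro.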

Page 10

STATA Program – maketest

• Purpose:
  – Auto-generation of Data Certifying Tests

• Functionality:
  – Tests Variable Type
  – Checks Consistency of Value Labels
  – Verifies Existence of Variable
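
To make the three kinds of checks concrete, here is a hand-written sketch of what a data-certifying test file might contain. It uses only built-in Stata commands and the `auto` dataset shipped with Stata as a stand-in for study data. The actual output of maketest is not shown in the slides, so this should not be read as its format.

```stata
* Hand-written illustration of the three kinds of checks, using the
* auto dataset shipped with Stata as a stand-in for study data.
sysuse auto, clear

* 1. Verify existence of variables
confirm variable price foreign make

* 2. Test variable type
confirm numeric variable price
confirm string variable make

* 3. Check value labels: foreign should carry the value label "origin",
*    and value 1 of that label should read "Foreign"
local lbl : value label foreign
assert "`lbl'" == "origin"

local txt : label origin 1
assert "`txt'" == "Foreign"
```

Each command exits with a nonzero return code on failure, so a do-file of such lines halts at the first variable that does not meet expectations.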

Page 11

STATA Program – maketest

• Syntax:
  – maketest [varlist] using filename [, REQuire(varlist) append replace]

• Example:
  – maketest using filename.do, replace

• Options:
  – using: specifies file to write
  – REQuire: requires presence of variables in list
  – append: add to existing test .do file
  – replace: overwrite existing .do file
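
As a rough illustration of how a test-generating command with this kind of syntax could be built, the sketch below defines a hypothetical program, `sketchtest`, that writes existence and type checks to a do-file. It is not the authors' maketest: the program name and the lines it writes are assumptions, and it omits the REQuire() option and the value-label checks.

```stata
* Hypothetical sketch, NOT the authors' maketest: write one existence check
* and one type check per variable to the do-file named after "using".
capture program drop sketchtest
program define sketchtest
    syntax [varlist] using/ [, replace append]

    tempname fh
    file open `fh' using `"`using'"', write text `replace' `append'

    foreach v of local varlist {
        file write `fh' "confirm variable `v'" _n

        * Record the current storage type as the expected type.
        capture confirm numeric variable `v'
        if _rc == 0 {
            file write `fh' "confirm numeric variable `v'" _n
        }
        else {
            file write `fh' "confirm string variable `v'" _n
        }
    }

    file close `fh'
end

* Usage on the auto dataset shipped with Stata
sysuse auto, clear
sketchtest price make using test-auto.do, replace
do test-auto.do
```

The generated file records the data as they are now, so re-running it against a later batch flags variables that have gone missing or changed type.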

Page 12

STATA Program – makedata

“Bringing it all together”

Page 13

STATA Program – makedata

• Syntax:
  – makedata [namelist], Pattern(string) [replace clear Noisily Batch(namelist) TESTonly]

• Example:
  – makedata ats, p("acquire-*.do") b(yr1) clear replace

• Options:
  – p: pattern – file naming convention
  – replace: overwrite existing data file
  – clear: clear current data in memory
  – Noisily: full output (default = summary)
  – b: batch – year, wave, center
  – TESTonly: only run tests step
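
The slides do not show how makedata resolves its pattern() and batch() options. The sketch below shows one way a pattern such as `acquire-*.do` could be matched against files and run, using Stata's built-in `dir` extended macro function. Storing each batch in a subdirectory named after it (here `yr1`) is an assumption made for illustration.

```stata
* Hypothetical sketch of pattern-driven execution. The directory layout
* (one subdirectory per batch) is an assumption, not taken from the slides.
local batch   "yr1"
local pattern "acquire-*.do"

* Collect the matching file names, then sort them for a repeatable order.
local files : dir "`batch'" files "`pattern'"
local files : list sort files

foreach f of local files {
    display as text "running `batch'/`f'"
    do "`batch'/`f'"
}
```

Driving execution from a naming convention means a new module is picked up simply by adding files that match the pattern, with no master do-file to edit.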

Page 14

Other Applications

• Beyond Longitudinal Data
• Teaching Data Cleaning with STATA

• Contact Information
  – Stephen Brehm: [email protected]
  – L. Philip Schumm: [email protected]
  – Ronald A. Thisted: [email protected]

• Supported by National Institute on Aging Grant P01 AG18911-01A1