HRP 223 – 2008
description
Transcript of HRP 223 – 2008
![Page 1: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/1.jpg)
HRP223 2008
Copyright © 1999-2008 Leland Stanford Junior University. All rights reserved.Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to maximum extent possible under the law.
HRP 223 – 2008
Topic 3 – Manipulating Data with SQL and EG
![Page 2: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/2.jpg)
HRP223 2008
Planning an Analysis
Think before you code. First write out all the steps as notes in an EG
project. Connect the nodes to the tasks and you have a
self documented program.
![Page 3: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/3.jpg)
HRP223 2008
BMI and Diabetes
![Page 4: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/4.jpg)
HRP223 2008
You want to have well labeled nodes in the flowchart.
Be sure to label your library icon with the same name as the library.
![Page 5: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/5.jpg)
HRP223 2008
Spaghetti and Spiderwebs
As I work on a project, I have the options set to show me the logs (and sometimes the generated code). Once I check the log I delete the log node from the flow.
![Page 6: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/6.jpg)
HRP223 2008
Be sure to check the log after you submit your code. If it has only notes saying data or views were created and the number of seconds each task took, then delete the log node and move on to the next step.
![Page 7: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/7.jpg)
HRP223 2008
BMI Analysis (take 2)
Begin with the notes and do each task. Check for a relationship between two
continuous variables.– Do a scatter plot.– Look at Karl Pearson’s correlation coefficient.• Why do some people report R and others use R-
squared?
![Page 8: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/8.jpg)
HRP223 2008
A better solution would be to drop this bad record early on.
Be careful of complete case analyses.
If you have lots of variables, you do analyses on very different subsets of subjects.
![Page 9: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/9.jpg)
HRP223 2008
![Page 10: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/10.jpg)
HRP223 2008
![Page 11: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/11.jpg)
HRP223 2008
Assignment 2
If you import the data using the import wizard, it looks like the two files are identical.
If you use import code, fix your registry, and specify mixed data in each column, you will notice that both columns come in as character variables!
Even if you spend a ridiculous amount of time looking (10-15 minutes), you probably will not spot the problems.
![Page 12: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/12.jpg)
HRP223 2008
Compare Two Files
Wouldn’t it be handy if SAS could compare two files and tell you what cells have actual character data?
If you import the data ignoring the possibility of characters, you will get data sets where the variables are numeric and the “cells” that originally had character data are set to missing.
The MIXED line is missing.
![Page 13: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/13.jpg)
HRP223 2008
Two Data Sets
Labs has two character variables. LabsNotMixed has two numeric variables (one
is formatted so it looks like a date).
![Page 14: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/14.jpg)
HRP223 2008
Manual or Automatic Change
You can manually change the format in this dataset or have it automatically make the change in a new dataset. If you make the manual change, the next time you import the data the format will reset to DDMMMYYYY.
![Page 15: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/15.jpg)
HRP223 2008
Reformatted
The formats are still not identical because the Excel file has no leading zeros when the month or day is less than 10. You need to make the same format in Excel….
![Page 16: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/16.jpg)
HRP223 2008
Custom Excel Formats
You can write your own custom formats in Excel. Tell it to make a format that shows two digits for month and date, and four digits for year.
![Page 17: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/17.jpg)
HRP223 2008
Same Dates (sort of)
Now the two dates display the same way except one is stored as a character variable and the other is numeric.
![Page 18: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/18.jpg)
HRP223 2008
Compare?
There is a great comparison procedure in SAS but it wants the variables to be of the same type.
![Page 19: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/19.jpg)
HRP223 2008
![Page 20: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/20.jpg)
HRP223 2008
Try to compare the two files.
After it imports, the numeric version is missing data. Convert it to be a character variable and then compare the two variables.
![Page 21: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/21.jpg)
HRP223 2008
The put Function Recall functions in SAS take 1 or more
arguments inside parentheses and they return a value. – theSin = sin(1)
There is a function called put that takes a numeric variable and returns a character string.
![Page 22: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/22.jpg)
HRP223 2008
put( variableName, format. )
![Page 23: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/23.jpg)
HRP223 2008
Numeric and Character
![Page 24: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/24.jpg)
HRP223 2008
Same Voodoo on Dates
Here we want to put the date as a character string after applying the MM/DD/YYYY format.
![Page 25: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/25.jpg)
HRP223 2008
Converting this Numeric to Character
In the case of the assignment 2 data we don’t want to use the best. format because we have to think about the number of decimal places. The value is displayed in Excel and imported as a character string rounded to three decimal places.
We also want to compress out any blank spaces:
![Page 26: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/26.jpg)
HRP223 2008
The Differences Excel and SAS use
different algorithms for rounding….
You can also spot the bad date plus the lab value that is below the detectable limit of the assay.
![Page 27: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/27.jpg)
HRP223 2008
Dealing with Problems
Say you input a numeric variable and it has some bad values in it. This causes it to import as a character variable if you are paying attention (using mixed). You could use the input function and convert it to numeric. This, if not done carefully, will result in missing values.– newNumeric = input(oldCharacter, 8.)
![Page 28: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/28.jpg)
HRP223 2008
Not Many Values If your variable has only a few values you can use the Describe
> One Way Frequency menu to see all the values, then change the scores with the Filter and Query tool.
![Page 29: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/29.jpg)
HRP223 2008
Filter and Query Recode
You can recode a problematic variable so that it will become a new numeric variable with the bad values recoded to different types of null.
![Page 30: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/30.jpg)
HRP223 2008
Recode
![Page 31: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/31.jpg)
HRP223 2008
After
Once the data has been recoded you can calculate statistics on it.
Be sure to add in a format so the values will display nicely.
![Page 32: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/32.jpg)
HRP223 2008
Distinct Records
There will be times when you want to get the distinct IDs (or a distinct set of variables) from a dataset. Just check on the select distinct rows checkbox.
![Page 33: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/33.jpg)
HRP223 2008
Dates
Rather than dealing with the headaches of dates, a lot of databases split the dates into three variables: month, day and year. If the dates are split, it allows you store incomplete dates. You can use the MDY function to combine the 3 parts into a date variable.
theDate = mdy(theMonth, theDay, theYear)
![Page 34: HRP 223 – 2008](https://reader035.fdocument.pub/reader035/viewer/2022062410/568157d6550346895dc55c17/html5/thumbnails/34.jpg)
HRP223 2008
Remember a Display Format
SAS does not apply a date format automatically.