Post on 19-Dec-2015

STATA TUTORIAL: LAB 1

1. STATA windows

• The command window
• The viewer/results window
• The review of commands window
• The variable window

2. Working with STATA

A. Opening Data
B. Using a “log” file
C. Useful Commands
D. Using a “do” file

A. Opening Data

• The Data Editor/Browser shows you your data
• Check this frequently, especially after commands you are unsure about

A. Opening your data

If your data is in STATA format, then either:
• Go to “File” > “Open”, browse to the location where the data is stored, and double-click the file
• Or, in the Command window, type: use "Fill In Correct Path Name\filename.dta"

Practice with “Wage1.dta”
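A minimal sketch of the command-line route, assuming Wage1.dta has been saved in a hypothetical folder C:\data (substitute your own path):

```stata
* Hypothetical folder; replace C:\data with wherever you saved the file
cd "C:\data"

* Load the practice data; "clear" discards any data already in memory
use "Wage1.dta", clear
```

The "clear" option is needed only when a dataset is already loaded; without it STATA refuses to overwrite unsaved changes.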

A. Opening your data - Data editor/browser

• The Data Editor and Data Browser show you your data
• Go to “Window” > “Data Editor”, or click on the “Data Editor” or “Data Browser” icons
• Editor: you can modify data by typing in a cell, as in Excel. Browser: locked, so you can't make changes
• It is good to look at the data when you load it, and again after commands, so that you understand the structure of the data

A. Opening your data - Variable window

Now that you have data loaded, the variables included in the data are listed in the variable window.
• Name: name of the variable
• Label: description of what the variable is
• Type/Format: how STATA stores and displays the variable
• Click on a variable and it appears in the command window

A. Opening your data - What do the variables look like?

Variables in Wage1.dta:
wage educ exper tenure nonwhite female married numdep smsa northcen south west construc ndurman trcommpu trade services profserv profocc clerocc servocc lwage expersq tenursq

What values do they take?
• wage through tenure, and numdep, are actual numbers
• nonwhite through servocc (other than numdep) take values of 0 or 1: qualitative measures of some personal characteristics
• lwage through tenursq are transformations of other variables (ln, square)

A. Opening your data (advanced)

• If your data is a comma-delimited file: insheet using "filename.txt"
• If your data is a raw data file:
  • It must have a dictionary file, and you must use the “infile” command: infile using "dictionaryname.dct"
  • The dictionary file will refer to data that has a “.dat” or “.raw” extension
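The two import routes can be sketched as follows. The file names are the placeholders from the slide, not real files. In Stata 13 and later, "import delimited" supersedes "insheet"; the older command still runs but is no longer documented.

```stata
* Comma-delimited text file (placeholder name from the slide)
insheet using "filename.txt", comma clear

* Equivalent in Stata 13 and later
import delimited using "filename.txt", clear

* Raw data described by a dictionary file (placeholder name from the slide)
infile using "dictionaryname.dct", clear
```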

B. The “log” file

• The log file is an “output file”
• It creates and saves a log of all the actions performed by STATA and all the results
• How to open/close?
  • Go to “File” > “Log” > “Begin”
  • Go to “File” > “Log” > “Close”
• How to view it later?
  • Go to “File” > “Log” > “View” and search for your filename, keeping in mind that it has the extension “.log”
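The same steps can also be typed as commands. This is a sketch that assumes a hypothetical log name, lab1.log, written to the current working directory:

```stata
* Start a plain-text log; "replace" overwrites an existing lab1.log
log using "lab1.log", text replace

* Everything run from here on is recorded, commands and results alike
use "Wage1.dta", clear
describe

* Stop recording and close the file
log close
```

The "text" option matters: without it STATA writes its own formatted log type with the extension ".smcl" rather than ".log".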

C. Useful Commands

“describe”:
• STATA will list all the variables, their labels, and their types, and tell you the number of observations
• Two types of variables:
  1. Numerical
  2. String (usually appear in red in the data browser)
• You can convert a string variable to numerical using the “destring” command, e.g. “destring var1, replace” or “destring var1, force replace”
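To see the difference between the two destring forms, here is a small self-contained sketch. The three-row dataset and the variable name var1 are made up for illustration:

```stata
clear
* Toy data: a string variable holding two numbers and one non-number
input str5 var1
"12"
"7.5"
"n/a"
end

describe

* Without "force", destring leaves var1 untouched because "n/a" is not numeric
destring var1, replace

* With "force", non-numeric entries such as "n/a" become missing (.)
destring var1, replace force
describe
list
```

Use "force" with care: it silently turns anything non-numeric into a missing value.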

C. Useful Commands

“summarize, sum, summ”
• Tells STATA to compute summary statistics (mean, standard deviation, and so forth) for all variables
• Useful to identify outliers and get an idea of your data
• e.g. summarize (will do all variables)
• e.g. summarize wage educ (just does wage and educ; note, no “,” between variables)

C. Useful Commands

• How many observations are there?
• What is the average value of wage?
• What is the min and max of tenure?
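One way to answer these questions from the command line, sketched under the assumption that Wage1.dta is in the current working directory:

```stata
use "Wage1.dta", clear

* Number of observations
count

* Mean of wage: read it from the table, or pull it from the stored result
summarize wage
display "mean wage = " r(mean)

* Min and max of tenure
summarize tenure
display "tenure ranges from " r(min) " to " r(max)

* "detail" adds percentiles, which helps when looking for outliers
summarize wage, detail
```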

C. Useful Commands

“tabulate, tab”
• Shows the frequency and percent of each value of the variable in the dataset
• e.g. tabulate tenure
• e.g. tab wage (long list; to display all, press the space bar)
• e.g. tab educ female (gives education by gender)
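A short sketch of one-way and two-way tables. The "col" option is an addition not on the slide; it prints column percentages under each frequency:

```stata
use "Wage1.dta", clear

* One-way table: frequency and percent of each tenure value
tabulate tenure

* Two-way table: education (rows) by gender (columns)
tab educ female

* Same table with the percent of each gender at each education level
tab educ female, col
```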

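C. Useful Commands

“generate, gen”
• Creates a new variable
• gen weeklywage=wage*40
• tab weeklywage
• gen prevexper=exper-tenure
• gen lwage=ln(wage) fails because lwage already exists in the data; use a new name instead: gen newlwage=ln(wage)
• gen expersp=exper*exper gives the same values as (exper)^2 (the name expersq is already taken in Wage1.dta)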
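The sketch below creates the new variables and compares them with the versions that ship in the dataset. The name expersq2 is made up here, because expersq already exists in Wage1.dta:

```stata
use "Wage1.dta", clear

gen weeklywage = wage*40
gen prevexper  = exper - tenure
gen newlwage   = ln(wage)
gen expersq2   = exper^2

* compare reports how many observations agree or differ between two variables
compare expersq2 expersq
compare newlwage lwage
```

Small differences between newlwage and lwage, if any are reported, would come from rounding in how the values are stored.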

C. Useful Commands

“if” qualifier
• Allows you to use only a portion of the observations
• tab wage if female==1
• sum exper if educ>=13
• gen expermomwkid=exper-1 if female==1
• or, with two conditions: gen expermomwkid=exper-1 if female==1 & numdep!=0
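When "gen" is combined with "if", observations that fail the condition receive a missing value (.). A minimal sketch:

```stata
use "Wage1.dta", clear

* Only women with at least one dependent get a value
gen expermomwkid = exper - 1 if female==1 & numdep!=0

* Everyone else is missing; these two counts should agree
* as long as exper itself has no missing values
count if missing(expermomwkid)
count if !(female==1 & numdep!=0)
```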

C. Useful Commands

“reg”
• reg dependent variable independent variable(s)
• e.g. reg wage educ
• An increase in education by 1 unit (year) is predicted to increase the hourly wage by $0.54
• R sq=0.1648
• When educ=0, wage is predicted to be -$0.90

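C. SLR Wage regression

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  1,   524) =  103.36
       Model |  1179.73204     1  1179.73204           Prob > F      =  0.0000
    Residual |  5980.68225   524  11.4135158           R-squared     =  0.1648
-------------+------------------------------           Adj R-squared =  0.1632
       Total |  7160.41429   525  13.6388844           Root MSE      =  3.3784

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .5413593    .053248    10.17   0.000     .4367534    .6459651
       _cons |  -.9048516   .6849678    -1.32   0.187    -2.250472    .4407687
------------------------------------------------------------------------------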

• An increase in education by 1 unit (year) is predicted to increase the hourly wage by $0.54; an increase of 6 years raises it by 6*$0.54=$3.24

• R sq=0.1648; variation in education explains about 16.5% of the variation in wages

• When educ=0, wage is predicted to be -$0.90

• The standard error of the educ estimator is 0.0532, so its estimated variance is 0.0532^2, about 0.0028
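These numbers can be recomputed from the results STATA stores after "reg". A short sketch, assuming Wage1.dta is in the working directory:

```stata
use "Wage1.dta", clear
reg wage educ

* Predicted wage change for 6 more years of education
display 6*_b[educ]

* Predicted wage at educ=0 (the intercept)
display _b[_cons]

* Standard error of the educ coefficient, and its square (the estimated variance)
display _se[educ]
display _se[educ]^2

* R-squared
display e(r2)
```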

C. Reading the output table

• SSTotal: the total variability around the mean. $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$

• SSResidual: the sum of squared errors. $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$

• SSModel (aka SSE, the explained sum of squares): $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$

Observe that SSModel = SSTotal - SSResidual. Note that SSModel / SSTotal is equal to 0.1648, the value of R-squared.
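The decomposition can be checked against STATA's stored results: e(mss) is the model sum of squares and e(rss) the residual sum of squares. A minimal sketch:

```stata
use "Wage1.dta", clear

* SSTotal by hand: sample variance of wage times (n-1)
summarize wage
display r(Var)*(r(N)-1)

reg wage educ

* SSModel + SSResidual should reproduce SSTotal
display e(mss) + e(rss)

* SSModel / SSTotal should reproduce R-squared
display e(mss)/(e(mss) + e(rss))
display e(r2)
```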

C. Reading the output table

• Coefficients: wagePredicted = -0.9048516 + 0.5413593*educ

• Statistics (Ch. 4)
  • t and P>|t|: these columns provide the t-value and 2-tailed p-value used in testing the null hypothesis that the coefficient (parameter) is 0
  • [95% Conf. Interval]: this shows a 95% confidence interval for the coefficient (the coefficient is not statistically significant at the 5% level if the confidence interval includes 0)
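Each of these columns can be rebuilt from the coefficient and its standard error. The sketch below does so for educ, using the residual degrees of freedom stored in e(df_r):

```stata
use "Wage1.dta", clear
reg wage educ

* t-value: coefficient divided by its standard error
display _b[educ]/_se[educ]

* 2-tailed p-value from the t distribution
display 2*ttail(e(df_r), abs(_b[educ]/_se[educ]))

* 95% confidence interval: coefficient +/- critical value * standard error
display _b[educ] - invttail(e(df_r), 0.025)*_se[educ]
display _b[educ] + invttail(e(df_r), 0.025)*_se[educ]
```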

C. Reading the output table

After the regression, type:
• predict wagehat, xb
  Gives the predicted value of wage for each observation, given that observation's value of education
• predict uhat, resid
  Gives the portion of wage that is not explained by the independent variable(s)
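A short sketch showing that the fitted value and the residual add up to the observed wage. The variable name check is made up for illustration:

```stata
use "Wage1.dta", clear
reg wage educ

predict wagehat, xb
predict uhat, resid

* wage = wagehat + uhat, so check should be zero up to rounding
gen check = wage - wagehat - uhat
summarize check uhat

* Look at a few observations side by side
list wage educ wagehat uhat in 1/5
```

With an intercept in the model, OLS residuals average to zero, so the mean of uhat should also be zero up to rounding.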

C. Useful Commands

• “replace”: replace a value with a new one
  • replace wage=4 if wage<4
• “drop”: drop an entire variable or just some observations
  • drop prevexper
  • drop if educ<=8
• “keep”
  • keep wage educ
  • keep if educ>=8
• Be careful with these commands!!
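Because these commands change the data in memory, one cautious pattern (an addition, not from the slides) is to wrap experiments in "preserve" and "restore":

```stata
use "Wage1.dta", clear
count

* Take a snapshot of the data in memory
preserve

* Destructive changes
replace wage = 4 if wage < 4
drop if educ <= 8
keep wage educ
count

* Bring back the snapshot; the dropped observations and variables return
restore
count
```

None of these commands touches the .dta file on disk unless you save afterwards.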

C. Operators

<     less than
>     greater than
<=    less than or equal to
>=    greater than or equal to
==    equal to
!= or ~=    not equal to
&     and
|     or

E. The “do” file

• A text file in which you can type and store all your commands
• Helpful for keeping a record of the commands you run, in case you want to re-run them later
• How to open/save a do file?
  • Go to “Window” > “Do-File Editor”
  • Or click on “New Do-File Editor”
  • Save the do file (.do)
  • To open a saved do file, open a new do-file editor and browse to where you saved it

E. The “do” file: comments

• STATA ignores text between /* and */, and any line that starts with * (it does not execute them)
• These lines can be used to describe what the commands are doing, or to write notes to yourself

/*the following command summarizes the variable wage*/

sum wage
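A sketch of a small do-file using each comment style. The // form is an addition not on the slide; it works in do-files but not in the command window:

```stata
/* Block comment: can span
   several lines */

* Line comment: the whole line is ignored
use "Wage1.dta", clear

sum wage      // comment after a command (do-files only)
```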

E. The “do” file

From the STATA do-file editor:
• Click “do” for STATA to execute all commands
• You can highlight lines and click “do” to execute only the highlighted command lines
• Click “run” for STATA to execute all commands, but you won't see results in the viewer/results window

All the commands in a do-file can be typed into the command window and run from there, but a do-file is helpful if you want to do the same thing over and over.

E. The “do” file

• Each command must have its own line
• Stata will not run:
    sum wage sum educ
• But it will run:
    sum wage
    sum educ
  or:
    sum wage educ

F. Save your data

Saving in Stata format:
• save "Type in correct path name\file name.dta"
• Or go to “File” > “Save” or “Save As”

G. Other Commands

• Increasing memory, variables
  • “set memory 200m” (needed only in Stata 11 and earlier; later versions manage memory automatically)
  • “set maxvar 400”
• Clear the file
  • “clear”
• For long commands
  • #delimit ; tells STATA that each STATA command ends with a semicolon instead of a line break
  • Do not forget the “;”, and write it even after comment lines that start with *
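A minimal do-file sketch of the semicolon delimiter. #delimit works only inside do-files, and "#delimit cr" (not mentioned on the slide) switches back to one command per line:

```stata
use "Wage1.dta", clear

#delimit ;

* comment lines need the closing semicolon too ;

reg wage educ exper female
    married numdep smsa ;

#delimit cr

* back to normal: each line is one command
sum wage
```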

G. Other Commands

• sort
  • e.g. sort educ
  • e.g. sort educ female
  • by educ: summarize wage (note: you must sort by educ before you can use by educ)
• Graphs
  • twoway (scatter wage educ)
  • histogram wage
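A short sketch of these commands together. "bysort" and the fitted-line overlay "lfit" are additions not on the slide: bysort sorts and applies by in one step, and lfit draws the simple regression line on top of the scatter:

```stata
use "Wage1.dta", clear

* Sort first, then summarize wage within each education level
sort educ
by educ: summarize wage

* Same result in one step
bysort educ: summarize wage

* Scatter plot with the fitted regression line overlaid
twoway (scatter wage educ) (lfit wage educ)

* Distribution of wage
histogram wage
```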

H. MLR Wage Regression

. reg wage educ exper female married numdep smsa

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  6,   519) =   43.27
       Model |  2387.45905     6  397.909841           Prob > F      =  0.0000
    Residual |  4772.95524   519  9.19644556           R-squared     =  0.3334
-------------+------------------------------           Adj R-squared =  0.3257
       Total |  7160.41429   525  13.6388844           Root MSE      =  3.0326

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .5677906   .0543361    10.45   0.000     .4610449    .6745363
       exper |   .0597343   .0111783     5.34   0.000      .037774    .0816945
      female |  -2.101436   .2694909    -7.80   0.000    -2.630863   -1.572009
     married |    .649392   .3036465     2.14   0.033     .0528647    1.245919
      numdep |   .1716186   .1116665     1.54   0.125    -.0477552    .3909924
        smsa |   1.045125   .3053285     3.42   0.001     .4452936    1.644957
       _cons |  -2.575859   .8066152    -3.19   0.001    -4.160491   -.9912264
------------------------------------------------------------------------------

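• Including other covariates doesn't change the estimate on educ by much (0.541 in the simple regression, 0.568 here)

• R sq increases (from 0.1648 to 0.3334)

• Variables have the expected sign: higher wage if you have more experience, are married or have a family (because probably a very devoted worker), and live in a metropolitan area. Women generally get paid less than men. Note that the numdep coefficient is not statistically significant at the 5% level (p=0.125).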
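To put the simple and multiple regressions side by side, one option is "estimates store" followed by "estimates table". This is a sketch, and the stored names slr and mlr are made up for illustration:

```stata
use "Wage1.dta", clear

* Simple regression
reg wage educ
estimates store slr

* Multiple regression
reg wage educ exper female married numdep smsa
estimates store mlr

* Coefficients and standard errors in adjacent columns, with N and R-squared
estimates table slr mlr, b(%9.4f) se stats(N r2)
```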