IRS Stata

GETTING STARTED WITH STATA Sébastien Fontenay ECON - IRES

8/16/2019 IRS Stata

http://slidepdf.com/reader/full/irs-stata 1/40

THE SOFTWARE 

Software developed in 1985 by StataCorp

Functionalities

› Data management

› Statistical analysis

› Graphics

Using Stata at UCL

› Computer labs

• Socrate 30, 31-32, 33, 34, 54 and 68

• Dupriez 143

• Leclercq 74, 76, 77 and 78

› Student licence to install on your personal computer

• valid for the duration of your studies, at a price of 20 euros: www.uclouvain.be/438229.html

FINDING SUPPORT (1)

Best documentation

› help command

› search keyword

Stata website: www.stata.com/support

› Frequently Asked Questions

› Video tutorials

› Statalist

Books

› Cahuzac, E., Bontemps, C. (2008). Stata par la pratique: Statistiques, graphiques et éléments de programmation.

› Cameron, A.C., Trivedi, P.K. (2009). Microeconometrics using Stata.

› Becketti, S. (2013). Introduction to time series using Stata.

UCLA: www.ats.ucla.edu/stat/stata
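A minimal sketch of how the built-in documentation is queried. The command name and the keyword are arbitrary examples, not taken from the slides.

```stata
* Open the help file of a command whose name you already know
help regress

* Look up a topic by keyword when you do not know the command name
search heteroskedasticity
```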

FINDING SUPPORT (2)

For all your questions related to data management or analysis using Stata

› Website: http://www.uclouvain.be/411370

› Email: [email protected]

› By appointment only:

• Bâtiment Dupriez (office d010), 3 place Montesquieu

COURSE TOPICS

Quick tour of Stata

• Working environment

• Writing commands

Data management

• Inputting data

• Transforming data

Data analysis

• Descriptive statistics

• Linear regression

• Exporting results

SECTION 1: QUICK TOUR OF STATA

• Working environment

• Writing commands

WORKING ENVIRONMENT

The working environment is composed of 5 windows

› Results of commands

› Review of commands

› Command window

› Variables list and labels

› Properties of variables and dataset

Three specific windows can be opened by clicking on the following icons

› Data editor/browser

• Display data in memory

› Viewer

• Display log and help files

› Do-file Editor

• Text editor to save/execute commands

There are 3 main types of files used in Stata

› .dta: data

› .do: commands (do-file)

› .smcl | .log: output (log file)

All software functionalities are available from the drop-down menus (Data, Graphics, Statistics)

› Useful when you are unsure of which commands to run or unfamiliar with the available options

Every command issued in this manner is echoed to the Review and Results windows

› e.g. sysuse auto.dta

In order to use Stata effectively, you should always follow this three-step process:

› Open a do-file

› Choose your working directory

• cd "C:\Users\Me"

• mkdir stata_training

• cd stata_training

- You can see the current working directory at the bottom left of the main window

› Start a log file (saving commands and their output)

• log using filename [, text append replace]

- log close

- log off | on
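The three steps can be combined at the top of a do-file. This is a minimal sketch: the folder path and the log name session1 are placeholders to adapt, and the folder is assumed to exist already.

```stata
* Close any log file left open by a previous run (no error if none is open)
capture log close

* Assumed path: replace with your own working directory
cd "C:\Users\Me\stata_training"

* Start a plain-text log, overwriting the previous version
log using session1, text replace

* Commands to be recorded go here
sysuse auto, clear
summarize price mpg

* Stop recording
log close
```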

WRITING COMMANDS

Stata commands use a common syntax:

[prefix :] command [varlist] [= exp] [if] [in] [, options]

• The square brackets denote elements that are optional

• Italicized words are to be substituted by the user

› varlist denotes a list of variables

› exp is a mathematical expression

Stata is case sensitive! (i.e. UPPERCASE != lowercase)
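A short sketch of how the syntax elements map onto real commands. It uses the auto example dataset shipped with Stata; the variable name price_k is made up for the illustration.

```stata
sysuse auto, clear

* command + varlist + if + in + options
summarize price mpg if foreign == 0 in 1/50, detail

* prefix: command + varlist + options
bysort foreign: tabulate rep78, missing

* command + newvar = exp
generate price_k = price / 1000
```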

Operators may be used to manipulate numerical or string variables

Arithmetic

+    addition
-    subtraction
*    multiplication
/    division
^    raised to power

Relational

>    greater than
<    less than
>=   greater than or equal
<=   less than or equal
==   equal
~=   not equal
!=   not equal

Logical

&    and
|    or
!    not
~    not

Note that a double equal sign (==) is used for equality testing
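A few lines to try the operators, assuming the auto example dataset shipped with Stata; the numbers and strings are arbitrary.

```stata
sysuse auto, clear

* Arithmetic: ^ is evaluated before *, and * before +
display 2 + 3 * 4^2

* With strings, + concatenates
display "auto" + ".dta"

* Relational and logical operators combined in a condition
count if price >= 5000 & foreign == 1

* ! negates the condition in parentheses
count if !(rep78 == 3 | rep78 == 4)
```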

Logical and relational operators are particularly useful with if qualifiers to define the sample for analysis

The if qualifier at the end of a command means the command is to use only the data specified

› command if exp

• list make if foreign==1

• list if make=="Volvo 260"

• list make price if price>=5000 & price<=7000

• list make price if price<5000 | price>7000

Note that character strings are enclosed in double quotes

You can refer to a range of observation numbers using the following shorthand (f stands for first, l is the letter L and stands for last)

1/30    1 to 30
1/l     1 until last number
f/-5    first to 5th number before the end
-5/l    last five numbers

These ranges are particularly useful with the in qualifier to specify a range of observations to be used

› command in range

• list in f/10

• list in -10/l

• list make price in 74

The by prefix repeats execution of a command on subsets of the data

› subsets are groups of observations that take the same value in a given variable (often a categorical variable)

• by varname: command

- by foreign: list make

› by requires the dataset to be sorted by varname; if it is not, you should use the bysort prefix instead

• bysort varname: command
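The difference between the two prefixes, sketched on the auto example dataset shipped with Stata:

```stata
sysuse auto, clear

* by needs the data sorted by the grouping variable first
sort foreign
by foreign: summarize price

* bysort sorts and runs the command in one step
bysort rep78: summarize mpg
```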

SECTION 2: DATA MANAGEMENT

• Inputting data

• Transforming data

INPUTTING DATA

To open a dataset in Stata format (.dta): use

› use filename [, clear]

• sysuse - open example datasets installed with Stata

To save a dataset in Stata format: save

› save filename [, replace]

Stata can also import/export Excel files (.xls or .xlsx)

› import excel filename [, firstrow]

› export excel filename [, firstrow(variables)]

By default, Stata opens/saves a dataset from/in the current working directory but you can specify

› another directory: use | save "C:\Users\Me\Stata_training\dataset.dta"

› a web address: use http://sites.uclouvain.be/datasupport/data/wage.dta
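A round trip through the commands above, as a sketch. The file name auto_copy is a placeholder, and the files are written to the current working directory.

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Save it in Stata format (auto_copy.dta), overwriting any earlier copy
save auto_copy, replace

* Export to Excel with variable names in the first row, then read it back
export excel using auto_copy.xlsx, firstrow(variables) replace
import excel using auto_copy.xlsx, firstrow clear

* Reload the Stata-format copy
use auto_copy, clear
```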

Summary of the dataset 

› describe: information on dataset in memory

› codebook: detailed description of variables

Further explore data in memory 

› count: number of observations

› list: display data in the results window

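Manipulate variables/observations

› keep wage educ exper: keep only the listed variables

› drop in 1/10: delete the first ten observations

› sort wage: sort observations in ascending order of wage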
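The same commands can be tried on the auto example dataset shipped with Stata (used here as a stand-in, since the wage data may not be at hand):

```stata
sysuse auto, clear

* Summary of the dataset
describe
codebook rep78

* Explore the data in memory
count
list make price in 1/5

* Manipulate variables and observations
keep make price mpg
drop in 1/10
sort price
```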

TRANSFORMING DATA

To create a new variable: generate

› generate newvar = exp [if] [in]

• exp may be a number, a character string or a mathematical function

• generate constant = 1

- Create a constant equal to 1

• generate constant_text = "text"

- Create a constant that contains the character string "text"

• generate logwage = ln(wage)

- Create a variable equal to the natural logarithm of wage

• generate expersq = exper^2

- Create a variable equal to the square of exper

To create specific variables using time series operators

› generate lag_gdp = L.gdp

• Create a variable corresponding to the first lag of gdp

› generate lead_gdp = F.gdp

• Create a variable corresponding to the first lead of gdp

› generate diff_gdp = D.gdp

• Create a variable corresponding to the first difference of gdp

Before using these operators, you must tell Stata that you are working with time series data using the command: tsset

› tsset time [, yearly monthly quarterly daily]

Using system variables

› generate gdp_growth = ((gdp[_n] - gdp[_n-1]) / gdp[_n-1])*100

• Create a variable equal to the growth rate of gdp in percent; _n is the number of the current observation, so the data must be sorted by time
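A self-contained sketch on a toy yearly series. The numbers are made up for illustration, and the variable names year and gdp are assumptions.

```stata
* Build a small artificial series growing by 2% per year
clear
set obs 6
generate year = 2010 + _n
generate gdp = 100 * 1.02^_n

* Declare the time variable before using L., F. or D.
tsset year, yearly

generate lag_gdp = L.gdp
generate diff_gdp = D.gdp

* Same result as the [_n-1] formula when the series has no gaps
generate gdp_growth = 100 * D.gdp / L.gdp

list year gdp lag_gdp diff_gdp gdp_growth
```

The first observation has no lag, so lag_gdp, diff_gdp and gdp_growth are missing in that row.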

To modify an existing variable: replace

› replace wage=20 if wage>=20

• Cap wage at 20; missing values also satisfy wage>=20, so add & wage<. to leave them untouched

To rename an existing variable: rename

› rename wage hourly_wage

You can also add a brief description to the variable using labels

› label variable educ "total years of education"
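The three commands applied to the auto example dataset shipped with Stata; the cap of 10000, the new variable name and the label text are arbitrary choices for the illustration.

```stata
sysuse auto, clear

* Top-code price at 10000, leaving any missing values alone
replace price = 10000 if price >= 10000 & price < .

* Rename a variable and attach a description to it
rename mpg miles_per_gallon
label variable miles_per_gallon "Mileage in miles per gallon"

describe price miles_per_gallon
```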

When transforming data, one must be careful with missing values

› Missing values in Stata are coded with a . (period)

Stata treats missing values as large numbers, higher than any other values of a given variable

› A condition such as wage>15 is therefore true when wage is missing; in such cases you should use the if qualifier to exclude missing values

• generate rich = (wage>15) if wage<.

|or|

• generate rich = (wage>15) if wage!=.

|or|

• generate rich = (wage>15) if !missing(wage)
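The pitfall can be seen on the auto example dataset shipped with Stata, where rep78 has some missing values. The variable name good_record is made up for the illustration.

```stata
sysuse auto, clear

* Counts observations with rep78 above 4 AND those where rep78 is missing
count if rep78 > 4

* Two equivalent ways to exclude the missing values
count if rep78 > 4 & rep78 < .
count if rep78 > 4 & !missing(rep78)

* The indicator stays missing where rep78 is missing, instead of becoming 1
generate good_record = (rep78 > 4) if !missing(rep78)
tabulate good_record, missing
```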

SECTION 3: DATA ANALYSIS

• Descriptive statistics

• Linear regression

• Exporting results

DESCRIPTIVE STATISTICS

Categorical variables

› One-way table of frequencies

• tabulate female

- The option [, missing] displays the total frequency of missing observations

› Two-way table of frequencies

• tabulate female married

Continuous variables

› summarize gives the number of observations, the mean, the standard deviation, the minimum and maximum values

• summarize wage educ

- The option [, detail] displays the main percentiles, the four smallest and four largest values, the variance, as well as the skewness and kurtosis measures

Pearson's correlation coefficient

› correlate varlist [, covariance]

• The option [, covariance] displays covariances instead of correlations
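A sketch of these commands on the auto example dataset shipped with Stata, whose variables stand in for female, married, wage and educ:

```stata
sysuse auto, clear

* One-way table including missing values, then a two-way table
tabulate rep78, missing
tabulate foreign rep78

* Basic and detailed summary statistics
summarize price mpg
summarize price, detail

* Correlation matrix, then covariance matrix
correlate price mpg weight
correlate price mpg weight, covariance
```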

LINEAR REGRESSION

We seek to estimate the relationship between one dependent variable and a set of independent variables

› using the Ordinary Least Squares (OLS) estimator

Classical linear model assumptions (Wooldridge, 2008):

› Model is linear in parameters

› Data are a random sample of the population

› No perfect collinearity between independent variables

› Zero conditional mean of error term

› Homoskedasticity

› Normality of the residuals

The model we want to estimate:

› $\log(\text{wage}) = \beta_0 + \beta_1\,\text{education} + \beta_2\,\text{experience} + \beta_3\,\text{tenure} + u$

• where:

- wage is average hourly earnings in dollars

- education is the number of years of education

- experience is the number of years of labour market experience

- tenure is the number of years with the current employer

In Stata:

› regress logwage educ exper tenure
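A runnable sketch of the same two steps (take the log of the dependent variable, then regress). As an assumption for the illustration, the auto example dataset shipped with Stata stands in for the wage data, with price playing the role of wage.

```stata
sysuse auto, clear

* Log of the dependent variable
generate logprice = ln(price)

* OLS regression of logprice on three independent variables
regress logprice mpg weight length
```

With the wage data in memory, the equivalent would be generate logwage = ln(wage) followed by the regress command shown on the slide.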

Stata output

Analysis of variance

› Sum of Squares (SS)

• Explained variance (model)

• Residual variance

• Total variance

› Degrees of freedom (df)

› Mean Squares (MS)

• SS divided by df
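As a reminder of how these quantities relate, here are the standard OLS identities for a model with an intercept. The symbols $n$ (number of observations) and $k$ (number of independent variables) are notation introduced here, not taken from the slides.

```latex
\[
SS_{total} = SS_{model} + SS_{residual}, \qquad
MS_{model} = \frac{SS_{model}}{k}, \qquad
MS_{residual} = \frac{SS_{residual}}{n - k - 1}
\]
\[
R^2 = \frac{SS_{model}}{SS_{total}}, \qquad
F = \frac{MS_{model}}{MS_{residual}}
\]
```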

Parameter estimates

› Dependent variable (1)

› Independent variables and intercept (2)

› Coefficients (3)

› Standard errors (4)

› t-statistics (5)

› p-values associated with the t-statistics (6)

• testing the null hypothesis that a given coefficient is 0

› 95% confidence intervals (7)

[Figure: coefficient table of the regress output, with the column headers labelled (1), (3), (4), (5), (6), (7) from left to right and the row names labelled (2)]
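The columns of the coefficient table can be rebuilt by hand from the results Stata stores after regress. A sketch on the auto example dataset shipped with Stata; the choice of the mpg coefficient is arbitrary.

```stata
sysuse auto, clear
regress price mpg weight

* t-statistic = coefficient / standard error
display _b[mpg] / _se[mpg]

* Two-sided p-value, using the residual degrees of freedom
display 2 * ttail(e(df_r), abs(_b[mpg] / _se[mpg]))

* 95% confidence interval = coefficient -/+ critical value * standard error
display _b[mpg] - invttail(e(df_r), 0.025) * _se[mpg]
display _b[mpg] + invttail(e(df_r), 0.025) * _se[mpg]
```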

Predicting fitted values and residuals

› predict wage_fitted

• e.g. for observation 1: 1.304921 = 0.2843595 + 11*0.092029 + 2*0.0041211 + 0*0.0220672

› predict wage_resid, r

• r abbreviates the residuals option

• e.g. for observation 1: -0.1735185 = 1.131402 - 1.304921

      logwage    educ  exper  tenure  wage_fitted  wage_resid
 1    1.131402    11      2      0    1.304921     -0.1735185
 2    1.175573    12     22      2    1.523506     -0.3479329
 3    1.098612    11      2      0    1.304921     -0.2063083
 4    1.791759     8     44     28    1.819802     -0.0280429
 5    1.667707    12      7      2    1.461690      0.2060172
 6    2.169054    16      9      8    1.970451      0.1986027
 7    2.420368    18     15      7    2.157168      0.2631997
 8    1.609438    12      5      3    1.475515      0.1339233
 9    1.280934    12     26      4    1.584125     -0.3031912
10    2.900322    17     22     21    2.402928      0.4973939
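The hand calculation above can be reproduced with stored coefficients. A sketch on the auto example dataset shipped with Stata, used as a stand-in for the wage data; all variable names created here are made up for the illustration.

```stata
sysuse auto, clear
generate logprice = ln(price)
regress logprice mpg weight

* Fitted values (the default of predict) and residuals
predict logprice_fitted
predict logprice_resid, residuals

* Rebuild the fitted value by hand: intercept + sum of coefficient * value
generate check = _b[_cons] + _b[mpg]*mpg + _b[weight]*weight

list logprice mpg weight logprice_fitted check logprice_resid in 1/5
```

The columns logprice_fitted and check should agree up to rounding, and logprice_resid equals logprice minus logprice_fitted.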

Incorporating categorical information into regression models

Dummy variables (coded as 0/1) can be included as such in the regression

› regress wage educ exper tenure female

Categorical variables with more than two categories must be included using the i. prefix

› regress wage educ exper tenure i.region

• Stata will automatically create dummy variables for each category and incorporate them in the regression except the reference category

- You can use the prefix ib#. instead to change the reference category (e.g. ib3.region makes the category coded 3 the reference)
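A sketch on the auto example dataset shipped with Stata, where foreign is a 0/1 dummy and rep78 a categorical variable with five levels:

```stata
sysuse auto, clear

* A 0/1 dummy enters the regression as such
regress price mpg weight foreign

* i. creates one dummy per category; the lowest value is the default reference
regress price mpg weight i.rep78

* ib3. makes the category coded 3 the reference instead
regress price mpg weight ib3.rep78
```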

Post-estimation tests

› Multicollinearity (Wooldridge, 2008 - chapter 3, p99)

• estat vif

- Rule of thumb: a variance inflation factor above 10 signals a multicollinearity problem

› Normality of the residuals

• sktest varname

- Skewness and kurtosis test, testing the null hypothesis that the variable is normally distributed (varname is the residual variable created with predict)

• swilk | sfrancia varname

- Shapiro-Wilk and Shapiro-Francia tests of the same null hypothesis

› Homoskedasticity (Wooldridge, 2008 - chapter 8)

• estat hettest

- Breusch-Pagan test, testing the null hypothesis of homoskedasticity

• estat imtest, white

- White test, testing the null hypothesis of homoskedasticity

• The [, robust] option after regress gives heteroskedasticity-robust standard errors

› F-test: testing that a group of variables has no effect on the dependent variable, i.e. a joint hypotheses test (Wooldridge, 2008 - chapter 4, p143)

• test var1 var2
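The tests above, run in sequence after a regression. A sketch on the auto example dataset shipped with Stata; the model and the variable name resid are arbitrary choices for the illustration.

```stata
sysuse auto, clear
regress price mpg weight length

* Multicollinearity
estat vif

* Homoskedasticity: Breusch-Pagan, then White
estat hettest
estat imtest, white

* Joint F-test that both coefficients are zero
test weight length

* Normality tests are run on the saved residuals
predict resid, residuals
sktest resid
swilk resid

* Same model with heteroskedasticity-robust standard errors
regress price mpg weight length, robust
```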

EXPORTING RESULTS

outreg2 makes it easy to export the results of one or several regressions

› to Microsoft Office applications: Word, Excel

› to LaTeX

outreg2 is a user-written command: install it once with ssc install outreg2

outreg2 [estlist] using filename [, word excel tex]

› [estlist] refers to the list of estimation results previously saved using the command: estimates store estname

› the square brackets around estlist are part of the command and must be typed

regress logwage educ

estimates store est1

regress logwage educ exper tenure

estimates store est2

regress logwage educ exper tenure female

estimates store est3

outreg2 [est1 est2 est3] using output, word