IRS Stata
8/16/2019 IRS Stata
http://slidepdf.com/reader/full/irs-stata 1/40
GETTING STARTED WITH STATA
Sébastien Fontenay
ECON - IRES
THE SOFTWARE
Software developed in 1985 by StataCorp
Functionalities
› Data management
› Statistical analysis
› Graphics
Using Stata at UCL
› Computer labs
Socrate 30, 31-32, 33, 34, 54 and 68
Dupriez 143
Leclercq 74, 76, 77 and 78
› Student licence to install on your personal computer
valid during all your studies at the price of 20 euros www.uclouvain.be/438229.html
Best documentation
› help command
› search keyword
Stata website : www.stata.com/support
› Frequently Asked Questions
› Video tutorials
› Statalist
Books
› Cahuzac, E., Bontemps, C. (2008). Stata par la pratique: Statistiques,
graphiques et éléments de programmation.
› Cameron, A.C., Trivedi, P.K. (2009). Microeconometrics using Stata.
› Becketti, S. (2013). Time series using Stata.
UCLA : www.ats.ucla.edu/stat/stata
FINDING SUPPORT (1)
For all your questions related to data management or analysis using
Stata
› Website: http://www.uclouvain.be/411370
› Email: [email protected]
› By appointment only:
• Bâtiment Dupriez (office d010), 3 place Montesquieu
FINDING SUPPORT (2)
COURSE TOPICS
Quick tour of Stata
• Working environment
• Writing commands
Data management
• Inputting data
• Transforming data
Data analysis
• Descriptive statistics
• Linear regression
• Exporting results
SECTION 1
• Working environment
• Writing commands
QUICK TOUR OF STATA
WORKING ENVIRONMENT
The working environment is composed of 5 windows
› Results
• of commands
› Review
• of commands
› Command
• window
› Variables
• list and labels
› Properties
• of variables and dataset
WORKING ENVIRONMENT
Three specific windows can be opened by clicking on the following icons
› Data editor/browser
• Display data in memory
› Viewer
• Display log and help files
› Do-file Editor
• Text editor to save/execute commands
There are 3 main types of files used in Stata
› .dta data
› .do commands (do-file)
› .smcl | .log output (log file)
WORKING ENVIRONMENT
All software functionalities are available from the drop-down menus (Data, Graphics, Statistics)
› Useful when you are unsure of which commands to run or unfamiliar with the available options
Every command issued in this manner is echoed to the Review and Results windows
› e.g. sysuse auto.dta
WORKING ENVIRONMENT
In order to use Stata effectively, you should always follow this three-step process:
› Open a do-file
› Choose your working directory
• cd "C:\Users\Me"
• mkdir stata_training
• cd stata_training
- You can see the current working directory at the bottom left of the main window
› Start a log file (saving commands and their output)
• log using filename [, text append replace]
- log close
- log off | on
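The three steps can be collected at the top of a do-file. The lines below are a minimal sketch rather than a prescribed template: the folder name stata_training comes from the slide, while the log name session1 and the use of capture are assumptions added for illustration. The // comments work in a do-file, not when typed in the Command window.

```stata
* Minimal do-file header (a sketch; the log name session1 is a placeholder)
pwd                               // show the current working directory
capture mkdir stata_training      // capture: do not stop if the folder already exists
cd stata_training
capture log close                 // close a log left open by an earlier run, if any
log using session1, text replace  // plain-text log, overwritten on each run
sysuse auto, clear                // example dataset installed with Stata
describe
log close
```

Running the do-file a second time behaves the same way, because capture and replace absorb the "already exists" errors.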
SECTION 1
• Working environment
• Writing commands
QUICK TOUR OF STATA
Stata commands use a common syntax:
[prefix:] command [varlist] [=exp] [if] [in] [, options]
• The square brackets denote qualifiers that are optional
• Italicized words are to be substituted by the user
› varlist denotes a list of variables
› exp is a mathematical expression
Stata is case sensitive! (i.e. UPPERCASE != lowercase)
WRITING COMMANDS
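To see how the syntax diagram maps onto real commands, here is a short sketch using the auto dataset installed with Stata. The variable name price_k is made up for the example.

```stata
sysuse auto, clear
* command varlist
summarize price mpg
* command varlist if exp, options
summarize price mpg if foreign == 1, detail
* command newvar = exp in range
generate price_k = price / 1000 in 1/10
* prefix: command varlist
bysort foreign: summarize price
* Case sensitivity: the variable is named price, so Price is not found
capture noisily summarize Price
```

The last line is expected to fail; capture noisily shows the error message without stopping the do-file.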
WRITING COMMANDS
Operators may be used to manipulate numerical or string variables
Arithmetic
+ addition
- subtraction
* multiplication
/ division
^ raised to power
Relational
> greater than
< less than
>= greater than or equal
<= less than or equal
== equal
~= not equal
!= not equal
Logical
& and
| or
! not
~ not
Note that a double equal sign (==) is used for equality testing
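A few lines showing each family of operators at work, again on the auto dataset. This is only an illustrative sketch: the new variable names and the pounds-to-kilograms factor are assumptions made for the example.

```stata
sysuse auto, clear
generate double weight_kg = weight * 0.4536   // arithmetic (approximate lbs-to-kg factor)
generate double mpg_sq = mpg^2                // raised to power
generate byte expensive = price >= 10000      // a relational test returns 1 (true) or 0 (false)
count if foreign == 1 & price < 5000          // and
count if price < 4000 | price > 12000         // or
count if !(foreign == 1)                      // not
generate make_tag = make + " (import)" if foreign == 1   // + concatenates strings
```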
WRITING COMMANDS
Logical and relational operators are particularly useful with the if qualifier to define the sample for analysis
The if qualifier at the end of a command restricts the command to the observations that satisfy the expression
› command if exp
• list make if foreign==1
• list if make=="Volvo 260"
• list make price if price>=5000 & price<=7000
• list make price if price<5000 | price>7000
Note that character strings are enclosed in double quotes
WRITING COMMANDS
You can refer to a range of observation numbers using the following shorthand
1/30 observations 1 to 30
1/l observation 1 to the last observation (l is the letter "l", for last)
f/-5 first observation to the 5th observation before the end
-5/l last five observations
These ranges are used with the in qualifier to specify the observations to be used
› command in range
• list in f/10
• list in -10/l
• list make price in 74
WRITING COMMANDS
The by prefix repeats execution of a command on subsets of the data
› subsets are groups of observations that take the same value in a given variable (often a categorical variable)
• by varname: command
- by foreign: list make
› If the dataset is not sorted, you should use the bysort prefix instead
• bysort varname: command
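The difference between by and bysort can be sketched on the auto dataset. The variable group_size is a name invented for this example, and _N (the number of observations, here within each group) is a system variable not covered on the slide.

```stata
sysuse auto, clear
sort foreign                       // by requires the data to be sorted on the by-variable
by foreign: summarize price mpg
* bysort sorts and then runs the command in one step
bysort rep78: summarize price
* by also works with generate; within by, _N is the size of each group
bysort foreign: generate group_size = _N
```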
SECTION 2
• Inputting data
• Transforming data
DATA MANAGEMENT
To open a dataset in Stata format (.dta): use
› use filename [, clear]
• sysuse - open example datasets installed with Stata
To save a dataset in Stata format: save
› save filename [, replace]
Stata can also import/export Excel files (.xls or .xlsx)
› import excel filename [, firstrow]
› export excel filename [, firstrow(variables)]
By default, Stata opens/saves a dataset from/in the current working
directory but you can specify
› another directory: use | save "C:\Users\Me\Stata_training\dataset.dta"
› a web address: use http://sites.uclouvain.be/datasupport/data/wage.dta
INPUTTING DATA
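A round trip through the commands above might look like the following sketch. The file name auto_copy is a placeholder, and the auto dataset stands in for your own data; the replace and clear options are added so that the lines can be rerun.

```stata
sysuse auto, clear
save auto_copy, replace                                        // writes auto_copy.dta in the working directory
use auto_copy, clear                                           // reads it back
export excel using auto_copy.xlsx, firstrow(variables) replace // variable names in the first row
import excel using auto_copy.xlsx, firstrow clear              // first row becomes the variable names
describe
```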
Summary of the dataset
› describe: information on dataset in memory
› codebook: detailed description of variables
Further explore data in memory
› count: number of observations
› list: display data in the results window
Manipulate variables/observations
› keep wage educ exper
› drop in 1/10
› sort wage
INPUTTING DATA
SECTION 2
• Inputting data
• Transforming data
DATA MANAGEMENT
TRANSFORMING DATA
To create a new variable: generate
› generate newvar = exp [if] [in]
• exp may be a number, a character string or a mathematical function
• generate constant = 1
- Create a constant equal to 1
• generate constant_text = "text"
- Create a constant that contains the character string "text"
• generate logwage = ln(wage)
- Create a variable equal to the natural logarithm of wage
• generate expersq = exper^2
- Create a variable equal to the square of exper
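The same patterns can be tried without the wage data. The sketch below uses the auto dataset installed with Stata instead (an assumption made so that it runs anywhere), with price and mpg playing the roles of wage and exper.

```stata
sysuse auto, clear
generate constant = 1
generate constant_text = "text"
generate logprice = ln(price)
generate mpgsq = mpg^2
* With an if qualifier, observations that fail the condition are set to missing
generate cheap_import = 1 if foreign == 1 & price < 5000
summarize constant logprice mpgsq cheap_import
```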
TRANSFORMING DATA
To create specific variables using time series operators
› generate lag_gdp = L.gdp
• Create a variable corresponding to the first lag of gdp
› generate lead_gdp = F.gdp
• Create a variable corresponding to the first lead of gdp
› generate diff_gdp = D.gdp
• Create a variable corresponding to the first difference of gdp
Before using these operators, you must tell Stata that you are working with time series data using the command: tsset
› tsset time [, yearly monthly quarterly daily]
Using system variables
› generate gdp_growth = ((gdp[_n] - gdp[_n-1]) / gdp[_n-1])*100
• Create a variable equal to the growth rate of gdp (in percent), where _n is the number of the current observation
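To check that the operator and subscript approaches agree, the sketch below builds a small made-up annual series (the years and gdp values are invented for the example, not real data) and computes the growth rate both ways.

```stata
clear
set obs 6
generate year = 2010 + _n              // 2011 to 2016
generate gdp = 100 * 1.02^_n           // made-up series growing by 2% a year
tsset year, yearly
generate lag_gdp = L.gdp
generate diff_gdp = D.gdp
generate growth_ts = 100 * D.gdp / L.gdp                      // time series operators
generate growth_n = 100 * (gdp[_n] - gdp[_n-1]) / gdp[_n-1]   // explicit subscripts
list year gdp lag_gdp diff_gdp growth_ts growth_n
```

The two versions coincide here because the data are sorted by year with no gaps. When a period is missing, L.gdp returns a missing value for the following period, whereas gdp[_n-1] silently picks up the previous row.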
To modify an existing variable: replace
› replace wage=20 if wage>=20
To rename an existing variable: rename
› rename wage hourly_wage
You can also add a brief description to the variable using labels
› label variable educ "total years of education"
TRANSFORMING DATA
TRANSFORMING DATA
When transforming data, one must be careful with missing values
› Missing values in Stata are coded with a . (period)
Stata treats numeric missing values as larger than any other value, so an expression such as wage>15 is true when wage is missing
› In certain cases you should use the if qualifier to exclude missing values
• generate rich = (wage>15) if wage<.
• |or| generate rich = (wage>15) if wage!=.
• |or| generate rich = (wage>15) if !missing(wage)
SECTION 3
• Descriptive statistics
• Linear regression
• Exporting results
DATA ANALYSIS
DESCRIPTIVE STATISTICS
Categorical variables
› One-way table of frequencies
• tabulate female
- The option [, missing] displays the total frequency of missing observations
› Two-way table of frequencies
• tabulate female married
Continuous variables
› summarize gives the number of observations, the mean, the standard deviation, the minimum and maximum values
• summarize wage educ
- The option [, detail] displays the main percentiles, the four smallest and four largest values, the variance, as well as the skewness and kurtosis measures
Pearson’s correlation coefficient
› correlate varlist [, covariance]
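The commands above can be tried on the auto dataset as follows. This is a sketch with substitute variables: rep78 and foreign stand in for the categorical variables of the wage data.

```stata
sysuse auto, clear
tabulate rep78, missing              // one-way table, including missing values
tabulate foreign rep78               // two-way table
summarize price mpg
summarize price, detail
display "median price = " r(p50)     // summarize, detail stores the percentiles in r()
correlate price mpg weight
correlate price mpg weight, covariance
```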
SECTION 3
• Descriptive statistics
• Linear regression
• Exporting results
DATA ANALYSIS
LINEAR REGRESSION
We seek to estimate the relationship between one dependent variable
and a set of independent variables
› using the Ordinary Least Squares (OLS) estimator
Classical linear model assumptions (Wooldridge, 2008):
› Model is linear in parameters
› Data are a random sample of the population
› No perfect collinearity between independent variables
› Zero conditional mean of error term
› Homoskedasticity
› Normality of the residuals
LINEAR REGRESSION
The model we want to estimate:
› $\log(\text{wage}) = \beta_0 + \beta_1\,\text{education} + \beta_2\,\text{experience} + \beta_3\,\text{tenure} + u$
• where:
- wage is average hourly earnings in dollars
- education is the number of years of education
- experience is the number of years of labour market experience
- tenure is the number of years with the current employer
In Stata:
› regress logwage educ exper tenure
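For readers without the wage data at hand, the same steps can be sketched on the auto dataset. The model below (log price on mpg, weight and foreign) is only a stand-in chosen for illustration, not the wage equation of the slide.

```stata
sysuse auto, clear
generate logprice = ln(price)        // dependent variable in logs, as with logwage
regress logprice mpg weight foreign
display "coefficient on mpg = " _b[mpg]
display "R-squared = " e(r2)
```

After regress, coefficients are available as _b[varname] and summary results as e() values, which makes it possible to reuse them in later calculations.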
LINEAR REGRESSION
Stata output
LINEAR REGRESSION
Analysis of variance
› Sum of Squares (SS)
• Explained variance (model)
• Residual variance
• Total variance
› Degrees of freedom (df)
› Mean Squares (MS)
• SS divided by df
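The entries of the analysis-of-variance table can be rebuilt from the results that regress stores in e(). The sketch below uses the auto dataset and an arbitrary model as an illustration.

```stata
sysuse auto, clear
regress price mpg weight
display "Model SS    = " (e(mss)) "   df = " (e(df_m))
display "Residual SS = " (e(rss)) "   df = " (e(df_r))
display "Total SS    = " (e(mss) + e(rss)) "   df = " (e(df_m) + e(df_r))
display "Model MS    = " (e(mss) / e(df_m))
display "Residual MS = " (e(rss) / e(df_r))
display "F statistic = " ((e(mss) / e(df_m)) / (e(rss) / e(df_r)))
display "R-squared   = " (e(mss) / (e(mss) + e(rss)))
```

The last two lines show where the F statistic and the R-squared in the header of the regression output come from: the ratio of the two mean squares, and the share of the total sum of squares explained by the model.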
LINEAR REGRESSION
Parameter estimates (the numbers refer to the labels on the regression output)
› Dependent variable (1)
› Independent variables and intercept (2)
› Coefficients (3)
› Standard errors (4)
› t-statistics (5)
› p-values associated with the t-statistics (6)
• testing the null hypothesis that a given coefficient is 0
› 95% confidence intervals (7)
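Columns (3) to (7) are linked by simple formulas, which can be verified by hand for one coefficient. The sketch below does this for weight in an arbitrary regression on the auto dataset; the scalar names are invented for the example and chosen so that they do not clash with variable names.

```stata
sysuse auto, clear
regress price mpg weight
scalar coef   = _b[weight]                             // (3) coefficient
scalar stderr = _se[weight]                            // (4) standard error
scalar tstat  = coef / stderr                          // (5) t-statistic
scalar pval   = 2 * ttail(e(df_r), abs(tstat))         // (6) two-sided p-value
scalar ci_lo  = coef - invttail(e(df_r), 0.025) * stderr   // (7) lower bound
scalar ci_hi  = coef + invttail(e(df_r), 0.025) * stderr   // (7) upper bound
display "t = " tstat "   p = " pval "   95% CI = [" ci_lo ", " ci_hi "]"
```

The displayed values should match the weight row of the regression table up to rounding.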
LINEAR REGRESSION
Predicting fitted values and residuals
› predict wage_fitted
• e.g. 1.304921 = 0.2843595 + 11*0.092029 + 2*0.0041211 + 0*0.0220672
› predict wage_resid, r
• e.g. -0.1735185 = 1.131402 - 1.304921
obs  logwage   educ  exper  tenure  wage_fitted  wage_resid
1    1.131402  11    2      0       1.304921     -0.1735185
2    1.175573  12    22     2       1.523506     -0.3479329
3    1.098612  11    2      0       1.304921     -0.2063083
4    1.791759  8     44     28      1.819802     -0.0280429
5    1.667707  12    7      2       1.461690      0.2060172
6    2.169054  16    9      8       1.970451      0.1986027
7    2.420368  18    15     7       2.157168      0.2631997
8    1.609438  12    5      3       1.475515      0.1339233
9    1.280934  12    26     4       1.584125     -0.3031912
10   2.900322  17    22     21      2.402928      0.4973939
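The two worked examples above can be replicated in code: the fitted value is the intercept plus each coefficient times its variable, and the residual is the observed value minus the fitted one. The sketch below uses the auto dataset as a stand-in for the wage data, with variable names invented for the example.

```stata
sysuse auto, clear
generate logprice = ln(price)
regress logprice mpg weight
predict logprice_fitted, xb              // fitted values (xb is the default statistic)
predict logprice_resid, residuals        // residuals
* Fitted value rebuilt by hand from the stored coefficients
generate double by_hand = _b[_cons] + _b[mpg] * mpg + _b[weight] * weight
list logprice logprice_fitted by_hand logprice_resid in 1/5
```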
LINEAR REGRESSION
Incorporating categorical information into regression models
Dummy variables (coded as 0/1) can be included as such in the regression
› regress wage educ exper tenure female
Categorical variables with more than two categories must be included using the i. prefix
› regress wage educ exper tenure i.region
• Stata will automatically create dummy variables for each category and incorporate them in the regression except the reference category
- You can use the prefix ib#. instead to change the reference category, where # is the value of the new reference category (e.g. ib3.region)
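A quick way to see the effect of the two prefixes is to run them side by side. The sketch below uses rep78 from the auto dataset as the categorical variable (a substitute for region, which belongs to the wage data).

```stata
sysuse auto, clear
regress price mpg weight foreign       // 0/1 dummy included as such
regress price mpg weight i.rep78       // lowest category of rep78 is the reference
regress price mpg weight ib3.rep78     // category 3 becomes the reference
```

The fit of the last two regressions is identical; only the category against which the other coefficients are measured changes. Observations with a missing rep78 are dropped from both.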
LINEAR REGRESSION
Post-estimation tests
› Multicollinearity (Wooldridge, 2008 - chapter 3, p99)
• estat vif
- Rule of thumb: a variance inflation factor above 10 signals a multicollinearity problem
› Normality of the residuals
• sktest varname
- testing the null hypothesis that the variable follows a normal distribution
• swilk|sfrancia varname
- Shapiro-Wilk and Shapiro-Francia tests
› Homoskedasticity (Wooldridge, 2008 - chapter 8)
• estat hettest
- Breusch-Pagan test, testing the null hypothesis of homoskedasticity
• estat imtest, white
- White test, testing the null hypothesis of homoskedasticity
• The [, robust] option after regress gives heteroskedasticity-robust standard errors
› F-test: testing that a group of variables has no effect on the dependent variable (joint hypotheses test; Wooldridge, 2008 - chapter 4, p143)
• test var1 var2
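Put together, a post-estimation sequence might look like the sketch below, run on the auto dataset with an arbitrary model. The residuals must first be saved with predict before the normality tests can be applied to them; the variable name resid is an assumption made for the example.

```stata
sysuse auto, clear
regress price mpg weight foreign
estat vif                       // variance inflation factors
estat hettest                   // Breusch-Pagan test
estat imtest, white             // White test
test mpg weight                 // joint F-test: both coefficients equal to 0
predict resid, residuals        // save the residuals for the normality tests
sktest resid
swilk resid
regress price mpg weight foreign, robust   // same coefficients, robust standard errors
```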
SECTION 3
• Descriptive statistics
• Linear regression
• Exporting results
DATA ANALYSIS
EXPORTING RESULTS
outreg2 makes it easy to export the results of one or several regressions
› to Microsoft Office applications: Word, Excel
› to LaTeX
outreg2 is a user-written command; if it is not yet installed, run: ssc install outreg2
outreg2 [estlist] using filename [, word excel tex]
› [estlist] refers to the list of estimation results previously saved using the command: estimates store estname
EXPORTING RESULTS
regress logwage educ
estimates store est1
regress logwage educ exper tenure
estimates store est2
regress logwage educ exper tenure female
estimates store est3
outreg2 [est1 est2 est3] using output, word