Identi cation and Estimation of Production Function with...
Transcript of Identi cation and Estimation of Production Function with...
Identification and Estimation of Production Function
with Unobserved Heterogeneity
Hiroyuki Kasahara∗
Vancouver School of EconomicsUniversity of British Columbia
Paul SchrimpfVancouver School of EconomicsUniversity of British Columbia
Michio SuzukiFaculty of EconomicsUniversity of Tokyo
March 21, 2017
Abstract
This paper examines non-parametric identifiability of production function when production
functions are heterogenous across firms beyond Hicks-neutral technology terms. Using a finite
mixture specification to capture unobserved heterogeneity in production technology, we shows
that production function for each unobserved type is non-parametrically identified under regu-
larity conditions. We estimate a random coefficients production function using the panel data of
Japanese publicly-traded manufacturing firms and compare it with the estimate of production
function with fixed coefficients estimated by the method of Gandhi, Navarro, and Rivers (2013).
Our estimates for random coefficients production function suggest that there exists substantial
heterogeneity in production function coefficients beyond Hicks neutral term across firms within
narrowly defined industry.
1 Introduction
Estimation of production function is one of the most important topics in empirical economics.
Understanding how the input is related to the output is a fundamental issue in empirical industrial
organization (see, for example, Ackerberg, Benkard, Berry, and Pakes, 2007). In empirical trade
and macroeconomics, researchers are often interested in estimating production function to obtain
a measure of total factor productivity to examine the effect of trade policy on productivity and
∗Address for correspondence: Hiroyuki Kasahara, Vancouver School of Economics, University of British Columbia,6000 Iona Dr., Vancouver, BC, V6T 1L4 Canada.
1
to analyze the role of resource allocation on aggregate productivity (e.g., Pavcnik, 2002; Kasahara
and Rodrigue, 2008; Hsieh and Klenow, 2009).
As first discussed by Marschak and Andrews (1944), the ordinary least square estimates of
production function suffers from simultaneity bias because inputs are correlated with error term
when a firm makes an input decision based on their productivity level (Griliches and Mairesse, 1998).
Under the assumption that error terms could be decomposed into permanent and idiosyncratic
components, fixed effects estimator may be used but such an assumption could be violated in
practice, and, furthermore, the coefficient of inputs that are persistent over time could be severely
biased downward due to measurement errors (Griliches and Hausman, 1986). More recent literature
attempts to address the simultaneity issue by employing dynamic panel approach (Arellano and
Bond, 1991; Blundell and Bond, 1998; Blundell and Bond, 2000) or developing proxy variable
approach (Olley and Pakes, 1996 (OP, hereafter); Levinsohn and Petrin, 2003 (LP, hereafter);
Ackerberg, Caves, and Frazer, 2006, (ACF, hereafter); Wooldridge, 2009), which are now widely
used in empirical applications.
Despite their popularity, however, potential identification issues of proxy variable approach have
been pointed out in the literature. Bond and Sderbom (2005) and ACF discuss identification issue
due to collinearity under two flexible inputs (i.e., material and labor) in Cobb-Douglas specification.
Gandhi, Navarro, and Rivers (2013, GNR hereafter) argue that, if the firm’s decision follows a
Markovian strategy, then the conditional moment restriction implied by proxy variable approach
may not provide enough restriction for non-parametrically identifying gross production function.
GNR exploit the first order condition with respect to flexible input under profit maximization and
establish the identification of production function without making any functional form assumption.
Based on their identification strategy, GNR proposes an estimation procedure that does not suffer
from simultaneity bias.
This paper extends the identification result of GNR based on the first-order condition to the
case where production functions are heterogenous across firms beyond Hicks-neutral technology
terms. We consider a finite mixture specification in which there are J distinct time-varying pro-
duction technologies and each firm belongs to one of J types. Econometricians do not observe the
type of firms. Without making any functional form assumption on each type of production technol-
ogy, we establish nonparametric identification of J distinct production functions and a population
proportion of each type under the reasonable assumption.
Given that, except for the result of GNR, little formal identification result for production func-
tion estimation in the literature is available, our nonparametric identification result is an important
contribution to the literature. Our identification result on production function with unobserved
heterogeneity is also useful in practice as the random coefficient models for production function
become increasingly popular in empirical analysis (e.g., Mairesse and Griliches, 1990; Van Biese-
broeck, 2003; Doraszelski and Jaumandreu, 2014).
In estimation, we consider a random coefficient specification for production function and propose
two different estimation procedures. The first procedure follows our two-stage identification proof
2
and directly maximizes the log-likelihood function of a finite mixture model of production functions
under parametric assumptions, where the EM algorithm can be used to facilitates the computational
complication of maximizing the log-likelihood function of the finite mixture model. In the second
procedure, we first estimate the partial likelihood function under the normality assumption and
use the posterior distribution of type probabilities to classify each firm observation into one of the
J types, generating J data sets; using each of J data sets, we estimate the rest of the type-specific
parameters. The second procedure is computationally much simpler and requires less auxiliary
parametric assumptions than the first one although the second procedure could lead to a biased
estimator due to misclassification of types when T is small.
We provide empirical evidence that production functions are heterogeneous beyond Hicks-
neutral technology term to motivate the necessity of considering production functions with un-
observed heterogeneity in empirical applications. As analyzed by GNR, if Hicks-neutral technology
term is the only source of permanent unobserved heterogeneity in production function and if in-
termediate input is a flexible input, then we expect that the ratio of intermediate input cost to
output value after controlling for the difference in the input level of capital, labor, and intermedi-
ates should not exhibit any serial correlation. However, using the panel of Japanese manufacturing
firms that belongs to machine industry, we find that the serial correlation of the ratio of inter-
mediate input cost to output value is very high at 0.95 and that, even after controlling for the
difference in the input level of capital, labor, and intermediates, the majority of variation in the
ratio of intermediate input cost to output value can be explained by the firm-specific persistent
component rather than the idiosyncratic component. These findings strongly suggest the presence
of unobserved heterogeneity in production technology beyond Hicks-neutral term within the 3-digit
industry classification.
We estimate a random coefficients production function using the panel data of Japanese publicly-
traded manufacturing firms between 1980 and 2007 and compare the results with those from the
original GNR specification without unobserved heterogeneity. Our estimates suggest that there
exists substantial heterogeneity in production function coefficients beyond Hicks neutral term.
When we estimate production function without incorporating heterogeneity using the estimation
procedure suggested by GNR, we found that the majority of variations in total factor productivity
is coming from idiosyncratic ex-post shocks rather than serially correlated shocks. In contrast,
when we estimate production function with random coefficients, the majority of variations in total
factor productivity is explained by the variation in serially correlated shocks. Furthermore, the
estimated serial correlation in ex-post shocks of random coefficients model is substantially lower
than that of homogenous model. We also find that the correlation between estimated productivity
and investment is different across different types of firms, where the correlation is stronger among
a type of firms with capital intensive production technology than other types of firms.
3
2 The Model
Assume that we have panel data of firms i = 1, ..., N over periods t = 1, ..., T for output, the
number of workers, capital, intermediate inputs, and the average wage per worker, denoted by
(Yit,Kit, Lit,Mit,Wit) ∈ Y ×K×L×M×W, respectively. For brevity, let Xit := (Kit, Lit,Mit)′ ∈
X := K×L×M so that (Yit,Kit, Lit,Mit,Wit) = (Yit, Xit,Wit). Each firm’s observation {Yit, Xit,Wit}Tt=1
is randomly sampled from a population distribution P ({Yit, Xit,Wit}Tt=1).
We consider a possibility that firms are different in production technology beyond Hick’s neu-
tral productivity shock. Specifically, we use a finite mixture specification to capture permanent
unobserved heterogeneity in firm’s production technology. Define the latent random variable
Di ∈ {1, 2, ..., J} that represents the type of firm i so that Di = j when firm i has the j-th
type of technology. In the following, the superscript j indicates that functions are specific to type
j while the subscript t indicates that functions are specific to period t. In particular, for a random
variable Zit, we denote the probability distribution and the expectation conditional on Di = j as
P j(Zit) := P (Zit|Di = j) and Ej [Zit] := E[Zit|Di = j]. We assume that both Mit and Lit are
flexibly chosen after observing serially correlated productivity shock ωit and a wage shock vit. On
the other hand, Kit is predetermined at the end of last period. Denote the information available
to a firm for making decisions on Mit and Lit by Iit.
Assumption 1. (a) Each firm belongs to one of J types, where the probability of belonging to type
j is given by πj = P(Di = j), and J is known. (b) For the j-th type of production technology at
time t, the output is related to inputs as
Yit = eωit+εitF jt (Kit, eψjtLit,Mit) = eωit+εitF jt (Kit, Lit,Mit) (1)
where F jt (Kit, Lit,Mit) := F jt (Kit, eψjtLit,Mit), F
jt (·) is a twice continuously differentiable, strictly
increasing, and strictly concave function. Hit := eψjtLit is the labour input in effective unit of labour,
where eψjt ∈ Iit represents the quality of workers for the j-th types of firms relative to other types of
firms with∑J
j=1 πjeψ
jt = 1. (c) The average wage of workers is given by Wit = evit+ζitPH,tHit/Lit =
eψjt+vit+ζitPH,t, where evit+ζitPH,t is the wage per effective unit of labour which is given to firm i at
time t, where PH,t is common across firms.
Assumption 2. (a) (ωit, vit) ∈ Iit. For the j-th type, ωit ∈ Iit follows an exogenous first order
stationary Markov process given by
ωit = hj(ωit−1) + ηit (2)
where, conditional on Iit−1, ηit and vit are mean-zero i.i.d. random variables on R with the proba-
bility density functions gjη(·) and gjv(·), respectively. Furthermore, the unconditional expectation of
ωit is zero, i.e., Ej [ωit] = 0. (b) (εit, ζit) 6∈ Iit so that (εit, ζit) is not known when Lit and Mit are
chosen. For the j-th type, conditional on Iit, (εit, ζit) is a mean-zero i.i.d. random variable on R2
4
with the probability density function gjεζ,t(·).
Assumption 3. (a) Kit ∈ Iit but Kit 6∈ Iit−1. (b) the conditional distribution of Kit given It−1 is
type specific and only depends on Kit−1 and ωit−1, i.e., Pt(Kit|It−1, Di = j) = P jt (Kit|Kit−1, ωit−1).
Assumption 4. (a) Mit and Lit are chosen at time t by maximizing the expected profit conditional
on Iit as
(Mit, Lit) = (Mjt (Kit, ωit),Ljt (Kit, ωit, vit))
:= argmax(M,L)∈M×L
PY,tEj [eεit |Iit]eωitF jt (Kit, L,M)− PM,tM − Ej [eζit |Iit]eψ
jt+vitPH,tL.
(b) Mit is a type-specific deterministic function of Kit and ωit that can be written as Mit =
Mjt (Kit, ωit), where Mj
t is strictly increasing in ωit for any Kit. (c) Lit is a type-specific deter-
ministic function of Kit, ωit, and vit that can be written as Lit = Ljt (Kit, ωit, vit), where Ljt is
strictly decreasing in vit for any (Kit, ωit).
Assumption 5. (a) A firm is a price taker. (b) The intermediate input price PM,t and the output
price PY,t at time t are common across firms. (c) (PM,t, PY,t, PH,t) ∈ Iit and (PM,t, PY,t) is known
to an econometrician.
In Assumption 1, as indicated by the subscript t in F jt (·), type-specific production function
could be different across periods because of type-specific aggregate shocks or type-specific biased
technological changes. The quality of workers also differ across types and periods as captured by the
parameter ψjt , which leads to the systematic difference in the average wage of workers across types.
The restriction∑J
j=1 πjeψ
jt = 1 is necessary for identification of PH,t. The firms are subject to
productivity shocks and wage shocks represented by ωit+ εit and vit+ζit, respectively. Assumption
2 assume that (ωit, vit) is known when Lit and Mit are chosen while (εit, ζit) is not known when
Lit and Mit are chosen. The presence of i.i.d. wage shock vit provides an additional source of
variation for Lit beyond ωit and Kit; consequently, Lit and Mit are not collinear, preventing the
identification problem discussed by Bond and Sderbom (2005) and ACF. The assumption that
Ej [ωit] = 0 is necessary for identification because the production function, F jt (·), differs across
times.
Assumption 3(a) assumes that Kit is determined at time t − 1 so that (ηit, ωit, vit) is not
known when Kit is chosen. Assumption 3(b) can be justified by explicitly considering the dy-
namic model of investment decisions. Assumption 4(b) is a consequence of the strict concavity
assumption in Assumption 1, implying that there exists one-to-one relationship between (Mit, Lit)
and (ωit, vit) conditional on the value of Kit. We may write ωit = Mjt
−1(Kit,Mit) and vit =
Ljt−1
(Kit,Mjt
−1(Kit,Mit), Lit), where Mj
t
−1and Ljt
−1are inverse functions of Mj
t and Ljt with
respect to ωit and vit, respectively.
Under Assumption 5(b), the intermediate input price PM,t cannot be used for instrumentingMit;
when intermediate prices are exogenous and heterogenous across firms, production function could
5
be identified using the intermediate input prices as instruments (see Doraszelski and Jaumandreu,
2014). In Assumption 5(c), we may alternatively assume that a firm is subject to idiosyncratic
price shock ξit such that, for example, PY,it = exp(ξit)PY,t with ξit 6∈ Iit, then ξit plays the similar
role to εit. We may assume that (PM,t, PY,t) is not known to econometrician by treating PM,t/PY,t
as parameters to be estimated; in such a case, we may identify the production function up to scale.
Under Assumptions 1-5, the information set Iit is given by Iit = {ωit, vit,Kit, PH,t, PM,t, PY,t, Vit−1, Vit−2, ...},where Vit = {ζit, εit, ωit, vit,Kit, PH,t, PM,t, PY,t}.
Let gε,t(ε) :=∫gεζ,t(ε, ζ)dζ and gζ,t(ζ) :=
∫gεζ,t(ε, ζ)dε. Under Assumptions 1, 2, 3(a), 4(a),
and 5, the first order conditions with respect to Mit and Lit give
PY,tFjM,t(Xit)E
jt (e
ε)eωit = PM,t, PY,tFjL,t(Xit)E
jt (e
ε)eωit = PH,tEjt (e
ζ)eψjt+vit , (3)
where F jM,t(X) :=∂F jt (X)∂M , F jL,t(X) :=
∂F jt (X)∂L , Ejt [e
ε] :=∫eεgjε,t(ε)dε, and Ejt [e
ζ ] :=∫eζgjζ,t(ζ)dζ.
Equations (1) and (3) give a system of equations
lnYit = lnF jt (Xit) + ωit + εit,
lnSmit = ln(GjM,t(Xit)E
jt [e
ε])− εit,
lnS`it = ln(GjL,t(Xit)E
jt [e
ε]/Ejt [eζ ])− εit + ζit,
(4)
where
Smit :=PM,tMit
PY,tYit, S`it :=
WitLitPY,tYit
, GjM,t(Xit) :=F jM,t(Xit)Mit
F jt (Xit), and GjL,t(Xit) :=
F jL,t(Xit)Lit
F jt (Xit).
In place of Assumption 5, we may alternatively consider the case where firms produce differen-
tiated products and face a demand function with constant price elasticity as follows.
Assumption 6 (Constant Demand Elasticity). (a) A firm faces an inverse demand function with
constant elasticity given by PY,it = Y−1/σjYit eε
jd,it, where εd,it /∈ Iit is an i.i.d. ex-post shock that is
not known when Mit is chosen at time t. (b) A firm is a price taker for intermediate and labour
inputs and the intermediate and labour input prices at time t, PM,t and PL,t, are common across
firms. (c) (PL,t, PM,t, PY,t) ∈ I`it ⊂ Iit. (d) PY,it and Yit are not separately observed in the data.
Under Assumption 6, the “revenue” production function is given by PY,itYit = F jt (Xit)eωit+εit ,
where F jt (Xit) := [F jt (Xit)]
σjY
−1
σjY , ωit :=
σjY −1
σjYωit, ζit :=
σjY −1
σjYζit, and εit := εdit +
σjY −1
σjYεit. Then, in
place of (4), we have
lnPY,itYit = ln F jt (Xit) + ωit + εit,
lnSmit = ln(GjM,t(Xit)E
jt [e
ε])− εit,
lnS`it = ln(GjL,t(Xit)E
jt [e
ε]/Ejt [eζ ])− εit + ζit,
(5)
6
where GjM,t(Xit) :=F jM,t(Xit)Mit
F jt (Xit)and GjL,t(Xit) :=
F jL,t(Xit)Lit
F jt (Xit). When PY,it and Yit are not separately
observed in the data, the observable implication of (5) are the same as that of (4). In particular, we
cannot separately identify the parameter σjY and the production function F jt . Therefore, we focus
on the identification analysis under Assumption 5 although we should be careful in interpreting the
empirical result because the unobserved heterogeneity in revenue production function could partly
reflect in difference in demand elasticity.
3 Nonparametric identification
In this section, we establish the non-parametric identification of production functions with unob-
served heterogeneity using the second and third equations of (4) as an additional restriction. For
notational brevity, we drop the subscript i in this section and denote St = (Smt , S`t ). Note that, by
definition of S`t and Smt , we have Wt =S`tPM,tMt
Smt Ltand Yt =
PM,tMt
Smt PY,tso that the value of Wt and Yt
is known given (St, Xt) under Assumption 5. Therefore, we consider {St, Xt}Tt=1 as our data. Let
Zt := (St, Xt) ∈ S × X .
We first establish the nonparametric identification of model structures when J = 1 as follows.
Proposition 1. Suppose that J = 1 and Assumption 1-5 holds with T ≥ 3. Then, (a) θ1 :=
{gv(·), gεζ,t(·), GM,t(·), GL,t(·), PH,t}Tt=1 is uniquely determined from P({Zt}Tt=1). (b) θ2 := {{Ft(·)}Tt=2, h(·), gη(·)}is uniquely determined from P({Zt}Tt=1) and θ1.
Remark 1. Proposition 1 extends the identification result of GNR to the setting where Lit is
contemporaneously determined rather than predetermined.
When J ≥ 2, the distribution of {Zt}Tt=1 follows an J-term mixture distribution
P({Zt}Tt=1) =J∑j=1
πjPj1(Z1)T∏t=2
Pjt (Zt|{Zt−s}t−1s=1). (6)
Proposition 2. Suppose that Assumptions 1-5 hold. Then, the distribution of {Zt}Tt=1 defined in
(6) can be written as
P({Zt}Tt=1) =J∑j=1
πj
(Pj1(S1|X1)
T∏t=2
Pjt (St|Xt)
)×
(Pj1(X1)
T∏t=2
Pjt (Xt|Xt−1)
). (7)
Therefore, {Zt}Tt=1 follows a first order Markov process within subpopulation specified by type.
The result of Proposition 2 allows us to establish the nonparametric identification of {πj , {Pjt (Zt)}Tt=1}Jj=1
by extending the argument in Kasahara and Shimotsu (2009) and Hu and Shum (2012).
Assumption 7. Let Wt be the support of Wt. For every (z2, z3) ∈ Z2 ×Z3, there exists (z2, z3) ∈Z2 × Z3, (a1, ..., aJ) ∈ ZJ1 and (b1, ..., bJ−1) ∈ ZJ−1
4 such that (a) Lz3, Lz3, Lz2, and Lz2 defined
7
in (33) are nonsingular, (b) P j(Z3 = z3|Z2 = z2) 6= 0 and P j(Z3 = z3|Z2 = z2) 6= 0 hold for
j = 1, ..., J , and (c) all the diagonal elements of Dz2,z2,z3,z3 defined in (34) take distinct values.
Proposition 3. Suppose that Assumptions 1-5, and 7 hold and T ≥ 4. Then,
{πj ,Pj1(Z1), {Pjt (Zt|Zt−1)}Tt=2}Jj=1 is uniquely determined from P({Zt}Tt=1).
Remark 2. Under the additional assumption of the stationarity, i.e., Pjt (Zt|Zt−1) = Pj(Zt|Zt−1)
for t = 2, ..., T , Kasahara and Shimotsu (2009) establishes the nonparametric identification of the
model (7) when T = 6 while Hu and Shum (2013) shows that T = 4 suffices for identification.
Remark 3. Considering serially correlated continuous unobserved variables {X∗t }, Hu and Shum
(2013) analyze the nonparametric identification of the model
P({Zt}Tt=1) =
∫P1(Z1, X
∗1 )
T∏t=2
Pt(Zt, X∗t |Zt−1, X
∗t−1)d({X∗t }Tt=1).
Given the panel data {Zt}Tt=1 with T = 5, Theorem 1 and Corollary 1 of Hu and Shum (2013) state
that, under their Assumptions 1-4, P3(Z3, X∗3 ), P4(Z4, X
∗4 |Z3, X
∗3 ), and P5(Z5, X
∗5 |Z4, X
∗4 ) are non-
parametrically identified but the identification of P1(Z1, X∗1 ), P2(Z2, X
∗2 |Z1, X
∗1 ), and P3(Z3, X
∗3 |Z2, X
∗2 )
remains unresolved. Our Proposition 3 shows that, for a model in which unobserved heterogeneity
is discrete and finite, we can nonparametrically identify the type-specific distribution of {Zt}Tt=1
including the first two periods of the data from T = 4 periods of panel data without imposing
stationarity.
Remark 4. Assumption 7 assumes the rank condition of matrices Lz3, Lz3, Lz2, and Lz2 defined
in (33), of which elements are constructed by evaluating Pj4(Z4|Z3) and πjPj2(Z2|Z1)Pj1(Z1) at dif-
ferent points. These conditions are similar to the assumption stated in Proposition 1 of Kasahara
and Shimotsu (2009). Please refer to Remark 2 of Kasahara and Shimotsu (2009) for their inter-
pretations. One needs to find only one pair of values (Z2, Z3) ∈ Z2 ×Z3 and one set of J − 1 and
J points of Z1 and Z4 to construct nonsingular Lz3, Lz3, Lz2, and Lz2 for each (Z2, Z3) ∈ Z2×Z3
and these rank conditions are not stringent when Wt has continuous support. The identification
of P j4 (Z4|Z3 = Z3) and πjP j2 (Z2 = Z2|Z1)P j1 (Z1) at all other points of Z4 and Z1, respectively,
follows without any further requirement on the rank condition.
Once the type-specific distribution of {Zt} is identified, we may apply the argument in the proof
of Proposition 1 to prove the nonparametric identification for each type’s model structure.
Proposition 4. Suppose that Assumptions 1-5, and 7 hold and T ≥ 4. Then, (a) θ1 :=
{πj , gjv(·), {gjεζ,t(·), , GjM,t(·), G
jL,t(·), PH,t, ψ
jt }Tt=1}Jj=1 is uniquely determined from P({Zt}Tt=1). (b)
θ2 := {{{F jt (·)}Tt=1}Jj=2, hj(·), gjη(·)} is uniquely determined from P({Zt}Tt=1) and θ1.
Therefore, type-specific production functions as well as the distribution of unobserved variables
can be non-parametrically identified. In estimation, we focus our attention to the case where
type-specific function is given by Cobb-Douglas production function with random coefficients.
8
Example 1 (Random Coefficients Model). Consider a Cobb-Douglas production function with
time-varying random coefficients:
lnF jt (Xt) = βj0,t + βjk,tkt + βj`,t`t + βjm,tmt, (8)
where the intermediate and labor cost share equations are given by
lnSmt = ln(βjm,t) + lnEjt [eε]− εt,
lnS`t = ln(βj`,t) + ln(Ejt [e
ε]/Ejt [eζ ])− εt + ζt.
Under Assumptions 1-5, and 7, {πj , hj(·), gjη(·), gv(·), {βj`,t, βjm,t, g
jε,t(·), g
jζ,t(·), ψ
jt }4t=1, {β
j0,t, β
jk,t}
4t=2}
for j = 1, ..., J is nonparametrically identified from the panel data {St, Xt}4t=1.
In the appendix, we discuss the conditions under which Assumption 7 holds when the production
function is Cobb-Douglas. The following corollary shows that type-specific distribution of St can
be identified from the joint distribution of {St}Tt=1 for Cobb-Douglas specification.
Corollary 1. Suppose that Assumptions 1-5, and 7 hold and T ≥ 4. Suppose that production
function is Cobb-Douglas given by (8). Then, {πj , {Pjt (St)}Tt=1}Jj=1 is uniquely determined from
P({St}Tt=1).
4 Estimation of production function with random coefficients
We consider a random sample of N independent observations {{Yit, Xit, Sit,Wit}Tt=1}Ni=1 from the
J-component mixture model∑J
j=1 πjPjt ({Yit, Xit, Sit,Wit}Tt=1) =
∑Jj=1 π
jPjt ({Sit, Xit}Tt=1).
We impose the following parametric assumptions for estimation.
Assumption 8. (a) Assumption 1 holds with
Yit = F jt (Kit, Lit,Mit)eωit+εit with F jt (Kit, Lit,Mit) = exp(βj0,t + βjkkit + βj` `it + βjmmit). (9)
(b) Assumption 2 holds with
(εit
ζit
)∣∣∣∣∣Di = jd∼ N
((0
0
),
((σjε )2 ρjεζσ
jεσ
jζ
ρjεζσjεσ
jζ (σjζ)
2
)).
In (9), because logHit = ψjt + `it, the parameter βj0,t contains β`ψjt and captures the differ-
ence in the quality of workers across types. The normality assumption in Assumption 8(b) can
be potentially relaxed, for example, using the maximum smoothed likelihood estimator of finite
mixture models of Levine et al. (2011) in which the type-specific distribution of εit and ζit is non-
parametrically specified. Kasahara and Shimotsu (2015) develop a likelihood-based procedure for
testing the number of components in normal mixture regression models.
Denote the log values of (Yit, Lit,Kit,Mit, Smit , S
`it,Wit) by (yit, `it, kit,mit, s
mit , s
`it, wit) and let
sit := (s`it, smit ) and xit := (`it, kit,mit). Under Assumptions 3-5, 8, the first order conditions for the
9
expected profit maximization imply that
smit = lnβjm + 0.5(σjε )2 − εit and s`it = lnβj` + 0.5
{(σjε )
2 − (σjζ)2}− εit + ζit. (10)
We propose two different estimation procedures. The first procedure directly maximizes the log-
likelihood function of a finite mixture model of production functions under additional parametric
assumptions on the law of motion for kit and the initial distribution of (kit, ωit), where the likelihood
function is a parametric version of (7). Because the maximum likelihood estimator utilizes the
distributional information, it is consistent even when T is small as long as T ≥ 4. Our estimation
procedure follows the two-stage identification proof of Proposition 3. The EM algorithm can be
used to facilitates the computational complication of maximizing the log-likelihood function of the
finite mixture model.
In the second procedure, we first estimate the partial likelihood function of the input share
equations (10) under the normality assumption and use the posterior distribution of type probabil-
ities to classify each firm observation into one of the J types under the assumption that T → ∞.
This generates J data sets, where a firm’s production technology becomes increasingly homoge-
nous within each of the J data sets as T → ∞. In the second stage, we estimate the rest of the
type-specific parameters by using each of J data sets.1
The first procedure can consistently estimate the parameter even when T is small as long
as T ≥ 4 and N → ∞ but it is computationally more complicated and requires more auxiliary
parametric assumptions than the second one. We introduce the second procedure because it is
computationally much simpler than the first one although, when T is small, the second procedure
leads to a biased estimator due to misclassification of types.
4.1 Maximum likelihood estimator
We make the parametric distributional assumptions and develop parametric maximum likelihood
estimator.
Assumption 9. (a) T is fixed at T ≥ 4 and N →∞. (b) Assumption 2 holds with hj(ωit) = ρjωωit
so that
ωit = ρjωωit−1 + ηit, (11)
gjη(η) = φ(η/σjη)/σjη, and gjv(v) = φ(v/σjv)/σ
jv. (c) Assumption 3 holds with the additional assump-
tion that, conditional on being type j, kit given (kit−1, ωit−1) is normally distributed with mean
ρjk0 + ρjkkkit−1 + ρjkωωit−1 and variance (σjk)2 while the distribution of (ki1, ωi1) follows a bivariate
normal distribution with mean µj1 and variance Σj1.
1Note that the identification of production function immediately follows from T → ∞ without appealing toProposition 3 because, in principle, each firm’s production function can be identified from the time-series data ofeach firm.
10
Collect the model parameters into θ1, and θ2 as follows. Let
θ1 = (π′, θ11, ..., θ
J1 )′, θ2 = ((θ1
2)′, ..., (θJ2 )′)′, and θj = ((θj1)′, (θj2)′)′ where
θj1 = (βjm, βj` , (σ
jε )
2, (σjζ)2)′ and θj2 = (βj2, ..., β
jT , β
jk, (µ
j1)′, vech(Σj
1)′, ρjk0, ρjkk, ρ
jkω, σ
2k, ρ
jω, σ
jη)′.
In view of Proposition 2 and equation (10), we may write the probability density function of
{sit, xit}Tt=1 for type j as
fjt ({sit, xit}Tt=1) =
T∏t=1
fjt (sit; θj1)︸ ︷︷ ︸
=L1i(θj1)
× fj1(xi1; θj)
T∏t=2
fjt (xit|xit−1; θj)︸ ︷︷ ︸=L2i(θ
j1,θ
j2)
,(12)
where the exact expression for L1i(θj1) and L2i(θ
j2, θ
j1) is derived below.
Given the decomposition (12), we estimate the model by two-stage maximum likelihood estima-
tion procedure. In the first stage, we estimate π and θ1 by maximizing∑N
i=1 log(∑J
j=1 πjL1i(θ
j1))
over π and θ1. In the second stage, we estimate θ2 given the first stage estimate π and θ1 by
maximizing∑N
i=1 log(∑J
j=1 πjL1i(θ
j1)L2i(θ
j1, θ
j2)) over θ2.
From equation (10), we can compute εit and ζit as
ε∗(sit; θj1) := −smit + lnβjm + 0.5(σjε )
2 (13)
ζ∗(sit; θj1) := s`it − smit − ln(βj`/β
jm) + 0.5(σjζ)
2. (14)
In the first stage, we estimate θ1 by the maximum likelihood estimator given by
θ1 = argmaxθ1
N∑i=1
ln
J∑j=1
πjL1i(θj1)
with
L1i(θj1) :=
T∏t=1
1√1− (ρjεζ)
2σjεσjζ
φ
(ε∗(sit; θ
j1)
σjε
)φ
ζ∗(sit; θj1)− ρjεζ(σjζ/σ
jε )ε∗(sit; θ
j1)√
1− (ρjεζ)2σjζ
.
In the second stage, from (9), εt = E[smt |xt]− sm, and yt + smt = mt + ln(PM,t/PY,t), we have
ωit = ω∗t (mit, `it −mit, kit; θj) := (1− βjm − β
j` )mit − βjt − β
j` (`it −mit)− αjkkit, (15)
where βjt = β0,t − ln(PM,t/PY,t). Furthermore, because vit = wit − (ψjt + lnPH,t + ζ∗(sit; θj1)) and
s`it − smit = (wit + `it)− (lnPM,t +mit), we have
vit = v∗t (`it −mit; θj) := −(`it −mit + ψjt − ln(βj`/β
jm) + 0.5(σjζ)
2), (16)
where ψjt := ψjt + ln(PH,t/PM,t).
11
By the change of variables in equation (15), we can relate the density function of mit conditional
on `it − mit and kit to the density function of ωit, denoted by gω,t, as fjt (mit|`it − mit, kit) =
(1−βjm−βj` )gjω,t(ω
∗t (mit, `it−mit, kit; θ
j)). Similarly, we can relate the density function of `it−mit
to that of vit. Then, from (15)-(16) and Assumptions 2-3, we have
fj1(mi1|`i1 −mi1, ki1; θj) = (1− βjm − βj` )g
jω|k,1(ω∗i1(θj)|ki1), (17)
fjt (mit|`it −mit, kit, xit−1; θj) = (1− βjm − βj` )g
jη(η∗it(θ
j)) for t ≥ 2, (18)
fjt (`it −mit|kit, xit−1; θj) = fjt (`it −mit|kit; θj) = gjv(v∗it(θ
j)) for t ≥ 1, (19)
fjt (kit|xit−1; θj) = gjk,t(kit|kit−1, ω∗i,t−1(θj)) for t ≥ 2, (20)
where gjω|k,1(ωi1|ki1) is the density function of ωi1 conditional on ki1, gjk,t(kit|kit−1, ωit−1) is the
density function of kit given (kit−1, ωit−1), ω∗it(θj) := ω∗t (mit, `it −mit, kit; θ
j), v∗it(θj) := v∗t (`it −
mit; θj), and
η∗it(θj) := ω∗it(θ
j)− ρjωω∗i,t−1(θj). (21)
Therefore, under Assumption 9, it follows from (12) and (17)-(20) that
L2i(θj) = fj1(mi1|`i1 −mi1, ki1; θj)fj1(`i1 −mi1|ki1; θj)fj1(ki1; θj)
×T∏t=2
fjt (mit|`it −mit, kit, xit−1; θj)fjt (`it −mit|kit, xit−1; θj)fjt (kit|xit−1; θj)
=
(T∏t=1
gjv(v∗it(θ
j))
)×
((1− βjm − β
j` )T gjωk,1(ω∗i1(θj), ki1)
T∏t=2
gjη(η∗it(θ
j))gjk,t(kit|kit−1, ω∗i,t−1(θj))
),
where
gjv(v∗it(θ
j)) =1
σjvφ
(v∗it(θ
j)
σjv
), gjη(η
∗it(θ
j)) =1
σjηφ
(η∗it(θ
j)
σjη
),
gjωk,1(ω∗i1(θj), ki1) = (2π)−3/2|Σj1|−1/2 exp
(−1
2
((ki1
ω∗i1(θj)
)− µj1
)′(Σj
1)−1
((ki1
ω∗i1(θj)
)− µj1
)),
gjk,t(kit|kit−1, ω∗i,t−1(θj)) =
1
σjkφ
(kit − (ρjk0 + ρjkkkit−1 + ρjkωωit−1)
σjk
).
Given the first stage estimate θ1, the parameter π and θ2 can be estimated by maximizing the
log-likelihood function as
(π, θ2) = argmaxπ,θ2
N∑i=1
log
J∑j=1
πjL1i(θj1)L2i(θ
j1, θ
j2)
.
In practice, we use EM algorithm to estimate θ1, θ2, and π as discussed in the Appendix.
12
4.2 Estimation by classifying each observation into one of the J types
Given the first stage estimate θ1, define the posterior probability of being type j for each firm i by
πji =πjL1i(θ
j1;T )∑J
k=1 πkL1i(θk1 ;T )
for j = 1, ..., J , (22)
where we explicitly write the dependence of the likelihood on the length of panel data T in
L1i(θk1 ;T ). We classify each firm into one of the J types by taking the type that gives the highest
posterior probability as its type. Then, for each i, our estimator of Di is given by
Di = argmaxj=1,..,J
{πji }.
Denote the true value of θj1 by θj∗1 . We assume that T → ∞ but require that T goes to ∞ at
much slower rate than N .
Assumption 10. N,T →∞ and√N
exp(ajT )/√T→ 0 for j = 1, ..., J , where aj = mink 6=j E[lnL1it(θ
j∗1 )−
lnL1it(θk∗1 )|i ∈ Ij ] > 0.
Proposition 5. For each i ∈ Ij, πji − 1 = op(N−1/2) under Assumption 10.
Proposition 5 implies that, when Assumption 10 holds, the possible classification error across
types does not affect our inference.
In the second stage, we compute the estimate of ηjit for t = 2, ..., T for each a candidate value of
θj2 given the first stage estimate θj1 as in (21) using the subsample of firms for which Di = j. Then,
stacking the moment conditions implied by E[η∗it(θj1, θ
j2)|kit, xit−1] = 0 for t = 2, ..., T , we can use
standard GMM procedure to estimate θj2 as
θj2 = argminθ2
1
#{i : Di = j}
∑i∈{i:Di=j}
gi(θ2)
1
#{i : Di = j}
∑i∈{i:Di=j}
gi(θ2)
′ for j = 1, ..., J ,
where #{i : Di = j} is the number of firms with Di = j while gi(θ2) := (η∗i2(θj1, θj2)Zi2(θj1, θ
j2)′, ...,
η∗iT (θj1, θj2)ZiT (θj1, θ
j2)′)′ with Zit(θ
j1, θ
j2) := (1, kit, ω
∗it−1(θj1, θ
j2))′.
5 Empirical applications
5.1 Data
We use Japanese publicly traded manufacturing firms, 1980-2007. The data set compiled by the
Development Bank of Japan (DBJ) contains detailed corporate balance sheet/income statement
data for the firms listed on the Tokyo Stock Exchange.2 The initial value of capital (K) is defined
2Because firm’s financial data do not necessarily refer to a calendar year, we assign year t to an observation if thegiven firm’s closing date is between June of year t and May of year t + 1. If firms change their closing dates, the
13
as fixed asset less land from the firm’s balance sheet and the subsequent values of capital are
constructed by perpetual inventory method. The labor input (L) is the number of employees. The
intermediate input (M) is defined as the sum of energy input, material input, transportation cost,
outsourcing cost, and changes in input inventories. The output (Y ) is defined as the value of total
sales plus the changes in inventories of finished goods. The machine investment rate (Im,itKm,it
) is
defined as the ratio of machine investment to machine capital stock. In this preliminary version,
we focus on a sample from Machine industry. Table 1 presents summary statistics for the variables
we use in our empirical analysis.
Table 1: Summary statistics
Statistic N Mean St. Dev. Min Max
lnYit 5602 17.108 1.368 12.191 21.785lnMit 5602 16.314 1.472 9.003 21.306lnLit 5602 6.647 1.189 2.890 10.978lnKit 5602 15.926 1.415 12.223 21.328
lnPM,tMit
PY,tYit5602 -0.777 0.510 -6.930 1.708
IitKit
5602 0.100 0.151 -0.491 2.849
5.2 Evidence for unobserved heterogeneity
In the data, the material share is heterogenous across firms and persistent over time within firm.
Figure 1 presents the histogram ofPM,tMit
PY,tYitacross all observations that belongs to Machine industry,
which shows a large variation in material shares. In the model, the variation in material shares is
coming from idiosyncratic ex-post shocks εit. We may eliminate most of idiosyncratic components
by considering the firm-level average of material shares over 28 years; however, in Figure 2, the
persistent component of the ratio of intermediate inputs to total sales substantially varies across
firms. As shown in Figures 3 and 4, we also observe a large variation in the persistent component
in the ratio of intermediate cost to the sum of intermediate cost and total wage bills,PM,tMt
PM,tMit+WtLit,
of which variation is not likely to be driven by a variation in markups.
Figure 5 plots each firm’s material share, output, and inputs from 1980 to 2007. The presence
of heterogeneity across firms and the persistence within each firm in materials shares are apparent
in the upper left panel of Figure 5. It also appears that labour and capital inputs are changing over
time more smoothly than material input, suggesting that material input responds to idiosyncratic
shocks more than labour and capital inputs do within a short period of time. As shown in Figure
6, the heterogeneity in material shares across firms do not disappear even when we examine firms
within the subindustries of Machine industry, which roughly corresponds to 4-digit ISIC.
data after the change may refer to less than 12 months. When it occurs, we multiply the data xit by 12/m where mrepresents the number of months to which the data refer.
14
In view of the intermediate share equation in (4), a large cross-sectional variation in the per-
sistent component of the ratio of intermediate inputs to total sales suggests either heterogeneity in
production function or the persistence in inputs over time. To examine further, we regress lnSit
on the third order polynomials of (lit, kit,mit) to get residuals, denoted by eit, and decompose eit
into permanent components and idiosyncratic components as ξi := T−1∑T
t=1 eit and ζit := eit− ξi.Comparing the variance of ξi with that of ζit, we found that Var(ξi)
Var(ξi)+Var(ζit)= 0.612. Therefore, the
majority of variation is coming from the permanent component even after controlling the observed
input (lit, kit,mit), suggesting that production function is heterogeneous beyond Hicks-neutral term.
We also estimated the value added Cobb-Douglas specification
yvait = αvat + αva` `it + αvak kit + ωvait + εvait ,
by the approach developed by Levinsohn and Petrin, where yvait is the logarithm of value added.
When we compute the serial correlation of estimated values of εvait , we found that the correlation
coefficient of 0.85.3 One possible reason for this high correlation of estimated values of εvait is the
presence of unobserved heterogeneity in (αvat , αva` , α
vak ).
5.3 Estimation of production function
Given the relatively long length of our panel data, we apply our proposed estimation method based
on classifying each firm into one of the J types. Table 2 presents the parameter estimates for the
number of components equal to J = 1, 3, and 5. Setting J = 1 gives the homogenous production
function specification considered by GNR.
The estimated coefficients across different types when J = 3 and 5 suggest that there are
substantial differences in the output elasticities with respect to materials, labor, and capital across
firms. For the model with J = 3, the material share is lowest for Type 1 and highest for Type 3
while Type 1 is more labor intensive than Type 2 or 3. For the model with J = 5, the material
share of Type 1 is the highest while the material share of Type 2 is the lowest among three types.
The degree of capital intensity is also different across five types, where Type 5 is the most capital
intensive while Type 2 is the most labor intensive.
Figure 7 shows the distribution of posterior type probabilities, defined in (22), across firms for
the model with J = 3 and 5, respectively. The posterior probabilities for each type are concentrated
on around 0 or 1, which is consistent with the result of Proposition 5 where Assumption 10 could
be roughly applied here given T = 28 in our data set. We assign one of the J types to each firm
based on its posterior type probability that achieves the highest value across J types.
Figure 8 plots each firm’s material share and the log of output from 1980 to 2007, where different
colours represent different types for the model with J = 3. From the left panel of Figure 8, it is
clear that each firm’s type is identified with its average material share. On the other hand, it does
3Using the OP approach with the value-added specification, Fox and Smeets (2007) also report the high serialcorrelation of estimated values of idiosyncratic shocks.
15
Table 2: Estimates of Production Function (9): Machine Industry in Japan, 1980-2008
Estimation by ClassificationGNR Random Coefficients ModelJ = 1 J = 3 J = 5
Type 1 Type 2 Type 3 Type 1 Type 2 Type 3 Type 4 Type 5
βjm 0.340 0.623 0.184 0.438 0.112 0.642 0.387 0.267 0.509
βj` 0.422 0.215 0.845 0.416 0.954 0.200 0.577 0.611 0.357
βjk 0.260 0.162 0.134 0.195 0.097 0.154 0.089 0.230 0.162
βjm + βj` + βjk 1.021 1.000 1.163 1.049 1.163 0.995 1.054 1.109 1.028
βjk/βj` 0.617 0.753 0.159 0.470 0.102 0.772 0.155 0.376 0.455
πj 1.000 0.467 0.200 0.333 0.082 0.373 0.155 0.113 0.277
No. of Obs. 5602No. of firms 240
not appear that there is any systematic difference across types in terms of the distribution of firm
sizes measured by outputs.
Table 3 shows a fraction of firms belonging to each type of the J types within subindustries
of Machine industry for the model with J = 3 and 5. See Figure 6 for the definition of subindus-
tries. The distribution of types is quite different across subindustries. Figure 9 plots each firm’s
material share from 1980 to 2007 where different colours identify firm’s type for the model with
J = 3, suggesting that our framework flexibly captures the unobserved heterogeneity in production
technology within more narrowly defined subindustries.
The first two rows of Table 4 present the standard deviations of ωit + αjt and εit. To compute
the standard deviations of ωit + αjt and εit, we compute ωit, ξi, and εit by assigning one of the
J = 3 types to each firm based on its posterior type probabilities. As the number of components
J increases from 1 to 3, and then to 5, the standard deviations of ωit + αjt and εit within each
type decrease on average across types. This indicates a possibility of substantial upward bias in
the estimated variation in ex-post shocks in homogenous model with J = 1 as a result of ignoring
unobserved heterogeneity.
The third row of Table 4 reports the serial correlation in εit. The serial correlation in εit is very
high at 0.951 when J = 1. Given that the presence of high serial correlation indicates a possibility
of misspecificaiton of the model, a smaller value of serial correlation is more desirable. When the
number of components increases from J = 1 to J = 3, and then to J = 5, the average serial
correlation in εit across types decreases from 0.951 to 0.762, and then to 0.693, indicating that the
very high serial correlation in εit when J = 1 is partly due to ignoring unobserved heterogeneity
in production function coefficients. On the other hand, the level of serial correlation in εit is still
high at 0.693 when J = 5. To examine this issue, we also consider a more flexible model in which
the material share parameter is firm-specific. In this case, the material share equation is given by
firm-fixed effects model: sit = αm,i − εit. The estimated serial correlation in εit for this firm-fixed
16
Table 3: A fraction of firms for each type by subindustry for the model with J = 3 and 5
J = 3 J = 5Type 1 Type 2 Type 3 Type 1 Type 2 Type 3 Type 4 Type 5
2511 0.890 0.000 0.110 0.000 0.890 0.110 0.000 0.0002521 0.734 0.122 0.144 0.045 0.614 0.000 0.077 0.2642522 0.599 0.288 0.114 0.220 0.379 0.000 0.068 0.3332523 0.120 0.776 0.104 0.000 0.120 0.104 0.776 0.0002529 0.000 0.417 0.583 0.000 0.000 0.583 0.417 0.0002531 0.564 0.178 0.258 0.000 0.195 0.000 0.178 0.6282532 0.231 0.391 0.378 0.338 0.102 0.120 0.053 0.3872533 0.481 0.000 0.519 0.000 0.481 0.193 0.000 0.3262534 0.762 0.140 0.098 0.140 0.738 0.000 0.000 0.1222535 0.377 0.449 0.174 0.247 0.237 0.130 0.154 0.2332536 0.506 0.090 0.404 0.008 0.364 0.138 0.082 0.4082537 0.462 0.120 0.418 0.071 0.399 0.165 0.049 0.3162541 0.154 0.215 0.631 0.046 0.133 0.471 0.170 0.181
Notes: Subindustries are 2511: Boiler prime mover, 2521: Metal machine tools, 2522: Metalworking machinery, 2523: Machinery tool, 2531: Textile machinery, 2532: Agricultural machines,2533: Construction and mining equipment, 2534: Chemical machinery, 2535: Office machinery,2536: Special industrial machinery, 2537: General industrial machinery, 2541: General MechanicalComponents.
17
effects model remains quite strong at 0.773, suggesting that the material share parameter could
change over time persistently within the same firm.
Ignoring unobserved heterogeneity may lead to substantial biases in the measurement of produc-
tivity growth. To examine this issue, we take a specification with J = 5 as the true model and com-
pute the bias in the measurement of productivity growth when we use a misspecified model with J =
1. Specifically, let ∆ωit := ∆yit−(αjt+αjm∆mit+α
j`∆`it+α
jk∆kit+∆εjit) for j = 1, 2, ..., 5 be an esti-
mated productivity growth when J = 5 and let ∆ωit := ∆yit−(αt+αm∆mit+α`∆`it+αk∆kit+∆εit)
be an estimated productivity growth when J = 1, where {αjt , αjm, α
j` , α
jk}
5j=1 and {αt, αjm, αj` , α
jk}
denote estimated coefficients when J = 5 and J = 1, respectively. Then, we compute the bias as
∆ωit = ∆ωit + (βm − βjm)∆mit + (β` − βj` )∆`it + (βjk − βjk)∆kit + (∆εit −∆εjit)︸ ︷︷ ︸
:=Biasit
.
Table 5 reports the ratio of the average absolute value of bias to the average productivity growth
within each of five subsamples, classified by types. The magnitude of the bias is high on average and
substantially different across different types. In particular, the bias in the measured productivity
growth when J = 1 is larger than 40 percent of the average productivity growth for Type 1
and 2, indicating that ignoring unobserved heterogeneity could result in serious bias in estimated
productivity growth.
As an example of using the estimated productivity growth in empirical analysis, we now ex-
amine whether unobserved heterogeneity captured by type-specific production function parameter
is important for investment decision. Specifically, for each of subsample classified by types, we
estimate the following linear investment model
IitKit
= α0 + αωωit + quadratic of `it and kit + ζit,
where Iit/Kit denotes the ratio of investment to capital stock.
Table 6 reports the estimated coefficients of ωit across different specifications and different types
for J = 1, 3, and 5. The coefficient of ωit is estimated significantly at 0.016 when J = 1. For the
model with J = 3 and 5, the estimated coefficients of ωit are substantially different across different
types of firms. For J = 3, the coefficient of ωit for Type 2 is insignificant at -0.007 while the
coefficients of ωit are estimated significantly at 0.090 and 0.068 for Type 1 and 3, respectively. As
reported in the second and third rows of Table 6, the material shares of Type 1 and 3 are higher
than that of Type 2 while Type 1 and 3 is more capital intensive than Type 2 for the model with
J = 3. For J = 5, the coefficients of ωit are high at 0.094 and 0.083 for Type 2 and 5, respectively,
both of which have relatively higher material shares and more capital intensive technology than
other three types. Therefore, we find that the correlation between productivity and investment
is stronger among a type of firms with high material shares and high capital intensive production
technology than other types of firms.
18
Tab
le4:
Ran
dom
Coeffi
cien
tsM
od
el(9
):M
ach
ine
Ind
ust
ryin
Jap
an,
1980
-200
8
Est
imat
ion
byC
lass
ifica
tion
J=
1J=
3J=
5Type1
Type2
Type3
Ave†
Type1
Type2
Type3
Type4
Type5
Ave†
Std.Dev.ωit+αj t
0.472
0.195
0.620
0.241
0.295
0.895
0.185
0.307
0.359
0.190
0.283
Std.Dev.ε it
0.510
0.138
0.703
0.178
0.264
1.021
0.126
0.170
0.236
0.130
0.220
Corr(ε it,ε it−
1)
0.951
0.663
0.923
0.806
0.762
0.924
0.602
0.733
0.859
0.658
0.693
†Thecolumnsunder
“Ave.”
reporttheaverageacross
types
usingtheestimated
mixingproportion
sπjas
weigh
ts.
19
Table 5: Bias in the measurement of productivity: Machine Industry in Japan, 1980-2008
Estimation by ClassificationJ = 5
Type 1 Type 2 Type 3 Type 4 Type 5Mean of |Biasit|Mean of |∆ωit| 0.403 0.453 0.170 0.123 0.257
Table 6: The Effect of ωit on Investment
by Classification
J = 1 J = 3 J = 5Type 1 Type 2 Type 3 Type 1 Type 2 Type 3 Type 4 Type 5
αω 0.016 0.090 -0.007 0.068 -0.016 0.094 0.054 0.010 0.083( 0.004 ) ( 0.018 ) ( 0.006 ) ( 0.014 ) ( 0.007 ) ( 0.022 ) ( 0.014 ) ( 0.013 ) ( 0.020 )
βjm 0.340 0.623 0.184 0.438 0.112 0.642 0.387 0.267 0.509
βjk/βj` 0.617 0.753 0.159 0.470 0.102 0.772 0.155 0.376 0.455
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
References
[1] Ackerberg, Daniel, C. Lanier Benkard, Steven Berry, and Ariel Pakes (2007) “Econometric
Tools For Analyzing Market Outcomes.” In Handbook of Econometrics, vol. 6, edited by
James J. Heckman and Edward E. Leamer. Amsterdam: Elsevier, 4171-4276.
[2] Ackerberg, Daniel A., Caves, Kevin, and Garth Frazer (2006) “Structural Identification of
Production Functions,” working paper, UCLA.
[3] Arellano, M. and S. Bond (1991) “Some Tests of Specification for Panel Data: Monte Carlo
Evidence and an Application to Employment Equations,” The Review of Economic Studies
58: 277- 297
[4] Blundell, R. and Bond, S. (1998) “Initial Conditions and Moment Restrictions in Dynamic
Panel Data Models,” Journal of Econometrics, 87: 115-143
[5] Blundell, R.W. and S.R. Bond (2000) “GMM Estimation with Persistent Panel Data: an
Application to Production Functions,” Econometric Reviews, 19(3): 321-340.
[6] Bond, Stephen and Mans Sderbom (2005) “Adjustment Costs and the Identification of Cobb
Douglas Production Functions,” The Institute for Fiscal Studies, Working Paper Series No.
05/4.
[7] Basu, Susanto, and John G. Fernald (1997) “Returns to Scale in U.S. Production: Estimates
and Implications,” Journal of Political Economy, 105(2): 249-283.
20
[8] Doraszelski, Ulrich, and Jordi Jaumandreu (2014) “Measuring the Bias of Technological
Change,” working paper, University of Pennsylvania.
[9] Gandhi, A., S. Navarro, and D. Rivers (2013) “On the Identification of Production Functions:
How Heterogeneous id Productivity?,” working paper, Western University.
[10] Griliches, Z. and J. Hausman (1986) “Errors in Variables in Panel Data,” Journal of Econo-
metrics, 31:
[11] Griliches, Zvi and Jacques Mairesse (1998)“Production Functions: The Search for Identifica-
tion.” In Econometrics and Economic Theory in the Twentieth Century: The Ragnar Frisch
Centennial Symposium. New York: Cambridge University Press.
[12] Hsieh, Chang-Tai and Peter J. Klenow (2009) “Misallocation and Manufacturing TFP in
China and India.” Quarterly Journal of Economics 124 (4):1403-1448.
[13] Hu, Yingyao and Matthew Shum (2012) “Nonparametric identification of dynamic models
with unobserved state variables,” Journal of Econometrics, 171(1): 32-44.
[14] Kasahara, Hiroyuki and Joel Rodrigue (2008)“Does the Use of Imported Intermediates In-
crease Productivity? Plant-level Evidence.” Journal of Development Economics 87 (1):106-
118.
[15] Kasahara, Hiroyuki and Katsumi Shimotsu(2009) “Nonparametric Identification of Finite
Mixture Models of Dynamic Discrete Choices,” Econometrica, 77: 135–175.
[16] Kasahara, Hiroyuki and Katsumi Shimotsu (2015) “Testing the Number of Components in
Normal Mixture Regression Models,” Journal of the American Statistical Association (Theory
and Methods), forthcoming.
[17] Levine, M., Hunter, D. R., and Chauveau, D. (2011). “Maximum smoothed likelihood for
multivariate mixtures.” Biometrika, 98, 403-416.
[18] Levinsohn, James and Amil Petrin (2003) “Estimating Production Functions Using Inputs
to Control for Unobservables,” Review of Economic Studies, 70: 317-341.
[19] Marschak, J. and Andrews, W.H. (1944) “Random Simultaneous Equations and the Theory
of Production,” Econometrica, 12(3,4): 143-205.
[20] Mairesse, Jacques, and Zvi Griliches (1990) “Heterogeneity in Panel Data: Are There Stable
Production Functions?” In Essays in Honor of Edmond Malinvaud, Volume 3: Empirical
Economics, edited by Paul Champsaur et al., pp. 192-231, Cambridge, MA: MIT Press,
[21] Olley, S. and Pakes, A. (1996) “The Dynamics of Productivity in the Telecommunications
Equipment Industry,” Econometrica, 65(1): 292-332.
21
[22] Pavcnik, Nina (2002) “Trade Liberalization, Exit, and Productivity Improvements: Evidence
From Chilean Plants,” Review of Economic Studies, 69(1): 245-276.
[23] Van Biesebroeck, Johannes (2003) “Productivity Dynamics with Technology Choice: An
Application to Automobile Assembly,” Review of Economic Studies 70: 167-198.
A Appendix
A.1 Proof of Proposition 1
We drop the superscript j in this proof because J = 1. Let (s`t, smt ) = (lnS`t , lnS
mt ) and let
∆st := s`t − smt . From the definition of S`t and Smt , let lnWt(st, Xt) := ∆st + ln(Mt/Lt) + lnPM,t
and lnYt(st, Xt) := lnMt − smt + ln(PM,t/PY,t). Denote the density functions of (smt ,∆st, Xt) by
pt(smt ,∆st, Xt), which can be identified from Pt(St, Xt). Because E[smt |Xt] = ln (GM,t(Xt)Et[e
ε])
and E[∆st|Xt] = ln(GL,t(Xt)Et[e
ζ ])− ln (GM,t(Xt)Et[e
ε]), we have smt = E[smt |Xt]− εt and ∆st =
E[∆st|Xt] − ζt. Then, we may identify gεζ,t(·) as gεζ,t(ε, ζ) =∫pt(E[smt |Xt = X] − ε, E[∆st|Xt =
X] − ζt, X)dX. Similarly, because vt = lnWt(st, Xt) − E[lnWt(st, Xt)] − ζt = lnWt(st, Xt) −E[lnWt(st, Xt)]−(E[∆st|Xt]−∆st), we may identify gv(v) from the density function of (st, Xt). Fur-
thermore, from E[smt |Xt] = lnGM,t(Xt) + ln∫eεgε,t(ε)dε, we may identify GM,t(Xt) as GM,t(Xt) =
exp(E[smt |Xt]− ln
∫eεgε,t(ε)dε
). Similarly, GL,t(Xt) = exp
(E[∆st|Xt]− ln
∫eεgε,t(ε)dε+ ln
∫eζgζ,t(ζ)dζ
).
This proves part (a).
We proceed to prove part (b). Fix (L0,M0) ∈ L×M such that L0 < Lt and M0 < Mt. BecauseGL,t(Xt)
Lt= ∂ lnFt(Xt)
∂Ltand
GM,t(Xt)Mt
= ∂ lnFt(Xt)∂Mt
, we have
lnFt(Kt, Lt,Mt) =
∫ Lt
L0
GL,t(Kt, L,Mt)
LdL+
∫ Mt
M0
GM,t(Kt, L0,M)
MdM + lnFt(Kt, L0,M0). (23)
It follows from (1), (23), εt = E[smt |Xt]− sm, and lnYt + smt = lnMt + ln(PM,t/PY,t). that
ωt = yt(Xt; θ1)− lnFt(Kt, L0,M0), where (24)
yt(Xt; θ1) := lnMt + ln(PM,t/PY,t)−{∫ Lt
L0
GL,t(Kt, L,Mt)
LdL+
∫ Mt
M0
GM,t(Kt, L0,M)
MdM − E[smt |Xt]
}.
Substituting the right-hand side of (24) to ωt = h(ωt−1) + ηt and rearranging terms give
yt(Xt; θ1) = lnFt(Kt, L0,M0) + h (yt−1(Xt−1; θ1)− lnFt−1(Kt−1, L0,M0)) + ηt, (25)
where the second term on the right hand side only depends on Xt−1. Fix K0 ∈ K and let Ct :=
lnFt(K0, L0,M0). Then, from (25) and E[ηt|It−1] = 0, lnFt(Kt, L0,M0) is identified up to constant
Ct as
lnFt(Kt, L0,M0) = Ct + E[yt(Xt; θ1)|Kt, Xt−1]− E[yt(Xt; θ1)|Kt = K0, Xt−1]. (26)
22
It follows from the moment restriction E[ωt] = 0 with (24) and (26) that we may identify Ct as
Ct = E {yt(Xt; θ1)− E[yt(Xt; θ1)|Kt, Xt−1] + E[yt(Xt; θ1)|Kt = K0, Xt−1]} .
Therefore, lnFt(Kt, L0,M0) is identified from (26), and the identification of lnFt(Lt,Kt,Mt) for
t ≥ 2 follows from (23) given that the first two terms on the right hand side of (23) is identified
from and GL,t(Xt) and GM,t(Xt).
Finally, we prove the identification of gη(·) and h(·). Because ωt = yt(Xt; θ1)−lnFt(Kt, L0,M0),
we may identify the joint density function of ωt and ωt−1, denoted by pω(ωt, ωt−1), from the joint
distribution of (Xt, Xt−1) for t ≥ 3. Then, h(ωt−1) is identified as h(ωt−1) = Et[ωt|ωt−1] =∫ωtpω(ωt|ωt−1)dωt, where pω(ωt|ωt−1) = pω(ωt, ωt−1)/
∫pω(ωt, ωt−1)dωt, while the density function
of ηt is identified as gη(η) =∫pω(h(ωt−1) + η, ωt−1)dωt−1. This proves part (b).
A.2 Proof of Proposition 2
The distribution of {St, Xt}Tt=1 for type j is given by
Pjt ({St, Xt}Tt=1) = Pj1(S1, X1)
T∏t=2
Pjt (St, Xt|{St−s, Xt−s}t−1s=1)
= Pj1(S1|X1)Pj1(X1)
T∏t=2
Pjt (St|Xt, {St−s, Xt−s}t−1s=1)Pjt (Xt|{St−s, Xt−s}t−1
s=1).
(27)
In view of the second and the third equations of (4), we have
Pjt (St|Xt, {St−s, Xt−s}t−1s=1) = Pjt (St|Xt). (28)
Furthermore,
Pjt (Xt|{St−s, Xt−s}t−1s=1) = Pjt (Kt, ωt, vt|{St−s,Kt−s, ωt−s, vt−s}t−1
s=1)
= Pjt (ωt, vt|Kt, {St−s,Kt−s, ωt−s, vt−s}t−1s=1)Pjt (Kt|{St−s,Kt−s, ωt−s, vt−s}t−1
s=1)
= Pjω(ωt|ωt−1)Pjv(vt)Pjt (Kt|Kt−1, ωt−1)
= Pjt (Kt, ωt, vt|Kt−1, ωt−1)
= Pjt (Kt, ωt, vt|Kt−1, ωt−1, vt−1)
= Pjt (Xt|Xt−1),
(29)
where the first equality and the last equality hold because there is a one-to-one mapping between
Xt and (Kt, ωt, vt) in view of Assumption 4(b), the third equality follows from Assumptions 2 and
3, the second to the last equality holds because vt is i.i.d.. Therefore, the stated result follows from
(27)-(29).
23
A.3 Proof of Proposition 3
We apply the argument of Kasahara and Shimotsu (2009) and Hu and Shum (2012) under the
assumption that unobserved heterogeneity is permanent and discrete. The proof is constructive.
Consider the case that T = 4. Fix (Z2, Z3) at (Z2, Z3) and choose (Z2, Z3) ∈ Z2 × Z3,
(a1, ..., aJ) ∈ ZJ1 and (b1, ..., bJ−1) ∈ ZJ−14 that satisfy Assumption 7. Evaluating (7) at (Z2, Z3) =
(Z2, Z3) gives
P ({Zt}4t=1) =J∑j=1
πjP j4 (Z4|Z3)P j3 (Z3|Z2)P j2 (Z2|Z1)P j1 (Z1)
=
J∑j=1
λj4(Z4|Z3)λj3(Z3|Z2)λj2(Z1, Z2),
(30)
where λj4(Z4|Z3) := P j4 (Z4|Z3 = Z3), λj3(Z3|Z2) := P j3 (Z3 = Z3|Z2 = Z2), and λj2(Z1, Z2) :=
πjP j2 (Z2 = Z2|Z1)P j1 (Z1). Integrating out Z4 from (30) gives
P ({Zt}3t=1) =J∑j=1
λj3(Z3|Z2)λj2(Z1, Z2). (31)
Let fZ2,Z3(a, b) := P ((Z1, Z2, Z3, Z4) = (a, Z2, Z3, b)) and fZ2,Z3(a) := P ((Z1, Z2, Z3) = (a, Z2, Z3)).
Evaluating (30) at Z1 = a1, ..., aJ and Z4 = b1, ..., bJ−1 gives M(M − 1) equations while evaluating
(31) at Z1 = a1, ..., aJ gives M equations. Collecting these M(M − 1) + M = M2 equations and
denoting them using matrix notation, we have
PZ2,Z3 = L′z3DZ3|Z2Lz2 , (32)
where
PZ2,Z3 :=
fZ2,Z3(a1) fZ2,Z3(a2) · · · fZ2,Z3(aJ)
fZ2,Z3(a1, b1) fZ2,Z3(a2, b1) · · · fZ2,Z3(aJ , b1)...
... . . ....
fZ2,Z3(a1, bJ−1) fZ2,Z3(a2, b1) · · · fZ2,Z3(aJ , bJ−1)
,
Lz3 :=
1 λ1
4(b1|Z3) · · · λ14(bJ−1|Z3)
...... . . .
...
1 λJ4 (b1|Z3) · · · λJ4 (bJ−1|Z3)
, Lz2 :=
λ1
2(a1, Z2) · · · λ12(aJ , Z2)
...... . . .
λJ2 (a1, Z2) · · · λJ2 (aJ , Z2)
,(33)
and DZ3|Z2:= diag
(λ1
3(Z3|Z2), ..., λJ3 (Z3|Z2)). Evaluating (32) at four different points, (Z2, Z3),
(Z2, Z3), (Z2, Z3), and (Z2, Z3) gives
PZ2,Z3 = L′z3DZ3|Z2Lz2 , PZ2,Z3
= L′z3DZ3|Z2Lz2 ,
PZ2,Z3= L′z3DZ3|Z2
Lz2 , PZ2,Z3= L′z3DZ3|Z2
Lz2 .
24
Then, under Assumption 7,
A := PZ2,Z3(PZ2,Z3)−1PZ2,Z3
(PZ2,Z3)−1 = L′z3DZ2,Z2,Z3,Z3
(L′z3)−1,
where
DZ2,Z2,Z3,Z3:= DZ3|Z2
(DZ3|Z2)−1DZ3|Z2
(DZ3|Z2)−1. (34)
Because AL′z3 = L′z3DZ2,Z2,Z3,Z3, the eignvalues of A determine the diagonal elements of
DZ2,Z2,Z3,Z3while the right eigenvectors of A determine the columns of L′z3 up to multiplicative
constant. Denote the right eigenvectors of A by L′z3C, where C is some diagonal matrix. Now we
can determine the diagonal matrix DZ2,Z2,Z3,Z3C from the first row of AL′z3C = L′z3DZ2,Z2,Z3,Z3
C
because the first row of L′z3 is a vector of ones. Then, L′z3 is determined uniquely from AL′z3C and
DZ2,Z2,Z3,Z3C as L′z3 = (AL′z3C)(DZ2,Z2,Z3,Z3
C)−1 in view of AL′z3 = L′z3DZ2,Z2,Z3,Z3. Therefore,
Lz3 is identified. Repeating the above argument for all values of Z3 ∈ Z3 identifies {P j4 (Z4|Z3 =
Z3)}Jj=1 for each Z3 ∈ Z3 for Z4 = (b1, ..., bJ−1) that satisfies Assumption 7(a).
Evaluating P (Z4, Z3|Z2) at (Z2, Z3) = (Z2, Z3), we have
P (Z4, Z3 = Z3|Z2 = Z2) =
J∑j=1
πjZ2P j4 (Z4|Z3)P j3 (Z3|Z2) =
J∑j=1
λj4(Z4|Z3)λj3(Z3|Z2), (35)
where πjZ2:=
πjP j2 (Z2=Z2)P2(Z2=Z2) and λj3(Z3|Z2) := πjZ2
P j3 (Z3 = Z3|Z2 = Z2). Then, evaluating (35)
at Z4 = b1, ..., bJ−1 and collecting them into a vector together with P (Z3 = Z3|Z2 = Z2) =∑Jj=1 λ
j3(Z3|Z2) gives
pZ3|Z2= L′z3dZ3|Z2
,
where dZ3|Z2= (λ1
3(Z3|Z2), ...., λJ3 (Z3|Z2))′ and pZ3|Z2= (P (Z3 = Z3|Z2 = Z2), P ((Z4, Z3) =
(b1, Z3)|Z2 = Z2), ..., P ((Z4, Z3) = (bJ−1, Z3)|Z2 = Z2))′. Therefore, we uniquely determine
πjZ2P j3 (Z3 = Z3|Z2 = Z2) from dZ3|Z2
= (L′z3)−1pZ3|Z2. Repeating the above argument across
all possible values of (Z2, Z3) ∈ Z2 × Z3 determines the value of πjZ2P j3 (Z3 = Z3|Z2 = Z2)
for every (Z2, Z3) ∈ Z2 × Z3. Then, πjZ2and P j3 (Z3 = Z3|Z2 = Z2) are uniquely identified as
πjZ2=∫Z3πjZ2
P j3 (Z3|Z2 = Z2)dZ3 and P j(Z3 = Z3|Z2 = Z2) = [πjZ2P j3 (Z3 = Z3|Z2 = Z2)]/πjZ2
.
Therefore, {P j3 (Z3|Z2)}Jj=1 is identified.
Evaluating P j3 (Z3|Z2) at (Z2, Z3) = (Z3, Z2) for j = 1, ..., J identifies DZ3|Z2and, from (32),
Lz2 is identified as Lz2 = (DZ3|Z2)−1(L′z3)−1PZ2,Z3 . Once DZ3|Z2
and Lz2 are identified, we
can determine Lz3(ζ) = (λ14(ζ|Z3), ..., λJ4 (ζ|Z3))′ for any ζ ∈ Z4 by constructing pZ2,Z3(ζ) =
(fZ2,Z3(a1, ζ), fZ2,Z3(a2, ζ), ..., fZ2,Z3(aJ , ζ)) from the observed data, and using the relationship
Lz3(ζ) = (DZ3|Z2)−1(L′z2)−1pZ2,Z3(ζ)′. Similarly, we can determine Lz2(ξ) = (λ1
2(ξ, Z2), ..., λJ2 (ξ, Z2))′
for any ξ ∈ Z1 by constructing pZ2,Z3(ξ) = (fZ2,Z3(ξ), fZ2,Z3(ξ, b1), fZ2,Z3(ξ, b2), ..., fZ2,Z3(ξ, bJ−1))′
and using the relationship Lz2(ξ) = (DZ3|Z2)−1(L′z3)−1pZ2,Z3(ξ). Therefore, {P j4 (Z4|Z3 = Z3), πjP j2 (Z2 =
Z2|Z1)P j1 (Z1)}Jj=1 is identified. Repeating this argument for all possible values of (Z2, Z3) ∈
25
Z2 × Z3 identifies {P j4 (Z4|Z3), πjP j2 (Z2|Z1)P j1 (Z1)}Jj=1. Finally, {πj , P j2 (Z2|Z1), P j1 (Z1)}Jj=1 is
identified from {πjP j2 (Z2|Z1)P j1 (Z1)}Jj=1 as πj =∫Z1
∫Z2
[πjP j2 (Z2|Z1)P j1 (Z1)]dZ2dZ1, P j1 (Z1) =
[∫Z2
[πjP j2 (Z2|Z1)P j1 (Z1)]dZ2]/πj , and P j2 (Z2|Z1) = [πjP j2 (Z2|Z1)P j1 (Z1)]/[πj×P j1 (Z1)]. This proves
the stated result.
A.4 Proof of Proposition 4
We first show that PH,t and {ψjt }Jj=1 are identified from {πj , P jt (Wt)}Jj=1. Because Ejt [lnWt] =
ln(PH,teψjt ), we may have ψjt = Ejt [lnWt]− PH,t for j = 1, ..., J , where Ejt [lnWt] is identified from
P jt (Wt). Then, PH,t is identified from∑J
j=1 πjeE
jt [lnWt]−lnPH,t = 1 as lnPH,t = ln
(∑Jj=1 π
jeEjt [lnWt]
).
Once PH,t and {ψjt }Jj=1 are identified, then repeating the argument in the proof of Proposition 1
for each type proves the stated result.
A.5 Proof of Proposition 5
Consider i ∈ Ij so that j = j∗(i). For each T , let πjT := πj∗L1i(αj∗m ,σ
j∗ε ;T )∑J
k=1 πk∗L1i(αk∗m ,σk∗ε ;T )
, where (πj∗, αj∗m , σj∗ε )
is the true value of (πj , αjm, σjε ). Then,
πji − 1 = (πji − πjT ) + (πjT − 1). (36)
For the first term, πji − πjT = Op(N
−1/2) as N → ∞ because the maximum likelihood estimator
(πj , αjm, σjε ) is a root-N consistent estimator of (πj∗, αj∗m , σ
j∗ε ) when the number of components J is
correctly specified.
For the second term of (36), define ξjkit := lnL1it(αj∗m , σ
j∗ε )−lnL1it(α
k∗m , σ
k∗ε ) and ajk := E[ξjkit |i ∈
Ij ] > 0, and we have
πjT =1
1 +∑
k 6=j (πk∗/πj∗) exp(−∑T
t=1 ξjkit
) . (37)
For i ∈ Ij , k 6= j,
exp
(−
T∑t=1
ξjkit
)=
{exp
(−
T∑t=1
ξjkit
)− exp(−ajkT )
}+ exp(−ajkT )
= exp(−ajkT )
{exp
(−
T∑t=1
(ξjkit − ajk)
)− 1
}︸ ︷︷ ︸
Op(T 1/2)
+ exp(−ajkT )
= Op
(exp(−ajkT )T 1/2
)as T → ∞. It follows that
∑k 6=j
(πk
∗/πj∗
)exp
(−∑T
t=1 ξjkit
)is Op
(exp(−ajT )T 1/2
), where aj :=
mink 6=j ajk. Therefore, in view of (37), the consistency of πjT as T →∞ and the mean value theorem
26
give
πjT − 1 = Op
(exp(−ajT )T 1/2
). (38)
Then, the stated result follows from (36), (38), and πji−πjT = op(N
−1/2) becauseOp(exp(−ajT )T 1/2
)=
op(N−1/2) as N,T →∞ under Assumption 10.
A.6 Assumption 7 under Cobb-Douglas production function
In the following, we discuss the conditions under which Assumption 7 holds when the production
function is Cobb-Douglas.
Example 1 (continued). For random coefficients model (8), we may write Lz3, Lz2, and DZ3|Z2as
follows. Throughout the analysis, we fix the value of {Yt}Tt=1 at, say, {yt}Tt=1 so that the variation in
the values of aj’s and bj’s are due the variation in the values of Z1 and Z4. Denote Z3 = (y3, s3, x3)
and bh = (y3, bsh, x4) for h = 1, ..., J − 1. Then,
λjZ3
(bh) = P j4 (S4 = bsh|X4 = x4)P j4 (X4 = x4|X3 = x3) = cj4gjε (ln(αjm,4E
j)− ln bsh),
where cj4 = P j4 (X4 = x4|X3 = x3). Therefore, we have
Lz3 = diag{c14, ...., c
J4 }
1 g1
ε (ln(α1m,4E1)− ln bs1) · · · g1
ε (ln(α1m,4E1)− ln bsJ−1)
...... . . .
...
1 gJε (ln(αJm,4EJ)− ln bs1) · · · gJε (ln(αJm,4EJ)− ln bsJ−1)
.Similarly, denote Z2 = (s2, x2) and ah = (ash, x1) for h = 1, ..., J . Then,
λjZ2
(ah) = P j2 (S2 = s2|X2 = x2)P j1 (S1 = ash|X1 = x1)P j1 (X1 = x1) = cj2gjε (ln a
sh − ln(αjmE)),
where cj2 = P j2 (S2 = s2|X2 = x2)P j1 (X1 = x1). Then, we have
Lz2 = diag{c12, ...., c
J2 }
g1ε (ln(α1
m,1E1)− ln as1) · · · g1ε (ln(α1
m,1E1)− ln asJ)... . . .
...
gJε (ln(αJm,1EJ)− ln as1) · · · gJε (ln(αJm,1EJ)− ln asJ)
.For Assumption 7(a), we choose x4, x3, x2, x1, and s2 so that cj2 6= 0 and cj3 6= 0 for any j
and find (as1, ..., asJ) and (bs1, ..., b
sJ−1) such that Lz3 and Lz2 are nonsingular. Because each point
of (as1, ..., asJ) and (bs1, ..., b
sJ−1) refers to a value of lnS1 and lnS4, the full rank condition of Lz3
and Lz2 holds if the value of probability density function of lnS1 and lnS4 changes heterogeneously
across types when we change the value of lnS1 and lnS4.
27
Let Z3 = (s3, x3) and Z2 = (s2, x2). Then,
λj(Z3|Z2) = πjgjε (ln s3 − ln(αjm,3Ej))P j3 (X3 = x3|X2 = x2). (39)
Pick Z3 = (s3, x3) and Z2 = (s2, x2). Assumption 7(b) holds if P j3 (x3|x2) 6= 0 and P j3 (x3|x2) 6= 0
for all j. Then, we have
DZ3|Z2(DZ3|Z2
)−1DZ3|Z2(DZ3|Z2
)−1 = diag
{P 1
3 (x3|x2)
P 13 (x3|x2)
P 13 (x3|x2)
P 13 (x3|x2)
, ...,P J3 (x3|x2)
P J3 (x3|x2)
P J3 (x3|x2)
P J3 (x3|x2)
}.
Therefore, Assumption 7(c) requires thatP j3 (x3|x2)
P j3 (x3|x2)
P j3 (x3|x2)
P j3 (x3|x2)takes different values across different
j’s, namely, the transition probability of X3 given X2 changes heterogeneously across types when
we change the value of X2.
28
Figure 1: Histogram ofPM,tMit
PY,tYit
0D
ensity
0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1Intermediate Share
Figure 2: Histogram of(PM,tMit
PY,tYit
)i
over 28 years
0D
ensity
0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1Average Intermediate Share
Figure 3:PM,tMit
PM,tMit+WtLit
0D
en
sity
0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1PM/(PM+WL)
Figure 4:(
PM,tMit
PM,tMit+WtLit
)i
over 28 yrs
0D
en
sity
0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1Average PM/(PM+WL)
29
Figure 5: Trends in the log ofPM,tMit
PY,tYit, output, and inputs in Machine industry
lnPM,tMit
PY,tYitlnYit
-5.0
-2.5
0.0
1980 1990 2000year
lnmY_it
12.5
15.0
17.5
20.0
1980 1990 2000year
y_it
lnKit lnLit
12
14
16
18
20
1980 1990 2000year
k_it
3
6
9
1980 1990 2000year
l_it
lnMit
9
12
15
18
21
1980 1990 2000year
m_it
Notes: This figure shows each firms’ inputs and outputs in each year. Each line represents a different firm.
30
Figure 6: Trends inPM,tMit
PY,tYitfor subindustries in Machine industry
0.00
0.25
0.50
0.75
1.00
1980 1990 2000year
exp(lnmY_it)
0.00
0.25
0.50
0.75
1.00
1980 1990 2000year
exp(lnmY_it)
0.00
0.25
0.50
0.75
1.00
1980 1990 2000year
exp(lnmY_it)
2511: Boiler prime mover 2521: Metal machine tools 2522: Metal working machinery
0.00
0.25
0.50
0.75
1.00
1980 1990 2000year
exp(lnmY_it)
0.00
0.25
0.50
0.75
1.00
1980 1990 2000year
exp(lnmY_it)
0.00
0.25
0.50
0.75
1.00
1980 1990 2000year
exp(lnmY_it)
2523: Machinery tool 2531: Textile machinery 2532: Agricultural machines
0.00
0.25
0.50
0.75
1.00
1980 1990 2000year
exp(lnmY_it)
0.00
0.25
0.50
0.75
1.00
1980 1990 2000year
exp(lnmY_it)
0.00
0.25
0.50
0.75
1.00
1980 1990 2000year
exp(lnmY_it)
2533: Construction and mining equipment 2534: Chemical machinery 2535: Office machinery
0.00
0.25
0.50
0.75
1.00
1980 1990 2000year
exp(lnmY_it)
0.00
0.25
0.50
0.75
1.00
1980 1990 2000year
exp(lnmY_it)
0.00
0.25
0.50
0.75
1.00
1980 1990 2000year
exp(lnmY_it)
2536: Special industrial machinery 2537: General industrial machinery 2541: General Mechanical Components
Figure 7: Posterior probabilities for J = 3 and J = 5
J = 3posterior.1 posterior.2 posterior.3
0
50
100
150
200
0.0 0.4 0.8 0.0 0.4 0.8 0.0 0.4 0.8value
count
J = 5posterior.1 posterior.2 posterior.3 posterior.4 posterior.5
0
50
100
150
200
0.0 0.4 0.8 0.0 0.4 0.8 0.0 0.4 0.8 0.0 0.4 0.8 0.0 0.4 0.8value
count
31
Figure 8: Trends inPM,tMit
PY,tYitand lnYit in Machine industry for J = 3
PM,tMit
PY,tYit
0.00
0.25
0.50
0.75
1.00
1980 1990 2000year
exp(lnmY_it)
Type 1 2 3
lnYit
12.5
15.0
17.5
20.0
1980 1990 2000year
y_it
Type 1 2 3
32
Figure 9: Trends inPM,tMit
PY,tYitin subindustries when J = 3
2511 2521 2522 2523
2529 2531 2532 2533
2534 2535 2536 2537
2541
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
1980 1990 2000year
exp(lnmY_it)
Type 1 2 3
33