Testing Conditional Independence using Conditional Martingale Transforms
Kyungchul Song1
Department of Economics, University of Pennsylvania
March 13, 2007
Abstract
This paper investigates testing the conditional independence of Y and Z given
λθ(X) for some θ ∈ Θ ⊂ R^d, for a function λθ(·) known up to the finite-dimensional parameter θ.
First, the paper proposes a new method of conditional martingale transforms under which tests are asymptotically pivotal and asymptotically unbiased against √n-converging Pitman local alternatives. Second, the paper performs an analysis of asymptotic power comparison using the limiting Pitman efficiency. From
this analysis, it is found that there is improvement in power when we use non-
parametric estimators of conditional distribution functions in place of true ones.
When both Y and Z are continuous, the improvement in power due to the non-
parametric estimation disappears after the conditional martingale transform, but
when either Y or Z is binary, this phenomenon of power improvement reappears
even after the conditional martingale transform. We perform Monte Carlo simu-
lation studies to compare the martingale transform approach and the bootstrap
approach.
Key words and Phrases: Conditional independence, Asymptotic pivotal tests,
conditional martingale transforms, limiting Pitman efficiency, approximate Ba-
hadur efficiency.
JEL Classifications: C12, C14, C52.
1I am grateful to Yoon-Jae Whang and Guido Kuersteiner, who gave me valuable advice at the initial stage of this research. I also thank Hyungsik Roger Moon, Qi Li, Joon Park, Bernard Salanié, Norm Swanson, and seminar participants at a KAEA meeting and workshops at the Greater New York Metropolitan Econometrics Colloquium, Rutgers University, Texas A&M, UC Riverside, and UCLA for their valuable comments. All errors are mine. Address correspondence to: Kyungchul Song, Department of Economics, University of Pennsylvania, 528 McNeil Building, 3718 Locust Walk, Philadelphia, Pennsylvania 19104-6297.
1 Introduction
Suppose that Y and Z are random variables, and let λθ(X) be a real-valued function of a
random vector X indexed by a parameter θ ∈ Θ. This paper investigates the problem of
testing the conditional independence of Y and Z given λθ(X) for some θ ∈ Θ, i.e., (using
the notation of Dawid (1980))
Y ⊥ Z|λθ(X) for some θ ∈ Θ. (1)
The function λθ(·) is known up to a finite-dimensional parameter θ ∈ Θ.
The conditional independence restriction is often used as part of the specification of
an econometric model, for instance as an identifying restriction. It is crucial for the identification of
parameters of interest in nonseparable models (e.g. Altonji and Matzkin (2005) and Vytlacil
and Yildiz (2006)). The restriction is also frequently used in the literature on program
evaluation (e.g. Rubin (1978), Rosenbaum and Rubin (1983), and Hirano, Imbens, and
Ridder (2003)). The restriction is sometimes a direct implication of economic theory. For
example, in the insurance literature, the presence of positive conditional dependence between
coverage and risk is known to be a direct consequence of adverse selection (e.g. Chiappori
and Salanié (2000)).
The literature on testing conditional independence for continuous variables is rather
recent and contains relatively little research compared to that on other nonparametric
or semiparametric tests. Linton and Gozalo (1999) proposed tests that are slightly
stronger than tests of conditional independence. Delgado and González Manteiga (2001)
proposed a bootstrap-based test of conditional independence. Su and White (2003a, 2003b,
2003c) studied several methods of testing conditional independence. Angrist and Kuersteiner
(2004) suggested distribution-free tests of conditional independence.
This paper's framework of hypothesis testing is based on an unconditional-moment formulation
of the null hypothesis using indicator functions that run through an index space. Tests
in our framework have the usual local power properties shared by other empirical-process-based
approaches such as Linton and Gozalo (1999), Delgado and González Manteiga
(2001), and Angrist and Kuersteiner (2004). In contrast with Linton and Gozalo (1999) and
Delgado and González Manteiga (2001), our tests are asymptotically pivotal (or asymptotically
distribution-free), which means that the asymptotic critical values do not depend on
unknowns, so that one does not need to resort to a simulation-based method like the bootstrap
to obtain approximate critical values.
This latter property of asymptotic pivotalness is shared, in particular, by Angrist and
Kuersteiner (2004), whose approach is close to ours in two respects.2 First, their test has local
power properties similar to ours. Second, both their approach and ours employ
a martingale transform to obtain asymptotically pivotal tests. However, they
assume that Z is binary and that the conditional probability P{Zi = 1|Xi} is parametrically
specified, whereas we do not require such conditions. This contrast primarily stems
from the completely different martingale transforms employed by the two papers.
In particular, Angrist and Kuersteiner (2004) employ martingale transforms that apply to
tests involving estimators of finite-dimensional parameters, whereas this paper uses condi-
tional martingale transforms that are amenable to testing semiparametric restrictions that
involve nonparametric estimators (Song (2007)).
The martingale transform approach was pioneered by Khmaladze (1988,1993) and has
been used both in the statistics and economics literature. For example, Stute, Thies, and Zhu
(1998) and Khmaladze and Koul (2004) studied testing nonlinear parametric specification of
regressions. Koul and Stute (1999) proposed specification tests of nonlinear autoregressions.
Recently in the econometrics literature, Koenker and Xiao (2002) and Bai (2003) employed
the martingale transform approach for testing parametric models, and Angrist and Kuer-
steiner (2004) investigated testing conditional independence as mentioned before. See also
Bai and Ng (2003) for testing conditional symmetry in a similar vein. Song (2007) devel-
oped a method of conditional martingale transforms that applies to tests of semiparametric
conditional moment restrictions. Although the conditional independence restriction can be
viewed as a special case of semiparametric conditional moment restrictions, the restriction
lies beyond the scope of Song (2007), and the manner in which the conditional martingale
transforms are employed is quite different.
One of the most distinctive aspects of the hypothesis testing framework of this
paper is that we allow the conditioning variable λθ(X) to depend on an unknown parameter
θ. This is particularly important when the dimension of the random vector X is large. For
example, the number of variables in X used in Chiappori and Salanié (2000) is 55. In such
a situation, instead of testing
Y ⊥ Z|X,
it is reasonable to impose a single-index restriction upon X and test the hypothesis of the
form in (1). One can check the robustness of the result by varying the parameterization used
for the single-index λθ(X).
2Several tests suggested by Su and White are also asymptotically pivotal and consistent, but have different asymptotic power properties. Their tests are asymptotically unbiased against Pitman local alternatives that converge to the null at a rate slower than √n. However, when we extend the space of local alternatives beyond Pitman local alternatives, the optimal rate of convergence of local alternatives is typically slower than √n (e.g. Horowitz and Spokoiny (2002)).
This paper also contains an extensive analysis of the asymptotic power properties of tests,
using the limiting Pitman efficiency. First, we find that using the nonparametric estimator
of the conditional distribution function of Y given λθ(X) in the test statistic improves the
asymptotic power of the test compared to the case where one uses the true one. This
may strike one as counter-intuitive: even when the true nonparametric function is
known, one does better by estimating it. We confirm this phenomenon in terms of
the limiting Pitman efficiency and provide an intuitive explanation. In fact, this phenomenon
can be viewed precisely as a hypothesis testing version of what Hirano, Imbens, and Ridder
(2003) have found in estimation.
When Y and Z are continuous, our approach suggests applying conditional martingale
transforms to the indicator functions involving Y and Z. Since the conditional martingale
transforms remove the additional local shift caused by the nonparametric estimation of
the conditional distribution function, the phenomenon of improved asymptotic power by
nonparametric estimation errors does not arise after we apply the conditional martingale
transforms. However, when Z is binary, this phenomenon of improved local asymptotic
power reappears, because in that case it suffices to apply the conditional martingale transform
only to the continuous variable Y.
A conditional martingale transform alters the asymptotic power properties of the original
test. Under the setup of a nonseparable model Y = ξ(Z, ε), we compute the limiting Pitman
efficiency of two Kolmogorov-Smirnov type tests, one using the original empirical process
and the other using the transformed empirical process. We characterize the set of alternatives
under which the conditional martingale transform weakly improves the asymptotic power of
the original test.
We perform Monte Carlo simulations to compare the bootstrap approach and the approach of
martingale transforms. We find that the martingale transform-based tests overall appear to
perform as well as the bootstrap-based tests.
In the next section, we define the null hypothesis of conditional independence and discuss
examples. In Section 3, we present the asymptotic representation of the semiparametric
empirical process. In Section 4, we introduce conditional martingale transforms and in
Section 5, we focus on the case where Z is binary. Section 6 is devoted to the asymptotic
power analysis of a variety of tests. In Section 7, we present and discuss results from
simulation studies and in Section 8, we conclude.
2 Testing Conditional Independence
2.1 The Null and the Alternative Hypotheses
Suppose that we are given a random vector (Y, Z, X) distributed according to P and an unknown
real-valued function λθ(·) on R^{dX}, θ ∈ Θ ⊂ R^d. We assume the following for (Y, Z).

Assumption 1 : (i) Y and Z are continuous random variables. (ii) Θ is compact in R^d.
When either Y or Z is binary, the development of this paper's thesis becomes simpler,
as we will see in a later section for the case where Z is binary. We introduce notation for indicator
functions: for a random variable S and a real number s,
γs(S) := 1{S ≤ s}.

Hence we write γz(Z) = 1{Z ≤ z}, γy(Y) = 1{Y ≤ y}, and γλ(λθ(X)) := 1{λθ(X) ≤ λ}. The null hypothesis in (1) requires checking the conditional independence of Y and Z
given λθ(X) over θ ∈ Θ until we find one θ that satisfies the conditional independence
restriction. The following assumption simplifies this problem by making it sufficient to focus on
a specific θ0 in Θ.
Assumption 2 : There exists a unique parameter θ0 ∈ Θ such that (i) Y ⊥ Z|λθ0(X) whenever the null hypothesis of (1) holds, and (ii) for some δ > 0, λθ(X) is continuous for
all θ ∈ B(θ0, δ) := {θ ∈ Θ : ||θ − θ0|| < δ}.
For example, the unique parameter θ0 can be defined as the unique solution to the
problem

θ0 = argmin_{θ∈Θ} sup_{(y,z,λ)∈R³} | E[ γλ(λθ(X)) γz(Z) ( γy(Y) − E[γy(Y)|λθ(X)] ) ] |, (2)
when a unique solution exists. Under Assumption 2, we can write the null hypothesis
equivalently as (see Theorem 9.2.1 in Chung (2001), p. 322, and Lemma 1 in Song (2007))

H0 : P{ E(γy(Y)|U, Z) = E(γy(Y)|U), for all y ∈ R } = 1,

where U := Fθ0(λθ0(X)) and Fθ denotes the distribution function of λθ(X). The alternative
hypothesis is given by the negation of the null:

H1 : P{ E(γy(Y)|U, Z) = E(γy(Y)|U), for all y ∈ R } < 1.
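To fix ideas, the null hypothesis can be illustrated by simulating a design in which Y and Z are unconditionally dependent but conditionally independent given a single index. The linear index λθ(X) = X'θ, the normal errors, and all numerical values below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Single-index design: lambda_theta(X) = X' theta with theta0 = (1, 2).
theta0 = np.array([1.0, 2.0])
X = rng.normal(size=(n, 2))
lam = X @ theta0

# Y and Z load on the same index but carry independent errors, so
# Y and Z are dependent unconditionally, yet Y ⊥ Z | lambda_theta0(X).
Y = lam + rng.normal(size=n)
Z = lam + rng.normal(size=n)

print(np.corrcoef(Y, Z)[0, 1])            # clearly nonzero
print(np.corrcoef(Y - lam, Z - lam)[0, 1])  # approximately zero given the index
```

Conditioning on the index removes all dependence between Y and Z; the tests developed below are designed to detect violations of exactly this restriction.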
The next subsection provides examples in which this testing framework is relevant.
2.2 Examples
2.2.1 Nonseparable Models
The conditional independence restriction is often used in nonseparable models. The first
example is a model of weakly separable endogenous dummy variables. Suppose Y and Z are
generated by
Y = g(ν1(X,Z), ε) and Z = 1 {ν2(X) ≥ u} .
For example, this model can be useful when a researcher attempts to analyze the effect of
job training of a worker upon her earnings prospects. Here Y denotes the earnings outcome
of a worker, and Z denotes the indicator for the job training. It will be of interest to
a researcher whether the indicator for the job training remains endogenous even after
controlling for the covariates X, because in the presence of endogeneity the researcher might need
to include an additional covariate in the specification of the endogenous job training that
has a sufficient degree of variation independent of X (e.g. Vytlacil and Yildiz (2006)).
The exogeneity of Z for the outcome Y given the covariate X will be represented by the
conditional independence restriction:
Y ⊥ Z|ν1(X,Z), ν2(X).
A second example is from Altonji and Matzkin (2005) who considered the following
nonseparable regression model Y = m(Z, ε). One of the assumptions used to identify the
local average response was that ε is independent of Z given an instrumental variable X. This
gives the following testable implication of the conditional independence restriction: Y ⊥ Z|X.
2.2.2 Heterogeneous Treatment Effects
In the program evaluation literature, the restriction of conditional independence

(Y0, Y1) ⊥ Z | X

is often considered (e.g. Rubin (1978), Rosenbaum and Rubin (1983), and Hirano, Imbens, and
Ridder (2003) to name but a few.)3 Here Y0 represents the outcome when the agent is not
treated and Y1, the outcome when treated. The variable X represents the agent’s covariates.
3It is also worth noting that Heckman, Ichimura, and Todd (1998) point out that a weaker condition of mean independence suffices for the identification of the average treatment effect.
It is well-known (Rosenbaum and Rubin (1983)) that when p(X) := P(Z = 1|X) ∈ (0, 1),
(Y0, Y1) ⊥ Z | p(X).
This restriction is not directly testable because we do not observe Y0 and Y1 simultaneously
for each individual. However, as pointed out by Heckman, Smith, and Clements (1997), we
have the testable implications: (1 − Z)Y ⊥ Z | p(X) or ZY ⊥ Z | p(X). When we specify p(·) parametrically, the testing framework belongs to (1).
2.2.3 Testing No Conditional Positive Dependence
In the literature on contract theory, there has been interest in testing the relevance of asymmetric
information. Under asymmetric information, it is known that the risk is positively
related to the coverage of the contract conditional on all publicly observed variables (see
Cawley and Philipson (1999), Chiappori and Salanié (2000), Chiappori, Jullien, Salanié, and
Salanié (2002), and references therein). In this case, the null hypothesis is conditional independence
of the risk and the coverage, but the alternative hypothesis is restricted to positive
dependence of the risk (or probability of claims in the insurance data) and the coverage of
the insurance contract conditional on all observable variables. This paper’s framework can
be used to deal with this case of one-sided testing. At the end of the paper, we provide a
table of critical values for one-sided Kolmogorov-Smirnov statistics.
3 Asymptotic Representation of a Semiparametric Empirical Process
3.1 Asymptotic Representation
The null hypothesis of (1) is a conditional moment restriction. We can obtain an equivalent
formulation in terms of unconditional moment restrictions. Let us define
gYy (u) , E(γy(Y )|U = u) and gZz (u) , E(γz(Z)|U = u)
and S , R2 × [0, 1]. Then the null hypothesis can be written as:
H0 : E[γu(U)γz(Z)(γy(Y )− gYy (U))] = 0, ∀(y, z, u) ∈ S.
The test statistics we analyze in this paper are based on sample analogues of the above moment
restrictions.

Suppose that we are given a random sample {Si}_{i=1}^n = {(Yi, Zi, Ui)}_{i=1}^n of S = (Y, Z, U). Let us define

Uθ,i = Fn,θ,i(λθ(Xi)),

where Fn,θ,i(·) denotes the empirical distribution function of {λθ(Xi)}_{i=1}^n with the omission of the i-th data point λθ(Xi). We assume the following.
Assumption 3 : (i) {(Yi, Zi, Xi)}_{i=1}^n is a random sample from P, where E||Y||^p < ∞ and E||Z||^p < ∞ for some p > 4.
(ii) There exists an estimator θ̂ of θ0 in Assumption 2 such that ||θ̂ − θ0|| = oP(n^{-1/4}).
(iii) Λ := {λθ(·) : θ ∈ Θ} is uniformly bounded, and there exists δ > 0 such that for any θ1, θ2 ∈ B(θ0, δ),

|λθ1(x) − λθ2(x)| ≤ C ||θ1 − θ2||

for some constant C > 0.
The estimator θ̂ in (ii) can be obtained from a sample version of the problem in (2). Its
consistency can be established using the approach of Chen, Linton, and van Keilegom (2003).
The uniform boundedness condition in (iii) for Λ is innocuous: for any strictly increasing
function Φ mapping onto [0, 1], we can redefine λ'θ := Φ ◦ λθ so that λ'θ is uniformly bounded.
The null hypothesis, the alternative hypothesis, and the random variable Uθ,i remain
invariant to this redefinition of λθ.
If the function gYy(·) were known and the quantile-transformed data (Ui)_{i=1}^n for U were
observed, a test statistic could be constructed as a functional of the following stochastic
process:

νn(r) := (1/√n) Σ_{i=1}^n γu(Ui) γz(Zi) (γy(Yi) − gYy(Ui)), r = (y, z, u) ∈ S. (3)
In this section, we introduce a series estimator of gYy and investigate the asymptotic
properties of the resulting empirical process that involves the estimator. We approximate
gYy(u) by pK(u)'πy, which is composed of a basis function vector pK(u) and a coefficient
vector πy. Given an appropriate estimator λθ̂(·) of λθ0(·), define

Ui := (1/n) Σ_{j=1, j≠i}^n 1{λθ̂(Xj) ≤ λθ̂(Xi)}. (4)
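The quantity in (4) is simply a leave-one-out empirical rank of the index values. A minimal sketch of its computation follows; the O(n²) pairwise comparison is chosen for transparency, the function name is mine, and the index values λθ̂(Xi) are taken as given:

```python
import numpy as np

def loo_rank(lam):
    """Leave-one-out empirical quantile transform of eq. (4):
    U_i = (1/n) * #{ j != i : lam_j <= lam_i }."""
    lam = np.asarray(lam, dtype=float)
    n = lam.shape[0]
    # leq[i, j] = 1{lam_j <= lam_i}; summing over j counts the comparisons
    leq = lam[None, :] <= lam[:, None]
    return (leq.sum(axis=1) - 1) / n   # subtract the j = i term

print(loo_rank([0.5, 0.1, 0.9]))       # hypothetical index values
```

Each U_i lies in {0, 1/n, ..., (n−1)/n}, so the transformed conditioning variable is approximately uniform on [0, 1] regardless of the distribution of λθ̂(X).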
The series-based estimator is defined as ĝYy(u) := pK(u)'π̂y, where π̂y := [P'P]^{-1} P'ay, with ay
and P defined by

ay := (γy(Y1), ..., γy(Yn))' and P := (pK(U1)', ..., pK(Un)')'. (5)
Using the estimated conditional distribution function ĝYy, we can construct a feasible version of the
process νn(r):

ν̂n(r) = (1/√n) Σ_{i=1}^n γu(Ui) γz(Zi) (γy(Yi) − ĝYy(Ui)). (6)
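The estimator in (5) and the feasible process in (6) can be sketched as follows. The power-series basis and its order K are illustrative choices only (the paper merely requires a basis pK satisfying its Assumption A1), the function names are mine, and U stands for the leave-one-out ranks of (4):

```python
import numpy as np

def series_cdf_estimate(U, Y, y, K=4):
    """Series estimate gYy(u) = pK(u)' pi_y of E[1{Y <= y} | U = u],
    with pi_y = (P'P)^{-1} P' a_y as in eq. (5); returns the fitted
    values at the sample points U_i."""
    P = np.vander(U, K, increasing=True)            # rows pK(U_i)'
    a_y = (Y <= y).astype(float)                    # vector a_y
    pi_y, *_ = np.linalg.lstsq(P, a_y, rcond=None)  # least-squares coefficients
    return P @ pi_y

def nu_hat(U, Y, Z, y, z, u, K=4):
    """Feasible empirical process of eq. (6) evaluated at r = (y, z, u)."""
    n = len(U)
    ghat = series_cdf_estimate(U, Y, y, K)
    return np.sum((U <= u) * (Z <= z) * ((Y <= y) - ghat)) / np.sqrt(n)
```

Because the basis contains a constant, the fitted values average the indicator γy(Yi) locally in U, mimicking the conditional distribution function at each rank.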
A test statistic can be constructed as a known functional of the process ν̂n(r). We assume
the following for the distribution of λθ(X), θ ∈ B(θ0, δ), for some δ > 0. In the following, Sθ denotes the support of λθ(X).

Assumption 4 : (i) There exists δ > 0 such that (a) the density fθ(λ) of λθ(X) satisfies sup_{θ∈B(θ0,δ)} sup_{λ∈Sθ} fθ(λ) < ∞, (b) for some C > 0, sup_{θ∈B(θ0,δ)} sup_{λ∈Sθ} |Fθ(λ + δ') − Fθ(λ − δ')| < Cδ' for all δ' > 0, and (c) there exists C > 0 such that for each u ∈ U := {Fθ ◦ λθ : θ ∈ B(θ0, δ)} and each δ' > 0,

sup_{(u1,u2)∈[0,1]² : |u2−u1|<δ'} | fY,X|u(X)(y, x|u1) − fY,X|u(X)(y, x|u2) | ≤ φu(y, x) δ',

where fY,X|u(X) denotes the conditional density function of (Y, X) given u(X), and φu(·,·) is a real function that satisfies sup_{x∈R^{dX}} ∫ φu(y, x) dy < C and ∫ φu(y, x) dx < C fY(y), with
fY(·) denoting the density of Y.
(ii) The conditional distribution functions gYy(u) and gZz(u) are twice continuously
differentiable with uniformly bounded derivatives.
The conditions in Assumption 4 are primarily used to invoke a general result of Escanciano
and Song (2007) on which Theorem 1 below relies. A condition similar to Condition
(i)(a) was used by Stute and Zhu (2005). Condition (i)(b) says that the distribution function
of λθ(X) is Lipschitz continuous with a uniformly bounded Lipschitz coefficient. Condition
(i)(c) requires Lipschitz continuity of the conditional density of (Y, X) given Fθ(λθ(X)) in
the evaluation point of the conditioning variable.
Theorem 1 : Suppose Assumptions 1-4 hold. Furthermore, assume that the basis function
pK in (5) satisfies Assumption A1 in the Appendix. Then the following holds:

sup_{(y,z,u)∈S} | ν̂n(y, z, u) − (1/√n) Σ_{i=1}^n γu(Ui) {γz(Zi) − gZz(Ui)} {γy(Yi) − gYy(Ui)} | = oP(1). (7)
The result of Theorem 1 provides a uniform asymptotic representation of the semiparametric
empirical process ν̂n(r). This asymptotic representation motivates the construction
of the conditional martingale transforms that we develop in a later section. The representation
also constitutes the basis on which we derive the asymptotic power properties
of a test constructed using ν̂n(r). Indeed, observe that the process

(1/√n) Σ_{i=1}^n γu(Ui) {γz(Zi) − gZz(Ui)} {γy(Yi) − gYy(Ui)}

is a zero-mean process only when the null hypothesis of conditional independence is true.
From the asymptotic representation, we can derive the asymptotic properties of the process
ν̂n(r) as follows. Let D(S) denote the space of bounded functions on S that are right-continuous
with left limits, endowed with the uniform norm ||·||∞. The notation ⇒ denotes weak convergence in D(S) in the sense of Hoffmann-Jørgensen (e.g. van der Vaart and Wellner (1996)). Let us define the conditional inner product ⟨·,·⟩u by

⟨f, g⟩u := ∫ (fg)(s) dF(s|u),

where F(s|u) is the conditional distribution function of S given U = u.
Corollary 1 : (i) (a) Under the null hypothesis, νn(r) ⇒ ν(r) in D(S), where ν is a
Gaussian process whose covariance kernel is given by

c(r1; r2) = ∫_{−∞}^{u1∧u2} C(w1, w2; u) du

with C(w1, w2; u) = gZz1∧z2(u) {gYy1∧y2(u) − gYy1(u) gYy2(u)}, wj = (zj, yj), j = 1, 2.
(b) Under fixed alternatives, n^{-1/2} νn(r) ⇒ ⟨γu, γz(γy − gYy)⟩ in D(S).
(ii) (a) Under the null hypothesis, ν̂n(r) ⇒ ν1(r) in D(S), where ν1 is a Gaussian process
whose covariance kernel is given by

c1(r1; r2) = ∫_{−∞}^{u1∧u2} C1(w1, w2; u) du

with C1(w1, w2; u) = {gZz1∧z2(u) − gZz1(u) gZz2(u)} {gYy1∧y2(u) − gYy1(u) gYy2(u)}.
(b) Under fixed alternatives, n^{-1/2} ν̂n(r) ⇒ ⟨γu, γz(γy − gYy)⟩ in D(S).
The results of Corollary 1 yield the asymptotic properties of tests based on the processes
νn(r) and ν̂n(r). The limiting processes ν(r) and ν1(r) under the null hypothesis are Gaussian
processes with different covariance kernels. Under fixed alternatives, the two empirical
processes νn(r) and ν̂n(r) weakly converge to the same shift term after normalization by √n, providing the basis for local power properties against √n-converging Pitman local alternatives. The result in Corollary 1 gives rise to the surprising
consequence that using the nonparametrically estimated distribution function ĝYy(u) instead of the true one gYy(u) can
improve the asymptotic power of the tests. In particular, the covariance kernel of the limit
process under the null hypothesis "shrinks" due to the use of ĝYy(u) instead of gYy(u), whereas
the limit shift ⟨γu, γz(γy − gYy)⟩ under the alternatives remains intact. In a later section, we formalize this observation.

It is worth noting that the estimation error in λθ̂ does not play a role in determining the
limit behavior of the process ν̂n(r). In other words, the results of Theorem 1 and Corollary 1
do not change when we replace λθ̂ by λθ0. A similar phenomenon is discovered and analyzed
by Stute and Zhu (2005) in the context of testing single-index restrictions using a kernel
method. This convenient phenomenon is due to the use of the empirical quantile transform
of {λθ̂(Xi)}_{i=1}^n. We construct the following two-sided test statistics:

TKS := sup_{r∈S} |ν̂n(r)| and TCM := ∫_S ν̂n(r)² dr. (8)
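The two functionals in (8) can be approximated on a finite grid. The sketch below is self-contained; the grid, the quartic power-series basis for the estimate of gYy, and the function name are all illustrative shortcuts of mine (the paper's supremum and integral run over all of S):

```python
import numpy as np

def compute_ks_cvm(U, Y, Z, grid=10, K=4):
    """Grid approximations of T_KS = sup_r |nu_n(r)| and
    T_CM = integral of nu_n(r)^2 over S, as in eq. (8)."""
    n = len(U)
    P = np.vander(U, K, increasing=True)       # series basis, eq. (5)
    ys = np.quantile(Y, np.linspace(0.05, 0.95, grid))
    zs = np.quantile(Z, np.linspace(0.05, 0.95, grid))
    us = np.linspace(0.05, 0.95, grid)
    vals = np.empty((grid, grid, grid))
    for a, y in enumerate(ys):
        # series estimate of E[1{Y<=y} | U] and its residual
        pi_y, *_ = np.linalg.lstsq(P, (Y <= y).astype(float), rcond=None)
        resid = (Y <= y).astype(float) - P @ pi_y
        for b, z in enumerate(zs):
            w = (Z <= z) * resid
            for c, u in enumerate(us):
                vals[a, b, c] = np.sum((U <= u) * w) / np.sqrt(n)
    T_KS = np.abs(vals).max()
    T_CM = (vals ** 2).mean()   # grid average stands in for the integral
    return T_KS, T_CM
```

Since T_KS is the maximum of |ν̂n| over the grid and T_CM a grid average of ν̂n², T_KS² ≥ T_CM holds by construction on any grid.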
The test statistic TKS is of Kolmogorov-Smirnov type, and the test statistic TCM is of Cramér-von
Mises type. The asymptotic properties of the tests based on TKS and TCM follow from
Theorem 1. Indeed, under the null hypothesis,

TKS ⇒ sup_{r∈S} |ν1(r)| and TCM ⇒ ∫_S ν1(r)² dr, (9)

and under Pitman local alternatives Pn such that, for some ay,z(·), En(γz(Z)(γy(Y) − gYy(U))|U = u) = ay,z(u)/√n + o(n^{-1/2}), we have

ν̂n(r) ⇒ ν1(r) + ⟨γu, ay,z⟩, in D(S).

The presence of the non-zero shift term ⟨γu, ay,z⟩ gives the test nontrivial asymptotic power against such local alternatives.
The testing procedure based on (9) is not feasible as it stands. In particular, we cannot tabulate
critical values from the limiting distribution, because the limiting distribution depends on unknown
components of the data generating process; in consequence, the tests are not asymptotically
pivotal.
4 Conditional Martingale Transform
A recent work by the author (Song (2007)) proposes the method of conditional martingale
transforms, which can be used to obtain asymptotically pivotal semiparametric tests. The
method of conditional martingale transforms proposed in that work does not apply to the
present testing problem of conditional independence for three reasons. First, the generalized residual
function ρy,z := γz(γy − gYy) here is indexed by (y, z) ∈ R², whereas the generalized residual
function ρτ,θ in Song (2007) does not depend on the index of the empirical process. Second, while Song (2007) requires that the conditional distribution of the instrumental variables
given the variable inside the nonparametric function be nondegenerate, this
condition does not hold here: in the current situation of testing conditional independence,
the instrumental variable in the conditional moment restriction and the variable inside the
nonparametric function are identically λθ0(X), so the nondegeneracy of the conditional
distribution fails. Third, the instrumental variable λθ0(X) here depends on the
unknown parameter θ0.
The main idea of this paper is first to take into account the null restriction of
conditional independence when searching for a proper transformation of the semiparametric
empirical process ν̂n(r) using conditional martingale transforms, and then to analyze the
asymptotic behavior of the semiparametric empirical process after the transform.4
4.1 Preliminary Heuristics
In this section, we provide some heuristics on the mechanics by which conditional martingale
transforms work. Suppose that we are given an operator K on the space of indicator
functions such that the transformed indicator functions γKz := Kγz and γKy := Kγy satisfy the orthogonality condition and the isometry condition with respect to the conditional inner
4By noting that the limiting processes ν and ν1 are variants of a 4-sided tied-down Kiefer process, we may consider the martingale transform for a 4-sided tied-down Kiefer process suggested by McKeague and Sun (1996). The martingale transform for this case is represented as two sequential martingale transforms. However, the sequential martingale transforms would be very complicated in practice because they involve taking repeated martingale transforms of a function.
product almost surely:

Conditional Orthogonality : ⟨γKz, 1⟩U = 0 and ⟨γKy, 1⟩U = 0, and
Conditional Isometry : ⟨γKz1, γKz2⟩U = ⟨γz1, γz2⟩U and ⟨γKy1, γKy2⟩U = ⟨γy1, γy2⟩U.
An operator K that satisfies these two properties can be used to construct a transformed
empirical process from which asymptotically pivotal tests can be generated. Recall that
by Theorem 1, the process ν̂n(r) has the following asymptotic representation (using our
conditional inner product notation: ⟨γy, 1⟩Ui := E[γy(Yi)|Ui] = gYy(Ui)):

ν̂n(r) = (1/√n) Σ_{i=1}^n γu(Ui) {γz(Zi) − ⟨γz, 1⟩Ui} {γy(Yi) − ⟨γy, 1⟩Ui} + oP(1).
When we replace the indicator functions γz and γy by the transformed ones γKz and γKy,
the right-hand side asymptotic representation turns into the following (by the conditional
orthogonality of the transform):

(1/√n) Σ_{i=1}^n γu(Ui) {γKz(Zi) − ⟨γKz, 1⟩Ui} {γKy(Yi) − ⟨γKy, 1⟩Ui} = (1/√n) Σ_{i=1}^n γu(Ui) γKz(Zi) γKy(Yi). (10)
We can show that the last term is equal to

√n E[γu(U) ⟨γKz, γKy⟩U] + νK(r) + oP(1), (11)

where νK(r) is a certain Gaussian process. The asymptotic behavior of the leading two terms
determines the asymptotic properties of tests that are based on the transformed empirical
process. Suppose that we are under the null hypothesis. Then the first term in (11) becomes
zero by the conditional independence restriction. As shown in the appendix, the Gaussian
process νK(r) has a covariance function of the form

c(r1, r2) := E[γu1∧u2(U) γz1∧z2(Z) γy1∧y2(Y)]. (12)

This form is obtained by using the conditional isometry of the operator K and the null
hypothesis of conditional independence. Hence the resulting Gaussian process is a time-transformed
Brownian sheet, and we can rescale it into a standard Brownian sheet by using
an appropriate normalization. Against alternatives P1 such that P1{|⟨γKz, γKy⟩U| > 0} > 0, a test properly based on the transformed process will be consistent.
The interesting phenomenon that the nonparametric estimation error improves the power
disappears for tests based on transformed processes of the kind in (10). Consider the asymptotic
representation of the empirical process νn(r) that uses the true ⟨γy, 1⟩Ui:

νn(r) = (1/√n) Σ_{i=1}^n γu(Ui) γz(Zi) {γy(Yi) − ⟨γy, 1⟩Ui} + oP(1).
When we apply the transform K to the indicator functions γy and γz, the right-hand side is equal to

(1/√n) Σ_{i=1}^n γu(Ui) γKz(Zi) {γKy(Yi) − ⟨γKy, 1⟩Ui} + oP(1) = (1/√n) Σ_{i=1}^n γu(Ui) γKz(Zi) γKy(Yi) + oP(1).
Hence the asymptotic representation of the process after the transform K coincides with
that in (10). The asymptotic power properties remain the same regardless of whether we
use ĝYy(u) or gYy(u) once the transform K is applied.
4.2 Conditional Martingale Transforms
In this section, we discuss a method to obtain an operator K that satisfies the conditional
orthogonality and conditional isometry properties. Khmaladze (1993) proposes using
a class of isometric projection operators on an L2-space. This class of isometric projection
operators is designed to be an isometry with respect to the usual inner product in L2-space. Song (2007)
extends this isometric projection to a conditional isometric projection by using the conditional
inner product in place of the usual one, and finds that this class of projection
operators can be used to generate semiparametric tests that are asymptotically pivotal.
Just as the isometric projection is called a martingale transform, Song (2007) calls the
conditional isometric projection a conditional martingale transform. See Song (2007) for the
details.
By applying the conditional martingale transform in Song (2007) to our indicator functions,
we can obtain the transform K as follows:

γKy(U, Y) := γy(Y) − E[γy(Ỹ) hy(U, Ỹ) | U], with the index y of hy evaluated at y = Y, and (13)
γKz(U, Z) := γz(Z) − E[γz(Z̃) hz(U, Z̃) | U], with the index z of hz evaluated at z = Z,

where the conditional expectations are taken over Ỹ and Z̃, which follow the conditional distributions of Y and Z given U, and

hy(U, Ỹ) := 1{Ỹ ≤ y} / (1 − FY|U(Ỹ|U)) and hz(U, Z̃) := 1{Z̃ ≤ z} / (1 − FZ|U(Z̃|U)). (14)
Observe that if hy and hz were replaced by the constant 1, the operator K would simply be the
orthogonal projection onto the orthogonal complement of the constant functions in L2(P). The
factors hy and hz defined in (14) are what render the transforms isometric.
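As a sanity check on the two defining properties, they can be verified by simulation in the simplest special case: Y uniform on [0, 1] and independent of U, so that FY|U is the identity and the transform reduces to the unconditional form γKy(x) = 1{x ≤ y} − A(min(x, y)) with compensator A(t) = ∫₀^t dF/(1 − F) = −log(1 − t) for the uniform F. This is a sketch under those simplifying assumptions, not the general conditional transform:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
Y = rng.uniform(size=n)   # Y ~ U[0,1], assumed independent of U

def gamma_K(y, Y):
    """Transformed indicator 1{Y <= y} - A(min(Y, y)), where
    A(t) = int_0^t dF/(1-F) = -log(1 - t) for the uniform F."""
    return (Y <= y).astype(float) + np.log1p(-np.minimum(Y, y))

M1, M2 = gamma_K(0.3, Y), gamma_K(0.7, Y)
print(M1.mean())           # orthogonality: close to 0
print((M1 * M2).mean())    # isometry: close to F(0.3 ∧ 0.7) = 0.3
```

The second moment matches ⟨γ0.3, γ0.7⟩ = F(0.3 ∧ 0.7) rather than the bridge-type covariance F(0.3 ∧ 0.7) − F(0.3)F(0.7), which is exactly why the limit after the transform is a Brownian sheet rather than a tied-down process.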
Compared with the conditional martingale transform in Song (2007), the transform in
this paper corresponds to the case where the function q0 in Song (2007) equals 1 and is hence
simpler. This is primarily because the estimation error of λθ̂ no longer needs to be accounted
for in the transform, as it plays no role in the asymptotic representation
of Theorem 1 owing to the sample quantile transform of the conditioning variable.
The following result shows that the transform K defined in (13) is the martingale transform
we have sought. Eventually, we focus on test statistics based on the following two
processes νKn(r) and νKn,E(r), defined by

νKn(r) = (1/√n) Σ_{i=1}^n γu(Ui) γKz(Zi) γKy(Yi) and (15)
νKn,E(r) = (1/√n) Σ_{i=1}^n γu(Ui) γKz(Zi) (γy(Yi) − ĝKy(Ui)),

where ĝKy(u) denotes the nonparametric estimator ĝYy(u) that uses γKy in place of γy.
Theorem 2 : (i) For all y1, y2, z1 and z2 in R,
hγKy1 , 1iU = 0 and hγKy1 , γKy2iU = hγy1, γy2iU a.s. and
hγKz1 , 1iU = 0 and hγKz1 , γKz2iU = hγz1 , γz2iU a.s.
(ii) Suppose Assumptions 1-4 hold.
(a) Under $H_0$, the processes $\nu_n^K(r)$ and $\nu_{n,E}^K(r)$ defined in (15) satisfy
$$\nu_n^K(r) \rightsquigarrow \nu^K(r) \ \text{ and } \ \nu_{n,E}^K(r) \rightsquigarrow \nu^K(r), \ \text{in } D(S),\tag{16}$$
where $\nu^K$ is a Gaussian process on $S$ whose covariance kernel is given by
$$c_3(r_1, r_2) \triangleq \int_{-\infty}^{u_1\wedge u_2} C_3(w_1, w_2; u)\,dF(u), \quad r_j = (u_j, w_j),\ j = 1, 2,$$
and the function $C_3(w_1,w_2;u)$ is defined by $C_3(w_1,w_2;u) \triangleq g_{z_1\wedge z_2}^Z(u)\,g_{y_1\wedge y_2}^Y(u)$.
(b) Under the fixed alternatives,
$$n^{-1/2}\nu_n^K(r) \rightsquigarrow E[\gamma_u(U)\langle\gamma_z^K, \gamma_y^K\rangle_U] \ \text{ and } \ n^{-1/2}\nu_{n,E}^K(r) \rightsquigarrow E[\gamma_u(U)\langle\gamma_z^K, \gamma_y^K\rangle_U], \ \text{in } D(S),$$
where $\gamma_z^K$ and $\gamma_y^K$ are defined in (13).
The first statement of the theorem is immediately adapted from Theorem 6.1 of Khmaladze
and Koul (2004) to the "conditional inner product". This result tells us that the transform
satisfies the orthogonality condition and the isometry condition. From the results of (i)
and (ii), we can derive the asymptotic properties of tests that are constructed based on the
transformed processes νKn (r) and νKn,E(r).
The limiting Gaussian process $\nu^K(r)$ is a time-transformed Brownian sheet whose kernel still depends on the unknown nonparametric functions $g_z^Z(u)$ and $g_y^Y(u)$. Following an idea similar to Khmaladze (1993), we can turn the process into a standard Brownian sheet by replacing $\gamma_z^K(u,z)$ and $\gamma_y^K(u,y)$ by
$$\tilde\gamma_z^K(u,z) \triangleq \frac{\gamma_z^K(u,z)}{\sqrt{f_{Z|U}(z|u)}} \quad\text{and}\quad \tilde\gamma_y^K(u,y) \triangleq \frac{\gamma_y^K(u,y)}{\sqrt{f_{Y|U}(y|u)}},$$
where $f_{Z|U}$ and $f_{Y|U}$ denote conditional density functions. Then the weak limit $\nu^K(r)$ in (16) has covariance function
$$c(r_1, r_2) \triangleq (u_1\wedge u_2)(z_1\wedge z_2)(y_1\wedge y_2), \quad r_1, r_2 \in S,$$
which is that of a standard Brownian sheet. Observe that the test based on this conditional martingale transform is not yet asymptotically pivotal, because the range $S$ of the standard Brownian sheet depends on the distribution of $(Y,Z)$. This can be dealt with by rescaling the marginals of $Y$ and $Z$ appropriately. We explain this in the next section, which is devoted to the feasible conditional martingale transform.
4.3 Feasible Conditional Martingale Transforms
The method of conditional martingale transform proposed in the previous section is not yet feasible, for two reasons. First, the conditional martingale transform involves unknown components of the data generating process. Second, the range of the index, $S$, is not known. Suppose $\hat F_n^Y$ and $\hat F_n^Z$ are the empirical quantile transforms of $(Y_i)_{i=1}^n$ and $(Z_i)_{i=1}^n$. Let us define two functions in the following way:
$$V_i^Y \triangleq \hat F_n^Y(Y_i) \quad\text{and}\quad V_i^Z \triangleq \hat F_n^Z(Z_i).\tag{17}$$
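Computationally, the empirical quantile transform in (17) is just a scaled rank. A minimal sketch (not from the paper; the data below are illustrative):

```python
import numpy as np

def empirical_quantile_transform(x):
    # V_i = F_n(x_i) = (number of sample points <= x_i) / n;
    # for continuous (tie-free) data this is the scaled rank of x_i.
    x = np.asarray(x)
    return (np.argsort(np.argsort(x)) + 1) / len(x)

rng = np.random.default_rng(0)
y = rng.normal(size=500)
v = empirical_quantile_transform(y)

# The transformed values are exactly {1/n, 2/n, ..., 1}, and the
# transform is monotone: it preserves the ordering of the sample.
assert np.array_equal(np.sort(v), (np.arange(500) + 1) / 500)
assert np.array_equal(np.argsort(v), np.argsort(y))
```

This is why the transformed marginals have a known (uniform) support, which is what removes the dependence on the unknown range of the data.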
We define a feasible transform as follows. Let us define $P$ as in (5) and let
$$\hat\beta(y, \bar y) \triangleq \begin{bmatrix} 1\{c_0 \le V_1^Y \le y\wedge\bar y\}\Big/\Big\{\sqrt{\hat f(V_1^Y, U_1)}\,\big[1 - p_K(U_1)'\hat\pi(V_1^Y)\big]\Big\} \\ \vdots \\ 1\{c_0 \le V_n^Y \le y\wedge\bar y\}\Big/\Big\{\sqrt{\hat f(V_n^Y, U_n)}\,\big[1 - p_K(U_n)'\hat\pi(V_n^Y)\big]\Big\} \end{bmatrix}\tag{18}$$
where $\hat\pi(y) \triangleq [P'P]^{-1}P'b(y)$, $b(y)$ being the column vector whose $i$-th entry equals $1\{0 < V_i^Y \le y\}$, and $\hat f(v,u)$ denotes a nonparametric estimator of the joint density of $V_i^Y$ and $U_i$.$^5$
Thus the feasible transform is obtained by
$$(\hat\gamma_y^K)(u, \bar y) = \frac{1\{\bar y \le y\}}{\sqrt{\hat f(\bar y, u)}} - p_K(u)'[P'P]^{-1}P'\hat\beta(y, \bar y).\tag{19}$$
We can similarly obtain $(\hat\gamma_z^K)(u,\bar z)$ by exchanging the roles of $V_i^Y$ and $V_i^Z$. An estimated version of the transformed process is now defined by
$$\hat\nu_n^K(r) \triangleq \frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_u(U_i)(\hat\gamma_z^K)(U_i, V_i^Z)(\hat\gamma_y^K)(U_i, V_i^Y).$$
Due to the random rescaling of $Y_i$ and $Z_i$ in (17), the process is indexed by $r \in S_0$, where $S_0 \triangleq [0,1)^2 \times [0,1]$. Hence the dependence of the process upon the unknown support $S$ has disappeared.
We can construct the transformed two-sided test statistics in the following way:
$$T_{KS}^K = \sup_{r\in S_0}\big|\hat\nu_n^K(r)\big| \quad\text{and}\quad T_{CM}^K = \int_{S_0} \hat\nu_n^K(r)^2\,dr.\tag{20}$$
In the context of testing semiparametric conditional moment restrictions, Song (2007) delineated conditions under which the feasible conditional martingale transforms are asymptotically valid. We can establish the asymptotic validity of the feasible conditional martingale transforms in a similar manner, so that we obtain, combined with Theorem 2,
$$T_{KS}^K \to_d \sup_{r\in S_0}|W(r)| \quad\text{and}\quad T_{CM}^K \to_d \int_{S_0}|W(r)|^2\,dr,$$
where $W$ is a standard Brownian sheet on $S_0$. We sketch the derivation of this asymptotic validity in the appendix.
$^5$ Note that since $U_i$ is uniformly distributed, we have $f(v|u) = f_{V^Y,U}(v,u)$, where $f_{V^Y,U}(v,u)$ is the joint density of $V_i^Y$ and $U_i$. Hence we can simply use an estimator of $f_{V^Y,U}(v,u)$ in place of $f(v|u)$.
When we are interested in testing the null of conditional independence against alternatives of conditional positive dependence, we can construct the following test statistics:
$$T_{KS,1}^K = \sup_{r\in S_0}\hat\nu_n^K(r) \to_d \sup_{r\in S_0}W(r) \quad\text{and}$$
$$T_{CM,1}^K = \int_{S_0}|\max\{\hat\nu_n^K(r), 0\}|^2\,dr \to_d \int_{S_0}|\max\{W(r), 0\}|^2\,dr.$$
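On a finite grid, the one-sided Cramér-von Mises functional above is just a Riemann sum of the squared positive part of the process. A toy sketch (grid and process values are hypothetical):

```python
import numpy as np

def one_sided_cm(nu_grid):
    # Riemann approximation of int_{S_0} max{nu(r), 0}^2 dr on an
    # equally spaced grid: only positive excursions contribute.
    return np.mean(np.maximum(nu_grid, 0.0) ** 2)

nu = np.array([[-1.0, 2.0],
               [0.5, -3.0]])
# Only the positive entries contribute: (2^2 + 0.5^2) / 4 = 1.0625.
assert one_sided_cm(nu) == (2.0**2 + 0.5**2) / 4
```

Negative excursions, however large, are ignored, which is what directs power toward positively dependent alternatives.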
5 When Zi is a Binary Variable
In some applications, either $Y$ or $Z$ is binary. Specifically, we consider the case where $Z$ is a binary variable taking the value zero or one, and the variables $Y$ and $X$ are continuous. This setting is similar to Angrist and Kuersteiner (2004), but the difference is that we do not assume a parametric specification of the conditional probability $P\{Z=1|X\}$. The application of the conditional martingale transform in this case becomes even simpler. First, we do not need to transform the indicator function of $Z$. Second, the dimension of the index space for the empirical process in the test statistic is reduced from 3 to 2.
First, observe that the null hypothesis becomes
$$E[\gamma_u(U)1\{Z=z\}\{\gamma_y(Y) - E[\gamma_y(Y)|U]\}] = 0$$
for all $(u,y,z) \in [0,1]^2\times\{0,1\}$. From the binary character of $Z$, we observe that the null hypothesis is equivalent to$^6$
$$E[\gamma_u(U)Z\{\gamma_y(Y) - E[\gamma_y(Y)|U]\}] = 0.$$
This latter formulation of the null hypothesis is convenient because the index for the indicator of $Z_i$ is fixed at one, and hence the corresponding empirical process has an index running over $[0,1]^2$ instead of $[0,1]^2\times\{0,1\}$. The corresponding feasible empirical process in the test statistic before the martingale transform is given by
$$\hat\nu_n(u,y) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{\gamma_u(U_i)Z_i\{\gamma_y(Y_i) - \hat E[\gamma_y(Y_i)|U_i]\}}{\sqrt{\hat P\{Z=1|U_i\}}}, \quad (u,y)\in[0,1]^2.$$
$^6$ Equivalently, we can write the null hypothesis as $E[\gamma_u(U)(1-Z)\{\gamma_y(Y) - E[\gamma_y(Y)|U]\}] = 0$.
Using arguments similar to those in the proof of Theorem 1, and assuming regularity conditions for the estimator $\hat P\{Z=1|U_i\}$, we obtain
$$\hat\nu_n(u,y) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{\gamma_u(U_i)(Z_i - E[Z_i|U_i])}{\sqrt{P\{Z=1|U_i\}}}\{\gamma_y(Y_i) - E(\gamma_y(Y_i)|U_i)\} + o_P(1).$$
Hence it turns out that, in this case of binary $Z$, we only have to take a conditional martingale transform on the indicator function $\gamma_y$. Let us consider the following three martingale-transformed processes:
$$\nu_{na}^K(u,y) \triangleq \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{Z_i\gamma_u(U_i)(\gamma_y^K)(U_i,Y_i)}{\sqrt{P\{Z_i=1|U_i\}}},\tag{21}$$
$$\nu_{nb}^K(u,y) \triangleq \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{(1-Z_i)\gamma_u(U_i)(\gamma_y^K)(U_i,Y_i)}{\sqrt{P\{Z_i=0|U_i\}}}, \quad\text{and}$$
$$\nu_{n,E}^K(u,y) \triangleq \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{Z_i\gamma_u(U_i)\{(\gamma_y^K)(U_i,Y_i) - \hat E[(\gamma_y^K)(U_i,Y_i)|U_i]\}}{\sqrt{P\{Z_i=1|U_i\} - P\{Z_i=1|U_i\}^2}}.$$
Then, in view of the previous results, these processes weakly converge in $D([0,1]\times[0,1))$ to a standard Brownian sheet under the null hypothesis, i.e., a Gaussian process with covariance function
$$c(u_1, y_1; u_2, y_2) \triangleq (u_1\wedge u_2)(y_1\wedge y_2).$$
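The Brownian-sheet covariance above can be checked by simulation: partial sums of i.i.d. Gaussian increments over a lattice on the unit square have covariance $(u_1\wedge u_2)(y_1\wedge y_2)$ up to Monte Carlo error. A sketch (grid size and tolerances are arbitrary choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_sim = 10, 20000
# Brownian sheet on the lattice {((i+1)/m, (j+1)/m)}: cumulative sums
# of independent N(0, 1/m^2) increments over the unit square.
incr = rng.normal(0.0, 1.0 / m, size=(n_sim, m, m))
W = incr.cumsum(axis=1).cumsum(axis=2)

def emp_cov(u1, y1, u2, y2):
    a = W[:, int(u1 * m) - 1, int(y1 * m) - 1]
    b = W[:, int(u2 * m) - 1, int(y2 * m) - 1]
    return float(np.mean(a * b))

# Covariance should be (u1 ^ u2) * (y1 ^ y2):
assert abs(emp_cov(1.0, 1.0, 1.0, 1.0) - 1.0) < 0.05
assert abs(emp_cov(0.5, 1.0, 1.0, 1.0) - 0.5) < 0.05
assert abs(emp_cov(0.5, 0.5, 1.0, 1.0) - 0.25) < 0.05
```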
In the next subsection, we compare the asymptotic power properties of tests based on these three processes.
Let us consider a concrete procedure for constructing a test statistic using a feasible conditional martingale transform. For brevity, we consider only the processes $\nu_{na}^K(u,y)$ and $\nu_{nb}^K(u,y)$. Let $\hat P(Z_i=1|U_i)$ denote a nonparametric estimator of $P\{Z_i=1|U_i\}$ and let $\hat\gamma_y^K$ be as defined in (19). Then we define the processes
$$\hat\nu_{na}^K(u,y) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{Z_i 1\{U_i\le u\}\hat\gamma_y^K(U_i, V_i^Y)}{\sqrt{\hat P(Z_i=1|U_i)}} \quad\text{and}$$
$$\hat\nu_{nb}^K(u,y) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{(1-Z_i)1\{U_i\le u\}\hat\gamma_y^K(U_i, V_i^Y)}{\sqrt{\hat P(Z_i=0|U_i)}}.$$
We can construct the following three Kolmogorov-Smirnov type test statistics:
$$T_{1a}^K = \sup_{(u,y)\in[0,1]\times[0,1)}\big|\hat\nu_{na}^K(u,y)\big|, \quad T_{1b}^K = \sup_{(u,y)\in[0,1]\times[0,1)}\big|\hat\nu_{nb}^K(u,y)\big|, \quad\text{and}$$
$$T_{1S}^K = \sup_{(u,y)\in[0,1]\times[0,1)}\big\{\big|\hat\nu_{na}^K(u,y)\big| + \big|\hat\nu_{nb}^K(u,y)\big|\big\}/\sqrt{2}.$$
The three tests have the same asymptotic critical values. However, their asymptotic power properties are quite different, as we shall see in the next section and in the section devoted to simulation studies. The asymptotic power of $T_{1a}^K$ can be large or small depending on whether $Z$ and $Y$ are conditionally positively or negatively dependent given $U$. The symmetrized version $T_{1S}^K$ does not show such asymmetric power properties across the direction of the alternatives.
In the actual computation of the test statistic, we use grid points. We have found from the simulation studies discussed in the next section that equally spaced 10-by-10 grid points in the unit square appear to work well. The limiting distribution under the null is the Kolmogorov-Smirnov functional of a two-parameter Brownian sheet. Its critical values are replicated from Table 3 of Khmaladze and Koul (2004) in the following table.

significance level  0.5   0.25  0.20  0.10  0.05  0.025  0.01
critical values     1.46  1.81  1.91  2.21  2.46  2.70   3.03

The critical values are based on the computation of Brownrigg and can be found at his website: http://www.mcs.vuw.ac.nz/~ray/Brownian. Therefore, one rejects the null hypothesis of conditional independence at the 0.05 level if $T_n > 2.46$.
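In code, the grid-based Kolmogorov-Smirnov decision is a maximum of absolute values of the evaluated process compared against the tabulated critical value. The process values below are placeholders; the critical values are those replicated above:

```python
import numpy as np

# sup|W| critical values for the two-parameter Brownian sheet
# (replicated from Table 3 of Khmaladze and Koul (2004)):
CRITICAL = {0.10: 2.21, 0.05: 2.46, 0.025: 2.70, 0.01: 3.03}

def ks_test(nu_grid, level=0.05):
    """Return (statistic, reject) for a grid-evaluated process."""
    stat = float(np.max(np.abs(nu_grid)))
    return stat, stat > CRITICAL[level]

# Illustration on a 10-by-10 grid of (hypothetical) process values:
nu = np.full((10, 10), 0.3)
nu[7, 4] = -2.6                      # one large excursion
stat, reject = ks_test(nu, level=0.05)
assert stat == 2.6 and reject
stat, reject = ks_test(np.full((10, 10), 1.0), level=0.05)
assert stat == 1.0 and not reject
```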
We can similarly construct one-sided Kolmogorov-Smirnov type test statistics:
$$T_{1a,1}^K = \sup_{(u,y)\in[0,1]\times[0,1)}\hat\nu_{na}^K(u,y) \quad\text{and}\quad T_{1b,1}^K = \sup_{(u,y)\in[0,1]\times[0,1)}\hat\nu_{nb}^K(u,y).$$
The statistic $T_{1a,1}^K$ is for testing conditional independence against positive conditional dependence given $\lambda_\theta(X)$, and the statistic $T_{1b,1}^K$ is for testing conditional independence against negative conditional dependence.
6 Asymptotic Power Analysis
In this section, we embark on a formal comparison of a variety of tests, analyzing the effects of the nonparametric estimation error and of the conditional martingale transform on the asymptotic power of the tests. The basic tool of comparison that we employ is the limiting Pitman efficiency, which we can compute without difficulty thanks to the well-known results of Bahadur (1960) and Wieand (1976). In particular, Wieand (1976) specified sufficient conditions under which the limiting Pitman efficiency for asymptotically non-normal tests can be computed via the approximate Bahadur efficiency (Bahadur (1960)).
First, we find that the nonparametric estimation error improves the asymptotic power of
the test. This finding may strike one as counter-intuitive because it is tantamount to saying
that even when perfect knowledge of the nonparametric function is available, it is better to
use its estimator. A similar observation was made in estimation by Hahn (1998) and Hirano,
Imbens, and Ridder (2003). We provide some discussions regarding this phenomenon later.
Second, we analyze the effect of the conditional martingale transform upon the asymptotic
power properties. We find that when both Y and Z are continuous, the improvement of
asymptotic power by nonparametric estimation error disappears under the conditional martingale transform, but when either $Y$ or $Z$ is binary, the improvement is partially preserved even under the conditional martingale transform. We also characterize the set of local alternatives under which the conditional martingale transform improves the asymptotic power of the tests.
6.1 Limiting Pitman Efficiency
First, we define the limiting Pitman efficiency as follows (cf. Nikitin (1995), pp. 2-3). Suppose that $\mathcal{P}$ is the space of probabilities $P$ for $(Y,Z,X)$, and we are given a sequence of test statistics $T_n$ for testing the null of conditional independence. We choose a parametric submodel in $\mathcal{P}$ and define the limiting Pitman efficiency under the parametric submodel. We then demonstrate that certain bounds on the limiting Pitman efficiency do not depend on the specific parametrization.

We first consider a parametric submodel $\{P_\tau \subset \mathcal{P} : \tau\in[0,\tau_0]\}$ for some $\tau_0 > 0$, in which $P_0$ represents a probability under the null hypothesis and $P_\tau$ with $\tau\in(0,\tau_0]$ represents a probability under the alternative hypothesis. Then, for any $\beta\in(0,1)$ and $\tau\in[0,\tau_0]$, a real sequence $c_{T_n}(\beta,\tau)$ is chosen such that
$$P_\tau\{T_n > c_{T_n}(\beta,\tau)\} \le \beta \le P_\tau\{T_n \ge c_{T_n}(\beta,\tau)\}.$$
Then the number $\alpha_{T_n}(\beta,\tau) \triangleq P_0\{T_n \ge c_{T_n}(\beta,\tau)\}$ denotes the minimal size of the test based on $\{T_n\}$ such that the power at $P_\tau$ is not less than $\beta$. For any two test statistics $T_n$ and $S_n$, the relative efficiency of $T_n$ with respect to $S_n$ is defined as
$$e_{T,S}(\alpha,\beta,\tau) = \frac{\min\{n : \alpha_{S_m}(\beta,\tau) \le \alpha \text{ for all } m \ge n\}}{\min\{n : \alpha_{T_m}(\beta,\tau) \le \alpha \text{ for all } m \ge n\}}.$$
When $e_{T,S} > 1$, the test based on $\{T_n\}$ is preferable to that based on $\{S_n\}$, because the sample size required to achieve power $\beta$ at $P_\tau$ with the given level $\alpha$ is smaller for the test $T_n$ than for the test $S_n$. In general, the exact computation of $e_{T,S}(\alpha,\beta,\tau)$ is extremely difficult, and a variety of notions of asymptotic relative efficiency have been proposed. In this paper, we focus on the limiting Pitman efficiency. For other notions, the reader is referred to Nikitin (1995).
Following Wieand (1976), we define the upper and lower Pitman efficiencies as follows:
$$e_{T,S}^+(\alpha,\beta) = \sup\{\limsup_{j\to\infty} e_{T,S}(\alpha,\beta,\tau_j) : \{\tau_j\}\in T_1,\ \tau_j\downarrow 0\} \quad\text{and}$$
$$e_{T,S}^-(\alpha,\beta) = \inf\{\liminf_{j\to\infty} e_{T,S}(\alpha,\beta,\tau_j) : \{\tau_j\}\in T_1,\ \tau_j\downarrow 0\},$$
where $T_1$ denotes the space of sequences $\{\tau_j\}_{j\in\mathbf{N}_+}$, $\tau_j\in(0,\tau_0]$. When $\lim_{\alpha\to 0}e_{T,S}^+(\alpha,\beta)$ and $\lim_{\alpha\to 0}e_{T,S}^-(\alpha,\beta)$ exist and coincide, we call the common number the limiting Pitman efficiency and denote it by $e_{T,S}^*(\beta)$. This limiting Pitman efficiency is the criterion that we
employ to compare the tests. When the test statistics are asymptotically normal, Bahadur
(1960) showed that the limiting Pitman efficiency coincides with the limit of the approximate
Bahadur efficiency as $\tau\downarrow 0$. Wieand (1976) showed that this coincidence is preserved even under a more general setting, once a stronger condition on the asymptotic behavior of the test statistic under the local alternatives is satisfied. In this latter case, the limiting Pitman efficiency is independent of $\beta$, and hence we simply denote it by $e_{T,S}^*$.
6.2 The Effect of Nonparametric Estimation Error
We consider the situation of Section 3. We analyze how our use of a nonparametric estimator of $g_y^Y(u)$ in the process $\hat\nu_n(r)$ affects the power of the test. We find that the power of the test improves when we use the estimator instead of the true function. In other words, the test becomes better when we use the estimator $\hat g_y^Y(u)$ even if we know the true conditional distribution function $g_y^Y(u)$.
For the stochastic processes $\nu_n(r)$ and $\hat\nu_n(r)$ defined in (3) and (6), define
$$T_n = \sup_{r\in S}|\nu_n(r)| \quad\text{and}\quad S_n = \sup_{r\in S}|\hat\nu_n(r)|,$$
and let $e_{T,S}^*(\beta)$ denote the limiting Pitman efficiency of $T_n$ with respect to $S_n$. Let $g_z^Z(\cdot;\tau)$ and $g_y^Y(\cdot;\tau)$ denote the conditional distribution functions of $Z$ and $Y$ given $U$ under $P_\tau$. Then we obtain the following result.

Corollary 2 : Suppose the conditions of Theorem 1 hold. Then
$$e_{T,S}^* = \lim_{\tau\downarrow 0}\frac{\sup_{(z,y)\in\mathbf{R}^2}\int_0^1 \big(g_z^Z(u;\tau) - g_z^Z(u;\tau)^2\big)\big(g_y^Y(u;\tau) - g_y^Y(u;\tau)^2\big)\,du}{\sup_{y\in\mathbf{R}}\int_0^1 \big(g_y^Y(u;\tau) - g_y^Y(u;\tau)^2\big)\,du} \le \frac{1}{4},$$
if the limit above exists.
The result in Corollary 2 is well expected from the result of Theorem 1. The main reason is that, while the nonparametric estimator $\hat g_y^Y(u)$ used in place of $g_y^Y(u)$ has the effect of "shrinking" the kernel of the limiting Gaussian process, the local shift of the test statistics under the local alternatives remains the same regardless of whether one uses $\hat g_y^Y(u)$ or $g_y^Y(u)$. (The bound of 1/4 follows because $g - g^2 = g(1-g) \le 1/4$ for any $g\in[0,1]$.) This yields the counter-intuitive result that even when we know the conditional distribution function $g_y^Y(u)$, it is better to use its estimator $\hat g_y^Y(u)$ in the test statistic.$^7$
In the estimation literature, under certain circumstances, estimating nonparametric functions improves the efficiency of estimators relative to using the true functions (e.g., Hahn (1998) and Hirano, Imbens, and Ridder (2003), and see the references therein). This efficiency gain arises because the nonparametric estimation utilizes additional restrictions in a manner that is not exploited when we use the true functions.
The primary source of the improvement in asymptotic power in Corollary 2 is the following asymptotic representation (for a general result, see Lemma 1U of Escanciano and Song (2007)):
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_u(U_i)\gamma_z(Z_i)\{\gamma_y(Y_i) - \hat g_y^Y(U_i)\} = \frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_u(U_i)\{\gamma_z(Z_i) - g_z^Z(U_i)\}\{\gamma_y(Y_i) - g_y^Y(U_i)\} + o_P(1).$$
$^7$ Khmaladze and Koul (2004) also discuss cases where power is increased by using estimators instead of true functions, in the context of parametric tests.
As noted in the remarks after Theorem 1, the variance of the leading sum on the right-hand side is smaller, under the null hypothesis, than that of $\frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_u(U_i)\gamma_z(Z_i)\{\gamma_y(Y_i) - g_y^Y(U_i)\}$. In fact, when we replace $\gamma_z(Z)$ by $g_z^Z(U)$ on the left-hand side, we obtain the following result:
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_u(U_i)g_z^Z(U_i)\{\gamma_y(Y_i) - \hat g_y^Y(U_i)\} = o_P(1).\tag{22}$$
This stands in contrast to the result obtained using the true nonparametric function $g_y^Y$:
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_u(U_i)g_z^Z(U_i)\{\gamma_y(Y_i) - g_y^Y(U_i)\} = O_P(1).\tag{23}$$
The rate of convergence for the sample version of $E[\gamma_u(U)g_z^Z(U)\{\gamma_y(Y) - g_y^Y(U)\}]$ is faster with the nonparametric estimator $\hat g_y^Y$ than with the true function. In fact, we obtain the same contrast between (22) and (23) when we replace $\gamma_u(U)g_z^Z(U)$ by any function $\varphi(U)$. In other words, by using the nonparametric estimator in the test statistic, we utilize the conditional moment restriction $E[\gamma_y(Y) - g_y^Y(U)\,|\,U] = 0$ more effectively than by using the true function $g_y^Y$. The conditional moment restriction holds both under the null hypothesis and under the alternatives. Under the alternatives, this effective utilization of the conditional moment restriction by nonparametric estimation has an effect whose order is dominated by the $\sqrt{n}$-diverging shift term, which remains the same regardless of whether we use $\hat g_y^Y$ or $g_y^Y$. Combined with this, the first-order effect of nonparametric estimation under the null hypothesis results in an improvement in power over the test using the true function $g_y^Y$.
6.3 The Effect of Nonparametric Estimation Error After the Martingale Transform
In this section, we analyze the effect of nonparametric estimation error on the test statistics under the conditional martingale transforms. We divide the discussion into the case where both $Y$ and $Z$ are continuous and the case where either $Y$ or $Z$ is binary.
6.3.1 When Y and Z Are Continuous
We consider the situation of Section 4.2. Recall the definitions of the processes $\nu_n^K(r)$ and $\nu_{n,E}^K(r)$ in (15). Now we compare the two tests defined by
$$T_n = \sup_{r\in S}|\nu_{n,E}^K(r)| \quad\text{and}\quad S_n = \sup_{r\in S}|\nu_n^K(r)|,$$
and let $e_{T,S}^*$ denote the limiting Pitman efficiency of $T_n$ with respect to $S_n$. The test $T_n$ uses the estimated version $\hat g_y^K(U) = \hat E[\gamma_y^K(Y)|U]$, and the test $S_n$ uses the true one $g_y^K(U) = E[\gamma_y^K(Y)|U]$, which is zero by Theorem 2(i). Then we obtain the following result.
Corollary 3 : Suppose the conditions of Theorems 1 and 2 hold. Then we have
$$e_{T,S}^* = 1.$$
The result of Corollary 3 shows that the effect of nonparametric estimation error disappears under the conditional martingale transform. This was expected from the result of Theorem 2, in which we saw that the asymptotic behaviors of the two tests $T_n$ and $S_n$ are the same both under the null hypothesis and under the local alternatives. The main reason is that the conditional martingale transform completely wipes out the additional term, due to the nonparametric estimation error, that is responsible for the improvement in asymptotic power. Therefore, under the conditional martingale transform, one may prefer the simpler test statistic $S_n$ to $T_n$, which involves the nonparametric estimator $\hat g_y^K(U)$.
6.3.2 When Either Y or Z Is Binary
In this subsection, we consider the situation of Section 5. We noted that the nonparametric estimation error does not affect the asymptotic properties of the martingale-transformed processes. However, in the case where $Z$ is binary, the nonparametric estimation error can improve the power of the test even after the martingale transform. This is because we take a conditional martingale transform only on $\gamma_y$, thereby wiping out the effect of the nonparametric estimation error only partially.
To substantiate these remarks, let us analyze the behavior of the two processes $\nu_{na}^K(u,y)$ and $\nu_{n,E}^K(u,y)$ in (21) under a fixed alternative. Based on Theorems 1 and 2, we can easily show that$^8$
$$n^{-1/2}\nu_{na}^K(u,y) \to_p \varphi_0(u,y;\tau) \triangleq E_\tau\!\left[\frac{Z\gamma_u(U)(\gamma_y^K)(U,Y)}{\sqrt{P_\tau\{Z=1|U\}}}\right] \quad\text{and}\tag{24}$$
$$n^{-1/2}\nu_{n,E}^K(u,y) \to_p \varphi_1(u,y;\tau) \triangleq E_\tau\!\left[\frac{Z\gamma_u(U)(\gamma_y^K)(U,Y)}{\sqrt{P_\tau\{Z=1|U\} - P_\tau\{Z=1|U\}^2}}\right],$$
where $E_\tau$ denotes the expectation under $P_\tau$. Since the denominator of the shift term for $\nu_{na}^K(u,y)$ dominates that of the shift term for $\nu_{n,E}^K(u,y)$, the test based on the process $\nu_{n,E}^K(u,y)$ has better local asymptotic power than that based on $\nu_{na}^K(u,y)$. This also shows the interesting phenomenon that the test has better power when we use the nonparametric estimator of $E[\gamma_y^K(U,Y)|U]$ instead of the true one. As before, we can confirm this phenomenon in terms of the limiting Pitman efficiency. Define now
$$T_n = \sup_{(u,y)\in[0,1]\times\mathbf{R}}|\nu_{na}^K(u,y)| \quad\text{and}\quad S_n = \sup_{(u,y)\in[0,1]\times\mathbf{R}}|\nu_{n,E}^K(u,y)|,$$
and denote by $e_{T,S}^*$ the limiting Pitman efficiency of $T_n$ with respect to $S_n$.
$^8$ The second statement follows from the law of iterated conditional expectations and the orthogonality conditions in Theorem 2(i).
Corollary 4 : Suppose that the conditions of Theorem 1 hold, except for those concerning $Z$. Then
$$e_{T,S}^* = \lim_{\tau\downarrow 0}\left\{\frac{\sup_{(u,y)\in[0,1]\times\mathbf{R}}|\varphi_0(u,y;\tau)|}{\sup_{(u,y)\in[0,1]\times\mathbf{R}}|\varphi_1(u,y;\tau)|}\right\}^2 \le 1,$$
if the limit above exists.
The improved power property in Corollary 4 is demonstrated in the simulation studies. Note that the local shift $E\big[Z\gamma_u(U)(\gamma_y^K)(U,Y)/\sqrt{P\{Z=1|U\}}\big]$ in (24) can be large or small depending on whether $Z$ and $Y$ are conditionally positively or negatively dependent. Hence the asymptotic power properties can be asymmetric across the alternatives. The symmetrized version $T_{1S}^K$ removes part of this asymmetry.
6.4 The Effect of Conditional Martingale Transforms on Asymptotic Power of Tests
Based on the result of Theorem 2, we can analyze the effect of conditional martingale transforms on asymptotic power using the limiting Pitman efficiency. In this subsection, we compare two tests based on the processes $\hat\nu_n(r)$ and $\nu_n^K(r)$ defined in (6) and (15). At first glance, the comparison is ambiguous: the martingale transform dilates the variance of the empirical process under the null hypothesis, but at the same time moves the shift term under the local alternatives farther from zero. The eventual comparison should take into account the trade-off between these two changes. To facilitate the analysis, we consider the following specification of $Y$:
$$Y = \xi(Z, \varepsilon),$$
where $\xi(z,\varepsilon): \mathbf{R}^2 \to \mathbf{R}$ and $\varepsilon$ is a random variable taking values in $\mathbf{R}$ that is conditionally independent of $Z$ given $U$. In the following, we give the limiting Pitman efficiency of Kolmogorov-Smirnov tests based on the processes $\hat\nu_n(r)$ and $\nu_n^K(r)$ and characterize a set of alternatives under which the conditional martingale transform improves the asymptotic power of the test.
Let us denote the conditional copula of $Z$ and $Y$ given $U$ by
$$C_{Z,Y|U}(z,y|U;\tau) \triangleq P_\tau\{g_Z^Z(U;\tau) \le z,\ g_Y^Y(U;\tau) \le y\,|\,U\}.$$
The conditional copula $C_{Z,Y|U}(z,y|U;\tau)$ is the conditional joint distribution function of $Z$ and $Y$ under $P_\tau$ after normalization by the conditional quantile transforms. It summarizes the conditional joint dependence of $Z$ and $Y$ given $U$. For introductory details about copulas, see Nelsen (1999); for conditional copulas, see Patton (2006). Define
$$T_n = \sup_{r\in S_0}|\hat\nu_n(r)| \quad\text{and}\quad S_n = \sup_{r\in S_0}|\nu_n^K(r)|,$$
and let $e_{T,S}^*$ denote the limiting Pitman efficiency of $T_n$ with respect to $S_n$.
Corollary 5 : Suppose the conditions of Theorems 1 and 2 hold. Furthermore, assume that $\xi(z,\varepsilon)$ is either strictly increasing in $z$ for all $\varepsilon$ in the support of $\varepsilon$, or strictly decreasing in $z$ for all $\varepsilon$ in the support of $\varepsilon$. Then
$$e_{T,S}^* = \lim_{\tau\downarrow 0}\frac{\sup_{(u,z,y)\in S}\left|\int_0^u \big\{C_{Z,Y|U}(g_z^Z(u;\tau), g_y^Y(u;\tau)|u;\tau) - g_z^Z(u;\tau)g_y^Y(u;\tau)\big\}\,du\right|^2}{\sup_{(z,y)\in\mathbf{R}^2}\int_0^1 \big(g_z^Z(u;\tau) - g_z^Z(u;\tau)^2\big)\big(g_y^Y(u;\tau) - g_y^Y(u;\tau)^2\big)\,du},$$
if the limit above exists. In particular, when $g_z^Z(u;\tau)$ and $g_y^Y(u;\tau)$ do not depend on $u$ for all $\tau\in[0,\epsilon)$ for some $\epsilon > 0$,
$$e_{T,S}^* \le \frac{1}{4}.$$
Corollary 5 provides a representation of the limiting Pitman efficiency $e_{T,S}^*$. When $g_z^Z(u;\tau)$ and $g_y^Y(u;\tau)$ do not depend on $u$ in a neighborhood of $\tau = 0$, so that the test of conditional independence becomes a test of independence, it follows that $e_{T,S}^* \le 1/4$; hence, in this case, the conditional martingale transform weakly improves the asymptotic power as long as $\xi(z,\varepsilon)$ satisfies the condition of Corollary 5. However, in general, we cannot say whether either of the two tests dominates the other.
7 Simulation Studies
7.1 Conditional Martingale Transform and Wild Bootstrap
In this section, we present and discuss the results of simulation studies that compare the conditional martingale transform approach and the bootstrap approach. We consider testing conditional independence of $Y_i$ and $Z_i$ given $X_i$. Each variable is real-valued. The variable $Z_i$ is binary, taking the value zero or one according to the following rule:
$$Z_i = 1\{X_i + \eta_i > 0\}.$$
The variable $Y_i$ is determined by the data generating process
$$Y_i = \beta X_i + \kappa Z_i + \varepsilon_i.$$
The parameter $\beta$ is set to 0.5. The variable $X_i$ is drawn from the uniform distribution on $[-1,1]$, and the errors $\eta_i$ and $\varepsilon_i$ are independently drawn from $N(0,1)$. Note that $\kappa = 0$ corresponds to the null hypothesis of conditional independence between $Y$ and $Z$ given $X$. The magnitude of $\beta$ controls the strength of the relation between $Y_i$ and $X_i$.
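This design can be sketched as follows (a minimal re-implementation, with illustrative defaults):

```python
import numpy as np

def simulate(n, beta=0.5, kappa=0.0, rng=None):
    # X ~ Uniform[-1, 1]; eta, eps ~ N(0, 1), all independent;
    # Z = 1{X + eta > 0}; Y = beta * X + kappa * Z + eps.
    # kappa = 0 is the null of conditional independence of Y and Z given X.
    rng = np.random.default_rng(0) if rng is None else rng
    x = rng.uniform(-1.0, 1.0, size=n)
    eta = rng.normal(size=n)
    eps = rng.normal(size=n)
    z = (x + eta > 0).astype(float)
    y = beta * x + kappa * z + eps
    return y, z, x

y, z, x = simulate(300)
assert y.shape == z.shape == x.shape == (300,)
assert set(np.unique(z)) <= {0.0, 1.0}
```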
We consider two tests, one based on the conditional martingale transform approach and the other based on the bootstrap approach. For both tests, we take a sample quantile transform of the marginals of $Y_i$ and $X_i$ to generate $V_i^Y$ and $U_i$. The bootstrap procedure is as follows:
(1) Take $(\hat\varepsilon_{y,i})_{i=1}^n$ defined by $\hat\varepsilon_{y,i} = \gamma_y(Y_i) - \hat E(\gamma_y(Y_i)|U_i)$.
(2) Take a wild bootstrap sample $(\varepsilon_{y,i}^*)_{i=1}^n = (\omega_i\hat\varepsilon_{y,i})_{i=1}^n$, where $\{\omega_i\}$ is an i.i.d. sequence of random variables, independent of $\{Y_i, Z_i, U_i\}$, such that $E(\omega_i) = 0$ and $E(\omega_i^2) = 1$.
(3) Construct $\gamma_{y,i}^* = \hat E(\gamma_y(Y_i)|U_i) + \varepsilon_{y,i}^*$.
(4) Using $(\gamma_{y,i}^*, U_i)_{i=1}^n$, estimate the bootstrap conditional mean function $\hat E(\gamma_{y,i}^*|U_i)$.
(5) Obtain the bootstrap empirical process $\nu_n^*(u,y) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_u(U_i)Z_i\big[\gamma_{y,i}^* - \hat E(\gamma_{y,i}^*|U_i)\big]$.
This bootstrap procedure is the same as that considered in Delgado and González Manteiga (2001), p. 1490, except that they did not take the quantile transform of $X$. Based on the bootstrap process, we can construct the bootstrap test statistics:
$$T_{KS}^B = \sup_{(u,y)\in[0,1]\times\mathbf{R}}|\nu_n^*(u,y)| \quad\text{and}\quad T_{CM}^B = \int\!\!\int \nu_n^*(u,y)^2\,du\,dy.$$
Nonparametric estimation is carried out by series estimation with Legendre polynomials.
We rescale the variables $X_i$ and $Y_i$ so that they have unit-interval support. The rescaling is done by taking their (empirical) quantile transformations. We also take into account unknown conditional heteroskedasticity by normalizing the semiparametric empirical process by $\sigma(U_i)^{-1}$. The test statistics are of the Kolmogorov-Smirnov type. The number of Monte Carlo iterations and the number of bootstrap Monte Carlo iterations are both set to 1000. The sample size is 300.

First, consider the size of the tests, whose results are in Table 1. The nominal size is set at 0.05. Order 1 denotes the number of terms included in the nonparametric estimation of the bivariate density function $f_{Y,U}(y,u)$, and Order 2 denotes the number of terms included in the nonparametric estimation of the conditional expectations given $U_i$. The martingale-transform based tests appear more sensitive to the choice of orders in the series estimation than the bootstrap-based test. This is not surprising, because the martingale transform approach involves more instances of nonparametric estimation than the bootstrap approach. The empirical size overall seems reasonable.
[INSERT TABLE 1]
Tables 2 and 3 contain the results on the power of the tests. The deviation from the null hypothesis is controlled by the magnitude of $\kappa$. Results in Table 2 are under alternatives of positive dependence between $Y_i$ and $Z_i$, and those in Table 3 are under alternatives of negative dependence. First, the power of the test based on $T_{1A}^K$ is shown to be disproportionate between alternatives of positive dependence (with $\kappa$ positive) and negative dependence (with $\kappa$ negative). This disproportionate behavior disappears in the case of its "symmetrized" version $T_{1B}^K$. Second, the power of the test based on $T_2^K$ is better than that based on $T_{1A}^K$ or $T_{1B}^K$. Recall that the test statistic $T_2^K$ involves the nonparametric estimator of $E[(\gamma_y^K)(U_i,Y_i)|U_i] = 0$, whereas $T_{1A}^K$ and $T_{1B}^K$ simply use zero in its place. The result shows the interesting phenomenon that the use of the nonparametric estimator improves the power of the test even when we know that it is equal to zero in the population. The bootstrap tests work well, and their performance is stable over the choice of the order of the series and over the alternatives under which the test is performed.
[INSERT TABLES 2 AND 3]
7.2 The Effect of Nonparametric Estimation of gYy (u)
We consider testing the conditional independence of $Y$ and $Z$ given $U$, where $Z$ is a binary variable and $(Y,U)$ has uniform marginals over $[0,1]^2$. We generate
$$Y = \alpha\big((1-\kappa)U + \kappa(Z\varepsilon + (1-Z)U)\big) + (1-\alpha)\varepsilon,$$
where $\varepsilon$ follows a standard normal distribution and is independent of $U$ and $Z$, and $Z$ is generated by
$$Z = 1\{U - \eta > 0\},$$
where $\eta$ is a uniform variate on $[0,1]$. Hence $P\{Z=1\} = 1/2$. When $\kappa = 0$, the null hypothesis of conditional independence of $Y$ and $Z$ given $U$ holds. As $\kappa$ gets bigger, the role of $Z$ in contributing to $Y$ directly, rather than through $U$, becomes more significant. In this case, we can explicitly compute the conditional distribution function $g_y^Y(u)$ as follows:
$$g_y^Y(u) = P\{Y\le y|U=u, Z=1\}P\{Z=1\} + P\{Y\le y|U=u, Z=0\}P\{Z=0\}$$
$$= \frac{1}{2}\left(\Phi\!\left(\frac{y - \alpha(1-\kappa)u}{1-\alpha(1-\kappa)}\right) + \Phi\!\left(\frac{y-\alpha u}{1-\alpha}\right)\right).$$
Note that $E[Z|U] = U$. Hence $\sigma_2(U) = \sqrt{U - U^2}$.
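As a check on the closed form (not in the paper): at $\kappa = 0$ the two conditional laws coincide, both reducing to $N(\alpha u, (1-\alpha)^2)$, so $g_y^Y(u) = \Phi((y-\alpha u)/(1-\alpha))$. This can be verified against a simulated conditional frequency; the bin width and tolerance below are arbitrary choices:

```python
import math
import numpy as np

def Phi(t):                              # standard normal CDF
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

alpha, y0, u0 = 0.5, 0.3, 0.5
rng = np.random.default_rng(0)
n = 400000
U = rng.uniform(size=n)
eps = rng.normal(size=n)
Y = alpha * U + (1.0 - alpha) * eps      # the DGP with kappa = 0

# Empirical conditional CDF in a narrow bin around u0 vs the closed form:
sel = np.abs(U - u0) < 0.02
emp = float(np.mean(Y[sel] <= y0))
assert abs(emp - Phi((y0 - alpha * u0) / (1.0 - alpha))) < 0.02
```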
We noted earlier that the estimation error in $\hat g_y^Y(U_i)$ has the effect of improving the power of the test. In order to investigate this in finite samples, we consider the two test statistics
$$T_n^{INF} = \sup_{(u,y)\in[0,1]^2}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{1\{U_i\le u\}Z_i}{\sqrt{\sigma_2^2(U_i)}}\big(1\{Y_i\le y\} - g_y^Y(U_i)\big)\right| \quad\text{and}$$
$$T_n^{FEAS} = \sup_{(u,y)\in[0,1]^2}\left|\frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{1\{U_i\le u\}Z_i}{\sqrt{\sigma_2^2(U_i) - \sigma_2^4(U_i)}}\big(1\{Y_i\le y\} - \hat g_y^Y(U_i)\big)\right|.$$
Note that $g_y^Y(u)$ and $\sigma_2(U)$ are completely known in this case.
We take the sample size equal to $n = 3000$ and set the number of Monte Carlo simulations to 1000. We consider the estimated density of the test statistics both under the null ($\kappa = 0$) and under the alternative ($\kappa = 0.5$). Here we use the quantile transform of the data and use series estimation with Legendre polynomials, with the number of terms equal to 5 in the univariate case and $5^2$ in the bivariate case. The result is shown in Figure 1. Under the null, the behavior of the two test statistics is more or less similar, consistent with the asymptotic theory. Under the alternative, however, the behavior becomes dramatically different. The shift is more prominent in the case of $T_n^{FEAS}$ than in the case of $T_n^{INF}$. This was expected from our analysis of power in the previous section, and reflects our theoretical observation that the use of $\hat g_y^Y(U_i)$ instead of $g_y^Y(U_i)$ improves the local power properties of the tests. This is also observed in terms of the empirical processes inside $T_n^{INF}$ and $T_n^{FEAS}$, as shown in Figure 2.
[INSERT FIGURES 1 AND 2]
8 Conclusion
A test of conditional independence has been proposed. The test is asymptotically unbiased against Pitman local alternatives, and yet its critical values can be tabulated from the limiting distribution theory. As in Song (2007), the main idea is to employ conditional martingale transforms, but the manner in which we apply the transforms in testing conditional independence is different: we apply the transform to the indicator functions of $Z_i$ and $Y_i$, conditioning on $U_i$.
There are two possible extensions: first, the analysis of the asymptotic power function,
and second, the accommodation of weakly dependent data. The analysis of the asymptotic
power function will illuminate the effect of the martingale transform on the power of the
tests. The i.i.d. assumption in testing conditional independence is restrictive, considering that testing the presence of causal relations often presupposes a time series context. Extending the results in the paper to time series appears nontrivial, because the methods of this paper draw heavily on results from empirical process theory for independent observations. However, there have been numerous efforts in the literature to extend empirical process results to time series (e.g., Andrews (1991), Arcones and Yu (1994), and Nishiyama (2000)). It seems that the results in this paper can be extended to the time series context on the basis of those results.
9 Appendix: Mathematical Proofs
9.1 Main Results
In this section, we present the proofs of the main results. Throughout the proofs, the notation $C$ denotes a positive absolute constant that may assume different values in different contexts. We first prove Theorem 1. The following lemma is useful to this end.
Define
\[
\gamma_{u,n,\lambda}(x) := 1\left\{\frac{1}{n}\sum_{i=1}^{n} 1\{\lambda(X_i) \leq \lambda(x)\} \leq u\right\}.
\]
The function $\gamma_{u,n,\lambda}(\cdot)$ is indexed by $(u,\lambda) \in [0,1] \times \Lambda_0$, where $\Lambda_0 := \{\lambda_\theta(\cdot) : \theta \in B(\theta_0,\delta)\}$. The indicator function contains a nonparametric function, namely an empirical distribution
function. Note that we cannot directly apply the framework of Chen, Linton, and van Keilegom (2003) to analyze the class of functions into which the realizations of $\gamma_{u,n,\lambda}(x)$ fall. This is because, when the indicator function contains a nonparametric function, the procedure of Chen, Linton, and van Keilegom (2003) requires the entropy condition for the nonparametric functions with respect to the sup norm, and this entropy condition does not help here because $L_p$ uniform continuity fails with respect to the sup norm. Hence we establish the following result for our purpose.
Lemma A1 : Assume that $\sup_{\lambda \in \Lambda_0}\sup_{\bar\lambda \in \mathbf{R}} f_\lambda(\bar\lambda) < \infty$, where $f_\lambda(\cdot)$ denotes the density of $\lambda(X)$. Then there exists a sequence of classes $\mathcal{G}_n$ of measurable functions such that
\[
P\left\{\gamma_{u,n,\lambda} \in \mathcal{G}_n \text{ for all } (u,\lambda) \in [0,1] \times \Lambda_0\right\} = 1,
\]
and for any $r \geq 1$,
\[
\log N_{[\,]}(C_2\varepsilon, \mathcal{G}_n, \|\cdot\|_r) \leq \log N(\varepsilon^r/2, \Lambda_0, \|\cdot\|_\infty) + C_1/\varepsilon,
\]
where $C_1$ and $C_2$ are positive constants depending only on $r$.
Proof of Lemma A1 : For a fixed sequence $\{x_i\}_{i=1}^n \in \mathbf{R}^{nd_X}$, we define a function
\[
g_{n,\lambda,u}(\bar\lambda) := 1\left\{\frac{1}{n}\sum_{i=1}^{n} 1\{\lambda(x_i) \leq \bar\lambda\} \leq u\right\},
\]
where $g_{n,\lambda,u}(\cdot)$ depends on the chosen sequence $\{x_i\}_{i=1}^n$. We introduce classes of functions as follows:
\[
\mathcal{G}_n^0 := \left\{g_{n,\lambda,u} : (\lambda, \{x_i\}_{i=1}^n, u) \in \Lambda_0 \times \mathbf{R}^{nd_X} \times [0,1]\right\} \quad\text{and}\quad \mathcal{G}_n := \left\{g \circ \lambda : (g,\lambda) \in \mathcal{G}_n^0 \times \Lambda_0\right\}.
\]
Then we have $\gamma_{u,n,\lambda} \in \mathcal{G}_n$. Note that $g_{n,\lambda,u}(\bar\lambda)$ is decreasing in $\bar\lambda$. Hence, by Theorem 2.7.5 in van der Vaart and Wellner (1996), for any $r \geq 1$,
\[
\log N_{[\,]}(\varepsilon, \mathcal{G}_n^0, \|\cdot\|_{Q,r}) \leq \frac{C_1}{\varepsilon}, \tag{25}
\]
for any probability measure $Q$ and for some constant $C_1 > 0$.
Without loss of generality, assume that $\lambda(X) \in [0,1]$, because otherwise we can consider $\Phi \circ \Lambda_0 = \{\Phi \circ \lambda : \lambda \in \Lambda_0\}$ in place of $\Lambda_0$, where $\Phi$ is the distribution function of a standard normal variate. We choose $\{\lambda_1, \cdots, \lambda_{N_2}\}$ such that for any $\lambda \in \Lambda_0$, there exists $j$ with $\|\lambda_j - \lambda\|_\infty < \varepsilon^r/2$. For each $j$, we define $\bar\lambda_j(x)$ as follows:
\[
\bar\lambda_j(x) = m\varepsilon^r/2 \quad\text{when } \lambda_j(x) \in [m\varepsilon^r/2, (m+1)\varepsilon^r/2) \text{ for some } m \in \{0, 1, 2, \cdots, \lfloor 2/\varepsilon^r \rfloor\},
\]
where $\lfloor z \rfloor$ denotes the greatest integer that does not exceed $z$. Note that the range of $\bar\lambda_j$ is finite and $\|\lambda - \bar\lambda_j\|_\infty$ is equal to
\[
\sup_x \sum_{k=0}^{\lfloor 2/\varepsilon^r \rfloor} \left|\lambda(x) - \frac{k\varepsilon^r}{2}\right| 1\left\{\lambda_j(x) \in \left[\frac{k\varepsilon^r}{2}, \frac{(k+1)\varepsilon^r}{2}\right)\right\}
\leq \sup_x \sum_{k=0}^{\lfloor 2/\varepsilon^r \rfloor} \left(|\lambda(x) - \lambda_j(x)| + \left|\lambda_j(x) - \frac{k\varepsilon^r}{2}\right|\right) 1\left\{\lambda_j(x) \in \left[\frac{k\varepsilon^r}{2}, \frac{(k+1)\varepsilon^r}{2}\right)\right\} \leq \varepsilon^r.
\]
From (25), we have $\log N(\varepsilon, \mathcal{G}_n^0, \|\cdot\|_{Q,r}) \leq C_1/\varepsilon$. We choose $\{g_1, \cdots, g_{N_1}\}$ such that for any $g \in \mathcal{G}_n^0$, there exists $j$ such that $\int_0^1 |g(\bar\lambda) - g_j(\bar\lambda)|^r d\bar\lambda < \varepsilon^r$. Then we have, for any $\lambda \in \Lambda_0$,
\[
E\left|g(\lambda(X)) - g_j(\lambda(X))\right|^r = \int_0^1 \left|g(\bar\lambda) - g_j(\bar\lambda)\right|^r f_\lambda(\bar\lambda)\, d\bar\lambda \leq \sup_\lambda \sup_{\bar\lambda} f_\lambda(\bar\lambda)\, \varepsilon^r.
\]
Now, take any $h \in \mathcal{G}_n$ such that $h := g \circ \lambda$, and choose $(g_{j_1}, \lambda_{j_2})$ from $\{g_1, \cdots, g_{N_1}\}$ and $\{\lambda_1, \cdots, \lambda_{N_2}\}$ such that $\int_0^1 |g(\bar\lambda) - g_{j_1}(\bar\lambda)|^r d\bar\lambda < \varepsilon^r$ and $\|\lambda_{j_2} - \lambda\|_\infty < \varepsilon^r/2$. Then consider
\[
\left\|h - g_{j_1} \circ \bar\lambda_{j_2}\right\|_r \leq \left\|g \circ \lambda - g_{j_1} \circ \lambda\right\|_r + \left\|g_{j_1} \circ \lambda - g_{j_1} \circ \bar\lambda_{j_2}\right\|_r
\leq \left(\sup_\lambda \sup_{\bar\lambda} f_\lambda(\bar\lambda)\right)^{1/r} \varepsilon + \left(E\left|(g_{j_1} \circ \lambda)(X) - (g_{j_1} \circ \bar\lambda_{j_2})(X)\right|^r\right)^{1/r}.
\]
The absolute value inside the expectation in the second term is bounded by
\[
1\left\{\frac{1}{n}\sum_{i=1}^{n} 1\left\{\lambda_{j_1}(x_i) \leq \bar\lambda_{j_2}(X) - \varepsilon^r\right\} \leq u_{j_1}\right\} - 1\left\{\frac{1}{n}\sum_{i=1}^{n} 1\left\{\lambda_{j_1}(x_i) \leq \bar\lambda_{j_2}(X) + \varepsilon^r\right\} \leq u_{j_1}\right\},
\]
or by
\[
1\left\{\frac{1}{n}\sum_{i=1}^{n} 1\left\{\lambda_{j_1}(x_i) \leq \bar\lambda_{j_2}(X) - \varepsilon^r\right\} \leq u_{j_1} < \frac{1}{n}\sum_{i=1}^{n} 1\left\{\lambda_{j_1}(x_i) \leq \bar\lambda_{j_2}(X) + \varepsilon^r\right\}\right\}
\]
\[
= \sum_{m=0}^{\lfloor 2/\varepsilon^r \rfloor} 1\left\{\frac{1}{n}\sum_{i=1}^{n} 1\left\{\lambda_{j_1}(x_i) \leq \frac{m\varepsilon^r}{2} - \varepsilon^r\right\} \leq u_{j_1} < \frac{1}{n}\sum_{i=1}^{n} 1\left\{\lambda_{j_1}(x_i) \leq \frac{m\varepsilon^r}{2} + \varepsilon^r\right\}\right\} \times 1\left\{\frac{m\varepsilon^r}{2} \leq \bar\lambda_{j_2}(X) < \frac{(m+1)\varepsilon^r}{2}\right\}.
\]
Since $P\{\bar\lambda_{j_2}(X) \in [m\varepsilon^r/2, (m+1)\varepsilon^r/2)\} \leq \sup_\lambda \sup_{\bar\lambda} f_\lambda(\bar\lambda)\,\varepsilon^r \leq C\varepsilon^r$ by the mean-value theorem, and the sets
\[
A_m = \left\{u \in [0,1] : \frac{1}{n}\sum_{i=1}^{n} 1\left\{\lambda_{j_1}(x_i) \leq \frac{m\varepsilon^r}{2} - \varepsilon^r\right\} \leq u < \frac{1}{n}\sum_{i=1}^{n} 1\left\{\lambda_{j_1}(x_i) \leq \frac{m\varepsilon^r}{2} + \varepsilon^r\right\}\right\}
\]
constitute a partition of $[0,1]$, we deduce that
\[
E\left|(g_{j_1} \circ \lambda)(X) - (g_{j_1} \circ \bar\lambda_{j_2})(X)\right|^r \leq C\varepsilon^r.
\]
Therefore, $\|h - g_{j_1} \circ \bar\lambda_{j_2}\|_r \leq C\varepsilon$, yielding the result that
\[
\log N_{[\,]}(C\varepsilon, \mathcal{G}_n, \|\cdot\|_r) \leq \log N(C\varepsilon^r, \Lambda_0, \|\cdot\|_\infty) + \log N_{[\,]}(\varepsilon, \mathcal{G}_n^0, \|\cdot\|_r) \leq \log N(C\varepsilon^r, \Lambda_0, \|\cdot\|_\infty) + C/\varepsilon.
\]
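The order-$1/\varepsilon$ bound in (25) reflects the fact that a uniformly bounded monotone class admits roughly $1/\varepsilon$ brackets of $L_1$-size $\varepsilon$. The following sketch illustrates this for the simplest such class, $\{x \mapsto 1\{x \leq t\} : t \in [0,1]\}$, under the uniform measure; the helper `bracket_count` and the grid check are our own illustrative constructions, not part of the proof.

```python
import numpy as np

def bracket_count(eps):
    """Number of eps-brackets, in L1 with the uniform measure on [0,1],
    covering the monotone class {x -> 1{x <= t} : t in [0,1]}.
    The brackets [1{x <= k*eps}, 1{x <= (k+1)*eps}] each have L1-size eps,
    so about 1/eps brackets suffice: log N_[](eps) <= C/eps, as in (25)."""
    return int(np.ceil(1.0 / eps))

# Check numerically that each bracket has L1-size close to eps.
eps = 0.1
x = np.linspace(0.0, 1.0, 100_001)   # fine grid approximating Lebesgue measure
for k in range(bracket_count(eps)):
    lower = (x <= k * eps).astype(float)
    upper = (x <= (k + 1) * eps).astype(float)
    # mean over the grid approximates the integral of (upper - lower)
    assert (upper - lower).mean() <= eps + 1e-3

print(bracket_count(eps))  # -> 10
```

The exponential contrast with the sup norm is the point of Lemma A1: under $\|\cdot\|_\infty$, no finite cover of this indicator class shrinks with $\varepsilon$, which is why the sup-norm entropy condition of Chen, Linton, and van Keilegom (2003) does not help here.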
Assumption A1 : (i) $\lambda_{\min}\left(\int p^K(u)p^K(u)'\,du\right) > 0$.
(ii) There exist $d_1$ and $d_2$ such that (a) there exist sets of vectors $\{\pi_y : y \in \mathbf{R}\}$ and $\{\pi_z : z \in \mathbf{R}\}$ such that
\[
\sup_{(y,u) \in \mathbf{R} \times [0,1]} \left|p^K(u)'\pi_y - g_y^Y(u)\right| = O(K^{-d_1}) \quad\text{and}\quad \sup_{(z,u) \in \mathbf{R} \times [0,1]} \left|p^K(u)'\pi_z - g_z^Z(u)\right| = O(K^{-d_2}),
\]
and (b) for each $\bar u \in [0,1]$, there exist sets of vectors in $\mathbf{R}^K$, $\{\pi_{y,\bar u} : y \in \mathbf{R}\}$, such that
\[
\sup_{(y,\bar u) \in \mathbf{R} \times [0,1]} \left(\int_0^1 \left|P^K(u)'\pi_{y,\bar u} - \dot g_y^Y(u)\,1\{u \leq \bar u\}\right|^p du\right)^{1/p} = O(K^{-d_1}),
\]
where $\dot g_y^Y(\cdot)$ denotes the first-order derivative of $g_y^Y(\cdot)$.
(iii) For $d_1$ and $d_2$ in (ii), $\sqrt{n}\,\zeta_{0,K}^2 K^{-d_1} = o(1)$ and $\sqrt{n}\,\zeta_{1,K} K^{-d_2} = o(1)$.
(iv) For $b$ in the definition of $\Lambda_0$, we have $n^{1/2 - 2b}\zeta_{0,K}^3 \zeta_{2,K} = o(1)$, $n^{-1/2 + 1/p} K^{1 - 1/p} \zeta_{0,K}^2 = o(1)$, and $n^{-b}\zeta_{0,K}\left\{\sqrt{\zeta_{0,K}\zeta_{2,K}} + \zeta_{1,K}\right\} = o(1)$.
Proof of Theorem 1 : Choose a sequence $\lambda$ within the shrinking neighborhood $B(\lambda_0, \delta_n) \subset \Lambda$ of $\lambda_0$ with diameter $\delta_n = o(1)$ in terms of $\|\cdot\|_2$. For $\mathcal{G}_n$ in Lemma A1, we define a process $\nu_n(g, r, \lambda)$ indexed by $(g, r, \lambda) \in \mathcal{G}_n \times S \times \Lambda_0$ as
\[
\nu_n(g, y, z, \lambda) := \frac{1}{\sqrt{n}}\sum_{i=1}^{n} g(X_i)\gamma_z(Z_i)\left[\gamma_y(Y_i) - p^K(U_i)'\pi_{y,\lambda}\right],
\]
where $\pi_{y,\lambda}$ is equal to $\pi_y$ defined below (4), except that $\lambda \in \Lambda_0$ is used in place of $\lambda_\theta(\cdot)$. By Lemma A1 and Assumption 3(iii), we obtain
\[
\log N_{[\,]}(\varepsilon, \mathcal{G}_n, \|\cdot\|_p) \leq \log N(\varepsilon^r/2, \Lambda_0, \|\cdot\|_\infty) + C_1/\varepsilon \leq C\log(1/\varepsilon) + C_1/\varepsilon.
\]
This implies that the bracketing entropy of $\{g\gamma_z : (z, g) \in \mathbf{R} \times \mathcal{G}_n\}$ is also bounded by $C\log(1/\varepsilon) + C_1/\varepsilon$. We apply Lemma 1U of Escanciano and Song (2007) to $\nu_n(g, y, z, \lambda)$ to deduce that it is equal to
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \left[g(X_i)\gamma_z(Z_i) - E\left[g(X_i)\gamma_z(Z_i)|U_i\right]\right]\left[\gamma_y(Y_i) - P\{Y_i \leq y|U_i\}\right] + o_P(1),
\]
where the $o_P(1)$ term is uniform over $(g, y, z, \lambda) \in \mathcal{G}_n \times \mathbf{R} \times \mathbf{R} \times \Lambda_0$. Hence the above holds for any arbitrary sequence $g_n \in \mathcal{G}_n$ such that $g_n(x) \to \gamma_u((F_{\theta_0} \circ \lambda_{\theta_0})(x))$. This yields the wanted result.
We define two pieces of notation used throughout the remaining proofs. First, $P_n$ denotes the sample-mean operation: $P_n f := \frac{1}{n}\sum_{i=1}^{n} f(S_i)$. Second, $\perp$ denotes the conditional mean deviation: $\gamma_y^\perp(Y, U) := \gamma_y(Y) - E[\gamma_y(Y)|U]$. We also denote, for a real-valued function $g(y, z)$ on $\mathbf{R} \times \mathbf{R}$, $Pg := E[g(Y, Z)]$ and $(P^U g)(u) := E[g(Y, Z)|U = u]$.
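The conditional-mean-deviation operation $\perp$ can be checked numerically. The sketch below is purely illustrative and uses a design of our own choosing, $Y|U \sim N(U, 1)$, so that $E[\gamma_y(Y)|U] = \Phi(y - U)$ is available in closed form; it verifies that $\gamma_y^\perp$ has mean zero and is uncorrelated with functions of $U$.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
u = rng.uniform(size=n)
y_data = u + rng.normal(size=n)          # illustrative design: Y | U ~ N(U, 1)

y = 0.7                                   # evaluation point of gamma_y
gamma = (y_data <= y).astype(float)       # gamma_y(Y) = 1{Y <= y}
# E[gamma_y(Y) | U] = Phi(y - U) under this design
cond_mean = np.array([0.5 * (1.0 + math.erf((y - ui) / math.sqrt(2.0)))
                      for ui in u])
gamma_perp = gamma - cond_mean            # the "perp" operation

print(float(gamma_perp.mean()))           # approximately 0
print(float(np.mean(gamma_perp * u)))     # approximately 0: orthogonal to functions of U
```

Both sample averages are $O_P(n^{-1/2})$, which is the property that makes the $\perp$-decompositions in the proofs below mean-zero empirical processes.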
Proof of Corollary 1 : (i)(a) and (ii)(a): Assume the null hypothesis. We first consider $\nu_n^1(r)$. The convergence of the finite-dimensional distributions of the process $\sqrt{n}P_n[\gamma_u\gamma_z^\perp\gamma_y^\perp]$ follows by the usual CLT. The finite second moment condition trivially follows from the uniform boundedness of the indicator functions, and it is straightforward to compute the covariance function of the process $\sqrt{n}P_n[\gamma_u\gamma_z^\perp\gamma_y^\perp]$ and verify that it equals the covariance function of the limit Gaussian process in the theorem. The weak convergence of $\sqrt{n}P_n[\gamma_u\gamma_z^\perp\gamma_y^\perp]$ follows from the fact that the class of univariate indicator functions is a VC class and that products of a finite number of indicator functions are also VC, by the stability properties of VC classes (see van der Vaart and Wellner (1996)). We can likewise analyze the limit behavior of the process $\nu_n(r) = \sqrt{n}P_n[\gamma_u\gamma_z\gamma_y^\perp]$ to obtain the wanted result.
(i)(b) and (ii)(b): Write
\[
n^{-1/2}\nu_n(r) = P_n\left[\gamma_u\gamma_z\gamma_y^\perp\right] = P_n\gamma_u\left[\gamma_z\gamma_y^\perp\right]^\perp + P_n\left[\gamma_u[P^U\rho_w] - P\{\gamma_u[P^U\rho_w]\}\right] + P[\gamma_u\rho_w],
\]
where $\rho_w := \gamma_z(\gamma_y - P^U\gamma_y)$. Now, assume that the null hypothesis does not hold. Using the preliminary result in (7), we write
\[
n^{-1/2}\nu_n(r) = P_n\left[\gamma_u\gamma_z^\perp\gamma_y^\perp\right] + o_P(1)
= P_n\gamma_u\left[\gamma_z^\perp\gamma_y^\perp\right]^\perp + P_n\left[\gamma_u[P^U\rho_w] - P\{\gamma_u[P^U\rho_w]\}\right] + P[\gamma_u\rho_w] + o_P(1),
\]
by adding and subtracting terms. The first and second processes are mean-zero empirical processes, and by arguments similar to those in (i), it is not hard to show that they are indexed by a Glivenko–Cantelli class. Hence the result follows.
Proof of Theorem 2 : (i) The proof is similar to the proof of Proposition 6.1 of Khmaladze and Koul (2004).
(ii)(a) We apply Theorem 1 to the process
\[
\nu_n^K(r) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \gamma_u(U_i)\gamma_z^K(Z_i)\left\{\gamma_y^K(Y_i) - (P^U\gamma_y^K)(Y_i, U_i)\right\}
\]
by replacing $\gamma_z$ and $\gamma_y$ with $\gamma_z^K$ and $\gamma_y^K$. In view of the proof of Theorem 1, the main step is applying Lemma 1U of Escanciano and Song (2007) to the above process. To this end, it suffices to show that the classes $\{\gamma_y^K : y \in \mathbf{R}\}$ and $\{\gamma_z^K : z \in \mathbf{R}\}$ satisfy the required bracketing entropy conditions, which is shown in Lemma A3(a) of Song (2007). Hence, following the same steps as in the proof of Theorem 1, we obtain that under the null hypothesis,
\[
\nu_n^K(r) = \sqrt{n}P_n\left[\gamma_u(\gamma_z^K)^\perp(\gamma_y^K)^\perp\right] + o_P(1). \tag{26}
\]
Furthermore, from (i) of this theorem, $P^U(\gamma_y^K) = 0$ and $P^U(\gamma_z^K) = 0$. Note that under the null hypothesis of conditional independence, $\gamma_z^K(U_i, Z_i)$ and $\gamma_y^K(U_i, Y_i)$ are conditionally independent given $U_i$. Therefore
\[
P^U(\gamma_z^K\gamma_y^K) = P^U(\gamma_z^K)P^U(\gamma_y^K) = 0.
\]
Therefore, under the null hypothesis, we have
\[
\nu_n^K(r) = \sqrt{n}P_n\left[\gamma_u(\gamma_z^K)(\gamma_y^K)\right] + o_P(1).
\]
We investigate the process $\sqrt{n}P_n[\gamma_u(\gamma_z^K)(\gamma_y^K)]$ and establish its weak convergence. The convergence of finite-dimensional distributions follows by the usual central limit theorem. Note that the functions $\gamma_y^K$ and $\gamma_z^K$ are uniformly bounded. By Lemma A3(a) of Song (2007), the class $\mathcal{A} = \{\gamma_u(\gamma_z^K)(\gamma_y^K) : (u, z, y) \in S\}$ is totally bounded and the process $\sqrt{n}P_n[\gamma_u(\gamma_z^K)(\gamma_y^K)]$ is stochastically equicontinuous on $\mathcal{A}$; hence the weak convergence of the process follows by the Proposition in Andrews (1994), p. 2251.
The covariance function of this process is equal to $P[\gamma_{u_1 \wedge u_2}(\gamma_{z_1}^K)(\gamma_{z_2}^K)(\gamma_{y_1}^K)(\gamma_{y_2}^K)]$. Under the null hypothesis of conditional independence, the covariance function becomes
\[
P\left[\gamma_{u_1 \wedge u_2} P^U\left[(\gamma_{z_1}^K)(\gamma_{z_2}^K)(\gamma_{y_1}^K)(\gamma_{y_2}^K)\right]\right]
= P\left[\gamma_{u_1 \wedge u_2} P^U\left[\gamma_{z_1}^K\gamma_{z_2}^K\right]P^U\left[\gamma_{y_1}^K\gamma_{y_2}^K\right]\right]
= P\left[\gamma_{u_1 \wedge u_2} P^U\left[\gamma_{z_1}\gamma_{z_2}\right]P^U\left[\gamma_{y_1}\gamma_{y_2}\right]\right]
= P\left[\gamma_{u_1 \wedge u_2}\gamma_{z_1 \wedge z_2}\gamma_{y_1 \wedge y_2}\right].
\]
By the null hypothesis of conditional independence, the above is equal to
\[
\int_0^{u_1 \wedge u_2} F_{Z|U}(z_1 \wedge z_2|u)\,F_{Y|U}(y_1 \wedge y_2|u)\,du.
\]
Now, when $\gamma_z^K$ and $\gamma_y^K$ are applied to the quantile-transformed variables, that is, when $Z_i$ and $Y_i$ are replaced by $V_i^Z = F_Z(Z_i)$ and $V_i^Y = F_Y(Y_i)$, the covariance function becomes
\[
\int_0^{u_1 \wedge u_2}\left[\int_0^{z_1 \wedge z_2} f_{Z|U}^{-1}(z|u) f_{Z|U}(z|u)\,dz\right]\left[\int_0^{y_1 \wedge y_2} f_{Y|U}^{-1}(y|u) f_{Y|U}(y|u)\,dy\right] du
= (u_1 \wedge u_2)(z_1 \wedge z_2)(y_1 \wedge y_2).
\]
(b) The result is immediate from (26).
Here, we review the notion of approximate Bahadur efficiency. We consider the parametrization introduced at the beginning of subsection 6.1. Denote by $h_T(\kappa)$ and $h_S(\kappa)$ the limits of the rejection probabilities of the tests based on $\Gamma\nu_n$ and $\Gamma\nu_n^1$ under the null hypothesis; that is, under the null hypothesis,
\[
P\{T_n \geq \kappa\} \to h_T(\kappa) \quad\text{and}\quad P\{S_n \geq \kappa\} \to h_S(\kappa)
\]
as $n \to \infty$. This often boils down to the computation of tail probabilities of functionals of Gaussian processes. Suppose that there exist constants $b_T$ and $b_S$ satisfying
\[
\ln h_T(\kappa) = -\frac{1}{2}b_T\kappa^2 + o(1) \quad\text{and}\quad \ln h_S(\kappa) = -\frac{1}{2}b_S\kappa^2 + o(1) \quad\text{as } \kappa \to \infty. \tag{27}
\]
On the other hand, suppose that under the fixed alternative $P_\tau$, there exist numbers $\varphi_T$ and $\varphi_S$ on $S$ such that
\[
\frac{1}{\sqrt{n}}T_n \to_p \varphi_T \quad\text{and}\quad \frac{1}{\sqrt{n}}S_n \to_p \varphi_S. \tag{28}
\]
Then the approximate Bahadur efficiency (denoted by $e_{T,S}^B(\tau)$) at $P_\tau$ of a test based on the test statistic $T_n$ relative to one based on $S_n$ is computed from (Nikitin (1995))
\[
e_{T,S}^B(\tau) = \frac{b_T}{b_S} \times \left\{\frac{\varphi_T}{\varphi_S}\right\}^2. \tag{29}
\]
The approximate Bahadur efficiency is composed of two ratios. The first ratio, $b_T/b_S$, measures the relative tail behavior of the limiting null distributions of the test statistics. The second ratio, $\varphi_T/\varphi_S$, measures how sensitively the test statistics deviate from the null limiting distribution in large samples under the alternatives. The limiting Pitman efficiency $e_{T,S}^*$ coincides with $\lim_{\tau \downarrow 0} e_{T,S}^B(\tau)$ when the conditions of Wieand (1976) are satisfied and the limit $\lim_{\tau \downarrow 0} e_{T,S}^B(\tau)$ exists.
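As a concrete illustration of how (27)-(29) combine, the sketch below computes the approximate Bahadur efficiency from the two ratios. The function and the numerical inputs are hypothetical, chosen only to display the mechanics: with equal drifts ($\varphi_T = \varphi_S$, as in Corollary 2) and a tail ratio $b_T/b_S = 1/4$, the efficiency is $1/4$.

```python
def approx_bahadur_efficiency(b_T, b_S, phi_T, phi_S):
    """Approximate Bahadur efficiency e^B_{T,S} = (b_T / b_S) * (phi_T / phi_S)^2.

    b_T, b_S     : tail-rate constants from (27), describing null behavior.
    phi_T, phi_S : probability limits of n^{-1/2} T_n and n^{-1/2} S_n
                   under the fixed alternative, from (28).
    """
    return (b_T / b_S) * (phi_T / phi_S) ** 2

# Hypothetical values: equal drifts and tail ratio 1/4 give efficiency 1/4.
print(approx_bahadur_efficiency(1.0, 4.0, 0.3, 0.3))  # -> 0.25
```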
Proof of Corollary 2 : It is obvious that the sequences $\{T_n\}$ and $\{S_n\}$ are standard sequences as defined in Bahadur (1960). Hence, we verify Condition III$^*$ of Wieand (1976). Let $b_P = \sup_{r \in S}|\langle\gamma_u, \gamma_z(\gamma_y - g_y^Y)\rangle|$ and
\[
U_n(r) = \nu_n(r) - \sqrt{n}\langle\gamma_u, \gamma_z(\gamma_y - g_y^Y)\rangle.
\]
Then, by the result of Corollary 1, $\lim_{n \to \infty} P\{\sup_{r \in S}|U_n(r)| < v\} = Q(v)$, where $Q$ is the distribution function of $\sup_{r \in S}|\nu^1(r)|$. Now, observe that for any $\varepsilon > 0$ and $\delta \in (0, 1)$, there exists $C > 0$ such that for all $n > C/b_P^2$,
\[
P\left\{|n^{-1/2}T_n - b_P| < \varepsilon b_P\right\} \geq P\left\{\sup_{r \in S}|U_n(r)| < \varepsilon b_P\right\} > 1 - \delta
\]
if $P \in \{P_\tau : \tau \in (0, \tau_0]\}$, by the Lemma of Wieand (1976), p. 1007. Hence the sequence $T_n$ satisfies Condition III$^*$ of Wieand (1976). Similarly, we can show that the sequence $S_n$ also satisfies Condition III$^*$, because the local shifts under the fixed alternatives are the same (see Corollary 1(i)(b) and (ii)(b)). Hence the limiting Pitman efficiency coincides with the approximate Bahadur efficiency.
From now on, we omit the notation that represents dependence on $\tau$ for notational brevity. The covariance kernels of the limit Gaussian processes of $\nu(r)$ and $\nu^1(r)$ in Theorem 1 are respectively given by
\[
c(r_1, r_2) = \int_0^{u_1 \wedge u_2} g_{z_1 \wedge z_2}^Z(u; \tau)\left(g_{y_1 \wedge y_2}^Y(u) - g_{y_1}^Y(u)g_{y_2}^Y(u)\right) du,
\]
\[
c_1(r_1, r_2) = \int_0^{u_1 \wedge u_2} \left(g_{z_1 \wedge z_2}^Z(u) - g_{z_1}^Z(u)g_{z_2}^Z(u)\right)\left(g_{y_1 \wedge y_2}^Y(u) - g_{y_1}^Y(u)g_{y_2}^Y(u)\right) du.
\]
Therefore (e.g., Lemma 4.8.1 in Nikitin (1995)), we have
\[
\lim_{\kappa \to \infty} \kappa^{-2}\log P\left\{\sup_{r \in S}|\nu(r)| \geq \kappa\right\} = -\frac{1}{2}b_T, \quad
\lim_{\kappa \to \infty} \kappa^{-2}\log P\left\{\sup_{r \in S}|\nu^1(r)| \geq \kappa\right\} = -\frac{1}{2}b_S,
\]
where the constants $b_T$ and $b_S$ are given by
\[
b_T^{-1} = \sup_{r \in S} c(r, r) = \sup_{y \in \mathbf{R}} \int_0^1 \left(g_y^Y(u) - g_y^Y(u)^2\right) du \tag{30}
\]
and
\[
b_S^{-1} = \sup_{r \in S} c_1(r, r) = \sup_{(z,y) \in \mathbf{R}^2} \int_0^1 \left(g_z^Z(u) - g_z^Z(u)^2\right)\left(g_y^Y(u) - g_y^Y(u)^2\right) du.
\]
Hence
\[
\frac{b_T}{b_S} = \frac{\sup_{(z,y) \in \mathbf{R}^2} \int_0^1 \left(g_z^Z(u) - g_z^Z(u)^2\right)\left(g_y^Y(u) - g_y^Y(u)^2\right) du}{\sup_{y \in \mathbf{R}} \int_0^1 \left(g_y^Y(u) - g_y^Y(u)^2\right) du} \leq \frac{1}{4}.
\]
Now let us consider the case of alternatives. The results of Theorem 1 yield that the numbers $\varphi_T$ and $\varphi_S$ in (28) are equal, given by
\[
\varphi_T = \sup_{(u,z,y) \in S} \left|P_\tau\left[\gamma_u\left(\langle\gamma_z, \gamma_y\rangle_U - \langle\gamma_z, 1\rangle_U\langle\gamma_y, 1\rangle_U\right)\right]\right|.
\]
Therefore, we obtain the wanted result from (29).
Proof of Corollary 3 : Trivial from the result of Theorem 2.
Proof of Corollary 4 : We can easily check Condition III$^*$ of Wieand (1976) for $T_n$ and $S_n$ similarly as in the proof of Corollary 2. Hence, we compute the approximate Bahadur efficiency. The covariance kernels of the limit Gaussian processes of $\nu_{n,A}^K(r)$ and $\nu_{n,E}^K(r)$ in Theorem 1 are identical; hence, for $b_T$ and $b_S$ in (27), $b_T = b_S$. Now, the wanted result follows by plugging (24) into (29).
Proof of Corollary 5 : Again, we can easily check Condition III$^*$ of Wieand (1976) for $T_n$ and $S_n$ similarly as in the proof of Corollary 2. The computation of $e_{T,S}^*$ is again via the approximate Bahadur efficiency.
The covariance kernel of the limit Gaussian process of $\nu^K(r)$ is given by
\[
c^K(r_1, r_2) = \int_0^{u_1 \wedge u_2} g_{z_1 \wedge z_2}^Z(u)\, g_{y_1 \wedge y_2}^Y(u)\, du,
\]
yielding the result that (e.g., Lemma 4.8.1 in Nikitin (1995))
\[
\lim_{\kappa \to \infty} \kappa^{-2}\log P\left\{\sup_{r \in S}|\nu^K(r)| \geq \kappa\right\} = -b_S/2,
\]
where the constant $b_S$ is given by $b_S^{-1} = \sup_{r \in S} c^K(r, r) = 1$. Therefore,
\[
\frac{b_T}{b_S} = 1\Big/\left\{\sup_{(z,y) \in \mathbf{R}^2} \int_0^1 \left(g_z^Z(u) - g_z^Z(u)^2\right)\left(g_y^Y(u) - g_y^Y(u)^2\right) du\right\},
\]
as the $b_S$ in (30) plays the role of $b_T$ now. On the other hand, under $P_\tau$, we have
\[
\varphi_T = \sup_{(u,z,y) \in S} \left|E_\tau\left[\gamma_u\left\{\langle\gamma_z, \gamma_y\rangle_U - \langle\gamma_z, 1\rangle_U\langle\gamma_y, 1\rangle_U\right\}\right]\right|
= \sup_{(u,z,y) \in S} \left|E_\tau\left[\gamma_u(U)\left\{C_{Z,Y|U}(g_z^Z(U), g_y^Y(U)|U) - g_z^Z(U)g_y^Y(U)\right\}\right]\right|.
\]
Now, we turn to $\varphi_S$. Observe that
\[
\varphi_S = \sup_{(u,z,y) \in S} \left|P_\tau\left[\gamma_u\left(\langle\gamma_z^K, \gamma_y^K\rangle_U - \langle\gamma_z^K, 1\rangle_U\langle\gamma_y^K, 1\rangle_U\right)\right]\right| \tag{31}
\]
\[
= \sup_{(u,z,y) \in S} \left|P_\tau\left[\gamma_u\langle\gamma_z^K, \gamma_y^K\rangle_U\right]\right| = \sup_{(u,z,y) \in S} \left|E_\tau\left[(K\gamma_z)(U; Z)(K\gamma_y)(U; Y)|U\right]\right|.
\]
The second equality follows from the conditional orthogonality property of $K$. First, assume that $\xi(z, \varepsilon)$ is strictly increasing in $z$. We observe that the expectation in the last term in (31) is equal to
\[
E_\tau\left[(K\gamma_z)(U; Z)(K\gamma_{\xi^{-1}(y,\varepsilon)})(U; Z)|U\right] \tag{32}
\]
in this case. (When $\xi(z, \varepsilon)$ is strictly decreasing in $z$, the expectation in the last term in (31) is equal to
\[
E_\tau\left[(K\gamma_z)(U; Z)(K(1 - \gamma_{\xi^{-1}(y,\varepsilon)}))(U; Z)|U\right] = -E_\tau\left[(K\gamma_z)(U; Z)(K\gamma_{\xi^{-1}(y,\varepsilon)})(U; Z)|U\right],
\]
because $K$ is a linear operator and $Ka = 0$ for any constant $a$. Then we can follow the subsequent steps in a similar manner.) By the conditional isometry condition, the term in (32) is equal to
\[
E_\tau\left[\gamma_z(Z)\gamma_{\xi^{-1}(y,\varepsilon)}(Z)|U\right] = E_\tau\left[F_{Z|U}\left(z \wedge \xi^{-1}(y,\varepsilon)|U\right)|U\right]
= P_\tau\left\{Z \leq z, Z \leq \xi^{-1}(y,\varepsilon)|U\right\} = P_\tau\{Z \leq z, Y \leq y|U\}
= C_{Z,Y|U}(g_z^Z(U; \tau), g_y^Y(U; \tau)|U).
\]
Therefore (omitting the notation for dependence on $\tau$),
\[
\varphi_T = \sup_{(z,y,u) \in S} \left|E\left[\gamma_u(U)\left\{C_{Z,Y|U}(g_z^Z(U), g_y^Y(U)|U) - g_z^Z(U)g_y^Y(U)\right\}\right]\right| \quad\text{and}
\]
\[
\varphi_S = \sup_{(z,y,u) \in S} E\left[\gamma_u(U)\,C_{Z,Y|U}(g_z^Z(U), g_y^Y(U)|U)\right] = 1.
\]
Combining the results, we obtain the wanted statement.
Suppose that $g_z^Z(u)$ and $g_y^Y(u)$ do not depend on $u$. Then
\[
e_{T,S}^* \leq 4\max\left\{\sup_{(z,y) \in [0,1]^2}\{z \wedge y - zy\}^2,\ \sup_{(z,y) \in [0,1]^2}\{zy - \max\{z + y - 1, 0\}\}^2\right\} = \frac{1}{4}.
\]
Recall that $z \wedge y$ and $\max\{z + y - 1, 0\}$ represent the upper and lower Fréchet–Hoeffding bounds (e.g., Nelsen (1999)).
9.2 Asymptotic Validity of Feasible Martingale Transforms
In this section, we briefly sketch how the feasible martingale transform introduced in a previous section can be shown to be asymptotically valid. The details can be filled in following the proof of Theorem 2 of Song (2007) and hence are omitted.
Let $V_i^Y = F_Y(Y_i)$ and $V_i^Z = F_Z(Z_i)$, and let $\hat U_i = F_{n,\theta}(\lambda_\theta(X_i))$, $\hat V_i^Y = F_n^Y(Y_i)$, and $\hat V_i^Z = F_n^Z(Z_i)$ denote their estimated counterparts. From Theorem 2(ii), it is not hard to show that
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \gamma_u(U_i)\gamma_z^K(U_i, V_i^Z)\gamma_y^K(U_i, V_i^Y) \rightsquigarrow W(r),
\]
where $\gamma_z^K(U_i, V_i^Z)$ and $\gamma_y^K(U_i, V_i^Y)$ are equal to $\gamma_z^K(U_i, Z_i)$ and $\gamma_y^K(U_i, Y_i)$ with $Z_i$ and $Y_i$ replaced by $V_i^Z$ and $V_i^Y$. Hence it suffices to show that
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \gamma_u(\hat U_i)\gamma_z^K(\hat U_i, \hat V_i^Z)\gamma_y^K(\hat U_i, \hat V_i^Y) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \gamma_u(U_i)\gamma_z^K(U_i, V_i^Z)\gamma_y^K(U_i, V_i^Y) + o_P(1),
\]
uniformly over $r \in S_{00}$. The difference between the two sums is equal to
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \gamma_u(\hat U_i)\gamma_z^K(\hat U_i, \hat V_i^Z)\left\{\gamma_y^K(\hat U_i, \hat V_i^Y) - \gamma_y^K(U_i, V_i^Y)\right\}
\]
\[
+ \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \gamma_u(\hat U_i)\gamma_y^K(U_i, V_i^Y)\left\{\gamma_z^K(\hat U_i, \hat V_i^Z) - \gamma_z^K(U_i, V_i^Z)\right\}
\]
\[
+ \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \left\{\gamma_u(\hat U_i) - \gamma_u(U_i)\right\}\gamma_y^K(U_i, V_i^Y)\gamma_z^K(U_i, V_i^Z) = A_{1n} + A_{2n} + A_{3n}, \text{ say.}
\]
We consider $A_{1n}$ first. Define
\[
\psi(z', x; z, u) := \gamma_u(F_{n,\theta}(\lambda_\theta(x)))\,\gamma_z^K(F_{n,\theta}(\lambda_\theta(x)), F_n^Z(z')) \quad\text{and}
\]
\[
\psi_0(z', x; z, u) := \gamma_u(F_{\theta_0}(\lambda_{\theta_0}(x)))\,\gamma_z^K(F_{\theta_0}(\lambda_{\theta_0}(x)), F_Z(z')),
\]
and
\[
\varphi(y', x; y, v) := 1\{0 < F_n^Y(y') \leq y \wedge v\}\sqrt{f(F_n^Y(y'), F_{n,\theta}(\lambda_\theta(x)))\left\{1 - F(F_n^Y(y')|F_{n,\theta}(\lambda_\theta(x)))\right\}} \quad\text{and}
\]
\[
\varphi_0(y', x; y, v) := 1\{0 < F_Y(y') \leq y \wedge v\}\sqrt{f(F_Y(y'), F_{\theta_0}(\lambda_{\theta_0}(x)))\left\{1 - F(F_Y(y')|F_{\theta_0}(\lambda_{\theta_0}(x)))\right\}}.
\]
Then $A_{1n}$ is equal to
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \psi(Z_i, X_i; z, u)\left\{E\left[\varphi(Y_i, X_i; y, \bar y)|U_i\right]_{\bar y = Y_i} - E\left[\varphi_0(Y_i, X_i; y, \bar y)|U_i\right]_{\bar y = Y_i}\right\}.
\]
Under regularity conditions of the kind in Assumption 6, and under the conditions for the basis functions in Assumption 2U of Song (2007) applied to $\varphi$ and $\psi$, we can deduce that
\[
A_{1n} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} E\left[\psi_0(Z_i, X_i; z, u)\left\{\varphi_0(\bar y, X_i; y, Y_i) - E\left[\varphi_0(Y_i, X_i; y, \bar y)|U_i\right]_{\bar y = Y_i}\right\}\Big|U_i\right]_{\bar y = Y_i} + o_P(1). \tag{33}
\]
Under the null hypothesis of conditional independence,
\[
E\left[\psi_0(Z_i, X_i; z, u)\left\{\varphi_0(\bar y, X_i; y, Y_i) - E\left[\varphi_0(Y_i, X_i; y, \bar y)|U_i\right]_{\bar y = Y_i}\right\}\Big|U_i\right]_{\bar y = Y_i}
\]
\[
= E\left[E\left[\psi_0(Z_i, X_i; z, u)|Y_i, U_i\right]\left\{\varphi_0(\bar y, X_i; y, Y_i) - E\left[\varphi_0(Y_i, X_i; y, \bar y)|U_i\right]_{\bar y = Y_i}\right\}\Big|U_i\right]_{\bar y = Y_i} = 0,
\]
since
\[
E\left[\psi_0(Z_i, X_i; z, u)|Y_i, U_i\right] = E\left[\psi_0(Z_i, X_i; z, u)|U_i\right] = \gamma_u(U_i)\langle\gamma_z^K, 1\rangle_{U_i} = 0
\]
by conditional independence and by Theorem 2(i). Under local alternatives such that $|E[\psi_0(Z_i, X_i; z, u)|Y_i, U_i]| = o_P(1)$, we have $A_{1n} = o_P(1)$, because the process in (33) is a mean-zero process. (This follows by establishing that the class of functions that index the process in (33) is $P$-Donsker.) Hence, both under the null hypothesis and under local alternatives, $A_{1n} = o_P(1)$. The second term $A_{2n}$ can be dealt with similarly.
Let us turn to the third term $A_{3n}$. We replace $F_{n,\theta}(\lambda_\theta(\cdot))$ in $\hat U_i$ by an arbitrary deterministic function $G$ that belongs to a class $\mathcal{F}_n$ such that
\[
P\left\{F_{n,\theta} \circ \lambda_\theta \in \mathcal{F}_n\right\} \to 1 \quad\text{and}\quad \log N_{[\,]}(\varepsilon, \mathcal{F}_n, \|\cdot\|_p) < C\varepsilon^{-b}
\]
for $b \in [0, 2)$, and such that for each $G \in \mathcal{F}_n$, we have $\|F_{\theta_0} \circ \lambda_{\theta_0} - G\|_\infty = O(n^{-1/2})$. Then we consider the process
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \left\{\gamma_u(G(X_i)) - \gamma_u(U_i)\right\}\gamma_y^K(U_i, V_i^Y)\gamma_z^K(U_i, V_i^Z), \quad (G, u, y, z) \in \mathcal{F}_n \times S,
\]
which can be shown to be equivalent to
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \left\{\gamma_u(G(X_i)) - \gamma_u(U_i)\right\}E\left[\gamma_y^K(U_i, V_i^Y)\gamma_z^K(U_i, V_i^Z)|G(X_i), U_i\right] + o_P(1).
\]
Then, using Lemma A2(ii) of Song (2006) and Theorem 2(i),
\[
E\left[\gamma_y^K(U_i, V_i^Y)\gamma_z^K(U_i, V_i^Z)|G(X_i), U_i\right] = O_P(n^{-1/2}).
\]
Now, from the fact that $\|F_{\theta_0} \circ \lambda_{\theta_0} - G\|_\infty = O(n^{-1/2})$, it follows that $A_{3n} = o_P(1)$.
References
[1] Ai, C. and X. Chen (2003), "Efficient estimation of models with conditional moment
restrictions containing unknown functions," Econometrica 71, 1795-1843.
[2] Altonji, J. and R. Matzkin (2005), "Cross-section and panel data estimators for non-
separable models with endogenous regressors," Econometrica 73, 1053-1102.
[3] Andrews, D. W. K (1991), "An empirical process central limit theorem for dependent
nonidentically distributed random variables," Journal of Multivariate Analysis, 38, 187-
203.
[4] Andrews, D. W. K (1994), "Empirical process methods in econometrics," in The Handbook of Econometrics, Vol. IV, ed. by R. F. Engle and D. L. McFadden, Amsterdam: North-Holland.
[5] Andrews, D. W. K (1995), "Nonparametric kernel estimator for semiparametric mod-
els," Econometric Theory, 11, 560-595.
[6] Andrews, D. W. K (1997), “A conditional Kolmogorov test,” Econometrica, 65, 1097-
1128.
[7] Angrist, J. D. and G. M. Kuersteiner (2004), "Semiparametric causality tests using the
policy propensity score," Working paper.
[8] Arcones, M. A. and B. Yu (1994), "Central limit theorems for empirical and U-processes
of stationary mixing sequences," Journal of Theoretical Probability, 7, 47-71.
[9] Bai, J. (2003), "Testing parametric conditional distributions of dynamic models," Re-
view of Economics and Statistics, 85, 531-549.
[10] Bai, J. and S. Ng (2001), "A consistent test for conditional symmetry in time series
models," Journal of Econometrics, 103, 225-258.
[11] Bahadur, R. R. (1960), "Stochastic comparison of tests," Annals of Mathematical Sta-
tistics, 31, 276-295.
[12] Bierens, H. J. (1990), “A consistent conditional moment test of functional form,” Econo-
metrica 58, 1443-1458.
[13] Bierens, H. J. and W. Ploberger (1997), “Asymptotic theory of integrated conditional
moment tests,” Econometrica 65, 1129-1151.
[14] Cawley, J. and T. Phillipson (1999), "An empirical examination of information barriers
to trade insurance," American Economic Review 89, 827-846.
[15] Chen, X., O. Linton, and I. van Keilegom (2003), "Estimation of semiparametric models
when the criterion function is not smooth," Econometrica 71, 1591-1608.
[16] Chiappori, P-A, and B. Salanié (2000), "Testing for asymmetric information in insurance
markets," Journal of Political Economy, 108, 56-78.
[17] Chiappori, P-A, B. Jullien, B. Salanié, and F. Salanié (2002), "Asymmetric information
in insurance: general testable implications," mimeo.
[18] Chung, K. L. (2001), A Course in Probability Theory, Academic Press, 3rd Ed.
[19] Dawid, P. (1980), "Conditional independence for statistical operations," Annals of Sta-
tistics, 8, 598-617.
[20] Delgado, M. A. and W. González Manteiga (2001), "Significance testing in nonparametric regression based on the bootstrap," Annals of Statistics, 29, 1469-1507.
[21] Fan, Y. and Q. Li (1996), “Consistent model specification tests: omitted variables,
parametric and semiparametric functional forms,” Econometrica, 64, 865-890.
[22] Hahn, J. (1998), "On the role of the propensity score in efficient semiparametric esti-
mation of average treatment effects," Econometrica, 66, 315-331.
[23] Härdle, W. and E. Mammen (1993), “Comparing nonparametric versus parametric re-
gression fits,” Annals of Statistics, 21, 1926-1947.
[24] Heckman, J. J., H. Ichimura, J. Smith, and P. Todd (1998), "Characterizing selection
bias using experimental data," Econometrica, 66, 1017-1098.
[25] Heckman, J. J., J. Smith, and N. Clemens (1997), "Making the most out of programme
evaluations and social experiments: accounting for heterogeneity in programme im-
pacts," Review of Economic Studies, 64, 487-535.
[26] Hirano, K., G. W. Imbens, and G. Ridder (2003), "Efficient estimation of average
treatment effects using the estimated propensity score," Econometrica, 71, 1161-1189.
[27] Khmaladze, E. V. (1993), “Goodness of fit problem and scanning innovation martin-
gales,” Annals of Statistics, 21, 798-829.
[28] Khmaladze, E. V. and H. Koul (2004), “Martingale transforms goodness-of-fit tests in
regression models,” Annals of Statistics, 32, 995-1034.
[29] Linton, O. and P. Gozalo (1997), "Conditional independence restrictions: testing and
estimation", Cowles Foundation Discussion Paper 1140.
[30] McKeague, I. W. and Y. Sun (1996), "Transformations of Gaussian random fields to Brownian sheet and nonparametric change-point tests," Statistics and Probability Letters, 28, 311-319.
[31] Nelsen, R. B. (1999), An Introduction to Copulas, Springer-Verlag, New York.
[32] Newey, W. K. (1997), "Convergence rates and asymptotic normality of series estima-
tors," Journal of Econometrics, 79, 147-168.
[33] Nikitin, Y. (1995), Asymptotic Efficiency of Nonparametric Tests, Cambridge University
Press, New York.
[34] Nishiyama, Y. (2000), Entropy methods for martingales, CWI Tract, 128.
[35] Rosenbaum, P. and D. Rubin (1983), "The central role of the propensity score in ob-
servational studies for causal effects," Biometrika, 70, 41-55.
[36] Patton, A. (2006), "Modeling asymmetric exchange rate dependence," International
Economic Review, 47, 527-556.
[37] Rosenblatt, M. (1952), "Remarks on a multivariate transformation," Annals of Mathematical Statistics, 23, 470-472.
[38] Rubin, D. (1978), "Bayesian inference for causal effects: the role of randomization," Annals of Statistics, 6, 34-58.
[39] Song, K. (2006), "Uniform convergence of series estimators over function spaces," Work-
ing Paper.
[40] Song, K. (2007), "Testing semiparametric conditional moment restrictions using condi-
tional martingale transforms," Working Paper.
[41] Stinchcombe, M. B. and H. White (1998), "Consistent specification testing with nuisance parameters present only under the alternative," Econometric Theory, 14, 295-325.
[42] Stute, W. and L. Zhu (2005), “Nonparametric checks for single-index models,” Annals
of Statistics, 33, 1048-1083.
[43] Su, L. and H. White (2003a), "Testing conditional independence via empirical likeli-
hood," Discussion Paper, University of California San Diego.
[44] Su, L. and H. White (2003b), "A nonparametric Hellinger metric test for conditional
independence," Discussion Paper, University of California San Diego.
[45] Su, L. and H. White (2003c), "A characteristic-function-based test for conditional in-
dependence," Discussion Paper, University of California San Diego.
[46] van der Vaart, A. W. and J. A. Wellner (1996), Weak Convergence and Empirical Processes, Springer-Verlag.
[47] Wieand, H. S. (1976), "A condition under which the Pitman and Bahadur approaches to efficiency coincide," Annals of Statistics, 4, 1003-1011.
Table 1 : Size of the Tests (nominal size is set at 5 percent), n = 300.⁹

Order 1  Order 2  MG trns (T_1A^K)  MG trns (T_1B^K)  MG trns (T_2^K)  Bootstrap
4^2      4        0.034             0.054             0.061            0.060
         5        0.050             0.077             0.076            0.055
         6        0.058             0.072             0.073            0.056
         7        0.076             0.118             0.088            0.055
5^2      4        0.050             0.044             0.070            0.039
         5        0.037             0.064             0.059            0.056
         6        0.059             0.099             0.078            0.046
         7        0.090             0.120             0.115            0.042
6^2      4        0.035             0.040             0.055            0.050
         5        0.032             0.068             0.057            0.055
         6        0.054             0.092             0.085            0.056
         7        0.070             0.106             0.096            0.049
7^2      4        0.041             0.048             0.067            0.061
         5        0.047             0.073             0.066            0.051
         6        0.062             0.084             0.082            0.048
         7        0.080             0.121             0.089            0.055

⁹ Order 1 represents the number of terms included in the bivariate density function estimation using Legendre polynomials. Order 2 represents the number of terms included in the estimation of the conditional expectation given $U_i$ using Legendre polynomials. MG trns ($T_{1A}^K$) and MG trns ($T_2^K$) indicate tests based on martingale transforms, the former using the test statistic $T_{1A}^K$, which is based on the true conditional expectation of $\gamma_y^K(Y)$ given $U$, and the latter using the test statistic $T_2^K$, which is based on the estimated conditional expectation of $\gamma_y^K(Y)$ given $U$. MG trns ($T_{1B}^K$) is the symmetrized version of MG trns ($T_{1A}^K$).
Table 2 : Power of the Tests : Conditional Positive Dependence, n = 300

            Order 1  Order 2  MG trns (T_1A^K)  MG trns (T_1B^K)  MG trns (T_2^K)  Bootstrap
κ = 0.2     5^2      5        0.193             0.191             0.229            0.216
            5^2      6        0.214             0.190             0.225            0.203
            6^2      5        0.187             0.196             0.226            0.230
            6^2      6        0.245             0.213             0.269            0.243
κ = 0.4     5^2      5        0.575             0.581             0.678            0.676
            5^2      6        0.604             0.605             0.667            0.664
            6^2      5        0.591             0.585             0.679            0.673
            6^2      6        0.626             0.612             0.694            0.701
κ = 0.6     5^2      5        0.909             0.938             0.953            0.950
            5^2      6        0.930             0.921             0.957            0.956
            6^2      5        0.898             0.927             0.949            0.949
            6^2      6        0.919             0.912             0.951            0.948
Table 3 : Power of the Tests : Conditional Negative Dependence, n = 300

            Order 1  Order 2  MG trns (T_1A^K)  MG trns (T_1B^K)  MG trns (T_2^K)  Bootstrap
κ = −0.2    5^2      5        0.024             0.176             0.208            0.235
            5^2      6        0.039             0.217             0.204            0.208
            6^2      5        0.029             0.197             0.191            0.238
            6^2      6        0.036             0.229             0.201            0.201
κ = −0.4    5^2      5        0.070             0.614             0.652            0.636
            5^2      6        0.068             0.647             0.634            0.640
            6^2      5        0.072             0.644             0.644            0.649
            6^2      6        0.062             0.656             0.628            0.650
κ = −0.6    5^2      5        0.367             0.944             0.956            0.935
            5^2      6        0.319             0.965             0.957            0.928
            6^2      5        0.359             0.943             0.941            0.925
            6^2      6        0.306             0.949             0.950            0.933
[Figure 1: The Empirical Density of the Test Statistics under the Null and under the Alternative (n = 3000). Four curves are plotted: with true F(y|x) under H0, with estimated F(y|x) under H0, with true F(y|x) under H1, and with estimated F(y|x) under H1.]
[Figure 2: The Empirical Density of the Empirical Processes at (x, y) = (0.5, 0.5) under the Null and under the Alternative (n = 3000). Four curves are plotted: with true F(y|x) under H0, with true F(y|x) under H1, with estimated F(y|x) under H0, and with estimated F(y|x) under H1.]