The Plant Journal
Comparative analysis of complete orthologous centromeres from two subspecies of rice reveals rapid variation of centromere organization and structure
Jianzhong Wu1,†, Masaki Fujisawa2,†,‡, Zhixi Tian3,†, Harumi Yamagata2, Kozue Kamiya2, Michie Shibata2, Satomi Hosokawa2,
Yukiyo Ito2, Masao Hamada2, Satoshi Katagiri2, Kanako Kurita2, Mayu Yamamoto2, Ari Kikuta2, Kayo Machita2, Wataru
Karasawa2, Hiroyuki Kanamori2, Nobukazu Namiki2, Hiroshi Mizuno1, Jianxin Ma3, Takuji Sasaki1 and Takashi Matsumoto1,*
1Plant Genome Research Unit, National Institute of Agrobiological Sciences, Tsukuba, Ibaraki, Japan, 2Research Division I, Institute of the Society for Techno-innovation of Agriculture, Forestry and Fisheries, Tsukuba, Ibaraki,
Japan, and 3Department of Agronomy, Purdue University, West Lafayette, IN, USA
Received 4 June 2009; revised 28 July 2009; accepted 7 August 2009.
*For correspondence (fax +81 29 838 2302; e-mail [email protected]).
†These authors contributed equally to this work.
‡Present address: Central Laboratories for Frontier Technology, Kirin Holdings Co., Ltd, Suematsu Nonoichi, Ishikawa, Japan.
SUMMARY
Centromeres are sites for assembly of the chromosomal structures that mediate faithful segregation at mitosis
and meiosis. This function is conserved across species, but the DNA components that are involved in
kinetochore formation differ greatly, even between closely related species. To shed light on the nature,
evolutionary timing and evolutionary dynamics of rice centromeres, we decoded a 2.25-Mb DNA sequence
covering the centromeric region of chromosome 8 of an indica rice variety, ‘Kasalath’ (Kas-Cen8). Analysis of
repetitive sequences in Kas-Cen8 led to the identification of 222 long terminal repeat (LTR)-retrotransposon
elements and 584 CentO satellite monomers, which account for 59.2% of the region. A comparison of the Kas-
Cen8 sequence with that of japonica rice ‘Nipponbare’ (Nip-Cen8) revealed that about 66.8% of the Kas-Cen8
sequence was collinear with that of Nip-Cen8. Although the 27 putative genes are conserved between the two
subspecies, only 55.4% of the LTR-retrotransposon elements in ‘Kasalath’ had orthologs in ‘Nipponbare’,
reflecting the recent proliferation of a considerable number of LTR-retrotransposons since the divergence of
the indica and japonica subspecies of Oryza sativa. Comparative analysis of the subfamilies, insertion times
and organization patterns of LTR-retrotransposons in the two Cen8 regions revealed differences between
‘Kasalath’ and ‘Nipponbare’ in the preferential accumulation of CRR elements and in the
expansion of CentO satellite repeats within the core domain of Cen8. Together, the results provide insights
into the recent proliferation of LTR-retrotransposons, and the rapid expansion of CentO satellite repeats,
underlying the dynamic variation and plasticity of plant centromeres.
Keywords: rice Cen8, LTR-retrotransposon, centromeric retrotransposons of rice, CentO satellite, active gene,
centromere evolution.
INTRODUCTION
Despite the conserved function of the chromosomal site for
kinetochore assembly, which plays a key role in the faithful
segregation of sister chromatids during cell division, the
centromere sequences of most multicellular organisms
show tremendous variation in size and organization, even
among related species (Henikoff et al., 2001; Malik and
Henikoff, 2002; Jiang et al., 2003; Lamb et al., 2004; Henikoff
and Dalal, 2005). Human centromeres are composed of
tandem arrays of approximately 171-bp AT-rich repeats (α-satellites) arranged in a head-to-tail fashion, and
vary in size from 3 to nearly 4 Mb (Schueler et al., 2001).
Among higher plants, cytological analysis by fluorescence
in-situ hybridization (FISH), developed over the last two
decades, has demonstrated the presence of abundant
repetitive DNA sequences in the centromeric regions of
Arabidopsis, rice, wheat and other species (Fransz et al.,
© 2009 The Authors. Journal compilation © 2009 Blackwell Publishing Ltd
The Plant Journal (2009) doi: 10.1111/j.1365-313X.2009.04002.x
1998; Heslop-Harrison et al., 1999; Heslop-Harrison, 2000;
Fukui et al., 2001). The centromeres of Arabidopsis thaliana
and rice contain tandem arrays of 178-bp and 155-bp
satellite repeats, respectively, spanning from 2.8 to
approximately 4.0 Mb in Arabidopsis and from 60 kb to
approximately 1.9 Mb in rice, depending on the chromosome
(Kumekawa et al., 2000, 2001; Cheng et al., 2002; Hosouchi
et al., 2002). These satellite repeats, which are postulated
to mediate spindle attachment, are flanked by pericentromeric
regions consisting largely of repetitive sequences, often
with clustered retroelements, and they share no sequence
homology among humans, Arabidopsis and rice.
One of the significant achievements of the rice genome
sequencing project was the complete sequencing of the two
rice centromeres of chromosomes 4 (Cen4) and 8 (Cen8)
from the japonica variety ‘Nipponbare’ (Wu et al., 2004;
Zhang et al., 2004; International Rice Genome Sequencing
Project, 2005). Sequence analysis of the 1.97-Mb genomic
region of ‘Nipponbare’ Cen8 identified about 200 transposable
elements and 440 copies of the 155-bp centromere-specific
satellite repeat CentO (Wu et al., 2004). A subsequent
comprehensive chromatin immunoprecipitation (ChIP)-based
study showed that the kinetochore occupies an approximately
750-kb CENH3-binding domain that defines the boundaries of
the functional Cen8 region (Nagaki et al., 2004). An important
discovery of these studies was the identification of active
genes within the core domain of ‘Nipponbare’ Cen8.
Comparative analysis of orthologous sequences from
closely related species can shed light on the processes that
give rise to sequence divergence and structural change.
Because the centromeres of most organisms vary in size and
diverge in sequence while retaining a conserved function,
comparative genomics provides an ideal way to investigate
the diversity of centromeric sequences and the underlying
evolutionary mechanisms, with insights into general features
of centromere biology and function. In Arabidopsis, for
example, although the structure of the central domain is
still not fully known, comparative studies have shown that
the tandemly arrayed satellite repeats interrupted by Athila
derivatives appear to evolve rapidly: they have been
reorganized among different species (Kamm et al., 1995),
yet maintain conserved and variable domains within
populations (Hall et al., 2003).
A recent comparison of pericentromeres from four Brassicaceae
species (A. thaliana, Arabidopsis arenosa, Capsella
rubella and Olimarabidopsis pumila) supports a model in
which plant pericentromeres may experience selective
pressures distinct from those acting on euchromatin and may
tolerate rapid, dynamic changes in sequence content and
structure (Hall et al., 2006). The two subspecies of Asian
cultivated rice, Oryza sativa L. ssp. indica and japonica, are
estimated to have diverged from a common ancestor about
0.44 million years ago (Ma) (Khush, 1997; Ma and Bennetzen,
2004). A partial comparison of the rice Cen8 region between
the completed sequence of ‘Nipponbare’ and the draft
sequence of the indica variety ‘93-11’ showed a high
percentage (85%) of shared long terminal repeat (LTR)-
retrotransposon insertions (Ma and Bennetzen, 2006). This
value might be an overestimation, because the incomplete
sequence from ‘93-11’ allowed comparison of only 60% of all
LTR-retrotransposon elements in the ‘Nipponbare’ Cen8. In
addition, a comparison of the CentO satellite sequences and
their chromosomal organization between the two subspecies
could not be performed because of technical difficulties
associated with the assembly and chromosomal mapping of
these repeats within the whole-genome shotgun (WGS)
sequence of ‘93-11’. Centromere function might be associated
with the interspersal of centromeric satellite repeats
with other repetitive elements, primarily LTR-retrotransposons
(Jiang et al., 2003; Lamb et al., 2004; Henikoff and
Dalal, 2005). Complete genome sequences of different rice
varieties or species are needed for a comprehensive
comparative analysis that can fully explain the dynamic
evolutionary changes in centromere composition and
structure. Here, we report the sequencing of the
Cen8 region of ‘Kasalath’ (hereafter Kas-Cen8), an indica
rice variety previously used to construct a high-density rice
genetic map (Harushima et al., 1998). Comparison of the
2.25-Mb Kas-Cen8 sequence with a 2.18-Mb sequence of the
orthologous region from ‘Nipponbare’ Cen8 (hereafter
Nip-Cen8) demonstrates the presence of highly
conserved genes in this centromeric region. We also iden-
tified many insertions, deletions, and duplications of chro-
mosome segments in this region, thus demonstrating the
dramatic structural changes that have occurred in the
centromeric DNA on chromosome 8 of ‘Kasalath’ and
‘Nipponbare’ rice. Detailed analysis of the sequences and
organization patterns of repetitive elements in the two
regions of Kas-Cen8 and Nip-Cen8 suggests that the recent
insertion of LTR-retrotransposons and the amplification of
CentO satellite monomers are primarily responsible for the
structural dynamics of the rice centromeres.
RESULTS
Sequencing and structural analysis of the Kas-Cen8 region
To decode the DNA sequence of a complete centromeric
region, it is necessary to construct a physical map of the
region showing the positions of genomic clones that carry
large inserts, such as BAC (bacterial artificial chromosome)
and PAC (P1-derived artificial chromosome) clones.
Using the methods described in Experimental procedures,
we obtained 18 minimum tiling path (MTP) clones from the
‘Kasalath’ BAC library that covered the full Cen8 region,
genetically delimited by the two centromere-flanking DNA
markers C1374 and
S21882S (Figure 1). This result demonstrates the value of
the published genomic sequence of ‘Nipponbare’ for the
construction of physical maps, even in complicated chro-
mosomal regions of related rice varieties. We sequenced the
18 BAC clones and generated 2.98 Mb of sequence data
(Table S1). Sequences of CentO satellites were confirmed in
three BAC clones (K0081B11, K0110D12 and K0116D04).
K0081B11 and K0110D12 overlapped within the central sec-
tion of the Kas-Cen8 region, which was further verified by
cytological analysis using the FISH method (Figure 1).
K0116D04, which contained the DNA marker S21882S, was
located on the long-arm side of the region. After removing
redundant sequences from the overlapping regions between
the neighboring BAC clones, we generated 2 249 426 bp of
continuous, high-quality DNA sequence, covering the entire
region of Kas-Cen8.
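Removing the redundant sequence shared by neighboring BAC clones amounts to merging overlapping intervals once every clone has been placed on the virtual contig. The sketch below assumes 1-based, inclusive clone coordinates; the three clones and their coordinates are hypothetical and are not the 18 ‘Kasalath’ clones.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent 1-based, inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps (or abuts) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def nonredundant_length(intervals):
    """Total length covered by the intervals, counting overlaps only once."""
    return sum(end - start + 1 for start, end in merge_intervals(intervals))


# Hypothetical placements of three overlapping clones on a virtual contig.
clones = [(1, 150_000), (120_001, 290_000), (280_001, 420_000)]
raw_bp = sum(end - start + 1 for start, end in clones)

print(raw_bp)                       # 460000 bp of raw clone sequence
print(nonredundant_length(clones))  # 420000 bp after removing overlaps
```

In the same way, the 2.98 Mb of raw clone sequence reported above collapses to the 2 249 426-bp non-redundant Kas-Cen8 contig.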
The overall analysis of base composition in the 2.25-Mb
sequence of Kas-Cen8 revealed an average G + C content of
45.0%, higher than that (43.6%) detected from the entire
genome (International Rice Genome Sequencing Project,
2005). Annotation of the Cen8 sequence predicted a total of
390 gene models (excluding transposon-related genes),
most of which were predicted by only a single
gene-prediction program. On the basis of known full-length
cDNA sequences in rice, we identified 27 genes that encoded
unique proteins with known or unknown functions, includ-
ing two homologs of disease-resistance genes (Table 1). Six
of these genes were located within a 793-kb subregion from
nucleotide (nt) 1 216 242–2 009 728 of the 2.25-Mb virtual
contig that, based on the results of in silico mapping,
corresponds to the core domain of ‘Nipponbare’ Cen8
(Nagaki et al., 2004). We also identified a putative gene for
TGF-beta receptor-interacting protein (K0486F02.38) located
only 4.6 kb from the CentO satellite repeats.
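Assigning the Table 1 genes to the core domain is an interval-overlap test against the in silico-mapped boundaries (nt 1 216 242–2 009 728). The sketch below uses the Table 1 coordinates of genes 15–23. Counting a gene as lying in the core domain if it overlaps the boundaries at all (an assumption, since the criterion is not stated) yields the six genes mentioned above; gene 22 extends past the right boundary, so a strict containment test would give five.

```python
CORE_START, CORE_END = 1_216_242, 2_009_728  # Kas-Cen8 core domain (nt)

# (gene no., 'Kasalath' gene ID, start, end) taken from Table 1
GENES = [
    (15, "K0023E10.29", 1_031_445, 1_039_118),
    (16, "K0023E10.32", 1_046_367, 1_052_501),
    (17, "K0486F02.38", 1_222_002, 1_228_076),
    (18, "K0098G01.19", 1_450_681, 1_453_712),
    (19, "K0098B12.43", 1_661_014, 1_669_044),
    (20, "K0253H11.38", 1_792_999, 1_800_001),
    (21, "K0116D04.18", 1_990_625, 1_991_203),
    (22, "K0116D04.19", 1_992_218, 2_017_263),
    (23, "K0155C03.21", 2_144_724, 2_150_146),
]


def span_bp(start, end):
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def overlaps(start, end, lo, hi):
    """True if [start, end] shares at least one base with [lo, hi]."""
    return start <= hi and end >= lo


def contained(start, end, lo, hi):
    """True if [start, end] lies entirely within [lo, hi]."""
    return lo <= start and end <= hi


core_overlap = [no for no, _, s, e in GENES if overlaps(s, e, CORE_START, CORE_END)]
core_contained = [no for no, _, s, e in GENES if contained(s, e, CORE_START, CORE_END)]

print(span_bp(CORE_START, CORE_END))  # 793487, i.e. the 793-kb subregion
print(core_overlap)                   # [17, 18, 19, 20, 21, 22]
print(core_contained)                 # [17, 18, 19, 20, 21]
```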
Detailed analysis of repeat sequences within the 2.25-Mb
Kas-Cen8 region led to the identification of 222 LTR-retrotransposon
(class-I transposable elements) sequences: 88
intact or mostly intact elements and 90 solo LTRs flanked by
standard target-site duplications (TSDs), six intact or mostly
intact elements and four solo LTRs lacking TSDs, and 34
truncated elements, each of which contained at least one
identified LTR (Table S2). Of the 94 intact or mostly intact
retrotransposons, 41 (43.6%) were present in a nested
structure because of the single or multiple insertions of
DNA sequences derived from other retroelements. On the
basis of LTR sequence homology, we grouped the above 222
LTR-retrotransposon elements into 48 subfamilies (Table 2).
The Rire3 subfamily was the most abundant, consisting of 36
elements, which were located mostly in the pericentromeric
regions, and accounted for about 300 kb of the total
[Figure 1 graphic. Genetic map markers: E31128S (54.0 cM), C1374 (54.3 cM), S21882S (54.3 cM), C10983S (55.4 cM), R2381 (54.3 cM), E20691S (54.3 cM), C51155 (54.3 cM). BAC contig: K0122H06, K0063H06, K0065E03, K0007B01, K0048F05, K0031E03, K0155E09, K0039A02, K0023E10, K0486F02, K0081B11, K0110D12, K0098G01, K0098B12, K0253H11, K0413C07, K0116D04, K0155C03. FISH image labels: Cen8, K486F02, CentO; scale bar, 5 µm; 8S, 8L.]
Figure 1. Genetic and physical maps covering the ‘Kasalath’ Cen8 region.
The genetic map, bacterial artificial chromosome (BAC) contig and fluorescence in-situ hybridization (FISH) image are presented, from top to bottom. BAC clones
containing CentO satellite repeats are shown in yellow. 8S and 8L: short and long arms of chromosome 8.
Table 1 List of putative genes predicted in the ‘Kasalath’ Cen8 region and sequence divergence revealed by comparative analysis
Gene no. | Gene ID in ‘Kasalath’ | Nucleotide position in virtual contig | Predicted function | Accession of matched cDNAs in GenBank | Gene ID in ‘Nipponbare’ | SNP (kb⁻¹) | Ka:Ks | Indel (kb⁻¹)
1 | K0122H06.4 | 7471–8060 | Unknown protein | AK062297 | OJ1212_C09.3 | 0 | – | 0
2 | K0122H06.9 | 23281–32642 | Putative threonyl-tRNA synthetase | AK070378 | OJ1212_C09.6 | 7.58 | 0.19 | 0.45
3 | K0122H06.31 | 93630–98408 | Bacterial blight-resistance protein Xa1-like | AK066438 | OJ1212_C09.23 | 5.39 | 0.18 | 0
4 | K0122H06.41 | 129020–135504 | Putative bacterial blight-resistance protein Xa1 | AK121298 | P0024C06.103 | 6.88 | 1.01 | 0.21
5 | K0122H06.43-1 | 141024–143709 | Putative tetratricopeptide repeat domain 1 | AK067499 | P0024C06.105-1 | 15.47 | 0.24 | 0.43
6 | K0007B01.32 | 454183–455469 | Putative steroid sulfotransferase 3 | AK058698 | P0024C06.128 | 12.9 | 0.99 | 0
7 | K0048F05.15 | 552627–558549 | Unknown protein | AK061379 | OSJNBa0063H21.109 | 2.23 | 0.27 | 0
8 | K0048F05.25 | 594071–597344 | Putative MGDG synthase type A | AK064148 | OSJNBa0063H21.123 | 0.84 | 0 | 0
9 | K0031E03.2 | 652460–662530 | Putative CLB1 protein | AK069706 | P0045D08.119 | 0.59 | – | 0
10 | K0031E03.24 | 671275–677259 | Putative chloride channel | AK066375 | P0045D08.120 | 0.84 | 0 | 0
11 | K0031E03.35 | 706793–709000 | Putative fertility restorer | AK101762 | P0045D08.129 | 0.45 | 0 | 0
12 | K0031E03.46 | 737278–745421 | Putative sucrose-phosphate synthase | AK101676 | OJ1115_A07.105 | 0.31 | – | 0
13 | K0155E09.10 | 754549–754965 | Unknown protein | AK072190 | OJ1115_A07.107 | 0 | – | 0
14 | K0155E09.17 | 771561–772954 | Putative peroxidase precursor | AK106760 | OJ1115_A07.117 | 0 | – | 0
15 | K0023E10.29 | 1031445–1039118 | Putative formamidopyrimidine-DNA glycosylase | AK063295 | OSJNBa0051M20.125 | 1.63 | – | 0
16 | K0023E10.32 | 1046367–1052501 | Putative cig3 (cytokinin-inducible gene) | AK105754 | OSJNBa0051M20.128 | 0.66 | 0.29 | 0
17* | K0486F02.38 | 1222002–1228076 | Putative TGF-beta receptor-interacting protein 1 | AK122060 | OSJNBa0061E21.121 | 0 | – | 0
18* | K0098G01.19 | 1450681–1453712 | Putative LSU ribosomal protein L15P | AK073645 | B1100F03.118 | 0 | – | 0
19* | K0098B12.43 | 1661014–1669044 | CBS domain-containing protein-like | AK121969 | B1136D08.130 | 0 | – | 0
20* | K0253H11.38 | 1792999–1800001 | Putative poly(A)-binding protein | AK065167 | P0451H06.101 | 1.01 | 0 | 0
21* | K0116D04.18 | 1990625–1991203 | Unknown protein | AK103062 | P0406D01.113 | 0 | – | 1.75
22* | K0116D04.19 | 1992218–2017263 | Exocyst complex component Sec8-like | AK070862 | P0406D01.114 | 0.63 | 0 | 0
23 | K0155C03.21 | 2144724–2150146 | Cyclase-like protein | AK108030 | P0465H09.131 | 0 | – | 3.72
24 | K0155C03.24 | 2161155–2164728 | Ribonucleoprotein-like | AK121802 | P0465H09.135 | 0.95 | 0 | 0
25 | K0155C03.26-1 | 2170962–2173643 | Sexual differentiation process protein isp4-like | AK121257 | P0465H09.136-2 | 0 | – | 0
26 | K0155C03.32 | 2205960–2206424 | Unknown protein | AK107815 | P0005C02.106 | 2.15 | 0 | 0
27 | K0155C03.33 | 2214791–2217275 | Fasciclin-like arabinogalactan-protein-like | AK119590 | P0005C02.108 | 0 | – | 0
Average | | | | | | 2.24 | 0.36 | 0.24
Genes located in the core domain are marked with an asterisk. Ka:Ks, ratio of nonsynonymous (Ka) to synonymous (Ks) substitution sites detected within the exons of each pair of orthologous genes between the two Cen8 regions of ‘Kasalath’ and ‘Nipponbare’.
Table 2 Characterization of long terminal repeat (LTR)-retrotransposons within ‘Kasalath’ and ‘Nipponbare’ Cen8 regions
LTR-retrotransposon subfamily | Cen8 in ‘Kasalath’: Copy no. | No. of intact elements | Total length of sequence (bp) | Cen8 in ‘Nipponbare’: Copy no. | No. of intact elements | Total length of sequence (bp)
Rire3 | 36 (6) | 20 | 299 820 | 27 (8) | 14 | 213 290
noaCRR1/Osr37 | 28 (22) | 12 | 87 517 | 14 (8) | 8 | 43 735
Osr34 | 14 (7) | 6 | 90 089 | 15 (5) | 5 | 97 524
Osr30 | 12 (7) | 7 | 100 263 | 14 (8) | 9 | 130 256
Osr33 | 10 (0) | 5 | 68 974 | 10 (1) | 4 | 72 163
CRR2 | 10 (7) | 5 | 44 453 | 8 (4) | 4 | 35 506
Osr29/Ovikoh | 9 (3) | 4 | 41 037 | 12 (7) | 4 | 50 512
Omasag | 9 (3) | 0 | 29 182 | 11 (6) | 1 | 38 197
Seaba | 7 (1) | 2 | 47 866 | 4 (1) | 3 | 41 491
Egah | 7 (2) | 3 | 43 494 | 13 (7) | 4 | 88 796
Osr41 | 6 (2) | 2 | 20 885 | 6 (3) | 2 | 27 910
Osr8 | 5 (1) | 2 | 23 792 | 3 (0) | 1 | 11 532
noaCRR2/Pawepe | 5 (2) | 1 | 4814 | 2 (0) | 0 | 921
Osr26/Rire2 | 4 (1) | 2 | 37 126 | 3 (2) | 1 | 19 411
Vemeal | 4 (2) | 1 | 33 019 | 2 (2) | 0 | 10 211
Osr25/Dasheng | 4 (1) | 4 | 29 314 | 3 (3) | 2 | 16 298
Aboov | 4 (1) | 2 | 22 536 | 3 (0) | 2 | 19 950
CRR1/Rire7 | 4 (4) | 2 | 18 688 | 3 (3) | 3 | 22 862
Kangourou_osj | 3 (2) | 1 | 19 082 | 2 (2) | 0 | 9061
Jobe | 3 (1) | 0 | 16 312 | 2 (0) | 0 | 10 368
Kuvu | 3 (1) | 1 | 8750 | 4 (2) | 1 | 9843
Mesaaw | 3 (1) | 0 | 8535 | 5 (1) | 2 | 27 182
Osr40/Rire10 | 2 (1) | 2 | 24 017 | 2 (1) | 2 | 24 395
Wube | 2 (1) | 2 | 19 590 | 0 (0) | 0 | 0
Ifisi | 2 (1) | 1 | 13 923 | 2 (1) | 1 | 18 963
Awab | 2 (1) | 0 | 5985 | 4 (2) | 0 | 11 829
Echidne_osj | 2 (1) | 0 | 2855 | 2 (1) | 0 | 2826
Rire1 | 2 (2) | 0 | 2797 | 2 (2) | 0 | 2742
Suawoh | 1 (0) | 1 | 13 368 | 0 (0) | 0 | 0
Hopi | 1 (1) | 1 | 12 876 | 0 (0) | 0 | 0
Asuvi | 1 (0) | 1 | 9062 | 1 (0) | 1 | 9077
Yneub | 1 (0) | 0 | 7520 | 1 (0) | 0 | 1184
Awok | 1 (1) | 0 | 4892 | 2 (2) | 0 | 2726
Goatuw | 1 (0) | 1 | 4829 | 0 (0) | 0 | 0
Ibus | 1 (1) | 0 | 4413 | 1 (1) | 1 | 7429
Rn_363 | 1 (1) | 1 | 4202 | 0 (0) | 0 | 0
Dendrobat_osj | 1 (1) | 1 | 3698 | 1 (1) | 1 | 3693
Obeh | 1 (1) | 0 | 2735 | 1 (1) | 0 | 1613
Kado | 1 (1) | 0 | 2404 | 1 (1) | 0 | 2407
Ofon | 1 (0) | 0 | 1392 | 1 (0) | 1 | 13 180
Vasy | 1 (0) | 0 | 991 | 1 (0) | 0 | 1003
Noedu | 1 (0) | 0 | 978 | 1 (0) | 0 | 978
Ornithorynque_osj/Haky | 1 (1) | 0 | 954 | 1 (1) | 0 | 954
Osr1 | 1 (0) | 0 | 859 | 1 (0) | 0 | 859
Pese | 1 (0) | 0 | 761 | 1 (0) | 0 | 1596
Ifalu | 1 (1) | 1 | 500 | 1 (1) | 1 | 500
Gileub | 1 (1) | 0 | 38 | 0 (0) | 0 | 0
Panejy | 1 (1) | 0 | 234 | 3 (3) | 0 | 702
Osr13 | 0 (0) | 0 | 0 | 1 (0) | 1 | 6427
Nori | 0 (0) | 0 | 0 | 1 (0) | 0 | 1391
Oren | 0 (0) | 0 | 0 | 1 (1) | 1 | 12 921
Total | 222 (95) | 94 | 1 241 769 | 199 (92) | 80 | 1 126 414
CRR elements are the CRR1/Rire7, noaCRR1/Osr37, CRR2 and noaCRR2/Pawepe subfamilies. Numbers in parentheses represent copies located in the core domain.
sequence. We also identified high copy numbers of the CRR
(centromeric retrotransposons of rice) subfamilies, includ-
ing 32 CRR1 (four CRR1 and 28 noaCRR1) and 15 CRR2 (10
CRR2 and five noaCRR2) elements. These CRR elements,
located mostly in the core domain of the Cen8 region,
accounted for about 155 kb of sequence. Overall, LTR-
retrotransposon sequences comprised about 55.2%
(1.24 Mb) of the entire region of Kas-Cen8. We found that
95 (42.8%) of the LTR-retrotransposon elements were dis-
tributed within the core domain of Kas-Cen8 (Figure 2).
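The proportions quoted in this paragraph follow directly from the element counts given above and the Table 2 totals. The short check below recomputes them; the dictionary keys are descriptive labels introduced here for illustration, not category names taken from Table S2.

```python
REGION_BP = 2_249_426   # length of the Kas-Cen8 sequence
LTR_BP = 1_241_769      # total LTR-retrotransposon sequence (Table 2)
CORE_COPIES = 95        # elements in the core domain (Table 2 total, in parentheses)
NESTED = 41             # intact or mostly intact elements in a nested structure

categories = {
    "intact or mostly intact, with TSDs": 88,
    "solo LTRs, with TSDs": 90,
    "intact or mostly intact, no TSDs": 6,
    "solo LTRs, no TSDs": 4,
    "truncated, at least one LTR": 34,
}

# Total length of each CRR subfamily in Kas-Cen8 (Table 2)
CRR_BP = {"CRR1/Rire7": 18_688, "noaCRR1/Osr37": 87_517,
          "CRR2": 44_453, "noaCRR2/Pawepe": 4_814}


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


total = sum(categories.values())
intact = (categories["intact or mostly intact, with TSDs"]
          + categories["intact or mostly intact, no TSDs"])

print(total, intact)             # 222 94
print(pct(NESTED, intact))       # 43.6
print(pct(LTR_BP, REGION_BP))    # 55.2
print(pct(CORE_COPIES, total))   # 42.8
print(sum(CRR_BP.values()))      # 155472, about 155 kb
```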
Through a BLAST search using the consensus sequence
of CentO monomers derived from ‘Nipponbare’ Cen8,
we identified 89.0 kb of CentO sequences in Kas-Cen8,
comprising 584 monomers of the 155-bp satellite repeat
and accounting for 4.0% of the entire region (Table 3). The
majority (86.7 kb) of these CentO sequences were organized
into a large block (block 1, nt 1 234 540–1 383 007 in the
2.25-Mb virtual contig). Located on the short-arm side of the
in silico-mapped core domain, this block consists of 11 tracts
(sub-blocks 1–11) of CentO satellite sequences that show
differences in size and orientation, and are interrupted by the
CRR elements (Figure 2). The largest tract (sub-block 5)
contains 174 tandemly arrayed CentO satellite monomers.
The remaining CentO sequences (2.3 kb) were located in
three short tracts outside the core domain on the long-arm
side of Cen8 (block 2, nt 2 048 528–2 071 077).
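The block totals above can be recomputed from the sub-block sizes and monomer counts listed in Table 3. A minimal check:

```python
REGION_BP = 2_249_426  # length of the Kas-Cen8 sequence

# (size in bp, monomer count) for each CentO sub-block, from Table 3
BLOCK1 = [(25_421, 165), (688, 5), (7_657, 50), (4_744, 31), (26_238, 174),
          (5_413, 36), (2_776, 18), (4_102, 27), (216, 2), (5_243, 34),
          (4_163, 27)]
BLOCK2 = [(1_073, 7), (301, 2), (918, 6)]


def block_totals(block):
    """Return (total bp, total monomers) for a list of sub-blocks."""
    return sum(bp for bp, _ in block), sum(n for _, n in block)


b1_bp, b1_n = block_totals(BLOCK1)
b2_bp, b2_n = block_totals(BLOCK2)
# 1-based index of the sub-block with the most monomers
largest = max(range(len(BLOCK1)), key=lambda i: BLOCK1[i][1]) + 1

print(b1_bp, b1_n)    # 86661 569
print(b2_bp, b2_n)    # 2292 15
print(b1_n + b2_n)    # 584
print(round(100 * (b1_bp + b2_bp) / REGION_BP, 1))  # 4.0
print(largest)        # 5
```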
Sequence and structural comparison of the Kas-Cen8
and Nip-Cen8 regions
Sequence homology analysis showed that the 2.25-Mb
Kas-Cen8 sequence corresponds to the 2.18-Mb sequence in
Nip-Cen8 (IRGSP build 4.0, chromosome 8 pseudomolecule,
nt 11 954 044–14 133 830). Through pairwise comparative
analysis by BLAST and manual inspection, we found that
about 66.8% (1.50 Mb) of the Kas-Cen8 sequence is collinear
with the corresponding sequence in the Nip-Cen8 region
(Figure 3; Table 4). The BLAST alignment results identified
sites of 22 104 single nucleotide polymorphisms (SNPs) and
2834 indels (insertion or deletion) between the two Cen8
regions, revealing a frequency of 14.72 SNPs and 1.26 indels
per kb. We identified 33 large indels of more than 10 kb
along the two Cen8 regions. We also identified large
segmental duplications in rice Cen8. For example, as shown in
Figure 3, a triplication is apparent within a 210-kb subregion
of the Nip-Cen8 core domain (nt 1 036 343–1 891 855 in the
2.18-Mb virtual contig), which is consistent with a previous
finding (Ma and Bennetzen, 2006).
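The per-kilobase rates depend on which length is used as the denominator, and this paragraph does not state it. Recomputing from the Table 4 values shows that 14.72 SNPs per kb is reproduced with the collinearly aligned length (1 501 991 bp), whereas 1.26 indels per kb is reproduced with the full Kas-Cen8 length (2 249 426 bp); per aligned kilobase, the indel rate would be about 1.89.

```python
KAS_BP = 2_249_426      # total 'Kasalath' Cen8 sequence (Table 4)
ALIGNED_BP = 1_501_991  # collinearly aligned sequence (Table 4)
SNPS = 22_104
INDELS = 2_834
INDEL_SIZE_CLASSES = {"<1 kb": 2_664, "1-10 kb": 137, ">10 kb": 33}  # Table 4


def per_kb(count, length_bp):
    """Events per kilobase of the given sequence length."""
    return count / (length_bp / 1_000)


print(round(100 * ALIGNED_BP / KAS_BP, 1))       # 66.8 (% collinear)
print(round(per_kb(SNPS, ALIGNED_BP), 2))        # 14.72
print(round(per_kb(INDELS, KAS_BP), 2))          # 1.26
print(round(per_kb(INDELS, ALIGNED_BP), 2))      # 1.89
print(sum(INDEL_SIZE_CLASSES.values()) == INDELS)  # True
```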
To compare the genomic composition and structure of
the centromeric region between the two rice varieties, we
re-annotated the 2.18-Mb sequence of Nip-Cen8. This led to
the identification of 27 putative genes (Table 1), on the basis
of known transcripts, and 199 LTR-retrotransposon elements
[Figure 2 graphic: map of the Kas-Cen8 core domain from nt 1 216 242 (8S side) to nt 2 009 728 (8L side), showing CentO block 1, CRR elements and other LTR-retrotransposons, each labeled by subfamily with its estimated insertion date (Ma) in parentheses; scale bar, 100 kb.]
Figure 2. Genomic distribution of long terminal repeat (LTR)-retrotransposon elements and CentO satellite repeats identified in the core domain (793 kb) of
‘Kasalath’ Cen8.
The unshared and shared LTR-retrotransposons between ‘Kasalath’ and ‘Nipponbare’ Cen8 regions are given above and below the sequence map, respectively.
Asterisks indicate the intact or mostly intact LTR-retrotransposon elements, and numbers in parentheses represent the estimated insertion date (Ma). Solo LTRs are
shown in red.
(Table S3) grouped into 45 subfamilies, on the basis of LTR
sequence homology (Table 2). These 27 genes are ortholo-
gous with the 27 putative genes predicted in Kas-Cen8. To
investigate the divergence of LTR-retrotransposon inser-
tions within the Cen8 region, we compared the chromo-
somal positions, sequences and structures of individual
elements, and identified 123 LTR-retrotransposon elements
common to ‘Kasalath’ and ‘Nipponbare’ (Table 5; see
Table S4 for details). Furthermore, to characterize the
amplification history of retroelements in the two Cen8 regions,
Table 3 Sequences and structural analysis of CentO blocks detected in the ‘Kasalath’ Cen8 region
CentO block | Sequence | Position from | Position to | Size (bp) | Orientation | Comment
Block 1 | CentO sub-block 1 | 1 234 540 | 1 259 960 | 25 421 | + | 165 monomers
Block 1 | noaCRR1 | 1 259 961 | 1 264 335 | 4375 | − | Intact, TSD
Block 1 | CentO sub-block 2 | 1 264 344 | 1 265 026 | 688 | + | 5 monomers
Block 1 | noaCRR1 | 1 265 027 | 1 266 420 | 1394 | − | Partial
Block 1 | CentO sub-block 3 | 1 266 421 | 1 274 077 | 7657 | + | 50 monomers
Block 1 | CRR1 | 1 274 078 | 1 280 541 | 6464 | + | Intact, no TSD
Block 1 | CentO sub-block 4 | 1 280 542 | 1 285 285 | 4744 | − | 31 monomers
Block 1 | CRR2 | 1 285 286 | 1 293 040 | 7755 | + | Intact
Block 1 | CentO sub-block 5 | 1 293 056 | 1 319 293 | 26 238 | − | 174 monomers
Block 1 | noaCRR1 | 1 319 307 | 1 323 356 | 4050 | + | Partial
Block 1 | CentO sub-block 6 | 1 323 366 | 1 328 778 | 5413 | − | 36 monomers
Block 1 | CRR2 | 1 328 784 | 1 335 305 | 6522 | − | Almost intact
Block 1 | CentO sub-block 7 | 1 335 330 | 1 338 105 | 2776 | + | 18 monomers
Block 1 | noaCRR1 | 1 338 106 | 1 338 897 | 792 | + | Solo LTR, TSD
Block 1 | CentO sub-block 8 | 1 338 992 | 1 343 023 | 4102 | + | 27 monomers
Block 1 | noaCRR2 | 1 343 024 | 1 343 521 | 498 | − | Partial
Block 1 | CentO sub-block 9 | 1 343 522 | 1 343 737 | 216 | − | 2 monomers
Block 1 | CRR1 | 1 343 738 | 1 347 484 | 3747 | + | Partial
Block 1 | CentO sub-block 10 | 1 347 485 | 1 352 727 | 5243 | − | 34 monomers
Block 1 | noaCRR1 | 1 352 755 | 1 378 844 | 26 090 | + | CRR1 blockᵃ
Block 1 | CentO sub-block 11 | 1 378 845 | 1 383 007 | 4163 | − | 27 monomers
Block 2 | CentO sub-block 1 | 2 048 528 | 2 049 600 | 1073 | − | 7 monomers
Block 2 | CentO sub-block 2 | 2 049 607 | 2 049 908 | 301 | + | 2 monomers
Block 2 | Jobe | 2 051 317 | 2 067 687 | 5154 | + | Solo, TSD
Block 2 | Osr33 | 2 052 360 | 2 063 576 | 11 217 | + | Intact, TSD
Block 2 | CentO sub-block 3 | 2 070 159 | 2 071 077 | 918 | + | 6 monomers
The CentO sequences of block 1 lie in the core domain. TSD, target-site duplication. ᵃSeven CRR1 elements are clustered in a nested pattern.
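[Figure 3 graphic: dot plot of the ‘Kasalath’ and ‘Nipponbare’ Cen8 sequences (axis ticks at 1.0 and 2.0 Mb; 8S to 8L) with CentO blocks 1 and 2 circled; enlarged panel of the core domain comparing ‘Kasalath’ nt 1 216 242–2 009 728 with ‘Nipponbare’ nt 1 036 343–1 891 855; triplicated segments a, b and c.]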
Figure 3. Sequence alignment of the ‘Kasalath’
and ‘Nipponbare’ Cen8 regions.
The positions of matched sequences detected by
BLASTZ (e < 10⁻²⁰) are dot-plotted. The two
CentO blocks are circled with solid lines. The
subregion corresponding to the Cen8 core domain
is boxed with a thick line and enlarged on the
right. The tandemly triplicated segments detected
only in ‘Nipponbare’ Cen8 are indicated by boxes
a, b and c drawn with broken lines.
we performed a phylogenetic analysis by using sequences
of reverse transcriptase (RT) domains from the intact or
mostly intact LTR-retrotransposons (53 in ‘Kasalath’ and 51
in ‘Nipponbare’). The tree generated by the neighbor-joining
method showed three main branches (Figure 4a; see Fig-
ure S1 for details). Branches I and II included 59 (eight
subfamilies, such as Rire3 and CRRs) and 35 elements (10
subfamilies such as Osr30 and Osr40), respectively, which
were members of the Ty3/gypsy family (Hansen and Heslop-
Harrison, 2004). Branch III was composed of only nine
elements (four subfamilies, such as Egah and Osr8) belonging
to the Ty1/copia family. We did not perform the same
analysis for the remaining subfamilies grouped by LTR
sequence, because they either lacked RT domains (e.g. the
noaCRRs and Dasheng) or carried RT domains disrupted by
deletions or insertions.
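Two calculations underlie the comparison summarized in Table 5: the share of ‘Kasalath’ elements that have an ortholog in ‘Nipponbare’, and the conversion of LTR divergence into an insertion date using the substitution rate given in the Table 5 footnote (1.3 × 10⁻⁸ per site per year). The dating step below assumes the commonly used approach of taking the Kimura two-parameter distance K between the two LTRs of a single element and computing T = K/(2r); the exact procedure is not described in this section, and the LTR sequences in the usage lines are toy examples.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(ltr5, ltr3):
    """Kimura two-parameter distance between two aligned, gap-free LTRs.

    Assumes both sequences contain only A, C, G and T.
    """
    if len(ltr5) != len(ltr3) or not ltr5:
        raise ValueError("LTRs must be aligned to the same non-zero length")
    transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    p = transitions / len(ltr5)
    q = transversions / len(ltr5)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def insertion_age_ma(k, rate=1.3e-8):
    """Insertion age in Ma, T = K / (2r): both LTRs mutate independently."""
    return k / (2 * rate) / 1e6


# Table 5 counts
SHARED, KAS_TOTAL, NIP_TOTAL = 123, 222, 199
print(KAS_TOTAL - SHARED, NIP_TOTAL - SHARED)  # 99 76
print(round(100 * SHARED / KAS_TOTAL, 1))       # 55.4

# Toy LTR pair: 100 bp differing by a single transition
ltr_a = "A" * 100
ltr_b = "G" + "A" * 99
k = k2p_distance(ltr_a, ltr_b)
print(round(insertion_age_ma(k), 2))            # 0.39
```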
We also compared the DNA sequence and organization of
CentO satellite repeats between Kas-Cen8 and Nip-Cen8.
Both varieties harbor CentO satellite repeats at the corre-
sponding orthologous regions, thus suggesting conserva-
tion of two separated CentO blocks (Figure 3). However,
there was a notable difference between the varieties in the
copy number and organization pattern of CentO satellite
repeats within the large CentO block, block 1 (Figure 5).
CentO block 1 in Kas-Cen8 consisted of 569 copies of the
satellite repeat (86.7 kb), which were divided by CRR
elements into 11 sub-blocks. By comparison, the
corresponding CentO block in Nip-Cen8 had only 428 copies
(68.5 kb), which were separated into three sub-blocks by
CRR elements. We extracted the intact or mostly intact
CentO satellite monomers from both blocks (556 in
‘Kasalath’ and 428 in ‘Nipponbare’) for BLAST analysis,
which revealed that the sequence of the CentO monomers is
highly conserved between the two varieties, with similar
average identities to the consensus CentO sequence of
94.6 ± 1.9% in ‘Kasalath’ and 95.2 ± 1.7% in ‘Nipponbare’.
To further investigate the evolutionary processes
underlying the formation of CentO blocks, we performed a
Table 4 Statistics and overall comparison of genomic sequences between the two Cen8 regions of ‘Kasalath’ and ‘Nipponbare’
Total length of ‘Kasalath’ Cen8 sequence (bp) | 2 249 426
Total length of ‘Nipponbare’ Cen8 sequence (bp) | 2 179 787
Total length of collinearly aligned sequences (bp) | 1 501 991
Total number of SNPs within the aligned sequences | 22 104
Total number of indel sites between the two Cen8 regions | 2834
Indels <1 kb in length | 2664
Indels 1–10 kb in length | 137
Indels >10 kb in length | 33
Table 5 Comparative analysis of long terminal repeat (LTR)-retrotransposons between the two Cen8 sequences of ‘Kasalath’ and ‘Nipponbare’
LTR-retrotransposon | Cen8 in ‘Kasalath’ | Cen8 in ‘Nipponbare’ | Shared between ‘Kasalath’ and ‘Nipponbare’ | Unique to ‘Kasalath’ | Unique to ‘Nipponbare’
Total number | 222 | 199 | 123 | 99 | 76
Length of sequence (bp) | 1 241 769 | 1 126 414 | 625 247/658 754ᵃ | 616 522 | 467 660
Average age (Ma)ᵇ | 1.12 | 1.15 | 1.71 | 0.39 | 0.33
ᵃTotal length of the shared LTR-retrotransposons in ‘Kasalath’ and ‘Nipponbare’, respectively.
ᵇA substitution rate of 1.3 × 10⁻⁸ mutations per site per year was used to estimate the insertion dates of LTR-retrotransposons. Ages of LTR-retrotransposons from the duplicated segments in ‘Nipponbare’ Cen8 were excluded.
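[Figure 4 graphic: neighbor-joining trees of sequences from ‘Kasalath’ Cen8 and ‘Nipponbare’ Cen8; tree (a) has branches I and II (Ty3-gypsy) and III (Ty1-copia), tree (b) has branches I and II; scale bars 0.2 and 0.01.]
Figure 4. Amplification of long terminal repeat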
(LTR)-retrotransposons and satellite repeats in
the ‘Kasalath’ and ‘Nipponbare’ Cen8 regions, as
revealed by phylogenetic analysis.
(a) Tree generated from the sequence of reverse
transcriptase (RT) domains from 104 LTR-retro-
transposons.
(b) Tree generated from the sequence of 984
satellite monomers from the CentO block 1.
phylogenetic analysis of the above CentO monomers to
establish a neighbor-joining tree, which showed two main
branches, each composed of CentO monomers derived from
both varieties (Figure 4b; see Figure S2 for details). The
many small branches (sub-branches) evident under the two
main branches suggest the involvement of both ancient and
recent amplification of the CentO satellite monomers in the
rice centromere regions.
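Both the CentO monomer tree and the RT-domain tree were built by neighbor joining (see Experimental procedures). As a rough illustration only, the sketch below implements the textbook Saitou–Nei neighbor-joining loop on a precomputed distance matrix and returns a Newick string. The function name, the Newick formatting and the toy four-taxon matrix are our own assumptions, not the authors' software; in practice the distances would come from the Kimura two-parameter model named in the methods.

```python
from typing import List


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Textbook Saitou-Nei neighbor joining on a symmetric distance matrix.

    Returns the unrooted tree as a Newick string, rooted arbitrarily at the last
    join. Ties in the Q criterion go to the first pair scanned, and branch
    lengths can come out negative when the distances are far from additive.
    """
    labels = list(names)
    d = [[float(x) for x in row] for row in dist]
    while len(labels) > 2:
        n = len(labels)
        r = [sum(row) for row in d]  # total distance from each node to all others
        # Pick the pair minimising Q_ij = (n - 2) * d_ij - r_i - r_j
        best_q, bi, bj = None, -1, -1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best_q is None or q < best_q:
                    best_q, bi, bj = q, i, j
        # Branch lengths from the two joined nodes to their new parent
        li = 0.5 * d[bi][bj] + (r[bi] - r[bj]) / (2 * (n - 2))
        lj = d[bi][bj] - li
        parent = f"({labels[bi]}:{li:.4g},{labels[bj]}:{lj:.4g})"
        # Distances from the new parent to every remaining node
        keep = [k for k in range(n) if k not in (bi, bj)]
        to_parent = [0.5 * (d[bi][k] + d[bj][k] - d[bi][bj]) for k in keep]
        d = [[d[a][b] for b in keep] + [to_parent[x]] for x, a in enumerate(keep)]
        d.append(to_parent + [0.0])
        labels = [labels[k] for k in keep] + [parent]
    return f"({labels[0]}:{d[0][1]:.4g},{labels[1]}:0);"
```

For example, `neighbor_joining(["A", "B", "C", "D"], [[0, 3, 8, 9], [3, 0, 9, 10], [8, 9, 0, 9], [9, 10, 9, 0]])` returns `((A:1,B:2):3,(C:4,D:5):0);`, recovering the additive tree from which those distances were built. For hundreds of monomers, an established package would be the practical choice; the loop above only shows the logic.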
Genomic diversity of the Cen8 region in the Oryza genus
The genus Oryza comprises 23 species with nine different genome types (AA, BB, CC, EE, FF, GG, BBCC, CCDD and HHJJ). Comparing their Cen8 sequences should help reveal how rapidly the Cen8 sequence has diverged during genome evolution. We performed PCR analysis to screen Cen8 sequences in 94 samples derived from cultivated or wild rice, covering all species or genome types in Oryza. Of the eight primer pairs specific to active genes present in both the Kas-Cen8 and Nip-Cen8 regions, six amplified similar DNA fragments from all Oryza species (Figure S3), whereas the remaining two amplified DNA fragments that varied in size among species or varieties. By comparison, only three of the 11 primer pairs specific to CENH3-binding sites amplified similar DNA fragments in most Oryza species. The remaining eight amplified DNA fragments only from AA-genome species, and one of these pairs amplified DNA fragments only from Oryza sativa and its progenitor species Oryza rufipogon. The CentO-specific primer pair also amplified PCR fragments only from AA-genome species.
DISCUSSION
The difficulties associated with sequencing and assembling entire centromeres in higher eukaryotes, as encountered in the intensively studied genomes of humans and Arabidopsis, in which sequence gaps remain in all centromeres (Arabidopsis Genome Initiative, 2000; International Human Genome Sequencing Consortium, 2004), have limited our understanding of the evolutionary mechanisms that shape centromere sequences and structures. The recent completion of the genomic sequence of ‘Nipponbare’ has allowed partial inter- and intra-chromosomal comparisons of rice centromeric sequences, which have revealed dramatic differences in the composition and structure of centromeric regions (Ma and Bennetzen, 2006; Ma et al., 2007). To better understand the evolutionary dynamics of rice Cen8, the first centromere of any species to be completely sequenced, we decoded the entire Cen8 sequence of chromosome 8 from ‘Kasalath’. Analysis of the Kas-Cen8 sequence, and its comparison with the Nip-Cen8 sequence, revealed highly conserved active genes but rapidly diversified insertions of LTR-retrotransposons and CentO satellite repeats in the two rice subspecies. This study improves our understanding of the molecular mechanisms underlying centromere evolution and function.
[Figure 5: CentO block 1 in ‘Kasalath’ (sub-blocks 1–11 containing 165, 5, 50, 31, 174, 36, 18, 27, 2, 34 and 27 monomers) and in ‘Nipponbare’ (sub-blocks 1–3 containing 218, 51 and 159 monomers), with CentO and CRR elements marked, the directions of the short (8S) and long (8L) chromosome arms indicated, and a 20-kb scale bar]
Figure 5. Structural dynamics of CentO block 1 between ‘Kasalath’ and ‘Nipponbare’ Cen8, as revealed by retrotransposon insertions and segmental duplication of satellite repeats.
Numbers in parentheses indicate copy numbers, and white lines with arrows represent the orientation of the satellite monomers in each CentO sub-block. Monomer pairs (four or more monomers) showing the highest identity within or between the two Cen8 regions, as revealed by phylogenetic analysis, are connected by black curved or straight lines.
Low-density, highly conserved gene sequences in rice Cen8
Expressed transcripts indicated 31 419 putative gene loci in the 382-Mb sequence of ‘Nipponbare’, a genome-wide gene density of one gene per 12 kb (Rice Annotation Project, 2008). When all hypothetical genes were excluded, the two Cen8 regions each contained 27 putative gene loci supported by known expressed transcripts, a very low gene density of only about one gene per 83 kb. All 27 putative genes in Kas-Cen8, including six annotated within the in-silico mapped domain, had orthologs within Nip-Cen8 (Table 1). Conversely, no other putative genes with full- or partial-length cDNA support were identified within the indel or duplication sites in either genome, suggesting that gene content is conserved between the two subspecies. Expressed genes have also been reported in Arabidopsis centromeres (Copenhaver et al., 1999). In the present analysis, however, no homologous expressed genes were shared between the centromere regions of rice and Arabidopsis. This implies that the gene content of centromeric regions differs between monocotyledonous and dicotyledonous species, even though the two share a common structural organization in which numerous satellite repeats are surrounded by flanking DNA rich in retroelements and transposons. Genes within the rice Cen8 regions appear to have undergone a similar degree of sequence divergence to other genomic regions: the average rates of SNPs and indels in their coding regions between Kas-Cen8 and Nip-Cen8 (2.24 SNPs per kb and 0.24 indels per kb) were very close to the values obtained from a genome-wide comparison between ‘93-11’ and ‘Nipponbare’ (3.00 SNPs per kb and 0.22 indels per kb) and from a diverse panel of Oryza sativa accessions (2.29 SNPs per kb) (Yu et al., 2005; Caicedo et al., 2007). The putative genes located within or close to the core domain tend to have reduced rates of SNPs and indels, and the average rates in the coding regions of all 27 genes are notably lower than those for the entire Cen8 region (14.72 SNPs per kb and 1.26 indels per kb). These observations suggest that active genes have been subject to natural selection and adaptation within a highly heterochromatic environment. The highly conserved sequences of active genes detected among all Oryza species by PCR analysis support this notion.
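The density and rate figures above follow from simple division, and the sketch below recomputes them from the counts given in the text and in Table 4. The choice of denominators (collinearly aligned length for SNPs, total ‘Kasalath’ Cen8 length for indels) is our inference from which values reproduce the reported 14.72 and 1.26 per kb; the text does not state it. The helper names `per_kb` and `kb_per_gene` are hypothetical.

```python
def per_kb(count: int, length_bp: int) -> float:
    """Events per kilobase of sequence."""
    return count / (length_bp / 1000)


def kb_per_gene(length_bp: int, genes: int) -> float:
    """Average spacing between genes, in kilobases."""
    return length_bp / 1000 / genes


genome_density = kb_per_gene(382_000_000, 31_419)  # 'Nipponbare' genome, 382 Mb
cen8_density = kb_per_gene(2_249_426, 27)          # 'Kasalath' Cen8, 27 loci
snp_rate = per_kb(22_104, 1_501_991)    # SNPs per kb of collinearly aligned sequence (Table 4)
indel_rate = per_kb(2_834, 2_249_426)   # indel sites per kb of 'Kasalath' Cen8 (Table 4)

print(f"{genome_density:.1f} {cen8_density:.1f} {snp_rate:.2f} {indel_rate:.2f}")
```

Running it prints `12.2 83.3 14.72 1.26`, matching the one gene per 12 kb and per 83 kb, and the region-wide SNP and indel rates quoted above.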
Retrotransposons of gypsy-like subfamilies predominated in rice centromeres
A previous study reported at least 59 distinct LTR-retrotransposon groups in the euchromatic regions of the rice genome; almost two-thirds of these groups consisted of copia-like elements, but gypsy-like elements outnumbered copia-like elements by a ratio of 2:1 (McCarthy et al., 2002). Using a similar method, we identified 51 retrotransposon subfamilies (41 shared between ‘Kasalath’ and ‘Nipponbare’) within the two Cen8 regions, with copy numbers ranging from 1 to 36 (Table 2); 15 of these subfamilies had already been characterized in the euchromatic regions. These data provide a comprehensive description of the composition and evolution of retroelements in rice centromeres. In the phylogenetic analysis of the intact RT domains of 104 LTR-retrotransposons, for example, fewer than one-fifth of the 22 subfamilies consisted of copia-like elements, and gypsy-like elements outnumbered copia-like elements by a ratio of approximately 10:1 (49:4 in ‘Kasalath’ and 46:5 in ‘Nipponbare’) within the Cen8 region, fivefold higher than the ratio observed in euchromatin (Figures S1 and 4a). This observation highlights the strongly non-uniform chromosomal distribution of the two families of LTR-retrotransposons, Ty1/copia and Ty3/gypsy (also known as Pseudoviridae and Metaviridae, respectively), in the rice genome. A similar pattern was reported previously in the pericentromeres of Arabidopsis, although the proportion of LTR-retrotransposons in the Arabidopsis genome is one-tenth of that in rice (Peterson-Burch et al., 2004). Our results thus imply that these two distantly related species, which diverged around 200 Ma, share a tendency for gypsy-like retroelements to accumulate preferentially in centromeric regions. A more extensive comparison of genomic sequences and retroelement distributions between these two model organisms may be needed to fully understand the functional and evolutionary mechanisms of centromeres in distantly related plant species.
Dynamic structural variation in the Kas-Cen8 and Nip-Cen8 regions driven by recent LTR-retrotransposon insertion
As expected, the present study revealed rapid divergence of sequence and structure between the centromeres of the two rice subspecies: only 66.8% (1.50 Mb) of the Kas-Cen8 sequence was collinear with the Nip-Cen8 sequence. We investigated the cause of the major structural variations between the two Cen8 regions by extensively characterizing and comparing their repeat sequences. We found 23 more LTR-retrotransposon elements (150 kb of sequence) in Kas-Cen8 (222 in total) than in Nip-Cen8 (199 in total), but only 123 LTR-retrotransposon insertions were shared between the two rice varieties (Table 5). This finding suggests that up to 44.6% (99 elements) and 38.2% (76 elements) of the LTR-retrotransposon insertions identified in Kas-Cen8 and Nip-Cen8, respectively, have accumulated independently since the divergence of the indica and japonica subspecies. These recently inserted LTR-retrotransposons account for 0.62 Mb in Kas-Cen8 and 0.47 Mb in Nip-Cen8, explaining the unexpectedly high number of
unaligned sequences between the two subspecies. To provide further support for this suggestion, we estimated the insertion times of all intact or mostly intact LTR-retrotransposons within the two Cen8 regions by analysing their LTR sequences. The shared (92 between ‘Kasalath’ and ‘Nipponbare’) and unshared (47 in ‘Kasalath’ and 30 in ‘Nipponbare’) LTR-retrotransposon elements showed different average insertion dates of 1.71 and 0.36 Ma, respectively (Table 5). This result is consistent with the estimated time of divergence of Oryza sativa from its wild ancestor Oryza rufipogon, about 0.44 Ma (Khush, 1997; Ma and Bennetzen, 2004). In summary, our findings demonstrate that the major structural variations between Kas-Cen8 and Nip-Cen8 were caused by recent insertions of LTR-retrotransposons.
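The proportions quoted above can be checked directly against Tables 4 and 5. The sketch below uses only the published counts and lengths (the helper name `pct` is hypothetical); it also confirms that the shared and unique LTR-retrotransposon lengths in Table 5 add up to each Cen8 total.

```python
def pct(part: float, whole: float) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


# LTR-retrotransposon counts and lengths (bp) from Table 5
kas_total, nip_total, shared = 222, 199, 123
kas_len, nip_len = 1_241_769, 1_126_414
shared_len_kas, shared_len_nip = 625_247, 658_754
unique_len_kas, unique_len_nip = 616_522, 467_660

kas_unique = kas_total - shared  # elements found only in 'Kasalath' Cen8
nip_unique = nip_total - shared  # elements found only in 'Nipponbare' Cen8

# Shared + unique lengths should reproduce each Cen8 total
assert shared_len_kas + unique_len_kas == kas_len
assert shared_len_nip + unique_len_nip == nip_len

print(kas_unique, nip_unique)                                                # 99 76
print(f"{pct(kas_unique, kas_total):.1f} {pct(nip_unique, nip_total):.1f}")  # 44.6 38.2
print(f"{unique_len_kas / 1e6:.2f} {unique_len_nip / 1e6:.2f}")              # 0.62 0.47 (Mb)
# Collinear fraction of the 'Kasalath' Cen8 sequence (Table 4)
print(f"{pct(1_501_991, 2_249_426):.1f}")                                    # 66.8
```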
Although LTR-retrotransposons have preferentially accumulated in the Cen8 region, our results also indicate that LTR-retrotransposon sequences have been eliminated from the genome. We estimate that about 63.4% of the shared and 51.1% of the unshared LTR-retrotransposon elements have incomplete structures, including solo LTRs and internally deleted or truncated elements. Unequal homologous recombination between the two LTRs of a single element, and illegitimate recombination, which does not require extensive sequence homology and generates truncated elements, have been suggested as the main mechanisms underlying the deletion of LTR-retrotransposon DNA (Devos et al., 2002; Bennetzen et al., 2005). The ratio of solo LTRs to intact elements within the Cen8 regions was 0.96:1 (94:98) in ‘Kasalath’ and 1.13:1 (86:76, excluding copies within the triplicated segments) in ‘Nipponbare’, lower than the ratios of 2.2:1 and 1.6:1 previously calculated for the euchromatic regions and the whole genome, respectively (Ma et al., 2004; Ma and Bennetzen, 2006). Although the complete suppression of homologous recombination within Cen8 might be expected to repress unequal recombination, ‘Kasalath’ and ‘Nipponbare’ interestingly showed different rates of elimination of LTR-retrotransposon sequences from the Cen8 regions after their divergence. The ratio of solo LTRs to intact elements inserted before divergence was almost the same in ‘Kasalath’ and ‘Nipponbare’, at 1.41:1 (61:44) and 1.42:1 (61:43), respectively, but for elements inserted after divergence it was 0.59:1 (32:54) and 0.76:1 (25:33). This finding provides evidence that rice domestication has been involved in the evolution of the Cen8 regions, and forms a basis for investigating whether retrotransposon selection, such as selection for CRR elements, is important for centromere function.
Accumulation and rearrangement of CRR and CentO sequences dramatically reshaped the core domain of rice Cen8
The overall density of LTR-retrotransposons was higher in the core domain of Kas-Cen8 and in the orthologous region of Nip-Cen8 than in the pericentromeric regions flanking the core domain, with average increases of 3.3 and 2.7 elements per 100 kb, respectively (Figure S4). Although a triplication that led to the accumulation of 20 copies of LTR-retrotransposons, two of which were subsequently lost, was observed only in Nip-Cen8, the core domains contained almost the same numbers of LTR-retrotransposons: 95 in ‘Kasalath’ and 92 in ‘Nipponbare’ (Table 2). In Arabidopsis, the gypsy-like Athila subfamily appears to be the most prevalent retroelement within pericentromeric heterochromatin, and is strictly associated with the 178-bp satellite repeats (Copenhaver, 2003; Peterson-Burch et al., 2004). Athila is closely related to aboov, Osr34 and other subfamilies, as revealed by the phylogenetic analysis in the present study (Figure S1). In rice, CRR elements, which are enriched in the centromeric region, are thought to be essential for centromere function together with CentO satellite repeats (Cheng et al., 2002; Nagaki et al., 2004, 2005). To further understand the molecular and evolutionary mechanisms underlying the conserved function of rice centromeres, we compared the sequences and organizational patterns of CRR elements and CentO satellite repeats in the Cen8 core domain between the two rice subspecies.
A total of 47 (155.5 kb) and 27 (103.0 kb) CRR elements (CRR1, noaCRR1, CRR2 and noaCRR2 subfamilies) accumulated in Kas-Cen8 and Nip-Cen8, respectively, accounting for 21.2 and 13.6% of the LTR-retrotransposon insertions in these regions (Table 2). Thirty-five (74.5%, 113.8 kb) and 15 (55.5%, 61.4 kb) of these elements were located within the core domains of Kas-Cen8 and Nip-Cen8, respectively, where they accounted for 36.8 and 16.3% of the LTR-retrotransposon insertions (Tables 2 and S4). Thus, more CRR elements accumulated in the core domain of Kas-Cen8 than in that of Nip-Cen8. The eight orthologous CRR elements among the 55 shared LTR-retrotransposons (average insertion date of 2.42 Ma) in the core domains of Kas-Cen8 and Nip-Cen8 most likely result from ancient insertions (Figure 2). By contrast, most of the CRR elements in the Kas-Cen8 core domain (27 of 35; average insertion date of 0.32 Ma) accumulated after the divergence of indica and japonica. Notably, CRR elements make up 67.5% of the 40 LTR-retrotransposons recently inserted into the ‘Kasalath’ core domain, and 15 CRR elements co-localize with CentO satellite sequences (CentO sub-blocks 1–11). Segmental duplication rather than integration of active elements has been suggested as the mechanism for the accumulation of most CRR elements in the Cen4 core region (Ma and Jackson, 2006). On the basis of the recent insertion times and the unique target-site duplication (TSD) sequences observed in this study, we suggest that the accumulation of CRR elements in Kas-Cen8 derives from recent insertions rather than from segmental duplication. Unexpectedly, seven CRR elements were clustered between CentO sub-blocks 10 and
11, forming a 29.1-kb noaCRR1 block (Figures 2 and 5). From the structure and organization of each CRR, we infer that this noaCRR1 block was formed by several episodes of inter-element unequal recombination (Figure S5).
Similarly, a higher copy number of CentO satellite monomers was found within the core domain of Kas-Cen8 (CentO block 1, measuring 18.2 kb) than within the core domain of Nip-Cen8. These differences between the two rice varieties indicate the preferential accumulation of both CRR and CentO satellite-repeat sequences in Kas-Cen8. Although we observed small duplications or deletions of DNA sequence (approximately 12 bp) within a number of CentO monomers, the CentO monomers were highly conserved between the two subspecies, with 94.6–95.2% identity to the consensus sequence. Centromeric satellite repeats can be homogenized by unequal conversion, whereas variation in copy number and arrangement can be caused by unequal exchange (Lee et al., 2006; Ma and Jackson, 2006; Malik and Bayes, 2006). Phylogenetic analysis of the CentO satellite repeats in ‘Nipponbare’ Cen8 has already demonstrated a recent inverted segmental duplication that was responsible for the amplification of CentO monomers (Ma and Bennetzen, 2006). Using the same method to analyse the most closely related CentO monomers in the neighbor-joining phylogenetic tree, we found no similar segmental duplication in Kas-Cen8 (Figure 5). This finding suggests that the known segmental duplication of CentO satellite repeats between sub-blocks 1 and 3 in Nip-Cen8 occurred after the divergence of indica and japonica. We identified 26 segments in Kas-Cen8 that contained ordered pairs of CentO satellites (four or more monomers) with a very high degree of sequence similarity (98–100%). On the basis of their positions and orientations (Table S5), these segments seem to have been derived from multiple tandem duplications. Consistent with this, orthologous pairs of CentO monomers were found only at the start of the first CentO sub-block and at the end of the last CentO sub-block (Figure 5). Because of the very recent amplification and the rapid, dramatic rearrangement and reshuffling of the satellite repeats, indicated by the close proximity of branches or sub-branches in the phylogenetic tree, we were unable to trace the origin of most internal CentO sub-blocks in Kas-Cen8. Nevertheless, these observations provide strong evidence that the core domain of rice Cen8 has been dramatically reshaped in the two rice subspecies through the variable accumulation of CRR elements and the rapid expansion or rearrangement of CentO satellite repeats.
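The tandem duplications above were inferred from runs of four or more CentO monomer pairs sharing 98–100% identity. The authors identified these pairs from the neighbor-joining tree; the sketch below swaps in a simpler direct scan for illustration, assuming the monomers are already aligned to equal length and listed in array order. The function names and default thresholds are hypothetical, and inverted copies (such as the Nip-Cen8 duplication) would additionally require comparing reverse complements, which this sketch does not do.

```python
from typing import List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of identical columns between two pre-aligned, equal-length monomers."""
    if len(a) != len(b) or not a:
        raise ValueError("monomers must be pre-aligned to the same non-zero length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def duplicated_segments(monomers: List[str], min_run: int = 4,
                        min_identity: float = 0.98) -> List[Tuple[int, int, int]]:
    """Find direct (same-orientation) duplicated segments in a monomer array.

    Reports (start_a, start_b, length) for every run of at least `min_run`
    consecutive pairs (i + k, i + offset + k) whose identity is >= `min_identity`.
    Inverted copies are not detected.
    """
    n = len(monomers)
    hits: List[Tuple[int, int, int]] = []
    for offset in range(1, n):  # distance between the two copies in the array
        run_start, run_len = 0, 0
        for i in range(n - offset):
            if identity(monomers[i], monomers[i + offset]) >= min_identity:
                if run_len == 0:
                    run_start = i
                run_len += 1
            else:
                if run_len >= min_run:
                    hits.append((run_start, run_start + offset, run_len))
                run_len = 0
        if run_len >= min_run:  # a run that reaches the end of the array
            hits.append((run_start, run_start + offset, run_len))
    return hits
```

For an array in which monomers 1–4 reappear, in the same order, as monomers 6–9, the scan reports `(1, 6, 4)`. The scan is quadratic in the number of monomers, which is manageable for a few hundred CentO copies.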
Because of the presence of active genes in the CENH3-binding domain and its low number of CentO repeats, ‘Nipponbare’ Cen8 is thought to represent an intermediate stage in the evolution from new centromeres, similar to human neocentromeres, to fully mature centromeres that accumulate megabases of homogeneous satellite arrays (Nagaki et al., 2004). The high rates of rearrangement of CRR elements and the rapid expansion of CentO satellite repeats observed here in the two rice subspecies indicate that centromere function is maintained despite dynamic changes in genomic structure. Our results therefore raise further questions. Do these two classes of centromeric repetitive sequences have similar or distinct roles in centromere function in rice? Are other types of retrotransposons or centromeric satellite repeats involved, directly or indirectly, in centromere function? To explain the rapid sequence divergence of the genes encoding CENH3 proteins and of centromeric DNA repeats, an evolutionary model involving centromere drive has been proposed for both animals and plants (Smith, 1976; Malik and Henikoff, 2002; Talbert et al., 2002; Heslop-Harrison et al., 2003). If centromere variants with enriched retrotransposons and expanded satellite-repeat arrays provide more CENH3-binding sites and enhance microtubule binding during female meiosis in rice (Ma et al., 2007), the preferential accumulation of centromere-specific retrotransposons and satellite repeats may be an outcome of centromere drive.
Conservation of genes and divergence of CENH3-binding and CentO sequences in the Cen8 region of Oryza
PCR amplification with primers designed from the coding regions of putative genes indicated that all genes annotated in the Cen8 region of cultivated rice are conserved within the genus Oryza (Figure S3). These primers could provide landmarks for future structural and evolutionary analyses of Cen8 in different rice varieties and in wild rice species. Because the conserved genes in the Cen8 region, which is embedded in abundant repeat sequences, are active, they are good candidates for future studies of the mechanisms controlling gene expression in highly heterochromatic environments. In contrast to the findings for active genes, our results provide strong evidence that CENH3-binding sites and CentO satellite repeats are highly conserved only within species with AA genomes. This result supports previous reports that satellite repeats are preserved only in closely related species (Zhong et al., 2002; Lee et al., 2005). AA-genome species are estimated to have diverged from a common ancestor with the BB genome only about 2 Ma (Ma and Bennetzen, 2004; Zhu and Ge, 2005). A recent analysis of the Cen8 sequence of the wild rice species Oryza brachyantha, which has the FF genome, provided strong evidence that a new retroelement was amplified within the last few million years, replacing the canonical CRR detected in other Oryza species (Gao et al., 2009). Further investigations, such as sequencing and comparing the retrotransposons, satellite repeats and CENH3 genes of different Oryza species, to determine whether and/or how the rapid divergence of centromeric
repeated sequences has played a role in the evolution of functional centromeres, and subsequently in speciation in Oryza, should be conducted in the future.
EXPERIMENTAL PROCEDURES
Mapping, sequencing and annotation of the Kas-Cen8 region
The construction of a BAC library, end-sequencing and in-silico mapping of BAC clones from the rice variety ‘Kasalath’ (Oryza sativa L. ssp. indica) were conducted as described previously (Katagiri et al., 2004). Relying on the completed genomic sequence of chromosome 8 (accession number AP008214) of the rice variety ‘Nipponbare’ (Oryza sativa L. ssp. japonica), we generated six BAC contigs, flanked by the co-segregated markers C1374 and S21882, within the genetically defined centromeric region of chromosome 8 of ‘Kasalath’ (Harushima et al., 1998). The physical locations of these contigs were confirmed by PCR analysis of 16 genetic and expressed sequence tag (EST) markers located within the Cen8 region (Harushima et al., 1998; Wu et al., 2002). Physical gaps between adjacent BAC contigs were closed by chromosome walking, using unique BAC-end sequences to select bridge clones from the whole ‘Kasalath’ BAC library. Minimum-tiling-path (MTP) clones within the completed BAC physical map were examined and selected for shotgun sequencing to approximately 10-fold coverage, using a previously described method (Sasaki et al., 2002; Wu et al., 2004). Sequence gaps remaining in BAC clones were generally filled by sequencing the bridge subclones with custom primers. Regions with low quality scores were improved by resequencing with custom primers or alternative chemistries. We applied a transposon insertion/sequencing system (Genome Priming System GPS-1; New England Biolabs, http://www.neb.com) for the complete sequencing of subclones containing highly repeated sequences. Assembled sequences were confirmed to have <1 error per 10 000 bases and were checked to ensure that no misassemblies remained. Sequences of the overlapping regions between neighboring BAC clones were checked and confirmed to be correct. The DNA sequence analysis software SEQUENCHER 4.1 (Gene Codes, http://www.genecodes.com) was used to create a single, non-overlapping contiguous sequence from the completely sequenced BAC clones of the ‘Kasalath’ Cen8 region. Gene annotation was performed using our previously developed and verified annotation system (International Rice Genome Sequencing Project, 2005; Rice Annotation Project, 2008).
Fluorescence in-situ hybridization (FISH)
A FISH experiment was performed according to a previously described protocol with minor modifications (International Rice Genome Sequencing Project, 2005). Briefly, fresh young leaves were chopped with a sharp scalpel and filtered through 60-μm nylon mesh (Millipore, http://www.millipore.com) to remove debris and isolate nuclei in the filtrate. Nucleus lysis buffer (0.5% SDS, 10 mM EDTA, 10 mM Tris, pH 7.0) was added to a suspension of nuclei on a glass slide, and DNA fibers were allowed to extend from the nuclei by gravity. PCR-amplified DNA probes from the ‘Kasalath’ BAC clone K0486F02 (within a 51-kb subregion) and from CentO sequences were labeled with digoxigenin-dUTP and biotin-dUTP, respectively, and then hybridized with the DNA fibers (Table S6). Detection was performed with a fluorescein isothiocyanate (FITC)-conjugated anti-digoxigenin antibody or Cy3-conjugated avidin. FISH signals were captured using a BX51 microscope (Olympus, http://www.olympus.com) with a CoolSNAP HQ charge-coupled device camera (Roper Scientific, http://www.roperscientific.com).
Classification of repeat sequences
Intact LTR-retrotransposons were identified using LTR-STRUC, an LTR-retrotransposon mining program (McCarthy et al., 2002), and by previously described methods (Ma and Bennetzen, 2004; Ma et al., 2004). Solo LTRs and truncated elements were identified by sequence homology searches against the rice LTR-retrotransposon database collected from the completed ‘Nipponbare’ genome sequence generated by the International Rice Genome Sequencing Project (Ma and Bennetzen, 2006). The structures of all LTR-retrotransposons identified were confirmed by manual inspection. To estimate the insertion dates of LTR-retrotransposons, we extracted the two LTR sequences from each intact or mostly intact LTR-retrotransposon and aligned them using CLUSTALX (Thompson et al., 1997). After manual editing, where necessary, we applied a mutation rate of 1.3 × 10⁻⁸ substitutions per base per year for the age calculation (Ma and Jackson, 2006). To characterize centromeric satellite repeats in the Kas-Cen8 region, we used a consensus sequence of CentO monomers, previously reported from the Cen8 region of ‘Nipponbare’, for BLAST analysis (Wu et al., 2004).
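Insertion dates of this kind are conventionally obtained as T = K/2r, where K is the divergence between the two LTRs of one element and r is the substitution rate (here 1.3 × 10⁻⁸ per site per year); the factor 2 reflects that the two LTRs are identical at insertion and then accumulate substitutions independently. The text does not state which distance correction was applied to K, so the sketch below assumes the Kimura two-parameter correction on pre-aligned LTRs. The function names are hypothetical.

```python
import math

SUBSTITUTION_RATE = 1.3e-8  # substitutions per site per year, the rate stated in the text
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura (1980) two-parameter distance between two pre-aligned sequences.

    Columns with a gap or any non-ACGT character in either sequence are skipped.
    """
    transitions = transversions = sites = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def insertion_age_years(ltr5: str, ltr3: str, rate: float = SUBSTITUTION_RATE) -> float:
    """T = K / (2r): the two LTRs start identical and diverge independently."""
    return kimura_2p(ltr5, ltr3) / (2 * rate)
```

As a worked example, two 100-bp LTRs that differ by a single transition give K ≈ 0.0101 and T ≈ 388 000 years, or about 0.39 Ma.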
Alignment and comparison of genomic sequences between the two regions of Kas-Cen8 and Nip-Cen8
Genomic sequences were compared between the orthologous Cen8 regions of ‘Kasalath’ and ‘Nipponbare’ using the BLAST algorithm (Altschul et al., 1997). Homologous sequences were aligned and dot-plotted with BLASTZ (Schwartz et al., 2003). SNP and indel (insertion or deletion) sites between the two orthologous genomic regions were detected using AVID (Bray et al., 2003). Ratios of non-synonymous (Ka) to synonymous (Ks) substitutions between orthologous genes were calculated with SNAP (http://hiv-web.lanl.gov/content/hiv-db/SNAP/README.html) by the Nei and Gojobori method, with Jukes–Cantor correction (Nei and Gojobori, 1986). Sequences of RT domains and satellite repeats were extracted from all intact or mostly intact LTR-retrotransposons and CentO monomers, respectively, in both varieties, and used to build neighbor-joining trees with the Kimura (1980) two-parameter method.
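SNAP performed the Nei–Gojobori site counting. The final step, applying the Jukes–Cantor correction to the proportions of non-synonymous and synonymous differences before taking their ratio, is small enough to show directly. The sketch assumes that the counts (Nd and Sd differences; N and S sites) are already available. The function names and the example counts are hypothetical and do not reflect SNAP's interface.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction needs 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nd: float, n_sites: float, sd: float, s_sites: float) -> float:
    """Ka/Ks from Nei-Gojobori counts.

    nd, sd: numbers of non-synonymous and synonymous differences
    n_sites, s_sites: numbers of non-synonymous and synonymous sites
    """
    ka = jukes_cantor(nd / n_sites)
    ks = jukes_cantor(sd / s_sites)
    if ks == 0:
        raise ZeroDivisionError("no synonymous differences: Ka/Ks is undefined")
    return ka / ks


# Hypothetical counts for a single orthologous gene pair
print(round(ka_ks(nd=3, n_sites=750, sd=4, s_sites=250), 3))
```

With these made-up counts the ratio is about 0.25. The correction matters little at the low divergence seen between the two subspecies, but it grows quickly as p approaches 0.75.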
PCR amplification of centromeric sequences within the genus Oryza
We prepared a set of 96 varieties and wild-rice accessions that represent all species and genome types, from AA to HHJJ, of Oryza (Table S7). Rice varieties (Oryza sativa L. ssp. japonica and indica) were drawn from the Rice Diversity Research Set of Germplasm, developed by the National Institute of Agrobiological Sciences (NIAS) (Kojima et al., 2005). Accessions of African cultivated or wild-rice species were obtained from the collections of the resource centers of the National Institute of Genetics (NIG) or the International Rice Research Institute. DNA was isolated from young leaves by the cetyltrimethylammonium bromide (CTAB) method (Murray and Thompson, 1980). For PCR screening, we used the 19 unique primer pairs previously designed for confirmation of active genes or fragments of CENH3-binding sites in the Cen8 region of ‘Nipponbare’ (Table S8). A specific primer pair for amplification of CentO satellite DNA was also used (Wu et al., 2002). PCR was performed in a final volume of 20 µl, comprising 2 µl of 10× buffer, 2 µl of MgCl2 (25 mM), 2 µl of dNTPs (25 mM), 0.2 µl of Taq polymerase (5 U µl⁻¹), 0.3 µl of primer DNA (10 µM each), 4 µl of 50% glycerol, 5 µl of template DNA (5 ng µl⁻¹) and
4.5 µl of water, in a PTC-225 DNA Engine Tetrad Cycler (Bio-Rad, http://www.bio-rad.com) for 35 cycles of 94°C for 30 sec, 55°C for 30 sec and 72°C for 1 min. PCR products were examined by electrophoresis on 1% agarose gels.
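The reaction recipe can be sanity-checked with the dilution rule C_final = C_stock × V_added / V_total. The sketch assumes that the stated stock concentrations apply as written (for instance, that the 0.3 µl of primer DNA is a mix holding each primer at 10 µM); the constant and function names are hypothetical, and the code only reproduces the arithmetic.

```python
TOTAL_UL = 20.0  # final reaction volume given in the text
WATER_UL = 4.5

# component -> (volume added in µl, stock concentration); units are in the name
COMPONENTS = {
    "buffer (x)": (2.0, 10.0),
    "MgCl2 (mM)": (2.0, 25.0),
    "dNTPs (mM)": (2.0, 25.0),
    "Taq polymerase (U/µl)": (0.2, 5.0),
    "each primer (µM)": (0.3, 10.0),  # assumes a mix holding each primer at 10 µM
    "glycerol (%)": (4.0, 50.0),
    "template DNA (ng/µl)": (5.0, 5.0),
}


def final_concentration(volume_ul: float, stock: float, total_ul: float = TOTAL_UL) -> float:
    """Dilution rule C_stock * V_added = C_final * V_total, solved for C_final."""
    return stock * volume_ul / total_ul


added_ul = sum(vol for vol, _ in COMPONENTS.values()) + WATER_UL
assert abs(added_ul - TOTAL_UL) < 1e-9, "component volumes should add up to 20 µl"

for name, (vol, stock) in COMPONENTS.items():
    print(f"{name:24s} {final_concentration(vol, stock):g}")
```

The volumes sum to 20 µl, and the final concentrations work out to 1× buffer, 2.5 mM MgCl2, 2.5 mM dNTPs, 0.05 U µl⁻¹ Taq (1 U per reaction), 0.15 µM of each primer, 10% glycerol and 1.25 ng µl⁻¹ template (25 ng per reaction).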
Accession numbers
The ‘Kasalath’ BAC sequences obtained in this study were submitted to DDBJ under the accession numbers AP009077–AP009094. A contiguous sequence of the 2.25-Mb ‘Kasalath’ Cen8, as well as detailed results of gene annotation for each BAC sequence, can be downloaded at http://rgp.dna.affrc.go.jp/E/Publicdata.html
ACKNOWLEDGEMENTS
We thank Nori Kurata (NIG), and Makoto Kawase, Duncan A. Vaughan, Kaworu Ebana and Takeshi Izawa (NIAS), for providing the plant material. We also thank Masahiro Nakagahra for advice and encouragement. This work was supported by grants from the Ministry of Agriculture, Forestry and Fisheries of Japan (GS1101, GS1201 and GD2007).
SUPPORTING INFORMATION
Additional Supporting Information may be found in the online version of this article:
Figure S1. Families and subfamilies of rice long terminal repeat (LTR)-retrotransposons within the phylogenetic tree.
Figure S2. CentO monomers within the phylogenetic tree.
Figure S3. PCR screening of Cen8 sequences in the genus Oryza.
Figure S4. Distribution patterns of long terminal repeat (LTR)-retrotransposons in the rice Cen8 regions.
Figure S5. Model for the formation of the CRR block by interelement unequal recombination.
Table S1. Sequence statistics of BAC clones covering the ‘Kasalath’ Cen8.
Table S2. Identification of long terminal repeat (LTR)-retrotransposon elements in ‘Kasalath’ Cen8.
Table S3. Identification of long terminal repeat (LTR)-retrotransposon elements in ‘Nipponbare’ Cen8.
Table S4. Shared and unshared long terminal repeat (LTR)-retrotransposon insertions between the two Cen8 regions of ‘Kasalath’ and ‘Nipponbare’.
Table S5. Duplicated and conserved genomic segments estimated in or between the CentO block 1 of ‘Kasalath’ and ‘Nipponbare’.
Table S6. Primer sequences used in the preparation of DNA probes for FISH analysis.
Table S7. Names and accessions of cultivated and wild-rice species used in this study.
Table S8. Sequences and position of PCR primers in the virtual contig of ‘Nipponbare’ Cen8.
Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.
REFERENCES
Altschul, S., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Arabidopsis Genome Initiative. (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 408, 796–815.
Bennetzen, J.L., Ma, J. and Devos, K.M. (2005) Mechanisms of recent genome size variation in flowering plants. Ann. Bot. 95, 127–132.
Bray, N., Dubchak, I. and Pachter, L. (2003) AVID: a global alignment program. Genome Res. 13, 97–102.
Caicedo, A.L., Williamson, S.H., Hernandez, R.D. et al. (2007) Genome-wide patterns of nucleotide polymorphism in domesticated rice. PLoS Genet. 3, e163, doi:10.1371/journal.pgen.0030163.
Cheng, Z., Dong, F., Langdon, T., Ouyang, S., Buell, C.R., Gu, M., Blattner, F.R. and Jiang, J. (2002) Functional rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon. Plant Cell, 14, 1691–1704.
Copenhaver, G.P. (2003) Using Arabidopsis to understand centromere function: progress and prospects. Chromosome Res. 11, 255–262.
Copenhaver, G.P., Nickel, K., Kuromori, T. et al. (1999) Genetic definition and sequence analysis of Arabidopsis centromeres. Science, 286, 2468–2474.
Devos, K.M., Brown, J.K.M. and Bennetzen, J.L. (2002) Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis. Genome Res. 12, 1075–1079.
Fransz, P.F., Armstrong, S., Alonso-Blanco, C., Fischer, T.C., Torres-Ruiz, R.A. and Jones, J. (1998) Cytogenetics for the model system Arabidopsis thaliana. Plant J. 13, 867–876.
Fukui, K.-N., Suzuki, G., Lagudah, E.S., Rahman, S., Appels, R., Yamamoto, M. and Mukai, Y. (2001) Physical arrangement of retrotransposon-related repeats in centromeric regions of wheat. Plant Cell Physiol. 42, 189–196.
Gao, D., Gill, N., Kim, H.-R. et al. (2009) A lineage-specific centromere retrotransposon in Oryza brachyantha. Plant J. doi:10.1111/j.1365-313X.2009.04005.x.
Hall, S.E., Kettler, G. and Preuss, D. (2003) Centromere satellites from Arabidopsis populations: maintenance of conserved and variable domains. Genome Res. 13, 195–205.
Hall, S.E., Kettler, G. and Preuss, D. (2006) Dynamic evolution at pericentromeres. Genome Res. 16, 355–364.
Hansen, C.N. and Heslop-Harrison, J.S. (2004) Sequences and phylogenies of plant pararetroviruses, viruses and transposable elements. Adv. Bot. Res. 41, 165–193.
Harushima, Y., Yano, M., Shomura, A. et al. (1998) A high-density rice genetic linkage map with 2,275 markers using a single F2 population. Genetics, 148, 479–494.
Henikoff, S. and Dalal, Y. (2005) Centromeric chromatin: what makes it unique? Curr. Opin. Genet. Dev. 15, 177–184.
Henikoff, S., Ahmad, K. and Malik, H.S. (2001) The centromere paradox: stable inheritance with rapidly evolving DNA. Science, 293, 1098–1102.
Heslop-Harrison, J.S. (2000) Comparative genome organization in plants: from sequence and markers to chromatin and chromosomes. Plant Cell, 12, 617–635.
Heslop-Harrison, J.S., Murata, M., Ogura, Y., Schwarzacher, T. and Motoyoshi, F. (1999) Polymorphisms and genomic organization of repetitive DNA from centromeric regions of Arabidopsis chromosomes. Plant Cell, 11, 31–42.
Heslop-Harrison, J.S., Brandes, A. and Schwarzacher, T. (2003) Tandemly repeated DNA sequences and centromeric chromosomal regions of Arabidopsis species. Chromosome Res. 11, 241–253.
Hosouchi, T., Kumekawa, N., Tsuruoka, H. and Kotani, H. (2002) Physical map-based sizes of the centromeric regions of Arabidopsis thaliana chromosomes 1, 2, and 3. DNA Res. 9, 117–121.
International Human Genome Sequencing Consortium. (2004) Finishing the euchromatic sequence of the human genome. Nature, 431, 931–945.
International Rice Genome Sequencing Project. (2005) The map-based sequence of the rice genome. Nature, 436, 793–800.
Jiang, J., Birchler, J.A., Parrott, W.A. and Dawe, R.K. (2003) A molecular view of plant centromeres. Trends Plant Sci. 8, 570–575.
Kamm, A., Galasso, I., Schmidt, T. and Heslop-Harrison, J.S. (1995) Analysis of a repetitive DNA family from Arabidopsis arenosa and relationships between Arabidopsis species. Plant Mol. Biol. 27, 853–862.
Katagiri, S., Wu, J., Ito, Y., Karasawa, W., Shibata, M., Kanamori, H., Katayose, Y., Namiki, N., Matsumoto, T. and Sasaki, T. (2004) End sequencing and chromosomal in silico mapping of BAC clones derived from an indica rice variety, Kasalath. Breed. Sci. 54, 273–279.
Khush, G.S. (1997) Origin, dispersal, cultivation and variation of rice. Plant Mol. Biol. 35, 25–34.
Kimura, M. (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16, 111–120.
Kojima, Y., Ebana, K., Fukuoka, S., Nagamine, T. and Kawase, M. (2005) Development of an RFLP-based rice diversity research set of germplasm. Breed. Sci. 55, 431–440.
© 2009 The Authors. Journal compilation © 2009 Blackwell Publishing Ltd, The Plant Journal, (2009), doi: 10.1111/j.1365-313X.2009.04002.x
Kumekawa, N., Hosouchi, T., Tsuruoka, H. and Kotani, H. (2000) The size and sequence organization of the centromeric region of Arabidopsis thaliana chromosome 5. DNA Res. 7, 315–321.
Kumekawa, N., Hosouchi, T., Tsuruoka, H. and Kotani, H. (2001) The size and sequence organization of the centromeric region of Arabidopsis thaliana chromosome 4. DNA Res. 8, 285–290.
Lamb, J.C., Theuri, J. and Birchler, J.A. (2004) What’s in a centromere? Genome Biol. 5, 239.
Lee, H.-R., Zhang, W., Langdon, T., Jin, W., Yan, H., Cheng, Z. and Jiang, J. (2005) Chromatin immunoprecipitation cloning reveals rapid evolutionary patterns of centromeric DNA in Oryza species. Proc. Natl Acad. Sci. USA, 102, 11793–11798.
Lee, H.-R., Neumann, P., Macas, J. and Jiang, J. (2006) Transcription and evolutionary dynamics of the centromeric satellite repeat CentO in rice. Mol. Biol. Evol. 23, 2505–2520.
Ma, J. and Bennetzen, J.L. (2004) Rapid recent growth and divergence of rice nuclear genomes. Proc. Natl Acad. Sci. USA, 101, 12404–12410.
Ma, J. and Bennetzen, J.L. (2006) Recombination, rearrangement, reshuffling, and divergence in a centromeric region of rice. Proc. Natl Acad. Sci. USA, 103, 383–388.
Ma, J. and Jackson, S.A. (2006) Retrotransposon accumulation and satellite amplification mediated by segmental duplication facilitate centromere expansion in rice. Genome Res. 16, 251–259.
Ma, J., Devos, K.M. and Bennetzen, J.L. (2004) Analyses of LTR-retrotransposon structures reveal recent and rapid genomic DNA loss in rice. Genome Res. 14, 860–869.
Ma, J., Wing, R.A., Bennetzen, J.L. and Jackson, S.A. (2007) Plant centromere organization: a dynamic structure with conserved functions. Trends Genet. 23, 134–139.
Malik, H.S. and Bayes, J.J. (2006) Genetic conflicts during meiosis and the evolution of origins of centromere complexity. Biochem. Soc. Trans. 34, 569–573.
Malik, H.S. and Henikoff, S. (2002) Conflict begets complexity: the evolution of centromeres. Curr. Opin. Genet. Dev. 12, 711–718.
McCarthy, E.M., Liu, J., Gao, L. and McDonald, J.F. (2002) Long terminal repeat retrotransposons of Oryza sativa. Genome Biol. 3, research0053.1–0053.11.
Murray, M.G. and Thompson, W.F. (1980) Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res. 8, 4321–4325.
Nagaki, K., Cheng, Z., Ouyang, S., Talbert, P.B., Kim, M., Jones, K.M., Henikoff, S., Buell, C.R. and Jiang, J. (2004) Sequencing of a rice centromere uncovers active genes. Nat. Genet. 36, 138–145.
Nagaki, K., Neumann, P., Zhang, D., Ouyang, S., Buell, C.R., Cheng, Z. and Jiang, J. (2005) Structure, divergence, and distribution of the CRR centromeric retrotransposon family in rice. Mol. Biol. Evol. 22, 845–855.
Nei, M. and Gojobori, T. (1986) Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3, 418–426.
Peterson-Burch, B.D., Nettleton, D. and Voytas, D.F. (2004) Genomic neighborhoods for Arabidopsis retrotransposons: a role for targeted integration in the distribution of the Metaviridae. Genome Biol. 5, R78.
Rice Annotation Project. (2008) The Rice Annotation Project Database (RAP-DB): 2008 update. Nucleic Acids Res. 36, D1028–D1033.
Sasaki, T., Matsumoto, T., Yamamoto, K. et al. (2002) The genome sequence and structure of rice chromosome 1. Nature, 420, 312–316.
Schueler, M., Higgins, A., Rudd, N.K., Gustashaw, K. and Willard, H.F. (2001) Genomic and genetic definition of a functional human centromere. Science, 294, 109–115.
Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.C., Haussler, D. and Miller, W. (2003) Human-mouse alignments with BLASTZ. Genome Res. 13, 103–107.
Smith, G.P. (1976) Evolution of repeated DNA sequences by unequal crossover. Science, 191, 528–535.
Talbert, P.B., Masuelli, R., Tyagi, A.P., Comai, L. and Henikoff, S. (2002) Centromeric localization and adaptive evolution of an Arabidopsis histone H3 variant. Plant Cell, 14, 1053–1066.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F. and Higgins, D.G. (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882.
Wu, J., Maehara, T., Shimokawa, T. et al. (2002) A comprehensive rice transcript map containing 6591 expressed sequence tag sites. Plant Cell, 14, 525–535.
Wu, J., Yamagata, H., Hayashi-Tsugane, M. et al. (2004) Composition and structure of the centromeric region of rice chromosome 8. Plant Cell, 16, 967–976.
Yu, J., Wang, J., Lin, W. et al. (2005) The genomes of Oryza sativa: a history of duplications. PLoS Biol. 3, e38, doi:10.1371/journal.pbio.0030038.
Zhang, Y., Huang, Y., Zhang, L. et al. (2004) Structural features of the rice chromosome 4 centromere. Nucleic Acids Res. 32, 2023–2030.
Zhong, C.X., Marshall, J.B., Topp, C., Mroczek, R., Kato, A., Nagaki, K., Birchler, J.A., Jiang, J. and Dawe, R.K. (2002) Centromeric retroelements and satellites interact with maize kinetochore protein CENH3. Plant Cell, 14, 2825–2836.
Zhu, Q. and Ge, S. (2005) Phylogenetic relationships among A-genome species of the genus Oryza revealed by intron sequences of four nuclear genes. New Phytol. 167, 249–267.