
Comparative analysis of complete orthologous centromeres from two subspecies of rice reveals rapid variation of centromere organization and structure

Jianzhong Wu 1,†, Masaki Fujisawa 2,†,‡, Zhixi Tian 3,†, Harumi Yamagata 2, Kozue Kamiya 2, Michie Shibata 2, Satomi Hosokawa 2, Yukiyo Ito 2, Masao Hamada 2, Satoshi Katagiri 2, Kanako Kurita 2, Mayu Yamamoto 2, Ari Kikuta 2, Kayo Machita 2, Wataru Karasawa 2, Hiroyuki Kanamori 2, Nobukazu Namiki 2, Hiroshi Mizuno 1, Jianxin Ma 3, Takuji Sasaki 1 and Takashi Matsumoto 1,*

1 Plant Genome Research Unit, National Institute of Agrobiological Sciences, Tsukuba, Ibaraki, Japan
2 Research Division I, Institute of the Society for Techno-innovation of Agriculture, Forestry and Fisheries, Tsukuba, Ibaraki, Japan
3 Department of Agronomy, Purdue University, West Lafayette, IN, USA

Received 4 June 2009; revised 28 July 2009; accepted 7 August 2009.
* For correspondence (fax +81 29 838 2302; e-mail [email protected]).
† These authors contributed equally to this work.
‡ Present address: Central Laboratories for Frontier Technology, Kirin Holdings Co., Ltd, Suematsu Nonoichi, Ishikawa, Japan.

SUMMARY

Centromeres are sites for assembly of the chromosomal structures that mediate faithful segregation at mitosis and meiosis. This function is conserved across species, but the DNA components that are involved in kinetochore formation differ greatly, even between closely related species. To shed light on the nature, evolutionary timing and evolutionary dynamics of rice centromeres, we decoded a 2.25-Mb DNA sequence covering the centromeric region of chromosome 8 of an indica rice variety, ‘Kasalath’ (Kas-Cen8). Analysis of repetitive sequences in Kas-Cen8 led to the identification of 222 long terminal repeat (LTR)-retrotransposon elements and 584 CentO satellite monomers, which account for 59.2% of the region. A comparison of the Kas-Cen8 sequence with that of japonica rice ‘Nipponbare’ (Nip-Cen8) revealed that about 66.8% of the Kas-Cen8 sequence was collinear with that of Nip-Cen8. Although the 27 putative genes are conserved between the two subspecies, only 55.4% of the total LTR-retrotransposon elements in ‘Kasalath’ had orthologs in ‘Nipponbare’, thus reflecting recent proliferation of a considerable number of LTR-retrotransposons since the divergence of two rice subspecies of indica and japonica within Oryza sativa. Comparative analysis of the subfamilies, time of insertion, and organization patterns of inserted LTR-retrotransposons between the two Cen8 regions revealed variations between ‘Kasalath’ and ‘Nipponbare’ in the preferential accumulation of CRR elements, and the expansion of CentO satellite repeats within the core domain of Cen8. Together, the results provide insights into the recent proliferation of LTR-retrotransposons, and the rapid expansion of CentO satellite repeats, underlying the dynamic variation and plasticity of plant centromeres.

Keywords: rice Cen8, LTR-retrotransposon, centromeric retrotransposons of rice, CentO satellite, active gene, centromere evolution.

INTRODUCTION

Despite the conserved function of the chromosomal site for kinetochore assembly, which plays a key role in the faithful segregation of sister chromatids during cell division, the centromere sequences of most multicellular organisms show tremendous variation in size and organization, even among related species (Henikoff et al., 2001; Malik and Henikoff, 2002; Jiang et al., 2003; Lamb et al., 2004; Henikoff and Dalal, 2005). Human centromeres are composed of tandemly arrayed, approximately 171-bp AT-rich repeats (α satellites) that are arranged in a head-to-tail fashion, and vary in size from 3 to nearly 4 Mb (Schueler et al., 2001). Among the higher plants, cytological analysis by fluorescence in-situ hybridization (FISH), developed over the last two decades, has demonstrated the presence of abundant repetitive DNA sequences in the centromeric regions of Arabidopsis, rice, wheat and other species (Fransz et al., 1998; Heslop-Harrison et al., 1999; Heslop-Harrison, 2000; Fukui et al., 2001). The centromeres of Arabidopsis thaliana and rice contain tandemly arrayed satellite repeats of 178 and 155 bp, respectively, in arrays ranging in size from 2.8 to approximately 4.0 Mb, and from 60 kb to approximately 1.9 Mb, respectively, on different chromosomes (Kumekawa et al., 2000, 2001; Cheng et al., 2002; Hosouchi et al., 2002). These satellite repeats, which are postulated to mediate spindle attachment, are flanked by pericentromeric sequences consisting largely of repetitive sequences, often with the clustering of retroelements, and show no sequence homology among humans, Arabidopsis and rice.

© 2009 The Authors. Journal compilation © 2009 Blackwell Publishing Ltd, The Plant Journal (2009), doi: 10.1111/j.1365-313X.2009.04002.x

One of the significant achievements of the rice genome sequencing project was the complete sequencing of the two rice centromeres of chromosomes 4 (Cen4) and 8 (Cen8) from the japonica variety ‘Nipponbare’ (Wu et al., 2004; Zhang et al., 2004; International Rice Genome Sequencing Project, 2005). Sequence analysis of the 1.97-Mb genomic region in ‘Nipponbare’ Cen8 identified about 200 transposable elements and 440 copies of the 155-bp centromere-specific satellite repeat CentO (Wu et al., 2004). Further analysis by a comprehensive chromatin immunoprecipitation (ChIP)-based study demonstrated the presence of the kinetochore, an approximately 750-kb CENH3-binding domain that defines the boundaries of the functional Cen8 region (Nagaki et al., 2004). An important discovery of these studies was the identification of active genes within the core domain of ‘Nipponbare’ Cen8.

Comparative analysis of orthologous sequences within closely related species could shed light on the processes that give rise to sequence divergence and structural changes. Since the centromeres of most organisms have a dynamic structure of size variation and sequence divergence, with conserved function, comparative genomics provides an ideal way to investigate the diversity of centromeric sequences, and the underlying evolutionary mechanism, with insights into general features of centromere biology and function. Although the structure of the central domain is still not fully known in the Arabidopsis centromere, for example, comparative studies have revealed evidence that the sequences of tandemly arrayed satellite repeats interrupted by Athila derivatives appear to evolve rapidly, highlighting its reorganization among the different species (Kamm et al., 1995), as well as its maintenance of conserved and variable domains within populations (Hall et al., 2003). A recent comparison of pericentromeres from four Brassicaceae species (A. thaliana, Arabidopsis arenosa, Capsella rubella and Olimarabidopsis pumila) provides support for the model in which plant pericentromeres may experience selective pressures, distinct from euchromatin, with tolerance to rapid, dynamic changes in sequence content and structure (Hall et al., 2006).

The two subspecies of the Asian cultivated rice Oryza sativa L., ssp. indica and japonica, are estimated to have diverged from a common ancestor about 0.44 million years ago (Ma) (Khush, 1997; Ma and Bennetzen, 2004). A partial comparison of the rice Cen8 region between the completed sequence of ‘Nipponbare’ and the draft sequence of the indica variety ‘93-11’ showed a high percentage (85%) of shared long terminal repeat (LTR)-retrotransposon insertions (Ma and Bennetzen, 2006). This value might be an overestimate, because the incomplete sequence from ‘93-11’ allowed comparison of only 60% of all LTR-retrotransposon elements in the ‘Nipponbare’ Cen8. In addition, a comparison of the CentO satellite sequences and chromosomal organization between the two subspecies could not be performed on account of technical difficulties associated with the assembly and chromosomal mapping of these repeats within the whole-genome shotgun (WGS) sequence of ‘93-11’. Centromere function might be associated with the interspersal of centromere satellite repeats with other repetitive elements, primarily LTR-retrotransposons (Jiang et al., 2003; Lamb et al., 2004; Henikoff and Dalal, 2005). The complete sequencing of the genome of different rice varieties, or species, is needed for a comprehensive comparative analysis to fully explain the dynamic evolutionary changes in centromere composition and structure.

Here, we report the results from the sequencing of the Cen8 region of ‘Kasalath’ (abbreviated as Kas-Cen8 hereafter in the text), an indica rice variety that had been used before for the construction of a high-density rice genetic map (Harushima et al., 1998). Comparison of a 2.25-Mb Kas-Cen8 sequence with a 2.18-Mb sequence of the orthologous region from ‘Nipponbare’ Cen8 (abbreviated as Nip-Cen8 hereafter in the text) demonstrates the presence of highly conserved genes in this centromeric region. We also identified many insertions, deletions, and duplications of chromosome segments in this region, thus demonstrating the dramatic structural changes that have occurred in the centromeric DNA on chromosome 8 of ‘Kasalath’ and ‘Nipponbare’ rice. Detailed analysis of the sequences and organization patterns of repetitive elements in the two regions of Kas-Cen8 and Nip-Cen8 suggests that the recent insertion of LTR-retrotransposons and the amplification of CentO satellite monomers are primarily responsible for the structural dynamics of the rice centromeres.

RESULTS

Sequencing and structural analysis of the Kas-Cen8 region

To decode the DNA sequence of a complete centromeric region, it is necessary to construct a physical map of the region, indicating the position of genomic clones that carry large fragments, such as BAC (bacterial artificial chromosome) and PAC (P1-derived artificial chromosome) clones. By using the methods described in Experimental procedures, we obtained 18 minimum tiling path (MTP) clones from the ‘Kasalath’ BAC library that covered the full Cen8 region, genetically mapped by the two centromere-flanking DNA markers C1374 and S21882S (Figure 1). This result demonstrates the value of the published genomic sequence of ‘Nipponbare’ for the construction of physical maps, even in complicated chromosomal regions of related rice varieties. We sequenced the 18 BAC clones and generated 2.98 Mb of sequence data (Table S1). Sequences of CentO satellites were confirmed in three BAC clones (K0081B11, K0110D12 and K0116D04). K0081B11 and K0110D12 overlapped within the central section of the Kas-Cen8 region, which was further verified by cytological analysis using the FISH method (Figure 1). K0116D04, which contained the DNA marker S21882S, was located on the long-arm side of the region. After removing redundant sequences from the overlapping regions between the neighboring BAC clones, we generated 2 249 426 bp of continuous, high-quality DNA sequence, covering the entire region of Kas-Cen8.
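The step from 2.98 Mb of raw clone sequence to a 2.25-Mb non-redundant contig amounts to counting bases shared by neighboring clones only once. The following is a minimal sketch of that bookkeeping, assuming each clone has already been placed on a common coordinate system; the function name `nonredundant_length` and the toy clone coordinates are hypothetical and are not the real Kas-Cen8 placements.

```python
def nonredundant_length(clones):
    """Return the number of bases covered by a set of (start, end) intervals.

    Coordinates are 1-based and inclusive. Overlapping or directly
    adjacent intervals are merged so that shared bases are counted once.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(clones):
        if current_end is None or start > current_end + 1:
            # A gap precedes this clone: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # The clone overlaps or abuts the current block: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return covered


# Hypothetical clone placements (illustration only).
toy_clones = [(1, 150_000), (120_001, 290_000), (260_001, 400_000)]
raw_total = sum(end - start + 1 for start, end in toy_clones)
merged_total = nonredundant_length(toy_clones)
print(raw_total, merged_total)
```

In this toy case the raw clone lengths sum to more than the merged span because the two 30-kb overlaps are counted twice in the raw total, which is the same kind of redundancy removed between the 18 BAC clones.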

The overall analysis of base composition in the 2.25-Mb sequence of Kas-Cen8 revealed an average G + C content of 45.0%, higher than that (43.6%) detected from the entire genome (International Rice Genome Sequencing Project, 2005). Annotation of the Cen8 sequence predicted a total of 390 gene models (excluding transposon-related genes), most of which were predicted by only a single computer program. On the basis of known full-length cDNA sequences in rice, we identified 27 genes that encoded unique proteins with known or unknown functions, including two homologs of disease-resistance genes (Table 1). Six of these genes were located within a 793-kb subregion from nucleotide (nt) 1 216 242 to 2 009 728 of the 2.25-Mb virtual contig that, based on the results of in silico mapping, corresponds to the core domain of ‘Nipponbare’ Cen8 (Nagaki et al., 2004). We also identified a putative gene for TGF-beta receptor-interacting protein (K0486F02.38) located only 4.6 kb from the CentO satellite repeats.
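The G + C content quoted above is a simple base-composition statistic. As a minimal sketch, assuming the sequence is available as a plain string and that ambiguous bases such as N are excluded from the denominator (an assumption, since the handling of ambiguous bases is not described here), it could be computed as follows. The function name `gc_content` and the toy fragment are hypothetical.

```python
def gc_content(sequence):
    """Return G+C as a percentage of the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total


# Hypothetical 12-base fragment containing two ambiguous bases.
toy_fragment = "ATGCGCNNATTA"
print(f"{gc_content(toy_fragment):.1f}%")
```

Applied to the full 2 249 426-bp contig, the same calculation would be expected to give the reported value only if ambiguous bases were treated in the same way as in the original analysis.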

Detailed analysis of repeat sequences within the 2.25-Mb Kas-Cen8 region led to our identification of 222 LTR-retrotransposon (class-I transposable elements) sequences: 88 intact or mostly intact elements and 90 solo LTRs flanked by standard target-site duplications (TSDs), six intact or mostly intact elements and four solo LTRs lacking TSDs, and 34 truncated elements, each of which contained at least one identified LTR (Table S2). Of the 94 intact or mostly intact retrotransposons, 41 (43.6%) were present in a nested structure because of the single or multiple insertions of DNA sequences derived from other retroelements. On the basis of LTR sequence homology, we grouped the above 222 LTR-retrotransposon elements into 48 subfamilies (Table 2). The Rire3 subfamily was the most abundant, consisting of 36 elements, which were located mostly in the pericentromeric regions, and accounted for about 300 kb of the total

Figure 1 labels: BAC clones K0122H06, K0063H06, K0065E03, K0007B01, K0048F05, K0031E03, K0155E09, K0039A02, K0023E10, K0486F02, K0081B11, K0110D12, K0098G01, K0098B12, K0253H11, K0413C07, K0116D04 and K0155C03; genetic markers E31128S (54.0 cM), C1374 (54.3 cM), S21882S (54.3 cM), C10983S (55.4 cM), R2381 (54.3 cM), E20691S (54.3 cM) and C51155 (54.3 cM); Cen8; FISH probes K486F02 and CentO; scale bar 5 µm; 8S and 8L.

Figure 1. Genetic and physical maps covering the ‘Kasalath’ Cen8 region.
The genetic map, bacterial artificial chromosome (BAC) contig and fluorescence in-situ hybridization (FISH) image are presented, from top to bottom. BAC clones containing CentO satellite repeats are shown in yellow. 8S and 8L: short and long arms of chromosome 8.


Table 1 List of putative genes predicted in the ‘Kasalath’ Cen8 region and sequence divergence revealed by comparative analysis

| Gene no. | Gene ID in ‘Kasalath’ | Nucleotide position in virtual contig | Predicted function | Accession of matched cDNAs in GenBank | Gene ID in ‘Nipponbare’ | SNP (kb⁻¹) | Ka:Ks | Indel (kb⁻¹) |
|---|---|---|---|---|---|---|---|---|
| 1 | K0122H06.4 | 7471–8060 | Unknown protein | AK062297 | OJ1212_C09.3 | 0 | – | 0 |
| 2 | K0122H06.9 | 23 281–32 642 | Putative threonyl-tRNA synthetase | AK070378 | OJ1212_C09.6 | 7.58 | 0.19 | 0.45 |
| 3 | K0122H06.31 | 93 630–98 408 | Bacterial blight-resistance protein Xa1-like | AK066438 | OJ1212_C09.23 | 5.39 | 0.18 | 0 |
| 4 | K0122H06.41 | 129 020–135 504 | Putative bacterial blight-resistance protein Xa1 | AK121298 | P0024C06.103 | 6.88 | 1.01 | 0.21 |
| 5 | K0122H06.43-1 | 141 024–143 709 | Putative tetratricopeptide repeat domain 1 | AK067499 | P0024C06.105-1 | 15.47 | 0.24 | 0.43 |
| 6 | K0007B01.32 | 454 183–455 469 | Putative steroid sulfotransferase 3 | AK058698 | P0024C06.128 | 12.9 | 0.99 | 0 |
| 7 | K0048F05.15 | 552 627–558 549 | Unknown protein | AK061379 | OSJNBa0063H21.109 | 2.23 | 0.27 | 0 |
| 8 | K0048F05.25 | 594 071–597 344 | Putative MGDG synthase type A | AK064148 | OSJNBa0063H21.123 | 0.84 | 0 | 0 |
| 9 | K0031E03.22 | 652 460–662 530 | Putative CLB1 protein | AK069706 | P0045D08.119 | 0.59 | – | 0 |
| 10 | K0031E03.24 | 671 275–677 259 | Putative chloride channel | AK066375 | P0045D08.120 | 0.84 | 0 | 0 |
| 11 | K0031E03.35 | 706 793–709 000 | Putative fertility restorer | AK101762 | P0045D08.129 | 0.45 | 0 | 0 |
| 12 | K0031E03.46 | 737 278–745 421 | Putative sucrose-phosphate synthase | AK101676 | OJ1115_A07.105 | 0.31 | – | 0 |
| 13 | K0155E09.10 | 754 549–754 965 | Unknown protein | AK072190 | OJ1115_A07.107 | 0 | – | 0 |
| 14 | K0155E09.17 | 771 561–772 954 | Putative peroxidase precursor | AK106760 | OJ1115_A07.117 | 0 | – | 0 |
| 15 | K0023E10.29 | 1 031 445–1 039 118 | Putative formamidopyrimidine-DNA glycosylase | AK063295 | OSJNBa0051M20.125 | 1.63 | – | 0 |
| 16 | K0023E10.32 | 1 046 367–1 052 501 | Putative cig3, cytokinin-inducible gene | AK105754 | OSJNBa0051M20.128 | 0.66 | 0.29 | 0 |
| 17 | K0486F02.38 | 1 222 002–1 228 076 | Putative TGF-beta receptor-interacting protein 1 | AK122060 | OSJNBa0061E21.121 | 0 | – | 0 |
| 18 | K0098G01.19 | 1 450 681–1 453 712 | Putative LSU ribosomal protein L15P | AK073645 | B1100F03.118 | 0 | – | 0 |
| 19 | K0098B12.43 | 1 661 014–1 669 044 | CBS domain-containing protein-like | AK121969 | B1136D08.130 | 0 | – | 0 |
| 20 | K0253H11.38 | 1 792 999–1 800 001 | Putative poly(A)-binding protein | AK065167 | P0451H06.101 | 1.01 | 0 | 0 |
| 21 | K0116D04.18 | 1 990 625–1 991 203 | Unknown protein | AK103062 | P0406D01.113 | 0 | – | 1.75 |
| 22 | K0116D04.19 | 1 992 218–2 017 263 | Exocyst complex component Sec8-like | AK070862 | P0406D01.114 | 0.63 | 0 | 0 |
| 23 | K0155C03.21 | 2 144 724–2 150 146 | Cyclase-like protein | AK108030 | P0465H09.131 | 0 | – | 3.7 |
| 24 | K0155C03.24 | 2 161 155–2 164 728 | Ribonucleoprotein-like | AK121802 | P0465H09.135 | 0.95 | 0 | 0 |
| 25 | K0155C03.26-1 | 2 170 962–2 173 643 | Sexual differentiation process protein isp4-like | AK121257 | P0465H09.136-2 | 0 | – | 0 |
| 26 | K0155C03.32 | 2 205 960–2 206 424 | Unknown protein | AK107815 | P0005C02.106 | 2.15 | 0 | 0 |
| 27 | K0155C03.33 | 2 214 791–2 217 275 | Fasciclin-like arabinogalactan-protein-like | AK119590 | P0005C02.108 | 0 | – | 0 |
| Average | | | | | | 2.24 | 0.36 | 0.24 |

Genes located in the core domain are highlighted in bold. Ka:Ks, ratio of nonsynonymous (Ka) to synonymous (Ks) substitution sites detected within the exons from each pair of orthologous genes between the two Cen8 regions of ‘Kasalath’ and ‘Nipponbare’.


Table 2 Characterization of long terminal repeat (LTR)-retrotransposons within ‘Kasalath’ and ‘Nipponbare’ Cen8 regions

| LTR-retrotransposon subfamily | ‘Kasalath’ Cen8: copy no. | ‘Kasalath’ Cen8: no. of intact elements | ‘Kasalath’ Cen8: total length of sequence (bp) | ‘Nipponbare’ Cen8: copy no. | ‘Nipponbare’ Cen8: no. of intact elements | ‘Nipponbare’ Cen8: total length of sequence (bp) |
|---|---|---|---|---|---|---|
| Rire3 | 36 (6) | 20 | 299 820 | 27 (8) | 14 | 213 290 |
| **noaCRR1/Osr37** | 28 (22) | 12 | 87 517 | 14 (8) | 8 | 43 735 |
| Osr34 | 14 (7) | 6 | 90 089 | 15 (5) | 5 | 97 524 |
| Osr30 | 12 (7) | 7 | 100 263 | 14 (8) | 9 | 130 256 |
| Osr33 | 10 (0) | 5 | 68 974 | 10 (1) | 4 | 72 163 |
| **CRR2** | 10 (7) | 5 | 44 453 | 8 (4) | 4 | 35 506 |
| Osr29/Ovikoh | 9 (3) | 4 | 41 037 | 12 (7) | 4 | 50 512 |
| Omasag | 9 (3) | 0 | 29 182 | 11 (6) | 1 | 38 197 |
| Seaba | 7 (1) | 2 | 47 866 | 4 (1) | 3 | 41 491 |
| Egah | 7 (2) | 3 | 43 494 | 13 (7) | 4 | 88 796 |
| Osr41 | 6 (2) | 2 | 20 885 | 6 (3) | 2 | 27 910 |
| Osr8 | 5 (1) | 2 | 23 792 | 3 (0) | 1 | 11 532 |
| **noaCRR2/Pawepe** | 5 (2) | 1 | 4814 | 2 (0) | 0 | 921 |
| Osr26/Rire2 | 4 (1) | 2 | 37 126 | 3 (2) | 1 | 19 411 |
| Vemeal | 4 (2) | 1 | 33 019 | 2 (2) | 0 | 10 211 |
| Osr25/Dasheng | 4 (1) | 4 | 29 314 | 3 (3) | 2 | 16 298 |
| Aboov | 4 (1) | 2 | 22 536 | 3 (0) | 2 | 19 950 |
| **CRR1/Rire7** | 4 (4) | 2 | 18 688 | 3 (3) | 3 | 22 862 |
| Kangourou_osj | 3 (2) | 1 | 19 082 | 2 (2) | 0 | 9061 |
| Jobe | 3 (1) | 0 | 16 312 | 2 (0) | 0 | 10 368 |
| Kuvu | 3 (1) | 1 | 8750 | 4 (2) | 1 | 9843 |
| Mesaaw | 3 (1) | 0 | 8535 | 5 (1) | 2 | 27 182 |
| Osr40/Rire10 | 2 (1) | 2 | 24 017 | 2 (1) | 2 | 24 395 |
| Wube | 2 (1) | 2 | 19 590 | 0 (0) | 0 | 0 |
| Ifisi | 2 (1) | 1 | 13 923 | 2 (1) | 1 | 18 963 |
| Awab | 2 (1) | 0 | 5985 | 4 (2) | 0 | 11 829 |
| Echidne_osj | 2 (1) | 0 | 2855 | 2 (1) | 0 | 2826 |
| Rire1 | 2 (2) | 0 | 2797 | 2 (2) | 0 | 2742 |
| Suawoh | 1 (0) | 1 | 13 368 | 0 (0) | 0 | 0 |
| Hopi | 1 (1) | 1 | 12 876 | 0 (0) | 0 | 0 |
| Asuvi | 1 (0) | 1 | 9062 | 1 (0) | 1 | 9077 |
| Yneub | 1 (0) | 0 | 7520 | 1 (0) | 0 | 1184 |
| Awok | 1 (1) | 0 | 4892 | 2 (2) | 0 | 2726 |
| Goatuw | 1 (0) | 1 | 4829 | 0 (0) | 0 | 0 |
| Ibus | 1 (1) | 0 | 4413 | 1 (1) | 1 | 7429 |
| Rn_363 | 1 (1) | 1 | 4202 | 0 (0) | 0 | 0 |
| Dendrobat_osj | 1 (1) | 1 | 3698 | 1 (1) | 1 | 3693 |
| Obeh | 1 (1) | 0 | 2735 | 1 (1) | 0 | 1613 |
| Kado | 1 (1) | 0 | 2404 | 1 (1) | 0 | 2407 |
| Ofon | 1 (0) | 0 | 1392 | 1 (0) | 1 | 13 180 |
| Vasy | 1 (0) | 0 | 991 | 1 (0) | 0 | 1003 |
| Noedu | 1 (0) | 0 | 978 | 1 (0) | 0 | 978 |
| Ornithorynque_osj/Haky | 1 (1) | 0 | 954 | 1 (1) | 0 | 954 |
| Osr1 | 1 (0) | 0 | 859 | 1 (0) | 0 | 859 |
| Pese | 1 (0) | 0 | 761 | 1 (0) | 0 | 1596 |
| Ifalu | 1 (1) | 1 | 500 | 1 (1) | 1 | 500 |
| Gileub | 1 (1) | 0 | 38 | 0 (0) | 0 | 0 |
| Panejy | 1 (1) | 0 | 234 | 3 (3) | 0 | 702 |
| Osr13 | 0 (0) | 0 | 0 | 1 (0) | 1 | 6427 |
| Nori | 0 (0) | 0 | 0 | 1 (0) | 0 | 1391 |
| Oren | 0 (0) | 0 | 0 | 1 (1) | 1 | 12 921 |
| Total | 222 (95) | 94 | 1 241 769 | 199 (92) | 80 | 1 126 414 |

CRR elements are highlighted in bold. Numbers in parentheses represent copies located in the core domain.


sequence. We also identified high copy numbers of the CRR (centromeric retrotransposons of rice) subfamilies, including 32 CRR1 (four CRR1 and 28 noaCRR1) and 15 CRR2 (10 CRR2 and five noaCRR2) elements. These CRR elements, located mostly in the core domain of the Cen8 region, accounted for about 155 kb of sequence. Overall, LTR-retrotransposon sequences comprised about 55.2% (1.24 Mb) of the entire region of Kas-Cen8. We found that 95 (42.8%) of the LTR-retrotransposon elements were distributed within the core domain of Kas-Cen8 (Figure 2).
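The element counts and percentages given for the Kas-Cen8 LTR-retrotransposons can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text and in Table 2; the variable names are illustrative and are not taken from the original analysis.

```python
# Element classes reported for Kas-Cen8.
intact_with_tsd, solo_with_tsd = 88, 90
intact_no_tsd, solo_no_tsd = 6, 4
truncated = 34

total_elements = (intact_with_tsd + solo_with_tsd
                  + intact_no_tsd + solo_no_tsd + truncated)
intact_total = intact_with_tsd + intact_no_tsd

nested_pct = 100 * 41 / intact_total        # nested intact elements
core_pct = 100 * 95 / total_elements        # elements in the core domain
ltr_pct = 100 * 1_241_769 / 2_249_426       # LTR bp (Table 2) / region bp

# CRR subfamily lengths from Table 2 (bp): noaCRR1, CRR1, CRR2, noaCRR2.
crr_bp = 87_517 + 18_688 + 44_453 + 4_814

print(total_elements, intact_total)
print(f"{nested_pct:.1f} {core_pct:.1f} {ltr_pct:.1f} {crr_bp / 1000:.1f}")
```

The totals agree with the 222 elements and 94 intact or mostly intact elements reported, and the summed CRR lengths are consistent with the stated figure of about 155 kb.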

Through a BLAST search using a consensus sequence of CentO monomers derived from the ‘Nipponbare’ Cen8, we identified 89.0 kb of CentO sequences in Kas-Cen8, comprising 584 monomers of the 155-bp satellite repeats, and accounting for 4.0% of the entire region (Table 3). The majority (86.7 kb) of these CentO sequences were organized into a large block (block 1, nt 1 234 540–1 383 007 in the 2.25-Mb virtual contig). Located at the short-arm side of the in-silico mapped core domain, this block consists of 11 tracts (sub-blocks 1–11) of CentO satellite sequences that show differences in size and orientation, and are interrupted by the CRR elements (Figure 2). The largest tract (sub-block 5) contains 174 tandemly arrayed CentO satellite monomers. The remaining CentO sequences (2.3 kb) were located in three short tracts outside the core domain on the long-arm side of Cen8 (block 2, nt 2 048 528–2 071 077).
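The CentO totals quoted above follow directly from the tract sizes and monomer counts listed in Table 3. As a minimal sketch, with the (size, monomer count) pairs copied from that table and illustrative variable names, the sums can be reproduced as follows.

```python
# (size_bp, monomers) for each CentO tract, as listed in Table 3.
block1 = [(25_421, 165), (688, 5), (7_657, 50), (4_744, 31), (26_238, 174),
          (5_413, 36), (2_776, 18), (4_102, 27), (216, 2), (5_243, 34),
          (4_163, 27)]
block2 = [(1_073, 7), (301, 2), (918, 6)]

region_bp = 2_249_426  # length of the Kas-Cen8 virtual contig

block1_bp = sum(size for size, _ in block1)
block2_bp = sum(size for size, _ in block2)
total_bp = block1_bp + block2_bp
total_monomers = sum(count for _, count in block1 + block2)
cento_pct = 100 * total_bp / region_bp

print(block1_bp, block2_bp, total_bp, total_monomers)
print(f"{cento_pct:.1f}%")
```

The sums correspond to the 86.7 kb, 2.3 kb and 89.0 kb figures and to the 584 monomers given in the text, and the CentO share of the region rounds to 4.0%.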

Sequence and structural comparison of the Kas-Cen8 and Nip-Cen8 regions

Sequence homology analysis showed that the 2.25-Mb Kas-Cen8 sequence corresponds to the 2.18-Mb sequence in Nip-Cen8 (IRGSP build 4.0, chromosome 8 pseudomolecule, nt 11 954 044–14 133 830). Through pairwise comparative analysis by BLAST and manual inspection, we found that about 66.8% (1.50 Mb) of the Kas-Cen8 sequence is collinear with the corresponding sequence in the Nip-Cen8 region (Figure 3; Table 4). The BLAST alignment results identified sites of 22 104 single nucleotide polymorphisms (SNPs) and 2834 indels (insertions or deletions) between the two Cen8 regions, revealing a frequency of 14.72 SNPs and 1.26 indels per kb. We identified 33 large indels of more than 10 kb along the two Cen8 regions. We also identified a large segmental duplication in rice Cen8. For instance, as shown in Figure 3, a triplication is apparent within a 210-kb subregion located within the core domain of Nip-Cen8 (nt 1 036 343–1 891 855 in the 2.18-Mb virtual contig), which is consistent with a previous finding (Ma and Bennetzen, 2006).
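The per-kb frequencies depend on which length is used as the denominator, and the text does not state this explicitly. The sketch below is an arithmetic check only, assuming the collinear length is 66.8% of the 2 249 426-bp contig; the variable names are illustrative.

```python
snps, indels = 22_104, 2_834
kas_cen8_bp = 2_249_426
collinear_bp = 0.668 * kas_cen8_bp   # about 1.50 Mb (assumed from 66.8%)

# SNP density over the collinear sequence.
snps_per_kb = snps / (collinear_bp / 1000)

# Indel density over the collinear sequence and over the full contig.
indels_per_kb_collinear = indels / (collinear_bp / 1000)
indels_per_kb_full = indels / (kas_cen8_bp / 1000)

print(f"{snps_per_kb:.2f}")
print(f"{indels_per_kb_collinear:.2f} {indels_per_kb_full:.2f}")
```

Under these assumptions the SNP density agrees with the reported 14.72 per kb to within rounding of the collinear length, whereas the reported 1.26 indels per kb matches the full 2.25-Mb length rather than the collinear 1.50 Mb. This suggests, but does not establish, that the two frequencies were normalized differently.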

To compare the genomic composition and structure of the centromeric region between the two rice varieties, we re-annotated the 2.18-Mb sequence of Nip-Cen8. This led to the identification of 27 putative genes (Table 1), on the basis of known transcripts, and 199 LTR-retrotransposon elements

Figure 2 labels: legend keys CentO, CRR and other LTR-retrotransposons; scale bar 100 kb; CentO block 1 marked; map oriented from 8S to 8L, spanning nt 1 216 242 to 2 009 728.

Figure 2. Genomic distribution of long terminal repeat (LTR)-retrotransposon elements and CentO satellite repeats identified in the core domain (793 kb) of ‘Kasalath’ Cen8.
The unshared and shared LTR-retrotransposons between ‘Kasalath’ and ‘Nipponbare’ Cen8 regions are given above and below the sequence map, respectively. Asterisks indicate the intact or mostly intact LTR-retrotransposon elements, and numbers in parentheses represent the estimated insertion date (Ma). Solo LTRs are shown in red.
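The insertion dates in Figure 2 are given in Ma, but this part of the text does not state how they were estimated. A common approach for LTR-retrotransposons, shown here only as a hedged sketch, compares the two LTRs of one element, which are identical at insertion, and converts their divergence K into time with T = K / (2r). The Jukes-Cantor correction, the default rate of 1.3e-8 substitutions per site per year, and both function names are assumptions made for illustration and may differ from the method actually used.

```python
import math


def jukes_cantor_distance(ltr5, ltr3):
    """Jukes-Cantor distance between two aligned, equal-length LTRs."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    # Compare only sites where both sequences have an unambiguous base,
    # skipping alignment gaps and ambiguity codes.
    pairs = [(a, b) for a, b in zip(ltr5.upper(), ltr3.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("sequences too divergent for Jukes-Cantor")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_age_ma(ltr5, ltr3, rate=1.3e-8):
    """Estimated age in million years: T = K / (2 * rate).

    The default rate is an assumed value, not one taken from this text.
    """
    return jukes_cantor_distance(ltr5, ltr3) / (2 * rate) / 1e6


# Hypothetical 100-bp LTR pair differing at a single site.
left_ltr = "ACGT" * 25
right_ltr = "T" + left_ltr[1:]
print(f"{insertion_age_ma(left_ltr, right_ltr):.2f} Ma")
```

Under these assumptions, a pair of identical LTRs gives an age of zero, which is how very recent insertions such as those labeled (0) in Figure 2 would appear.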


(Table S3) grouped into 45 subfamilies, on the basis of LTR sequence homology (Table 2). These 27 genes are orthologous with the 27 putative genes predicted in Kas-Cen8. To investigate the divergence of LTR-retrotransposon insertions within the Cen8 region, we compared the chromosomal positions, sequences and structures of individual elements, and identified 123 LTR-retrotransposon elements common to ‘Kasalath’ and ‘Nipponbare’ (Table 5; see Table S4 for details). Furthermore, to characterize the amplification history of retroelements in two Cen8 regions,

Table 3 Sequences and structural analysis of CentO blocks detected in the ‘Kasalath’ Cen8 region

CentO | Sequence | Position from | Position to | Size (bp) | Orientation | Comment
Block 1 | CentO sub-block 1 | 1 234 540 | 1 259 960 | 25 421 | + | 165 monomers
Block 1 | noaCRR1 | 1 259 961 | 1 264 335 | 4375 | − | Intact, TSD
Block 1 | CentO sub-block 2 | 1 264 344 | 1 265 026 | 688 | + | 5 monomers
Block 1 | noaCRR1 | 1 265 027 | 1 266 420 | 1394 | − | Partial
Block 1 | CentO sub-block 3 | 1 266 421 | 1 274 077 | 7657 | + | 50 monomers
Block 1 | CRR1 | 1 274 078 | 1 280 541 | 6464 | + | Intact, no TSD
Block 1 | CentO sub-block 4 | 1 280 542 | 1 285 285 | 4744 | − | 31 monomers
Block 1 | CRR2 | 1 285 286 | 1 293 040 | 7755 | + | Intact
Block 1 | CentO sub-block 5 | 1 293 056 | 1 319 293 | 26 238 | − | 174 monomers
Block 1 | noaCRR1 | 1 319 307 | 1 323 356 | 4050 | + | Partial
Block 1 | CentO sub-block 6 | 1 323 366 | 1 328 778 | 5413 | − | 36 monomers
Block 1 | CRR2 | 1 328 784 | 1 335 305 | 6522 | − | Almost intact
Block 1 | CentO sub-block 7 | 1 335 330 | 1 338 105 | 2776 | + | 18 monomers
Block 1 | noaCRR1 | 1 338 106 | 1 338 897 | 792 | + | Solo LTR, TSD
Block 1 | CentO sub-block 8 | 1 338 992 | 1 343 023 | 4102 | + | 27 monomers
Block 1 | noaCRR2 | 1 343 024 | 1 343 521 | 498 | − | Partial
Block 1 | CentO sub-block 9 | 1 343 522 | 1 343 737 | 216 | − | 2 monomers
Block 1 | CRR1 | 1 343 738 | 1 347 484 | 3747 | + | Partial
Block 1 | CentO sub-block 10 | 1 347 485 | 1 352 727 | 5243 | − | 34 monomers
Block 1 | noaCRR1 | 1 352 755 | 1 378 844 | 26 090 | + | CRR1 block (a)
Block 1 | CentO sub-block 11 | 1 378 845 | 1 383 007 | 4163 | − | 27 monomers
Block 2 | CentO sub-block 1 | 2 048 528 | 2 049 600 | 1073 | − | 7 monomers
Block 2 | CentO sub-block 2 | 2 049 607 | 2 049 908 | 301 | + | 2 monomers
Block 2 | Jobe | 2 051 317 | 2 067 687 | 5154 | + | Solo, TSD
Block 2 | Osr33 | 2 052 360 | 2 063 576 | 11 217 | + | Intact, TSD
Block 2 | CentO sub-block 3 | 2 070 159 | 2 071 077 | 918 | + | 6 monomers

CentO sequences in the core domain are highlighted in bold in the printed table (bold is not reproduced here).
(a) Seven CRR1 elements are clustered in a nested pattern.

[Figure 3 graphic not reproduced. Dot plot of ‘Kasalath’ Cen8 against ‘Nipponbare’ Cen8, with axis ticks at 1.0 and 2.0, the 8S and 8L ends, and CentO blocks 1 and 2 marked. The enlarged subregion covers nt 1 216 242–2 009 728 of ‘Kasalath’ and nt 1 036 343–1 891 855 of ‘Nipponbare’; boxes a, b and c mark the triplicated segments.]

Figure 3. Sequence alignment of the ‘Kasalath’ and ‘Nipponbare’ Cen8.
The positions of matched sequences detected by BLASTZ (e < 10^-20) are dot-plotted. The two CentO blocks are encircled with the solid lines. The subregion of the Cen8 core domain is squared with a thick line, and is enlarged to the right. The tandemly triplicated segments detected only in ‘Nipponbare’ Cen8 are indicated with a, b and c boxes in broken lines.


we performed a phylogenetic analysis by using sequences of reverse transcriptase (RT) domains from the intact or mostly intact LTR-retrotransposons (53 in ‘Kasalath’ and 51 in ‘Nipponbare’). The tree generated by the neighbor-joining method showed three main branches (Figure 4a; see Figure S1 for details). Branches I and II included 59 (eight subfamilies, such as Rire3 and CRRs) and 35 elements (10 subfamilies, such as Osr30 and Osr40), respectively, which were members of the Ty3/gypsy family (Hansen and Heslop-Harrison, 2004). Branch III was composed of only nine elements (four subfamilies, such as Egah and Osr8) belonging to the Ty1/copia family. We did not conduct the same analysis for the remaining subfamilies grouped by LTR sequences because they either lacked RT domains, like noaCRRs and Dasheng, or carried RT domains with deleted or inserted segments.

We also compared the DNA sequence and organization of CentO satellite repeats between Kas-Cen8 and Nip-Cen8. Both varieties harbor CentO satellite repeats at the corresponding orthologous regions, thus suggesting conservation of two separated CentO blocks (Figure 3). However, there was a notable difference between the varieties in the copy number and organization pattern of CentO satellite repeats within the large CentO block, block 1 (Figure 5). CentO block 1 in Kas-Cen8 consisted of 569 copies of satellite repeats (86.7 kb), which were interrupted by CRR elements into 11 sub-blocks. By comparison, the corresponding CentO block in Nip-Cen8 had only 428 copies of satellite repeats (68.5 kb), which were separated into three sub-blocks by CRR elements. We extracted the intact or mostly intact CentO satellite monomers from both blocks (556 in ‘Kasalath’ and 428 in ‘Nipponbare’) for BLAST analysis, which revealed that the sequence of the CentO monomers is highly conserved between the two varieties, with similar average identities to the consensus CentO sequence of 94.6 ± 1.9% in ‘Kasalath’ and 95.2 ± 1.7% in ‘Nipponbare’. To further investigate the evolutionary processes underlying the formation of CentO blocks, we performed a

Table 4 Statistics and overall comparison of genomic sequences between the two Cen8 regions of ‘Kasalath’ and ‘Nipponbare’

Total length of ‘Kasalath’ Cen8 sequence (bp) | 2 249 426
Total length of ‘Nipponbare’ Cen8 sequence (bp) | 2 179 787
Total length of collinearly aligned sequences (bp) | 1 501 991
Total number of SNPs within the alignment sequence (bp) | 22 104
Total sites of indels between the two Cen8 regions | 2834
Indels in length of <1 kb | 2664
Indels in length of 1–10 kb | 137
Indels in length of >10 kb | 33

Table 5 Comparative analysis of long terminal repeat (LTR)-retrotransposons between two Cen8 sequences of ‘Kasalath’ and ‘Nipponbare’

LTR-retrotransposon | Cen8 in ‘Kasalath’ | Cen8 in ‘Nipponbare’ | Shared between ‘Kasalath’ and ‘Nipponbare’ | Unique to ‘Kasalath’ | Unique to ‘Nipponbare’
Total number | 222 | 199 | 123 | 99 | 76
Length of sequence (bp) | 1 241 769 | 1 126 414 | 625 247/658 754 (a) | 616 522 | 467 660
Average age (Ma) (b) | 1.12 | 1.15 | 1.71 | 0.39 | 0.33

(a) Total of sequences from the shared LTR-retrotransposons in ‘Kasalath’ and ‘Nipponbare’, respectively.
(b) A substitution rate of 1.3 × 10^-8 mutations per site per year was used to estimate the insertion date of LTR-retrotransposons. Data regarding the ages of LTR-retrotransposons from the duplicated segments in ‘Nipponbare’ Cen8 were excluded.

[Figure 4 graphic not reproduced. Panel (a) shows branches I and II (Ty3-gypsy) and branch III (Ty1-copia); panel (b) shows branches I and II. Scale bars: 0.2 and 0.01. Legend: ‘Kasalath’ Cen8, ‘Nipponbare’ Cen8.]

Figure 4. Amplification of long terminal repeat (LTR)-retrotransposons and satellite repeats in the ‘Kasalath’ and ‘Nipponbare’ Cen8 regions, as revealed by phylogenetic analysis.
(a) Tree generated from the sequence of reverse transcriptase (RT) domains from 104 LTR-retrotransposons.
(b) Tree generated from the sequence of 984 satellite monomers from the CentO block 1.


phylogenetic analysis of the above CentO monomers to establish a neighbor-joining tree, which showed two main branches, each composed of CentO monomers derived from both varieties (Figure 4b; see Figure S2 for details). The many small branches (sub-branches) evident under the two main branches suggest the involvement of both ancient and recent amplification of the CentO satellite monomers in the rice centromere regions.
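
The average identity of CentO monomers to the consensus, reported above as a mean ± standard deviation, can be illustrated with a minimal sketch. It assumes the monomers are already aligned to the consensus (the study itself used BLAST), and the sequences and the helper name `percent_identity` are hypothetical placeholders rather than real CentO data.

```python
from statistics import mean, stdev


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment, NOT real CentO sequence.
consensus = "ACGTACGTACGTACGTACGT"
monomers = [
    "ACGTACGTACGTACGTACGT",  # identical to the consensus
    "ACGTACGAACGTACGTACGT",  # one substitution
    "ACGTACG-ACGTACCTACGT",  # one gap and one substitution
]

identities = [percent_identity(m, consensus) for m in monomers]
# Mean and sample standard deviation, the same kind of summary as "94.6 ± 1.9%".
summary = (mean(identities), stdev(identities))
```

Gapped columns are skipped here for simplicity; a different gap treatment would shift the percentages slightly.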

Genomic diversity of the Cen8 region in the Oryza genus

The genus Oryza comprises 23 species with nine different genome types (AA, BB, CC, EE, FF, GG, BBCC, CCDD and HHJJ). Comparing Cen8 sequences across these species will help us understand how rapidly the Cen8 sequence has diverged during genome evolution. We performed PCR analysis to screen Cen8 sequences in 94 samples derived from cultivated or wild rice, covering all species or genome types in Oryza. Among the eight primer pairs unique to active genes in both the Kas-Cen8 and Nip-Cen8 regions, six amplified similar DNA fragments from all Oryza species (Figure S3). The remaining two amplified DNA fragments of variable sizes among the different species or varieties. By comparison, only three out of 11 primer pairs specific to CENH3-binding sites amplified DNA fragments that were similar in most Oryza species. The remaining eight only amplified DNA fragments from AA-genome species. One of these pairs only amplified DNA fragments from Oryza sativa and the progenitor species Oryza rufipogon. The CentO-specific primer pair only amplified PCR fragments from AA-genome species.

DISCUSSION

The difficulties associated with the sequencing and assembly of entire centromeres in higher eukaryotes, as encountered in the highly studied genomes of humans and Arabidopsis, in which sequence gaps remain in all centromeres (Arabidopsis Genome Initiative, 2000; International Human Genome Sequencing Consortium, 2004), have limited our understanding of the evolutionary mechanisms underlying the sequences and structures of centromeres. The recent completion of the genomic sequence of ‘Nipponbare’ has allowed the partial inter- or intra-chromosomal comparison of rice centromeric sequences, which has provided evidence of the dramatic differences in composition and structure of centromeric regions (Ma and Bennetzen, 2006; Ma et al., 2007). To fully understand the evolutionary dynamics of the first completely sequenced centromere of any species, we decoded the entire Cen8 sequence on chromosome 8 from ‘Kasalath’. Analysis of the Kas-Cen8 sequence, and its comparison with the Nip-Cen8 sequence, demonstrated the presence of highly conserved active genes, but rapidly diversified insertions of LTR-retrotransposons and CentO satellite repeats in the two rice subspecies. This study enhances our understanding of the molecular mechanisms underlying evolutionary processes, and of centromere function.

[Figure 5 graphic not reproduced. Schematic of CentO block 1 from 8S to 8L in each variety, with a 20-kb scale bar and CentO and CRR segments distinguished. Monomer copies per sub-block: ‘Kasalath’ sub-blocks 1–11 contain 165, 5, 50, 31, 174, 36, 18, 27, 2, 34 and 27; ‘Nipponbare’ sub-blocks 1–3 contain 218, 51 and 159.]

Figure 5. Structural dynamics of CentO block 1 between ‘Kasalath’ and ‘Nipponbare’ Cen8, as revealed by the retrotransposon insertions and segmental duplication of satellite repeats.
Numbers in parentheses indicate the copies, and white lines with arrows represent the orientation of satellite monomers of each CentO sub-block. The monomer pairs (four or more monomers) showing most identity within or between the two Cen8 regions, as revealed by phylogenetic analysis, are connected by the black curved or straight lines.


Low-density, highly conserved gene sequences in rice Cen8

Expressed transcripts indicated 31 419 putative gene loci in the 382-Mb sequence of ‘Nipponbare’, corresponding to a genome-wide gene density of one gene per 12 kb (Rice Annotation Project, 2008). When all hypothetical genes were ignored, the two Cen8 regions were each found to contain 27 putative gene loci supported by comparison with known expressed transcripts, showing a very low gene density of only about one gene per 83 kb. All 27 putative genes in Kas-Cen8, including six annotated within the in-silico mapped domain, had orthologs within Nip-Cen8 (Table 1). Moreover, no other putative genes with evidence of full- or partial-length cDNA sequences were identified within the indel or duplication sites in either genome, suggesting that gene content is conserved between the two subspecies. Expressed genes have also been reported in the Arabidopsis centromeres (Copenhaver et al., 1999). Based on the present result, however, no homologous expressed genes were found between the centromere regions of rice and Arabidopsis, which implies that the content of genes associated with the centromere sequences varies between the monocotyledonous and dicotyledonous species, although they share a common structural organization that contains numerous satellite repeats surrounded by flanking DNA rich in retroelements and transposons. It appears that genes within the rice Cen8 regions have undergone a similar degree of sequence divergence as the other genomic regions, as the average rates of SNPs and indels (2.24 SNPs per kb and 0.24 indels per kb) observed from a comparison of their coding regions between Kas-Cen8 and Nip-Cen8 were very close to the values obtained from a genome-wide analysis between ‘93-11’ and ‘Nipponbare’ (3.00 SNPs per kb and 0.22 indels per kb), or among a diverse panel (2.29 SNPs per kb) of Oryza sativa accessions (Yu et al., 2005; Caicedo et al., 2007). The putative genes located within or close to the core domain tend to have reduced rates of SNPs and indels, and the average rates detected in the coding regions of the above 27 genes are notably lower than those for the entire Cen8 region (14.72 SNPs per kb and 1.26 indels per kb). These observations suggest that natural selection and adaptation of active genes have taken place under a highly heterochromatic environment. The highly conserved sequences of active genes observed among all species of Oryza through PCR analysis support this notion.
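
The density figures quoted in this section follow from simple ratios. The sketch below reproduces them from numbers given in the text and in Table 4; the function names are illustrative, the 382-Mb genome size is treated as exactly 382 000 000 bp, and using the collinearly aligned length as the SNP denominator is an assumption.

```python
def kb_per_gene(region_bp: int, n_genes: int) -> float:
    """Average spacing between genes, in kb per gene."""
    return region_bp / n_genes / 1_000


def per_kb(count: int, region_bp: int) -> float:
    """Number of events (for example SNPs) per kb of sequence."""
    return count / (region_bp / 1_000)


# Genome-wide: 31 419 loci in about 382 Mb (size rounded as quoted in the text).
genome_kb_per_gene = kb_per_gene(382_000_000, 31_419)

# Kas-Cen8: 27 putative genes in 2 249 426 bp (Table 4).
cen8_kb_per_gene = kb_per_gene(2_249_426, 27)

# SNPs across the collinearly aligned sequence (Table 4): 22 104 in 1 501 991 bp.
cen8_snps_per_kb = per_kb(22_104, 1_501_991)

print(f"genome-wide: one gene per {genome_kb_per_gene:.1f} kb")
print(f"Kas-Cen8:    one gene per {cen8_kb_per_gene:.1f} kb")
print(f"Cen8 SNPs:   {cen8_snps_per_kb:.2f} per kb")
```

On these inputs the gene spacing in Cen8 is roughly seven times the genome-wide spacing, which is the contrast the paragraph draws.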

Retrotransposons of gypsy-like subfamilies predominated in rice centromeres

A previous study reported at least 59 distinct LTR-retrotransposon groups in the euchromatic regions of the rice genome, almost two-thirds of which consisted of copia-like elements, although gypsy-like elements outnumbered copia-like elements by a ratio of 2:1 (McCarthy et al., 2002). Using a similar method in the present study, we identified 51 retrotransposon subfamilies (41 shared between ‘Kasalath’ and ‘Nipponbare’), with copy numbers ranging from 1 to 36, within the two Cen8 regions (Table 2), of which 15 subfamilies have already been characterized in the euchromatic regions. This observation provides a comprehensive description of compositional features and evolutionary perspectives for the retroelements in the rice centromeres. Based on the phylogenetic analysis using the intact RT domains from 104 LTR-retrotransposons, for example, less than one-fifth of the 22 subfamilies consisted of copia-like elements, and the elements of gypsy-like subfamilies predominated over the copia-like elements by a ratio of approximately 10:1 (49:4 in ‘Kasalath’ and 46:5 in ‘Nipponbare’) within the Cen8 region, fivefold higher than that observed in the euchromatin (Figures S1 and 4a). This observation highlights the clearly non-uniform chromosomal distribution of the two families of LTR-retrotransposons, Ty1/copia and Ty3/gypsy, also known as Pseudoviridae and Metaviridae, respectively, in the rice genome. A similar result was also reported previously in the pericentromeres of Arabidopsis, although the percentage of LTR-retrotransposons in its genome is one-tenth of that observed in the rice genome (Peterson-Burch et al., 2004). Our results thus imply that the two distantly related species, which diverged from one another around 200 Ma, share the preferential accumulation of gypsy-like retroelements in the centromere regions. An extensive comparison of genomic sequences and distribution of retroelements between these two model organisms might be needed for a complete understanding of the functional and evolutionary mechanisms of centromeres in distantly related plant species.

Dynamic structural variation in the Kas-Cen8 and Nip-Cen8 regions by recent LTR-retrotransposon insertion

As expected, the present study revealed a rapid divergence of sequences and structure between the centromeres of the two rice subspecies: only 66.8% (1.50 Mb) of the Kas-Cen8 sequence was collinear with the Nip-Cen8 sequence. We investigated the cause of the major structural variations in the two Cen8 regions by extensive characterization and comparison of the repeat sequences between them. We found 23 more LTR-retrotransposon elements (150-kb sequences) in Kas-Cen8 (222 in total) than in Nip-Cen8 (199 in total), but only 123 LTR-retrotransposon insertions in common between the two rice varieties (Table 5). This finding suggests that up to 44.6% (99 elements) and 38.2% (76 elements), respectively, of the LTR-retrotransposon insertions identified in Kas-Cen8 and Nip-Cen8 have accumulated independently after the divergence of the indica and japonica subspecies. These recently inserted LTR-retrotransposons account for 0.62 Mb in Kas-Cen8 and 0.47 Mb in Nip-Cen8, explaining the unexpectedly large amount of unaligned sequence between the two subspecies. To provide further evidence in support of this suggestion, we estimated the insertion times of all intact or mostly intact LTR-retrotransposons within the two Cen8 regions by an analysis of LTR sequences. We found that the shared (92 between ‘Kasalath’ and ‘Nipponbare’) and unshared (47 in ‘Kasalath’ and 30 in ‘Nipponbare’) LTR-retrotransposon elements showed different average insertion dates of 1.71 and 0.36 Ma, respectively (Table 5). This result is consistent with the estimated time of divergence of Oryza sativa from Oryza rufipogon, its wild ancestor, about 0.44 Ma (Khush, 1997; Ma and Bennetzen, 2004). In summary, our findings demonstrate that the major structural variations in the two regions of Kas-Cen8 and Nip-Cen8 are caused by the recent insertion of LTR-retrotransposons.
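
The percentages in this paragraph can be recomputed from the counts and lengths in Tables 4 and 5. The following sketch is only a worked check under that assumption; the variable names are illustrative.

```python
# Element counts and unique-sequence lengths from Table 5.
totals = {"Kasalath": 222, "Nipponbare": 199}
shared = 123
unique_bp = {"Kasalath": 616_522, "Nipponbare": 467_660}

summary = {}
for variety, total in totals.items():
    unique = total - shared  # elements without an ortholog in the other variety
    summary[variety] = {
        "unique": unique,
        "unique_pct": round(100 * unique / total, 1),
        "unique_mb": round(unique_bp[variety] / 1e6, 2),
    }

# Collinear fraction of Kas-Cen8 from Table 4 (aligned bp / total bp).
collinear_pct = round(100 * 1_501_991 / 2_249_426, 1)

for variety, row in summary.items():
    print(variety, row)
print("collinear with Nip-Cen8 (%):", collinear_pct)
```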

Although LTR-retrotransposons have preferentially accumulated in the Cen8 region, our results also indicate that LTR-retrotransposon sequences have been eliminated from the genome. We estimate that about 63.4 and 51.1% of the shared and unshared LTR-retrotransposon elements, respectively, have incomplete structures, including solo LTRs and internally deleted or truncated elements. Unequal homologous recombination between two LTRs of a single element, as well as illegitimate recombination, which does not require extensive sequence homology to generate truncated elements, are suggested as the main mechanisms underlying the deletion of LTR-retrotransposon DNA (Devos et al., 2002; Bennetzen et al., 2005). The ratio of solo LTRs to intact elements within the Cen8 regions was 0.96:1 (94:98) in ‘Kasalath’ and 1.13:1 (86:76, excluding copies within the triplicated segments) in ‘Nipponbare’, lower than the ratios of 2.2:1 and 1.6:1 previously calculated for the euchromatic regions and whole genome (Ma et al., 2004; Ma and Bennetzen, 2006). Although the complete inhibition of homologous recombination within Cen8 might repress unequal recombination, interestingly, ‘Kasalath’ and ‘Nipponbare’ showed different rates of elimination of LTR-retrotransposon sequences from the Cen8 regions after their divergence. The ratio of solo LTRs to intact elements inserted before divergence was almost the same in ‘Kasalath’ and ‘Nipponbare’, at 1.41:1 (61:44) and 1.42:1 (61:43), respectively, but for elements inserted after divergence it was 0.59:1 (32:54) and 0.76:1 (25:33). This finding provides evidence of the involvement of rice domestication in the evolution of Cen8 regions, and forms the basis for investigations into whether retrotransposon selection, such as selection for CRR elements, is important for centromere function.
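
As a worked check, the solo-LTR to intact-element ratios above can be recomputed from the counts printed in parentheses. The sketch uses only those counts; the helper name `solo_to_intact` and the dictionary layout are illustrative.

```python
def solo_to_intact(solo: int, intact: int) -> float:
    """Ratio of solo LTRs to intact elements, rounded to two decimals."""
    return round(solo / intact, 2)


# (solo LTRs, intact elements) as quoted in the text.
counts = {
    "Kasalath": {"all": (94, 98), "before": (61, 44), "after": (32, 54)},
    "Nipponbare": {"all": (86, 76), "before": (61, 43), "after": (25, 33)},
}

ratios = {
    variety: {period: solo_to_intact(*pair) for period, pair in periods.items()}
    for variety, periods in counts.items()
}

for variety, by_period in ratios.items():
    print(variety, by_period)
```

With the counts as printed, five of the six ratios match the quoted values; 61:44 evaluates to about 1.39 rather than 1.41, so that pair of figures may contain a small transcription difference.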

Accumulation and rearrangement of CRR and CentO sequences dramatically reshaped the core domain of rice Cen8

The overall density of LTR-retrotransposons was higher in the core domain of Kas-Cen8 and in the orthologous region of Nip-Cen8 than in the pericentromeric regions (flanking the core domain), with average increases of 3.3 and 2.7 per 100 kb, respectively (Figure S4). Although a triplication that had led to the accumulation of 20 copies of LTR-retrotransposons, with the subsequent loss of two, was observed only in Nip-Cen8, almost the same numbers of LTR-retrotransposons were present within the core domain of each: 95 in ‘Kasalath’ and 92 in ‘Nipponbare’ (Table 2). In Arabidopsis, the gypsy-like subfamily Athila appeared to be most prevalent within the pericentromeric heterochromatin, and strictly associated with the 178-bp satellite repeats (Copenhaver, 2003; Peterson-Burch et al., 2004). CRR elements, which are closely related to aboov, Osr34 and other subfamilies, as revealed by the phylogenetic analysis in the present study (Figure S1), and which are enriched in the rice centromeric region, are thought to be essential for centromere function, together with CentO satellite repeats (Cheng et al., 2002; Nagaki et al., 2004, 2005). With the aim of further understanding the molecular and evolutionary mechanisms underlying the conserved function of rice centromeres, we compared the sequences and organizational patterns of the CRR elements and CentO satellite repeats in the Cen8 core domain between the two rice subspecies.

A total of 47 (155.5 kb) and 27 (103.0 kb) CRR elements (CRR1, noaCRR1, CRR2 and noaCRR2 subfamilies) accumulated in Kas-Cen8 and Nip-Cen8, respectively, accounting for 21.2 and 13.6% of the LTR-retrotransposon insertions in these rice subspecies (Table 2). Thirty-five (74.5%, 113.8 kb) and 15 (55.5%, 61.4 kb) of these elements were located within the core domain of Kas-Cen8 and Nip-Cen8, respectively, and accounted for 36.8 and 16.3% of the LTR-retrotransposon insertions in this subregion (Tables 2 and S4). This finding indicates that more CRR elements accumulated in the core domain of Kas-Cen8 than in that of Nip-Cen8. The eight orthologous CRR elements among the 55 shared LTR-retrotransposons (average insertion date of 2.42 Ma) in the core domain of Kas-Cen8 and Nip-Cen8 are most likely to be the result of ancient insertions (Figure 2). By comparison, most of the CRR elements (27 out of 35; average insertion date of 0.32 Ma) in this domain of Kas-Cen8 accumulated after the divergence of indica and japonica. It is notable that CRR elements make up 67.5% of the total LTR-retrotransposons (40 elements) recently inserted into the ‘Kasalath’ core domain, and that 15 CRR elements co-localize with CentO satellite sequences (CentO sub-blocks 1–11). Segmental duplication rather than integration of active elements has been suggested as the mechanism of accumulation of most of the CRR elements in the Cen4 core region (Ma and Jackson, 2006). On the basis of the young insertion time and the unique TSD sequences observed in this study, we suggest that the accumulation of CRR elements in Kas-Cen8 derives from recent insertion, rather than from segmental duplication. Unexpectedly, seven CRR elements were clustered between CentO sub-blocks 10 and 11 to form a 29.1-kb noaCRR1 block (Figures 2 and 5). From the structure and organization pattern of each CRR, we assume that this noaCRR1 block was formed by several episodes of inter-element unequal recombination (Figure S5).

Similarly, a higher copy number of CentO satellite monomers was found within the core domain of Kas-Cen8 than in the core domain of Nip-Cen8 (CentO block 1 was 18.2 kb larger in Kas-Cen8). These variations between the two rice varieties indicate the preferential accumulation of both CRR and CentO satellite-repeat sequences in Kas-Cen8. Although we observed small duplications or deletions of DNA sequences (approximately 12 bp) within a number of CentO monomers, the CentO monomers were highly conserved between the two subspecies, with 94.6–95.2% consensus sequence identity. Centromeric satellite repeats can be homogenized by unequal conversion, whereas variation in copy number and arrangement can be caused by unequal exchange (Lee et al., 2006; Ma and Jackson, 2006; Malik and Bayes, 2006). Phylogenetic analysis of the CentO satellite repeats in ‘Nipponbare’ Cen8 has already demonstrated a recent inverted segmental duplication that was responsible for the amplification of CentO monomers (Ma and Bennetzen, 2006). Using the same method to analyze the most closely related monomers of CentO satellite repeats revealed by the neighbor-joining phylogenetic tree, we found no similar segmental duplication in Kas-Cen8 (Figure 5). This finding suggests that the known segmental duplication of CentO satellite repeats between sub-blocks 1 and 3 in Nip-Cen8 occurred after the divergence of indica and japonica. We investigated 26 segments in Kas-Cen8 that contained ordered pairs of CentO satellites (four or more monomers) showing a very high degree of sequence similarity (ranging from 98 to 100%). On the basis of their positions and orientations (Table S5), these segments seem to have derived from multiple tandem duplications. In support of this finding, orthologous pairs of CentO monomers were found only at the start of the first CentO sub-block and at the end of the last CentO sub-block (Figure 5). We were not able to trace the origin of most internal CentO sub-blocks in Kas-Cen8 because of the very recent amplification and the rapid and dramatic rearrangement and reshuffling of the satellite repeats, as indicated by the close proximity of branches or sub-branches in the phylogenetic tree. Nevertheless, the above observations provide strong evidence that the core domain of rice Cen8 has been dramatically reshaped through the variable accumulation of CRR elements, and the rapid expansion or rearrangement of CentO satellite repeats, in the two rice subspecies.

Because of the presence of active genes in the CENH3-binding domain and low numbers of CentO repeats, ‘Nipponbare’ Cen8 is thought to represent an intermediate stage in the evolution of centromeres from a state similar to human neocentromeres to fully mature centromeres that accumulate megabases of homogeneous satellite arrays (Nagaki et al., 2004). The high rates of rearrangement of CRR elements and rapid expansion of CentO satellite repeats observed within the two rice subspecies here indicate that centromere function is maintained regardless of the dynamic changes in genomic structure. Consequently, our results raise further questions. Do these two classes of centromeric repetitive sequences have similar or distinct roles in centromere function in rice? Are other types of retrotransposons or centromeric satellite repeats involved, directly or indirectly, in centromere performance? To explain the rapid sequence divergence of the genes encoding CENH3 proteins and centromere DNA repeats, an evolutionary model involving centromere drive has been proposed in both animals and plants (Smith, 1976; Malik and Henikoff, 2002; Talbert et al., 2002; Heslop-Harrison et al., 2003). Supposing that centromere variants with enriched retrotransposons and expanded satellite-repeat arrays increase CENH3 binding sites, and facilitate microtubule-binding ability during female meiosis in rice (Ma et al., 2007), it is possible that the preferential accumulation of centromere-specific retrotransposons and satellite repeats is an outcome of centromere drive.

Conservation of genes and divergence of CENH3-binding and CentO sequences in the Cen8 region of Oryza

PCR amplification with primers designed from the coding regions of putative genes indicated that all genes annotated in the Cen8 region of cultivated rice are conserved within the genus Oryza (Figure S3). These primers could be used to provide landmarks for future structural and evolutionary analysis of Cen8 in different rice varieties, as well as in wild species of rice. Because the conserved genes in the Cen8 region, which are embedded in abundant repeat sequences, are active, these genes are good candidates for future studies of the mechanisms controlling gene expression under highly heterochromatic environments. In clear contrast to the findings for active genes, our results provide strong evidence that CENH3-binding sites and CentO satellite repeats are only highly conserved within species with AA genomes. This result supports previous reports that satellite repeats are only preserved in closely related species (Zhong et al., 2002; Lee et al., 2005). AA-genome species are estimated to have diverged from common ancestors with the BB genome only about 2 Ma (Ma and Bennetzen, 2004; Zhu and Ge, 2005). Recent analysis of the Cen8 sequence in a wild-rice species, Oryza brachyantha, with the FF genome provided strong evidence of the amplification of a new retroelement in the last few million years, to replace the canonical CRR detected in other Oryza species (Gao et al., 2009). Further investigations, i.e. sequencing and comparing the retrotransposons, the satellite repeats, as well as the CENH3 gene among different Oryza species, should be conducted in the future to determine whether and/or how the rapid divergence of centromeric repeated sequences has played a role in the evolution of functional centromeres, and then in speciation in Oryza.

EXPERIMENTAL PROCEDURES

Mapping, sequencing and annotation of the

Kas-Cen8 region

The construction of a BAC library, end-sequencing and in silico mapping of BAC clones from the rice variety ‘Kasalath’ (Oryza sativa L. ssp. indica) were conducted as described before (Katagiri et al., 2004). Relying on the completed genomic sequence of chromosome 8 (accession number AP008214) of rice variety ‘Nipponbare’ (Oryza sativa L. ssp. japonica), we generated six BAC contigs, flanked by the co-segregated markers C1374 and S21882, within the genetically defined centromeric region of chromosome 8 of ‘Kasalath’ (Harushima et al., 1998). The physical locations of these contigs were confirmed by PCR analysis of 16 genetic and expressed sequence tag (EST) markers located within the Cen8 region (Harushima et al., 1998; Wu et al., 2002). Physical gaps between adjacent BAC contigs were closed through chromosomal walking by using unique BAC-end sequences to select bridge clones from the whole ‘Kasalath’ BAC library. Minimum-tiling-path (MTP) clones within the completed BAC physical map were examined and selected for shotgun sequencing to give an approximately 10-fold sequence coverage, using a previously described method (Sasaki et al., 2002; Wu et al., 2004). Sequence gaps remaining in BAC clones were generally filled by sequencing the bridge subclones with custom primers. Regions with low-quality scores were improved by resequencing with custom primers or by alternative chemistries. We applied a transposon insertion/sequencing system (Genome Priming System GPS-1; New England Biolabs, http://www.neb.com) for the complete sequencing of the subclones that contained highly repeated sequences. Assembled sequences were confirmed to have <1 error per 10 000 bases, and were verified to resolve any misassembly. Sequences from the overlapping regions between neighboring BAC clones were checked, and were confirmed to be correct. The DNA sequence analysis software SEQUENCHER 4.1 (Gene Codes, http://www.genecodes.com) was used to create a single, non-overlapped contiguous sequence, based on each completely sequenced BAC clone from the ‘Kasalath’ Cen8 region. Gene annotation was performed using our previously developed and verified annotation system (International Rice Genome Sequencing Project, 2005; Rice Annotation Project, 2008).
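The selection of minimum-tiling-path clones can be illustrated with a small sketch. The text does not say how the MTP clones were chosen beyond their being examined on the physical map, so the greedy interval-cover strategy below is an assumption, and the clone names and coordinates are hypothetical. It simply picks, at each step, the clone that starts within the already covered region and reaches furthest to the right.

```python
def minimum_tiling_path(clones, start, end):
    """Return clone names that cover [start, end] with as few clones as possible.

    clones: list of (name, left, right) tuples on a common coordinate system.
    This is a generic greedy interval cover, not the authors' procedure.
    """
    ordered = sorted(clones, key=lambda c: c[1])  # sort by left coordinate
    path = []
    covered = start
    i = 0
    while covered < end:
        best = None
        # Among clones starting inside the covered region, keep the one
        # that extends furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            # No clone bridges this position: a physical gap remains.
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical clones (name, left, right), coordinates in kb.
example = [("a", 0, 100), ("b", 50, 180), ("c", 90, 200), ("d", 150, 300)]
print(minimum_tiling_path(example, 0, 300))  # ['a', 'c', 'd']
```

A raised `ValueError` corresponds to the situation described above in which a physical gap has to be closed by chromosomal walking before a tiling path exists.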

Fluorescence in-situ hybridization (FISH)

A FISH experiment was performed according to a previously described protocol, with minor modifications (International Rice Genome Sequencing Project, 2005). Briefly, fresh young leaves were chopped with a sharp scalpel and were then filtered through 60-mm nylon mesh (Millipore, http://www.millipore.com) to remove debris, and to isolate nuclei in the filtrate. Nucleus lysis buffer (0.5% SDS, 10 mM EDTA, 10 mM Tris, pH 7.0) was added to a suspension of nuclei placed on a glass slide, and DNA fibers were left to extend from the nuclei by gravity. The PCR-amplified DNA probes from the ‘Kasalath’ BAC clone K0486F02 (within a 51-kb subregion) or CentO sequences were labeled with digoxigenin-dUTP or biotin-dUTP, respectively, and then hybridized with the DNA fibers (Table S6). Detection was performed with a fluorescein isothiocyanate (FITC)-conjugated anti-digoxigenin antibody or Cy3-conjugated avidin. FISH signals were captured by using a BX51 microscope (Olympus, http://www.olympus.com) with a CoolSNAP HQ charge-coupled device camera (Roper Scientific, http://www.roperscientific.com).

Classification of repeat sequences

Intact LTR-retrotransposons were determined by using LTR-STRUC, an LTR-retrotransposon mining program (McCarthy et al., 2002), and by methods previously described (Ma and Bennetzen, 2004; Ma et al., 2004). Solo LTRs and truncated elements were identified by sequence homology searches against the rice LTR-retrotransposon database collected from the completed ‘Nipponbare’ genome sequence, generated by the International Rice Genome Sequencing Project (Ma and Bennetzen, 2006). The structures of all LTR-retrotransposons identified were confirmed by manual inspection. For estimating the insertion date of LTR-retrotransposons, we extracted two LTR sequences from each intact or mostly intact LTR-retrotransposon, and aligned them using CLUSTALX (Thompson et al., 1997). After editing manually, if necessary, we applied a mutation rate of 1.3 × 10⁻⁸ substitutions per base per year for the age calculation (Ma and Jackson, 2006). For characterization of centromere satellite repeats in the Kas-Cen8 region, we used a consensus sequence of CentO monomers, previously reported from the Cen8 region of ‘Nipponbare’, for BLAST analysis (Wu et al., 2004).
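The age calculation can be sketched as follows. Only the mutation rate (1.3 × 10⁻⁸ substitutions per base per year) comes from the text; the use of the Kimura two-parameter distance for the divergence K between the two LTRs, and the commonly used relation T = K / (2r), are assumptions made for illustration, since the procedure is described here only by reference. The function names are hypothetical.

```python
import math

MUTATION_RATE = 1.3e-8  # substitutions per base per year, as stated in the text

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(ltr5, ltr3):
    """Kimura two-parameter distance between two aligned LTR sequences.

    Assumes the sequences are already aligned (equal length); columns with
    gaps or ambiguous bases are skipped.
    """
    sites = transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites in the alignment")
    p = transitions / sites    # proportion of transitional differences
    q = transversions / sites  # proportion of transversional differences
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def insertion_age_years(ltr5, ltr3, rate=MUTATION_RATE):
    """Estimated insertion time T = K / (2r); the factor 2 reflects that
    both LTRs accumulate substitutions independently after insertion."""
    return k2p_distance(ltr5, ltr3) / (2 * rate)
```

Under these assumptions, a pair of LTRs that differ at 1% of sites by transitions only gives K of about 0.0101 and an estimated age of roughly 0.39 million years.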

Alignment and comparison of genomic sequences between the two regions of Kas-Cen8 and Nip-Cen8

Genomic sequences were compared between the orthologous Cen8 regions of ‘Kasalath’ and ‘Nipponbare’ by using the BLAST algorithm (Altschul et al., 1997). Homologous sequences were aligned and dot-plotted with BLASTZ (Schwartz et al., 2003). SNP and indel (insertion or deletion) sites present between the two orthologous genomic regions were detected by using AVID (Bray et al., 2003). Ratios of non-synonymous substitution (Ka) to synonymous substitution (Ks) between the orthologous genes were calculated with SNAP (http://hiv-web.lanl.gov/content/hiv-db/SNAP/README.html) by the Nei and Gojobori method, with Jukes–Cantor correction (Nei and Gojobori, 1986). Sequences of RT domain and satellite repeats were extracted, respectively, from all intact or mostly intact LTR-retrotransposons and CentO monomers in both varieties, for a phylogenetic analysis to build neighbor-joining trees by the Kimura (1980) two-parameter method.
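The Jukes–Cantor step of the Ka/Ks calculation can be written out as a short sketch. It covers only the final correction and ratio, d = -(3/4) ln(1 - 4p/3), applied to the observed proportions of synonymous and non-synonymous differences; the codon-by-codon counting of sites and differences that SNAP performs is not reproduced, and the function names and example counts are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(syn_diffs, syn_sites, nonsyn_diffs, nonsyn_sites):
    """Return (Ka, Ks, Ka/Ks) from pre-computed site and difference counts.

    The ratio is None when Ks is zero, because it is then undefined.
    """
    ks = jukes_cantor(syn_diffs / syn_sites)
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ratio = ka / ks if ks > 0 else None
    return ka, ks, ratio


# Hypothetical counts for one orthologous gene pair.
ka, ks, ratio = ka_ks(syn_diffs=5, syn_sites=100, nonsyn_diffs=3, nonsyn_sites=300)
print(f"Ka={ka:.4f} Ks={ks:.4f} Ka/Ks={ratio:.3f}")
```

A ratio below 1, as in this toy example, is conventionally read as a sign of purifying selection on the gene.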

PCR amplification of centromeric sequences within the genus Oryza

We prepared a set of 96 varieties and wild-rice accessions that represent all species and genome types from AA to HHJJ of Oryza (Table S7). Rice varieties (Oryza sativa L. ssp. japonica and indica) were drawn from the Rice Diversity Research Set of Germplasm, developed by the National Institute of Agrobiological Sciences (NIAS) (Kojima et al., 2005). Accessions of African cultivated or wild-rice species were obtained from the collections in the resource centers of the National Institute of Genetics (NIG) or the International Rice Research Institute. DNA was isolated from young leaves by the cetyltrimethylammonium bromide (CTAB) method (Murray and Thompson, 1980). For PCR screening, we used the 19 unique primer pairs previously designed for confirmation of active genes or fragments of CENH3-binding sites in the Cen8 region of ‘Nipponbare’ (Table S8). A special primer pair for amplification of CentO satellite DNA was also used (Wu et al., 2002). PCR was performed in a final volume of 20 µl, comprising 2 µl of 10× buffer, 2 µl of MgCl2 (25 mM), 2 µl of dNTPs (25 mM), 0.2 µl of Taq polymerase (5 U µl⁻¹), 0.3 µl of primer DNA (10 µM each), 4 µl of 50% glycerol, 5 µl of template DNA (5 ng µl⁻¹) and 4.5 µl of water, with a PTC-225 DNA Engine Tetrad Cycler (Bio-Rad, http://www.bio-rad.com) under 35 cycles at 94°C for 30 sec, 55°C for 30 sec, and 72°C for 1 min. PCR products were examined by electrophoresis in 1% agarose gel.
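As a check on the reaction mix, the listed volumes and stock concentrations can be tabulated and converted into final concentrations with the dilution relation C_final = C_stock × V_added / V_total. The sketch below takes the volumes and stocks from the text as printed; it assumes that the 0.3 µl of primer DNA is added once as a mix of both primers, which is the reading under which the components sum to the stated 20 µl.

```python
import math

FINAL_VOLUME_UL = 20.0

# (component, volume added in µl, stock concentration, unit), as listed in the text.
REACTION = [
    ("10x buffer", 2.0, 10.0, "x"),
    ("MgCl2", 2.0, 25.0, "mM"),
    ("dNTPs", 2.0, 25.0, "mM"),
    ("Taq polymerase", 0.2, 5.0, "U/µl"),
    ("primer DNA (each)", 0.3, 10.0, "µM"),
    ("glycerol", 4.0, 50.0, "%"),
    ("template DNA", 5.0, 5.0, "ng/µl"),
    ("water", 4.5, None, None),
]


def total_volume(reaction):
    """Sum of all component volumes in µl."""
    return sum(volume for _, volume, _, _ in reaction)


def final_concentrations(reaction, final_volume):
    """Final concentration of each component with a stated stock."""
    result = {}
    for name, volume, stock, unit in reaction:
        if stock is not None:
            result[name] = (stock * volume / final_volume, unit)
    return result


# The components should add up to the stated final volume.
assert math.isclose(total_volume(REACTION), FINAL_VOLUME_UL)

for name, (value, unit) in final_concentrations(REACTION, FINAL_VOLUME_UL).items():
    print(f"{name}: {value:g} {unit}")
```

With these inputs the reaction contains 1 U of Taq polymerase and 25 ng of template DNA in total, and the MgCl2 and glycerol are diluted to 2.5 mM and 10%, respectively.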

Accession numbers

The ‘Kasalath’ BAC sequences obtained in this study were submitted to DDBJ under the accession numbers AP009077–AP009094. A contiguous sequence of the 2.25-Mb ‘Kasalath’ Cen8, as well as detailed results of gene annotation in each BAC sequence, can be downloaded at http://rgp.dna.affrc.go.jp/E/Publicdata.html

ACKNOWLEDGEMENTS

We thank Nori Kurata (NIG), and Makoto Kawase, Duncan A. Vaughan, Kaworu Ebana and Takeshi Izawa (NIAS), for providing the plant material. We also thank Masahiro Nakagahra for advice and encouragement. This work was supported by grants from the Ministry of Agriculture, Forestry and Fisheries of Japan (GS1101, GS1201 and GD2007).

SUPPORTING INFORMATION

Additional Supporting Information may be found in the online version of this article:

Figure S1. Families and subfamilies of rice long terminal repeat (LTR)-retrotransposons within the phylogenetic tree.

Figure S2. CentO monomers within the phylogenetic tree.

Figure S3. PCR screening of Cen8 sequences in the genus Oryza.

Figure S4. Distribution patterns of long terminal repeat (LTR)-retrotransposons in the rice Cen8 regions.

Figure S5. Model for the formation of the CRR block by interelement unequal recombination.

Table S1. Sequence statistics of BAC clones covering the ‘Kasalath’ Cen8.

Table S2. Identification of long terminal repeat (LTR)-retrotransposon elements in ‘Kasalath’ Cen8.

Table S3. Identification of long terminal repeat (LTR)-retrotransposon elements in ‘Nipponbare’ Cen8.

Table S4. Shared and unshared long terminal repeat (LTR)-retrotransposon insertions between the two Cen8 regions of ‘Kasalath’ and ‘Nipponbare’.

Table S5. Duplicated and conserved genomic segments estimated in or between the CentO block 1 of ‘Kasalath’ and ‘Nipponbare’.

Table S6. Primer sequences used in the preparation of DNA probes for FISH analysis.

Table S7. Names and accessions of cultivated and wild-rice species used in this study.

Table S8. Sequences and position of PCR primers in the virtual contig of ‘Nipponbare’ Cen8.

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

REFERENCES

Altschul, S., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and

Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of

protein database search programs. Nucleic Acids Res. 25, 3389–3402.

Arabidopsis Genome Initiative. (2000) Analysis of the genome sequence of

the flowering plant Arabidopsis thaliana. Nature, 408, 796–815.

Bennetzen, J.L., Ma, J. and Devos, K.M. (2005) Mechanisms of recent genome

size variation in flowering plants. Ann. Bot. 95, 127–132.

Bray, N., Dubchak, I. and Pachter, L. (2003) AVID: a global alignment program.

Genome Res. 13, 97–102.

Caicedo, A.L., Williamson, S.H., Hernandez, R.D. et al. (2007) Genome-wide patterns of nucleotide polymorphism in domesticated rice. PLoS Genet. 3, e163, doi:10.1371/journal.pgen.0030163.

Cheng, Z., Dong, F., Langdon, T., Ouyang, S., Buell, C.R., Gu, M., Blattner, F.R. and Jiang, J. (2002) Functional rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon. Plant Cell, 14, 1691–1704.

Copenhaver, G.P. (2003) Using Arabidopsis to understand centromere function: progress and prospects. Chromosome Res. 11, 255–262.

Copenhaver, G.P., Nickel, K., Kuromori, T. et al. (1999) Genetic definition and

sequence analysis of Arabidopsis centromeres. Science, 286, 2468–2474.

Devos, K.M., Brown, J.K.M. and Bennetzen, J.L. (2002) Genome size reduction

through illegitimate recombination counteracts genome expansion in

Arabidopsis. Genome Res. 12, 1075–1079.

Fransz, P.F., Armstrong, S., Alonso-Blanco, C., Fischer, T.C., Torres-Ruiz, R.A. and Jones, J. (1998) Cytogenetics for the model system Arabidopsis thaliana. Plant J. 13, 867–876.

Fukui, K.-N., Suzuki, G., Lagudah, E.S., Rahman, S., Appels, R., Yamamoto, M. and Mukai, Y. (2001) Physical arrangement of retrotransposon-related repeats in centromeric regions of wheat. Plant Cell Physiol. 42, 189–196.

Gao, D., Gill, N., Kim, H.-R. et al. (2009) A lineage-specific centromere retrotransposon in Oryza brachyantha. Plant J. doi:10.1111/j.1365-313X.2009.04005.x.

Hall, S.E., Kettler, G. and Preuss, D. (2003) Centromere satellites from

Arabidopsis populations: maintenance of conserved and variable domains.

Genome Res. 13, 195–205.

Hall, S.E., Kettler, G. and Preuss, D. (2006) Dynamic evolution at pericentro-

meres. Genome Res. 16, 355–364.

Hansen, C.N. and Heslop-Harrison, J.S. (2004) Sequences and phylogenies of

plant pararetroviruses, viruses and transposable elements. Adv. Bot. Res.

41, 165–193.

Harushima, Y., Yano, M., Shomura, A. et al. (1998) A high-density rice genetic

linkage map with 2,275 markers using a single F2 population. Genetics, 148,

479–494.

Henikoff, S. and Dalal, Y. (2005) Centromeric chromatin: what makes it

unique? Curr. Opin. Genet. Dev. 15, 177–184.

Henikoff, S., Ahmad, K. and Malik, H.S. (2001) The centromere paradox: stable

inheritance with rapidly evolving DNA. Science, 293, 1098–1102.

Heslop-Harrison, J.S. (2000) Comparative genome organization in plants:

from sequence and markers to chromatin and chromosomes. Plant Cell, 12,

617–635.

Heslop-Harrison, J.S., Murata, M., Ogura, Y., Schwarzacher, T. and Motoyoshi,

F. (1999) Polymorphisms and genomic organization of repetitive DNA from

centromeric regions of Arabidopsis chromosomes. Plant Cell, 11, 31–42.

Heslop-Harrison, J.S., Brandes, A. and Schwarzacher, T. (2003) Tandemly repeated DNA sequences and centromeric chromosomal regions of Arabidopsis species. Chromosome Res. 11, 241–253.

Hosouchi, T., Kumekawa, N., Tsuruoka, H. and Kotani, H. (2002) Physical map-based sizes of the centromeric regions of Arabidopsis thaliana chromosomes 1, 2, and 3. DNA Res. 9, 117–121.

International Human Genome Sequencing Consortium. (2004) Finishing the

euchromatic sequence of the human genome. Nature, 431, 931–945.

International Rice Genome Sequencing Project. (2005) The map-based

sequence of the rice genome. Nature, 436, 793–800.

Jiang, J., Birchler, J.A., Parrott, W.A. and Dawe, R.K. (2003) A molecular view

of plant centromeres. Trends Plant Sci. 8, 570–575.

Kamm, A., Galasso, I., Schmidt, T. and Heslop-Harrison, J.S. (1995) Analysis

of a repetitive DNA family from Arabidopsis arenosa and relationships

between Arabidopsis species. Plant Mol. Biol. 27, 853–862.

Katagiri, S., Wu, J., Ito, Y., Karasawa, W., Shibata, M., Kanamori, H., Katayose,

Y., Namiki, N., Matsumoto, T. and Sasaki, T. (2004) End sequencing and

chromosomal in silico mapping of BAC clones derived from an indica rice

variety, Kasalath. Breed. Sci. 54, 273–279.

Khush, G.S. (1997) Origin, dispersal, cultivation and variation of rice. Plant

Mol. Biol. 35, 25–34.

Kimura, M. (1980) A simple method for estimating evolutionary rates of base

substitutions through comparative studies of nucleotide sequences. J. Mol.

Evol. 16, 111–120.

Kojima, Y., Ebana, K., Fukuoka, S., Nagamine, T. and Kawase, M. (2005)

Development of an RFLP-based rice diversity research set of germplasm.

Breed. Sci. 55, 431–440.


Kumekawa, N., Hosouchi, T., Tsuruoka, H. and Kotani, H. (2000) The size and

sequence organization of the centromeric region of Arabidopsis thaliana

chromosome 5. DNA Res. 7, 315–321.

Kumekawa, N., Hosouchi, T., Tsuruoka, H. and Kotani, H. (2001) The size and

sequence organization of the centromeric region of Arabidopsis thaliana

chromosome 4. DNA Res. 8, 285–290.

Lamb, J.C., Theuri, J. and Birchler, J.A. (2004) What’s in a centromere? Genome Biol. 5, 239.

Lee, H.-R., Zhang, W., Langdon, T., Jin, W., Yan, H., Cheng, Z. and Jiang, J.

(2005) Chromatin immunoprecipitation cloning reveals rapid evolutionary

patterns of centromeric DNA in Oryza species. Proc. Natl Acad. Sci. USA,

102, 11793–11798.

Lee, H.-R., Neumann, P., Macas, J. and Jiang, J. (2006) Transcription and

evolutionary dynamics of the centromeric satellite repeat CentO in rice.

Mol. Biol. Evol. 23, 2505–2520.

Ma, J. and Bennetzen, J.L. (2004) Rapid recent growth and divergence of rice

nuclear genomes. Proc. Natl Acad. Sci. USA, 101, 12404–12410.

Ma, J. and Bennetzen, J.L. (2006) Recombination, rearrangement, reshuffling,

and divergence in a centromeric region of rice. Proc. Natl Acad. Sci. USA,

103, 383–388.

Ma, J. and Jackson, S.A. (2006) Retrotransposon accumulation and satellite

amplification mediated by segmental duplication facilitate centromere

expansion in rice. Genome Res. 16, 251–259.

Ma, J., Devos, K.M. and Bennetzen, J.L. (2004) Analyses of LTR-retrotransposon structures reveal recent and rapid genomic DNA loss in rice. Genome Res. 14, 860–869.

Ma, J., Wing, R.A., Bennetzen, J.L. and Jackson, S.A. (2007) Plant centromere

organization: a dynamic structure with conserved functions. Trends Genet.

23, 134–139.

Malik, H.S. and Bayes, J.J. (2006) Genetic conflicts during meiosis and the evolutionary origins of centromere complexity. Biochem. Soc. Trans. 34, 569–573.

Malik, H.S. and Henikoff, S. (2002) Conflict begets complexity: the evolution of

centromeres. Curr. Opin. Genet. Dev. 12, 711–718.

McCarthy, E.M., Liu, J., Gao, L. and McDonald, J.F. (2002) Long terminal repeat retrotransposons of Oryza sativa. Genome Biol. 3, research0053.1–0053.11.

Murray, M.G. and Thompson, W.F. (1980) Rapid isolation of high molecular

weight plant DNA. Nucleic Acids Res. 8, 4321–4325.

Nagaki, K., Cheng, Z., Ouyang, S., Talbert, P.B., Kim, M., Jones, K.M., Henikoff, S., Buell, C.R. and Jiang, J. (2004) Sequencing of a rice centromere uncovers active genes. Nat. Genet. 36, 138–145.

Nagaki, K., Neumann, P., Zhang, D., Ouyang, S., Buell, C.R., Cheng, Z. and Jiang, J. (2005) Structure, divergence, and distribution of the CRR centromeric retrotransposon family in rice. Mol. Biol. Evol. 22, 845–855.

Nei, M. and Gojobori, T. (1986) Simple methods for estimating the numbers of

synonymous and nonsynonymous nucleotide substitutions. Mol. Biol.

Evol. 3, 418–426.

Peterson-Burch, B.D., Nettleton, D. and Voytas, D.F. (2004) Genomic

neighborhoods for Arabidopsis retrotransposons: a role for

targeted integration in the distribution of the Metaviridae. Genome Biol.

5, R78.

Rice Annotation Project. (2008) The Rice Annotation Project Database (RAP-

DB): 2008 update. Nucleic Acids Res. 36, D1028–D1033.

Sasaki, T., Matsumoto, T., Yamamoto, K. et al. (2002) The genome sequence

and structure of rice chromosome 1. Nature, 420, 312–316.

Schueler, M., Higgins, A., Rudd, N.K., Gustashaw, K. and Willard, H.F. (2001) Genomic and genetic definition of a functional human centromere. Science, 294, 109–115.

Schwartz, S., Kent, W.J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R.C.,

Haussler, D. and Miller, W. (2003) Human-mouse alignments with BLASTZ.

Genome Res. 13, 103–107.

Smith, G.P. (1976) Evolution of repeated DNA sequences by unequal crossover. Science, 191, 528–535.

Talbert, P.B., Masuelli, R., Tyagi, A.P., Comai, L. and Henikoff, S. (2002) Centromeric localization and adaptive evolution of an Arabidopsis histone H3 variant. Plant Cell, 14, 1053–1066.

Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F. and Higgins, D.G.

(1997) The CLUSTAL_X windows interface: flexible strategies for multiple

sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25,

4876–4882.

Wu, J., Maehara, T., Shimokawa, T. et al. (2002) A comprehensive rice transcript map containing 6591 expressed sequence tag sites. Plant Cell, 14, 525–535.

Wu, J., Yamagata, H., Hayashi-Tsugane, M. et al. (2004) Composition and

structure of the centromeric region of rice chromosome 8. Plant Cell, 16,

967–976.

Yu, J., Wang, J., Lin, W. et al. (2005) The genomes of Oryza sativa: a history of duplications. PLoS Biol. 3, e38, doi:10.1371/journal.pbio.0030038.

Zhang, Y., Huang, Y., Zhang, L. et al. (2004) Structural features of the rice chromosome 4 centromere. Nucleic Acids Res. 32, 2023–2030.

Zhong, C.X., Marshall, J.B., Topp, C., Mroczek, R., Kato, A., Nagaki, K.,

Birchler, J.A., Jiang, J. and Dawe, R.K. (2002) Centromeric retroelements

and satellites interact with maize kinetochore protein CENH3. Plant Cell, 14,

2825–2836.

Zhu, Q. and Ge, S. (2005) Phylogenetic relationships among A-genome species of the genus Oryza revealed by intron sequences of four nuclear genes. New Phytol. 167, 249–267.
