Populus transcriptomics – from noise to biology - DiVA Portal

62
Populus transcriptomics – from noise to biology ANDREAS SJÖDIN Akademisk avhandling som med vederbörligt tillstånd av rektorsämbetet vid Umeå universitet för avläggande av Teknologie doktorsexamen i Växters cell- och molekylärbiologi, framläggs till offentligt försvar i Arbetslivsinstitutets hörsal, Umeå Universitet, fredagen den 30 november 2007 klockan 10.00 Avhandlingen kommer att försvaras på engelska. Fakultetsopponent: Dr. Jan Lohmann Max Planck Institute for Developmental Biology Tübingen, Germany Umeå Plant Science Centre Department of Plant Physiology Umeå university Sweden 2007

Transcript of Populus transcriptomics – from noise to biology - DiVA Portal

Populus transcriptomics – from noise to biology

ANDREAS SJÖDIN

Akademisk avhandling

som med vederbörligt tillstånd av rektorsämbetet vid Umeå universitet föravläggande av Teknologie doktorsexamen i Växters cell- och molekylärbiologi,framläggs till offentligt försvar i Arbetslivsinstitutets hörsal, Umeå Universitet,

fredagen den 30 november 2007 klockan 10.00Avhandlingen kommer att försvaras på engelska.

Fakultetsopponent:Dr. Jan Lohmann

Max Planck Institute for Developmental BiologyTübingen, Germany

Umeå Plant Science CentreDepartment of Plant Physiology

Umeå universitySweden 2007

Populus transcriptomics – from noise to biology

Andreas SjödinUmeå Plant Science Centre, Department of Plant Physiology, Umeå University

ISBN: 978-91-7264-442-7

AbstractDNA microarray analysis today is not just generation of high-throughput data, muchmore attention is paid to the subsequent efficient handling of the generated information.In this thesis, a pipeline to generate, store and analyse Populus transcriptional datais presented A public Populus microarray database - UPSC–BASE - was developedto gather and store transcriptomic data. In addition, several tools were provided tofacilitate microarray analysis without requirements for expert-level knowledge. The aimhas been to streamline the workflow from raw data through to biological interpretation.

Differentiating noise from valuable biological information is one of the challengesin DNA microarray analysis. Studying gene regulation in free-growing aspen treesrepresents a complex analysis scenario as the trees are exposed to, and interacting with,the environment to a much higher extent than under highly controlled conditions in thegreenhouse. This work shows that, by using multivariate statistics and experimentalplanning, it is possible to follow and compare gene expression in leaves from multiplegrowing seasons, and draw valuable conclusions about gene expression from field-grownsamples.

The biological information in UPSC-BASE is intended to be a valuable transcriptomicresource also for the wider plant community. The database provides information fromalmost a hundred different experiments, spanning different developmental stages, tissuetypes, abiotic and biotic stresses and mutants. The information can potentially be usedfor both cross-experiment analysis and for comparisons against other plants, such asArabidopsis or rice. As a demonstration of this, microarray experiments performed onPopulus leaves were merged and genes preferentially expressed in leaves were organisedin to regulons of co-regulated genes. Those regulons were used to define genes ofimportance in leaf development in Populus. Taken together, the work presented in thisthesis provides tools and knowledge for large-scale transcriptional studies and the storedgene expression information has been proven to be a valuable information resource forin-depth studies about gene regulation.

Keywords: Populus, microarray, transcriptomics, UPSC-BASE, leaf development

Populus transcriptomics – from noise to biology

ANDREAS SJÖDIN

Umeå Plant Science CentreDepartment of Plant Physiology

Umeå universitySweden 2007

© Andreas Sjödin, 2007

Umeå Plant Science CentreDepartment of Plant PhysiologyUmeå UniversitySE-901 87 UmeåSweden

ISBN: 978-91-7264-442-7

Cover by: Sofie Sjödin, [email protected] by: Arkitektkopia, Umeå, 2007

v

Abstract

DNA microarray analysis today is not just generation of high-throughputdata, much more attention is paid to the subsequent efficient handling of thegenerated information. In this thesis, a pipeline to generate, store and anal-yse Populus transcriptional data is presented A public Populus microarraydatabase - UPSC–BASE - was developed to gather and store transcriptomicdata. In addition, several tools were provided to facilitate microarray anal-ysis without requirements for expert-level knowledge. The aim has been tostreamline the workflow from raw data through to biological interpretation.

Differentiating noise from valuable biological information is one of the chal-lenges in DNA microarray analysis. Studying gene regulation in free-growingaspen trees represents a complex analysis scenario as the trees are exposedto, and interacting with, the environment to a much higher extent than underhighly controlled conditions in the greenhouse. This work shows that, by us-ing multivariate statistics and experimental planning, it is possible to followand compare gene expression in leaves from multiple growing seasons, anddraw valuable conclusions about gene expression from field-grown samples.

The biological information in UPSC-BASE is intended to be a valuable tran-scriptomic resource also for the wider plant community. The database pro-vides information from almost a hundred different experiments, spanning dif-ferent developmental stages, tissue types, abiotic and biotic stresses and mu-tants. The information can potentially be used for both cross-experimentanalysis and for comparisons against other plants, such as Arabidopsis orrice. As a demonstration of this, microarray experiments performed on Pop-ulus leaves were merged and genes preferentially expressed in leaves wereorganised in to regulons of co-regulated genes. Those regulons were used todefine genes of importance in leaf development in Populus. Taken together,the work presented in this thesis provides tools and knowledge for large-scaletranscriptional studies and the stored gene expression information has beenproven to be a valuable information resource for in-depth studies about generegulation.

vi

Sammanfattning

Mikromatriser handlar numera inte bara om att alstra genuttrycksdata isnabb takt, utan det är minst lika viktigt att effektivt ta hand om infor-mationen efteråt. I den här avhandlingen presenteras ett arbetsflöde för attmäta, lagra och analysera genuttrycksdata i asp och poppel (Populus spp.).En Populus mikromatrisdatabas - UPSC–BASE - tillgänglig för alla intres-serade, utvecklades i syfte att samla in och lagra genuttrycksdata. Flertaletanalysverktyg gjordes samtidigt tillgängliga, för att möjliggöra ett smidigtarbetsflöde från rådata till biologiska slutsatser.

En av de stora utmaningarna i analys av mikromatriser är att kunna särskiljabruset från värdefull biologisk information. Att studera träd som växer ut-omhus är komplext eftersom de interagerar med omgivningen i mycket störreutsträckning än vad som är fallet i växthusets kontrollerade miljö. Det härarbetet visar att det är möjligt, med hjälp av avancerad statistik och god för-söksplanering, att följa och jämföra genuttrycket i blad från aspar utomhusunder flera år för att dra värdefulla slutsatser om geners reglering.

Den lagrade biologiska informationen i UPSC-BASE är avsedd att vara envärdefull tillgång för växtfältet i stort. I databasen finns nästan hundra olikaexperiment som innefattar alltifrån unga blad till ved, insektsangrepp, kylaoch torka samt studier av genmodifierade växter. Informationen kan användasbåde för jämförande studier inom olika aspförsök, men också för att jämföramed andra växter. För att illustrera möjligheterna, studerades och gruppe-rades gener i blad baserat på hur de uppträder över alla dessa experiment.Dessa grupperingar användes sedan för att definiera gener som är viktiga ibladutvecklingen. Sammanfattningsvis ger arbetet som presenteras i dennaavhandling tillgång till verktyg och kunskap för storskaliga studier av genut-tryck och den lagrade informationen har bevisats vara en värdefull tillgångför mer ingående studier av geners reglering.

LIST OF PAPERS vii

List of papers

The thesis is based on the following papers, which will be referred to in the text bytheir Roman numerals.

I. Bylesjö M, Eriksson D, Sjödin A, Sjöstrom M, Jansson S, Antti H, Trygg J(2005) MASQOT: a method for cDNA microarray spot quality control. BMCBioinformatics 6: 250

II. Bylesjö M, Sjödin A, Eriksson D, Antti H, Moritz T, Jansson S, Trygg J(2006) MASQOT-GUI: spot quality assessment for the two-channel microarrayplatform. Bioinformatics 22: 2554-2555

III. Bylesjö M, Eriksson D, Sjödin A, Jansson S, Moritz M and Trygg J (2007) Or-thogonal Projections to Latent Structures as a strategy for cDNA microarraydata normalization. BMC Bioinformatics 8: 207

IV. Sjödin A, Bylesjö M, Skogström O, Eriksson D, Nilsson P, Rydén P, Jansson Sand Karlsson J (2006) UPSC-BASE – Populus transcriptomics online. PlantJournal 48: 806-817.

V. Sjödin A, Wissel K, Bylesjö M, Trygg J and Jansson S: Global expressionprofiling in leaves of free-growing aspen (manuscript)

VI. Sjödin A, Street NR, Bylesjö M, Trygg J, Gustafsson P and Jansson S: A cross-species transcriptomics approach to identify genes involved in leaf development(manuscript)

viii

Also published by the auhor but not printed in the thesis

VII. Andersson A, Keskitalo J, Sjödin A, Bhalerao R, Sterky F, Wissel K, TandreK, Aspeborg H, Moyle R, Ohmiya Y, Brunner A, Gustafsson P, KarlssonJ, Lundeberg J, Nilsson O, Sandberg G, Strauss S, Sundberg B, Uhlen M,Jansson S, Nilsson P (2004) A transcriptional timetable of autumn senescence.Genome Biology 5: R24

VIII. Taylor G, Street NR, Tricker PJ, Sjödin A, Graham L, Skogström O, Calfapi-etra C, Scarascia-Mugnozza G, Jansson S (2005) The transcriptome of Populusin elevated CO2. New Phytologist 167: 143-154

IX. Druart N, Rodriguez-Buey M, Barron-Gafford G, Sjödin A, Bhalerao R, HurryV (2006) Molecular targets of elevated [CO2] in leaves and stems of Populusdeltoides: implications for future tree growth and carbon sequestration. Func-tional Plant Biology 33: 121-131

X. Klimmek F, Sjödin A, Noutsos C, Leister D, Jansson S (2006) Abundantlyand rarely expressed Lhc protein genes exhibit distinct regulation patterns inplants. Plant Physiology 140: 793-804

XI. García-Lorenzo M, Sjödin A, Jansson S, Funk C (2006) Protease gene familiesin Populus and Arabidopsis. BMC Plant Biology 6:30

XII. Street NR, Skogstrom O, Sjödin A, Tucker J, Rodriguez-Acosta M, Nilsson P,Jansson S, Taylor G (2006) The genetics and genomics of the drought responsein Populus. Plant Journal 48: 321-341

XIII. Druart N, Johansson A, Baba K, Schrader J, Sjödin A, Bhalerao R, ResmanL, Trygg J, Moritz T and Bhalerao R (2007) Environmental and hormonalregulation of the activity-dormancy cycle in the cambial meristem involvesstage specific modulation of transcriptional and metabolic networks. PlantJournal 50: 557-573

XIV. Lee H, Bae E-K, Park S-Y, Sjödin A, Lee J-S, Noh E-W and Jansson S (2007)Growth-Phase-Dependent Gene Expression Profiling of Poplar (Populus albax P. tremula var. glandulosa) Suspension Cells. Physiologia Plantarum 131:599-613

ABBREVIATIONS ix

Abbreviations

ANOVA Analysis of varianceBASE BioArray Software EnvironmentDNA Deoxyribonucleic acidEST Expressed Sequence TagFv/Fm Optimum quantum yieldGEO Gene Expression OmnibusGMOD Generic Model Organism DatabaseGO Gene OntologyGSEA Gene Set Enrichment AnalysisJGI Joint Genome InstituteKEGG Kyoto encyclopaedia of Genes and GenomesLIMS Laboratory information management systemLOWESS Locally weighted scatterplot smoothingMASQOT MicroArray Spot Quality cOnTrolMeSH Medical Subject HeadingMIAME Minimum Information About a Microarray ExperimentMIPS Munich Information Center for Protein SequencesmiRNA microRNAMPSS Massive Parallel Signature SequencingmRNA messenger RNANCBI National Center for Biotechnology InformationOPLS Orthogonal projections to latent structuresPCR Polymerase Chain ReactionPI Principal InvestigatorPICME Platform for Integrated Clone ManagementPOP1 UPSC Populus microarray generation 1POP2 UPSC Populus microarray generation 2PLS-DA Partial Least Squares Discriminant AnalysisQTL Quantitative trait locusRLS Restricted Linear ScalingRNA Ribonucleic AcidRT-PCR Reverse Transcription Polymerase Chain ReactionSAGE Serial Analysis of Gene ExpressionUBC University of British ColombiaUPSC Umeå Plant Science CentreVSN Variance Stabilisation Normalisation

Contents

List of papers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiAbbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix

Contents xi

1 Background 31.1 Gene regulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.2 Microarray technology . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2.1 Alternative transcriptional technologies . . . . . . . . . . . . 71.2.2 Data analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 71.2.3 Storing and sharing the information . . . . . . . . . . . . . . 9

1.3 Populus as model organism . . . . . . . . . . . . . . . . . . . . . . . 91.3.1 Why study mature free-growing Populus trees . . . . . . . . . 11

2 Results achieved 132.1 Building the infrastructure for transcriptomics . . . . . . . . . . . . 13

2.1.1 Comparison of different Populus microarray platforms . . . . 142.1.2 UPSC-BASE . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.1.3 Improving the hybridisation quality of microarray slides . . . 172.1.4 Image analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 172.1.5 Normalisation . . . . . . . . . . . . . . . . . . . . . . . . . . . 192.1.6 In silico reference designs . . . . . . . . . . . . . . . . . . . . 21

2.2 Turning UPSC-BASE into a valuable resource . . . . . . . . . . . . . 222.2.1 Integrating transcriptomics with physiology . . . . . . . . . . 222.2.2 Trees in future climate . . . . . . . . . . . . . . . . . . . . . . 272.2.3 Applying stress treatments . . . . . . . . . . . . . . . . . . . 282.2.4 How to use stored information . . . . . . . . . . . . . . . . . 29

2.3 Reconstruction of gene regulatory networks in Populus . . . . . . . . 30

3 Conclusions 333.1 The next generation Populus genome browser . . . . . . . . . . . . . 343.2 Future perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4 Acknowledgement 37

Bibliography 39

xi

Preface

A dramatic shift has been seen in biology during the last decade. Modern biologyhas entered a genomic era with new genomes being sequenced at an ever-increasingspeed. High-throughput technologies allow the simultaneous measurement of thou-sands of characteristics in biological systems such as proteins (Perez and Nolan,2002), metabolites (Lu et al., 2006) and RNA (Schena et al., 1995; Lu et al., 2005).The amount of public data available has open possibilities to study thousands ofgenes at the same time. This drift from a univariate view of biology to systemsbiology demands totally different competences from researchers. It is no longerenough to be skilled at lab bench work only; basic knowledge in bioinformatics anddata handling are needed. Modern biology can’t continue to evolve without thesupport of solid bioinformatic infrastructures.

This thesis aims to increase knowledge of how to handle and analyse microarrays in aconcise, streamlined procedure. There are no golden rules for microarrays analysisand no method fits all circumstances. The research presented in the followingchapters should be seen as recommendations rather than golden rules. It coversa journey of setting up an infrastructure for Populus transcriptomics. It is veryimportant for researchers to trust public repositories and to achieve data with highquality. Focus in this work has therefore been to standardise and automate thework flow from hybridisation through to candidate gene list generation. The finalpart of the thesis shows how to use generated data to draw biological conclusionsand to define regulons to understand gene regulation machinery in Populus leaves.

1

Chapter 1

Background

What is happening inside a Populus leaf, from bud burst in the spring until itfinally changes colour from green to yellow or red in the autumn? Populus trees andother multi-cellular organisms have evolved complex gene regulatory mechanisms inorder to coordinate signals between cells. Plants cannot move as rapidly as animals.The fastest way is to spread seeds but this is still far from rapid, and plants aretherefore more dependent on regulation to cope with changes in environmentalconditions. Until recently it was only possible to study a handful of genes orproteins at the same time. This can be compared to sampling random balls froman urn and drawing conclusion about a small, spotlighted area without knowing therelation to the rest of the system. The drift that has occurred over the last decadethat has seen a shift ffrom such gene-wise approaches to large-scale studies is aneffect of technology and computer development and improvement. Measuring generegulation is not the problem any longer and new unknown connection betweenpreviously separated pathways has been discovered. The main problem remains tounderstand the biology behind the building blocks and the connections within thesignalling system. The expression of many genes is regulated after transcription(i.e. by microRNAs) and increased mRNA concentration doesn’t directly implyincreased protein levels. Nevertheless, mRNA levels can be quantitatively measuredby transcriptional methods, such as Northern blots or microarrays and even ifmRNA levels are not always informative beyond the level of the transcriptome, inmany cases they will be and mRNA profiling indicates which proteins have thepotential to be translated within cells.

1.1 Gene regulation

The central dogma, which describes the route from DNA to RNA to protein, formsthe basis of molecular biology. See Figure 1.1 for a simplified overview of involvedsteps. The genetic information of a cell is stored in the DNA consisting of thou-sands of genes. Each gene serves as a blueprint of how to build a protein. Thenucleus contains the DNA, which is organised into chromosomes. This informationis identical for all cells of an organism and the DNA must be precisely and accu-rately replicated before a cell divides. For protein-coding genes, the information is

3

4 CHAPTER 1. BACKGROUND

Exon

Transcriptionfactor

DNA

RNA

Protein

Translation

Transcription

microRNA

Degradation

Protein-DNAinteraction

Protein-proteininteraction

Figure 1.1. The central dogma of eukaryotic regulatory mechanism. Modified afterWittkopp (2007)

transcribed into RNA. The non-coding parts are removed and transported out tothe cytosol where the RNA finally is translated into proteins.

Gene regulation is a research field progressing fast to understand how the regula-tion systems are connected. The discovery of a group of regulatory elements, tran-scription factors (TF), can explain some parts but a lot of questions still remainsunanswered. It has been shown that expression of mRNAs not always reflects theprotein levels (Anderson and Seilhamer, 1997). The recent discovery of microRNA(miRNA) has given some evidence that it might be a major gene regulator (Bartel,2004; Zhang et al., 2006, 2007; Kim and Nam, 2006). It might, in addition to turnover rates, be one of the missing links explaining the inconsistency between mRNAand protein expression levels. It has been shown in other organisms that they con-tain about 1-5% miRNA genes (Lai et al., 2003; Lim et al., 2003) compared to thetotal protein-coding genes. miRNA might also be responsible to control 30% of theprotein-coding genes (Lewis et al., 2005).

1.2 Microarray technology

Quantification of large numbers of messenger RNA (mRNA) transcripts using mi-croarray technology provides detailed insight into cellular processes involved in theregulation of gene expression. Large-scale generation of gene regulation informationwas not possible until recently. Array technology (larger dot blots) was already inuse in mid 1980s (Augenlicht et al., 1984, 1987) but didn’t become important untilthe mid 1990s, when cDNA microarrays emerged (DeRisi et al., 1996; Lockhartet al., 1996). Pioneering work on plant microarrays was presented a decade ago

1.2. MICROARRAY TECHNOLOGY 5

using Arabidopsis thaliana (Schena et al., 1995) followed some years later by workin Populus spp. (hereafter Populus) (Hertzberg et al., 2001). For information anddiscussion about the laboratory part of microarray technology see Figure 2 in PaperIV (Sjodin et al., 2006). Although the laboratory part stages of microarray tech-nology are important, the major work-effort lies in the subsequent data processingand analysis.

Microarray experiments are extremely powerful and provide information on a genome-wide scale. However, they are also very complex and time-consuming compared toother biological techniques. The outcome is large and complicated data sets re-quiring large endeavours to analyse and validate. To cope with those problemsthe microarray community agreed to comply with standard rules to simplify datasharing and comparison of experiments. Minimum Information About a Microar-ray Experiment (MIAME) (Brazma et al., 2001) stated what information and datashould always be accessible for published microarray experiments. A schematicoverview of the different steps is shown in Figure 1.2 with indication of the maincoverage for each paper. For a more extensive description see Figure 2 and 3 inPaper IV.

The following data needs to be stored and provided for all microarray publications:

1. Raw data for each hybridisation

2. Normalised data

3. Sample annotation

4. Experimental design

5. Annotation of the array elements

6. Laboratory and data processing protocols

The raw data for each hybridisation includes both raw images from the scans andextracted raw intensity data. The raw intensity data is filtered and normalised toremove systematic noise. Normalised data is what is interesting to distribute andstore as it is what is used for later analysis. Information describing sample anno-tation and design of experiments are very important to allow others to understandand interpret the biological results. The remaining points are more infrastructure-orientated to describe how the experiment was performed. However, without properannotation information it is impossible to draw valid biological conclusions.

The first version of the MIAME standard had mainly the medical field in mindand a special updated version for plants was later published to better cope with

6 CHAPTER 1. BACKGROUND

Microarray slide

Raw intensity data

Scanning

Image analyis

Image

Corrected intensity data

Restricted linear scaling

Normalisation

Normalised data

Experimental design Protocols

Biological conclusion

Quality controlClustering Hypothesis testingVisualisation

Biological samples

Paper I.Paper II.

Paper III.

Paper IV.

Paper V.-VI.Paper VII.-XIV.

Figure 1.2. Schematic overview of methods used in analysis of microarrays. It isindicated what part is covered in the different papers.

1.2. MICROARRAY TECHNOLOGY 7

the special situation for plants (Zimmermann et al., 2006). When are microarraysan appropriate method to use? Microarray experiments shouldn’t be performedwithout spending time understanding the technique and to design appropriate ex-periments. Beginners to the use of microarray technology are usually recommendedto consult an experienced expert before beginning to prepare samples. There aremany examples of new users wasting both time and money trying to do microar-rays with inappropriate biological samples. Microarrays are nice screening tools forhypothesis generation for later gene-wise experimental approaches.

1.2.1 Alternative transcriptional technologies

Several alternatives to cDNA microarray have emerged for analysing gene expres-sion. The most closely related is spotted oligonucleotides, which have the advantageof allowing more gene-specific measurements in comparison to EST based platforms.Affymetrix has patented a special method allowing probes to be synthesised in situ.Those slides are supposed to be more reproducible between different laboratories,although they are much more expensive.

Different sequencing approaches exist as alternatives to closed microarray plat-forms. Quantitative real-time reverse-transcription PCR (qRT-PCR) (Heid et al.,1996) is usually used for smaller gene expression comparisons. For larger scale stud-ies, Expressed Sequence Tag (EST) library construction has been widely used forgene expression profiling (Bhalerao et al., 2003; Sterky et al., 2004; Kohler et al.,2003; Brosche et al., 2005; Ranjan et al., 2004; Ralph et al., 2006). The high costof sequencing has led to other alternatives, such as Serial Analysis of Gene Expres-sion (SAGE) (Velculescu et al., 1995) and Massive Parallel Signature Sequencing(MPSS)(Brenner et al., 2000; Reinartz et al., 2002), which both aim to maximisethe amount of information that can be extracted per sequence run. Recently, tech-niques based on de novo sequencing (454 and Alexa ™) have allowed increasedhigh-throughput profiling at a feasible cost. From here on the term microarray willrefer only to spotted cDNA microarrays unless stated otherwise.

1.2.2 Data analysis

Analysis of microarrays has been covered extensively in the literature (for recentreview see Allison et al. (2006)).The process can be summarized in five key com-ponents: design, pre-processing, inference, classification and validation. The ex-perimental design is the basis and affects efficiency downstream in experiments.Biological replication and sample size need to be considered as well as if the sam-ples should be pooled or not. The pre-processing stages including image analysis,

8 CHAPTER 1. BACKGROUND

normalisation and transformation is an important part of this thesis. These stepshave a huge impact on the final conclusion from microarray experiments.

Classification of samples can be divided in supervised where a prior informationabout the samples are used and unsupervised where no known information is pro-vided. Unsupervised clustering using hierarchical clustering (Eisen et al., 1998)has been widely used in microarray experiments. This approach provides a goodoverview of the data structure, but supervised methods are, in general, more pow-erful for classification And using known information about systems improves resultgeneration and interpretation. Most analysis method initially generate a, often long,list of genes and such lists are hard to summarise in an objective way. Applying sta-tistical tests with tools looking for over-represented gene categories in classificationsystems (Hosack et al., 2003; Banerjee and Zhang, 2002), such as Gene Ontology(GO) (Ashburner et al., 2000), Munich Information Center for Protein Sequences(MIPS) (Mewes et al., 1999; Schoof et al., 2002) and Kyoto encyclopaedia of genesand genomes (KEGG) (Kanehisa and Goto, 2000) will outperform most individualsskills in interpreting gene functions. Checking only gene presence in results doesn’ttake expression values into consideration. Lately Gene Set Enrichment Analysis(GSEA) has emerged as a complement using both classification system and expres-sion information to find consistent, but small changes in categories.

Validation has been regarded as an important part of microarray results. Howeverit still remains unclear how validation should be achieved. To what level shouldthe results obtained in validation agree with the initial results? Both samplingvariability and measurement errors negatively influence validation. Most validationstudies apply Northern blots or real-time PCR to confirm a handful of positiveresults with validation is rarely performed on genes not showing interesting patterns(negative controls) as well as those showing interesting patterns (positive control).

Commercial software for microarray analysis, such as Genespring, Genepix, hasplayed an important role in many microarray labs to provide simple and extensivetools. The drawback of commercial software, in addition to expensive license fees,is the time lag it takes for companies to include new analysis methods presentedin the literature and the black box nature of implemented analysis methods. Thebioinformatic fields have a tradition of home-made software. In most cases thefree software can provide the same or more extensive set of analysis tools in user-friendly graphical interfaces (Dudoit et al., 2003). The TIGR multi expressionviewer (Saeed et al., 2003) provides many clustering and classification algorithmsin an user-friendly software. The Bioconductor suite (Gentleman et al., 2004),mainly written in the statistical language R (Ihaka and Gentleman, 1996), is theoption for more advanced users and it offers an extensive collection of methods andalgorithms. The suite was originally presented in 2002 consisting of 20 packages formicroarray analysis. The project has now grown to >100 packages for technologies,such as microarray, proteomics, mass spectrometry, SAGE, cell-based assays and

1.3. POPULUS AS MODEL ORGANISM 9

genetics with additional infrastructure tools for database connection and documentproduction.

1.2.3 Storing and sharing the information

Computer-aided tools are required for handling and analysing the tremendousamounts of data produced by microarray experiments. Mainly two types of databaseare needed, an annotation database containing information about biological mate-rial on the slide, and a storage and analysis database of expression data. Forthe later it can be divided into databases storing raw data, databases containingdata analysis tools and finally databases containing curated, analysed data. Pub-lic repositories such as Arrayexpress (Brazma et al., 2003, 2006), Gene expressionomnibus (GEO) (Barrett et al., 2007; Edgar et al., 2002), CIBEX (Ikeo et al.,2003) are designed to store MIAME-compliant microarray raw and processed data.Submission of microarray data to one of the two databases is now demanded bymany scientific journals. Both databases are mainly created to provide informationabout experiments and how to reproduce the results. However no tools are providedfor performing the actual analysis within the database system. BioArray SoftwareEnvironment (BASE) (Saal et al., 2002; Troein et al., 2006) and similar systemsprovide both storiage and analysis possibilities. Curated databases with processeddata have gained increased attention lately mainly due to their user-friendly ap-proach. Connecting to public databases, such as Genevestigator (Zimmermann etal., 2004; Grennan, 2006; Zimmermann et al., 2005), AthaMap (Galuschka et al.,2007; Bulow et al., 2006; Steffens et al., 2005, 2004), PathoPlant (Bulow et al.,2007), The Botany Array Resource (Winter et al., 2007; Toufighi et al., 2005) al-lows extraction of information for genes of interest. In this way, information aboutgene expression can be accessed without needing to perform additional laboratorywork.

1.3 Populus as model organism

Around 35-40 plants have been partly sequenced until now. The two first completedwere thale cress (Arabidopsis thaliana) (Arabidopsis Genome Initiative, 2000) andrice (Oryza sativa, japonica cultivar-group) (International Rice Genome SequencingProject, 2005). Another variety of rice, (Yu et al., 2005, 2002), grape (Vitis vinifera)(The French Italian Public Consortium for Grapevine Genome Characterization,2007) and Populus trichocarpa (Tuskan et al., 2006) have also been sequenced andassembled but are currently awaiting final genome presentation. The draft genomesequence of Populus trichocarpa was presented in 2006. Populus plays an impor-tant role as a forest tree, producing timber, pulp, paper, and energy (Tuskan and

10 CHAPTER 1. BACKGROUND

Walsh, 2001), as well as being a keystone species in many ecosystems. The genomesequence, together with complementary genomic resources (Segerman et al., 2007;Andersson et al., 2004; Sterky et al., 2004), high growth rate, clonal propagationand the possibility to transform (Bradshaw et al., 2000; Taylor, 2002; Wullschlegeret al., 2002; Brunner et al., 2004; Jansson and Douglas, 2007) have resulted inpoplar emerging as the model system for tree research.

1975 1977 1979 1981 1983 1985 1987 1989 1991 1993 1995 1997 1999 2001 2003 2005

Num

ber o

f pub

licat

ions

050

100

150

Populus genome draft

UPSC genome program

Figure 1.3. Number of Populus publications in Pubmed 1976-2005.

The number of publications in Pubmed classified as Populus in Medical SubjectHeading (MeSH) field was extracted using Hubmed (Eaton, 2006). The figure 1.3shows the increasing number of Populus publications in Pubmed. The records havegrown from less than ten to hundreds papers per year. The UPSC Populus genomeprogramme was started around the first surge of publications and a second majorincrease in publication came around the presentation of the Populus genome draft.In summary, Populus has emerged as a plant model system where Arabidopsis isnot well suited.

1.3. POPULUS AS MODEL ORGANISM 11

1.3.1 Why study mature free-growing Populus trees

Studies to depict how environmental and developmental factors influence gene ex-pression can addressed in several ways. The following list summarises three maingroups:

1. Monitor effects of mutations.

2. Expose plants to highly controlled conditions and monitor effects.

3. Monitor effects of exposing plants to natural, uncontrolled conditions.

The first two strategies are used frequently while the third has been much less fre-quently employed. The main reason for this are the considerable challenges facedin performing this kind of approach. Weather and biotic interactions may have amajor effect and bias gene regulation of interest. The result is highly dependent onselection of the measured parameters, but a proper experimental design can sepa-rate developmental factors from environmental factors influencing gene expression.Multivariate statistics has been applied on gene expression to study plants grownunder uncontrolled, highly variable, conditions with good outcome (Wissel et al.,2003).

Controlled conditions in the greenhouse represent a highly over-simplified repre-sentation of the complex conditions that exist in the field. Plants in the field areexposed to different biotic (fungi, herbivore etc.) and abiotic (rain, wind, nutrientdeficiency etc.) conditions. Kulheim et al. (2002) shows nicely the power of fieldexperiments to reveal phenotypic differences that are undetectable in controlledconditions. The natural variation in environmental conditions offers unique ex-perimental possibilities that are impossible to mimic in greenhouses. In the field,trees are in their natural environment and not in artificial binary light-dark andtemperature condition.

Another parameter to consider is the difference between juvenile and mature plants.For instance, in medicine, research is performed on individuals matching requiredcriteria. It is usually not a good idea to study cancer treatments in children fora cancer where most cases occur in more elderly persons. In tree biology mostresearch is performed on very young plants. As a result, studying mature plants intheir natural conditions will provide an important complement to the more usualresearch carried out in greenhouses on young, small plants.

Chapter 2

Results achieved

This thesis covers both generation and biological interpretation of microarray data.Robust laboratory protocols are essential for the generation of good-quality datafor the final interpretation of results. The initial laboratory stages involved in theuse of microarraysare not the focus here, with the main achievements of this thesisregarding the analysis stages, from information extraction in properly designed mi-croarray experiments through to drawing final biological conclusions. This chapteris divided in three sections; method development, biological studies and large-scaledata mining. The first section describes the different Populus microarray platformsavailable, data extraction from microarray slides (image analysis), removal of sys-tematic noise (normalisation) and storage of data. The biological study starts withthe integration of genomic tools and physiology on free-growing Populus tremula(aspen) trees, which is the most novel part of the biological studies in this thesis.The other microarray experiments presented are more generally described to give ashort introduction of Populus leaf data generated in this project. The data is usedin the third section to extract information from multiple experiments that wouldnot have been visible in individual experiments. For the third section, UPSC-BASEwas screened for leaf microarray experiments and groups of co-regulated genes (reg-ulons) in Populus leaves were defined to find candidate genes important for leafdevelopment. Second, public data from several different microarray platforms weremerged for large-scale stress comparisons.

2.1 Building the infrastructure for transcriptomics

Huge amounts of data are generated when laboratory and analysis of transcrip-tomic techniques have been set up and are running. For a detailed overview of thelaboratory and analysis pipeline, see Paper IV (Sjodin et al., 2006). Each singlemicroarray slide is digitalised and stored in large raw TIFF-image files. Those filesneed to be quantified to extract raw data results. In total, the final product will,depending on the number of scans for each slide and the resolution of the capturedimage, need large amounts of storage space. It is hard to keep track of all fileswithout a proper infrastructure and storing the information as files on individualpersonal computers should be avoided. For larger groups of scientists working on

13

14 CHAPTER 2. RESULTS ACHIEVED

the same microarray platform, a well designed database system is the only possiblesolution. In addition to storing data, the database will also encourage informationsharing and provide a safe backup system.

2.1.1 Comparison of different Populus microarray platforms

The different microarray platforms available for use in Populus were comparedbefore setting up the large-scale, cross-platform comparison (meta-analysis). Allpublic EST sequences in GenBank (Benson et al., 2007) were downloaded and nu-cleotide BLAST searches were performed against the poplar genome. Out of 45555predicted genes (gene models) in Populus (Tuskan et al., 2006) the calculation gave24687 genes that were supported by at least 1 EST. Not surprisingly, the most repre-sented gene is Ribulose bisphosphate carboxylase, small chain (eugene3.01230087),which was represented by 2287 ESTs.

Number of spots Gene modelsUPSC POP1 13872 8916UPSC POP2 25648 13882Helsinki 9K 4 5536Helsinki 10K 4 5536PICME3 4 8004PICME4 4 8547UBC 4 9409

Table 2.1. Comparison of different Populus microarray platforms. The columnsdescribes number of clones spotted and gene models represented on different PopuluscDNA microarrays.

Helsinki 10K PICME4 UBCUPSC POP2 3834 5830 6471Helsinki 10K 5160 3071PICME4 4705

Table 2.2. Number of gene models common to different Populus cDNA microarrays

Table 2.1 shows coverage of gene models by the different Populus microarray plat-forms and Table 2.2 show gene models in common. The UPSC POP2 microarraycontains all clones from the POP1 microarray. The two Helsinki microarrays con-tains exactly the same of ESTs as the 10K version of the microarray, with the10K version only containing more control clones. The majority of the EST clonesfrom the Helsinki microarray are also present on the PICME microarrays. In abroader perspective, 1817 gene models were represented on the latest versions of allof the platforms. POP2 is the microarray platform with largest gene model support

2.1. BUILDING THE INFRASTRUCTURE FOR TRANSCRIPTOMICS 15

(30%). In conclusion, cDNA microarrays only cover a small subset of the predictedgene models in Populus. As a complement to the cheaper cDNA microarrays, Nim-bleGen and Affymetrix has developed a full genome array based on Populus genomerelease version 1.0. (Ramirez-Carvajal et al., 2007)

2.1.2 UPSC-BASE

BioArray Software Environment (BASE) (Saal et al., 2002; Troein et al., 2006) wasselected as the UPSC microarray database because it has large community supportin the academic field and is being developed at a national university. The systemis build upon a comprehensive open-source MIAME-compliant web-based databasesolution. The database uses multiple-layer architectures with a database storingthe data in the background. Scripts are responsible of sending information betweenthe server and the viewable layer in the web browser.

Modifications and improvements of the database

UPSC-BASE is built upon the BASE version 1.2.10 release. BASE was the bestavailable option for large-scale storage of microarray databut several modificationsfocusing on improving the interface to make it more user-friendly and fit the specialneeds for the microarray community at UPSC were implemented. A short summaryof the most important changes is presented here.

The access level has been modified to better fit a large research centre with severalvisitors. The database categorises each user into one of four different levels.

1. Administrative users

2. Principal Investigators (PIs)

3. Scientists (Postdoc and PhD students)

4. Visiting scientists

Storing information in databases can seem frightening where limiting access to datais important and a common misunderstanding is that data is accessible for otherspersons. Restrictions are used in databases to limit access to functions and data forcertain types of users. Specific user levels allow the user interface to be as simpleas possible.

16 CHAPTER 2. RESULTS ACHIEVED

The second major modification apparent for the user is that experiments are re-quested and not created as in the original BASE version. This allows the adminis-trator together with a steering group to have total control over experiments addedto the database. Only PI users hve the right to request experiments and are re-sponsible for adding the data in the database once the data has been generated.Information about produced microarray slides is imported directly by the admin-istrator and individual slides are identified by unique bar codes. This modificationprovides a tight connection between the production, ordering and storing of mi-croarray slides. The result is that end-users don’t need to define slide layouts andslide generation as long correct bar codes are provided to the database.

The third user-friendly modification is the batch-wise import function. In theoriginal installation, all data, such as TIFF images and original raw data files, hadto be imported one by one. This was a very time-consuming procedure for largeexperiments and import to UPSC-BASE can instead be achieved by first uploadingall data files to the ftp-server and then linking the data using a spreadsheet file. Thebatch-import is flexible and can be extended to any number of slides and handlemultiple scans for each slide. If the data files are named consistently, R-code forgenerating ’skeleton’ spreadsheet files can be used, which minimises the amount ofwork needed to input all MIAME-required fields of the experiment.

In addition to those major improvements several minor modifications have beenmade to streamline to the daily work. The Laboratory Information ManagementSystem (LIMS) is extended to allow more advanced external analysis tools. Fre-quent users can set up analysis tools to run in sequence which saves a lot of time forroutine analysis. The alternative would be waiting for each analysis step to finishbefore starting the next. The automatic analysis procedure is often sufficient fornon-experienced microarray users to study expression patterns of favourite genes.All experiments in UPSC-BASE are automatically transformed using RestrictedLinear Scaling (RLS) (Ryden et al., 2006), and subsequently normalised using thestep-wise normalisation method (Wilson et al., 2003). This data is denoted ’auto-finalized’ in the database and can be compared between experiments. This alsosimplifies extended searches performed between experiments.

Finally, extended flexibility for data sharing has been implemented to promote moreopen communities to improve the results. The permission structure was modified toallow permission levels handling multiple groups throughout the database system.In the installation, the user can be added as a second (additional) group to theexperiment with the correct access privileges, thus encourage collaboration betweendifferent research groups.

The many modifications have unfortunately made it impossible to upgrade thedatabase to newer none-modified versions of BASE. This is a very common problemfor scientist. Provided software are often not adequate in the standard version and

2.1. BUILDING THE INFRASTRUCTURE FOR TRANSCRIPTOMICS 17

modifications are performed to fit special needs. Unfortunately, these modificationsmake it impossible to use subsequent updates from the community (Swertz andJansen, 2007). A lot of these problems could be avoided if more software usedmodule-based extension possibilities.

2.1.3 Improving the hybridisation quality of microarray slides

Sample collection, extraction of RNA and cDNA synthesis all contribute sourcesof error. Hybridisation is the first step in the microarray pipeline introducing biaspurely from non-biological sources. Hybridisation was originally carried out onmicroscope slides with spotted cDNA using cover slips. The hybridisation solutionwas not mobile and spatial patterns in the hybridisation pattern were normal. Usinga framed cover slip (which raises the cover slip from the array surface slightly)significantly increased the quality of microarray slide hybridisations. In addition toquality problems in hybridisations, the time consumption for manual hybridisationand washing was a bottle neck. Only a handful of microarrays could be performedper day, even for a trained person. Automatic systems, such as the Automated SlideProcessor (Amersham, Sweden) advanced the microarray technology from labour-intensive to high-throughput. Reproducible results can be achieved with minorhuman involvement resulting in better quality data being generated, which allowsfor easier biological interpretations.

2.1.4 Image analysis

The focus in the microarray field has been more on statistical analysis rather thanimproving the quality of microarray slides. Image analysis is a very crucial step inthe work flow (Yang et al., 2001, 2002a) and problems will have a large impact onlater biological interpretations. For instance the glass slide is frequently contami-nated during the preparation steps, see Figure 2.1. This generates spots that donot reflect the true expression level of the gene.

The obvious question is ’What makes a good spot good?’. Typically the qualityassessment of spots is performed manually by visual inspection. Considering that anormal sized cDNA microarray contains 25000 spots and a decent experiment has20 slides results in more than half a million spots to inspect. The main problem isto decide if a spot looks strange due to technical issues or rather due to biologicaleffects. It is often also almost impossible to differentiate between the two possiblesources of noise andall incorrect measurements will affect the final analysis resultin the normalisation and hypothesis tests. Spots affected by technical issues willthus affect other spots during analysis and the biological interpretation might be

18 CHAPTER 2. RESULTS ACHIEVED

A. B.

C. D.

Figure 2.1. Examples of different types of microarray spots. A. Good spot B.Intensity distribution issue C. Morphological issue D. Background issue

misleading or even incorrect. Existing image analysis software for microarrays,such as GenePix, ImaGene, QuantArray and ScanAlyze, are all highly dependenton user interaction. Alternative, more advanced, semi-automatic software suchUCSF Spot (Jain et al., 2002) and WaveRead (Bidaut et al., 2006) can handlebatch wise image analysis but do not include an advanced routine for spot qualityclassification. Bluefuse (BlueGene) offers a commercial software that offers user-friendly, high-throughput image analysis although with the limitation it isnt free.

MASQOT

MicroArray Spot Quality cOnTrol (MASQOT), described more extensively in Pa-per I (Bylesjo et al., 2005) was developed to streamline image analysis. Briefly,the image analysis is divided into spot characterisation, spot quality assessment,and spot classification. The main aim was to mimic human, visual inspection andprovide significant time-savings for the biologist. A training dataset was based on5 slides from the Populus second generation microarray (POP2). Roughly 80000spots (split in half into training and test set) were manually inspected by 3 microar-ray users and the quality of each spot was assigned as bad (flagged) or not bad (notflagged). Multivariate methods were applied to find spot characterisations impor-tant for classification of bad spots and finally class discrimination of the classeswere found using PLS-DA. For the training set containing 38627 spots 98.6% ofthe spots are correctly predicted and in the independent test set containing 39421spots 98,1% of the spots are correctly predicted. This proves that the methodsrobustness in finding and differentiating good spots from bad spots.

2.1. BUILDING THE INFRASTRUCTURE FOR TRANSCRIPTOMICS 19

A robust methodology for quality control is nice but is useless if nobody knows howto use the method. Just as most people playing a computer game do not carehowthe game is working to produce the graphics, microarray users don’t want to studycomplex algorithms in order to classify spots as good or bad. The solution forMASQOT was to implemented the algorithm in an application called MASQOT-GUI, presented in Paper II (Bylesjo et al., 2006). The interface is designed to makeit easy for the beginner to get started using the software. The focus has beenon providing a tool with batch-mode possibilities using as few options as possiblein default settings. The experienced user still has other more detailed options tohandle specific problematic cases.

Restricted linear scaling

Microarray slides are digitalised and converted to 16-bit TIFF images in high-resolution scanners. This limits the intensity range to 216 = 65536 discrete steps,which is a problem for comparing lowly and highly expressed genes. Physical mi-croarray slides are routinely scanned multiple times to extend the linear rangebefore importing into UPSC-BASE and the multiple scans are then merged usinga regression method, Restricted Linear Scaling (RLS) (Ryden et al., 2006). Theimplemented RLS is a slightly modified version compared to the algorithm pre-sented by Dudley et al. (2002) handling the problems associated with missing andsaturated signals, which occur in most types of microarray experiments. The im-plemented method has a limitation of not using all data values from the slides.Only data points within the linear range are used. The constrained model (CM)(Bengtsson et al., 2004) uses all information from separate scans and might be acandidate for implementation in BASE in the future as a complement to RLS.

2.1.5 Normalisation

Microarrays are an expensive technology and replication was not always routinelyused. When microarrays were first introduced, several single-array methods for in-ferring differential expression were developed and they were sometimes used with-out applying normalisation. The simplest method only considered ratios belowor above a pre-determined cut-off, often referred as the fold-change method. Anobvious shortcoming of this approach is the use of an arbitrary cut-off without con-sidering the variability in the data. Ratio-based expression values from microarraysdon’t have constant variation within the intensity range or between different slides.Lowly expressed spots tend to have higher variation due to random error. Threesingle-slide experiment methods, usually referred as Chen’s (Chen et al., 1997),Churchill’s and Newton’s, aimed to handle those problems. Quite rapidly those

20 CHAPTER 2. RESULTS ACHIEVED

single slide methods become obsolete as the experiment sizes grew. The two later,not even published. At the same time, need of proper normalisation methodsstarted to recieve more attention in the literature and the underlying technicalproblems of microarray technology became apparent. A first approach was to ap-ply median normalisation to centre the expression ratios on the overall medianexpression. However this method had the severe limitation of assuming linearitybetween logged expression ratios and total intensity levels. Different levels of dyeincorporation leads to unbalanced channel intensities and a non-linear relationshipappears for low intensity spots. The reason is mainly due to the fact that theyare very close to the detection limit. The most commonly used approach to handlethose problem is to use LOWESS normalisation (Yang et al., 2002b) that wasorig-inally used on global slide area, but later applied on the print-tip groups to handlespatial problem in a discrete approach (Smyth, 2004). Continuous methods using2D sliding windows (Dudoit and Yang, 2002), optimised local intensity-dependentnormalisation (Futschik and Crompton, 2004, 2005) and neural nets normalisation(Tarca et al., 2005) emerged to take care the spatial effects that were often seen forcDNA microarray hybridisations. All the described methods have several param-eters affecting the outcome of normalisation. It is hard to choose settings to gainoptimal results. Step-wise normalisation performs multiple normalisation methodsin sequence and only keeps steps improving the normalisation results. In this wayparameter modification is no longer an issue and focus can be on interpreting thedata. For a more general overview of normalisation methods see Smyth and Speed(2003).

OPLS microarray normalisation

Normalisation using Orthogonal Projections to Latent Structures (OPLS) is in con-trast to other normalisation methods mentioned so far as it is not ratio-based orperformed within single microarray slides. Approaches to incorporate between-slide normalisation for microarrays, such as spline normalisation (Smyth and Speed,2003), have been presented in the literature but not really implemented in stan-dard analysis pipelines. Using a two-step approach of ratio-based normalisationfollowed by between-slide normalisation might be problematic. The initial within-slide normalisation might introduce slide dependency that was not visible beforethe normalisation step. The Variance Stabilisation Normalisation (VSN) method(Huber et al., 2002, 2003) takes a different approach in performing channel-wiselinear and non-linear transformation to reduce dependence between mean valueand variance. None of those methods takes the sample and experimental designinto consideration. The idea presented in Paper III is to use information on theexperimental design in the normalisation process, similar to normalisation based onANalysis Of VAriance (ANOVA) (Kerr et al., 2000). The integration of experimen-tal design in normalisation makes non-correlated variation separable from variation

2.1. BUILDING THE INFRASTRUCTURE FOR TRANSCRIPTOMICS 21

of interest (Bylesjo et al., 2007). Unwanted correlation might result from dye, slideand other technical effects. Using the OPLS normalisation method will reduce thenoise and make data interpretation clearer.

2.1.6 In silico reference designs

Pair-wise relationship in cDNA microarrays makes experimental design more com-plex than for many other fields. Most early studies hybridised a common referencein one channel of the microarrays to circumvent the problem. The advantage withthat strategy is mainly easier data interpretation. Figure 2.2 and Table 2.3 showsthe number of biological samples and microarray slides needed for loop and com-mon reference designs (Churchill, 2002). In this particular small example, eachbiological sample is measured twice as many times for the same number of slidesand cost. The bottle necks in microarray experiments are often either economyor amount of biological material. In both cases it is rational to skip the commonreference design and instead perform loop or more advanced experimental set-ups.(Vinciotti et al., 2005)

X

Y Z

XYZ

Reference(X + Y + Z)

A.

B.

Figure 2.2. Graphical representation of two commonly used experimental designsfor microarray experiments. A. Pooled reference design containing three differentsamples compared to a mixture of them. B. A loop design containing three samplescompared against each other.

Common reference LoopUnits 4 4Slides 6 6Measurements 2 4

Table 2.3. Summary of number of slides and biological material needed for twotypes of microarray experimental designs, common reference and loop design.

22 CHAPTER 2. RESULTS ACHIEVED

The problem of interpretation in loop designs might be circumvented by usingtransformation to in silico references (Diaz et al., 2003). Paper IV (Sjodin et al.,2006) shows how two experiments using different designs in the end will give thesame results. The published in silico algorithm used all samples used in microar-ray experiments. An extended algorithm able to cope with subsets of the samplescollection was developed to handle special cases. The method was initially used inDruart et al. (2007) to improve the interpretability of the results. The experimentwas performed using a pooled reference design. Unfortunately, the pooled referencewas not an equal distribution of the biological samples used in the study. The solu-tion was to remove the reference sample from the analysis and focus on biologicalalyrelevant samples by constructing an in silico pool as a reference point. The secondproblem with the experimental set-up was the huge difference in expression levelsthrough the time-series. Small changes within shorter time periods were maskedby dramatic changes in gene expression over the complete series. The time-serieswas divided into smaller biologicalaly relevant time spans by applying the in silicoreference method. Important smaller changes within each time period could thenbe visualised.

2.2 Turning UPSC-BASE into a valuable resource

A chain is only as strong as the weakest link: A well designed database mightbe very impressive in its architecture, but totally worthless without interestingcontent. The following section will describe some of the work to transform theempty, useless database into something valuable for plant scientists. UPSC-BASEwas, in the beginning, made to gather microarray results produced from the UPSCPopulus microarray. The main contribution is still in-house results, but its generalarchitecture has allowed the content to be extended to other Populus microarrayplatforms and other plant species, such as rice and Arabidopsis. UPSC-BASEcontains at present several published experiments with full open-access to raw dataand additional non-published experiments searchable for an external users. Thisallows extraction of microarray data about interesting genes valuable for planningother experimental approaches.

2.2.1 Integrating transcriptomics with physiology

Transcriptional studies of free-growing Populus trees are the laboratory contributionin this thesis. A key difference between Populus and Arabidopsis is the lifespan andthe possibilities to study the plant longer in natural conditions. A perennial plant,such as Populus, has a high probability of being exposed to all kinds of biotic andabiotic stresses and conditions. Instead of growing plants in the greenhouse under

2.2. TURNING UPSC-BASE INTO A VALUABLE RESOURCE 23

different, artificial, constant conditions, the trees were studied in the field withgenomic tools. This approach is much less common and has several limitationsand complications. However it is a good complement to studies performed in thegreenhouse and it will give novel information about the responses caused by differentnatural conditions.

Figure 2.3. The Populus tree T2 over the growing season. A. Leaf in early June(Phase Ia) B. Fully expanded leaf in mid June (Phase Ic) C. Mid season leaf (PhaseII) D. Senescent leaf (Phase III) E-H Overview of the tree corresponding to the sameleaf ages as A-D.

The studies of free-growing Populus tremula trees started in autumn 1999, as a pre-study of the Populus senescence project (Bhalerao et al., 2003; Andersson et al.,2004; Keskitalo et al., 2005). Two Populustrees, T1 and T2 were included in variousperiods of the study. During the first year, samples were collected twice per week.From 2003 to 2006 samples were collected on a daily basis during the growing season,which in Umeå is from late May to early October. See Figure 2.3 for an overview ofthe growing season in northern Sweden. The growing season of the leaf can roughlybe divided in three main phases. Phase I for leaf development during the firstmonth, Phase II when mature leaves are mainly producing energy and Phase III forleaf senescence. In addition to leaf sample collection, several measurements havebeen performed for leaf length, chlorophyll and florescence. Finally an overviewpicture was taken of the tree for weather and leaf-state documentation.

24 CHAPTER 2. RESULTS ACHIEVED

Influence of weather on gene regulation in Populus leaves

Temperature has a major impact on processes in living organisms. The temperaturehas been tracked at a weather station at Umeå University campus. Temperaturesum was calculated on daily mean temperatures higher than +1◦C and this cut-offwas based on experiments done on willow trees (Hannerz, 1999). Between 1999 and2006 the temperature sum showed quite a consistent pattern with some exceptions,see Figure 2.4. The largest difference was seen in 2002, where the spring wasunusually warm in late April and early May. Smaller difference in temperaturewere also observed in 1999 and 2004. In 2004 the temperature showed a peak inearly May before it become colder again. The opposite pattern was true for 1999when it took a long time for the temperature sum to reach normal levels.

020

040

060

080

0Cu

mul

ativ

e Te

mpe

ratu

re

19992000200120022003200420052006

January February March April JuneMay

Figure 2.4. Temperature sum recorded at Umeå University during 1999-2006.Most of the years are overlayed with exception of 2002 and smaller time periods of1999 and 2004.

Weather parameters were used for both seasonal (Wissel et al., 2003; Sjodin et al.,2007b)(Paper V) and autumn senescence projects (Andersson et al., 2004; Bhaleraoet al., 2003; Keskitalo et al., 2005)(Paper VII) to keep track of environmental condi-tions. An extensive weather experiment containing ten samples from two differentyears is presented in Paper V. The samples were chosen to have large variation inweather parameters and to be within the Phase II. The modelling results showedtemperature to be the major impact on gene expression. This could resulted forseveral reasons, such as too small a dataset to allow separation of noise from biolog-ical information. To refine results, the time should been spanning a longer periodto capture more difference in weather conditions and ultimately from even moreyears. Wissel et al. (2003) shows the validity of the method for gene-wise modellingand this appraoch should work at a larger scale, such as using microarrays.

2.2. TURNING UPSC-BASE INTO A VALUABLE RESOURCE 25

Physiological measurements

Bringing genomic tools out in the field highlights several issues to be solved. Forexample, electric power needs to be provided during the hours it takes to measurephotosynthesis parameters for multiple leaves. Also the weather has an influenceon how efficiently the measurements can be made. It is tricky handling electronicequipment during windy or rainy conditions. All those kind of problems had to besolved before starting field studies. An overview of the experimental procedure andequipment can be seen in Figure 2.5.

Figure 2.5. A. Physiological measurements and sample collection performed atlower branches of the tree. B. Dark adaptation on small leaf area for photosynthesismeasurements. C. Experimental set up of photosynthesis measurements includinglaptop and camera. D. Dark adaptation of complete leaves. E. Digitalised images ofphotosynthesis parameters overlaid on the leaf area.

Briefly, leaf length was measured using a ruler and area from a digital photographevery day through the growing season. At the same time chlorophyll content wasestimated using a non-destructive method (van den Berg and Perkins, 2004). Theresults are partly presented in Figure 2 in Paper V (Sjodin et al., 2007b). Inaddition, during one complete growing season three leaves were dark-adapted for

26 CHAPTER 2. RESULTS ACHIEVED

30 minutes and analysed using a FluorCam fluorometer. The fluorometer usesa range of flashing lights to measure photosynthesis. One of the most commonparameters used in fluorescence is the Fv/Fm ratio. A leaf is kept in the dark toprocess all residual energy, indicated by small amount s of fluorescence. Then theleaf is flashed with bright light and the fluorescence signal will increase to maximumlevel (Fm) as photo system II cannot use all of this energy. The difference betweenthe maximum and minimum fluorescence is the variable fluorescence, Fv. The ratiobetween Fm and Fv indicates the proportion of the maximum possible fluorescence,which was used for photosynthesis. In normal cases the efficiency is about 80%. Aspresented in Figure 2.6, Optimum quantum yield (Fv/Fm) increased steadily duringthe two first weeks and was then stable until the start of senescence. Chlorophyllcontent on the other hand took almost one month to reach maximum level andbreakdown started earlier.

0

0,2

0,4

0,6

0,8

1

Fv /

Fm

Fv / Fm

0

5

10

15

20

25

30

Chl

orop

hyll

Con

tent

Inde

x (C

CI)CCI

June July August SeptemberMay

Figure 2.6. Optimum quantum yield (Fv/Fm) and chlorophyll content in Populusleaves over the growning season of 2004.

An interesting observation was made in 2004 when bud burst was almost two weeksearlier than normal due to warm weather (described above), see Figure 2.4. Un-usually cold weather then stopped leaf development and leaves were finally maturearound the same time as normal.

Connecting physiology and genomics

Measurements in the field give information about the physiological state of the treeduring the season. A genomic approach using microarrays was applied and initiallypresented in Keskitalo et al. (2005); Bhalerao et al. (2003); Andersson et al. (2004).Paper V is based on same procedure but it is extended to the complete growing

2.2. TURNING UPSC-BASE INTO A VALUABLE RESOURCE 27

season and not only the senescence phase. Weather data were first analysed withmultivariate methods to find eleven dates with similar weather to minimise theenvironmental effect. Before the data were analysed a second generation of thePopulus microarray became available. Another microarray experiment was per-formed to cover more genes and to achieve higher time resolution from multipleyears. The results showed a developmental trend over the season with high cor-relation to temperature sum for the first month. To confirm the effect of weatheron gene regulation, ten days from two different years were chosen. The resultsshowed that approximately four times more genes are involved in leaf developmentcompared to responses to weather differences, which gives an indication of howPopulus gene regulation is distributed between developmental processes and stressresponses.

2.2.2 Trees in future climate

Several publications during the last years have reported an increase in global at-mospheric CO2 concentration. Knowing the effect of CO2 concentration is of highrelevance because it might have a major impacts on the response to the environ-ment. It has been shown that higher [CO2] leads to faster leaf development andhigher biomass (Taylor et al., 2003). The underlying reasons for those effects werenot known and two parallel projects, POPFACE (Calfapietra et al., 2003) and Bio-sphere (Barron-Gafford et al., 2005; Murthy et al., 2005; Zabel et al., 1999) aimedat finding out how gene expression is regulated in response to altered atmosphereic[CO2]. The POPFACE experiment used two levels of CO2 concentration; ambient(350-380ppm) and elevated (550ppm) with the elevated level being selected as itrepresented the predicted concentration for 2050 (Paper VIII) (Taylor et al., 2005)while Biosphere had three different levels; 400, 800 and 1200ppm (Paper IX) (Dru-art et al., 2006). Both studies report minor responses to elevated [CO2] at thetranscriptome level. Leaves are probably only growing faster and longer and there-fore expand more but no major difference can been seen in mature leaves. Reporteddifference in gene expression might even be an artefact of different physiological leafages instead of direct treatment effect (Taylor 2005). It seems strange to continuestudying mature leaves in elevated CO2 when the main effect is increased biomass.Leaf developmental series would be of more interest if more leaf transcriptionalstudies should be performed. Druart et al. (2006) found larger differences in woodsamples and further experiments in elevated CO2 should focus on increased biomassin wood tissues.

28 CHAPTER 2. RESULTS ACHIEVED

2.2.3 Applying stress treatments

Microarrays are especially suitable for studying large regulation differences in stresstreatments. Early transcription profiling studies on Arabidopsis were performed forseveral common stresses, such as dehydration, salinity and cold (Oono et al., 2003;Seki et al., 2002; Kreps et al., 2002). Populus shows, in comparison to Arabidopsis,much greater genome sequence difference between different genotypes (Marron etal., 2002; Tschaplinski et al., 1998; Harvey and Driessche, 1997), which makes ita good model for studying the influence of different genetic architectures on in-duced stress responses. Street et al. (2006) (Paper XIII) contribute with valuabletranscriptional information about contrasting drought responses in two Populusgenotypes. Changes in gene expression are often usually heavily induced by stresstreatments and only some are directly involved in adjustment while the rest maybe secondary consequences. Many microarray studies don’t take variability withinor between species into consideration. However, the approaches, such as in Streetet al. (2006), of combining genetic mapping and expression tools will ultimately beneeded to identify the underlying genetic control of the primary response.

Trees growing in natural conditions are exposed for various other living organisms.The EST library of senescent leaves and bark contains large amount of sequencesnot originating from Populus (Segerman et al., 2007). The presence of non PopulusESTs is easy to explain as a result of infection in senescent leaves of the free-growing tree used for construction of the EST library. However, it is harder toknow why bark from the greenhouse contains high amounts of foreign ESTs. It isimportant to understand the similarity and difference between responses derivedfrom different conditions. Controlled studies have been performed examining virusinfection (Smith et al., 2004), fungi (Rinaldi et al., 2007) and herbivore attack(Ralph et al., 2006; Babst et al., 2005) to characterise different stress responses.Those individual datasets create a solid base to study stress.

All stress experiments in UPSC-BASE and external published datasets were mergedin order to provide a better understanding of the relation between different stresstreatments (Figure 2.7). Briefly, individual clones on the microarrays were aggre-gated based on gene model information and the intercept between all used mi-croarray platforms were analysed. The included Populus microarray platforms arethe above decribed cDNA microarrays and the Nimblegen full Populus genome mi-croarray (Tuskan et al., 2006). The CO2 microarray experiment (Taylor et al.,2005) is not a stress treatment in the traditional meaning but was included in thecomparison to relate it to abiotic and biotic stresses. As expected, CO2 treatmentdid not show any major gene expression changes compared to other stresses. Onefungi treatment and ozone are influencing gene expression in similar ways. Anotherinteresting observation is the location of senescence between cold treatment andmosaic virus infection. The reason might be that senescence is affected by multiple

2.2. TURNING UPSC-BASE INTO A VALUABLE RESOURCE 29

abiotic and biotic factors or that certain stresses induce a response more akin tosenescence. Acute drought samples showed a similar behaviour to the senescenceand mosaic virus samples, which is far away from the other, less severe drought sam-ples of the chronic drought treatment. This stress comparison is mainly includedto show the power of merging data from multiple experiments and platforms toprovide an overview that is not seen in the separate experiments alone. Individualgenes responsible for the separation between stress treatments may have a hugeeconomical potential for industry. Understanding how trees respond to differentstresses is necessary to inform improvement of trees to produce new varieties thatare stress-resistant and biomass productive.

−20 −10 0 10 20

−20

−10

010

20

t[1]

t[2]

Acute drought

Senescence

Cold

Mosaic Virus

Fungi

[CO2]

Fungi

OzoneChronic drought

Figure 2.7. Overview of relationship between stress treatments in Populus. Ozonand UV seem to group together in the same way as cold groups together with senes-cence and acute drought with mosaic virus infection.

2.2.4 How to use stored information

The information in PopulusDB (Segerman et al., 2007; Sterky et al., 2004) is mainlyused in other projects to produce digital Northerns (Ewing et al., 1999). ESTlibraries constructed with non-subtractive techniques allow comparisons betweenabundance of ESTs in different tissues and stresses. Interesting results from mi-croarray experiments can be evaluated against the EST distribution as a means of

30 CHAPTER 2. RESULTS ACHIEVED

confirmation. This method has been used to classify and validate results in woodcell death (Moreau et al., 2005), drought stress (Paper XIII) (Street et al., 2006),protease distribution (Paper XII) (Garcia-Lorenzo et al., 2006), seasonal gene regu-lation (Paper V) (Sjodin et al., 2007b) and for leaf specific genes (Paper VI) (Sjodinet al., 2007a). It provides an external validation without needing to undertake anyextra laboratory work. It provides some weight to the results, even if it providesonly circumstantial validation support.

Another way to maximize the produced data is to re-analyse with different viewsand questions. Results from the first microarray experiment from UPSC, a woodformation series (Hertzberg et al., 2001) have been used extensively to direct otherin-depth results and as a basis for selecting genes to targets by gene silencing(Schrader et al., 2004b,a; Moyle et al., 2002; Israelsson et al., 2003, 2005). Severalresearch projects are spin-offs from this initial wood transcriptional study, but sofar no large-scale database data mining has been performed for Populus microarraydata.

2.3 Reconstruction of gene regulatory networks in Populus

The aim of the outcome from this thesis was, from the beginning, to find regulonsin Populus leaves. A regulon is defined as a group of co-regulated genes (Dongand Horvath, 2007; Zhang and Horvath, 2005; Yip and Horvath, 2007). Geneswith highly correlated expression levels are biologically interesting, since they areassumed to have common regulatory mechanisms. Paper VI shows how generatedmicroarray data was used to find co-regulated genes in Populus leaves (Sjodin et al.,2007a). Understanding complete systems instead of reporting a list of isolated genesadds another dimension to biological research. Here the focus on regulons (modulesin network terminology) as opposed to individual genes. Tissue specificity of generegulation was predicted in addition to complete leaf regulon determination. Briefly,an experiment containing several different tissues was used, and modelled usingOPLS to find genes with significantly higher expression in specific groups of tissues.1116 gene models were significantly more highly expressed in leaves samples. Notsurprisingly, leaves showed high amounts of photosynthesis related genes togetherwith cell differentiation, structural molecule activity and lipid metabolism, as seenas the Plant Gene Ontology overview in Figure 3 in paper VI.

Those genes are, according to the Populus dataset, very characteristic for leaves.However drawing conclusions from only one dataset might be very dangerous. Re-sults were confirmed using two additional dataset, from Populus (Tuskan et al.,2006) and Arabidopsis (Schmid et al., 2005). The two Populus datasets had around451 gene models in common, corresponding to 40% of the gene models in the

2.3. RECONSTRUCTION OF GENE REGULATORY NETWORKS IN POPULUS 31

GO

molecular function

cellular component

biological process

5.00E-2 <5.00E-7

Figure 2.8. Overview of the over-represented Gene Ontology groups in leaf specificgenes. The colour indicates how significant the over-representation is within eachGene Ontology category and the size is proportional to number of included genes ineach category.

32 CHAPTER 2. RESULTS ACHIEVED

initial dataset from POP2 microarrays. The leaf specific genes contained high over-representation of Gene Ontology categories connected to photosynthesis. Figure2.8 shows the three different categories for the full Gene Ontology. It is in principlethe same figure as shown in Paper VI but the complexity in better visualised inthe full Gene Ontology. Especially cellular component pinpoints the connection tophotosynthesis components.

All available leaf experiments in UPSC-BASE (Sjodin et al., 2006) were downloadedand step-wise normalised. To not bias the results based on different EST abundanceon the microarray slide, the results were aggregated based on gene model andnetworks of gene expression data were constructed by calculating correlation for allgene pairs. Gene expression network analysis aims to identif modules and hub genes(i.e. highly connected genes) present in the dataset. Microarray data is often noiseyand correlation values were weighted to emphasise strong correlations (Zhang andHorvath, 2005). Groups of genes with similar patterns of expression were identifiedinside the network. Those groups of genes will be defined as modules and representsthe regulons of gene expression in Populus leaves. Each of them can be seen as anew network with tight internal connections. Hub genes have been shown to beimportant key regulatory genes (Lehner et al., 2006; Carlson et al., 2006).

The focus in the paper was on gene regulation of leaf development and transcrip-tion factors controlling gene expression. Developing tissues share many commonprocesses and gene models particularly helpful for discriminating leaf tissues fromother tissues were selected to pinpoint regulation that might be specific to leaves.The constructed networks based on this selection showed clear separation betweentwo main types of genes; photosynthesis and cell differentiation. Results were con-firmed using external experiments from Arabidopsis (Schmid et al., 2005), multiplePopulus microarray platforms (Matsubara et al., 2006; Tuskan et al., 2006; Streetet al., 2006) and digital Northern (Segerman et al., 2007; Sterky et al., 2004; Ewinget al., 1999) to not be biased by microarray results from a single source. A morenovel approach in Paper VI is the combination of microarray results and QTL forleaf development (Street et al., 2006) to screen for candidate genes that may beresponsible for the control of leaf development. This approach has huge potential,but at the moment it is very limited due to lack of publicly available QTL datafrom published articles. The information provided in QTL publications is nearlyalways not sufficient to allow re-analysis or re-plotting of the results. Other fields inbiology should learn from the microarray community and develop their own stan-dard rules for publishing corresponding to the MIAME standard (Brazma et al.,2001). The outcome of this collocation analysis was the identification of two goodcandidates transcription factors (YABBY and zinc finger homeodomain) that havebeen shown to be important in leaf development.

Chapter 3

Conclusions

In the first part of this thesis I show how to build a pipeline for streamlined microar-ray analysis from actual lab work to interpretation of regulated genes. Setting uproutines for hybridisation, scanning, image analysis and normalisation significantlyincreases the quality of the generated biological data. Securing the data quality isa crucial task to allow other scientists to trust and use the produced data. As afinal part of the work flow, a database was developed for the storing and analysis ofdata. The database provides large value for experimental biologist by letting themaccess advanced methods from a web interface.

In the second part of the thesis I show how the created infrastructure tools canbe used to convert the database to a useful resource of transcriptomic informa-tion. First, as a comprehensive study bringing genomic tools out in the field andthen a general overview of other project spanning different stresses and develop-mental stages. The goal with all of the experiments has always been to span asmany different developmental stages and environmental conditions as possible tobe able to reconstruct the gene regulatory network to predict regulons in Populusleaves. The last paper connects all earlier Populus leaf microarray experiments in ameta-analysis to achieve this goal. The regulon information is a valuable resourcefor constructing gene markers for fast fingerprinting using cheaper low-throughputtechnologies, such as RT-PCR. This will make it possible to perform large-scalefingerprinting in populations of trees, such as SwAsp (Luquez et al., 2007; Hall etal., 2007).

The draft genome has made it possible to work on putative gene models insteadof contigs based on EST sequences. It has helped the Populus field to take a stepforward, but the immature state of the Populus genome is still something of a bottle-neck. Gathering the Populus community to work in an open-access environmentis necessary if Populus should gain any importance for plant science outside theexisting small Populus field. More focus should put on learning from the mistakesand solutions made by the Arabidopsis community. The main issue to be solved isthe data sharing problem. (Last, 2003; Cech et al., 2003) Populus will never be awidely-used model organism if the data sharing issue isn’t resolved. An importantdecision to take is a uniform naming of gene models. At present the gene modelsin Populus are named according to best gene model prediction from four differentsoftwares. Names containing up to 50 characters are not very convenient to handle.

33

34 CHAPTER 3. CONCLUSIONS

This systems should be replaced by an implementation of the relative position namesystem used by the the Arabidopsis Genome Initiative (Mayer et al., 1999). Solvingthe name issues in Populus would simplify work significantly and make the genomefar more accessible to the wider community.

It was clear from the microarray platform comparison that the overlaps betweendifferent microarrays were minor compared to the total amount of predicted genesin the Populus genome. According to the calculation only 24687 of the 45555official gene models have EST support. Constructing a universal, affordable fullgenome Populus microarray is urgently required to allow more advanced large-scalemeta-analysis and to confirm the validity of the gene models.

Although this work has developed tools and generated information about gene reg-ulation in Populus leaves, it is still not known in detail how gene regulatory systemsworks during leaf development or autumn senescence. The processes are too com-plex to be solved by a single approach. Further development and integration ofmethods studying transcriptomic, metabolomics and proteomics as well and phys-iological data is needed. At the same time ’omics’ technologies can’t only be usedalone on plants in artificial environments. To understand the complexity in bio-logical systems the modern ’omics’ field needs to merge with genetics and ecology.The will allow furthering of understanding of how variation within and betweendifferent biological systems is achieved and controlled.

3.1 The next generation Populus genome browser

UPSC-BASE is a valuable resource for transcriptomics in Populus. However, end-users need to have knowledge about the microarray technology to handle and sortstored information. There is now enough microarray data publicly available inPopulus to allow creation of a Populus resource similiar to Genevestigator. Thisresource would allow extraction of gene expression data for genes of interest withoutthe need to understand advanced analysis tools.

During the work with microarray and sequence data for Populus it has become clearthat the present official JGI (Joint Genome Institute) genome browser doesn’t fulfilthe demands of advanced or the average user of the genome resource. A majorproblem with the browser is a lack of data sharing and description of how the datahas been collected and treated. The Arabidopsis Resource (TAIR) have recentlydecided to exchange their genome browser to an open-source solution used by manymodel systems. Efforts need to be taken if Populus to ensure that poplar remainsas the model systems for trees now that additional genomes such as grape andeucalyptus are on the way to being sequenced. A Populus genome browser basedon the GMOD (Generic Model Organism Database) project (Stein et al., 2002)

3.2. FUTURE PERSPECTIVES 35

would solve those problems. The GMOD project includes WormBase(Stein et al.,2002; Harris et al., 2003, 2004; Chen et al., 2004, 2005), FlyBase (Crosby et al.,2007), Mouse Genome Informatics (Eppig et al., 2005, 2007), Gramene (Ware etal., 2002a,b), the Rat Genome Database (Twigger et al., 2007), TAIR (Huala et al.,2001), EcoCyc (Karp et al., 2007; Keseler et al., 2005), DictyBase (Chisholm et al.,2006; Kreppel et al., 2004), wFleaBase (Colbourne et al., 2005) and SaccharomycesGenome Database (Dwight et al., 2004; Cherry et al., 1997). Collecting data fromvarious public resources in a public database makes it possible to spend effort onbiological interpretation instead of bioinformatic methods to merge data. This re-source is valuable both for bioinformatics performing global-scale analyses as wellfor ’favourite gene’ studies. BLAST searches against subsets of gene models, suchas the Jamboree set, are easily performed in contrast to the JGI genome browser,where all BLAST searches are performed against unfiltered genome sequence. Asearch at JGI will always result in multiple gene models at the same physical posi-tion on and results need to be filtered manually. A genome browser database willallow integration of additional data sources, such as QTL or microarray data. Itwould then be possible to easily extract expression and QTL information aboutinteresting genome regions. As an integrated resource, end-users would achieve themost out of generated biological data.

3.2 Future perspectives

In the future, tools and information presented in this thesis could be used in severalways. An option for further work would be to combine transcriptomics with genomesequence information so that it would be possible to find and understand how keyregulatory motifs control gene expression. Finally, it would be interesting to tostudy in greater detail key genes in leaf development, both in natural populationsand in transgenic plants. More work is needed before we can understand thesecomplex processes and ultimately produce trees optimised for growth in specificconditions.

Chapter 4

Acknowledgement

Stefan Janson first I would like to thank you as my supervisor belived in me fronthe beginning and gave me the opportunity study whatever I wanted. . . (mainlyoutside my PhD project)

Collaborators and co-authors My PhD time wouldn’t be the same would allnational and international collaboration I have been involved in.

Jansson group Former and present members of the Jansson group during myPhD: Kirsten, Jenny A, Ulrika, Rupali, Carsten, Johanna, Frank, Abdul, Os-kar, Viginia, Yvan, Jakob, Hanna, Lars, Roger. . .Martin is not to be forgot-ten. Thanks also for the biochemistry part, Maribel, Adriana and Agnieszkafor fun moments.

Jan Karlsson my former master thesis supervisor that together with Marisa andJenny J (tried to) introduced me to the plant biology. I am not really surethat you really succeeded. . .

Umeå Plant Science Centre I will miss all nice colleagues in the enormous fikaroom and your biological knowledge. You are welcome to contact SjödinConsulting for technical support and questions. . .

My project students during the years several students helped me to escape fromthe lab bench and to be outside in the rain together with the Aspen tree. Tackså mycket, Johan, Daniel, Sara, Sylvia, Therese, Ali and Julia.

TBi plant team : The first engineering students going for plant science, Linus,Jens and Lars. Now we are just waiting for you Linus. . . I am also grateful tosee how many that followed us to USCP and the plant field.

The chemometric nerds Max Bylesjö and Mattias Rantalainen, thanks for thescientific and social discussions at different bars around the world. . .

The fish tank My office has moved several times during my PhD time but thelast 4 years have been in the Fish tank with smaller modification of the crew.Version 1.01 (Charleen, Nathalie, Oskar), 2.0 (Charleen, Micke) and 2.5 (Nat,Micke). You made the work so much funnier during the boring periods. ’Julenvarar ända fram till påska!!!’ or in our case for several years.

37

38 CHAPTER 4. ACKNOWLEDGEMENT

The French mafia for introducing me to all strange French food and culture. Ipromise to be a better French language student in the future. Shout ’putaind’ordinateur de merde’ and you will have free computer support. . .

Midsummer Eve heroes Matthieu and Gaia (2004) and Nadia (2003), your leafsampling on the Midsummer Eve was invaluable! I still owe you a big favour.Just tell me when you need it!!

Nadia Goué for always being my hardest critics. The cup quote ’It is never toolate to be what you might have been’ motive me through the dark months ofwriting. I hope you found your place in the country of sun rise to let peoplearound you enjoying your happy face. I miss you and the ’gympa’. . .

Oskar Skogström it has been nice to have you here during all years in Umeå.Even if you have been a little bit ’invisible’ the last years. . .

Natalie Druart and Charleen Moreau you made a brave attempt to teach meFrench. However I am sorry to tell I almost only remember ’Ta Gueule!!!’. . . maybebecause it was the only time you both got silient in the office.

Sofia Frankki for convincing me to stay in Umeå and all nice moments togetherduring the years. You will always remind me about Star Wars.

Mélanie & Yohann thanks for nice company and dinners. You gave me a newperspective on French people . . .

Rozenn Le Hir you finally realized that nothing is impossible as long you wantand dare to do it. . . Be brave girl!

Nathaniel Street deserves special thank for nice company around the world (Umeå,Southampton, Gatlinburg, Pretoria and Venice). The last months of workwouldn’t have been possible or done without your help and motivation. It issad that we couldn’t work together for longer time. Let’s enjoy Sweden now!

Joséli Schwambach Hej, hej, hemskt mycket hej! The Lofoten trip was a greatvacation (at the right moment to forget work). I miss the all nice tea breakswith you. Next time in Brasil!

Dephine Gendre for all cool moments and travels. I will miss you and yourhonest opinions in our discussions! Nobody can climb like you. . . ps. Don’tforget to call me when you want to open a café!!!

Sunday movie It has been a really nice to finish the weekend with nice companyin front of a good movie. Keep the traditions going!

Family here is your opportunity to understand what I have been doing the lastyears in Umeå. . . ps. Det finns en svensk sammanfattning. . . å tack lillasysterför det fina omlaget! Troligen är det fler som lägger märke till ditt arbete änmitt. . .

Bibliography

Allison, D. B., Cui, X., Page, G. P., and Sabripour, M. 2006. Microarray data analysis: from disarrayto consolidation and consensus. Nature Reviews Genetics, 7(1):55–65.

Anderson, L. and Seilhamer, J. 1997. A comparison of selected mRNA and protein abundances in humanliver. Electrophoresis, 18(3-4):533–537.

Andersson, A., Keskitalo, J., Sjodin, A., Bhalerao, R., Sterky, F., Wissel, K., Tandre, K., Aspeborg,H., Moyle, R., Ohmiya, Y., Bhalerao, R., Brunner, A., Gustafsson, P., Karlsson, J., Lundeberg, J.,Nilsson, O., Sandberg, G., Strauss, S., Sundberg, B., Uhlen, M., Jansson, S., and Nilsson, P. 2004.A transcriptional timetable of autumn senescence. Genome Biology, 5(4):R24.

Arabidopsis Genome Initiative. 2000. Analysis of the genome sequence of the flowering plant Arabidopsisthaliana. Nature, 408(6814):796–815.

Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski,K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill, D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S.,Matese, J. C., Richardson, J. E., Ringwald, M., Rubin, G. M., and Sherlock, G. 2000. Gene ontology:tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics, 25(1):25–9.

Augenlicht, L. H., Kobrin, D., Pavlovec, A., and Royston, M. E. 1984. Elevated expression of anendogenous retroviral long terminal repeat in a mouse colon tumor. Journal of Biological Chemistry,259(3):1842–1847.

Augenlicht, L. H., Wahrman, M. Z., Halsey, H., Anderson, L., Taylor, J., and Lipkin, M. 1987. Expressionof cloned sequences in biopsies of human colonic tissue and in colonic carcinoma cells induced todifferentiate in vitro. Cancer Research, 47(22):6017–6021.

Babst, B. A., Ferrieri, R. A., Gray, D. W., Lerdau, M., Schlyer, D. J., Schueller, M., Thorpe, M. R.,and Orians, C. M. 2005. Jasmonic acid induces rapid changes in carbon transport and partitioningin Populus. New Phytologist, 167(1):63–72.

Banerjee, N. and Zhang, M. Q. 2002. Functional genomics as applied to mapping transcription regulatorynetworks. Current Opinion in Microbiology, 5(3):313–7.

Barrett, T., Troup, D. B., Wilhite, S. E., Ledoux, P., Rudnev, D., Evangelista, C., Kim, I. F., Soboleva,A., Tomashevsky, M., and Edgar, R. 2007. NCBI GEO: mining tens of millions of expression profiles–database and tools update. Nucleic Acids Research, 35(Database issue):D760–D765.

Barron-Gafford, G., Martens, D., Grieve, K., Biel, K., Kudeyarov, V., McLain, J. E. T., Lipson, D., andMurthy, R. 2005. Growth of eastern cottonwoods (Populus deltoides) in elevated [CO2] stimulatesstand-level respiration and rhizodeposition of carbohydrates, accelerates soil nutrient depletion, yetstimulates above- and belowground biomass production. Global Change Biology, 11(8):1220–1233.

Bartel, D. P. 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell, 116(2):281–297.

Bengtsson, H., Jonsson, G., and Vallon-Christersson, J. 2004. Calibration and assessment of channel-specific biases in microarray data with extended dynamical range. BMC Bioinformatics, 5:177.

Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., and Wheeler, D. L. 2007. GenBank.Nucleic Acids Res, 35(Database issue):D21–D25.

Bhalerao, R., Keskitalo, J., Sterky, F., Erlandsson, R., Bjorkbacka, H., Birve, S. J., Karlsson, J.,Gardestrom, P., Gustafsson, P., Lundeberg, J., and Jansson, S. 2003. Gene expression in autumnleaves. Plant Physiology, 131(2):430–442.

39

40 BIBLIOGRAPHY

Bidaut, G., Manion, F. J., Garcia, C., and Ochs, M. F. 2006. WaveRead: automatic measurementof relative gene expression levels from microarrays using wavelet analysis. Journal of BiomedicalInformatics, 39(4):379–388.

Bradshaw, H. D., Ceulemans, R., Davis, J., and Stettler, R. 2000. Emerging model systems in plantbiology: Poplar (Populus) as a model forest tree. Journal of Plant Growth Regulation, 19(3):306–313.

Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., Aach, J., An-sorge, W., Ball, C. A., Causton, H. C., Gaasterland, T., Glenisson, P., Holstege, F. C., Kim, I. F.,Markowitz, V., Matese, J. C., Parkinson, H., Robinson, A., Sarkans, U., Schulze-Kremer, S., Stewart,J., Taylor, R., Vilo, J., and Vingron, M. 2001. Minimum information about a microarray experiment(MIAME)-toward standards for microarray data. Nature Genetics, 29(4):365–371.

Brazma, A., Kapushesky, M., Parkinson, H., Sarkans, U., and Shojatalab, M. 2006. Data storage andanalysis in ArrayExpress. Methods in Enzymology, 411:370–386.

Brazma, A., Parkinson, H., Sarkans, U., Shojatalab, M., Vilo, J., Abeygunawardena, N., Holloway, E.,Kapushesky, M., Kemmeren, P., Lara, G. G., Oezcimen, A., Rocca-Serra, P., and Sansone, S.-A.2003. ArrayExpress–a public repository for microarray gene expression data at the EBI. NucleicAcids Research, 31(1):68–71.

Brenner, S., Johnson, M., Bridgham, J., Golda, G., Lloyd, D. H., Johnson, D., Luo, S., McCurdy, S.,Foy, M., Ewan, M., Roth, R., George, D., Eletr, S., Albrecht, G., Vermaas, E., Williams, S. R., Moon,K., Burcham, T., Pallas, M., DuBridge, R. B., Kirchner, J., Fearon, K., Mao, J., and Corcoran, K.2000. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbeadarrays. Nature Biotechnology, 18(6):630–634.

Brosche, M., Vinocur, B., Alatalo, E. R., Lamminmaki, A., Teichmann, T., Ottow, E. A., Djilianov, D.,Afif, D., Bogeat-Triboulot, M.-B., Altman, A., Polle, A., Dreyer, E., Rudd, S., Paulin, L., Auvinen,P., and Kangasjarvi, J. 2005. Gene expression and metabolite profiling of Populus euphratica growingin the Negev desert. Genome Biology, 6(12):R101.

Brunner, A. M., Busov, V. B., and Strauss, S. H. 2004. Poplar genome sequence: functional genomicsin an ecologically dominant plant species. Trends in Plant Science, 9(1):49–56.

Bulow, L., Schindler, M., and Hehl, R. 2007. PathoPlant: a platform for microarray expression data toanalyze co-regulated genes involved in plant defense responses. Nucleic Acids Research, 35(Databaseissue):D841–D845.

Bulow, L., Steffens, N. O., Galuschka, C., Schindler, M., and Hehl, R. 2006. AthaMap: from in silicodata to real transcription factor binding sites. In Silico Biology, 6(3):243–252.

Bylesjo, M., Eriksson, D., Sjodin, A., Jansson, S., Moritz, T., and Trygg, J. 2007. Orthogonal projectionsto latent structures as a strategy for microarray data normalization. BMC Bioinformatics, 8:207.

Bylesjo, M., Eriksson, D., Sjodin, A., Sjostrom, M., Jansson, S., Antti, H., and Trygg, J. 2005.MASQOT: a method for cDNA microarray spot quality control. BMC Bioinformatics, 6:250.

Bylesjo, M., Sjodin, A., Eriksson, D., Antti, H., Moritz, T., Jansson, S., and Trygg, J. 2006. MASQOT-GUI: spot quality assessment for the two-channel microarray platform. Bioinformatics, 22(20):2554–2555.

Calfapietra, C., Gielen, B., Galema, A. N. J., Lukac, M., Angelis, P. D., Moscatelli, M. C., Ceulemans,R., and Scarascia-Mugnozza, G. 2003. Free-air CO2 enrichment (FACE) enhances biomass productionin a short-rotation poplar plantation. Tree Physiology, 23(12):805–814.

Carlson, M. R. J., Zhang, B., Fang, Z., Mischel, P. S., Horvath, S., and Nelson, S. F. 2006. Gene connec-tivity, function, and sequence conservation: predictions from modular yeast co-expression networks.BMC Genomics, 7:40.

41

Cech, T. R., Eddy, S. R., Eisenberg, D., Hersey, K., Holtzman, S. H., Poste, G. H., Raikhel, N. V.,Scheller, R. H., Singer, D. B., Waltham, M. C., and on Responsibilities of Authorship in the Bio-logical Sciences, N. A. C. 2003. Sharing publication-related data and materials: responsibilities ofauthorship in the life sciences. Plant Physiology, 132(1):19–24.

Chen, N., Harris, T. W., Antoshechkin, I., Bastiani, C., Bieri, T., Blasiar, D., Bradnam, K., Canaran,P., Chan, J., Chen, C.-K., Chen, W. J., Cunningham, F., Davis, P., Kenny, E., Kishore, R., Lawson,D., Lee, R., Muller, H.-M., Nakamura, C., Pai, S., Ozersky, P., Petcherski, A., Rogers, A., Sabo, A.,Schwarz, E. M., Auken, K. V., Wang, Q., Durbin, R., Spieth, J., Sternberg, P. W., and Stein, L. D.2005. WormBase: a comprehensive data resource for Caenorhabditis biology and genomics. NucleicAcids Research, 33(Database issue):D383–D389.

Chen, N., Lawson, D., Bradnam, K., Harris, T. W., and Stein, L. D. 2004. WormBase as an integratedplatform for the C. elegans ORFeome. Genome Research, 14(10B):2155–2161.

Chen, Y., Dougherty, E. R., and Bittner, M. L. 1997. Ratio-based decisions and the quantitative analysisof cDNA microarray images. Journal of Biomedical Optics, 2:364–374.

Cherry, J. M., Ball, C., Weng, S., Juvik, G., Schmidt, R., Adler, C., Dunn, B., Dwight, S., Riles, L.,Mortimer, R. K., and Botstein, D. 1997. Genetic and physical maps of Saccharomyces cerevisiae.Nature, 387(6632 Suppl):67–73.

Chisholm, R. L., Gaudet, P., Just, E. M., Pilcher, K. E., Fey, P., Merchant, S. N., and Kibbe, W. A.2006. dictyBase, the model organism database for Dictyostelium discoideum. Nucleic Acids Research,34(Database issue):D423–D427.

Churchill, G. A. 2002. Fundamentals of experimental design for cDNA microarrays. Nature Genetics,32 Suppl:490–5.

Colbourne, J. K., Singan, V. R., and Gilbert, D. G. 2005. wFleaBase: the Daphnia genome database.BMC Bioinformatics, 6:45.

Crosby, M. A., Goodman, J. L., Strelets, V. B., Zhang, P., Gelbart, W. M., and Consortium, F. 2007.FlyBase: genomes by the dozen. Nucleic Acids Research, 35(Database issue):D486–D491.

DeRisi, J., Penland, L., Brown, P. O., Bittner, M. L., Meltzer, P. S., Ray, M., Chen, Y., Su, Y. A., andTrent, J. M. 1996. Use of a cDNA microarray to analyse gene expression patterns in human cancer.Nature Genetics, 14(4):457–460.

Diaz, E., Yang, Y. H., Ferreira, T., Loh, K. C., Okazaki, Y., Hayashizaki, Y., Tessier-Lavigne, M., Speed,T. P., and Ngai, J. 2003. Analysis of gene expression in the developing mouse retina. Proceedings ofthe National Academy of Sciences, USA, 100(9):5491–5496.

Dong, J. and Horvath, S. 2007. Understanding Network Concepts in Modules. BMC Systems Biology,1(1).

Druart, N., Rodriguez-Buey, M., Barron-Gafford, G., Sjodin, A., Bhalerao, R., and Hurry, V. 2006.Molecular targets of elevated [CO2] in leaves and stems of Populus deltoides: implications for futuretree growth and carbon sequestration. Functional Plant Biology, 33(2):121–131.

Druart, N., Johansson, A., Baba, K., Schrader, J., Sjodin, A., Bhalerao, R. R., Resman, L., Trygg,J., Moritz, T., and Bhalerao, R. P. 2007. Environmental and hormonal regulation of the activity-dormancy cycle in the cambial meristem involves stage-specific modulation of transcriptional andmetabolic networks. Plant Journal, 50(4):557–573.

Dudley, A. M., Aach, J., Steffen, M. A., and Church, G. M. 2002. Measuring absolute expression withmicroarrays with a calibrated reference sample and an extended signal intensity range. Proceedingsof the National Academy of Sciences, USA, 99(11):7554–7559.

Dudoit, S. and Yang, Y. H. 2002. Bioconductor R packages for exploratory analysis and normalizationof cDNA microarray data, chapter 3, pages 73–101. Springer, NY.

42 BIBLIOGRAPHY

Dudoit, S., Gentleman, R. C., and Quackenbush, J. 2003. Open source software for the analysis ofmicroarray data. Biotechniques, Suppl:45–51.

Dwight, S. S., Balakrishnan, R., Christie, K. R., Costanzo, M. C., Dolinski, K., Engel, S. R., Feierbach,B., Fisk, D. G., Hirschman, J., Hong, E. L., Issel-Tarver, L., Nash, R. S., Sethuraman, A., Starr, B.,Theesfeld, C. L., Andrada, R., Binkley, G., Dong, Q., Lane, C., Schroeder, M., Weng, S., Botstein,D., and Cherry, J. M. 2004. Saccharomyces genome database: underlying principles and organisation.Briefings in Bioinformatics, 5(1):9–22.

Eaton, A. D. 2006. HubMed: a web-based biomedical literature search interface. Nucleic Acids Research,34(Web Server issue):W745–W747.

Edgar, R., Domrachev, M., and Lash, A. E. 2002. Gene Expression Omnibus: NCBI gene expressionand hybridization array data repository. Nucleic Acids Research, 30(1):207–210.

Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. 1998. Cluster analysis and display ofgenome-wide expression patterns. Proceedings of the National Academy of Sciences, USA, 95(25):14863–14868.

Eppig, J. T., Blake, J. A., Bult, C. J., Kadin, J. A., Richardson, J. E., and Group, M. G. D. 2007. Themouse genome database (MGD): new features facilitating a model system. Nucleic Acids Research,35(Database issue):D630–D637.

Eppig, J. T., Bult, C. J., Kadin, J. A., Richardson, J. E., Blake, J. A., Anagnostopoulos, A., Baldarelli,R. M., Baya, M., Beal, J. S., Bello, S. M., Boddy, W. J., Bradt, D. W., Burkart, D. L., Butler, N. E.,Campbell, J., Cassell, M. A., Corbani, L. E., Cousins, S. L., Dahmen, D. J., Dene, H., Diehl, A. D.,Drabkin, H. J., Frazer, K. S., Frost, P., Glass, L. H., Goldsmith, C. W., Grant, P. L., Lennon-Pierce,M., Lewis, J., Lu, I., Maltais, L. J., McAndrews-Hill, M., McClellan, L., Miers, D. B., Miller, L. A.,Ni, L., Ormsby, J. E., Qi, D., Reddy, T. B. K., Reed, D. J., Richards-Smith, B., Shaw, D. R., Sinclair,R., Smith, C. L., Szauter, P., Walker, M. B., Walton, D. O., Washburn, L. L., Witham, I. T., Zhu, Y.,and Group, M. G. D. 2005. The Mouse Genome Database (MGD): from genes to mice–a communityresource for mouse biology. Nucleic Acids Research, 33(Database issue):D471–D475.

Ewing, R., Poirot, O., and Claverie, J. M. 1999. Comparative analysis of the Arabidopsis and riceexpressed sequence tag (EST) sets. In Silico Biology, 1(4):197–213.

Futschik, M. and Crompton, T. 2004. Model selection and efficiency testing for normalization of cDNAmicroarray data. Genome Biology, 5(8):R60.

Futschik, M. E. and Crompton, T. 2005. OLIN: optimized normalization, visualization and qualitytesting of two-channel microarray data. Bioinformatics, 21(8):1724–1726.

Galuschka, C., Schindler, M., Bulow, L., and Hehl, R. 2007. AthaMap web tools for the analysis andidentification of co-regulated genes. Nucleic Acids Research, 35(Database issue):D857–D862.

Garcia-Lorenzo, M., Sjodin, A., Jansson, S., and Funk, C. 2006. Protease gene families in Populus andArabidopsis. BMC Plant Biology, 6:30.

Gentleman, R. C., Carey, V. J., Bates, D. M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier,L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R., Leisch, F.,Li, C., Maechler, M., Rossini, A. J., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J.Y. H., and Zhang, J. 2004. Bioconductor: open software development for computational biology andbioinformatics. Genome Biology, 5(10):R80.

Grennan, A. K. 2006. Genevestigator. Facilitating web-based gene-expression analysis. Plant Physiology,141(4):1164–1166.

Hall, D., Luquez, V., Garcia, V. M., Onge, K. R. S., Jansson, S., and Ingvarsson, P. K. 2007. AdaptivePopulation Differentiation in Phenology Across A Latitudinal Gradient in European Aspen (PopulusTremula, L.): A Comparison of Neutral Markers, Candidate Genes and Phenotypic Traits. Evolution.

43

Hannerz, M. 1999. Evaluation of temperature models for predicting bud burst in Norway spruce. Cana-dian Journal of Forest Research/Revue Canadienne De Recherche Forestiere, 29(1):9–19.

Harris, T. W., Chen, N., Cunningham, F., Tello-Ruiz, M., Antoshechkin, I., Bastiani, C., Bieri, T.,Blasiar, D., Bradnam, K., Chan, J., Chen, C.-K., Chen, W. J., Davis, P., Kenny, E., Kishore, R.,Lawson, D., Lee, R., Muller, H.-M., Nakamura, C., Ozersky, P., Petcherski, A., Rogers, A., Sabo,A., Schwarz, E. M., Auken, K. V., Wang, Q., Durbin, R., Spieth, J., Sternberg, P. W., and Stein,L. D. 2004. WormBase: a multi-species resource for nematode biology and genomics. Nucleic AcidsResearch, 32(Database issue):D411–D417.

Harris, T. W., Lee, R., Schwarz, E., Bradnam, K., Lawson, D., Chen, W., Blasier, D., Kenny, E.,Cunningham, F., Kishore, R., Chan, J., Muller, H.-M., Petcherski, A., Thorisson, G., Day, A., Bieri,T., Rogers, A., Chen, C.-K., Spieth, J., Sternberg, P., Durbin, R., and Stein, L. D. 2003. WormBase:a cross-species database for comparative genomics. Nucleic Acids Research, 31(1):133–137.

Harvey, H. P. and Driessche, R. V. D. 1997. Nutrition, xylem cavitation and drought resistance inhybrid poplar. Tree Physiology, 17(10):647–654.

Heid, C. A., Stevens, J., Livak, K. J., and Williams, P. M. 1996. Real time quantitative PCR. GenomeResearch, 6(10):986–994.

Hertzberg, M., Aspeborg, H., Schrader, J., Andersson, A., Erlandsson, R., Blomqvist, K., Bhalerao,R., Uhlen, M., Teeri, T. T., Lundeberg, J., Sundberg, B., Nilsson, P., and Sandberg, G. 2001. Atranscriptional roadmap to wood formation. Proceedings of the National Academy of Sciences,USA, 98(25):14732–14737.

Hosack, D. A., G., J. D., Sherman, B. T., Lane, H. C., and Lempicki, R. A. 2003. Identifying biologicalthemes within lists of genes with EASE. Genome Biology, 4(10):R70.

Huala, E., Dickerman, A. W., Garcia-Hernandez, M., Weems, D., Reiser, L., LaFond, F., Hanley, D.,Kiphart, D., Zhuang, M., Huang, W., Mueller, L. A., Bhattacharyya, D., Bhaya, D., Sobral, B. W.,Beavis, W., Meinke, D. W., Town, C. D., Somerville, C., and Rhee, S. Y. 2001. The Arabidopsis In-formation Resource (TAIR): a comprehensive database and web-based information retrieval, analysis,and visualization system for a model plant. Nucleic Acids Research, 29(1):102–5.

Huber, W., von Heydebreck, A., Sueltmann, H., Poustka, A., and Vingron, M. 2003. Parameter estima-tion for the calibration and variance stabilization of microarray data. Stat Appl Genet Mol Biol, 2(1):Article3.

Huber, W., von Heydebreck, A., Sultmann, H., Poustka, A., and Vingron, M. 2002. Variance stabi-lization applied to microarray data calibration and to the quantification of differential expression.Bioinformatics, 18 Suppl 1:S96–104.

Ihaka, R. and Gentleman, R. 1996. R: a language for data analysis and graphics. Journal of Compu-tational and Graphical Statistics, 5:299–314.

Ikeo, K., Ishi-i, J., Tamura, T., Gojobori, T., and Tateno, Y. 2003. CIBEX: center for informationbiology gene expression database. Comptes Rendus Biologies, 326(10-11):1079–1082.

International Rice Genome Sequencing Project. 2005. The map-based sequence of the rice genome.Nature, 436(7052):793–800.

Israelsson, M., Eriksson, M. E., Hertzberg, M., Aspeborg, H., Nilsson, P., and Moritz, T. 2003. Changesin gene expression in the wood-forming tissue of transgenic hybrid aspen with increased secondarygrowth. Plant Molecular Biology, 52(4):893–903.

Israelsson, M., Sundberg, B., and Moritz, T. 2005. Tissue-specific localization of gibberellins and expres-sion of gibberellin-biosynthetic and signaling genes in wood-forming tissues in aspen. Plant Journal,44(3):494–504.

44 BIBLIOGRAPHY

Jain, A. N., Tokuyasu, T. A., Snijders, A. M., Segraves, R., Albertson, D. G., and Pinkel, D. 2002. Fullyautomatic quantification of microarray image data. Genome Research, 12(2):325–332.

Jansson, S. and Douglas, C. J. 2007. Populus: a model system for plant biology. Annual Review ofPlant Biology, 58:435–458.

Kanehisa, M. and Goto, S. 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic AcidsResearch, 28(1):27–30.

Karp, P. D., Keseler, I. M., Shearer, A., Latendresse, M., Krummenacker, M., Paley, S. M., Paulsen,I., Collado-Vides, J., Gama-Castro, S., Peralta-Gil, M., Santos-Zavaleta, A., Penaloza-Spinola, M. I.,Bonavides-Martinez, C., and Ingraham, J. 2007. Multidimensional annotation of the Escherichia coliK-12 genome. Nucleic Acids Research, doi:10.1093/nar/gkm740.

Kerr, M. K., Martin, M., and Churchill, G. A. 2000. Analysis of variance for gene expression microarraydata. Journal of Computational Biology, 7(6):819–837.

Keseler, I. M., Collado-Vides, J., Gama-Castro, S., Ingraham, J., Paley, S., Paulsen, I. T., Peralta-Gil,M., and Karp, P. D. 2005. EcoCyc: a comprehensive database resource for Escherichia coli. NucleicAcids Research, 33(Database issue):D334–D337.

Keskitalo, J., Bergquist, G., Gardestrom, P., and Jansson, S. 2005. A cellular timetable of autumnsenescence. Plant Physiology, 139(4):1635–1648.

Kim, V. N. and Nam, J.-W. 2006. Genomics of microRNA. Trends in Genetics, 22(3):165–173.

Kohler, A., Delaruelle, C., Martin, D., Encelot, N., and Martin, F. 2003. The poplar root transcriptome:analysis of 7000 expressed sequence tags. FEBS Letters, 542(1-3):37–41.

Kreppel, L., Fey, P., Gaudet, P., Just, E., Kibbe, W. A., Chisholm, R. L., and Kimmel, A. R. 2004.dictyBase: a new Dictyostelium discoideum genome database. Nucleic Acids Research, 32(Databaseissue):D332–D333.

Kreps, J. A., Wu, Y., Chang, H.-S., Zhu, T., Wang, X., and Harper, J. F. 2002. Transcriptome changesfor Arabidopsis in response to salt, osmotic, and cold stress. Plant Physiology, 130(4):2129–2141.

Kulheim, C., Agren, J., and Jansson, S. 2002. Rapid regulation of light harvesting and plant fitness inthe field. Science, 297(5578):91–3.

Lai, E. C., Tomancak, P., Williams, R. W., and Rubin, G. M. 2003. Computational identification ofDrosophila microRNA genes. Genome Biology, 4(7):R42.

Last, R. L. 2003. Sandbox ethics in science: sharing of data and materials in plant biology. PlantPhysiology, 132(1):17–18.

Lehner, B., Crombie, C., Tischler, J., Fortunato, A., and Fraser, A. G. 2006. Systematic mappingof genetic interactions in Caenorhabditis elegans identifies common modifiers of diverse signalingpathways. Nature Genetics, 38(8):896–903.

Lewis, B. P., Burge, C. B., and Bartel, D. P. 2005. Conserved seed pairing, often flanked by adenosines,indicates that thousands of human genes are microRNA targets. Cell, 120(1):15–20.

Lim, L. P., Lau, N. C., Weinstein, E. G., Abdelhakim, A., Yekta, S., Rhoades, M. W., Burge, C. B.,and Bartel, D. P. 2003. The microRNAs of Caenorhabditis elegans. Genes & Development, 17(8):991–1008.

Lockhart, D. J., Dong, H., Byrne, M. C., Follettie, M. T., Gallo, M. V., Chee, M. S., Mittmann, M.,Wang, C., Kobayashi, M., Horton, H., and Brown, E. L. 1996. Expression monitoring by hybridizationto high-density oligonucleotide arrays. Nature Biotechnology, 14(13):1675–1680.

45

Lu, J., Getz, G., Miska, E. A., Alvarez-Saavedra, E., Lamb, J., Peck, D., Sweet-Cordero, A., Ebert,B. L., Mak, R. H., Ferrando, A. A., Downing, J. R., Jacks, T., Horvitz, H. R., and Golub, T. R.2005. MicroRNA expression profiles classify human cancers. Nature, 435(7043):834–838.

Lu, W., Kimball, E., and Rabinowitz, J. D. 2006. A high-performance liquid chromatography-tandemmass spectrometry method for quantitation of nitrogen-containing intracellular metabolites. Journalof The American Society for Mass Spectrometry, 17(1):37–50.

Luquez, V., Hall, D., Albrectsen, B., Karlsson, J., Ingvarsson, P., and Jansson, S. 2007. Naturalphenological variation in aspen ( Populus tremula ): the SwAsp collection. Tree Genetics & Genomes,doi:10.1007/s11295-007-0108-y.

Marron, N., Delay, D., Petit, J.-M., Dreyer, E., Kahlem, G., Delmotte, F. M., and Brignolas, F. 2002.Physiological traits of two Populus x euramericana clones, Luisa Avanzo and Dorskamp, during awater stress and re-watering cycle. Tree Physiology, 22(12):849–858.

Matsubara, S., Hurry, V., Druart, N., Benedict, C., Janzik, I., Chavarria-Krauser, A., Walter, A., andSchurr, U. 2006. Nocturnal changes in leaf growth of Populus deltoides are controlled by cytoplasmicgrowth. Planta, 223(6):1315–1328.

Mayer, K., Schuller, C., Wambutt, R., Murphy, G., Volckaert, G., Pohl, T., Dusterhoft, A., Stiekema,W., Entian, K. D., Terryn, N., Harris, B., Ansorge, W., Brandt, P., Grivell, L., Rieger, M., We-ichselgartner, M., de Simone, V., Obermaier, B., Mache, R., Muller, M., Kreis, M., Delseny, M.,Puigdomenech, P., Watson, M., Schmidtheini, T., Reichert, B., Portatelle, D., Perez-Alonso, M.,Boutry, M., Bancroft, I., Vos, P., Hoheisel, J., Zimmermann, W., Wedler, H., Ridley, P., Lang-ham, S. A., McCullagh, B., Bilham, L., Robben, J., der Schueren, J. V., Grymonprez, B., Chuang,Y. J., Vandenbussche, F., Braeken, M., Weltjens, I., Voet, M., Bastiaens, I., Aert, R., Defoor, E.,Weitzenegger, T., Bothe, G., Ramsperger, U., Hilbert, H., Braun, M., Holzer, E., Brandt, A., Peters,S., van Staveren, M., Dirske, W., Mooijman, P., Lankhorst, R. K., Rose, M., Hauf, J., Kotter, P.,Berneiser, S., Hempel, S., Feldpausch, M., Lamberth, S., den Daele, H. V., Keyser, A. D., Buysshaert,C., Gielen, J., Villarroel, R., Clercq, R. D., Montagu, M. V., Rogers, J., Cronin, A., Quail, M., Bray-Allen, S., Clark, L., Doggett, J., Hall, S., Kay, M., Lennard, N., McLay, K., Mayes, R., Pettett,A., Rajandream, M. A., Lyne, M., Benes, V., Rechmann, S., Borkova, D., Blocker, H., Scharfe, M.,Grimm, M., Lohnert, T. H., Dose, S., de Haan, M., Maarse, A., Schafer, M., Müller-Auer, S.,Gabel, C., Fuchs, M., Fartmann, B., Granderath, K., Dauner, D., Herzl, A., Neumann, S., Argiriou,A., Vitale, D., Liguori, R., Piravandi, E., Massenet, O., Quigley, F., Clabauld, G., Mündlein, A.,Felber, R., Schnabl, S., Hiller, R., Schmidt, W., Lecharny, A., Aubourg, S., Chefdor, F., Cooke,R., Berger, C., Montfort, A., Casacuberta, E., Gibbons, T., Weber, N., Vandenbol, M., Bargues,M., Terol, J., Torres, A., Perez-Perez, A., Purnelle, B., Bent, E., Johnson, S., Tacon, D., Jesse, T.,Heijnen, L., Schwarz, S., Scholler, P., Heber, S., Francs, P., Bielke, C., Frishman, D., Haase, D.,Lemcke, K., Mewes, H. W., Stocker, S., Zaccaria, P., Bevan, M., Wilson, R. K., de la Bastide, M.,Habermann, K., Parnell, L., Dedhia, N., Gnoj, L., Schutz, K., Huang, E., Spiegel, L., Sehkon, M.,Murray, J., Sheet, P., Cordes, M., Abu-Threideh, J., Stoneking, T., Kalicki, J., Graves, T., Harmon,G., Edwards, J., Latreille, P., Courtney, L., Cloud, J., Abbott, A., Scott, K., Johnson, D., Minx, P.,Bentley, D., Fulton, B., Miller, N., Greco, T., Kemp, K., Kramer, J., Fulton, L., Mardis, E., Dante,M., Pepin, K., Hillier, L., Nelson, J., Spieth, J., Ryan, E., Andrews, S., Geisel, C., Layman, D.,Du, H., Ali, J., Berghoff, A., Jones, K., Drone, K., Cotton, M., Joshu, C., Antonoiu, B., Zidanic,M., Strong, C., Sun, H., Lamar, B., Yordan, C., Ma, P., Zhong, J., Preston, R., Vil, D., Shekher,M., Matero, A., Shah, R., Swaby, I. K., O’Shaughnessy, A., Rodriguez, M., Hoffmann, J., Till, S.,Granat, S., Shohdy, N., Hasegawa, A., Hameed, A., Lodhi, M., Johnson, A., Chen, E., Marra, M.,Martienssen, R., and McCombie, W. R. 1999. Sequence and analysis of chromosome 4 of the plantArabidopsis thaliana. Nature, 402(6763):769–777.

Mewes, H. W., Heumann, K., Kaps, A., Mayer, K., Pfeiffer, F., Stocker, S., and Frishman, D. 1999.MIPS: a database for genomes and protein sequences. Nucleic Acids Research, 27(1):44–8.

Moreau, C., Aksenov, N., Lorenzo, M. G., Segerman, B., Funk, C., Nilsson, P., Jansson, S., and Tuomi-nen, H. 2005. A genomic approach to investigate developmental cell death in woody tissues of Populustrees. Genome Biology, 6(4):R34.

46 BIBLIOGRAPHY

Moyle, R., Schrader, J., Stenberg, A., Olsson, O., Saxena, S., Sandberg, G., and Bhalerao, R. P. 2002.Environmental and auxin regulation of wood formation involves members of the Aux/IAA gene familyin hybrid aspen. Plant Journal, 31(6):675–85.

Murthy, R., Barron-Gafford, G., Dougherty, P. M., Engel, V. C., Grieve, K., Handley, L., Klimas, C.,Potosnak, M. J., Zarnoch, S. J., and Zhang, J. W. 2005. Increased leaf area dominates carbon fluxresponse to elevated CO2 in stands of Populus deltoides (Bartr.). Global Change Biology, 11(5):716–731.

Oono, Y., Seki, M., Nanjo, T., Narusaka, M., Fujita, M., Satoh, R., Satou, M., Sakurai, T., Ishida,J., Akiyama, K., Iida, K., Maruyama, K., Satoh, S., Yamaguchi-Shinozaki, K., and Shinozaki, K.2003. Monitoring expression profiles of Arabidopsis gene expression during rehydration process afterdehydration using ca 7000 full-length cDNA microarray. Plant Journal, 34(6):868–887.

Perez, O. D. and Nolan, G. P. 2002. Simultaneous measurement of multiple active kinase states usingpolychromatic flow cytometry. Nature Biotechnology, 20(2):155–162.

Ralph, S., Oddy, C., Cooper, D., Yueh, H., Jancsik, S., Kolosova, N., Philippe, R. N., Aeschliman,D., White, R., Huber, D., Ritland, C. E., Benoit, F., Rigby, T., Nantel, A., Butterfield, Y. S. N.,Kirkpatrick, R., Chun, E., Liu, J., Palmquist, D., Wynhoven, B., Stott, J., Yang, G., Barber, S.,Holt, R. A., Siddiqui, A., Jones, S. J. M., Marra, M. A., Ellis, B. E., Douglas, C. J., Ritland, K.,and Bohlmann, J. 2006. Genomics of hybrid poplar (Populus trichocarpax deltoides) interacting withforest tent caterpillars (Malacosoma disstria): normalized and full-length cDNA libraries, expressedsequence tags, and a cDNA microarray for the study of insect-induced defences in poplar. MolecularEcology, 15(5):1275–1297.

Ramirez-Carvajal, G. A., Morse, A. M., and Davis, J. M. 2007. Transcript profiles of the cytokininresponse regulator gene family in Populus imply diverse roles in plant development. New Phytologist.

Ranjan, P., Kao, Y.-Y., Jiang, H., Joshi, C. P., Harding, S. A., and Tsai, C.-J. 2004. Suppressionsubtractive hybridization-mediated transcriptome analysis from multiple tissues of aspen (Populustremuloides) altered in phenylpropanoid metabolism. Planta, 219(4):694–704.

Reinartz, J., Bruyns, E., Lin, J.-Z., Burcham, T., Brenner, S., Bowen, B., Kramer, M., and Woychik,R. 2002. Massively parallel signature sequencing (MPSS) as a tool for in-depth quantitative geneexpression profiling in all organisms. Briefings in Functional Genomics and Proteomics, 1(1):95–104.

Rinaldi, C., Kohler, A., Frey, P., Duchaussoy, F., Ningre, N., Couloux, A., Wincker, P., Thiec, D. L.,Fluch, S., Martin, F., and Duplessis, S. 2007. Transcript profiling of poplar leaves upon infection withcompatible and incompatible strains of the foliar rust Melampsora larici-populina. Plant Physiology,144(1):347–366.

Ryden, P., Andersson, H., Landfors, M., Naslund, L., Hartmanova, B., Noppa, L., and Sjostedt, A.2006. Evaluation of microarray data normalization procedures using spike-in experiments. BMCBioinformatics, 7:300.

Saal, L. H., Troein, C., Vallon-Christersson, J., Gruvberger, S., Borg, A., and Peterson, C. 2002. BioAr-ray Software Environment (BASE): a platform for comprehensive management and analysis of mi-croarray data. Genome Biology, 3(8):SOFTWARE0003.

Saeed, A. I., Sharov, V., White, J., Li, J., Liang, W., Bhagabati, N., Braisted, J., Klapa, M., Currier,T., Thiagarajan, M., Sturn, A., Snuffin, M., Rezantsev, A., Popov, D., Ryltsov, A., Kostukovich, E.,Borisovsky, I., Liu, Z., Vinsavich, A., Trush, V., and Quackenbush, J. 2003. TM4: a free, open-sourcesystem for microarray data management and analysis. Biotechniques, 34(2):374–8.

Schena, M., Shalon, D., Davis, R. W., and Brown, P. O. 1995. Quantitative monitoring of gene expressionpatterns with a complementary DNA microarray. Science, 270(5235):467–470.

47

Schmid, M., Davison, T. S., Henz, S. R., Pape, U. J., Demar, M., Vingron, M., Scholkopf, B., Weigel,D., and Lohmann, J. U. 2005. A gene expression map of Arabidopsis thaliana development. NatureGenetics, 37(5):501–6.

Schoof, H., Zaccaria, P., Gundlach, H., Lemcke, K., Rudd, S., Kolesov, G., Arnold, R., Mewes, H. W.,and Mayer, K. F. 2002. MIPS Arabidopsis thaliana Database (MAtDB): an integrated biologicalknowledge resource based on the first complete plant genome. Nucleic Acids Research, 30(1):91–3.

Schrader, J., Moyle, R., Bhalerao, R., Hertzberg, M., Lundeberg, J., Nilsson, P., and Bhalerao, R. P.2004a. Cambial meristem dormancy in trees involves extensive remodelling of the transcriptome.Plant Journal, 40(2):173–87.

Schrader, J., Nilsson, J., Mellerowicz, E., Berglund, A., Nilsson, P., Hertzberg, M., and Sandberg, G.2004b. A high-resolution transcript profile across the wood-forming meristem of poplar identifiespotential regulators of cambial stem cell identity. Plant Cell, 16(9):2278–92.

Segerman, B., Jansson, S., and Karlsson, J. 2007. Characterization of genes with tissue-specific differ-ential expression patterns in Populus. Tree Genetics & Genomes, 3(4):351–362.

Seki, M., Narusaka, M., Ishida, J., Nanjo, T., Fujita, M., Oono, Y., Kamiya, A., Nakajima, M., Enju,A., Sakurai, T., Satou, M., Akiyama, K., Taji, T., Yamaguchi-Shinozaki, K., Carninci, P., Kawai, J.,Hayashizaki, Y., and Shinozaki, K. 2002. Monitoring the expression profiles of 7000 Arabidopsis genesunder drought, cold and high-salinity stresses using a full-length cDNA microarray. Plant Journal,31(3):279–292.

Sjodin, A., Bylesjo, M., Skogstrom, O., Eriksson, D., Nilsson, P., Ryden, P., Jansson, S., and Karlsson,J. 2006. UPSC-BASE–Populus transcriptomics online. Plant Journal, 48(5):806–817.

Sjodin, A., Street, N. R., Bylesjo, M., Trygg, J., Gustafsson, P., and Jansson, S. 2007a. A cross-speciestranscriptomics approach to identify genes involved in leaf development. Manuscript.

Sjodin, A., Wissel, K., Bylesjo, M., Trygg, J., and Jansson, S. 2007b. Global expression profiling inleaves of free-growing aspen. Manuscript.

Smith, C., Rodriguez-Buey, M., Karlsson, J., and Campbell, M. 2004. The response of the poplartranscriptome to wounding and subsequent infection by a viral pathogen. New Phytologist, 164(1):123–136.

Smyth, G. 2004. Linear models and empirical Bayes methods for assessing differential expression inmicroarray experiments. Stat Appl Genet Mol Biol, 3(1):Article 3.

Smyth, G. and Speed, T. 2003. Normalization of cDNA microarray data. Methods, 31(4):265–273.

Steffens, N. O., Galuschka, C., Schindler, M., Bulow, L., and Hehl, R. 2004. AthaMap: an onlineresource for in silico transcription factor binding sites in the Arabidopsis thaliana genome. NucleicAcids Research, 32(Database issue):D368–D372.

Steffens, N. O., Galuschka, C., Schindler, M., Bulow, L., and Hehl, R. 2005. AthaMap web tools fordatabase-assisted identification of combinatorial cis-regulatory elements and the display of highlyconserved transcription factor binding sites in Arabidopsis thaliana. Nucleic Acids Research, 33(Web Server issue):W397–W402.

Stein, L. D., Mungall, C., Shu, S., Caudy, M., Mangone, M., Day, A., Nickerson, E., Stajich, J. E.,Harris, T. W., Arva, A., and Lewis, S. 2002. The generic genome browser: a building block for amodel organism system database. Genome Research, 12(10):1599–1610.

Sterky, F., Bhalerao, R. R., Unneberg, P., Segerman, B., Nilsson, P., Brunner, A. M., Charbonnel-Campaa, L., Lindvall, J. J., Tandre, K., Strauss, S. H., Sundberg, B., Gustafsson, P., Uhlen, M.,Bhalerao, R. P., Nilsson, O., Sandberg, G., Karlsson, J., Lundeberg, J., and Jansson, S. 2004. APopulus EST resource for plant functional genomics. Proceedings of the National Academy of Sci-ences, USA, 101(38):13951–13956.

48 BIBLIOGRAPHY

Street, N. R., Skogstrom, O., Sjodin, A., Tucker, J., Rodriguez-Acosta, M., Nilsson, P., Jansson, S., andTaylor, G. 2006. The genetics and genomics of the drought response in Populus. Plant Journal, 48(3):321–341.

Swertz, M. A. and Jansen, R. C. 2007. Beyond standardization: dynamic software infrastructures forsystems biology. Nature Reviews Genetics, 8(3):235–243.

Tarca, A. L., Cooke, J. E., and Mackay, J. 2005. A robust neural networks approach for spatial andintensity-dependent normalization of cDNA microarray data. Bioinformatics, 21(11):2674–83.

Taylor, G. 2002. Populus: arabidopsis for forestry. Do we need a model tree? Annals of Botany, 90(6):681–689.

Taylor, G., Street, N. R., Tricker, P. J., Sjodin, A., Graham, L., Skogstrom, O., Calfapietra, C.,Scarascia-Mugnozza, G., and Jansson, S. 2005. The transcriptome of Populus in elevated CO2.New Phytologist, 167(1):143–154.

Taylor, G., Tricker, P. J., Zhang, F. Z., Alston, V. J., Miglietta, F., and Kuzminsky, E. 2003. Spatialand temporal effects of free-air CO2 enrichment (POPFACE) on leaf growth, cell expansion, and cellproduction in a closed canopy of poplar. Plant Physiology, 131(1):177–185.

The French Italian Public Consortium for Grapevine Genome Characterization. 2007. The grapevinegenome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature.

Toufighi, K., Brady, S. M., Austin, R., Ly, E., and Provart, N. J. 2005. The Botany Array Resource:e-Northerns, Expression Angling, and promoter analyses. Plant Journal, 43(1):153–163.

Troein, C., Vallon-Christersson, J., and Saal, L. H. 2006. An introduction to BioArray Software Envi-ronment. Methods in Enzymology, 411:99–119.

Tschaplinski, T., Tuskan, G., Gebre, G., and Todd, D. 1998. Drought resistance of two hybrid Populusclones grown in a large-scale plantation. Tree Physiology, 18(10):653–658.

Tuskan, G. A., Difazio, S., Jansson, S., Bohlmann, J., Grigoriev, I., Hellsten, U., Putnam, N., Ralph,S., Rombauts, S., Salamov, A., Schein, J., Sterck, L., Aerts, A., Bhalerao, R. R., Bhalerao, R. P.,Blaudez, D., Boerjan, W., Brun, A., Brunner, A., Busov, V., Campbell, M., Carlson, J., Chalot, M.,Chapman, J., Chen, G.-L., Cooper, D., Coutinho, P. M., Couturier, J., Covert, S., Cronk, Q., Cun-ningham, R., Davis, J., Degroeve, S., Dejardin, A., Depamphilis, C., Detter, J., Dirks, B., Dubchak,I., Duplessis, S., Ehlting, J., Ellis, B., Gendler, K., Goodstein, D., Gribskov, M., Grimwood, J.,Groover, A., Gunter, L., Hamberger, B., Heinze, B., Helariutta, Y., Henrissat, B., Holligan, D., Holt,R., Huang, W., Islam-Faridi, N., Jones, S., Jones-Rhoades, M., Jorgensen, R., Joshi, C., Kangas-jarvi, J., Karlsson, J., Kelleher, C., Kirkpatrick, R., Kirst, M., Kohler, A., Kalluri, U., Larimer, F.,Leebens-Mack, J., Leple, J.-C., Locascio, P., Lou, Y., Lucas, S., Martin, F., Montanini, B., Napoli,C., Nelson, D. R., Nelson, C., Nieminen, K., Nilsson, O., Pereda, V., Peter, G., Philippe, R., Pilate,G., Poliakov, A., Razumovskaya, J., Richardson, P., Rinaldi, C., Ritland, K., Rouze, P., Ryaboy, D.,Schmutz, J., Schrader, J., Segerman, B., Shin, H., Siddiqui, A., Sterky, F., Terry, A., Tsai, C.-J.,Uberbacher, E., Unneberg, P., Vahala, J., Wall, K., Wessler, S., Yang, G., Yin, T., Douglas, C.,Marra, M., Sandberg, G., de Peer, Y. V., and Rokhsar, D. 2006. The genome of black cottonwood,Populus trichocarpa (Torr. & Gray). Science, 313(5793):1596–1604.

Tuskan, G. and Walsh, M. 2001. Short-rotation woody crop systems, atmospheric carbon dioxide andcarbon management: A US case study. Forestry Chronicle, 77(2):259–264.

Twigger, S. N., Shimoyama, M., Bromberg, S., Kwitek, A. E., Jacob, H. J., and Team, R. G. D. 2007. TheRat Genome Database, update 2007–easing the path from disease to data and back again. NucleicAcids Research, 35(Database issue):D658–D662.

van den Berg, A. K. and Perkins, T. D. 2004. Evaluation of a portable chlorophyll meter to estimatechlorophyll and nitrogen contents in sugar maple (Acer saccharum Marsh.) leaves. Forest Ecologyand Management, 200(1-3):113–117.

49

Velculescu, V. E., Zhang, L., Vogelstein, B., and Kinzler, K. W. 1995. Serial analysis of gene expression.Science, 270(5235):484–487.

Vinciotti, V., Khanin, R., D’Alimonte, D., Liu, X., Cattini, N., Hotchkiss, G., Bucca, G., de Jesus, O.,Rasaiyaah, J., Smith, C. P., Kellam, P., and Wit, E. 2005. An experimental evaluation of a loopversus a reference design for two-channel microarrays. Bioinformatics, 21(4):492–501.

Ware, D., Jaiswal, P., Ni, J., Pan, X., Chang, K., Clark, K., Teytelman, L., Schmidt, S., Zhao, W.,Cartinhour, S., McCouch, S., and Stein, L. 2002a. Gramene: a resource for comparative grassgenomics. Nucleic Acids Research, 30(1):103–105.

Ware, D. H., Jaiswal, P., Ni, J., Yap, I. V., Pan, X., Clark, K. Y., Teytelman, L., Schmidt, S. C., Zhao,W., Chang, K., Cartinhour, S., Stein, L. D., and McCouch, S. R. 2002b. Gramene, a tool for grassgenomics. Plant Physiology, 130(4):1606–1613.

Wilson, D. L., Buckley, M. J., Helliwell, C. A., and Wilson, I. W. 2003. New normalization methods forcDNA microarray data. Bioinformatics, 19(11):1325–1332.

Winter, D., Vinegar, B., Nahal, H., Ammar, R., Wilson, G. V., and Provart, N. J. 2007. An ’electronicfluorescent pictograph’ browser for exploring and analyzing large-scale biological data sets. PLoSONE, 2(1):e718.

Wissel, K., Pettersson, F., Berglund, A., and Jansson, S. 2003. What affects mRNA levels in leaves offield-grown aspen? A study of developmental and environmental influences. Plant Physiology, 133(3):1190–7.

Wittkopp, P. J. 2007. Variable gene expression in eukaryotes: a network perspective. Journal ofExperimental Biology, 210(Pt 9):1567–1575.

Wullschleger, S. D., Jansson, S., and Taylor, G. 2002. Genomics and forest biology: Populus emergesas the perennial favorite. Plant Cell, 14(11):2651–2655.

Yang, Y. H., Buckley, M. J., Dudoit, S., and Speed, T. P. 2002a. Comparison of methods for imageanalysis on cDNA microarray data. Journal of Computational and Graphical Statistics, 11(1):108–136.

Yang, Y. H., Buckley, M. J., and Speed, T. P. 2001. Analysis of cDNA microarray images. Briefings inBioinformatics, 2(4):341–349.

Yang, Y. H., Dudoit, S., Luu, P., Lin, D. M., Peng, V., Ngai, J., and Speed, T. P. 2002b. Normalizationfor cDNA microarray data: a robust composite method addressing single and multiple slide systematicvariation. Nucleic Acids Research, 30(4):e15.

Yip, A. and Horvath, S. 2007. Gene network interconnectedness and the generalized topological overlapmeasure. BMC Bioinformatics, 8:22.

Yu, J., Hu, S., Wang, J., Wong, G. K.-S., Li, S., Liu, B., Deng, Y., Dai, L., Zhou, Y., Zhang, X., Cao,M., Liu, J., Sun, J., Tang, J., Chen, Y., Huang, X., Lin, W., Ye, C., Tong, W., Cong, L., Geng, J.,Han, Y., Li, L., Li, W., Hu, G., Huang, X., Li, W., Li, J., Liu, Z., Li, L., Liu, J., Qi, Q., Liu, J.,Li, L., Li, T., Wang, X., Lu, H., Wu, T., Zhu, M., Ni, P., Han, H., Dong, W., Ren, X., Feng, X.,Cui, P., Li, X., Wang, H., Xu, X., Zhai, W., Xu, Z., Zhang, J., He, S., Zhang, J., Xu, J., Zhang, K.,Zheng, X., Dong, J., Zeng, W., Tao, L., Ye, J., Tan, J., Ren, X., Chen, X., He, J., Liu, D., Tian, W.,Tian, C., Xia, H., Bao, Q., Li, G., Gao, H., Cao, T., Wang, J., Zhao, W., Li, P., Chen, W., Wang,X., Zhang, Y., Hu, J., Wang, J., Liu, S., Yang, J., Zhang, G., Xiong, Y., Li, Z., Mao, L., Zhou, C.,Zhu, Z., Chen, R., Hao, B., Zheng, W., Chen, S., Guo, W., Li, G., Liu, S., Tao, M., Wang, J., Zhu,L., Yuan, L., and Yang, H. 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. indica).Science, 296(5565):79–92.

50 BIBLIOGRAPHY

Yu, J., Wang, J., Lin, W., Li, S., Li, H., Zhou, J., Ni, P., Dong, W., Hu, S., Zeng, C., Zhang, J., Zhang,Y., Li, R., Xu, Z., Li, S., Li, X., Zheng, H., Cong, L., Lin, L., Yin, J., Geng, J., Li, G., Shi, J., Liu,J., Lv, H., Li, J., Wang, J., Deng, Y., Ran, L., Shi, X., Wang, X., Wu, Q., Li, C., Ren, X., Wang, J.,Wang, X., Li, D., Liu, D., Zhang, X., Ji, Z., Zhao, W., Sun, Y., Zhang, Z., Bao, J., Han, Y., Dong,L., Ji, J., Chen, P., Wu, S., Liu, J., Xiao, Y., Bu, D., Tan, J., Yang, L., Ye, C., Zhang, J., Xu, J.,Zhou, Y., Yu, Y., Zhang, B., Zhuang, S., Wei, H., Liu, B., Lei, M., Yu, H., Li, Y., Xu, H., Wei, S.,He, X., Fang, L., Zhang, Z., Zhang, Y., Huang, X., Su, Z., Tong, W., Li, J., Tong, Z., Li, S., Ye, J.,Wang, L., Fang, L., Lei, T., Chen, C., Chen, H., Xu, Z., Li, H., Huang, H., Zhang, F., Xu, H., Li, N.,Zhao, C., Li, S., Dong, L., Huang, Y., Li, L., Xi, Y., Qi, Q., Li, W., Zhang, B., Hu, W., Zhang, Y.,Tian, X., Jiao, Y., Liang, X., Jin, J., Gao, L., Zheng, W., Hao, B., Liu, S., Wang, W., Yuan, L., Cao,M., McDermott, J., Samudrala, R., Wang, J., Wong, G. K.-S., and Yang, H. 2005. The Genomes ofOryza sativa: a history of duplications. PLoS Biol, 3(2):e38.

Zabel, B., Hawes, P., Stuart, H., and Marino, B. D. V. 1999. Construction and engineering of a createdenvironment: Overview of the Biosphere 2 closed system. Ecological Engineering, 13(1-4):43–63.

Zhang, B., Pan, X., and Anderson, T. A. 2006. MicroRNA: a new player in stem cells. Journal ofCellular Physiology, 209(2):266–269.

Zhang, B., Wang, Q., and Pan, X. 2007. MicroRNAs and their regulatory roles in animals and plants.Journal of Cellular Physiology, 210(2):279–289.

Zhang, B. and Horvath, S. 2005. A general framework for weighted gene co-expression network analysis.Statistical Applications in Genetics and Molecular Biology, 4:Article17.

Zimmermann, P., Hennig, L., and Gruissem, W. 2005. Gene-expression analysis and network discoveryusing Genevestigator. Trends in Plant Science, 10(9):407–409.

Zimmermann, P., Hirsch-Hoffmann, M., Hennig, L., and Gruissem, W. 2004. GENEVESTIGATOR.Arabidopsis microarray database and analysis toolbox. Plant Physiology, 136(1):2621–2632.

Zimmermann, P., Schildknecht, B., Craigon, D., Garcia-Hernandez, M., Gruissem, W., May, S., Mukher-jee, G., Parkinson, H., Rhee, S., Wagner, U., and Hennig, L. 2006. MIAME/Plant - adding value toplant microarrray experiments. Plant Methods, 2:1.