Genetic Characteristics of Korean Patients with Autosomal...

저 시-비 리- 경 지 2.0 한민

는 아래 조건 르는 경 에 한하여 게

l 저 물 복제, 포, 전송, 전시, 공연 송할 수 습니다.

다 과 같 조건 라야 합니다:

l 하는, 저 물 나 포 경 , 저 물에 적 된 허락조건 명확하게 나타내어야 합니다.

l 저 터 허가를 면 러한 조건들 적 되지 않습니다.

저 에 른 리는 내 에 하여 향 지 않습니다.

것 허락규약(Legal Code) 해하 쉽게 약한 것 니다.

Disclaimer

저 시. 하는 원저 를 시하여야 합니다.

비 리. 하는 저 물 리 목적 할 수 없습니다.

경 지. 하는 저 물 개 , 형 또는 가공할 수 없습니다.

http://creativecommons.org/licenses/by-nc-nd/2.0/kr/legalcode

http://creativecommons.org/licenses/by-nc-nd/2.0/kr/

의학박사 학위논문

Genetic Characteristics of Korean

Patients with Autosomal Dominant

Polycystic Kidney Disease by

Targeted Exome Sequencing

2019년 4월

서울대학교 대학원

임상의과학과 임상약리학

김 현 숙

Genetic Characteristics of Korean




지도교수 장 인 진

이 논문을 김현숙 박사 학위논문으로 제출함

2019년 4월

서울대학교 대학원

임상의과학과 임상약리학

김 현 숙

김현숙의 박사 학위논문을 인준함

2019년 월

위 원 장 (인)

부 위 원 장 (인)

위 원 (인)

위 원 (인)

위 원 (인)

- I -

Abstract




Hyunsuk Kim

Department of Clinical Medical Sciences

The Graduate School

Seoul National University

Autosomal dominant polycystic kidney disease (ADPKD) is a monogenic

disorder that can affect the kidneys as well as the liver and one of the main

causes of end-stage renal disease (ESRD). Genetic information is of the utmost

importance in understanding pathogenesis of ADPKD. Therefore, this study

aimed to demonstrate the genetic characteristics of ADPKD and their effects on

kidneys and liver in 749 Korean ADPKD subjects from 524 unrelated families.

Genetic studies of PKD1/2 were performed using targeted exome sequencing

combined with Sanger sequencing in exon 1 of the PKD1 gene and a multiple

ligation probe assay. Total kidney volume was calculated by ellipsoid method,

total liver volume was measured by stereology and adjusted for height (htTKV,

htTLV). Using htTLV, disease severity was classified as no or mild (htTLV

<1,600 mL/m), moderate (htTLV 1,600-2,400 mL/m), and severe PLD (htTLV>

2,400 mL/m). Associations between the severity of PKD or PLD and the type

and position of genetic mutations were analyzed using a multilevel model

controlling for family-level clustering (within-family correlations). The mutation

- II -

detection rate was 80.7% (423/524 families, 331 mutations) and 70.7% was novel.

PKD1 protein-truncating (PKD1-PT) genotype was associated with younger

age at diagnosis, larger htTKV, lower renal function compared to

PKD1-non-truncating and PKD2 genotypes. The PKD1 genotype showed

earlier onset of ESRD compared to PKD2 genotype (53.3[46.7, 58.7] vs. 63.1[54.0,

67.5] years old, P=0.003). In frailty model controlled for age, gender, and familial

clustering effect, PKD2 genotype had 0.2 times lower risk for reaching ESRD

than PKD1-PT genotype (p=0.037).

As for liver phenotype, severe PLD showed more rapid progression to

end-stage renal disease and higher mortality. Sex, age, multiple pregnancies, and

female sex hormone use were associated with moderate to severe PLD.

Family-level clustering effects existed in moderate to severe PLD implying

genetic link to PLD phenotype, however, unlike kidney, PLD severity did not

correlate with PKD1/2 genotype but the frequency of No Mutation was

increased only in mild PLD group. Additionally, the C-type lectin domain might

be a potential warm spot for moderate to severe PLD in both PT and

non-truncating PKD1 mutations.

In conclusion, the results suggest that genotyping can contribute to selecting

rapid progressors of kidneys for new emerging therapeutic interventions among

Koreans. However, additional genetic effects on PLD have to be investigated.

……………………………………………………………………………………………

keywords : Autosomal dominant polycystic kidney disease,

targeted exome sequencing, rapid progressor, moderate to severe

polycystic liver disease, Genetic variation, phenotype

Student Number : 2015-30809

- III -

CONTENTS

Abstract·…………………………………………………………………Ⅰ

Contents…………………………………………………………………III

List of tables and figures……………………………………………IV

1st part: Kidney

Introduction………………………………………………………………1

Results……………………………………………………………………3

Discussione………………………………………………………………7

Methods…………………………………………………………………12

Figure Legends and Tables·…………………………………………20

Supplementary information……………………………………………27

2nd part: Liver

Introduction………………………………………………………………67

Results……………………………………………………………………68

Discussion·………………………………………………………………74

Methods·…………………………………………………………………79

Figure Legends and Tables·…………………………………………82

Supplementary information……………………………………………90

References··……………………………………………………………99

Abstract(Korean)····…………………………………………………109

- IV -

LIST OF TABLES AND FIGURES

1st part: Kidney

Table 1. Frequency of each class and grade in 524 families………………22

Table 2. Baseline Clinical Characteristics According to PKD Genotypes…………23

Table 3. A. Multilevel multivariate linear regression of log(htTKV) or eGFR

according to genotype B. Multivariate linear regression of log(htTKV) or eGFR

for comparing each slope according to genotype……………………………………24

Supplementary Table S1. Demographic Findings of 749 Patients (524 Families)…27

Supplementary Table S2. Mutations found in this study……………………………28

Supplementary Table S2A. PKD1 protein-truncating mutations in study families·…28

Supplementary Table S2B. PKD1 in-frame insertion/deletions in study families……41

Supplementary Table S2C. PKD1 non-truncating mutations in study families………42

Supplementary Table S2D. PKD2 protein-truncating mutations in study families·…47

Supplementary Table S2E. PKD2 non-truncating mutations in study families………50

Supplementary Table S3. Design of Custom Bait·……………………………………51

Supplementary Table S4. Quality of Targeted Exome Sequencing·………………51

Supplementary Table S5. Definition of Grades··………………………………………52

Supplementary Table S6. Results from Validation Study(n=80)……………………53

Supplementary Table S7. MLPA validation using Long Range PCR sequencing…56

Figure 1. Summary of identified mutations……………………………………………25

Figure 2. Cross-sectional analysis between age and log(htTKV) or eGFR A.

Scatter plot and trend line of log (htTKV) according to genotype. B. Scatter

plot and trend line of log (htTKV) according to genotype…………………………26

- V -

Figure 3. Renal outcomes and death according to type of PKD1/2 mutation. A.

Survival analysis of ESRD progression. B. Survival analysis of all-cause

mortality………………………………………………………………………………………26

Supplementary Figure S1·…………………………………………………………………58







Supplementary Figure S8 Inverse relationship of GC content and the depth of

coverage for PKD1…………………………………………………………………………·65

Supplementary Figure S9. Normalized read depth in PKD1 and PKD2 of

mutations that detected via MLPA………………………………………………………66

2st part: Liver

Table 1. Frequency of each class and grade in 524 families………………84

Table S1. Distribution of PKD genotype according to liver cysts·…………………93

Table S2. Mutations of moderate to severe PLD families·…………………………95

Table S3. Mutations of moderate to severe PLD males·……………………………95

Figure 1. A. ESRD progression according to PLD severity B. All cause mortality

according to PLD severity…………………………………………………………………86

Figure 2. Impact of female sex hormone on severity of PLD··……………………86

- VI -

Figure 3. A. Distrubution of htTLV among siblings B. htTLV of siblings

excluding the subjects having highest htTLV C. Distrubution of htTLV among

siblings excluding famales over 45 years D. htTLV of siblings excluding the

subjects of highest htTLV and famales over 45 years·…………………………… 87

Figure 4. A. Location of mutation of PKD1 in moderate to severe PLD B.

Location of mutation of PKD1 in moderate to severe PLD…………………………89

Figure S1. Distribution of htTLV according to age and sex·………………………90

Figure S2. Distribution patterns of PKD genotypes according to severity………91

Figure S3. A. htTLV according to sex and class of PKD gene B. htTLV

according to sex and class of PKD gene………………………………………………92

Figure S4. Distribution patterns of PKD genotypes according to no liver cyst

and liver cyst groups·………………………………………………………………………94

Figure S5. Mutations of moderate to severe PLD males……………………………97

Figure S6. Diagram of study patient selection in HOPE PKD cohort……………98

1

1stpart: Kidney

INTRODUCTION

Autosomal dominant polycystic kidney disease (ADPKD) is a potentially lethal

monogenic disorder characterized by numerous cysts in the kidneys and

progressive renal failure1. Since the disease is caused by a mutation either in

PKD1 or PKD2, genetic information is of the utmost importance in

understanding the pathogenesis of ADPKD.

The ADPKD mutation database (PKDB, http://pkdb.mayo.edu/) is the largest

international repository for variants found in PKD genes. By September 2018,

1895 pathogenic mutations in the PKD1 gene and 438 in the PKD2 gene were

registered in the PKDB. However, variants registered in the PKDB mainly came

from the results of genetic studies in Western population. There have been only

a few reports that presented genetic profiles on Asian ADPKD patients, since

analyzing PKD1/2 mutations has been difficult in clinical practice. First,

two-thirds of the 5’ region of PKD1 are duplicated by gene conversion (GC) on

chromosome 16 within six pseudogenes (PKD1P1-P6)2. Because these

pseudogenes have a sequence that is 97.7% identical to the original PKD1 gene

sequence, cloning and sequencing PKD1 has been difficult with direct

sequencing. Moreover, high allelic heterogeneity of both PKD1 and PKD2

makes molecular diagnosis of ADPKD more challenging. Previously, indirect

methods such as two-dimensional gene scanning, denaturing gradient gel

electrophoresis, and single-strand conformation polymorphism analysis have been

applied to find pathogenic mutations3-6. However, these indirect methods

demonstrated false-positive results with a low detection rate compared to direct

sequencing. Therefore, a long-range polymerase chain reaction (PCR) with

subsequent nested PCR was developed to isolate the entire PKD1 gene without

2

interference from the six pseudogenes (PKD1P1-P6)7. However, direct Sanger

sequencing is time-consuming and costly. Therefore, there has been an urgent

need for faster and more efficient way to detect mutations in ADPKD.

Targeted exome sequencing (TES) using next-generation sequencing (NGS) is

one of the most suitable methods to study monogenic disorders such as

ADPKD. It has advantages over Sanger sequencing including that a large

amount of data can be analyzed automatically in a short period of time and can

be done cheaply. In addition, researchers can focus on a few genes and exons to

discover mutation spots. Therefore, TES, rather than whole exome sequencing

or whole genome sequencing, has been used more popularly as a genetic

screening tool to detect mutations in ADPKD. Since TES became available in

clinical practice for genetic diseases, the detection rate gradually improved 7.

The importance of genotype on clinical outcome has been demonstrated from

many previous studies. Cornec-Le Gall et al.8 reported the influence of PKD1

mutation type on renal survival. HALT-PKD and CRISP Investigators showed

the difference of renal survival in PKD1 NT carriers according to their

conservation and chemical change9. Previously, anecdotal report from Asian

countries reported novel mutations in Asian ADPKD populations 10-12. However,

they were either case series or included small samples. No well-designed cohort

studies have been conducted to evaluate genetic profiles and genotype-

phenotype associations in Asian populations. The coHOrt for genotype-

PhenotypE correlation in ADPKD (HOPE-PKD) is a large prospective,

multicenter Korean ADPKD cohort constructed to investigate the natural course

of the disease and genotype-phenotype associations. Demographic and clinical

information of the subjects, as well as family trees, is collected, and specimens

including DNA, serum, and urine sample are collected and stored from the

subjects once a year13. To describe the genetic characteristics of ADPKD in

3

Koreans, I performed a cohort-based genetic analysis using the HOPE-PKD

cohort.

RESULTS

Demographic Characteristics of 749 Subjects of the Korean HOPE-PKD

Cohort

Among 749 subjects, 360 (48.1%) were males and 389 (51.9%) were females.

The mean age was 46.4 ± 13.3 years (male, 44.6 ± 14.4; female, 48.0 ± 12.0; P

value for gender difference, <0.001). The mean estimated glomerular filtration

rate (eGFR) was 65.8 ± 38.9 mL/min/1.73m2 (male, 65.6±39.4; female, 66.0±38.5;

P value for gender difference, 0.896, see Supplementary Table S1 online).

Comprehensive Genetic Analysis by Combining TES, PKD1 Exon

1-Specific Sanger Sequencing, Multiple Ligation Probe Assay, and

Familial Segregation Analysis

Of the entire sample of 524 families, variants were found by TES in 471

families (89.9%). Among them, definitively pathogenic (DP), highly likely

pathogenic (HLP), or likely pathogenic (LP) variants were categorized as

mutations. Thus, mutations were found in 409 families. In 62 families with likely

neutral (LN) or indeterminate (I) variants and 53 families with no variants

(NV), subsequent PKD1 exon 1 Sanger sequencing and/or multiple ligation

probe assay (MLPA) were carried out. In LN or I families, six mutations were

additionally detected through PKD1 exon 1 Sanger sequencing, and three

mutations were detected through MLPA. In NV families, one and four mutations

were found through PKD1 exon 1 Sanger sequencing and MLPA, respectively.

Among 129 families with novel LP variants found by TES, 88 families

underwent additional familial segregation analysis. Forty families were

4

determined to have HLP mutations, and 2 families were proven to have LN

mutations. Thus, the mutation detection rate was 80.7% (423/524 families) by

TES, PKD1 exon 1 Sanger sequencing, MLPA, and familial segregation analysis

(see Supplementary Fig. S1 online).

Baseline Clinical Characteristics According to PKD Genotypes and

Summary of Identified Mutations

A total of 364 variants were found in 476 of the 524 families, and of these,

331 mutations were detected among 423 families in the HOPE-PKD cohort

(Figure 1). Among the 749 subjects, 68% had PKD1 variants, 16% had PKD2

variants, and 16% were classified as having no mutations (NM) (Figure 1A).

Excluding NM, 81% of the 749 subjects and 82% of the 524 families had PKD1

mutations (Figure 1B). Figure 1C shows the distribution of mutations by class

among the 630 subjects with mutations. The frameshift mutation (39%) was the

most common mutation class in the PKD1 genotype, while nonsense mutations

(52%) were most common in the PKD2 genotype. The prevalence of classes of

variants (number of families that shared the same variants) among the 524

families was as follows: PKD1: nonsense, 86 (19); frameshift 136 (24), missense,

90 (17); PKD2: nonsense, 39 (20); frameshift, 11 (0); missense, 12 (4) (Figure

1D, Left, Table 1). The prevalence of the grades of the variants (number of

families that shared the same variants) in the 524 families was as follows:

PKD1: DP, 248 (46); HLP, 61 (20); and LP (0), 39; PKD2, DP, 63 (22); HLP, 10

(4); and LP, 2 (0) (Figure 1D, Right, Table 1).

Comparison of Mutation Frequency in Affected Polycystin Domains

Between HOPE-PKD and other Cohorts

5

Among the 331 mutations, 97 were registered in either the PKDB or Toronto

Genetic Epidemiology Study of PKD14. Therefore 70.7% (234/331) of the

mutations were novel (see Supplementary Table S2 online). Mutations were

found more frequently in the C-type lectin and G protein-coupled receptor

proteolytic site (GPS) domains of polycystin-1 in the Korean HOPE-PKD cohort

than in those previously reported from PKDB. Likewise, mutations were detected

more frequently in the linker domain of polycystin-2 in my patients compared to

those in the PKDB. The standardized residual of the Chi-square test (cut-off of

significance, 2.58) was 2.8 in the C-type lectin of polycystin-1 and the linker

domain of polycystin-2, and 2.6 in the GPS domain of polycystin-1 (see

Supplementary Fig. S2 online).

Baseline Clinical Characteristics According to PKD Genotypes

For genotype-phenotype analysis, PKD1 or PKD2 mutations were further

classified into four PKD genotypes according to the previously described manner

14; PKD1 protein-truncating (PKD1-PT), PKD1 small (fewer than five amino

acids) in-frameshift indels (PKD1-ID), PKD1 non-truncating (PKD1-NT), and

PKD2 genotypes. A total of 630 patients in 423 families after excluding NM

were re-classified into four genotype groups: PKD1-PT, 371 (59%); PKD1-ID,

21 (3%); PKD1-NT, 119 (19%); PKD2, 119 (19%). PKD1-PT genotype was

associated with younger age at the first visit (40.2 ± 12.9), earlier onset of

hypertension (37.3 ± 10.1), younger ESRD age (52.5 ± 9.5), larger

height-adjusted total kidney volume (htTKV) (898 [476, 1546], mL/m), and lower

eGFR (63.5 ± 42.2, mL/min/1.73 m2) compared to other genotypes. The median

follow-up duration was 77.3 months (Table 2).

6

As the Mayo imaging classification changed from 1A to 1E, the proportion of

PKD1-PT genotype increased. Conversely, as the genotype changed from PKD2

to PKD1-PT, the proportion of Mayo imaging classification 1C-E increased (P

<0.001 by linear by linear test, see Supplementary Fig. S3A online). During 77.3

months of follow-up, the intraclass correlation of individuals was 0.84 [0.82,

0.86] (See Supplementary Fig. S3B online). Therefore, the Mayo imaging

classification of each subject was mostly retained over time.

HtTKV and eGFR According to PKD Genotypes

When log(htTKV) was plotted according to age, the PKD1-PT genotype

showed a similar slope to PKD1-ID and PKD2 genotypes (Figure 2A and

Table 3B). However, the PKD1-PT and PKD1-ID genotypes showed larger

htTKV values compared to PKD2 genotype (Table 3A). Overall, the PKD1-PT

and PKD2 genotypes showed a steeper slope than the PKD1-NT genotype and

NM group (Table 3B). In other words, the log(htTKV) in the PKD1-PT and

PKD2 genotypes increased rapidly as age increased, but the log(htTKV) of the

PKD1-NT genotype and NM group increased less intensely. The PKD1-PT

genotype also showed a steeper slope of eGFR than the other genotype groups,

except the PKD1-ID genotype (Figure 2B and Table 3B).

Multilevel multivariate linear regression analysis demonstrated that the

PKD1-PT genotype had a larger htTKV than the PKD1-NT or PKD2

genotypes after adjusting for age, gender, and the familial clustering effect

(Figure 2B and Table 3A). The family-level clustering effect was statistically

significant (P=0.001). The regression model of eGFR showed that the PKD1-PT

genotype was associated with a lower eGFR than PKD1-NT or PKD2

7

genotypes (Figure 2B and Table 3A). The family-level clustering effect was

also significant (P=0.029).

Incidence of ESRD and Death According to PKD Genotypes

The PKD1 genotype showed an earlier onset of ESRD than PKD2 genotype

(53.3[46.7, 58.7] vs. 63.1[54.0, 67.5], P=0.003) (see Supplementary Fig. S4 online).

Among the 630 patients, with the exclusion of the NM subjects (n=119) from

the total of 749 subjects, 122 ESRD events were observed during follow-up

(Table 2). The ESRD incidence was 21.7% (110/508) in PKD1 and 10% (12/119)

in PKD2 (P=0.003). The incidence of ESRD was lower in the PKD2 genotype

than for other genotypes (PKD2, 12 (10.1%) vs. PKD1-PT, 84 (22.6%);

PKD1-ID, 4 (19.0%); PKD1-NT, 22 (18.5%), P for trend = 0.004). A multi-level

multivariate Cox proportional hazard (frailty) model was used to compare the

effect of genotype on ESRD. When PKD1-PT genotype was used as a

reference, PKD2 genotype showed a 0.22 times lower hazard ratio than

PKD1-PT genotype. The familial clustering effect was significant (P=0.037)

(Figure 3A).

There were 23 death events during follow up. Either incidence of death or age

of death did not show a significant difference among genotypes (Table 2). In

the frailty model, the incidence was too small to find any significant difference

according to genotypes (Figure 3B).

DISCUSSION

As new therapeutic agents became available as disease-course modifying

interventions, understanding the natural course of ADPKD and identifying

8

high-risk candidates for new clinical interventions have become important.

Although defining genetic variations and their associations with phenotypes is

the first step to understand the natural course of genetic diseases, there have

been only a few reports on genetic profiles of Asian ADPKD populations4,10,11,15.

In this study, I investigated the genetic characteristics of PKD1/2 mutations

using a large prospective Korean ADPKD cohort. In brief, 331 mutations were

detected by TES using NGS combined with PKD1 exon 1 Sanger sequencing

and MLPA. The detection rate of mutations was 80.7% (423/524 families).

Among them, 70.7% of mutations were novel. The prevalence of PKD1 and

PKD2 among 524 families was 84% and 16%, respectively. Mutations were

more frequently found in the C-type of lectin and GPS domains of polycystin-1

and the linker domain of polycystin-2 among the Korean ADPKD population

than among Western populations.

TES using NGS was applied in this study. There was a concern about the

decreased efficacy of TES in mutation detection because of large amount of

pseudogenic lesion in PKD1. However, demonstrated from the milestone study

by Trujillano and colleagues16, NGS was not inferior in mutation detection and

could be a possible substitute for Sanger sequencing because Sanger sequencing

itself is expensive and time consuming. NGS is economically feasible,

convenient, and has the possibility of automation. However, low or high GC

content can reduce the mutation detection rate by NGS by affecting the probe

hybridization and PCR amplification steps. The false negative rate is expected to

be elevated for PKD1 exon 1, which has many GC-rich regions. Moreover,

large deletions can easily be missed by NGS. In this study, I tried to

compensate for these limitations by exon-specific Sanger sequencing of PKD1

exon 1 and MLPA. However, the detection rate of in my study was lower than

that with Sanger sequencing with long-range PCR11. One possible reason for the

9

low detection rate is that I performed target enrichment with whole genomic

DNA instead of pure PKD1 and PKD2 genomic regions. A previous study

suggested using PCR to generate locus-specific amplicons of PKD1 before

sequencing. Therefore, refining the NGS pipeline to increase sequencing depth

through locus-specific PCR amplicon enrichment should be considered to improve

the detection rate. Another possible explanation is that my population may be

enriched with novel Asian-specific missense variants with unknown significance.

Therefore, further functional analysis and Sanger sequencing of PKD1/2 should

be considered in the NM cases to further improve the detection rate.

The proportions of PKD1 and PKD2 were similar to those from previous

data7. However, the mean age of ESRD in Korean patients with ADPKD was

52.6 years for the PKD1 genotype and 60.5 for the PKD2 genotype, which

seemed to be younger than that of Westerners, especially for PKD2 subjects.

This discrepancy may be related to selection bias, because my study only

included ADPKD patients from tertiary hospitals. This means that milder cases

who tended to visit primary clinics could have been excluded from this analysis.

Nonetheless, further study is needed to evaluate factors causing faster renal

progression in my ADPKD patients compared to those from other ethnic

populations.

Excluding NM, 331 mutations were ultimately detected in the current study.

When compared with those reported in other large cohort studies, the

distribution of the mutation classes was similar to those presented in previous

studies. Interestingly, my study revealed a high proportion of novel variants.

This can be explained by the founder effect or repetitive mutations, as a recent

Chinese study also showed similar findings (personal communication).

As part of that effort, I compared the frequency of mutations in certain

domains of polycystin-1 or polycystin-2 with the results from the PKDB for the

10

C-type lectin and GPS domains of polycystin-1 and the linker domain of

polycystin-2. The C-type lectin domain of polycystin-1 (protein product of

PKD1) binds carbohydrate matrices in vitro and may be involved in protein

carbohydrate interactions in vivo. Polycystin-1 undergoes cleavage at the

juxtamembrane GPS, a process likely to be essential for its biological activity.

This GPS is essential for the intracellular transport. Finally, the interaction of

polycystin-2 with polycystin-1 is mediated by the C-terminal coiled-coil domain

of polycystin-2, and the linker region between the EF-hand and coiled-coil

domains is not sufficient to mediate the association by itself. The presence of

the N-terminal linker to the polycystin-2 coiled-coil domain allows a tighter

association. The implications of these findings for the prognosis for Asian

patients with ADPKD should be clarified in the future.

This study enrolled patients with typical ADPKD imaging findings if the

subjects had no familial history. Therefore, the NM group seems to be an entity

that shares other clinico-genetic characteristics. Cases of NM were not directly

included in the analysis, but most of the patients with NM showed similar or

milder clinical features compared to PKD2 genotype, suggesting that they are

groups with mutations in regions such as introns or mosaicism, which are

difficult to detect using TES. Additionally, Other genes (i.e. GANAB, DNAJB11)

involved in atypical presentations of ADPKD could have been part of the

targeted NGS approach. Further studies to clarify the meaning of NM are

warranted.

I found families that had two or more novel missense variants. In the absence

of a functional assay, assessing the putative pathogenicity of missense variants

is not reliable. To validate the clinical pathogenicity of missense variants, I

analyzed family-based segregation studies including at least one affected and/or

one unaffected member in each family. In most of the cases, family-based

11

segregation analysis was very helpful to discriminate the pathogenicity of

missense variants. However, for a complete segregation study, it is necessary to

investigate all household members accessible through the family tree. Complete

segregation can be determined especially clearly when the affected subject is

examined together with both parents, which is not always possible. When the

affected subject is examined with only his/her siblings or children, the power to

completely discriminate the pathogenicity of the variant may be lowered.

However, I made every effort to overcome these limitations by enrolling

additional families until the pathogenicity of the variants was clearly determined.

My study showed that PKD1-PT genotype was the most common genotype

(59%). The distribution of each genotype was similar to what has been reported

in previous studies7. Subjects with the PKD1-PT genotype had an early onset

of hypertension, larger htTKV, lower eGFR, and more frequent ESRD incidence

than other genotypes. The family-level clustering effect for log(htTKV) and

ESRD onset was significant. However, it was also noteworthy that there existed

discrepancies between genotypes and phenotypes. For example, the proportions of

mild phenotypes of Mayo class 1A and 1B were 7% and 25% in individuals with

the PKD1-PT genotype, which suggests that other modifying factors may exist.

A strength of the current study is that it is unique as it represents the first

large cohort analysis of an Asian population. This study reveals many novel

mutations to the PKD1 and PKD2 genes as well as novel mutation enrichment

sites within the ADPKD proteins. My study made efforts to overcome the

shortcomings of NGS by performing PKD1 exon-specific Sanger sequencing,

MLPA, and a familial segregation study. In addition, the HOPE-PKD cohort is

fairly well designed and has a substantial number of subjects. Nonetheless, there

are some limitations. Since this study was conducted on patients in tertiary

hospitals in South Korea, there is a possibility of selection bias that only severe

12

patients come to the hospital. Thorough evaluation and collection of family

history was difficult since not all the members were registered at the hospital.

Therefore, it was difficult to investigate the exact family-level clustering effect.

Moreover, I could not find familial clustering effects for ESRD or mortality,

because of the small number of events. Due to the lost to follow-up of the

patient and the absence of DNA samples, I were unable to perform PKD1/PKD2

LR-PCR/Sanger sequencing for the subjects with NM. Therefore, I could not

present the exact specificity parameter of NGS. In addition, I used the ellipsoid

equation to measure TKV instead of stereology, which is less accurate.

In conclusion, I performed a comprehensive genotype study using one of the

largest Asian ADPKD cohorts, HOPE-PKD, and demonstrated the characteristics

of variants and mutations in Korean patients. The results also strongly suggest

that an analysis of the ADPKD genotype can contribute to the selection of

rapidly progressing patients for new emerging therapeutic interventions. In

regard to the prognostic score combining genetic and clinical information, the

PRO-PKD score has been developed earlier for the European. Likewise, the

development of a score suitable for Koreans is deemed necessary in future

studies.

METHODS

Subjects

A total of 866 subjects from 641 unrelated families registered in the

HOPE-PKD cohort between 2009 and 2016 were screened. All methods were

carried out in accordance with relevant guidelines and regulations. The study

protocol was approved by the Institutional Review Board of Seoul National

13

University Hospital (IRB No; 1205-112-411). The informed consent was obtained

from all subjects before performing study.

Subjects aged over 18 who had compatible imaging findings using the Unified

criteria17 with or without family history were enrolled in HOPE-PKD cohort.

Among the screened patients, 117 patients were excluded from the analysis

including those who had active cancer (n=4) or liver cirrhosis (n = 4) and

chronic HBV or HCV hepatitis (n = 18) that may change liver volume or affect

renal function and those who had no kidney or liver volume data available

(n=91, see Supplementary Fig. S5 online). A total of 749 subjects from 524

unrelated families were included in the analysis.

Data Collection

Baseline demographic profiles were collected including age, gender, height,

comorbidities (hypertension, diabetes, chronic liver disease, and cancer), familial

history of ESRD and their age at the diagnosis of ESRD. Hypertension was

considered present when the subject had either systolic blood pressure > 140

mmHg or diastolic blood pressure > 90 mmHg or was receiving treatment with

antihypertensive medications.

Serum creatinine was measured using the Jaffe method and traced using

isotope dilution mass spectrometry. The Chronic Kidney Disease Epidemiology

Collaboration formula was used to calculate eGFR18. ESRD was defined as either

an eGFR < 15 mL/min/1.73m2 or initiation of renal replacement therapy.

Participants underwent non-contrast computed tomography (CT) exams every 2

to 3 years using multi-detector CT scanners. TKV was measured by the

ellipsoid method an dadjusted to height19 by a single skilled technician.

14

Variant Screening of PKD1 and PKD2 by TES

Genetic analysis of the PKD1 and PKD2 genes was performed using TES.

For cases with inadequate results, additional Sanger sequencing for exon 1 of

PKD1 gene and MLPA were done. For determining pathogenicity, familial

segregation analysis was performed.

For TES, a customized 120-mer RNA bait was designed to capture all exons

of the PKD1 and PKD2 genes and their flanking intronic sequences16 (see

Supplementary Table S3 and S4 online). The bait was tiled 3 times to increase

the coverage of poorly covered regions. For the generation of standard target-all

exon capture libraries, the Agilent SureSelect™ target enrichment protocol for

the Illumina paired-end sequencing library (version B.3, June 2015) was used

together with an input of 500 ng of genomic DNA and sequenced using the

HiSeq™ 2500 platform (Illumina, San Diego, CA, USA). In all cases, the

SureSelectXT Custom probe set was used.

Sequencing reads were first aligned with the human genome reference

sequence (hg19) using BWA version 0.7.5a with the MEM algorithm (default

options). To minimize false-positive and false-negative calls, I only used

uniquely mapped reads that were properly paired to avoid mapping reads that

aligned to both target regions and pseudogenes. In addition, only selected reads

with a high mapping quality (>30) were used. SAMTOOLS version 1.2 and

Picard version 1.127 (http://picard.sourceforge.net) were used to process

SAM/BAM files to duplicate the marking. Specifically, local realignment to

reduce any misalignment in the duplicated regions was performed.

I used RealignerTargetCreator and IndelRealigner from GATK version 3.3-1

with known single-nucleotide polymorphisms (SNPs) and indels from dbSNP142,

Mills and 1000G gold-standard indels at hg19 sites, and 1000G phase 1 indels at

15

hg19 sites. Known SNPs and indels were also used to perform a base

calibration. For calling variants, Unified Genotyper was used, and the called

variants were recalibrated by GATK based on dbSNP142, Mills indels, HapMap,

and Omni. ANNOVAR was used to annotate the variants. The depth of

coverage approach was applied to detect any large rearrangement or copy

number variations. The CalculateHsMetrics module in PICARD was used to

calculate the statistics of each parameter of the samples and depth of each exon,

and the copy number ratio was characterized by the normalized depth of the

exons from all the patients (see Supplementary Fig. S6 online).

Prioritization of Detected Variants

To identify causal variants, I first selected exonic and splicing variants

including nonsynonymous variants and small indels. Variants with an allele

frequency over 1% were discarded based on NHLBI-ESP 6500, the 1000 Genome

Project, and an in-house database consisting of exomes of 192 (Samsung

Medical Center, Korea) and 397 (Korean Bioinformation Center, Korea) normal

Korean individuals. Variants that were not reported in dbSNP142 were included,

and those with low-quality reads (<20) and low genotype quality (<30) were

excluded (see Supplementary Fig. S6 online).

The detected variants were classified into seven categories of nonsense,

frameshift, typical splicing, large deletion, missense, small in-frame indel and

NV. Seven classes of variants were further divided into five grades according to

their pathogenicity: DP, HLP, LP, LN, and I. The DP variants were defined as

all the variants that result in protein truncation, including nonsense, frameshift,

typical splicing, and large deletion variants or those previously established as DP

in PKDB or previous articles. The HLP variants were defined as missense

16

variants or small in-frame indels that were previously established as HLP in

PKDB or previous articles. The LP variants were defined as missense variants

or small in-frame indels that were previously established as LP in PKDB or

previous articles or variants that were predicted to be damaging using in-silico

analysis and segregated within a family. Variants were thought to be damaging

if the results of in-silico analysis satisfied two of the following conditions: 1)

damaging as predicted by a SIFT score ≤0.05, 2) damaging as predicted by

Polyphen-2 (HumDiv), and 3) GERP ++ score ≥4. The definitions of grades are

described in Supplementary Table S5 and Supplementary Fig. S7 online.

For genotype-phenotype analysis, PKD1 or PKD2 mutations were further

classified into four PKD genotypes according to the previously described manner

14;PKD1-PT, PKD1-ID, PKD1-NT, and PKD2 genotypes.

Validation of variants detected by TES using LR-PCR/Sanger sequencing

The validation set (n=80) was developed to confirm the variants that was

discovered by NGS approach by LR-PCR/Sanger sequencing. The pathogenic

variants detected by TES were confirmed by direct sequencing of the

corresponding gene regions using the ABI3730xl Genetic Analyzer (Applied

Biosystems, Foster City, CA). For the duplicated part of PKD1, long-range PCR

followed by nested-PCR was performed using PKD1-specific primers. The

results are summarized in the Supplementary Table S6 online.

Among 80 variants, 74 were PKD1 variants and 6 were PKD2. Among 74

PKD1 variants, 70 variants were detected in duplicated region (exon 1-33 in

PKD1 gene). I performed target specific Sanger sequencing for all 80 variants

found by NGS approach. Among them, 79 variants were confirmed by Sanger

sequencing (98.8%). The unconfirmed variant was a missense variant c.C2081T

17

(p.Pro694Leu) in exon 10 of PKD1. Although I did not perform LR-PCR/Sanger

sequencing for whole PKD1/PKD2 regions, I believe TES is efficient in variant

detection even in the duplicated region.

Nevertheless, there were two challenges using NGS approach. The first

problem was low read depth or coverage of exon 1 of PKD1 gene. This resulted

from high GC contents of exon 1 compared to other exons (See Supplementary

Figure S8). To overcome the problem, I performed additional exon 1 Sanger

sequencing of PKD1 gene and found seven additional variants. The other

problem was pseudogenic issue. In addition to validation study, I discarded

variants with high allele frequency > 1%. The average allele frequency of the

variants detected in the pseudogenic region was 3.7 * 10-3. Since I performed

variant calling only for the mapping quality > 20, the mapping quality of the

reads simultaneously aligned to PKD1 and PKD1P1 to 6 was filtered out to 0,

and variant calling was performed only using PKD1 unique mapped read.

Therefore, it is difficult to know pseudogene variants with my analysis method.

I have examined the low VAF mutations in the NGS data, but it is difficult to

know and verify whether variants are also in PKD1P1 to 6.

Improving Mutation Detection Rate by Exon 1-Specific Sanger Sequencing

and MLPA

The subjects with LN or I variants or those in whom NV was detected

underwent Sanger sequencing of PKD1 exon 1 and MLPA. Firstly, Sanger

sequencing of PKD1 exon 1 was performed because variants might not be

detected by NGS owing to low coverage of exon 1. For the duplicated part of

PKD1, long-range PCR followed by nested PCR was performed using

PKD1-specific primers as described before with minor modifications 20,21. In

18

brief, 300-400 ng of genomic DNA was amplified in a final volume of 50 μL,

containing, 0.2 μmol/L of each primer (Cosmogentech, Korea), 0.5 mol/L betaine

(Sigma, USA), 5% dimethyl sulfoxide (Sigma, USA) in the AccuPower ProFi

Taq PCR Premix (Bioneer, Korea) as a Touchdown protocol. The long-range

PCR products were purified with the Qiaquick PCR purification kit and were

further amplified using a nested PCR. The long-range PCR for the PKD1 exon

was done using the AccuPower HotStart PCR Premix (Bioneer, Korea). Sanger

sequencing was performed using the ABI3730xl Genetic Analyzer (Applied

Biosystems, Foster City, CA, USA). In addition, MLPA was performed to detect

large deletion cases if both TES and PKD1 exon 1 sequencing revealed NM.

MLPA analysis was performed with a SALSA MLPA KIT P351-B1/P352-B1

PKD1-PKD2 kit (MRC-Holland, Amsterdam, the Netherlands) according to the

manufacturer’s instructions 12 and single probe exon deletions via MLPA was

confirmed by direct sequencing that they are not due to SNP under the probe

(for the primer, see Supplementary Table S7 online). Finally, I looked into 7

variants found through MLPA. The read depth did not show any particular

problems for the cases. However, only 3 out of 6 variants were detected. The

Normalized read depth in PKD1 and PKD2 of mutations that detected via MLPA

was shown in Supplementary Figure S9 online. It was not a matter of read

depth but a technical limitation of NGS approach in the case of large deletion.

Familial Segregation Analysis

When a novel LP variant appeared after variant calling, familial segregation

analysis was performed to confirm pathogenicity of the variant. Family members

of the subject with LP variant were additionally enrolled for familial segregation

study. The parents of the subjects were recruited first. If the parents were

absent, siblings with and without ADPKD were recruited for segregation

19

analysis. The presence or absence of ADPKD was screened with portable

sonography (SONON 300C, Healcerion, Korea), and additional CT scans were

performed in non-diagnostic cases. For familial segregation analysis, I utilized

both direct Sanger sequencing and TES.

Statistical Analysis

Statistical analyses were performed using SPSS version 23.0 (IBM Corp.,

Armonk, NY, USA). Analysis of covariance, the Mann-Whitney test, and the

chi-square test were performed to compare variables between two groups. P

values <0.05 were considered statistically significant. A multilevel linear

regression model in Stata 14 (StataCorp, College Station, TX, USA) was

performed to identify the correlations between htTKV or eGFR and PKD

genotypes. To control the clustering effect, variance estimation was also done

by the clustered sandwich estimator. For survival analysis, multivariate analysis

using a multilevel Cox proportional hazard model (frailty model) adjusting for

age, gender, and the family-level clustering effect was performed to determine

risk factors for ESRD or mortality. The linearity assumption of continuous

variables was verified, and proportional hazards assumption of categorical

variables was verified using a log-minus-log plot.

20

FIGURE LEGENDS and TABLES

Figure 1. Summary of identified mutations

A. Prevalence of PKD1 vs. PKD2 mutations and cases of no mutation

among 749 subjects (524 families); B. Prevalence of PKD1 vs. PKD2

mutations among 630 subjects (423 families without NM families [n=101]);

C. Frequency of each class in 423 families according to PKD1/2 status;

D. Frequency of each mutation class and grade in 524 families.

Of the 749 subjects, 68% were classified as PKD1, 16% as PKD2, and 16%

as no mutation (NM). Excluding NM subjects, 81% of the 630 subjects and 82%

of the 423 families had PKD1 mutations. The mutation classes in the 423

families that were obtained by excluding the 101 NM families from the original

524 families are shown in Fig. 1C. In PKD1, frameshift mutations were the

most common (39%), and in PKD2, nonsense mutations were the most common

(52%). The prevalence of classes of variants (number of families that shared the

same variants) among the 524 families was shown in Fig 1D.

Abbreviations. DP, definitively pathogenic; HLP, highly likely pathogenic; LP,

likely pathogenic; LN, likely neutral; I, indeterminate; NV, no variant.

Figure 2. Cross-sectional analysis between age and log(htTKV) or eGFR

A. Scatter plot and trend line of log (htTKV) according to genotype. B.

Scatter plot and trend line of log (htTKV) according to genotype.

PKD1-PT genotype had a larger htTKV or lower eGFR than the PKD1-NT

or PKD2 genotypes. log(htTKV) increased rapidly with age in the PKD1-PT

and PKD2 genotypes, but it did so less intensely in the PKD1-NT genotype

21

and those with NM. The PKD1-PT genotype also showed a steeper slope for

eGFR than the other genotypes except PKD1-ID genotype.

Abbreviations. htTKV, height-adjusted total kidney volume; eGFR, estimated

glomerular filtration rate; PKD1-PT, PKD1 protein-truncating; PKD1-NT,

PKD1 non-truncating; PKD1-ID, PKD1 small in-frameshift indels.

Figure 3. Renal outcomes and death according to type of PKD1/2

mutation. A. Survival analysis of ESRD progression. B. Survival analysis

of all-cause mortality

When PKD1-PT genotype was used as a reference, the PKD2 genotype

showed a 0.22 times lower HR of ESRD. The familial clustering effect was

significant (P=0.037). The incidence of death did not have difference according to

genotypes.

Abbreviations. ESRD, end-stage renal disease; PKD1-PT, PKD1

protein-truncating; PKD1-NT, PKD1 non-truncating; PKD1-ID, PKD1 small

in-frameshift indels; HR, hazard ratio; CI, confidence interval.

22

Table 1. Frequency of each class and grade in 524 families

A. Class Nonsense

Frameshift

Typ icalsplicing

L a r g edel/dup

Missense

In-frameindel NV

■ PKD1 86 (19) 136 (24) 16 (3) 8 (0) 90 (17) 12 (3)

■ PKD2 39 (20) 11 (0) 8 (1) 5 (1) 12 (4) 0 (0)

■ NM 53 (20) 48

Total 125 (39) 147 (24) 24 (4) 13 (1) 155 (41) 12 (3) 524 (112)

B. Grade DP HLP LP LN I NV

■ PKD1 284 (46) 61 (20) 39 (0)

■ PKD2 63 (22) 10 (4) 2 (0)

■ NM 39 (19) 14 (1) 48

Total 311 (68) 71 (24) 41 (0) 39 (19) 14 (1) 524 (112)

The number in () means the number of families that shared the same variants

Abbreviation. DP, definitively pathogenic; HLP, highly likely pathogenic; LP, likely

pathogenic; LN, likely neutral; I, indeterminate; NV, no variant

23

Table 2. Baseline Clinical Characteristics According to PKD Genotypes

Variables Total PKD1-PT PKD1-ID PKD1-NT PKD 2 P value

N (%) 630 371 (59.0) 21 (3.0) 119 (19.0) 119 (19.0)

Male, n (%) 304 (48.3) 175 (47.2) 12 (57.1) 68 (57.1) 49 (41.2) 0.975

Age, [mean±SD] 45.6±12.9 44.2±12.9 45.0±10.6 43.3±12.6 48.9±13.0 0.001

1st visit age, 569,

[mean±SD]41.9±12.9 40.2±12.9 39.8±10.0 43.3±12.6 45.9±13.0 <0.001

Hypertension, 590,n (%)

451 (71.6) 266 (71.7) 16 (80.0) 88 (77.2) 81 (75.7) 0.969

Hypertension Dx age,

316,[mean±SD]38.8±10.0 37.3±10.1 38.6 ±7.4 39.0 ± 8.9 43.4±10.5 <0.001

ESRD 122 (19.4) 84 (22.6) 4 (19.0) 22 (18.5) 12 (10.1) 0.004

ESRD age, 122,

[mean±SD]53.3±9.7 52.5±9.5 54.5±14.3 52.5±7.7 60.5±10.7 0.085

Death 23 (3.7) 13 (3.5) 1 (4.8) 5 (4.2) 4 (3.4) 0.908

Death age, 23,

[mean±SD]62.8±13.8 61.4±16.2 63.4 63.9±9.1 65.7±14.6 0.745

htTKV,

[median [IQR]]

781

[432, 1356]

898

[476, 1546]

901

[544, 1271]

609

[397, 1179]

629

[373, 1160]<0.001

eGFR by CKD-EPI,

620, [mean±SD]65.9±40.0 63.5± 42.2 64.7±38.3 66.6±37.8 73.1±33.8 0.052

*Grade,

[n(%)]

1A 44 (7.0) 18 (4.9) 1 (4.8) 14 (11.8) 11 (9.2) <0.001

1B 158 (25.1) 70 (18.9) 2 (9.5) 35 (29.4) 51 (42.9)

1C 206 (32.7) 127 (34.2) 11 (52.4) 36 (30.3) 32 (26.9)

1D 153 (24.3) 99 (26.7) 7 (33.3) 25 (21) 22 (18.5)

1E 69 (11.0) 57 (15.4) 0 (0) 9 (7.6) 3 (2.5)

P for trend by the Jonckheere-Terpstra test

The eGFR of subjects with dialysis or kidney transplantation was regarded as 15

mL/min/1.73 m2.

Abbreviations. PT, protein-truncating; ID, insertion/deletion; NT, non-truncating, SD,

standard deviation; Dx, diagnosis, ESRD, end-stage renal disease; eGFR, estimated

glomerular flow rate; CKD-EPI, Chronic Kidney Disease Epidemiology Collaboration

24

Table 3. A. Multilevel multivariate linear regression of log(htTKV) or

eGFR according to genotype B. Multivariate linear regression of

log(htTKV) or eGFR for comparing each slope according to genotype

A. Log(htTKV) eGFR

*Variable Coefficient [95% CI] P value Coefficient [95% CI] P value

Age 0.012 [0.010, 0.0.013] <0.001 -1.840 [-2.012, -1.668] <0.001

Male 0.102 [0.062, 0.143] <0.001 -6.535 [-11.030, -2.041] <0.001

PKD1 PT reference reference

PKD1 ID -0.033 [-0.186, 0.120] 0.670 2.668 [-112.914, 17.250] 0.720

PKD1 NT -0.153 [-0.220, -0.085] <0.001 7.278 [0.521, 14.034] 0.035

PKD2 -0.161 [-0.232, -0.090] <0.001 18.476 [11.469, 25.482] <0.001

Constant 2.378 [2.297, 2.458] <0.001 148.405 [139.565, 157.246] <0.001

B. b t P value b t P value

PKD1 PT vs. PKD1 ID -0.02 -0.103 0.918 -0.114 -0.671 0.502

PKD1 PT vs. PKD1 NT -0.479 -3.189 0.002 0.393 2.891 0.004

PKD1 PT vs. PKD2 -0.505 -4.036 <0.001 0.428 3.659 <0.001

PKD1 PT vs. NM -0.719 -4.785 <0.001 0.622 4.507 <0.001

PKD1 ID vs. PKD1 NT -0.453 -1.032 0.304 0.667 1.68 0.095

PKD1 ID vs. PKD2 0.035 0.086 0.931 0.545 1.524 0.13

PKD1 ID vs. NM -0.713 -1.47 0.144 0.718 2.351 0.02

PKD1 NT vs. PKD2 0.601 2.476 0.014 -0.26 -1.156 0.249

PKD1 NT vs. NM -0.244 -0.936 0.35 0.242 1.006 0.316

PKD2 vs. NM -0.874 -3.614 <0.001 0.544 2.448 0.015

* adjusted for age and gender

• P value of Log(htTKV) for familial influence = 0.001

• P value of eGFR for familial influence = 0.029

** Dependent variable=Log(htTKV), eGFR

• By linear regression for verifying moderating effect adjusting for age and

gender

• b is the linearity between the two genotypes, and when |b|>1, it is significant;

and t is the significance of b coefficient, and when |t| is greater, it is more

significant.

25

Figure 1

26

Figure 2

Figure 3

27

SUPPLEMENTARY INFORMATION

Supplementary Table S1. Demographic Findings of 749 Patients (524

Families)

Variable Total Male Female P value

N (%) 749 360 (48.1) 389 (51.9)

Age [mean±SD] 46.4±13.3 44.6±14.4 48.0±12.0 <0.001

Age groups, n(%)

~30 88 (11.8) 62 (17.2) 26 (6.7) 0.001

30~40 149 (19.9) 69 (19.2) 80 (20.6)

40~50 210 (28.0) 93 (25.8) 117 (30.1)

50~60 183 (24.4) 86 (23.9) 97 (24.9)

60~70 95 (12.7) 41 (11.4) 54 (13.9)

70~ 24 (3.2) 9 (2.5) 15 (3.9)

eGFR, [mean±SD] 65.8±38.9 65.6±39.4 66.0±38.5 0.896

CKD stages, n(%) 0.945

Stage I 233 (32.6) 107 (31.1) 126 (34.0)

Stage II 198 (27.7) 96 (26.7) 102 (27.5)

Stage IIIa 85 (11.9) 42 (11.7) 43 (11.6)

Stage IIIb 40 (5.6) 21 (5.8) 19 (5.1)

Stage IV 22 (3.1) 12 (3.3) 10 (2.7)

Stage V 137 (19.2) 66 (18.3) 71 (19.1)

eGFR of subjects with RRT or kidney transplantation was regarded as 15 mL/min/

1.73 m2.

Abbreviations. SD, standard deviation; eGFR, estimated glomerular flow rate

28

Family ID Exon Codon cDNA change Protein change Predictedeffect

Domain PKDB CanadaDB

Koreanindividuals

Koreanfamilies

716 1 5 c.13_25del p.Ala5TrpfsX53 Frameshift LRR 1 0 1 1

61 1 9 c.25_26dup p.Ala9TrpfsX64 FrameshiftExtracellularsequence

0 0 1 1

126 1 15 c.43_62del p.Leu15AlafsX92 Frameshift Extracellularsequence 0 0 3 1

57 1 33 c.78_96dup p.Cys33ArgfsX87 Frameshift LRR 0 0 1 1

473, 886 1 44 c.129_135delCGGCGCC p.Gly44ProfsX27 Frameshift LRR 0 0 2 2

904 1 48 c.142del p.Arg48AlafsX25 Frameshift LRRCT 0 0 1 1

260 1 56 c.165_171delGCTGCGG p.Leu56ArgfsX15 Frameshift LRR 1 0 1 1

185 4 123 c.369del p.Ser123ArgfsX167 Frameshift Extracellularsequence 0 0 1 1

224 4 142 c.423_424del p.Glu142AlafsX36 Frameshift LRRCT 0 0 2 1

415 5 275 c.822_832del p.Ala275GlyfsX92 Frameshift PKD 1 0 0 1 1

181, 478,657 5 286 c.856_862del p.Gly286X Frameshift PKD 1 1 3x 3 3

570 6 442 c.1324del p.Ala442ProfsX23 Frameshift C-type lectin 0 0 1 1

Supplementary Table S2. Mutations found in this study

Supplementary Table S2A. PKD1 protein-truncating mutations in study families

29

16 7 499 c.1495del p.Glu499SerfsX59 Frameshift C-type lectin 0 0 3 1

67 8 557 c.1669_1670del p.Leu557AspfsX29 FrameshiftExtracellularsequence

0 0 1 1

649 9 592 c.1776del p.Glu592AsnfsX192 Frameshift Extracellularsequence

0 0 1 1

599 10 630 c.1889dup p.Asp630GlyfsX83 FrameshiftExtracellularsequence

0 0 1 1

229 10 668 c.2001dup p.Gly668TrpfsX46 FrameshiftLDL-receptor

class A;aTypical

0 0 2 1

363 10 681 c.2040dup p.Ala681CysfsX33 FrameshiftExtracellularsequence 0 0 1 1

302, 414 10 696 c.2085dup p.Ala696ArgfsX18 Frameshift Extracellularsequence 0 0 2 2

793 11 797 c.2390dup p.Val797GlyfsX19 Frameshift PKD 2 0 0 1 1

611 11 824 c.2470del p.Leu824CysfsX74 Frameshift Extracellularsequence 0 0 2 1

79, 204 11 832 c.2494dup p.Arg832ProfsX40 Frameshift Extracellularsequence 0 0 2 2

252 11 873 c.2618_2621del p.Val873AlafsX24 Frameshift PKD 3 1 1x 5 1

20 11 887 c.2659del p.Trp887GlyfsX11 Frameshift PKD 3 0 0 12 1

73 11 906 c.2716del p.Glu906SerfsX21 Frameshift PKD 3 0 0 2 1

110 11 928 c.2784_2796del p.Glu928ValfsX18 Frameshift PKD 3 0 0 1 1

541 12 955 c.2865del p.Val955TrpfsX20 Frameshift PKD 4 1 0 1 1

318 12 966 c.2896del p.Arg966GlyfsX10 Frameshift PKD 4 0 0 1 1

30

197 13 1029 c.3085_3088del p.Ala1029CysfsX8 Frameshift PKD 5 0 0 1 1

146 14 1075 c.3223_3226del p.Phe1075ArgfsX28 Frameshift PKD 5 0 0 1 1

54 15 1108 c.3323del p.Ser1108LeufsX7 Frameshift PKD 5 0 0 5 1

4 15 1117 c.3349dup p.Gln1117ProfsX19 Frameshift PKD 5 0 0 3 1

748 15 1168 c.3503_3504del p.Pro1168ArgfsX42 Frameshift PKD 6 1 0 1 1

900 15 1174 c.3520_3521del p.Gln1174AlafsX36 Frameshift PKD 6 0 0 1 1

245 15 1205 c.3613_3622del p.Asp1205SerfsX12 Frameshift PKD 6 0 0 3 1

160 15 1228 c.3684del p.Val1228TrpfsX44 Frameshift PKD 7 1 0 2 1

47 15 1248 c.3744_3754del p.Asp1248AlafsX48 Frameshift PKD 7 0 0 2 1

265 15 1303 c.3906dup p.Ala1303ArgfsX8 Frameshift PKD 8 0 0 1 1

209 15 1347 c.4041_4042del p.His1347GlnfsX83 Frameshift PKD 8 1 X1 1 1

365, 887 15 1357 c.4069del p.Leu1357TrpfsX9 Frameshift PKD 8 1 0 2 2

76 15 1357 c.4070del p.Leu1357ArgfsX9 Frameshift PKD 8 1 0 5 1

15 15 1439 c.4315_4324delinsCAACTGTTG p.Gly1439GlnfsX5 Frameshift PKD 9 0 0 3 1

279 15 1460 c.4379_4380del p.Val1460GlyfsX62 Frameshift PKD 9 0 0 1 1

368 15 1467 c.4401_4407del p.Val1467AlafsX64 Frameshift PKD 9 0 0 1 1

369 15 1478 c.4434del p.Leu1478TrpfsX55 Frameshift PKD 8 0 0 1 1

888 15 1529 c.4586del p.Gly1529AlafsX5 Frameshift PKD 10 0 0 1 1

31

732 15 1551 c.4650dup p.Leu1551AlafsX27 Frameshift PKD 11 0 0 1 1

21 15 1560 c.4679_4691del p.Val1560GlyfsX3 Frameshift PKD 11 0 0 4 1

199 15 1599 c.4797del p.Tyr1599X Frameshift PKD 11 0 novel 1 1

500 15 1642 c.4924del p.Arg1642AlafsX80 Frameshift REJ 0 0 1 1

1, 33, 336,457, 542,562, 575,635, 666

15 1672 c.5014_5015del p.Arg1672GlyfsX98 Frameshift PKD 12 1 29x 17 9

147, 553 15 1841 c.5517_5521dup p.Val1841GlyfsX110 Frameshift PKD 14 0 0 2 2

901 15 1877 c.5629del p.Ala1877ProfsX72 Frameshift PKD14 0 0 1 1

582 15 2075 c.6224_6225insGTTG p.Ser2075LeufsX6 Frameshift PKD 11 0 0 1 1

364 15 2114 c.6341_6344del p.Tyr2114X Frameshift PKD 17 0 0 1 1

238 15 2183 c.6549_6550del p.Glu2183ValfsX77 Frameshift REJ 0 0 3 1

215 15 2257 c.6770dup p.Pro2257AlafsX4 Frameshift REJ 0 0 1 1

84 15 2270 c.6808_6811del p.Asp2270HisfsX43 Frameshift REJ 1 0 1 1

376 16 2332 c.6994_7000del p.Ala2332TrpfsX7 Frameshift REJ 1 0 1 1

17 16 2334 c.6994_7000dup p.Val2334GlyfsX88 Frameshift REJ 1 0 6 1

136 16 2335 c.7003dup p.Glu2335GlyfsX85 Frameshift REJ 0 0 2 1

232 18 2449 c.7345dup p.Thr2449AsnfsX52 Frameshift REJ 0 0 1 1

288 18 2470 c.7408_7414del p.Pro2470TrpfsX148 Frameshift REJ 0 0 1 1

32

60 19 2527 c.7579_7580del p.Val2527LeufsX67 Frameshift REJ 0 0 1 1

183 19 2542 c.7625del p.Gly2542ValfsX78 Frameshift REJ 0 0 1 1

285 20 2581 c.7741_7742dup p.Thr2581GlnfsX39 Frameshift REJ 0 0 1 1

157, 655 20 2606 c.7816del p.Gln2606SerfsX14 Frameshift REJ 0 0 4 2

559 21 2648 c.7942del p.Glu2648ArgfsX6 Frameshift REJ 0 0 1 1

85 21 2658 c.7973_7974del p.Val2658GlyfsX2 Frameshift REJ 0 0 2 1

277, 539 22 2674 c.8019dup p.Pro2674AlafsX148 Frameshift REJ 0 0 2 2

78, 128 23 2776 c.8327_8330del p.Leu2776ArgfsX98 Frameshift REJ 0 0 4 2

551 23 2800 c.8399del p.Pro2800GlnfsX75 Frameshift REJ 0 0 1 1

315 23 2859 c.8570_8574dup p.Ala2859ProfsX18 Frameshift Extracellularsequence 0 0 1 1

592 23 2861 c.8581dup p.Ile2861AsnfsX76 Frameshift Extracellularsequence 0 0 1 1

586 23 2881 c.8642_8655del p.Asp2881GlyfsX51 Frameshift Extracellularsequence 0 0 1 1

556, 589,746 24 2951 c.8851dup p.Arg2951ProfsX4 Frameshift Extracellular

sequence 0 0 3 3

49, 789 25 3028 c.9083_9084del p.Glu3028GlyfsX40 Frameshift GPS 1 0 2 2

754 26 3080 c.9240_9241del p.Ala3080CysfsX96 Frameshift Transmembrane 0 0 1 1

397 28 3228 c.9683dup p.Leu3228ProfsX24 Frameshift PLAT 0 0 1 1

425 29 3262 c.9780_9784dup p.Ile3262SerfsX56 Frameshift Cytoplasmic 0 0 1 1

33

sequence

881 29 3281 c.9841del p.Ala3281ProfsX35 FrameshiftCytoplasmicsequence

0 0 1 1

627 29 3281 c.9843del p.Thr3281ProfsX34 FrameshiftCytoplasmicsequence

0 0 1 1

407 31 3382 c.10144dup p.Thr3382AsnfsX8 FrameshiftCytoplasmicsequence

0 0 1 1

766 31 3388 c.10162_10163dup p.Glu3388LeufsX8 FrameshiftCytoplasmicsequence 0 0 1 1

118 33 3419 c.10255dup p.Trp3419LeufsX7 Frameshift Cytoplasmicsequence

0 0 2 1

107 35 3529 c.10585_10586dup p.Gln3529HisfsX56 Frameshift Cytoplasmicsequence 0 0 1 1

39 36 3569 c.10706_10707del p.Val3569GlyfsX56 Frameshift Transmembrane 0 0 2 1

119 36 3581 c.10742del p.Pro3581ArgfsX3 Frameshift Extracellularsequence 1 0 2 1

161 37 3613 c.10836_10837del p.Tyr3613LeufsX12 Frameshift Cytoplasmicsequence 0 0 1 1

96 37 3655 c.10963del p.Leu3655TrpfsX28 Frameshift Cytoplasmicsequence 1 0 1 1

42 40 3757 c.11270_11274del p.Leu3757ProfsX56 Frameshift Extracellularsequence 0 0 2 1

188 40 3789 c.11365_11366del p.Asn3789TrpfsX25 Frameshift Extracellularsequence 0 0 2 1

14 40 3793 c.11376dup p.Thr3793AspfsX22 Frameshift Extracellularsequence 0 0 2 1

786 40 3793 c.11378del p.Thr3793SerfsX32 Frameshift Extracellularsequence 0 0 1 1

34

676 41 3817 c.11450dup p.Tyr3817LeufsX142 FrameshiftExtracellularsequence

1 0 1 1

141 41 3843 c.11527del p.Asp3843ThrfsX101 FrameshiftExtracellularsequence

0 0 1 1

600 42 3851 c.11551del p.Leu3851TrpfsX93 FrameshiftExtracellularsequence

1 0 1 1

9 42 3888 c.11661dup p.Ala3888CysfsX72 Frameshift Cytoplasmicsequence

0 0 3 1

396 42 3895 c.11684del p.Gly3895AlafsX49 FrameshiftExtracellularsequence

0 0 1 1

572 43 3994 c.11973_11980dup p.Leu3994ProfsX47 FrameshiftTransmem

brane0 0 1 1

812 43 3995 c.11984_11996del p.Phe3995SerfsX39 Frameshift Transmembrane 0 0 1 1

298 44 4014 c.12042del p.Phe4014LeufsX24 Frameshift Cytoplasmicsequence 0 0 2 1

322 44 4033 c.12097del p.Val4033TrpfsX5 Frameshift Transmembrane 0 0 1 1

36, 159 44 4034 c.12100del p.Val4034CysfsX4 Frameshift Transmembrane 0 0 6 2

540 45 4059 c.12175dup p.Gln4059ProfsX97 Frameshift Extracellularsequence 1 0 1 1

583 45 4066 c.12197del p.Pro4066LeufsX131 Frameshift Extracellularsequence 0 0 1 1

293 45 4073 c.12217_12218del p.Leu4073ValfsX82 Frameshift Extracellularsequence 0 0 1 1

77 45 4083 c.12247_12259del p.Pro4083TrpfsX110 Frameshift Extracellularsequence 0 0 2 1

94, 501 45 4102 c.12305del p.Ala4102ValfsX95 Frameshift Transmembrane 0 0 3 2

35

522 45 4103 c.12307_12310del p.Val4103PhefsX93 FrameshiftTransmem

brane0 0 1 1

38, 88 46 4202 c.12605_12632del p.Arg4202ProfsX146 FrameshiftCytoplasmicsequence

1 0 2 2

28 46 4224 c.12669_12670del p.Gln4224ValfsX2 FrameshiftCytoplasmicsequence

0 0 2 1

167, 382 3 117 c.350T>G p.Leu117X Nonsense Extracellularsequence

0 0 2 2

105, 860 4 135 c.405G>A p.Trp135X Nonsense LRRCT 0 0 4 2

89 5 232 c.696T>A p.Cys232X Nonsense WSC 0 0 1 1

100, 535 5 236 c.706C>T p.Gln236X Nonsense WSC 1 0 3 2

102 5 400 c.1198C>T p.Arg400X Nonsense Extracellularsequence 1 0 1 1

526 6 437 c.1309C>T p.Gln437X Nonsense C-type lectin 0 0 1 1

132 6 439 c.1316G>A p.Trp439X Nonsense C-type lectin 0 0 3 1

243 7 523 c.1568C>A p.Ser523X Nonsense C-type lectin 0 0 2 1

6 9 597 c.1789C>T p.Gln597X Nonsense Extracellularsequence 0 0 2 1

576 10 680 c.2040T>G p.Tyr680X Nonsense Extracellularsequence 0 0 1 1


557 11 901 c.2703G>A p.Trp901X Nonsense PKD 3 0 0 1 1

385 12 987 c.2959C>T p.Gln987X Nonsense PKD4 1 0 1 1

36

269 13 1023 c.3067C>T p.Gln1023X Nonsense PKD 5 0 0 1 1

2 15 1112 c.3334G>T p.Glu1112X Nonsense PKD 5 0 0 2 1





469 15 1436 c.4306C>T p.Arg1436X Nonsense PKD 9 1 1x 1 1

24, 59, 91,135, 354,

67415 1483 c.4447C>T p.Gln1483X Nonsense PKD 10 1 0 8 6

403 15 1599 c.4797C>A p.Tyr1599X Nonsense PKD 11 0 novel 2 1


66, 127,439, 552 15 1826 c.5477G>A p.Trp1826X Nonsense PKD 14 0 0 5 4




227 15 2039 c.6115C>T p.Gln2039X Nonsense PKD 16 1 1x 2 1

462, 493 15 2067 c.6199C>T p.Gln2067X Nonsense PKD 17 1 0 2 2

667 15 2164 c.6491C>G p.Ser2164X Nonsense REJ 1 0 1 1

37

29, 137,892

15 2184 c.6549dup p.Glu2184X Nonsense REJ 1 0 5 3

156 15 2187 c.6561G>A p.Trp2187X Nonsense REJ 0 0 1 1

81 15 2246 c.6736C>T p.Gln2246X Nonsense REJ 1 0 1 1


35, 43 17 2388 c.7164C>G p.Tyr2388X Nonsense REJ 0 0 3 2

734 18 2405 c.7214G>A p.Trp2405X Nonsense REJ 0 0 1 1

345 18 2430 c.7288C>T p.Arg2430X Nonsense REJ 1 7x 1 1


70, 267 20 2606 c.7816C>T p.Gln2606X Nonsense REJ 1 0 2 2

324, 883 21 2639 c.7915C>T p.Arg2639X Nonsense REJ 1 5x 3 2


629 23 2738 c.8212G>T p.Glu2738X Nonsense REJ 0 0 1 1

452 23 2742 c.8224G>T p.Glu2742X Nonsense REJ 0 0 1 1

797 23 2810 c.8428G>T p.Glu2810X Nonsense REJ 1 3x 1 1


761, 810 27 3183 c.9547C>T p.Arg3183X Nonsense PLAT 1 0 2 2

588 28 3195 c.9585G>A p.Trp3195X Nonsense PLAT 0 0 2 1

388 28 3206 c.9616C>T p.Gln3206X Nonsense PLAT 1 0 1 1

38

890 30 3350 c.10048A>T p.Lys3350X NonsenseCytoplasmicsequence

0 0 1 1

145 32 3395 c.10180C>T p.Gln3394X NonsenseCytoplasmicsequence

1 0 1 1


1 0 2 1

104 34 3478 c.10432G>T p.Glu3478X Nonsense Cytoplasmicsequence

0 0 1 1


1 0 1 1

148 36 3603 c.10806G>A p.Trp3602X NonsenseTransmem

brane0 0 2 1

7 37 3620 c.10855A>T p.Lys3619X Nonsense Cytoplasmicsequence 0 0 2 1


567 39 3755 c.11263G>T p.Glu3755X Nonsense Extracellularsequence 0 0 1 1

607 40 3796 c.11388T>A p.Tyr3796X Nonsense Extracellularsequence 0 0 1 1

239 41 3807 c.11420G>A p.Trp3807X Nonsense Extracellularsequence 1 0 5 1

5, 117 41 3808 c.11423G>A p.Gly3808X Nonsense Extracellularsequence 1 0 5 2

75 42 3871 c.11611G>T p.Glu3871X Nonsense Extracellularsequence 1 0 2 1

74 43 3921 c.11763G>A p.Trp3921X Nonsense Cytoplasmicsequence 1 3x 1 1

93 44 4003 c.12007C>T p.Gln4003X Nonsense Cytoplasmicsequence 1 4x 1 1

39

22 44 4041 c.12121C>T p.Gln4041X Nonsense PKD 3 1 8x 6 1

109 45 4065 c.12195C>A p.Cys4065X NonsenseExtracellularsequence

0 0 1 1

64 45 4126 c.12377dup p.Tyr4126X Nonsense Cytoplasmicsequence

0 0 2 1


0 0 1 1

625 12 951 c.2853+1G>ATypicalsplicing

PKD 4 1 0 1 1

549 15 1098 c.3295+1G>TTypicalsplicing

PKD 5 0 0 1 1

274 20 2568 c.7703+1G>C Typicalsplicing REJ 0 0 1 1

246, 538,656, 884 21 2673 c.8017-2_8017-1d

elTypicalsplicing REJ 1 0 5 4

129 23 2672 c.8017-2A>G Typicalsplicing REJ 1 0 2 1

171 26 3067 c.9201+1G>C Typicalsplicing

Transmembrane 1 0 1 1

638 33 3406 c.10217+1G>A Typicalsplicing

Cytoplasmicsequence 0 0 1 1

213 32 3476 c.10429-2_10429-1delCA

Typicalsplicing


595 39 3671 c.11014-1del Typicalsplicing


34 41 3757 c.11270-2A>C Typicalsplicing

Extracellularsequence 0 0 3 1

608 44 3903 c.11710-1G>T Typicalsplicing

Transmembrane 0 0 1 1

40

194 44 3904 c.11713-1G>ATypicalsplicing

Transmembrane

0 0 1 1

312 45 4000 c.12001-2A>GTypicalsplicing

Transmembrane

0 0 1 1

299 4 exon 4 exon 4 Large deletion 0 0 1 1



628 15 2219 c.6657_6671del p.Arg2219_Pro2224del

Large deletion REJ 1 0 1 1

162 19 exon19 dup exon19 dup Largeduplication 0 0 2 1


23 46 3791 c.11373_11390del p.Gly3791_Ser3797del Large deletion Extracellular

sequence 0 0 3 1

643 46 4201 c.12601_12628del p.Gly4201SerfsX147 Large deletion Cytoplasmicsequence 0 0 1 1

Supplementary Table S2B. PKD1 in-frame insertion/deletions in study families

41

FamilyID

Exon Codon cDNA change Protein change Domain PROVEANscore

PKDB

CanadaDB

Koreanindividu

als

Koreanfamilies

Segregation

patients/control

99 36 3565 c.10694_10696del p.Val3565delTransmem

brane-7.49 0 0 2 1

26, 27,679

10 689 c.2065_2067del p.Ser689delExtracellularsequence

-5.37 0 0 6 3 5,0

133 15 2088 c.6258_6263dup p.Pro2087_Arg2088dup PKD 17 -5.26 0 0 1 1 1,1

216 15 2217 c.6650_6664del p.Val2217_Leu2221del REJ -16.42 1 0 1 1

482 15 2251 c.6752_6754del p.Val2251del REJ -7.19 0 0 1 1

87 20 2613 c.7837_7839del p.Leu2613del REJ -9.98 1 0 3 1 2,0

130 23 2770 c.8308_8310del p.Asn2770del REJ -11.28 0 0 3 1 3,0

124,208 24 2979 c.8935_8937del p.Phe2979del Extracellular

sequence -10.76 1 0 2 2

37 25 3031 c.9092_9094del p.Leu3031del GPS -14.07 0 0 2 1 2,0

Supplementary Table S2C. PKD1 non-truncating mutations in study families

42

FamilyID

Exon Codon cDNAchange

Proteinchange

Domain SIFT polyphen

GERP++

PKDB

CanadaDB

Koreanindividual

Koreanfamily

Segregation

patients/control

334 2 75 c.224C>T p.Ser75Phe LRR 1 0 1 4.27 1 0 4 1

355 2 93 c.278T>C p.Leu93Pro C-typelectin

0 1 4.27 0 0 1 1

898 3 120 c.359T>C p.Ile120ThrExtracellularsequence

00.873

4.26 1 0 1 1

304 5 381 c.1141G>A p.Gly381ArgExtracellularsequence

0.090.999

5.05 0 0 1 1

264 6 419 c.1256G>A p.Cys419Tyr C-typelectin 0 1 5.05 0 0 1 1

186 7 466 c.1396G>A p.Val466Met C-typelectin 0.01 1 4.82 1 2x 2 1

281, 297,465 7 515 c.1543G>T p.Gly515Trp C-type

lectin 0 1 4.85 1 0 4 3

41, 306 7 464 c.1391T>C p.Leu464Pro C-typelectin 0.37 0.99

9 4.82 0 0 3 2 3,0

233 10 693 c.2078G>A p.Gly693Glu Extracellularsequence 0.01 1 5.17 0 0 3 1 3,4

519 10 699 c.2095T>C p.Ser699Pro Extracellularsequence 0.21 0.94

3 4.01 0 0 1 1 2,1

342 11 791 c.2371C>T p.Arg791Trp PKD 4 0.08 0.999 4.43 0 0 1 1 2,0

616 11 869 c.2605C>T p.Arg869Cys PKD 12 0.04 0.992 3.48 0 0 1 1 0,1

248 11 944 c.2830C>T p.Arg944Cys PKD 4 0 1 4.96 1 0 1 1

43

241 11 845 c.2534T>C p.Leu845SerExtracellularsequence

0 1 5.09 1 0 2 1

405 12 960 c.2878G>A p.Gly960Ser PKD 4 0 1 4.79 1 0 1 1 0,1

444 14 1077 c.3229G>A p.Val1077Ile PKD 5 0.230.982 5.33 0 0 1 1

614 14 1055 c.3163T>G p.Trp1055Gly PKD 5 0.09 1 5.29 0 0 1 1

745 15 1587 c.4759C>T p.Arg1587Cys PKD 11 0.02 1 4.14 0 0 1 1

485 15 2085 c.6254C>G p.Pro2085Arg PKD 17 0.04 1 5.49 0 0 1 1

591 15 2215 c.6643C>T p.Arg2215Trp REJ 0.010.999 5.49 1 1x 1 1

530 15 2235 c.6704C>T p.Ser2235Leu REJ 0 0.999 5.35 0 0 2 1

624, 848 15 1414 c.4241G>C p.Trp1414Ser PKD 9 0.01 1 5.71 0 0 2 2 2,0

332 15 1544 c.4630G>A p.Val1544Met PKD 10 0 1 5.36 0 0 2 1 2,0

189 15 1547 c.4640G>A p.Arg1547His PKD 10 0.14 1 4.35 0 0 1 1

219 15 1666 c.4997G>C p.Trp1666Ser PKD 12 0 1 5.41 0 0 1 1

865 15 1832 c.5495G>T p.Gly1832Val PKD14 0.01 1 4.67 0 0 1 1

98 15 2159 c.6475G>T p.Val2159Leu REJ 0.74 1 5.49 0 0 1 1

561 15 1109 c.3327T>A p.Asn1109Lys PKD 5 0.01 0.998 -4.05 0 0 1 1

83 15 1652 c.4955T>C p.Leu1652Pro PKD 12 0 1 5.30 0 0 2 1 2,0

261 15 2006 c.6016T>G p.Trp2006Gly PKD 16 0 1 5.59 0 0 1 1

44

528 15 2132 c.6395T>G p.Phe2132Cys PKD 17 0.18 1 4.38 1 0 1 1

328 18 2445 c.7333A>C p.Thr2445Pro REJ 0.130.999

4.81 0 0 2 1

555 18 2434 c.7300C>T p.Arg2434Trp REJ 0 1 3.78 1 0 1 1

348 18 2436 c.7307G>T p.Gly2436Val REJ 0.04 1 4.81 0 0 1 1

569 18 2476 c.7427G>A p.Cys2476Tyr REJ 0 1 4.77 0 0 1 1

513 19 2557 c.7670A>G p.Asp2557Gly REJ 0 1 5.14 1 0 1 1

668 19 2516 c.7546C>T p.Arg2516Cys REJ 0.01 1 4.96 1 8x 1 1

476 19 2497 c.7490G>T p.Gly2497Val PKD 16 0 1 4.96 0 0 1 1

773 19 2526 c.7576T>C p.Cys2526Arg REJ 0.24 0.999 4.96 0 0 1 1

644, 893 23 2863 c.8587A>G p.Ile2863Val Extracellularsequence 0.33 0.87

4 4.89 0 0 2 2 2,0

108, 169,176, 211,214, 253,471, 777,

854

23 2771 c.8311G>A p.Glu2771Lys REJ 0.01 1 4.89 0 0 12 9 10,1

231 23 2783 c.8347G>C p.Ala2783Pro REJ 0.04 1 4.62 0 0 1 1

713 24 2963 c.8888T>C p.Ile2963Thr Extracellularsequence 0 1 4.55 0 0 2 1 2,0

201, 594 25 3012 c.9034A>C p.Thr3012Pro GPS 0.02 1 3.45 0 0 4 2 3,0

677 25 3046 c.9136C>T p.Arg3046Cys GPS 0 0.994 4.38 0 0 1 1

45

143 25 3043 c.9128G>A p.Cys3043Tyr GPS 0 1 4.38 0 0 1 1

642 25 3014 c.9041T>C p.Leu3014Pro GPS 0.12 1 4.55 0 0 1 1

429 26 3132 c.9395C>T p.Ser3132Leu PLAT 0 1 4.51 1 0 1 1

837 26 3113 c.9338G>A p.Gly3113GluCytoplasmicsequence

0.02 1 4.51 0 0 1 1

158, 203 27 3135 c.9404C>T p.Thr3135Met PLAT 0 1 4.49 1 0 2 2

350, 615 28 3194 c.9581C>T p.Ala3194Val PLAT 0.01 1 4.84 0 0 2 2 4,0

69 29 3253 c.9758T>C p.Leu3253ProCytoplasmicsequence

0.01 1 4.69 0 0 2 1 2,0

50 33 3468 c.10402G>A p.Asp3468Asn

Cytoplasmicsequence 0.01 0.99

9 4.45 1 0 1 1

25 36 3591 c.10769C>T p.Ser3590Phe Transmembrane 0 1 3.98 0 0 2 1 2,0

192 36 3603 c.10807G>A p.Glu3603Lys Cytoplasmicsequence 0.01 1 3.98 1 0 2 1

585 37 3612 c.10835T>C p.Leu3612Pro Cytoplasmicsequence 0.01 1 4.81 0 0 1 1

236 38 3673 c.11018T>C p.Leu3673Pro Extracellularsequence 0.02 0.99

7 3.46 0 0 2 1

164 39 3749 c.11246G>A p.Arg3749Gln Extracellularsequence 0.07 1 4.04 1 0 1 1

894 39 3752 c.11255G>A p.Arg3752Gln Extracellularsequence 0.08 1 4.04 1 0 1 1

90 41 3822 c.11465T>A p.Leu3822Gln Extracellularsequence 0 1 4.49 0 0 1 1

212 42 3846 c.11536A>G p.Ser3846Gly Extracellular 0.02 0.95 3.88 0 0 1 1

46

sequence 8

296 42 3846 c.11538C>G p.Ser3846ArgExtracellularsequence

0.010.996

3.88 0 0 1 1

30 42 3853 c.11558T>C p.Leu3853ProExtracellularsequence

0 1 3.97 0 0 4 1 2,0

527 37 3634 c.10901C>A p.Ala3634AspCytoplasmicsequence

0.570.999

3.91 0 0 1 1

242 4 154 c.461C>T p.Thr154Met LRRCT 00.547 -3.45 0 0 2 1

249 15 1768 c.5303C>A p.Thr1768Asn

PKD 13 0.05 0.984

2.6 0 0 2 1

496 15 2095 c.6285C>A p.Asp2095Glu PKD 17 0.13 0.976 3.51 0 0 1 1

578 25 2997 c.8991C>G p.Ser2997Arg Extracellularsequence 0.07 0.99

9 3.57 0 0 1 1

320, 510 37 3650 c.10948G>A p.Gly3650Ser Cytoplasmicsequence 0.09 1 3.71 1 0 3 2

421 15 1706 c.5116G>A p.Ala1706Thr PKD 12 0.42 0.506 4.44 0 0 1 1

466 15 1951 c.5852G>A p.Arg1951Gln PKD 15 0.82 0.504 4.24 0 0 1 1

646 23 2765 c.8294G>A p.Arg2765His REJ 0.36 0.543 3.87 0 0 1 1

289 43 3949 c.11846T>C p.Leu3949Pro Transmembrane 0.17 0.99

9 3.12 0 0 1 1

47

Family ID Exon Codon cDNA change Protein change PredictedEffect

Domain PKDB

CanadaDB

Koreanindividuals

Koreanfamilies

101 1 63 c.187_188insAG p.Ala63GlufsX55 FrameshiftCytoplasmicsequence

0 0 1 1

902 1 158 c.471dup p.Glu158ArgfsX55 Frameshift Poly-Arg 0 0 1 1

618 1 197 c.588_595del p.Leu197SerfsX13 Frameshift Cytoplasmicsequence

0 0 1 1

134 5 381 c.1142del p.Gly381GlufsX71 Frameshift Extracellularsequence

0 0 2 1

329 7 569 c.1704dup p.Val569CysfsX4 Frameshift Transmembrane 0 0 4 1

72 11 710 c.2127dup p.Lys710X Frameshift EF-handdomain 0 0 1 1

152 11 714 c.2142del p.Lys714AsnfsX2 Frameshift EF-handdomain 0 0 1 1

97 11 720 c.2159dup p.Asn720LysfsX5 Frameshift EF-handdomain 0 0 2 1

346 11 722 c.2164del p.Val722TrpfsX15 Frameshift EF-handdomain 0 0 1 1

220 11 738 c.2213_2214del p.Phe738X Frameshift EF-handdomain 0 0 1 1

366 11 747 c.2240del p.Lys747ArgfsX23 Frameshift EF-handdomain 0 0 1 1

80 1 80 c.239C>A p.Ser80X Nonsense Cytoplasmicsequence 0 0 1 1

596, 641 1 87 c.260G>A p.Trp87X Nonsense Poly-Glu 0 0 2 2

32, 154, 437 1 183 c.547C>T p.Gln183X Nonsense Cytoplasmicsequence 0 0 5 3

474 2 201 c.603G>A p.Trp201X Nonsense Transmembrane 1 0 1 1

Supplementary Table S2D. PKD2 protein-truncating mutations in study families

48

727 2 223 c.667G>T p.Glu223X NonsenseCytoplasmicsequence

0 0 1 1

10, 654 3 266 c.796G>T p.Glu266X NonsenseExtracellularsequence

1 0 3 2

58, 222, 565 4 306 c.916C>T p.Arg306X NonsenseExtracellularsequence

1 0 3 3

389 4 320 c.958C>T p.Arg320X Nonsense Polycystinmotif

1 7x 1 1

142, 237 4 361 c.1081C>T p.Arg361X Nonsense Extracellularsequence

1 0 4 2

40 5 417 c.1249C>T p.Arg417X NonsenseExtracellularsequence 1 10x 2 1

31, 163, 836 6 455 c.1365G>A p.Trp455X NonsenseExtracellularsequence 1 0 4 3

651, 660 6 464 c.1390C>T p.Arg464X Nonsense Extracellularsequence 1 0 3 2

270 9 654 c.1960C>T p.Arg654X Nonsense Extracellularsequence 1 5x 1 1

275 10 684 c.2052C>A p.Tyr684X Nonsense Cytoplasmicsequence 0 0 2 1

106, 198,479, 626,

83810 701 c.2102C>G p.Ser701X Nonsense Cytoplasmic

sequence 0 0 5 5

18, 205,546, 573 11 742 c.2224C>T p.Arg742X Nonsense EF-hand

domain 1 0 6 4

12 12 776 c.2326C>T p.Gln776X Nonsense EF-handdomain 0 0 2 1

123, 155,558, 636 13 803 c.2407C>T p.Arg803X Nonsense Linker 1 6x 10 4

140 14 874 c.2620A>T p.Lys874X Nonsense Coiled coil 0 0 1 1

86 1 198 c.595+1G>C Typicalsplicing


372, 633 4 365 c.1094+1G>T Typicalsplicing

Extracellularsequence 0 0 2 2

49

44 5 365 c.1095-2A>GTypicalsplicing

Extracellularsequence

1 0 4 1

460 5 365 c.1095-2A>TTypicalsplicing

Extracellularsequence

0 0 1 1

319 6 516 c.1548+1G>ATypicalsplicing

Transmembrane

0 0 1 1

401 8 572 c.1717-2A>C Typicalsplicing

Cytoplasmicsequence

0 0 1 1

3 11 747 c.2240+1G>T Typicalsplicing

EF-handdomain

1 0 2 1

349 1 exon 1 exon 1Largedeletion 0 0 1 1

11 2 Intron2-exon5 p.M1fsXLargedeletion


574, 609 3 Exon 3-5 Exon 3-5 Largedeletion 0 0 2 2

897 8 exon 8 exon 8 Largedeletion 0 0 1 1

50

FamilyID

Exon Codon cDNAchange

Proteinchange

Domain SIFT Polyphen

GERP++

PKDB

CanadaDB

Koreanindividuals

Koreanfamilies

Segregation

patients/control

435 1 179 c.536C>A p.Pro179His Cytoplasmicsequence

0.01 0.838 3.98 0 0 1 1

13 4 290 c.869G>T p.Gly290Val Extracellularsequence

0 0.99 5.07 0 0 4 1 2,0

335, 340 4 322 c.965G>A p.Arg322GlnPolycystin

motif 0 0.999 5.62 1 0 5 2

82, 899 4 325 c.974G>A p.Arg325GlnPolycystin

motif 0.01 0.988 -3.84 1 3x 4 2

240 5 389 c.1166C>T p.Ala389Val Extracellularsequence 0 0.907 4.9 0 0 2 1 3,1

394 5 374 c.1121T>G p.Leu374Trp Extracellularsequence 0 0.892 4.9 0 0 1 1

122, 292 5 381 c.1141G>A p.Gly381Arg Extracellularsequence 0 0.966 4.9 0 0 3 2 3,1

282, 472 14 890 c.2668G>A p.Glu890Lys Coiled coil 0.21 1 5.01 0 0 2 2 2,1

Supplementary Table S2E. PKD2 non-truncating mutations in study families

51

Supplementary Table S3. Design of Custom Bait

Category Description

Sequencing technology Illumina

Sequencing protocol Paired-End Short Read (100bp)

Tiling frequency 3x

Bait length 120

Avoid standard repeat masked regions Yes

Avoid overlap 20

Layout strategy Centered

Strand Sense

Total input targets 1018

Total valid targets 256

Average number of baits per target 5.88

Total targets with bait coverage 2038

Total number of baits 1498

Baits removed due to avoid overlap 46

Total baits covered by baits 59913

Abbreviations. bp, basepair

Supplementary Table S4. Quality of Targeted Exome Sequencing

Cohort (N=749) Average S.D.

QC passed reads 9141842.96 6008387.66

Unique reads 7826567.35 5238200.17

Aligned unique reads 7815828.12 5227207.04

PKD1 target mean coverage (X) 1523.12 973.96

% of PKD1 target bases ≥1X 0.9931 0.0076





PKD2 target mean coverage (X) 1422.98 903.62






Abbreviations. QC, quality control; S.D, standard deviation

52

Supplementary Table S5. Definition of Grades

Grade of Mutations

Definitely

pathogenic

(DP)

• Stop-gain single-nucleotide variants (SNVs)

• Frameshift indels

• In-frameshift indels with ≥5 amino acids

• Typical splice site mutations

• Either newly discovered from the samples or previously

registered in the Mayo DB or TGESP(1) data

Highly likely

pathogenic

(HLP)

• Nonsynonymous SNVs or in-frameshift indels of <5 amino acids

that were established as HLP in the Mayo DB or TGESP DB

• Novel LP variants segregated in family(s)

Likely

pathogenic

(LP)

• Nonsynonymous SNVs that were listed as LP in Mayo DB,

• Possibly causal nonsynonymous SNVs that satisfied 2 of the

following 3 conditions

ØBeing damaging, as predicted by a SIFT score ≤0.05

ØBeing damaging, as predicted by Polyphen-2 ([HumDiv]

≥0.453 or [HVAR] ≥0.447)

ØHaving a genomic evolutionary rate profiling [GERP] ++

score ≥2

• Possibly causal in-frameshift indels of <5 amino acids that

satisfied Protein Variation Effect Analyzer PROVEAN] score of

≤2.5

No mutations (NM)

Likely neutral

(LN)

• Previously registered in the Mayo DB or TGESP DB

• Newly found in normal controls

Indeterminate

(I)

• Nonsynonymous SNV previously registered in the Mayo DB or

TGESP DB

• Nonsynonymous SNV or in-frameshift indels of <5 amino acids

with obscure pathogenicity in SIFT, Polyphen-2 [HumDiv or

HVAR], [GERP] ++, or [PROVEAN]

No variant

(NV)• No possible pathogenic variants

53

Gene Exonic Func Exon cDNA change Protein change DupR Sanger

PKD1 frameshiftdeletion

exon15 c.5014_5015del p.Arg1672GlyfsX

98 Y Confirmed

PKD1 frameshiftinsertion

exon16 c.6994_7000dup p.Val2334GlyfsX

88 Y Confirmed

PKD1 stopgain SNV exon15 c.4447C>T p.Gln1483X Y Confirmed

PKD1 stopgain SNV exon15 c.6549dup p.Glu2184X Y Confirmed


exon25 c.9083_9084del p.Glu3028GlyfsX

40 Y Confirmed




exon15 c.5014_5015del p.Arg1672GlyfsX

98 Y Confirmed


exon15 c.4070del p.Leu1357Argfs

X9 Y Confirmed

PKD1 frameshiftinsertion

exon11 c.2494dupC p.Arg832ProfsX4

0 Y Confirmed


PKD1 stopgain SNV exon5 c.1198C>T p.Arg400X Y Confirmed


exon11 c.2618_2621del p.Val873AlafsX2

4 Y Confirmed

PKD1frameshiftdeletion

exon15 c.5014_5015del

p.Arg1672GlyfsX98 Y Confirmed

PKD1 NA exon23

c.8017-2A>G 　 Y Confirmed

PKD1 stopgain SNV exon15

c.4447C>T p.Gln1483X Y Confirmed

PKD1 stopgain SNV exon15

c.6549dup p.Glu2184X Y Confirmed

PKD2 stopgain SNV exon13 c.2407C>T p.Arg803X N Confirmed


exon15 c.3684del p.Val1229TrpfsX

44 Y Confirmed

PKD1 stopgain SNV exon15 c.6549dup p.Glu2184X Y Confirmed

PKD1 nonsynonymous SNV

exon23 c.8311G>A p.Glu2771Lys Y Confirmed

PKD1nonframeshift

deletionexon24 c.8935_8937del p.Phe2979del Y Confirmed

Supplementary Table S6. Results from Validation Study(n=80)

54

PKD1nonsynonymous SNV

exon27 c.9404C>T p.Thr3135Met Y Confirmed

nonsynonymous SNV

exon15 c.4051C>T p.Arg1351Trp Y ConfirmedPKD1

stopgain SNV exon15 c.3334G>T p.Glu1112X Y ConfirmedPKD1

frameshiftinsertion

exon15 c.3349dup p.Gln1117ProfsX

19 Y ConfirmedPKD1

stopgain SNV exon9 c.1789C>T p.Gln597X Y ConfirmedPKD1

frameshiftinsertion

exon15

c.4315_4324delinsCAACTGTTG

p.Gly1439GlnfsX5 Y ConfirmedPKD1

frameshiftdeletion

exon7 c.1495del p.Glu499SerfsX5

9 Y ConfirmedPKD1

frameshiftdeletion

exon11 c.2659del p.Trp887GlyfsX1

1 Y ConfirmedPKD1

frameshiftdeletion

exon15 c.4679_4691del p.Val1560GlyfsX

3 Y ConfirmedPKD1

frameshiftdeletion

exon46

c.12672_12673del

p.Gln4224HisfsX2 N ConfirmedPKD1

frameshiftdeletion

exon15 c.3744_3754del p.Asp1249AlafsX

48 Y ConfirmedPKD1

frameshiftdeletion

exon19 c.7579_7580del p.Val2527LeufsX

67 Y ConfirmedPKD1

frameshiftdeletion

exon11 c.2659del p.Trp887GlyfsX1

1 Y ConfirmedPKD1

stopgain SNV exon15 c.5477G>A p.Trp1826X Y ConfirmedPKD1

frameshiftdeletion

exon8 c.1669_1670del p.Leu557AspfsX

29 Y ConfirmedPKD1

frameshiftdeletion

exon11 c.2716del p.Glu906SerfsX2

1Y ConfirmedPKD1

frameshiftdeletion

exon23

c.8327_8330delp.Leu2776Argfs

X98 Y ConfirmedPKD1

stopgain SNVexon17 c.7164C>G p.Tyr2388X Y ConfirmedPKD1

stopgain SNV exon17 c.7164C>G p.Tyr2388X Y ConfirmedPKD1

frameshiftdeletion

exon15 c.6808_6811del p.Asp2270HisfsX

43 Y ConfirmedPKD1

frameshiftdeletion

exon21 c.7973_7974del

p.Val2658GlyfsX2 Y ConfirmedPKD1

nonframeshiftdeletion

exon20

c.7837_7839del p.Leu2613del Y ConfirmedPKD1

stopgain SNV exon5

c.696T>A p.Cys232X Y ConfirmedPKD1

stopgain SNV exon3 c.350T>G p.Leu117X Y ConfirmedPKD1

55

stopgain SNVexon5

c.706C>T p.Gln236X Y ConfirmedPKD1

ConfirmedYp.Leu557AspfsX29c.1669_1670delexon

8frameshiftdeletionPKD1

ConfirmedYp.Tyr1599Xc.4797C>Aexon15stopgain SNVPKD1

ConfirmedYp.Trp1826Xc.5477G>Aexon15stopgain SNVPKD1

ConfirmedYp.Leu2776ArgfsX98c.8327_8330delexon


ConfirmedYp.Asn2770delc.8308_8310delexon23

nonframeshiftdeletionPKD1


ConfirmedYp.Glu2335GlyfsX85c.7003dupGexon

16frameshiftinsertionPKD1

ConfirmedYp.Phe1075ArgfsX28c.3223_3226delexon


ConfirmedYp.Val1841GlyfsX110c.5517_5521dupexon

15frameshiftinsertionPKD1

ConfirmedNp.Trp3603Xc.10809G>Aexon36stopgain SNVPKD1

ConfirmedYp.Gln1621Xc.4861C>Texon15stopgain SNVPKD1

ConfirmedNp.Gln183Xc.547C>Texon1stopgain SNVPKD2

ConfirmedNp.Lys714AsnfsX2c.2140delAexon


ConfirmedYp.Gln1203Xc.3607C>Texon15stopgain SNVPKD1

ConfirmedNp.Gln183Xc.547C>Texon1stopgain SNVPKD2


ConfirmedYp.Gln2606SerfsX14c.7816delexon


ConfirmedNp.Val4034CysfsX4c.12100delexon


ConfirmedNp.Phe3614LeufsX11

c.10839_10840del

exon37

frameshiftdeletionPKD1

ConfirmedNp.Ser3591Phec.10772C>Texon36

nonsynonymous SNVPKD1

ConfirmedYp.Ser689delc.2065_2067delexon10




ConfirmedYp.Ala81Valc.242C>Texon2


56

ConfirmedYp.Val908Metc.2722G>Aexon11


NotdetectedYp.Pro694Leuc.2081C>Texon

10nonsynonymous SNVPKD1



ConfirmedYp.Leu1652Proc.4955T>Cexon15


ConfirmedYp.Leu3031delc.9092_9094delexon25


ConfirmedYp.Ala81Valc.242C>Texon2


ConfirmedYp.Pro2087_Arg2088dupc.6258_6263dupexon

15nonframeshift

insertionPKD1

ConfirmedYp.Leu3031delc.9092_9094delexon25


ConfirmedYp.Cys3043Tyrc.9128G>Aexon25


ConfirmedNp.Ile4105Leuc.12313A>Cexon45


1. Long Range PCR information

Exon Forward primer Reverse primer Tm

('C)

Product

size(bp)

2-75'-CCCCGAGTAGCTGGAAC

TACAGTTACACACT-3'

5'-CGTCCTGCTGTGCCAGA

GGCG-3'70 4041

13-155'-TGGAGGGAGGGACGCC

AATC-3'

5'-GTCAACGTGGGCCTCCA

AGT-3'65 4391

15-215'-ATCCCTGGGGGTCCTAC

CATCTCTTA-3'

5'-ACACAGGACAGAACGG

CTGAGGCTA-3'68 5253

2. Sequencing primers information

Exon primer

name

Sequence Genomic

position(of5'firstbp;

hg19)

Distance to

exon-intron

junction

2-3PKD1_2-3-F 5'-GGGATGCTGGCAATGTGTGGGAT-3' 2,169,468 -89 exon 2

PKD1_2-3-R 5'-GGACCAACTGGGAGGGCAGAA-3' 2,169,038 77 exon 2

4 PKD1_4-F 5'-GGCGGTGCTGTCAGGGTG-3' 2,168,948 -102 exon 4

Supplementary Table S7. MLPA validation using Long Range PCR

sequencing

57

PKD1_4-R 5'-CCAGAGAGGCCTTCCTGAGC-3' 2,168,582 95 exon 4

5(A)PKD1_5A-F 5'-GAACAGCATGGGAGCCTGTGAGT-3' 2,168,561 -95 exon 5

PKD1_5A-R 5'-AGCCGGCCCAGCGGCATC-3' 2,168,033Within an

exon 5

5(B)PKD1_5B-F 5'-GCCTGTCCCTCTGCTCCG-3' 2,168,262

Within an

exon 5

PKD1_5B-R 5'-GTGTCAACGGTCAGTGTGGGC-3' 2,167,696 96 exon 5

6PKD1_6-F 5'-GTGTCTGCTGCCCACTCCC-3' 2,167,787 -124 exon 6

PKD1_6-RF 5'-CTCCTTCCTCCTGAGACTCCC-3' 2,167,397 93 exon 6

15(A)PKD1_15A-F 5'-TTCTGCCGAGCGGGTGGG-3' 2,161,938 -64 exon 15

PKD1_15A-R 5'-CATGTCGAAGGTCCACGTGATGT-3' 2,161,427Within an

exon 15

15(B)PKD1_15B-F 5'-GACATGAGCCTGGCCGTGG-3' 2,161,516

Within an

exon 15

PKD1_15B-R 5'-CCACCTCTGGCTCCACGCA-3' 2,161,027Within an

exon 15

15(C)PKD1_15C-F 5'-CACGCGGAGCGGCACGTT-3' 2,161,121

Within an

exon 15

PKD1_15C-R 5'-GGTGACCTCCGGACCCTC-3' 2,160,626Within an

exon 15

15(D)PKD1_15D-F 5'-TCTGCTGTGGGCCGTGGG-3' 2,160,706

Within an

exon 15

PKD1_15D-R 5'-CTGTACCGTGTGGTTGGTGGG-3' 2,160,215Within an

exon 15

15(E)PKD1_15E-F 5'-ACAGCATCTTCGTCTATGTCCTG-3' 2,160,303

Within an

exon 15

PKD1_15E-R 5'-GGTTCCCTGCCGTCATGGTG-3' 2,159,812Within an

exon 15

15(F)PKD1_15F-F 5'-GGGCTGAGCTGGGAGACCT-3' 2,159,899

Within an

exon 15

PKD1_15F-R 5'-GACAGCTGAGCCGGCAGC-3' 2,159,417Within an

exon 15

15(G)PKD1_15G-F 5'-CTGTGGGCCAGCAGCAAGGT-3' 2,159,494

Within an

exon 15

PKD1_15G-R 5'-CGTGCGGTTCTCACTGCCCA-3' 2,159,012Within an

exon 15

15(H)PKD1_15H-F 5'-GACGTCACCTACACGCCCG-3' 2,159,095

Within an

exon 15

PKD1_15H-R 5'-CCTCCCAGCGGTACTCAGTCT-3' 2,158,603Within an

exon 15

15(I)PKD1_15I-F 5'-GATGCGGCGATCACAGCGCA-3' 2,158,688

Within an

exon 15

PKD1_15I-R 5'-GGCCAGCCCTGGTGGCAA-3' 2,158,164Within an

exon 15

21PKD1_21-F 5'-AGTCGTGGGCATCTGCTGGC-3' 2,155,550 -75 exon 21

PKD1_21-R 5'-CAAGCTGCCCGTCTGCCCT-3' 2,155,240 83 exon 21

Reference. Ying-Cai Tan et al., J Mol Diagn. 2012 Jul;14(4):305-13. doi: 10.1016/ j.

jmoldx. 2012. 02. 007

58

Supplementary Figure S1

Of the entire sample of 524 families, variants were found by targeted exome

sequencing (TES) in 471 families (89.9%). Among them, mutations were found

in 409 families. In 62 families with likely neutral (LN) or indeterminate (I)

variants and 53 families with no variants (NV), subsequent PKD1 exon 1

Sanger sequencing (SS) and/or multiple ligation probe assay (MLPA) revealed

10 additional mutations. An overall mutation detection rate was 80.7% (423/524

families) by TES, PKD1 exon 1 Sanger sequencing, and MLPA.

59


Mutations were found more frequently in the C-type lectin and G

protein-coupled receptor proteolytic site (GPS) domains of polycystin-1 in the

cohort compared to those reported from the ADPKD mutation database (PKDB).

Likewise, mutations were detected more frequently in the linker domain of

polycystin-2 in my patients compared to those reported from PKDB. The

standardized residual of the chi-square test (cut-off of significance, 2.58) was

2.8 in the C-type lectin of polycystin-1 and the linker domain of polycystin-2,

and 2.6 in the GPS domain of polycystin-1.

60


A. As the Mayo imaging classification changed from 1A to 1E, the proportion

of PKD1-protein truncating (PKD1-PT) genotype increased. Conversely, as the

genotype changed from PKD2 to PKD1-PT, the proportion of Mayo imaging

classification 1C-E increased (P <0.001 by linear by linear test).

B. This figure shows the longitudinal distribution of Mayo class of individuals.

For the median follow up duration, the ICC of the intraclass correlation of

individuals was 0.84 [0.82, 0.86] showing that Mayo imaging classification of

each subject was mostly retained over time.

Abbreviations. PKD1-PT, PKD1 protein truncating; PKD1 ID, PKD1 small

in-frameshift indels; PKD1-NT, PKD1 non-truncating; NM, no mutations.

61


The ESRD incidence was 21.7% (110/508) in PKD1 and 10% (12/119) in

PKD2. The histogram represented only those who reached ESRD. The PKD1

genotype showed an earlier onset of ESRD than PKD2 genotype (53.3[46.7, 58.7]

vs. 63.1[54.0, 67.5], P=0.003)

62



HOPE-PKD cohort between 2009 and 2016 were screened. Among the screened

patients, 117 patients were excluded from the analysis including those who had

active cancer (n=4) or chronic liver disease (n=22) and those who had no kidney

or liver volume data available (n=91). A total of 749 subjects from 524 unrelated

families were included in the analysis.

63


Sequencing reads were first aligned with the human genome reference

sequence (hg19) using BWA version 0.7.5a with the MEM algorithm (default

options). To minimize false-positive and false-negative calls, I only used

uniquely mapped reads that were properly paired to avoid mapping reads that

aligned to both target regions and pseudogenes. In addition, only selected reads

with a high mapping quality (>30) were used. SAMTOOLS version 1.2 and

Picard version 1.127 (http://picard.sourceforge.net) were used to process

SAM/BAM files to duplicate the marking. Specifically, local realignment to

reduce any misalignment in the duplicated regions was performed.

I used RealignerTargetCreator and IndelRealigner from GATK version 3.3-1

with known single-nucleotide polymorphisms (SNPs) and indels from dbSNP142,

Mills and 1000G gold-standard indels at hg19 sites, and 1000G phase 1 indels at

hg19 sites. Known SNPs and indels were also used to perform a base

64

calibration. For calling variants, Unified Genotyper was used, and the called

variants were recalibrated by GATK based on dbSNP142, Mills indels, HapMap,

and Omni. ANNOVAR was used to annotate the variants. The depth of

coverage approach was applied to detect any large rearrangement or copy

number variations. The CalculateHsMetrics module in PICARD was used to

calculate the statistics of each parameter of the samples and depth of each exon,

and the copy number ratio was characterized by the normalized depth of the

exons from all the patients


The detected variants were classified into seven categories of nonsense,

frameshift, typical splicing, large deletion, missense, small in-frame indel and no

variant. Seven classes of variants were further divided into five grades

according to their pathogenicity: definitively pathogenic (DP), highly likely

pathogenic (HLP), likely pathogenic (LP), likely neutral (LN), and indeterminate

65

(I). The DP variants were defined as all the variants that result in protein

truncation, including nonsense, frameshift, typical splicing, and large deletion

variants or those previously established as DP in the ADPKD mutation database

(PKDB) or previous articles. The HLP variants were defined as missense

variants or small in-frame indels that were previously established as HLP in

PKDB or previous articles. The LP variants were defined as missense variants

or small in-frame indels that were previously established as LP in PKDB or

previous articles or variants that were predicted to be damaging using in-silico

analysis and segregated within a family. Variants were thought to be damaging

if the results of in-silico analysis satisfied two of the following conditions: 1)

damaging as predicted by a SIFT score ≤0.05, 2) damaging as predicted by

Polyphen-2 (HumDiv), and 3) GERP ++ score ≥4.

Supplementary Figure S8 Inverse relationship of GC content and the depth

of coverage for PKD1. (Solid line: GC contents (%), Dotted line: depth of

coverage)

The problem of NGS was low read depth or coverage of exon 1 of PKD1

gene. This resulted from high GC contents of exon 1 compared to other exons.

66

Supplementary Figure S9. Normalized read depth in PKD1 and PKD2 of

mutations that detected via MLPA

67

2nd part: Liver

INTRODUCTION

Autosomal dominant polycystic kidney disease (ADPKD) is a highly prevalent

and potentially lethal monogenic disorder characterized by numerous kidney

cysts and progressive renal failure. Several studies have examined the natural

course of the kidney manifestations according to genotype. The Halt Progression

of Polycystic Kidney Disease (HALT-PKD)1 and the Toronto Genetic

Epidemiology Study of PKD (TGESP), as well as studies conducted in Italy2

and China3, 4, showed remarkable correlations between the genotype and the

kidney phenotype. In addition to the causative genes, especially PKD1 and

PKD2, locus heterogeneity, allelic heterogeneity, and hypomorphic mutations are

also known to affect the prognosis of the kidney in ADPKD5-7. Furthermore, I

have found a significant correlation between renal progression and genotype,

according to PKD1 protein-truncating (PT) mutations, PKD1 non-truncating

(NT) mutations, and No mutations (NM) in part I.

Meanwhile, ADPKD is frequently accompanied by many extra-renal

manifestations, among which the most common manifestation is polycystic liver

disease (PLD)8. PLD in ADPKD can cause various symptoms, including early

satiety or abdominal discomfort, that lower the quality of life and serious

complications such as infection and malnutrition9. D'Agnolo et al. recently

reported that combined kidney and liver volume was associated with the

presence and severity of pain and gastrointestinal symptoms in ADPKD, with a

more prominent role for height-adjusted total liver volume (htTLV) than for

height-adjusted total kidney volume (htTKV)10. Moreover, I reported that

abdominal symptoms and hepatic complications increased as PLD progressed in

patients with ADPKD8.

68

The increased life expectancy that has been achieved through advancements in

ADPKD management has paradoxically brought to light hepatic manifestations

that would not have had a chance to surface11; therefore, it is expected that

more patients will suffer from hepatic manifestations, and we have no excuse to

procrastinate research into the hepatic manifestations of ADPKD.

Unlike kidney manifestations, few studies have investigated the natural course

and risk factors for PLD progression in ADPKD. Some previous studies showed

that age, female sex8, pregnancy, and the use of female sex hormones12 were

potential risk factors associated with PLD progression in ADPKD. Only a few

studies have investigated genotype-phenotype associations of PLD in ADPKD;

for instance, Zoghby et al. failed to show a significant association between the

genotype and the severity or growth rate of PLD in ADPKD patients13.

Therefore, the impact of genotype on PLD progression has not been thoroughly

investigated.

This study examined non-genetic (age, female sex, pregnancy, and the use of

female sex hormones) and genetic (family-level clustering, type, and position of

the PKD1/2 mutation) risk factors of PLD in ADPKD in a prospective cohort,

the coHOrt for genotype-PhenotypE correlation in ADPKD (HOPE-PKD), with

749 Korean patients.

RESULTS

Non-genetic factors influencing PLD severity

The htTLV of 749 subjects (360 male and 389 female) ranged from 336 mL to

7,885 mL. Subjects were divided into 3 groups (mild, moderate, and severe)

according to htTLV (Figure S1). Severe PLD was found in 64 cases (8.5%),

and moderate PLD in 55 cases (7.4%) (Table 1).

69

In comparison with no or mild PLD, moderate and severe PLD had more

females (% of females; none or mild vs. moderate vs. severe, 47.5% vs. 63.6%

vs. 85.9%, P for trend < 0.001) and was older (45.2 vs. 53.0 vs. 52.3 years, P

for trend < 0.001). Patients with moderate to severe PLD were more likely to be

hypertensive (73.9% vs. 84.6% vs. 85.2%, P for trend = 0.012).

Most of the interventions, such as hepatic artery embolization, percutaneous

drainage, partial hepatectomy, or liver transplantation, were performed in the

severe PLD group (0 (0%) vs. 2 (3.6%) vs. 18 (28.6%), P for trend < 0.001).

As described before8, the median htTKV was higher in patients with moderate

to severe PLD (678 vs. 1,024 vs. 1,102 mL/m, P for trend <0.001). The mean

eGFR was lower (69.2 vs. 49.4 vs. 46.9 mL/min/1.73m2, P for trend <0.001), and

the prevalence of end-stage renal disease(ESRD) was higher in patients with

moderate to severe PLD (15.4% vs. 30.9% vs. 35.9%, P for trend <0.001). The

proportion of grades according to the Mayo classification14 was different among

the 3 groups (P for trend <0.001) (Table 1).

In a fraility model, the hazard ratio (HR) for ESRD in the severe PLD group

was significantly higher than in the patients with no or mild PLD (HR [95%

CI]: mild vs. moderate vs. severe; reference vs. 1.684 [0.931, 3.046] vs. 2.012

[1.164, 3.477], Figure 1A). Addtionally, the familial clustering effect according to

3 PLD group was significant (P value for familial influence = 0.024). Even

though incidence of death and age of death failed to show trend according to 3

groups (Table 1), the HR for all-cause mortality with Cox proportional hazard

model became higher as the severity of PLD increased (HR [95% CI]: reference

vs. 3.155 [1.038, 9.586] vs. 6.723 [2.418, 18.697], Figure 1B). In summary, severe

PLD was associated with advanced renal disease and PLD was significantly

associated with mortality.

70

Among females, htTLV ranged from 458 to 7,885 mL/m. The mean age was

older in the moderate to severe PLD group (46.5 vs. 54.9 vs. 52.3 years, P for

trend <0.001, Table 1)

Pregnancy-related parameters were surveyed in 216 female participants, from

whom information was gathered regarding previous full-term or preterm

deliveries, abortions, and the use of female sex hormones. The number of

pregnancies was found to be significantly higher in patients with moderate to

severe PLD (Linear by linear test, P=0.006, Fig. 2A). The prevalence of sex

hormone use was associated with the severity of PLD (Linear by linear test,

P<0.001, Fig. 2B).

Familial clustering effect

A total of 138 families with more than 2 family members were included in the

analysis. The number of family members ranged from 2 to 7. Among these

families, 49 had 1 or more members with moderate to severe PLD. There were

5 concordant moderate to severe PLD families that had 2 or more members

(3.6%, 5/138), with 10.2% (5/49) of families having 1 or more member with

moderate to severe PLD. Among 5 concordant families, 2 were concordant for

severe PLD.

Since families contained multiple generations, only siblings with an age

difference of less than 5 years were included in the additional analysis.

Concordant moderate to severe PLD was found in approximately 7.7% (5/66) of

all siblings and approximately 27.8% (5/18) in families with 1 or more individual

with moderate to severe PLD. In other words, concordant families were rarely

observed. It was confirmed that the plotted htTLV of the 65 siblings was

71

remarkably different from that of the siblings in families with moderate to

severe PLD. In Fig. 3A, the concordant siblings appeared to be rarely observed.

However, when the siblings were plotted after the subject with the highest

htTLV was removed from each family, it was confirmed that the subjects with

low htTLV values showed the same pattern as that in Fig. 1A (Fig. 3B). I

divided each family into three groups (mild, moderate, and severe) based on the

member with the largest htTLV among the siblings. Then, I observed the trend

of htTLV for the members, except for those with the largest htTLV. Even

though the median age and sex distribution were not statistically different, the

prevalence of females was higher and htTLV increased with the severity of

PLD (P for trend <0.001, Table in Fig. 3B). A similar trend was shown in the

analysis excluding women aged 45 years or older whose liver is clearly large (P

for trend=0.003, Table in Fig. 3D).

In summary, families that have siblings with higher htTLV values also have

other siblings with relatively higher htTLV values. Through this data, it was

suggested that htTLV showed a family-level clustering effect anticipating that

the effect would be from the genetic power. The next step was to identify the

genotype-phenotype correlation of PLD.

The impacts of the PKD1 and PKD2 genes on PLD severity

Distribution of PKD genotype according to PLD severity

A total of 630 patients in 423 families after excluding NM were re-classified

into 4 genotype groups: PKD1-PT, 371 (59%); PKD1-ID, 21 (3%); PKD 1-NT,

119 (19%); PKD 2, 119 (19%). PKD1-PT larger htTKV (898 [476, 1546], mL/m)

but htTLV and intervention did not correlated with genotype. Genotype was

devided into 5 (PKD1 PT, PKD1 ID, PKD1 NT, PKD2, and NM) according to

72

Part 1. I analyzed the difference of htTLV according to 5 genotypes (Figure S2,

P value =0.077 by Linear by linear test). However, the prevalence of NM

mutations decreased in the moderate and severe PLD groups (P>0.001, by

Linear by linear test).

To analyze the family-level clustering effect of htTLV, a multilevel mixed

effect linear regression model was constructed with cross sectional htTLV,

adjusting for age, sex, and the family-level clustering effect. When log(htTLV)

was plotted according to age, the PKD1-PT genotype seemed to larger

log(htTLV) comparing to other genotypes, but there was no signficance.

However, the NM genotypes showed smaller log(htTLV) values compared to

PKD1 PT genotype (Figure S3 and Table Left). About the slope (Figure S3

and Table right) the PKD1-PT genotypes showed a steeper slope than the

NM group. In summary, there was no statistical significance ecept for PKD1

PT vs. NM group.

htTLV had a significant familial clustering effect, with an intra-class

correlation (ICC) of 0.15. However, the effect size was small when compared to

that of htTKV (family-level clustering effect, P<0.001; ICC, 0.43).

To summarize, the severity of PLD was associated with age, female sex, a

history of pregnancies, and the use of female sex hormone. Both htTKV and

htTLV also showed a family-level clustering effect. However, we could not

found the PKD1/2 genotype-phenotype correlation in log(htTLV).

Subgroup analysis: comparison of the no liver cyst and liver cyst groups

Although kidney involvement in ADPKD is generally known to have a

penetrance of 100%, liver cysts did not develop in some cases. Therefore,

additional analyses of the characteristics of patients with and without liver cysts

73

according to the class of the PKD gene were conducted. In a previous study,

liver cysts appeared in subjects aged 40 or older in approximately 90% of

subjects. Only subjects older than 40 years were included in the analysis, and

subjects over 40 years with <4 liver cysts were classified as the no liver cyst

group. Subjects over age 40 years with ≥4 liver cysts were defined as the liver

cyst group for analysis. The no liver cyst group consisted of 54 subjects

(10.5%), the liver cyst group consisted of 458 individuals (89.5%), and the

proportion of males was greater in the no liver cyst group (n=32, 59.3%). The

median htTLV and htTKV values were higher in the liver cyst group, and

eGFR tended to be higher in the liver cyst group, but without statistical

significance (61.1 vs. 51.8, mL/min/1.73 m2,P=0.067). In addition, the prevalence

of PRT was higher in the liver cyst group, and milder phenotype was found to

be more common in the no liver cyst group. In general, the renal phenotype

tended to be milder in the no liver cyst group without clinical significance

(Table S1). As for the prevalence of PKD classes in the 2 groups, NM was

more common in the no liver cyst group (Fig. S4).

Subgroup analysis: description of concordant moderate to severe PLD

families

There were 5 families concordant for moderate to severe PLD (A-E) in the

present study. All but 1 (D) family had significantly damaging variants. Because

a PKD1 missense variant of I was found in the family with PKD2, this variant

was likely to be pathogenic, but this possibility needs to be confirmed in other

families (Table S2).

Subgroup Analysis: Description of Moderate to Severe PLD Males

74

There were 29 male subjects with moderate to severe PLD. Although the

sample size was not large, the proportion of PKD1 was slightly larger compared

to the genotype distribution of 749 persons (PKD1 of total subjects, 68% vs.

PKD1 of moderate to severe PLD males, 76% [Figure S6]). In other words, the

possibility that the frequency of PKD1 is high in males with a large liver can

be considered (Table S3).

Location (domain) of mutations in moderate to severe PLD families

Fig. 4 shows the PKD1 gene locations of subjects in Mayo DB and the

study (SNUH) by domain. As each domain size is different, the prevalence of

each domain was analyzed by size and PT versus NT. Compared to the Mayo

DB data, the prevalence of C-type lectin in moderate to severe PLD was

observed to be higher in both PT and NT mutations; thus, it was suggested

that the prevalence of C-type lectin was significantly higher in families with

PLD individuals when prevalence was analyzed by dividing the subjects in my

study based on an htTLV of 1600 mL/m (multivariate-adjusted with age, sex

and family-level clustering effect, P=0.001). Therefore, C-type lectin can be

suggested as a warm spot for moderate to severe PLD (Fig. 4).

DISCUSSION

In summary, 1) the correlations of female sex, age, renal progression, multiple

pregnancies, and the use of female sex hormones and PLD were confirmed.

Moreover, the moderate to severe PLD participants had poorer renal outcomes

and higher mortality. 2) It is certain that a family-level clustering effect of PLD

existed, although the effect size was not notable compared to that observed for

the kidney. 3) The proportion of PKD1 PT mutations was not related to log

75

(htTLV). 4) In patients with no liver cysts, milder kidney disease was present.

5) The C-type lectin domain was a possible warm spot for moderate to severe

PLD in Korean ADPKD patients.

So far, HALT-PKD remains the only major study focused on the hepatic

manifestations of ADPKD, and revealed that the hepatic manifestations had no

correlation with PKD1/2. However, as a previous Korean study of the hepatic

presentation of ADPKD found that the liver prognosis tended to mirror the

kidney prognosis 8, the present study started with the hypothesis that the liver

would also be influenced by the PKD1/2 genes. Finally, the present study

proved for the first time, through a multilevel analysis, that htTLV showed a

family-level clustering effect. However, the influence of pregnancy and female

sex hormone use was stronger than the genetic effect in women.

In a subgroup analysis, subjects with no liver cysts and their genotypes were

investigated. In the current study, those with no liver cysts had milder renal

presentations. This is a valuable finding in that it is brand-new information that

can be used as an indicator for predicting the prognosis of ADPKD. Moreover,

In general, it is known that there is no hot spot for ADPKD; however, I found

a warm spot. The C-type lectin domain in the PKD1 gene was related to the

prevalence of moderate to severe PLD. C-type lectin is involved in cell-matrix

stability and/or cell-matrix signaling; during kidney development, it binds

carbohydrates in a calcium-dependent manner, and interacts with extracellular

matrix proteins. In the liver, it is associated with remodeling of the extracellular

matrix surrounding the cysts.15, 16 However, for this aspect of the study to yield

more concrete outcomes, a larger number of moderate to severe PLD patients

are required, and in vitro experiments need to be conducted in the future to

verify the results.

76

I have attempted genetic analysis with the NGS technique as the main

pipeline. To date, NGS has been considered to have limitations for use as a gold

standard in detecting PKD gene mutations. However, as shown in the recent

China study 17, NGS very accurately finds a variant and can process a large

amount of data at low cost and high speed, which are benefits. To overcome

the shortcomings of NGS, MLPA and Sanger techniques were used in this

study. NGS is expected to be more useful in clinical practice than the Sanger

technique, and the genetic diagnosis of polycystic kidney disease will be

promoted through NGS.

For now, making a genetic diagnosis of ADPKD is challenging. The gold

standard, Sanger sequencing, also shows approximately a 10% non-detection

rate, and 9.3% non-detection rate was shown in my study based on NGS.

Moreover, even when LN, I, and NC mutations were defined as NM

approximately 80% yield was shown. Therefore, the clinical circumstances of the

roughly 20% of the group without a mutation should be investigated as well. In

this study, NM was analyzed as a group, and although there were some

variations, the clinical course of participants with NM was mild compared to

that of patients with other genotypes, indicating that a genetic diagnosis can be

properly made with NGS. Based on this, I suggest that the results presented in

my study can be used to help inform NM patients about their prognosis.

In the current cohort, I followed ESRD progression and mortality. The

moderate to severe PLD group had more subjects with ESRD progression and

mortality. Attention must be focused on the result that the severe PLD group

had a 6.1 times higher mortality rate than the mild PLD group, and efforts to

identify the risk factors and to reduce the morbidity and mortality of severe

PLD patients are clearly needed.

77

In this study, the liver phenotype correlated well with the kidney phenotype.

The familial clustering effect in the liver phenotype was also proven in the

sibling analysis and frailty model. However, the fact that there is no

genotype-phenotype correlation in PLD is significantly complex. It is also

regrettable to conclude that there is no association between the genotype and

phenotype in the liver.

In this study, the PKD1 / 2 locus heterogeneity and allelic heterogeneity in

PLD were not clarified. We can think of the following reasons.

First, this study did not investigate all known genes that could cause PLD.

Moreover, we should also consider the effects of modifier genes and epigenetics.

Previous studies have suggested that the common phenotype and various

mutations of ADPKD and ADPLD provide more complex polycystic disease 18.

PRKCSH, SEC63, LRP5, ALG8, and SEC61B, known as ADPLD genes, modulate

polycystin 1 function 19. Additionally, the possibility that the GANAB gene

affects the ADPKD and ADPKD may also be considered 20. Further research is

needed on the possibility that unknown genes including above mentioned genes

may act as modifier.

Second, it is possible that the hormonal effect has been a factor disturbing the

genetic effect of PKD1 / 2. In previous retrospective studies, pregnancy did not

affect the severity of PLD, but there was a report that PLD became worse

when estrogen was administered 21. As shown in the result, the possibility that

women, pregnancy, and sex hormonal effect affected the liver phenotype and the

possibility that genotype correlation is covered by these factors should be

considered. In addition, we need additional studies on other enviromenal factors

besides hormal effects.

78

Third, the moderate to severe PLD scarcity. Unlike in the kidneys, moderate

to severe PLD is significantly few in number; therefore, this may have resulted

in the dilution of statistical significance. It is deemed necessary to investigate

the risk factors with more populations in order to confirm the

genotype-phenotype correlation in the liver. In addition to the PKD1 and PKD2

genes, it would also be substantial to study the 3rd gene effect.

A major strength of the current study is that it is the first to elucidate the

correlation between PLD and the PKD1/2 genes in ADPKD, which is supported

by the existence of a family-level clustering effect. Additionally, the cohort in

this study was fairly well designed and had a substantial number of subjects.

Considering that most of these subjects are now receiving prospective follow-up,

it is expected that further, more concrete analyses of this cohort will be possible

in the future.

There were some limitations in the current study. It was difficult to correctly

identify patients’ family history because all family members were not registered

at the same hospital. As only patients with an objective computed tomography

(CT)-based index measurement were enrolled, it was harder to investigate the

family-level clustering effect. Based on their history of moderate to severe PLD,

about 25% of families might have family-level clustering in moderate to severe

PLD (data not shown). As for the kidney and liver size, I used cross-sectional

data, therefore, a longitudinal study is needed to investigate serial htTKV or

htTLV changes.

In conclusion, the results strongly suggest that there is family-level clustering

in htTLV severity, although the magnitude is not overwhelming. I conducted a

thorough analysis despite having limited data, and verified the effect of the

PKD1/2 genes on PLD. However, it might be possible that the information was

not comprehensive enough, and that other variables could not be controlled for.

79

In the future, further research on PLD must be carried out, sufficient data must

be gathered, and corresponding follow-up periods must be implemented to obtain

more sophisticated and detailed correlations. Consequently, it will be possible to

predict the prognosis of PLD in ADPKD more clearly through the examination

of family members’ medical history, an establishment of a national policy to

allow cooperation between hospitals, and international cooperation.

METHODS

Patients


HOPE-PKD cohort between 2009 and 2016 were screened. Subjects aged over

18 who have compatible imaging findings with or without family history were

enrolled in HOPE-PKD cohort. Among the screened patients, 117 patients were

excluded from the analysis including those who have active cancer (n=4) or

chronic liver disease (n=22) and those who have no kidney or liver volume data

available (n=91, Figure S5). A total of 749 subjects from 524 unrelated families

were included in the analysis.

Collection of family history and clinical information of subjects

Baseline demographic profiles were collected, including variables such as age,

sex, height, and the presence or absence of hypertension, diabetes, chronic liver

disease, or cancer, familial history of PLD and ESRD. Hypertension was defined

as a blood pressure >140/90 mm Hg or a history of treatment with

antihypertensive medications. Chronic liver disease was defined as chronic

hepatitis (toxic, alcoholic, viral, or autoimmune) or liver cirrhosis. PLD was

80

defined as 4 or more liver cysts. Interventions were defined as transarterial

embolization, percutaneous drainage, partial hepatectomy, or liver transplantation.

Their median follow up duration was 77.3 months.

To measure liver and kidney volume, participants underwent CT exams that

were performed biennially in all patients using multi-detector CT scanners 22.

Volumetry was done manually by a single skilled technician based on a review

of each CT scan. Total liver volume was measured by 3-dimensional stereology

(Vitrea, Vital Images, Inc., Minnetonka, USA). Total kidney volume was

measured by the ellipsoid method 8. A concordant PLD family was defined by

the presence of at least 2 or more individuals with moderate to severe PLD.

Serum creatinine was measured using the Jaffe method, and traced using

isotope dilution mass spectrometry. The Chronic Kidney Disease Epidemiology

Collaboration Formula was used to calculate the estimated glomerular filtration

rate (eGFR) 23. ESRD was defined as either estimating GFR < 15mL/min/1.73m2

or initiation of renal replacement therapy.

Classification of PLD and definition of concordant PLD families

PLD groups were classified according to htTLV as no or mild

(height-adjusted total liver volume [htTLV] ≥ 1600 mL/m), moderate (1600

mL/m < htTLV ≤ 2400 mL/m), or severe (htTLV ≤ 2400 mL/m). The cutoff

value of htTLV ≥ 1600 mL/m for moderate to severe PLD was used based on

the probability of developing 2 or more abdominal symptoms and (hepatic)

complications 8.

in the scatter plot of the distribution of htTLV values ≥ 1600 mL/m by age

and sex (Fig. S2), age was found to affect the total liver volume, so I

attempted to adjust the findings regarding PLD according to age. Because

81

htTLV distribution differs by sex, I analyzed correlations between the PKD

genotypes and htTLV by dividing the patients according to sex. An htTLV

more than 2,400 mL/m was defined as severe PLD based on the definition

presented by Hogan et al. 24 of a total liver volume of 4,000 mL/m or more as

severe PLD; this corresponds to an htTLV of approximately 2,400 mL/m, or

about 3 times the normal liver size.

PKD1 and PKD2 genes screening by NGS, Exon1-Specific Sanger

sequencing and MLPA

PKD genotyping using next-generation screening (NGS), Exon1-Specific

Sanger sequencing and MLPA was performed using the method described in

Part 1.

Statistical analysis

Statistical analyses were performed using SPSS version 23.0 (IBM Corp.,

Armonk, NY, USA). I performed the ANCOVA test, linear-by-linear test, the

Jonckheere-Terpstra test, or the chi-square test to compare variables among the

3 PLD groups. P values <0.05 were considered to indicate statistical significance.

ML-win (University of Bristol, Bristol, UK) was used for comparisons between

htTLV groups with the family-level clustering effect taken into consideration.

Multilevel multiple logistic regression and a multilevel linear regression model

using Stata 14 (StataCorp., Texas, USA) were used to identify the correlations

between htTLV and gene class. In order to control for the clustering effect,

variance estimation was also done by the clustered sandwich estimator.

82

FIGURE LEGENDS and TABLES

Figure 1. A. ESRD progression according to PLD severity B. All cause

mortality according to PLD severity

In a fraility model, the hazard ratio (HR) for ESRD in the severe PLD group

was significantly higher than in the patients with no or mild PLD. Addtionally,

the familial clustering effect according to 3 PLD group was significant (P value

for familial influence = 0.024). The HR for all-cause mortality with Cox

proportional hazard model became higher as the severity of PLD increased.

Figure 2. Impact of female sex hormone on severity of PLD

The number pregnancies was found to be significantly higher in patients with

moderate to severe PLD (P=0.006, Fig. 2A). The prevalence of sex hormone use

was associated with the severity of PLD (P<0.001, Fig. 2B).

Figure 3. A. Distrubution of htTLV among siblings B. htTLV of siblings

excluding the subjects having highest htTLV C. Distrubution of htTLV

among siblings excluding famales over 45 years D. htTLV of siblings

excluding the subjects of highest htTLV and famales over 45 years

When the htTLV of 65 siblings was plotted, they were confirmed to be

noticeably different among siblings in the families that had moderate to severe

PLD. Therefore, it appeared that concordant siblings were rarely observed (Fig.

3A). However, when the siblings were plotted after the subject with the highest

htTLV was removed from each family, it was confirmed that the subjects with

low htTLV values showed the same pattern as found in Fig. 1A (Fig. 3B).

When the individuals with the highest htTLV in each family were divided into 3

groups, even though the median age and sex distribution were not statistically

different, htTLV increased with the severity of PLD (Table in Fig. 3B). A

similar trend was shown in the analysis excluding women aged 45 years or

83

older whose liver is clearly large (Table in Fig. 3C and D). Subjects with *

have multiple siblings and represent overlapping subjects.

Figure 4. A. Location of mutation of PKD1 in moderate to severe PLD B.

Location of mutation of PKD1 in moderate to severe PLD

Fig. 4 shows the PKD1 gene locations of subjects in Mayo DB and the study

(SNUH) by domain. As each domain size is different, the prevalence of each

domain was analyzed by size and PT versus NT. Compared to the Mayo DB

data, the prevalence of C-type lectin in moderate to severe PLD was observed

to be higher in both PT and NT mutations; thus, it was confirmed that the

prevalence of C-type lectin was significantly higher in families with PLD

individuals when prevalence was analyzed by dividing the subjects in my study

based on an htTLV of 1,600 mL/m (multivariate-adjusted with age, sex and

family-level clustering effect, P=0.001). Therefore, C-type lectin can be

suggested as a warm spot for moderate to severe PLD.

84

Table 1. Frequency of each class and grade in 524 families

Variables No or mild Moderate Severe P value

N (%) 630 (84.1) 55 (7.4) 64 (8.5)

Female, n (%) 299 (47.5) 35 (63.6) 55 (85.9) <0.001

Age, [mean±SD] 45.2±13.7 53.0±9.6 52.3±9.1 <0.001

1st visit age, 587,[mean±SD]

41.0±13.2 47.6±10.0 48.5±8.7 <0.001

Hypertension, 703, n (%) 436 (73.9) 44 (84.6) 52 (85.2) 0.012

Hypertension Dx age,372, [median [IQR]]

38.3[32.5,46.3]

41.4[33.7, 48.1]

43.5[39.8, 45.3]

0.011

Intervention, 747, n (%) 0 (0) 2 (3.6) 18 (28.6) <0.001

htTLV, [median [IQR]]882[749, 1048]

1881[1774, 2116]

3186[2775, 3949]

<0.001

htTKV, [median [IQR]]678[396, 1228]

1024[456, 1927]

1102[753, 1578]

<0.001

eGFR by CKD-EPI, 738,[mean±SD]

69.2±38.4 49.4±35.2 46.9±37.9 <0.001

RRT, n (%) 97 (15.4) 17 (30.9) 23 (35.9) <0.001

RRT age, [median [IQR]]53.0[46.2, 59.6]

55.1[52.1, 60.5]

53.3[51.1, 63.9]

0.140

death, n (%) 12 (1.9) 5 (9.1) 8 (12.5) 1.000

death age, [median [IQR]]65.0[54.9, 71.6]

58.9[53.6, 75.4]

64.3[60.5, 72.4]

0.721

*Grade,[n(%)]

1A 61 (9.7) 5 (9.1) 5 (7.8)

<0.001

1B 173 (27.5) 15 (27.3) 10 (15.6)

1C 193 (30.6) 20 (36.4) 25 (39.1)

1D 137 (21.7) 13 (23.6) 17 (26.6)

1E 66 (10.5) 2 (3.6) 7 (10.9)

no or mild (htTLV < 1600 mL/m), moderate (1600 mL/m < htTLV ≤ 2400

mL/m), or severe (htTLV ≤ 2400 mL/m) Linear by linear test,

Jonckheere-Terpstra test, or chi-square test as appropriate; interventions: hepatic

85

artery embolization, percutaneous drainage, hepatectomy, or liver transplantation;

eGFR of subjects with RRT or kidney transplantation was regarded as 15

mL/min/1.73 m2.

Abbreviations: SD, standard deviation; Dx, diagnosis; IQR, interquartile range;

htTLV, height-adjusted total liver volume; htTKV, height-adjusted total kidney

volume; eGFR, estimated glomerular flow rate; CKD-EPI, The Chronic Kidney

Disease Epidemiology Collaboration; RRT, renal replacement therapy.

86

Figure 1. A. ESRD progression according to PLD severity B. All cause

mortality according to PLD severity

Figure 2. Impact of female sex hormone on severity of PLD

87

Figure 3. A. Distrubution of htTLV among siblings B. htTLV of siblings excluding the subjects having highest

htTLV C. Distrubution of htTLV among siblings excluding famales over 45 years D. htTLV of siblings excluding the

subjects of highest htTLV and famales over 45 years

89

Figure 4. A. Location of mutation of PKD1 in moderate to severe

PLD B. Location of mutation of PKD1 in moderate to severe PLD

90

SUPPLEMENTARY INFORMATION

Figure S1. Distribution of htTLV according to age and sex

91

Figure S2. Distribution patterns of PKD genotypes according to

severity

92

Figure S3. A. htTLV according to sex and class of PKD gene B.

htTLV according to sex and class of PKD gene

93

Table S1. Distribution of PKD genotype according to liver cysts

VariablesNo liver cyst,

>40 yr

Liver cyst,

>40 yrP value

N (%) 54 (10.5) 458 (89.5) 　

Male, 512, n (%) 32 (59.3) 197 (43.0) 0.023

Age, 512, [mean±SD] 54.2 ± 10.5 53.4 ± 8.7 0.876

1st visit age,391,

[mean±SD]47.7 ± 11.1 48.9 ± 9.7 0.566

Hypertension,

478, n (%)41 (75.9) 363 (79.3) 0.389

Hypertension Dx age,

372, [median [IQR]]44.0 [35.4, 51.7] 42.4 [37.4, 48.0] 0.467

htTLV, 512,

[median [IQR]]806 [697, 915] 1030 [827, 1530] <0.001

htTKV, 510,

[median [IQR]]613 [338, 1000] 976 [547, 1611] <0.001

eGFR by EPI, 508,

[mean±SD]61.1 ± 34.9 51.8 ± 35.1 0.067

ESRD, 512, n (%) 7 (13.0)　 120 (26.2)　 0.031

ESRD age, 512,

[median IQR]]51.9 [45.5, 64.0] 53.7 [48.4, 60.4] 0.690

Grade, 512,

[n(%)]

1A 14 (25.9) 41 (9.0)

<0.001

1B 19 (35.2) 118 (25.8)

1C 14 (25.9) 156 (34.1)

1D 5 (9.3) 110 (24.0)

1E 2 (3.7) 33 (7.2)

94

Chi-square test; eGFR of subjects with RRT or kidney

transplantation was regarded as 15 mL/min/1.73 m2.

Abbreviations: SD, standard deviation; Dx, diagnosis; IQR,

interquartile range; htTLV, height-adjusted total liver volume; htTKV,

height-adjusted total kidney volume; eGFR, estimated glomerular flow

rate; CKD-EPI, The Chronic Kidney Disease Epidemiology

Collaboration; RRT, renal replacement therapy.

Figure S4. Distribution patterns of PKD genotypes according to

no liver cyst and liver cyst groups

95

ID Genotype Function AA Domain

A PKD1 PTFrameshift

insertionp.(Ala3888Cysfs*72) Cytoplasmic

B PKD1 PTLarge

deletionp.(Gly3792_Ser3797del) Extracellular

C PKD1 PT Stopgain p.(Ser523*) C-type lectin

D PKD1+2Missense(I)

+ Large deletion

p.(Gly1275Val)

+ Intron2-exon5PKD

E PKD1 ID Inframe indel p.(Ser689del) Extracellular

Age htTLV Genotype ExonicFunc AA Domain

30.1 2391 PKD2 Stopgain p.(Arg361*) ExtracellularSequence

31.9 1657 PKD2Missense(I)+largedeletionlargedeletion

p.(Gly1275Val)+ Intron2-exon5

CytoplasmicSequence

57.0 2084 PKD2 Stopgain p.(Arg803*) Linker

43.5 1950 PKD1 PT frameshiftdeletion p.(Gln1174Alafs*36) PKD

46.1 1668 PKD1 PT Stopgain p.(Gln1621*) PKD

46.1 2850 PKD1 PT frameshiftinsertion

p.(Pro2674Alafs*148) REJ

47.9 2831 PKD1 PT frameshiftinsertion p.(Thr2582Glnfs*39) REJ

49.4 1615 PKD1 PT frameshiftinsertion p.(Ala3888Cysfs*72) Cytoplasmic

Sequence

Table S2. Mutations of moderate to severe PLD families

Table S3. Mutations of moderate to severe PLD males

96

51.0 1873 PKD1 PT Stopgain p.(Gln1483*) PKD

51.2 1631 PKD1 PTframeshiftdeletion p.(Val4034Cysfs*4) transmem

brane

53.0 2401 PKD1 PT frameshiftdeletion p.(Gly3792_Ser3797del) Extracellular

Sequence

53.1 1664 PKD1 PT frameshiftdeletion p.(Arg4202Profs*146) Cytoplasmic

Sequence

54.8 1743 PKD1 PTframeshiftinsertion p.(Glu2335Glyfs*85) REJ

56.0 1970 PKD1 PT frameshiftinsertion p.(Leu1551Alafs*27) PKD

58.0 1975 PKD1 PT frameshiftdeletion p.(Val4034Cysfs*4) transmem

brane

58.1 2460 PKD1 PT frameshiftdeletion p.(Arg1672Glyfs*98) PKD

58.8 2249 PKD1 PT frameshiftinsertion p.(Gly668Trpfs*46)

LDL-receptorclass A

61.7 3199 PKD1 PT Stopgain p.(Lys3619*) CytoplasmicSequence

63.4 3142 PKD1 PT frameshiftdeletion p.(Leu4073Valfs*82) Extracellular

Sequence

40.9 2464 PKD1 NT Missense p.(Cys3043Tyr) GPS

43.7 2040 PKD1 NT Missense p.(Ala2783Pro) REJ

46.7 3156 PKD1 NT Missense p.(Leu3673Pro) ExtracellularSequence

52.2 2154 PKD1 NT Missense p.(Leu464Pro)C - t y p electin

57.3 1833 PKD1 NT Missense p.(Thr2445Pro) REJ

39.2 1778 PKD1 ID small indel p.(Ser689del) ExtracellularSequence

38.8 2352 NM

52.1 2768 NM

57.6 1781 NM

65.3 1822 NM

97

Figure S5. Mutations of moderate to severe PLD males

98

Figure S6. Diagram of study patient selection in HOPE PKD

cohort

99

REFERENCES

1stpart:Kidney

1. Heyer, CM, Sundsbak, JL, Abebe, KZ, Chapman, AB, Torres, VE,

Grantham, JJ, Bae, KT, Schrier, RW, Perrone, RD, Braun, WE:

Predicted mutation strength of nontruncating PKD1 mutations aids

genotype-phenotype correlations in autosomal dominant polycystic

kidney disease. Journal of the American Society of Nephrology, 27:

2872-2884, 2016.

2. Carrera, P, Calzavara, S, Magistroni, R, Den Dunnen, JT, Rigo, F,

Stenirri, S, Testa, F, Messa, P, Cerutti, R, Scolari, F: Deciphering

Variability of PKD1 and PKD2 in an Italian Cohort of 643 Patients

with Autosomal Dominant Polycystic Kidney Disease (ADPKD).

Scientific reports, 6: 30850, 2016.

3. Yang, T, Meng, Y, Wei, X, Shen, J, Zhang, M, Qi, C, Wang, C, Liu,

J, Ma, M, Huang, S: Identification of novel mutations of PKD1

gene in Chinese patients with autosomal dominant polycystic

kidney disease by targeted next-generation sequencing. Clinica

Chimica Acta, 433: 12-19, 2014.

4. Jin, M, Xie, Y, Chen, Z, Liao, Y, Li, Z, Hu, P, Qi, Y, Yin, Z, Li, Q,

Fu, P: System analysis of gene mutations and clinical phenotype in

Chinese patients with autosomal-dominant polycystic kidney

disease. Scientific reports, 6: 35945, 2016.

100

5. Rossetti, S, Kubly, VJ, Consugar, MB, Hopp, K, Roy, S, Horsley,

SW, Chauveau, D, Rees, L, Barratt, TM, Van't Hoff, WG:

Incompletely penetrant PKD1 alleles suggest a role for gene dosage

in cyst initiation in polycystic kidney disease. Kidney international,

75: 848-855, 2009.

6. Hateboer, N, v Dijk, MA, Bogdanova, N, Coto, E, Saggar-Malik,

AK, San Millan, JL, Torra, R, Breuning, M, Ravine, D, Group,

EP-PS: Comparison of phenotypes of polycystic kidney disease

types 1 and 2. The Lancet, 353: 103-107, 1999.

7. Hateboer, N, Veldhuisen, B, Peters, D, Breuning, MH, San-Millán,

JL, Bogdanova, N, Coto, E, Dijk, MA, Afzal, AR, Jeffery, S:

Location of mutations within the PKD2 gene influences clinical

outcome. Kidney international, 57: 1444-1451, 2000.

8. Kim, H, Park, HC, Ryu, H, Kim, K, Kim, HS, Oh, KH, Yu, SJ,

Chung, JW, Cho, JY, Kim, SH, Cheong, HI, Lee, K, Park, JH, Pei,

Y, Hwang, YH, Ahn, C: Clinical Correlates of Mass Effect in

Autosomal Dominant Polycystic Kidney Disease. PloS one, 10:

e0144526, 2015.

9. Ryu, H, Kim, H, Park, HC, Kim, H, Cho, EJ, Lee, KB, Chung, W,

Oh, KH, Cho, JY, Hwang, YH: Total kidney and liver volume is a

major risk factor for malnutrition in ambulatory patients with

autosomal dominant polycystic kidney disease. 18: 22, 2017.

10. D'Agnolo, HMA, Casteleijn, NF, Gevers, TJG, de Fijter, H, van

Gastel, MDA, Messchendorp, AL, Peters, DJM, Salih, M,

101

Soonawala, D, Spithoven, EM, Visser, FW, Wetzels, JFM, Zietse, R,

Gansevoort, RT, Drenth, JPH: The Association of Combined Total

Kidney and Liver Volume with Pain and Gastrointestinal Symptoms

in Patients with Later Stage Autosomal Dominant Polycystic

Kidney Disease. American journal of nephrology, 46: 239-248, 2017.

11. Irazabal, MV, Rangel, LJ, Bergstralh, EJ, Osborn, SL, Harmon, AJ,

Sundsbak, JL, Bae, KT, Chapman, AB, Grantham, JJ, Mrug, M,

Hogan, MC, El-Zoghby, ZM, Harris, PC, Erickson, BJ, King, BF,

Torres, VE: Imaging classification of autosomal dominant polycystic

kidney disease: a simple model for selecting patients for clinical

trials. Journal of the American Society of Nephrology : JASN, 26:

160-172, 2015.

12. Sherstha, R, McKinley, C, Russ, P, Scherzinger, A, Bronner, T,

Showalter, R, Everson, GT: Postmenopausal estrogen therapy

selectively stimulates hepatic enlargement in women with autosomal

dominant polycystic kidney disease. Hepatology, 26: 1282-1286,

1997.

13. Chebib, FT, Jung, Y, Heyer, CM, Irazabal, MV, Hogan, MC, Harris,

PC, Torres, VE, El-Zoghby, ZM: Effect of genotype on the severity

and volume progression of polycystic liver disease in autosomal

dominant polycystic kidney disease. Nephrology Dialysis

Transplantation, 31: 952-960, 2016.

14. Kim, H, Koh, J, Park, SK, Oh, KH, Kim, YH, Kim, Y, Ahn, C, Oh,

YK: Baseline Characteristics of the Autosomal Dominant Polycystic

102

Kidney Disease Subcohort of the KoreaN Cohort Study for

Outcomes in Patients With Chronic Kidney Disease (KNOW‐

CKD). Nephrology, 2018.

15. Stevens, LA, Schmid, CH, Greene, T, Zhang, YL, Beck, GJ,

Froissart, M, Hamm, LL, Lewis, JB, Mauer, M, Navis, GJ:

Comparative performance of the CKD Epidemiology Collaboration

(CKD-EPI) and the Modification of Diet in Renal Disease (MDRD)

Study equations for estimating GFR levels above 60 mL/min/1.73

m2. American Journal of Kidney Diseases, 56: 486-495, 2010.

16. Hogan, MC, Masyuk, TV, Page, LJ, Kubly, VJ, Bergstralh, EJ, Li,

X, Kim, B, King, BF, Glockner, J, Holmes, DR, 3rd, Rossetti, S,

Harris, PC, LaRusso, NF, Torres, VE: Randomized clinical trial of

long-acting somatostatin for autosomal dominant polycystic kidney

and liver disease. Journal of the American Society of Nephrology :

JASN, 21: 1052-1061, 2010.


Sundsbak, JL, Bae, KT, Chapman, AB, Grantham, JJ, Mrug, M:

Imaging classification of autosomal dominant polycystic kidney

disease: a simple model for selecting patients for clinical trials.

Journal of the American Society of Nephrology, 26: 160-172, 2015.

18. Prydz, K, Fagereng, GL, Tveit, H: The Role of Glycans in Apical

Sorting of Proteins in Polarized Epithelial Cells. In: Glycosylation.

InTech, 2012.

103

19. Weston, BS, Bagnéris, C, Price, RG, Stirling, JL: The polycystin-1

C-type lectin domain binds carbohydrate in a calcium-dependent

manner, and interacts with extracellular matrix proteins in vitro.

Biochimica et Biophysica Acta (BBA)-Molecular Basis of Disease,

1536: 161-176, 2001.


Fu, P, Chen, X: System analysis of gene mutations and clinical

phenotype in Chinese patients with autosomal-dominant polycystic

kidney disease. Sci Rep, 6: 35945, 2016.

2stpart:Liver

1. Heyer, CM, Sundsbak, JL, Abebe, KZ, Chapman, AB, Torres, VE,

Grantham, JJ, Bae, KT, Schrier, RW, Perrone, RD, Braun, WE:

Predicted mutation strength of nontruncating PKD1 mutations aids

genotype-phenotype correlations in autosomal dominant polycystic

kidney disease. Journal of the American Society of Nephrology, 27:

2872-2884, 2016.

2. Carrera, P, Calzavara, S, Magistroni, R, Den Dunnen, JT, Rigo, F,

Stenirri, S, Testa, F, Messa, P, Cerutti, R, Scolari, F: Deciphering

Variability of PKD1 and PKD2 in an Italian Cohort of 643 Patients

with Autosomal Dominant Polycystic Kidney Disease (ADPKD).

Scientific reports, 6: 30850, 2016.

104

3. Yang, T, Meng, Y, Wei, X, Shen, J, Zhang, M, Qi, C, Wang, C, Liu,

J, Ma, M, Huang, S: Identification of novel mutations of PKD1

gene in Chinese patients with autosomal dominant polycystic

kidney disease by targeted next-generation sequencing. Clinica

Chimica Acta, 433: 12-19, 2014.


Fu, P: System analysis of gene mutations and clinical phenotype in

Chinese patients with autosomal-dominant polycystic kidney

disease. Scientific reports, 6: 35945, 2016.

5. Rossetti, S, Kubly, VJ, Consugar, MB, Hopp, K, Roy, S, Horsley,

SW, Chauveau, D, Rees, L, Barratt, TM, Van't Hoff, WG:

Incompletely penetrant PKD1 alleles suggest a role for gene dosage

in cyst initiation in polycystic kidney disease. Kidney international,

75: 848-855, 2009.

6. Hateboer, N, v Dijk, MA, Bogdanova, N, Coto, E, Saggar-Malik,

AK, San Millan, JL, Torra, R, Breuning, M, Ravine, D, Group,

EP-PS: Comparison of phenotypes of polycystic kidney disease

types 1 and 2. The Lancet, 353: 103-107, 1999.

7. Hateboer, N, Veldhuisen, B, Peters, D, Breuning, MH, San-Millán,

JL, Bogdanova, N, Coto, E, Dijk, MA, Afzal, AR, Jeffery, S:

Location of mutations within the PKD2 gene influences clinical

outcome. Kidney international, 57: 1444-1451, 2000.

105

8. Kim, H, Park, HC, Ryu, H, Kim, K, Kim, HS, Oh, KH, Yu, SJ,

Chung, JW, Cho, JY, Kim, SH, Cheong, HI, Lee, K, Park, JH, Pei,

Y, Hwang, YH, Ahn, C: Clinical Correlates of Mass Effect in

Autosomal Dominant Polycystic Kidney Disease. PloS one, 10:

e0144526, 2015.

9. Ryu, H, Kim, H, Park, HC, Kim, H, Cho, EJ, Lee, KB, Chung, W,

Oh, KH, Cho, JY, Hwang, YH: Total kidney and liver volume is a

major risk factor for malnutrition in ambulatory patients with

autosomal dominant polycystic kidney disease. 18: 22, 2017.

10. D'Agnolo, HMA, Casteleijn, NF, Gevers, TJG, de Fijter, H, van

Gastel, MDA, Messchendorp, AL, Peters, DJM, Salih, M,

Soonawala, D, Spithoven, EM, Visser, FW, Wetzels, JFM, Zietse, R,

Gansevoort, RT, Drenth, JPH: The Association of Combined Total

Kidney and Liver Volume with Pain and Gastrointestinal Symptoms

in Patients with Later Stage Autosomal Dominant Polycystic

Kidney Disease. American journal of nephrology, 46: 239-248, 2017.


Sundsbak, JL, Bae, KT, Chapman, AB, Grantham, JJ, Mrug, M,

Hogan, MC, El-Zoghby, ZM, Harris, PC, Erickson, BJ, King, BF,

Torres, VE: Imaging classification of autosomal dominant polycystic

kidney disease: a simple model for selecting patients for clinical

trials. Journal of the American Society of Nephrology : JASN, 26:

160-172, 2015.

106

12. Sherstha, R, McKinley, C, Russ, P, Scherzinger, A, Bronner, T,

Showalter, R, Everson, GT: Postmenopausal estrogen therapy

selectively stimulates hepatic enlargement in women with autosomal

dominant polycystic kidney disease. Hepatology, 26: 1282-1286,

1997.

13. Chebib, FT, Jung, Y, Heyer, CM, Irazabal, MV, Hogan, MC, Harris,

PC, Torres, VE, El-Zoghby, ZM: Effect of genotype on the severity

and volume progression of polycystic liver disease in autosomal

dominant polycystic kidney disease. Nephrology Dialysis

Transplantation, 31: 952-960, 2016.


Sundsbak, JL, Bae, KT, Chapman, AB, Grantham, JJ, Mrug, M:

Imaging classification of autosomal dominant polycystic kidney

disease: a simple model for selecting patients for clinical trials.

Journal of the American Society of Nephrology, 26: 160-172, 2015.

15. Prydz, K, Fagereng, GL, Tveit, H: The Role of Glycans in Apical

Sorting of Proteins in Polarized Epithelial Cells. In: Glycosylation.

InTech, 2012.

16. Weston, BS, Bagnéris, C, Price, RG, Stirling, JL: The polycystin-1

C-type lectin domain binds carbohydrate in a calcium-dependent

manner, and interacts with extracellular matrix proteins in vitro.

Biochimica et Biophysica Acta (BBA)-Molecular Basis of Disease,

1536: 161-176, 2001.

107


Fu, P, Chen, X: System analysis of gene mutations and clinical

phenotype in Chinese patients with autosomal-dominant polycystic

kidney disease. Sci Rep, 6: 35945, 2016.

18. Cornec-Le Gall E, Torres VE, Harris PC. Genetic complexity of

autosomal dominant polycystic kidney and liver diseases. Journal of

the American Society of Nephrology. 2018;29(1):13-23.

19. Besse W, Dong K, Choi J, Punia S, Fedeles SV, Choi M, et al.

Isolated polycystic liver disease genes define effectors of

polycystin-1 function. The Journal of clinical investigation.

2017;127(5):1772-85.

20. Porath B, Gainullin VG, Cornec-Le Gall E, Dillinger EK, Heyer

CM, Hopp K, et al. Mutations in GANAB, encoding the glucosidase

IIα subunit, cause autosomal-dominant polycystic kidney and liver

disease. The American Journal of Human Genetics.

2016;98(6):1193-207.

21. D'agnolo HM, Drenth JP. Risk factors for progressive polycystic

liver disease: where do we stand? Nephrology Dialysis

Transplantation. 2015;31(6):857-9.

22. Kim, H, Koh, J, Park, SK, Oh, KH, Kim, YH, Kim, Y, Ahn, C, Oh,

YK: Baseline Characteristics of the Autosomal Dominant Polycystic

Kidney Disease Subcohort of the KoreaN Cohort Study for

108

Outcomes in Patients With Chronic Kidney Disease

(KNOW‐CKD). Nephrology, 2018.

23. Stevens, LA, Schmid, CH, Greene, T, Zhang, YL, Beck, GJ,

Froissart, M, Hamm, LL, Lewis, JB, Mauer, M, Navis, GJ:

Comparative performance of the CKD Epidemiology Collaboration

(CKD-EPI) and the Modification of Diet in Renal Disease (MDRD)

Study equations for estimating GFR levels above 60 mL/min/1.73

m2. American Journal of Kidney Diseases, 56: 486-495, 2010.

24. Hogan, MC, Masyuk, TV, Page, LJ, Kubly, VJ, Bergstralh, EJ, Li,

X, Kim, B, King, BF, Glockner, J, Holmes, DR, 3rd, Rossetti, S,

Harris, PC, LaRusso, NF, Torres, VE: Randomized clinical trial of

long-acting somatostatin for autosomal dominant polycystic kidney

and liver disease. Journal of the American Society of Nephrology :

JASN, 21: 1052-1061, 2010.

109

국문초록

상염색체우성다낭신(다낭신)은 신장 및 간을 침범하는 단일유전자 질환

이며, 말기신부전의 주요 원인이다. 유전 정보는 다낭신의 병인을 이해하

는 데 있어 가장 중요하다. 따라서 본 연구는 524 명의 비혈연 가계 및

749 명의 다낭신 환자를 대상으로 다낭신의 유전적 특성과 PKD1/2 유전

자 돌연변이가 신장 및 간에 미치는 영향을 확인하고자 하였다.

표적 엑솜 시퀀싱 및 PKD1 유전자의 exon 1의 Sanger 시퀀싱,

multiple ligation probe assay를 사용하여 PKD1/2의 유전 연구를 수행했

다. 총신장부피는 타원체법으로 계산하였고, 총 간부피는 sterology로 측

정하여 키로 보정 하였다(htTKV, htTLV). htTLV를 사용하여 질병의

중증도는 경증 (htTLV <1,600 mL / m), 중증도 (htTLV 1,600-2,400 mL

/ m) 및 중증 다낭간 (htTLV> 2,400 mL / m)로 분류되었다. 다낭신 또

는 다낭간의 중증도와 유전자 돌연변이의 유형 및 위치의 연관성을 가족

수준 클러스터링 (가족 내 상관 관계)을 제어하는 fraility 모델을 사용하

여 분석했다. 돌연변이 검출률은 80.7 % (423/524 가족, 331 돌연변이) 였

고 70.7 %는 새로운 돌연변이였다. PKD1 단백질 절단 유전자형

(PKD1-PT)은 진단 연령이 낮고, htTKV가 크며, PKD1 비 절단 및

PKD2 유전자형과 비교하여 신장 기능이 낮았다. PKD1 유전자형은

PKD2 유전자형 (53.3 [46.7, 58.7] vs. 63.1 [54.0, 67.5], P = 0.003)에 비해

빠르게 말기신부전으로 진행하였다. PKD2 유전자형은 연령, 성별, 가족

수준 클러스터링을 통제한 모델에서 PKD1-PT 유전자형보다 말기신부전

의 위험도가 0.2 배로 낮았다 (p = 0.037).

110

간 표현형에 관해서는, 심한 다낭간은 말기신부전으로의 보다 빠른 진

행과 더 높은 사망률을 보였다. 성별, 연령, 많은 임신 횟수 및 여성 호르

몬의 사용은 중등도 및 중증의 다낭간과 관련이 있었다. 중등도 및 중증

의 다낭간에 가족 수준의 클러스터링 효과가 있었다. 그러나 신장과 달리

다낭간의 심한 정도는 PKD1 / 2 유전자형과 상관이 없었고 단지 돌연변

이가 검출되지 않은 환자에서 경증 다낭간의 빈도가 증가했다. 또한,

C-type lectin domain은 PKD1 단백질 절단과 비절단 돌연변이 모두에서

중등도 및 중증의 다낭간에 대한 잠재적 온점 일 수 있음을 확인하였다.

결론적으로, 본 연구는 유전형이 한국인의 새로운 치료 요법에 대한 신

장의 빠른 진행자를 선택하는데 기여할 수 있음을 시사한다. 그러나 PLD

에 대한 추가적인 유전적 영향을 연구해야 한다.

…………………………………………………………………………………

주요어 : 상염색체우성다낭신, 표적엑솜시퀀싱, 빠른

진행자, 중증도 및 중증의 다낭간, 유전자변이, 표현형

학 번 : 2015-30809

Genetic Characteristics of Korean Patients with Autosomal...

Documents

Transcript of Genetic Characteristics of Korean Patients with Autosomal...