Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Potential Applications using the Class Frequency Distribution of Maximal Repeats from Tagged Sequential Data. Jing-Doo Wang (王經篤), Associate Professor, Asia University, Taiwan. 8th Taiwan Hadoop Community Annual Conference, HadoopCon 2016, Humanities and Social Sciences Building, Academia Sinica (2016.9.10)

Transcript of Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Page 1: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Potential Applications using the Class Frequency Distribution of Maximal Repeats from Tagged Sequential Data

Jing-Doo Wang (王經篤)

Associate Professor, Asia University, Taiwan

8th Taiwan Hadoop Community Annual Conference, HadoopCon 2016

Humanities and Social Sciences Building, Academia Sinica (2016.9.10)

Page 2: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

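About the Speaker

• Current position: – Associate Professor, Department of Computer Science and Information Engineering, Asia University

• Highest degree: – Ph.D. in Computer Science and Information Engineering, National Chung Cheng University

• Expertise – Text Mining: Maximal Repeat Extraction

– Bioinformatics

– Cloud Computing

– Class Structure Ambiguity Evaluation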

Page 3: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Asia Museum of Modern Art

http://www.asia.edu.tw/main_1.php?yard/yard_01_09

Page 4: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Asia University Hospital (2016.8.1)

http://www.auh.org.tw/web/index.php

Page 5: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Precision Medicine

• Clinical medicine (China Medical University Hospital + Asia University Hospital)

• Genome sequencing (Next Generation Sequencing)

• Bioinformatics (Asia University)

http://newjust.masterlink.com.tw/HotProduct/HTML/img/GetImg.xdjpng?A=PA314-1b.png

Page 6: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

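Big Data Research Center

(1) Build the university's big data research facilities.

(2) Integrate databases related to precision medicine.

(3) Carry out academic exchange, research and development, and industry-academia affairs related to big data.

(4) Carry out outreach and service activities related to big data.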

Page 7: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

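Post-Baccalaureate Second-Specialty Bachelor's Degree Program

Applying school: Asia University

New program: Post-Baccalaureate Program in Big Data Processing and Analysis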

Page 8: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Outline

• Introduction

• Pattern History For Trend Analysis

• Product Traceability for Quality Monitoring

• Mining for Distinctive Pattern (Biomarker) from Genomic Sequences

• Future Works

Page 9: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

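Example string: “xabcyiiizabcqabcyrxar”

Not maximal repeat patterns:

• ab (every occurrence is followed by “c”, so it always extends to “abc”)

• bc (every occurrence is preceded by “a”, so it always extends to “abc”)

Maximal repeat patterns:

• abc (Yes)

• abcy (Yes)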
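The example on this slide can be checked with a small brute-force sketch. This is not the MapReduce method of the talk; it only assumes the usual definition (a substring that occurs at least twice and whose occurrences are not all preceded, nor all followed, by the same character). The function name `maximal_repeats` is illustrative.

```python
from collections import defaultdict


def maximal_repeats(text, min_count=2):
    """Return {substring: occurrence count} for every maximal repeat in text."""
    n = len(text)
    starts_of = defaultdict(list)
    # Brute force: record the start position of every substring.
    for i in range(n):
        for j in range(i + 1, n + 1):
            starts_of[text[i:j]].append(i)

    repeats = {}
    for sub, starts in starts_of.items():
        if len(starts) < min_count:
            continue
        # Character just left/right of each occurrence; a string boundary
        # is treated as a unique context.
        lefts = {text[s - 1] if s > 0 else ("start", s) for s in starts}
        rights = {text[s + len(sub)] if s + len(sub) < n else ("end", s)
                  for s in starts}
        if len(lefts) > 1 and len(rights) > 1:
            repeats[sub] = len(starts)
    return repeats


found = maximal_repeats("xabcyiiizabcqabcyrxar")
for pattern in ("ab", "bc", "abc", "abcy"):
    print(pattern, pattern in found)
```

The brute-force enumeration is quadratic in the text length, so it is only suitable for short strings like the one on the slide.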

Page 10: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Why use “Maximal Repeats” as features?

• Dictionary

– How to identify new words or phrases?

– e.g. “just do it”, “洪荒之力” (“prehistoric powers”, a 2016 catchphrase).

• N-gram

– 2-gram, 3-gram, …, 5-gram (Google Ngram Viewer)

– The value of “N” is limited.

• Maximal Repeat

– The length of maximal repeat is variable.

Page 11: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Journal of Supercomputing, 72(8), pp. 3236-3260, April 2016

Page 12: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Patent concept diagram: A Method for Extracting Maximal Repeat Patterns and Computing Frequency Distribution Tables

Page 13: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Patent Application Serial Number (US 15/208,994) (pending)

• Wang, Ching-Tu. Method for Extracting Maximal Repeat Patterns and Computing Frequency Distribution Tables. Patent Application Serial Number 15/208,994. 13 July 2016.

• U.S. invention patent application (PA)

– Owner: 王經篤 (Jing-Doo Wang)

– Inventor: 王經篤 (Jing-Doo Wang)

Page 14: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

The flowchart of Maximal Repeat Extraction via MapReduce

Page 15: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Right Boundary Verification

Page 16: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Left Boundary Verification

Page 17: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Boundary Verification (Left & Right)

Page 18: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Outline

• Introduction

• Pattern History For Trend Analysis

• Product Traceability for Quality Monitoring

• Mining for Distinctive Pattern (Biomarker) from Genomic Sequences

• Future Works

Page 19: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Pattern History for Trend Analysis

Jing-Doo Wang (王經篤)

Associate Professor

Asia University, Taiwan.

2016/9/12 FSKD 20'11

Sequential Data + Timestamp

Page 20: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)


Page 21: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Experimental Results

• Download the abstracts of articles in PubMed from 1990 to 2014 (25 years).


Page 22: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

PubMed (1990~2014) (25 years)

The abstracts and titles of 14,473,242 articles => about 12 GB


Page 23: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)
Page 24: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

The Abstracts and Titles of PubMed Articles (1990~2014) (12 GB)

6 PCs => 5 hours

Page 25: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

The History of a Significant Pattern

The history of a significant pattern is the frequency distribution of that pattern over equally spaced time intervals.
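As a rough illustration of this definition, the sketch below counts a pattern per year in a list of timestamped texts. The function `pattern_history` and the three-document toy corpus are hypothetical; they are not the PubMed data or the speaker's code.

```python
from collections import Counter


def pattern_history(documents, pattern, first_year, last_year):
    """Count occurrences of pattern per year over [first_year, last_year]."""
    counts = Counter()
    for year, text in documents:
        counts[year] += text.count(pattern)
    # Report every year in the range so that empty intervals show up as 0.
    return [(year, counts[year]) for year in range(first_year, last_year + 1)]


# Hypothetical toy corpus of (year, text) pairs.
docs = [
    (2008, "seasonal influenza surveillance report"),
    (2009, "H1N1 outbreak; H1N1 vaccine trial"),
    (2010, "follow-up of H1N1 patients"),
]
history = pattern_history(docs, "H1N1", 2008, 2010)
print(history)
```

Listing every interval, including those with a zero count, keeps the time axis equally spaced as the definition requires.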


Page 26: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Significant Pattern

• A significant pattern is a maximal repeat of consecutive words within texts.

(Length=1) TDP-43

(Length=1) SARS

(Length=1) H1N1

(Length=5) non-small cell lung cancer (NSCLC)

(Length=6) 75 g oral glucose tolerance test

(Length=6) 4 x 4 Latin square design

(Length=7) 2 x 2 factorial arrangement of treatments

(Length=9) the National Institute of Child Health and Human Development

(Length=10) patients with squamous cell carcinoma of the head and neck

(Length=11) anomalous origin of the left coronary artery from the pulmonary artery

(Length=12) Pregnancy and Childbirth Group trials register and the Cochrane Controlled Trials Register

(Length=13) the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire

Page 27: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

The History of “H1N1”

Page 28: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

(50, 75, 100) g Oral Glucose Tolerance Test

Page 29: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

The (goal, aim, purpose) of this study

[Figure: pattern histories of the variants “aim”, “goal”, and “purpose”]

Page 30: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Archaeology

https://zh.wikipedia.org/wiki/%E8%80%83%E5%8F%A4%E5%AD%A6

Page 31: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

On-line:

– PubMed (medical articles) (1990-2014)

– CNA (Central News Agency) (1990-1996)

– Republic of China (Taiwan) patents (1950~2009) (patent documents)

Potential works:

– Court judgments (judicial systems) (studies that allow closer examination of legal texts)

– National master's and doctoral theses

– National Central Library

– Republic of China (Taiwan) patents

– Novels: the complete novels of Jin Yong (金庸), Harry Potter, Shakespeare, Giddens Ko (九把刀)

– Blogs

– News

Text Archaeology

Figure: https://www.google.com.tw/search

Page 32: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Personalized learning pattern-history cloud

http://www.tutortristar.com/topic/Internet/seminar-big-data-marketing-beckett-20151005.html?utm_source=facebook&utm_medium=banner&utm_campaign=seminar-bigdata-marketing

Page 33: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Language and cognitive screening for the elderly (repetitive utterances)

“Have you eaten yet?”

https://www.google.com.tw/search

Page 34: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Detecting newly adopted vocabulary (catchphrases)

https://www.google.com.tw/search

Page 35: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

How can spoken language be collected and converted to text?

Google’s Speech Recognition API

Version 2

https://www.google.com.tw/search

Page 36: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Timestamped text + geographic location

https://www.google.com.tw/search

Page 37: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Outline

• Introduction

• Pattern History For Trend Analysis

• Product Traceability for Quality Monitoring

• Mining for Distinctive Pattern (Biomarker) from Genomic Sequences

• Future Works

Page 38: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

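Agricultural Product Traceability (production and sales records)

• Agricultural product traceability system = implementation and certification of Taiwan Good Agricultural Practice + a record-tracing system

• Traceability: traceable agricultural products that are safe, sustainable, and transparent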

From: http://taft.coa.gov.tw/ct.asp?xItem=4&CtNode=206&role=C

Page 39: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

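[Diagram: Product Traceability]

Diagram labels: Symbol (χ α β ϒ θ φ), Serialization, Numeric Value, Symbolization

Sites: Huizhou / Suzhou, mainland China

Data sources: paper records, automatic measuring machines, X-R control charts, visual inspection, microscope measurements, photo files

Icon: http://www.freepik.com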

Page 40: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Traceability records =>

********************$***********

********************$***********

*****&*******%******************

********************************

*****&*******%******************

********************************

*********#****?************@****

********************************

*********#****?************@****

*****&*******%******************

*********#****?***********@*****

Products (rows)

Steps (columns)

Expert opinions

Page 41: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Special event (signal) sequences (sequences that recur within a given class)

********************************

********************************

********************************

*********#****?************@****

*********#****?************@****

*********#****?************@****

*****&*******%******************

*****&*******%******************

*****&*******%******************

********************$***********

********************$***********

Expert opinions

Page 42: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Special event (signal) sequences

Event (signal) sequences

#****?

@****

&*******%

$**********

*****

Page 43: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Industry 4.0

tw.digiwin.biz

Page 44: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Internet of Things (IoT)

http://www.indetail.com.tw/archives/2494

Page 45: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Internet of Things (IoT) + Industry 4.0 = ?

http://portal.stpi.narl.org.tw/index/article/10095

Page 46: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Internet of Things (IoT) + Industry 4.0

Data processing => Big Data!

www.slideshare.net

Page 47: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Monitoring and anomaly analysis

http://innobic.blogspot.tw/

Page 48: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Single point of failure

www.ontargetpartners.com

Page 50: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Statistical analysis

newgenerationresearcher.blogspot.com

Page 51: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

A statistical-analysis perspective

• Single point of failure => univariate analysis

• Multiple points of failure => multivariate analysis

www.slideshare.net

Page 52: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Accident? = Multiple Points of Failure

multiple points of failure

ops.fhwa.dot.gov

Page 54: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

(A combination of a chain of events!) => accident

• Unpredictable?

www.frillo.co.uk

Page 55: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Wu Gui (Master Oogway in "Kung Fu Panda") says "There are no accidents"

https://www.youtube.com/watch?v=Q04LPj99ZPc

Page 58: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

What do these represent from a data-processing point of view?

http://superbest.typepad.com/.a/6a00d83451b39269e20147e25f0551970b-pi

superbest.typepad.com

Page 59: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Behind every [item] is a string of numbers (symbols)

www.36dsj.com big5.xinhuanet.com

Page 60: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

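“Industry 4.0” sounds the horn: accelerating the fusion of informatization and intelligence to raise “smart manufacturing” competitiveness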

http://tw.digiwin.biz/newsListDetail_6828.html

Page 62: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Internet of Things (IoT) vs. Traceability

********************************

********************************

********************************

********************************

********************************

********************************

********************************

********************************

********************************

***********^********************

********************************

Products (rows)

Steps (columns)

Page 63: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Internet of Things (IoT) vs. Traceability

********************$***********

********************$***********

*****&*******%******************

********************************

*****&*******%******************

********************************

*********#****?************@****

********************************

*********#****?************@****

*****&*******%******************

*********#****?***********@*****

Products (rows)

Steps (columns)

blog.tianya.cn news.hxsd.com

Page 64: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Expert opinions => tags

Page 65: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Traceability

P1   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P2   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P3   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P4   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P5   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P6   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
………
P99  => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P100 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10

100 Products (rows)

10 Steps (columns)

Page 66: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Traceability (1)

…[S3] [S4] [S5]…

https://www.google.com.tw/search

Page 67: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Traceability (1)

P1   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P2   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P3   => S1, S2, S3, S4
P4   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P5   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P6   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
………
P99  => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P100 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10

Detect abnormality
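The case shown on this slide, where the record of P3 stops after S4, can be flagged with a simple completeness check. The sketch below is only an illustration of that idea; the function `find_incomplete` and the dictionary layout of the records are assumptions, not the speaker's implementation.

```python
def find_incomplete(records, expected_steps):
    """Return {product: missing steps} for products whose record is incomplete."""
    abnormal = {}
    for product, steps in records.items():
        missing = [s for s in expected_steps if s not in steps]
        if missing:
            abnormal[product] = missing
    return abnormal


expected = [f"S{i}" for i in range(1, 11)]           # S1 .. S10
records = {f"P{i}": list(expected) for i in range(1, 7)}
records["P3"] = ["S1", "S2", "S3", "S4"]             # stops after S4, as on the slide
abnormal = find_incomplete(records, expected)
print(abnormal)
```

A check like this catches a broken record as soon as it is written, in contrast to defects that only show up once the product is finished.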

www.publicdomainpictures.net

Page 68: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Traceability (2)

https://www.google.com.tw/search

…[S3] [S4] [S5]…

Page 69: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Traceability (2)

P1   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P2   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P3   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P4   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P5   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P6   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
………
P99  => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P100 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10

Detect abnormality

=> when the product (or semi-finished product) is completed

Page 70: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Inspect the production process one step at a time?

• P3 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10

• P99 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10

Figure: www.yuhing.edu.tw

Page 72: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Traceability (3)

P1   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P2   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P3   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P4   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P5   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P6   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
………
P99  => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P100 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10

Page 73: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Traceability (3)

P3   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P99  => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P2   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P4   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P6   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P1   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P4   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P5   => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
……
P100 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10

Page 74: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Special event (signal) sequences

Event (signal) sequences

S5, S6, S7

S1, S2, S3, S4

S1, S2, S3

S8, S9, S10

S1, S2, S3, S4, S5, S6, S7, S8, S9, S10

S9, S10

https://www.google.com.tw/search

Page 75: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Traceability (2)

Ma_1

Ma_2

Mb_1

Mb_2

Ma_3

Mc_1

Mc_2

https://www.google.com.tw/search

…-[S3,Ma_1]-[S4,Mb_2]-[S5,Mc_1]-…

Page 76: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Traceability (4) [Step, Machine]

P1   => [S1,M]-[S2,M]-[S3,M]-…-[S9,M]-[S10,M]
P2   => [S1,M]-[S2,M]-[S3,M]-…-[S9,M]-[S10,M]
P3   => [S1,M]-[S2,M]-[S3,M]-…-[S9,M]-[S10,M]
P4   => [S1,M]-[S2,M]-[S3,M]-…-[S9,M]-[S10,M]
P5   => [S1,M]-[S2,M]-[S3,M]-…-[S9,M]-[S10,M]
P6   => [S1,M]-[S2,M]-[S3,M]-…-[S9,M]-[S10,Mj]
………
P99  => [S1,M]-[S2,M]-[S3,M]-…-[S9,M]-[S10,M]
P100 => [S1,M]-[S2,M]-[S3,M]-…-[S9,M]-[S10,M]

10 Steps (columns)

100 Products (rows)

Page 77: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Traceability (2)

Ma_1

Ma_2

Mb_1

Mb_2

Ma_3

Mc_1

Mc_2

https://www.google.com.tw/search

…-[S3,Ma_1]-[S4,Mb_2]-[S5,Mc_1]-…

Page 78: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

…-[S3,Ma_2]-[S4,Mb_1]-[S5,Mc_2]-…
…-[S3,Ma_1]-[S4,Mb_2]-[S5,Mc_1]-…
…-[S3,Ma_1]-[S4,Mb_3]-[S5,Mc_1]-…
…-[S3,Ma_2]-[S4,Mb_3]-[S5,Mc_1]-…
…-[S3,Ma_2]-[S4,Mb_2]-[S5,Mc_2]-…
…-[S3,Ma_1]-[S4,Mb_2]-[S5,Mc_2]-…

…-[S3,Ma_2]-[S4,Mb_1]-[S5,Mc_2]-…

…-[S3,Ma_2]-[S4,Mb_2]-[S5,Mc_1]-…

Page 79: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

…-[S3,Ma_2]-[S4,Mb_2]-[S5,Mc_1]-…
…-[S3,Ma_2]-[S4,Mb_2]-[S5,Mc_2]-…
…-[S3,Ma_1]-[S4,Mb_2]-[S5,Mc_2]-…
…-[S3,Ma_2]-[S4,Mb_1]-[S5,Mc_2]-…
…-[S3,Ma_1]-[S4,Mb_3]-[S5,Mc_1]-…
…-[S3,Ma_2]-[S4,Mb_3]-[S5,Mc_1]-…

…-[S3,Ma_1]-[S4,Mb_2]-[S5,Mc_1]-…

…-[S3,Ma_2]-[S4,Mb_1]-[S5,Mc_2]-…

Page 80: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Special event (signal) sequences

…-[S3,Ma_1]

[S4,Mb_2]-…

[S4,Mb_2]-…

…-[S3,Ma_2]-[S4,Mb_1]-[S5,Mc_2]-…

-[S5,Mc_1]-…

Event (signal) sequences

https://www.google.com.tw/search

Page 81: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Traceability (5) [Step, Machine, Time]

P1   => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P2   => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P3   => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P4   => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P5   => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P6   => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
………
P99  => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P100 => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]

Page 82: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Traceability (6) [Step, Machine, Time]

P1   => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P2   => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P3   => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P4   => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P5   => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P6   => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
………
P99  => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P100 => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]

Page 83: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Reverse thinking

• Use experiments to find out why a product turned out {good}?

frank727.pixnet.net

Page 84: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

City Hunter's Desert Eagle (one in ten thousand)

detail.chiebukuro.yahoo.co.jp

www.aigouboke.com

Page 85: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Special event (signal) sequences (sequences that recur within a given class)

************#*@**&*************************#*@**&*************************#*@**&****************@*****#****?******#*****@*************#****?************@*************#****?********&***@***

*****&***@***%******#****************&*******%******************

*****&*******%******************

***@****************$******@**************#*********$***@*******

Expert opinions

Page 86: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

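Capacitor Production Traceability: Tags + Sequential Data

• Sequential data

– Automated data collection during production

• (material lot number, machine, time, etc.)

• Expert opinions (tags)

– (Before shipment) capacitor quality

• (Tags: ok, defect A, defect B, defect C, …)

– (After shipment) capacitor failure during the warranty period

• (Tags: ok, defect D, defect E, defect F, …)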
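The combination of tags and sequential data described above can be sketched as a class frequency table: for each pattern, how many sequences of each tag contain it. The code below is a simplified illustration under stated assumptions. It counts all contiguous subsequences up to a hypothetical `max_len` rather than only maximal repeats, and the event strings and tags are invented toy data.

```python
from collections import defaultdict


def class_frequency(tagged_sequences, max_len=3):
    """Map each contiguous pattern to {tag: number of sequences containing it}."""
    table = defaultdict(lambda: defaultdict(int))
    for tag, events in tagged_sequences:
        seen = set()
        for i in range(len(events)):
            for j in range(i + 1, min(i + max_len, len(events)) + 1):
                seen.add(tuple(events[i:j]))
        for pattern in seen:        # count each sequence once per pattern
            table[pattern][tag] += 1
    return table


# Hypothetical production records: (expert tag, [step/machine events]).
data = [
    ("ok",       ["S3,Ma_1", "S4,Mb_2", "S5,Mc_1"]),
    ("ok",       ["S3,Ma_1", "S4,Mb_1", "S5,Mc_1"]),
    ("defect A", ["S3,Ma_2", "S4,Mb_1", "S5,Mc_2"]),
    ("defect A", ["S3,Ma_2", "S4,Mb_1", "S5,Mc_1"]),
]
table = class_frequency(data)
suspect = ("S3,Ma_2", "S4,Mb_1")
print(dict(table[suspect]))
```

In this toy data the machine pair `Ma_2` then `Mb_1` appears only in sequences tagged "defect A", which is the kind of class-specific pattern the talk proposes to look for.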

Page 87: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Advantech Co., Ltd. (研華)

From: http://www.advantech.com.tw/

Page 88: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Building an enterprise data lake (ETU)

From: http://etusolution.com/index.php/tw/solution-tw/etu-datalake

Production traceability analysis

(jdwang: Maximal Repeat Extraction)

Page 89: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Industry 4.0: Traceability Analysis for Manufacturing

Factory (manufacturing)

Embedded systems industry

Cloud computing platform

Traceability analysis (Maximal Repeat Extraction)

Quality department

Management decisions

Page 90: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

http://d1zlh37f1ep3tj.cloudfront.net/wp/wblob/54592E651337D2/33A8/5D7B11/lCIOuvczuB3U81XGqm9nag/903_1438243768yq2L.jpg

Page 91: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Do I understand the [semiconductor industry]?

http://technews.tw/2016/04/11/tsmc-and-largan/

http://image.slidesharecdn.com/random-090620075332phpapp02/95/-9-728.jpg?cb=1245484455

http://www.slideshare.net/5045033/ss-1002323

Page 92: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

It will be hard work!

http://previews.123rf.com/images/dirkercken/dirkercken1208/dirkercken120800053/14852048-hard-work-ahead-tough-job-be-ambitious-even-if-you-have-a-difficult-challenging-task-with-impact-to--Stock-Photo.jpg

Page 93: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

New Direction & Thinking!

http://switchandshift.com/11-trademarks-of-rebellious-leadership

Page 94: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Chip production traceability

****************************************

www.iconarchive.com

http://technews.tw/2016/04/11/tsmc-and-largan/

http://www.slideshare.net/5045033/ss-1002323

Page 95: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

TSMC and National Tsing Hua University again co-host the “2nd Semiconductor Big Data Analytics Contest”

http://www.appledaily.com.tw/realtimenews/article/new/20150713/647073/

Page 96: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

*************************************

*************************************

*************************************

*************************************

*************************************

*************************************

*************************************

Expert opinions (tags)

Time, machine, raw material (brand), temperature, pressure, operator, etc.

Sequential data

Page 97: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)


********************$***********

********************$***********

*****&*******%******************

********************************

*****&*******%******************

********************************

*********#****?************@****

********************************

*********#****?************@****

*****&*******%******************

*********#****?***********@*****

Products (rows)

Sequential data

Expert opinions (tags)

Tags + Sequential Data

Page 98: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Special event (signal) sequences (sequences that recur within a given class)

********************************

********************************

********************************

*********#****?************@****

*********#****?************@****

*********#****?************@****

*****&*******%******************

*****&*******%******************

*****&*******%******************

********************$***********

********************$***********

Expert opinions

Page 99: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Special event (signal) sequences

Event (signal) sequences

#****?

@****

&*******%

$**********

*****

Page 100: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Traceability Granularity

****************************************

########## %%%%%%%%%%

@ © ᴂ$

Page 101: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Obstacles

• Numerical Values => Symbols

• Expert Opinions

– (Class or Tag)

• Interactive Interfaces

• Computing Environment

• Chip Traceability
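The first obstacle in the list, turning numerical values into symbols, is often handled by binning. The sketch below is one possible approach, not necessarily the one intended in the talk; the thresholds, symbols, and readings are hypothetical.

```python
from bisect import bisect_right


def symbolize(values, thresholds, symbols):
    """Map each numeric value to the symbol of the interval it falls in."""
    if len(symbols) != len(thresholds) + 1:
        raise ValueError("need exactly one more symbol than thresholds")
    return "".join(symbols[bisect_right(thresholds, v)] for v in values)


# Hypothetical temperature readings and cut points.
readings = [18.2, 21.5, 25.1, 30.4, 24.9]
sequence = symbolize(readings, thresholds=[20.0, 25.0], symbols="LMH")
print(sequence)
```

Once measurements are symbol strings, repeat extraction can be applied to them in the same way as to text. The choice of cut points matters, since it decides which readings count as the same symbol.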

http://sponsorshipgreen.com/wp-content/uploads/2015/05/overcoming-obstacles-web-1200x725.jpg

Page 102: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Status of promoting industry-academia collaboration!

• Vendors and companies: “If there is a chance, we will look into it”!?

https://includedbygrace.files.wordpress.com/2014/01/upset-character.jpg

Page 103: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Make a wish or hope!?

http://ncmissionofhope.org/updates/wp-content/uploads/2016/03/Hope.jpg

Page 104: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Back to the starting point!

http://books.cw.com.tw/sites/default/files/blog/images/%E6%88%91%E7%9A%84%E5%8F%B0%E6%9D%B1%E5%A4%A2.jpg

Page 105: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Outline

• Introduction

• Pattern History For Trend Analysis

• Product Traceability for Quality Monitoring

• Mining for Distinctive Pattern (Biomarker) from Genomic Sequences

• Future Works

Page 106: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

From: http://image.slidesharecdn.com/a-systematic-approach-to-genotypephenotype-correlations-1203626174948644-4/95/a-systematic-approach-to-genotypephenotype-correlations-4-728.jpg?cb=1203597376

From: Nucleic Acids Res. 2007 Aug; 35(16): 5625–5633.

Page 107: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Finding commonality among differences (comparison across different species)

Page 108: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Finding commonality among differences (comparison across different species)

Page 109: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Finding commonality among differences (comparison across different species)

Page 110: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Finding differences among commonality (comparison within the same species)

Page 111: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Human 23 Chromosome Pairs

https://zh.wikipedia.org/wiki/%E4%BA%BA%E9%A1%9E%E5%9F%BA%E5%9B%A0%E7%B5%84#/media/File:Karyotype.png

Page 112: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Coding Genes

https://zh.wikipedia.org/wiki/%E4%BA%BA%E9%A1%9E%E5%9F%BA%E5%9B%A0%E7%B5%84#/media/File:Human_genome_to_genes_zh.png

Page 113: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

49,267 Human Genes (upstream 5000 bp)

Class Type         # of Genes   Percent
Coding Genes       38,645       78%
NonCoding Genes    10,622       22%

Page 114: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

https://en.wikipedia.org/wiki/Gene#/media/File:DNA_to_protein_or_ncRNA.svg

Page 115: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Upstream & Downstream

[Diagram: gene sequences grouped into classes (CodingGenes, NonCodingGenes); labeled lengths: 5000 bp and 500 bp]

Page 116: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)
Page 117: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Finding Possibility

http://engage.riggspartners.com/r-blog/3business/finding-possibility

Page 118: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

OncoGene & TumorSuppressorGenes

Group            ClassID   Class Type             # of Genes   Percent
Coding Genes     C1        Non-Cancer Gene        37,914       76.96%
Coding Genes     C2        OncoGene               345          0.70%
Coding Genes     C3        TumorSuppressorGene    386          0.78%
NonCodingGenes   C4        NonCoding Gene         10,622       21.56%

Page 119: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

[Chart: Human Genes (49,267): Non-Cancer Gene 37,914; OncoGene 345; TumorSuppressor Gene 386; NonCoding Gene 10,622]

Page 120: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Experiments

1: Existence. 0: Non-Existence.

Condition   C1  C2  C3  C4  DF (≧)  Length (≧)  # of Maximal Repeats
C1C2C3C4    1   1   1   1   1000    36          143
C2C3C4      0   1   1   1   10      10          1
C2C3        0   1   1   0   2       10          1788
C2C4        0   1   0   1   10      10          35
C3C4        0   0   1   1   10      10          53
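Each row of the table is a filter on a pattern's per-class counts: which classes must contain the pattern, which must not, plus minimum DF and length. The sketch below shows such a filter on a toy table. Treating DF as the total number of genes containing the pattern is an assumption, and `select_patterns` and the toy sequences are hypothetical.

```python
def select_patterns(table, present, absent, min_df, min_len):
    """Select patterns found in every `present` class and in no `absent` class."""
    selected = []
    for pattern, df_by_class in table.items():
        df = sum(df_by_class.values())   # assumed meaning of DF
        if len(pattern) < min_len or df < min_df:
            continue
        in_all = all(df_by_class.get(c, 0) > 0 for c in present)
        in_none = all(df_by_class.get(c, 0) == 0 for c in absent)
        if in_all and in_none:
            selected.append(pattern)
    return selected


# Hypothetical toy table: pattern -> {class ID: number of genes containing it}.
toy = {
    "ACGTACGTAC": {"C2": 1, "C3": 1},
    "TTTTGGGGCC": {"C1": 5, "C2": 1, "C3": 2},
    "ACGT":       {"C2": 3, "C3": 4},
}
# Analogue of the "C2C3" row: present in C2 and C3, absent from C1 and C4.
hits = select_patterns(toy, present=["C2", "C3"], absent=["C1", "C4"],
                       min_df=2, min_len=10)
print(hits)
```

Only the first toy pattern passes: the second also occurs in C1, and the third is shorter than the length threshold.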

Page 121: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)
Page 122: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Acknowledgements

• Prof. 張建國 – China Medical University

• Asst. Prof. 詹雯玲 – (Asia University)

• Asst. Prof. 王昭能 – (Asia University)

1 million

Page 123: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Next Generation Sequencing (NGS)

http://www.slideshare.net/ueb52/introduction-to-next-generation-sequencing-v2

Page 124: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

The U.S. President declares war on cancer! A full push for the Cancer Moonshot initiative

http://i2.wp.com/geneonline.news/wp-content/uploads/2016/01/Obama_precision_medicine_0130151-e1453785268417.jpg?fit=1292%2C665

Page 125: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

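A petabyte of genome sequencing completed: a major milestone in cancer research

• “Canada's BC Cancer Agency has completed sequencing 1,000 terabytes (TB) of genomic sequence, equivalent to one petabyte (PB), over 33,000 times more DNA than was sequenced by the international Human Genome Project; scientists have also identified mutated genes in stomach cancer, the world's fourth deadliest cancer.”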

http://geneonline.news/index.php/2016/05/30/canada-bioinformatics/

Page 126: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Human 23 Chromosome Pairs

https://zh.wikipedia.org/wiki/%E4%BA%BA%E9%A1%9E%E5%9F%BA%E5%9B%A0%E7%B5%84#/media/File:Karyotype.png

Page 127: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)
Page 128: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Maximal Repeats appearing in all of 24 human chromosomes.

• Length |Maximal Repeats| <= 500 bp

– Ok!

• Length |Maximal Repeats| <= 1000 bp

– Disk Space Full!

Page 129: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Acknowledgements

• 王耀聰 (Jazz Wang)

Page 130: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Outline

• Introduction

• Pattern History For Trend Analysis

• Product Traceability for Quality Monitoring

• Mining for Distinctive Pattern (Biomarker) from Genomic Sequences

• Future Works

Page 131: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

International Semiconductor Exhibition (2016.9.9, Taipei Nangang Exhibition Center)

Page 132: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Future Works

• Hadoop (offline) => Spark (online)

• Next Generation Sequencing

• Product Traceability

• Lelon Electronics and Asia University (industry-academia collaboration project)

• Web Log Analysis (hacker behavior?)

• User Behavior Analysis

Page 133: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Asia University: industry-academia collaboration opportunities

• Storage space for testing production traceability

• Big data computing platform

Page 134: Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)

Thank you for listening! Comments and suggestions are welcome!

www.flickr.com

www.slideshare.net

http://www.pptschool.com/250.html