Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)
Transcript of Hadoop con 2016_9_10_王經篤(Jing-Doo Wang)
Potential Applications using the Class Frequency Distribution of Maximal Repeats from Tagged Sequential Data
Jing-Doo Wang (王經篤)
Associate Professor
Asia University, Taiwan.
The 8th Taiwan Hadoop Community Annual Conference, HadoopCon 2016
Humanities and Social Sciences Building, Academia Sinica (2016.9.10)
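About the Speaker
• Current position: – Associate Professor, Department of Computer Science and Information Engineering, Asia University
• Highest degree: – Ph.D., Computer Science and Information Engineering, National Chung Cheng University
• Expertise – Text Mining (Maximal Repeat Extraction)
– Bioinformatics
– Cloud Computing
– Class Structure Ambiguity Evaluation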
Asia Museum of Modern Art
http://www.asia.edu.tw/main_1.php?yard/yard_01_09
Asia University Hospital (2016.8.1)
http://www.auh.org.tw/web/index.php
Precision Medicine
• Clinical medicine (China Medical University Hospital + Asia University Hospital)
• Genome sequencing (Next Generation Sequencing)
• Bioinformatics (Asia University)
http://newjust.masterlink.com.tw/HotProduct/HTML/img/GetImg.xdjpng?A=PA314-1b.png
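Big Data Research Center
(1) Build the university's big data research facilities.
(2) Integrate databases related to precision medicine.
(3) Carry out academic exchange, research and development, and industry-academia work related to big data.
(4) Carry out outreach and services related to big data.
Post-Baccalaureate Second-Specialty Bachelor's Degree Program
Applying school: Asia University
New program: Post-Baccalaureate Program in Big Data Processing and Analysis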
Outline
• Introduction
• Pattern History For Trend Analysis
• Product Traceability for Quality Monitoring
• Mining for Distinctive Pattern (Biomarker) from Genomic Sequences
• Future Work
“xabcyiiizabcqabcyrxar”
• ab: not a maximal repeat pattern
• bc: not a maximal repeat pattern
• abc: maximal repeat pattern (Yes)
• abcy: maximal repeat pattern (Yes)
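The example above can be checked with a small brute-force sketch. It is not the MapReduce method from the talk; it assumes the usual definition (a substring that occurs at least twice and cannot be extended to the left or to the right without losing an occurrence), and the function name `maximal_repeats` is hypothetical.

```python
from collections import defaultdict

def maximal_repeats(text, min_count=2):
    """Brute-force maximal repeats of `text` (illustration only, O(n^2) substrings)."""
    n = len(text)
    starts_of = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n + 1):
            starts_of[text[i:j]].append(i)

    result = {}
    for pattern, starts in starts_of.items():
        if len(starts) < min_count:
            continue
        # Neighbouring characters of every occurrence; a string boundary counts
        # as a unique neighbour, so it never blocks maximality.
        lefts = {text[s - 1] if s > 0 else ("start", s) for s in starts}
        rights = {
            text[s + len(pattern)] if s + len(pattern) < n else ("end", s)
            for s in starts
        }
        # Maximal: the occurrences disagree on both sides.
        if len(lefts) > 1 and len(rights) > 1:
            result[pattern] = len(starts)
    return result

repeats = maximal_repeats("xabcyiiizabcqabcyrxar")
print(repeats.get("abc"), repeats.get("abcy"), "ab" in repeats, "bc" in repeats)
```

Here "ab" is always followed by "c" and "bc" is always preceded by "a", so neither is maximal, while "abc" and "abcy" both have differing neighbours on each side.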
Why use “Maximal Repeats” as features?
• Dictionary
– How to identify new words or phrases?
– e.g. “just do it”, “洪荒之力”.
• N-gram
– 2-gram, 3-gram, …, 5-gram. (Google Ngram Viewer)
– The value of “N” is limited.
• Maximal Repeat
– The length of a maximal repeat is variable.
Journal of Supercomputing, 72(8), pp. 3236-3260, April 2016
Patent concept diagram: a method for extracting maximal repeat patterns and computing frequency distribution tables
Patent Application Serial Number (US 15/208,994) (pending)
• Wang, Ching-Tu. Method for Extracting Maximal Repeat Patterns and Computing Frequency Distribution Tables. Patent Application Serial Number 15/208,994. 13 July 2016.
• U.S. invention patent application
– Owner: 王經篤 (Jing-Doo Wang)
– Inventor: 王經篤 (Jing-Doo Wang)
The flowchart of Maximal Repeat Extraction via MapReduce
Right Boundary Verification
Left Boundary Verification
Boundary Verification(Left&Right)
Outline
• Introduction
• Pattern History For Trend Analysis
• Product Traceability for Quality Monitoring
• Mining for Distinctive Pattern (Biomarker) from Genomic Sequences
• Future Work
Pattern History for Trend Analysis
Jing-Doo Wang (王經篤)
Associate Professor
Asia University, Taiwan.
2016/9/12 19FSKD 20'11
Sequential Data + Timestamp
Experimental Results
• Download the abstracts of articles in PubMed from 1990 to 2014 (25 years).
PubMed (1990~2014)(25 years)
The abstracts and titles of 14,473,242 articles => about 12 GB
The Abstracts and Titles of PubMed Articles (1990~2014)(12GB)
6 PCs => 5 hours
The History of a Significant Pattern (顯要樣式歷史)
The history of a significant pattern is the frequency distribution of that pattern over equally spaced time intervals.
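A minimal sketch of this definition, assuming one-year intervals and records given as (year, text) pairs. The function name `pattern_history` and the toy records are made up for illustration; they are not PubMed data and not the talk's implementation.

```python
def pattern_history(records, pattern, first_year, last_year):
    """Frequency of `pattern` per year over the closed range [first_year, last_year]."""
    # Every interval appears in the result, including years with zero occurrences.
    history = {year: 0 for year in range(first_year, last_year + 1)}
    for year, text in records:
        if year in history:
            history[year] += text.count(pattern)  # non-overlapping occurrences
    return history

# Hypothetical toy records, for illustration only.
toy_records = [
    (2008, "seasonal influenza surveillance"),
    (2009, "H1N1 outbreak; H1N1 vaccine trial"),
    (2010, "response to H1N1"),
]
history = pattern_history(toy_records, "H1N1", 2008, 2010)
print(history)
```

Counting occurrences rather than documents is a design choice here; counting each text at most once would give a document-frequency history instead.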
Significant Pattern (顯要樣式)
• A significant pattern is a maximal repeat of consecutive words within texts.
(Length=1) TDP-43
(Length=1) SARS
(Length=1) H1N1
(Length=5) non-small cell lung cancer (NSCLC)
(Length=6) 75 g oral glucose tolerance test
(Length=6) 4 x 4 Latin square design
(Length=7) 2 x 2 factorial arrangement of treatments
(Length=9) the National Institute of Child Health and Human Development
(Length=10) patients with squamous cell carcinoma of the head and neck
(Length=11) anomalous origin of the left coronary artery from the pulmonary artery
(Length=12) Pregnancy and Childbirth Group trials register and the Cochrane Controlled Trials Register
(Length=13) the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire
The History of “H1N1”
(50, 75, 100) g Oral Glucose Tolerance Test
The (goal, aim, purpose) of this study
aim
goal
purpose
Archaeology(考古學)
https://zh.wikipedia.org/wiki/%E8%80%83%E5%8F%A4%E5%AD%A6
On-Line
– PubMed (medical articles) (1990-2014)
– CNA (Central News Agency, 中央社) news (1990-1996)
– Republic of China patent documents (1950-2009)
Potential Work
– Court judgments (judicial systems) (studies that allow closer examination of legal texts)
– Master's and doctoral theses nationwide
– National Central Library
– Republic of China patents
– Novels
– The complete novels of Jin Yong (金庸)
– Harry Potter
– Shakespeare
– Giddens Ko (九把刀)
– Blogs
– News
Text Archaeology(文件考古學)
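Figure: https://www.google.com.tw/search
Personalized learning pattern-history cloud
http://www.tutortristar.com/topic/Internet/seminar-big-data-marketing-beckett-20151005.html?utm_source=facebook&utm_medium=banner&utm_campaign=seminar-bigdata-marketing
Language-based cognitive assessment for the elderly (repetitive speech)
“Have you eaten yet?” (你吃飽了沒!)
https://www.google.com.tw/search
Detecting newly adopted vocabulary (pet phrases)
https://www.google.com.tw/search
How can spoken language be collected and converted into text?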
Google’s Speech Recognition API
Version 2
https://www.google.com.tw/search
Timestamped text + geographic location
https://www.google.com.tw/search
Outline
• Introduction
• Pattern History For Trend Analysis
• Product Traceability for Quality Monitoring
• Mining for Distinctive Pattern (Biomarker) from Genomic Sequences
• Future Work
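Agricultural Product Traceability (產銷履歷)
• Agricultural product traceability system = implementation and certification of Taiwan Good Agricultural Practice + a record-tracing system
• Traceability: traceable agricultural products that are safe, sustainable, and openly documented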
From: http://taft.coa.gov.tw/ct.asp?xItem=4&CtNode=206&role=C
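Symbols (樣式): χ α β ϒ θ φ
Serialization (序列化)
Numeric Value (數值)
Product Traceability (生產履歷)
Symbolization (樣式化)
Huizhou / Suzhou, Mainland China
Paper records; measurement by automatic measuring machine; paper records; X-R control chart; visual inspection; microscope measurement; photo files
Paper records; X-R control chart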
Icon: http://www.freepik.com
Measurement by automatic measuring machine (shown twice)
Traceability records =>
********************$***********
********************$***********
*****&*******%******************
********************************
*****&*******%******************
*****************************************#****?************@****
********************************
*********#****?************@****
*****&*******%******************
*********#****?***********@*****
Products (產品)
Steps (步驟)
Expert opinions (專家意見)
Special event (signal) sequences (sequences that occur repeatedly within a given class)
*********************************************************************************************************#****?************@****
*********#****?************@*************#****?************@****
*****&*******%***********************&*******%******************
*****&*******%******************
********************$*******************************$***********
Expert opinions
Special event (signal) sequences
Event (signal) sequences
#****?
@****
&*******%
$**********
*****
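Internet of Things (IoT)
http://www.indetail.com.tw/archives/2494
Internet of Things (IoT) + Industry 4.0 = ?
http://portal.stpi.narl.org.tw/index/article/10095
Internet of Things (IoT) + Industry 4.0
Data processing => Big Data!
www.slideshare.net
Monitoring and Anomaly Analysis
http://innobic.blogspot.tw/
Single Point of Failure
www.ontargetpartners.com
Single point of failure => someone with experience may be able to fix it!
www.benison.com.tw
www.youtube.com
Statistical Analysis
newgenerationresearcher.blogspot.com
The Statistical Analysis Perspective
• Single point of failure => univariate analysis
• Multiple points of failure => multivariate analysis
www.slideshare.net
Accident? = Multiple Points of Failure
multiple points of failure (ops.fhwa.dot.gov)
Film
• Final Destination (絕命終結站)
g333773.pixnet.net maizizi.pixnet.net
(A combination of a chain of events!) => accident
• Unpredictable?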
www.frillo.co.uk
Wu Gui says "There are no accidents" -
https://www.youtube.com/watch?v=Q04LPj99ZPc
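Manual Assembly Production Line
jdzol.com.cn www.cfea.org.cn
Robotic Assembly Production Line
kaifangzhansb.mofcom.gov.cn
From a data-processing perspective, what do these represent?
http://superbest.typepad.com/.a/6a00d83451b39269e20147e25f0551970b-pi
superbest.typepad.com
Behind every item lies a series of numbers (symbols)
www.36dsj.com big5.xinhuanet.com
“Industry 4.0” sounds the call: accelerating the fusion of informatization and intelligence to raise “smart manufacturing” competitiveness
http://tw.digiwin.biz/newsListDetail_6828.html
Of all the martial arts under heaven, none is unbreakable; only speed cannot be beaten!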
chuansong.me
dannylun.blogspot.com
Internet of Things (IoT) vs. Traceability (產銷履歷)
********************************
********************************
********************************
********************************
********************************
****************************************************************
********************************
********************************
***********^********************
********************************
Products
(產品)
Steps(步驟)
Internet of Things (IoT) vs. Traceability (產銷履歷)
********************$***********
********************$***********
*****&*******%******************
********************************
*****&*******%******************
*****************************************#****?************@****
********************************
*********#****?************@****
*****&*******%******************
*********#****?***********@*****
Products
(產品)
Steps(步驟)
blog.tianya.cn news.hxsd.com
Expert opinions => tags
Traceability (產銷履歷)
P1 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P2 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P3 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P4 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P5 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P6 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
………
P99 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P100 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
100 Products (產品)
10 Steps (步驟)
Traceability (產銷履歷)(1)
…[S3] [S4] [S5]…
https://www.google.com.tw/search
Traceability (產銷履歷)(1)
P1 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P2 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P3 => S1, S2, S3, S4
P4 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P5 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P6 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
………
P99 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P100 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
Detect Abnormality (發現異常)
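The simplest case shown here, a product whose record stops partway through (P3 ends at S4), can be sketched as a comparison against the expected step list. The names `EXPECTED_STEPS` and `find_incomplete` are hypothetical, and the toy records only mirror the slide; a real production line would need a richer notion of "abnormal".

```python
EXPECTED_STEPS = [f"S{i}" for i in range(1, 11)]  # S1 .. S10, as on the slide

def find_incomplete(traces, expected=EXPECTED_STEPS):
    """Map each product whose steps differ from `expected` to its missing steps."""
    abnormal = {}
    for product, steps in traces.items():
        if steps != expected:
            abnormal[product] = [s for s in expected if s not in steps]
    return abnormal

# Toy records mirroring the slide: only P3 stops after S4.
traces = {f"P{i}": list(EXPECTED_STEPS) for i in (1, 2, 4, 5, 6, 99, 100)}
traces["P3"] = ["S1", "S2", "S3", "S4"]

abnormal = find_incomplete(traces)
print(abnormal)
```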
www.publicdomainpictures.net
Traceability (產銷履歷)(2)
https://www.google.com.tw/search
…[S3] [S4] [S5]…
Traceability (產銷履歷)(2)
P1 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P2 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P3 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P4 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P5 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P6 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
………
P99 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P100 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
Detect Abnormality (發現異常)
=> when a product (or semi-finished product) is completed
Inspect every traceability record one by one?
• P3 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
• P99 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
Figure: www.yuhing.edu.tw
techpinions.com
giphy.com
Traceability (產銷履歷)(3)
P1 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P2 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P3 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P4 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P5 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P6 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
………
P99 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P100 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
Traceability (產銷履歷)(3)
P3 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P99 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P2 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P4 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P6 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P1 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P4 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
P5 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
……
P100 => S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
Special event (signal) sequences
Event (signal) sequences
S5, S6, S7
S1, S2, S3, S4
S1, S2, S3
S8, S9, S10
S1, S2, S3, S4, S5, S6, S7, S8, S9, S10
S9, S10
https://www.google.com.tw/search
Traceability (產銷履歷)(2)
Ma_1
Ma_2
Mb_1
Mb_2
Ma_3
Mc_1
Mc_2
https://www.google.com.tw/search
…-[S3,Ma_1]-[S4,Mb_2]-[S5,Mc_1]-…
Traceability (產銷履歷)(4)[Step,Machine]
P1 => [S1,M]-[S2,M]-[S3,M]-…-[S9,M]-[S10,M]
P2 => [S1,M]-[S2,M]-[S3,M]-…-[S9,M]-[S10,M]
P3 => [S1,M]-[S2,M]-[S3,M]-…-[S9,M]-[S10,M]
P4 => [S1,M]-[S2,M]-[S3,M]-…-[S9,M]-[S10,M]
P5 => [S1,M]-[S2,M]-[S3,M]-…-[S9,M]-[S10,M]
P6 => [S1,M]-[S2,M]-[S3,M]-…-[S9,M]-[S10,Mj]
………
P99 => [S1,M]-[S2,M]-[S3,M]-…-[S9,M]-[S10,M]
P100 => [S1,M]-[S2,M]-[S3,M]-…-[S9,M]-[S10,M]
10 Steps(步驟)
100 Products
(產品)
Traceability (產銷履歷)(2)
Ma_1
Ma_2
Mb_1
Mb_2
Ma_3
Mc_1
Mc_2
https://www.google.com.tw/search
…-[S3,Ma_1]-[S4,Mb_2]-[S5,Mc_1]-…
…-[S3,Ma_2]-[S4,Mb_1]-[S5,Mc_2]-…
…-[S3,Ma_1]-[S4,Mb_2]-[S5,Mc_1]-…
…-[S3,Ma_1]-[S4,Mb_3]-[S5,Mc_1]-…
…-[S3,Ma_2]-[S4,Mb_3]-[S5,Mc_1]-…
…-[S3,Ma_2]-[S4,Mb_2]-[S5,Mc_2]-…
…-[S3,Ma_1]-[S4,Mb_2]-[S5,Mc_2]-…
…-[S3,Ma_2]-[S4,Mb_1]-[S5,Mc_2]-…
…-[S3,Ma_2]-[S4,Mb_2]-[S5,Mc_1]-…
…-[S3,Ma_2]-[S4,Mb_2]-[S5,Mc_1]-…
…-[S3,Ma_2]-[S4,Mb_2]-[S5,Mc_2]-…
…-[S3,Ma_1]-[S4,Mb_2]-[S5,Mc_2]-…
…-[S3,Ma_2]-[S4,Mb_1]-[S5,Mc_2]-…
…-[S3,Ma_1]-[S4,Mb_3]-[S5,Mc_1]-…
…-[S3,Ma_2]-[S4,Mb_3]-[S5,Mc_1]-…
…-[S3,Ma_1]-[S4,Mb_2]-[S5,Mc_1]-…
…-[S3,Ma_2]-[S4,Mb_1]-[S5,Mc_2]-…
Special event (signal) sequences
…-[S3,Ma_1]
[S4,Mb_2]-…
[S4,Mb_2]-…
…-[S3,Ma_2]-[S4,Mb_1]-[S5,Mc_2]-…
-[S5,Mc_1]-…
Event (signal) sequences
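One way to picture how a [Step,Machine] run becomes "special" for a class is to count, per tag, how many products contain it. The sketch below is a simplification under stated assumptions: it uses fixed-length runs of `k` consecutive events rather than variable-length maximal repeats, and the tags "ok" and "defect", the function name `class_frequency`, and the toy records are all hypothetical.

```python
from collections import Counter, defaultdict

def class_frequency(tagged_traces, k):
    """For each run of k consecutive events, count the products containing it, per tag."""
    dist = defaultdict(Counter)
    for tag, events in tagged_traces:
        # A set, so that a run is counted at most once per product.
        runs = {tuple(events[i:i + k]) for i in range(len(events) - k + 1)}
        for run in runs:
            dist[run][tag] += 1
    return dist

# Hypothetical tagged records; each event is a (step, machine) pair.
toy = [
    ("ok",     [("S3", "Ma_1"), ("S4", "Mb_2"), ("S5", "Mc_1")]),
    ("ok",     [("S3", "Ma_1"), ("S4", "Mb_2"), ("S5", "Mc_2")]),
    ("defect", [("S3", "Ma_2"), ("S4", "Mb_1"), ("S5", "Mc_2")]),
    ("defect", [("S3", "Ma_2"), ("S4", "Mb_1"), ("S5", "Mc_2")]),
]
dist = class_frequency(toy, k=2)
suspect = (("S3", "Ma_2"), ("S4", "Mb_1"))
print(dict(dist[suspect]))
```

In this toy data the run [S3,Ma_2]-[S4,Mb_1] appears only in "defect" products, which is the kind of class-skewed distribution the slides call a special event sequence.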
https://www.google.com.tw/search
Traceability (產銷履歷)(5)[Step,Machine,Time]
P1 => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P2 => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P3 => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P4 => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P5 => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P6 => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
………
P99 => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P100 => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
Traceability (產銷履歷)(6)[Step,Machine,Time]
P1 => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P2 => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P3 => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P4 => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P5 => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P6 => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
………
P99 => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
P100 => [S1,Ma,T1]-[S2,Mb,T2]-[S3,Mc,T3]-…-[S9,Mi,T9]-[S10,Mj,T10]
Reverse Thinking
• Use experiments to find out why a result is {good}?
frank727.pixnet.net
City Hunter: the Desert Eagle (one chosen out of ten thousand)
detail.chiebukuro.yahoo.co.jp
www.aigouboke.com
Special event (signal) sequences (sequences that occur repeatedly within a given class)
************#*@**&*************************#*@**&*************************#*@**&****************@*****#****?******#*****@*************#****?************@*************#****?********&***@***
*****&***@***%******#****************&*******%******************
*****&*******%******************
***@****************$******@**************#*********$***@*******
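Expert opinions
Capacitor Traceability (生產履歷): Tags + Sequential Data
• Sequential data
– Automated data collection during production
• (material lot number, machine, time, etc.)
• Expert opinions (tags)
– (Before shipment) capacitor quality
• (Tags: ok, defect A, defect B, defect C, …)
– (After shipment) capacitor failure during the warranty period
• (Tags: ok, defect D, defect E, defect F, …)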
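A tagged record of this kind could be represented roughly as follows. This is only a sketch: the class names `Event` and `TaggedTrace`, their fields, and the `serialize` helper are assumptions, with the output format borrowed from the [Step,Machine,Time] notation used on the earlier slides.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Event:
    step: str      # e.g. "S1"
    machine: str   # e.g. "Ma"
    time: str      # e.g. "T1"

@dataclass
class TaggedTrace:
    product: str
    events: List[Event]
    tag: str       # expert opinion, e.g. "ok" or a defect label

def serialize(trace):
    """Flatten a trace's events into the [Step,Machine,Time] sequence notation."""
    return "-".join(f"[{e.step},{e.machine},{e.time}]" for e in trace.events)

# Hypothetical record for illustration.
p1 = TaggedTrace("P1", [Event("S1", "Ma", "T1"), Event("S2", "Mb", "T2")], tag="ok")
line = serialize(p1)
print(p1.tag, line)
```

Keeping the tag outside the serialized sequence means the same sequence text can be counted under different tags, for example a pre-shipment quality tag and a warranty-period tag.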
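Advantech Co., Ltd. (研華股份有限公司)
From: http://www.advantech.com.tw/
Building an enterprise data lake (ETU)
From: http://etusolution.com/index.php/tw/solution-tw/etu-datalake
Production traceability analysis
(jdwang: Maximal Repeat Extraction)
Industry 4.0: Traceability Analysis for Manufacturing
Manufacturing factory (Factory)
Embedded systems industry (Embedded System)
Cloud computing service platform (Cloud Computing Platform)
Traceability analysis (Maximal Repeat Extraction)
Quality control department (Quality Department)
Management decision-making (Decision Management)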
http://d1zlh37f1ep3tj.cloudfront.net/wp/wblob/54592E651337D2/33A8/5D7B11/lCIOuvczuB3U81XGqm9nag/903_1438243768yq2L.jpg
Do I understand the semiconductor industry?
http://technews.tw/2016/04/11/tsmc-and-largan/
http://image.slidesharecdn.com/random-090620075332phpapp02/95/-9-728.jpg?cb=1245484455
http://www.slideshare.net/5045033/ss-1002323
It will be hard work!
http://previews.123rf.com/images/dirkercken/dirkercken1208/dirkercken120800053/14852048-hard-work-ahead-tough-job-be-ambitious-even-if-you-have-a-difficult-challenging-task-with-impact-to--Stock-Photo.jpg
New Direction & Thinking!
http://switchandshift.com/11-trademarks-of-rebellious-leadership
Chip Traceability
****************************************
www.iconarchive.com
http://technews.tw/2016/04/11/tsmc-and-largan/
http://www.slideshare.net/5045033/ss-1002323
TSMC and National Tsing Hua University again co-host the “2nd Semiconductor Big Data Analytics Competition”
http://www.appledaily.com.tw/realtimenews/article/new/20150713/647073/
*************************************
*************************************
*************************************
*************************************
*************************************
*************************************
*************************************
Expert opinions (Tags)
Time, machine, raw material (brand), temperature, pressure, operator, etc.
Sequential Data (序列化資料)
********************$***********
********************$***********
*****&*******%******************
********************************
*****&*******%******************
*****************************************#****?************@****
********************************
*********#****?************@****
*****&*******%******************
*********#****?***********@*****
Products
(產品)
Sequential Data (序列化資料)
Expert opinions (Tags)
Tags + Sequential Data
Special event (signal) sequences (sequences that occur repeatedly within a given class)
*********************************************************************************************************#****?************@****
*********#****?************@*************#****?************@****
*****&*******%***********************&*******%******************
*****&*******%******************
********************$*******************************$***********
Expert opinions
Special event (signal) sequences
Event (signal) sequences
#****?
@****
&*******%
$**********
*****
Traceability Granularity
****************************************
########## %%%%%%%%%%
@ © ᴂ$
Obstacles
• Numerical Values => Symbols
• Expert Opinions
– (Class or Tag)
• Interactive Interfaces
• Computing Environment
• Chip Traceability
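The first obstacle, turning numerical values into symbols, can be illustrated with simple threshold binning. This is one possible approach and not necessarily the one used in the talk; the function name `symbolize`, the cut points, the symbols, and the readings are all made up for the example.

```python
import bisect

def symbolize(values, cut_points, symbols):
    """Map each numeric value to the symbol of the bin it falls into.

    `cut_points` must be sorted ascending; there is one more symbol than cut points.
    """
    if len(symbols) != len(cut_points) + 1:
        raise ValueError("need exactly len(cut_points) + 1 symbols")
    return "".join(symbols[bisect.bisect_right(cut_points, v)] for v in values)

# Hypothetical readings: below 20.0 -> "L", 20.0 up to 30.0 -> "*", 30.0 and above -> "H".
readings = [24.8, 25.1, 31.7, 25.0, 18.2]
sequence = symbolize(readings, cut_points=[20.0, 30.0], symbols="L*H")
print(sequence)
```

Choosing the cut points is the hard part in practice, since bins that are too coarse hide the signal and bins that are too fine prevent patterns from repeating.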
http://sponsorshipgreen.com/wp-content/uploads/2015/05/overcoming-obstacles-web-1200x725.jpg
Status of promoting industry-academia collaboration!
• Vendors and companies: “We'll look into it when there is a chance!?”
https://includedbygrace.files.wordpress.com/2014/01/upset-character.jpg
Make a wish or hope!?
http://ncmissionofhope.org/updates/wp-content/uploads/2016/03/Hope.jpg
Back to the starting point!
http://books.cw.com.tw/sites/default/files/blog/images/%E6%88%91%E7%9A%84%E5%8F%B0%E6%9D%B1%E5%A4%A2.jpg
Outline
• Introduction
• Pattern History For Trend Analysis
• Product Traceability for Quality Monitoring
• Mining for Distinctive Pattern (Biomarker) from Genomic Sequences
• Future Work
From: http://image.slidesharecdn.com/a-systematic-approach-to-genotypephenotype-correlations-1203626174948644-4/95/a-systematic-approach-to-genotypephenotype-correlations-4-728.jpg?cb=1203597376
From: Nucleic Acids Res. 2007 Aug; 35(16): 5625–5633.
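Seeking commonality among differences (comparison across different species)
Seeking differences within commonality (comparison within the same species)
Human Chromosomes (23 pairs)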
https://zh.wikipedia.org/wiki/%E4%BA%BA%E9%A1%9E%E5%9F%BA%E5%9B%A0%E7%B5%84#/media/File:Karyotype.png
Coding Genes
https://zh.wikipedia.org/wiki/%E4%BA%BA%E9%A1%9E%E5%9F%BA%E5%9B%A0%E7%B5%84#/media/File:Human_genome_to_genes_zh.png
49,267 Human Genes (upstream 5,000 bp)
Class Types        # of Genes   Percent
Coding Genes       38,645       78%
NonCoding Genes    10,622       22%
https://en.wikipedia.org/wiki/Gene#/media/File:DNA_to_protein_or_ncRNA.svg
Upstream & Downstream
[Figure: gene sequences grouped by class (Coding Genes, NonCoding Genes), with regions labeled 5000 bp and 500 bp]
Finding Possibility
http://engage.riggspartners.com/r-blog/3business/finding-possibility
OncoGenes & Tumor Suppressor Genes
ClassID   Class Types             # of Genes   Percent
Coding Genes
C1        Non-Cancer Gene         37,914       76.96%
C2        OncoGene                345          0.70%
C3        TumorSuppressorGene     386          0.78%
NonCoding Genes
C4        NonCoding Gene          10,622       21.56%
[Chart: Human Genes (49,267) by class: Non-Cancer Gene 37,914; OncoGene 345; TumorSuppressor Gene 386; NonCoding Gene 10,622]
Experiments
1: Existence
0: Non-Existence
Condition   C1   C2   C3   C4   DF (≧)   Length (≧)   # of Maximal Repeats
C1C2C3C4    1    1    1    1    1000     36           143
C2C3C4      0    1    1    1    10       10           1
C2C3        0    1    1    0    2        10           1788
C2C4        0    1    0    1    10       10           35
C3C4        0    0    1    1    10       10           53
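The selection in this table can be sketched as a filter over a class frequency distribution. Several things here are assumptions rather than facts from the slides: DF is read as the total number of genes containing a pattern, the function name `select_patterns` is hypothetical, and the sequences and counts are invented toy values.

```python
CLASSES = ("C1", "C2", "C3", "C4")

def select_patterns(dist, condition, min_df, min_length):
    """Patterns whose per-class existence vector equals `condition`.

    dist:      {pattern: {class_id: number of genes containing the pattern}}
    condition: tuple of 0/1 flags, one per entry in CLASSES
    """
    chosen = []
    for pattern, per_class in dist.items():
        counts = [per_class.get(c, 0) for c in CLASSES]
        existence = tuple(1 if n > 0 else 0 for n in counts)
        if (existence == condition
                and sum(counts) >= min_df        # DF read as total count (assumption)
                and len(pattern) >= min_length):
            chosen.append(pattern)
    return sorted(chosen)

# Invented toy distribution, for illustration only.
toy_dist = {
    "ACGTACGTAC": {"C2": 1, "C3": 1},
    "TTTTGGGGCC": {"C1": 5, "C2": 1, "C3": 1, "C4": 2},
    "ACGT":       {"C2": 3, "C3": 2},
}
# The "C2C3" row: present only in C2 and C3, DF >= 2, length >= 10.
selected = select_patterns(toy_dist, (0, 1, 1, 0), min_df=2, min_length=10)
print(selected)
```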
Acknowledgements
• Prof. 張建國 (China Medical University)
• Assistant Prof. 詹雯玲 (Asia University)
• Assistant Prof. 王昭能 (Asia University)
1 million
Next Generation Sequencing (NGS)
http://www.slideshare.net/ueb52/introduction-to-next-generation-sequencing-v2
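The U.S. President declares war on cancer! A full push for the Cancer Moonshot
http://i2.wp.com/geneonline.news/wp-content/uploads/2016/01/Obama_precision_medicine_0130151-e1453785268417.jpg?fit=1292%2C665
Petabyte-scale genome sequencing completed: a major milestone in cancer research
• “The British Columbia cancer centre in Canada has sequenced 1,000 terabytes (TB) of genomic sequence, equivalent to one petabyte (PB), more than 33,000 times the DNA sequenced by the international Human Genome Project; the scientists also identified mutated genes in stomach cancer, the world's fourth deadliest cancer.”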
http://geneonline.news/index.php/2016/05/30/canada-bioinformatics/
Human Chromosomes (23 pairs)
https://zh.wikipedia.org/wiki/%E4%BA%BA%E9%A1%9E%E5%9F%BA%E5%9B%A0%E7%B5%84#/media/File:Karyotype.png
Maximal Repeats appearing in all of 24 human chromosomes.
• Length |Maximal Repeats| <= 500 bp
– Ok!
• Length |Maximal Repeats| <= 1000 bp
– Disk Space Full!
Acknowledgements
• 王耀聰 (Jazz Wang)
Outline
• Introduction
• Pattern History For Trend Analysis
• Product Traceability for Quality Monitoring
• Mining for Distinctive Pattern (Biomarker) from Genomic Sequences
• Future Work
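International semiconductor exhibition (2016.9.9, Taipei Nangang Exhibition Center)
Future Work
• Hadoop (OffLine) => Spark (OnLine)
• Next Generation Sequencing (次世代基因定序)
• Product Traceability (產品履歷)
• Lelon Electronics and Asia University (industry-academia collaboration project)
• Web Log Analysis (hacker behavior?)
• User Behavior Analysis
Asia University: Industry-Academia Collaboration Opportunities
• Testing storage space for traceability records
• Big data computing platform
Thank you for listening! Comments and suggestions are welcome!
www.flickr.com www.slideshare.net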
http://www.pptschool.com/250.html