•識N四uR,識&Data MiningKaM 1Hu&出?RIc模
Data Mining
CLOUD-R
2011/9/28
Science (Dealing with DATA), 11 FEBRUARY 2011, VOL 331, www.sciencemag.org
Cluster Computing
Distributed Computing
Super Computer
Grid Computing
Utility Computing
Cloud Computing
Google: MapReduce, GFS, BigTable
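The slide only names Google's MapReduce, GFS and BigTable. As a rough illustration of the MapReduce programming model, here is a single-process Python sketch of a word count: a map step emits (word, 1) pairs, a shuffle step groups them by key, and a reduce step sums each group. The function names are hypothetical; this is not Google's or Hadoop's API, and it leaves out distribution, GFS storage and fault tolerance.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word of one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values by their key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine every value collected for one key."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    """Run map -> shuffle -> reduce over a list of documents in one process."""
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["cloud computing and data mining", "data mining in the cloud"]
    print(word_count(docs))
```

In a real cluster each `map_phase` call would run on a different machine close to its data split, and the shuffle would move intermediate pairs over the network to the reducers.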
SaaS: Software as a Service
I2aaS: Information & Intelligence as a Service
IaaS: Infrastructure as a Service
PaaS: Platform as a Service
PC: Mid 80s
Applications: Late 80s - Mid 90s
Internet: Mid 90s
Web Apps: Mid 00s - . . .
Today: Cloud Computing
Technologies: Mouse, GUI, LANs, PC Architecture, DOS, Spreadsheets, Word Processors, HTTP/HTML, SMTP, Email Clients, Web Browsers, XML/SOAP, Web Services, Wi-Fi/Broadband Devices, Rights Management, Trusted Computing Hardware, Speech/Writing, Cloud Computing
Google, Gmail, YouTube, Google Docs, Google Talk, iGoogle, Google Calendar, YAHOO, AMAZON
Cluster Computing
PVM, MPI
Super Computer
From IBM
DATA Center
Data Center: An Idea Whose Time Has Come
Nortel Steel Enclosure: containerized telecom equipment
Sun Project Black Box: 242 systems in 20’
Rackable Systems: 1,152 systems in 40’
Rackable Systems Container Cooling Model
Caterpillar Portable Power
Shipping Container as Data Center Module
             | Amazon EC2                 | App Engine                       | Microsoft Azure              | Yahoo Hadoop
             | IaaS/PaaS                  | PaaS                             | PaaS                         | Software
             | Compute/Storage            | Web application                  | Web and non-web              | Software
Architecture | OS on Xen hypervisor       | Application container            | OS through Fabric controller | Map/Reduce
             | EC2 Command-line tools     | Web-based Administration console | Windows Azure portal         | Command line and web
APIs         | yes                        | yes                              | yes                          | yes
             | yes                        | maybe                            | yes                          | no
             | AMI (Amazon Machine Image) | Python                           | .NET framework               | Java
Data Mining
Database systems, Data Warehouses, OLAP
Machine learning
Statistical and data analysis methods
Visualization
Mathematical programming
High performance computing
CRISP-DM
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
Data
Binary Classifier, Numeric Predictor, Time Series, C&R Tree, Quick Unbiased Efficient Statistical Tree (QUEST), CHAID
Data Mining - 1
CHAID, Decision List, Regression, PCA/Factor, Neural Net, C5.0, Feature Selection, Discriminant Analysis, Logistic, Generalized Linear Model, Cox Regression
Data Mining - 2
Support Vector Machine (SVM), Bayes Net, SLRM, GRI, Apriori, CARMA, Sequence Cluster, K-Means, Kohonen, Two-Step, Anomaly, Random Forests, ICA, Multivariate Adaptive Regression Splines (MARS), PMML, Boosting
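Many of the algorithms above ship as models or nodes in the tools listed on the next slide. As one example, K-Means can be sketched in plain Python as Lloyd's iteration: assign each point to its nearest centroid, then move each centroid to the mean of its points, until assignments stop changing. Assumptions: centroids are seeded with the first k points so the result is deterministic (real tools usually use random or k-means++ seeding), and `kmeans` is a hypothetical helper, not the API of any product named here.

```python
import math
from typing import Sequence

Point = tuple[float, ...]


def _mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> tuple[list[Point], list[int]]:
    """Lloyd's K-Means; returns (centroids, cluster label of each point)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: deterministic seeding with the first k points.
    centroids: list[Point] = [tuple(p) for p in points[:k]]
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = _mean(members)
    return centroids, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(data, k=2))
```

The seeding choice matters: with a different initialisation Lloyd's iteration can converge to a different local optimum, which is why production implementations run several restarts.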
SQL Server 2008, SPSS 17 (PASW) - IBM, SAS, SQL 2008 + Excel (2008) Data Mining Add-in, Clementine 12.0, Statistica 7.0, WEKA, R, Cloud R, R + Excel Add-in …….
statconn
The masterminds behind the statconn
Thomas Baier
R - R/Scilab (D)COM Server
RExcel (1998)
Erich Neuwirth 1948-
RExcel
•http://rcom.univie.ac.at/RExcel
University of Vienna
DATA MINING
Intel, SQL Server, Unix
DATA MINING: Unix 5, SQL Server 2008, UDM, 32, 23, 64, 11
DATA MINING
BASEL II
11
DATA MINING, BI
Keyword & recommendation, knowledge community (推薦知識社群)
Data Mining Center
[Timeline figure: milestones dated 2002 to 2006]
Data Mining + BI (in Cloud Computing)
Customer-focused: Life-time Value, Market-Basket Analysis, Profiling & Segmentation, Retention, Target Market, Acquisition, Knowledge Portal, Cross-Selling, Campaign Management, E-Commerce
Operations-focused: Profitability Analysis, Pricing, Fraud Detection, Risk Assessment, Portfolio Management, Employee Turnover, Cash Management, Production Efficiency, Network Performance, Manufacturing Processes
Research-focused: Combinatorial Chemistry, Genetic Research, Epidemiology
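Market-basket analysis is usually built on the support and confidence measures used by association-rule algorithms such as Apriori, GRI and CARMA. The sketch below computes only those two measures over a small invented transaction list (the data and helper names are hypothetical); it does not implement Apriori's candidate generation or pruning.

```python
# Toy baskets, invented for illustration only.
EXAMPLE_TRANSACTIONS: list[set[str]] = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def support(transactions: list[set[str]], itemset: set[str]) -> float:
    """Fraction of transactions containing every item of `itemset`."""
    if not transactions:
        return 0.0
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions: list[set[str]], antecedent: set[str], consequent: set[str]) -> float:
    """conf(A -> B) = support(A and B together) / support(A); 0.0 if A never occurs."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


if __name__ == "__main__":
    print(support(EXAMPLE_TRANSACTIONS, {"diapers", "beer"}))
    print(confidence(EXAMPLE_TRANSACTIONS, {"diapers"}, {"beer"}))
```

Here {diapers, beer} appears in 3 of 5 baskets (support 0.6) and in 3 of the 4 baskets containing diapers, so the rule diapers -> beer has a confidence of roughly 0.75.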
Accuracy    96.89%   95.64%   97.74%   0.000022
Recall      94.81%   92.86%   96.79%   0.00005
Precision   92.53%   88.96%   94.37%   0.000158
F-measure   93.65%   91.45%   95.49%   0.000085

Accuracy    85.34%   84.40%   86.06%   0.000018
Recall      79.72%   77.07%   82.16%   0.000133
Precision   55.82%   51.47%   60.17%   0.000397
F-measure   65.62%   61.72%   68.00%   0.00023

Accuracy    74.82%   73.65%   76.21%   0.000028
Recall       0.18%    0.00%    0.64%   0.000003
Precision   32.67%    0.00%  100.00%   0.09652
F-measure    0.35%    0.00%    1.28%   0.000012

[Chart: Recall at ratios 1:1, 1:2, 1:3 (y-axis 50% to 100%)]

Accuracy    85.71%   84.21%   86.28%   0.000022
Recall      72.80%   68.90%   74.20%   0.000148
Precision   68.31%   65.33%   70.20%   0.000112
F-measure   70.47%   67.07%   71.71%   0.000102

Accuracy    83.18%   80.61%   84.87%   0.000087
Recall      89.75%   79.36%   89.47%   0.000496
Precision   39.69%   31.10%   46.71%   0.001618
F-measure   53.95%   46.15%   60.02%   0.001217

[Chart: ratios 1:1, 1:2, 1:3 (y-axis 0% to 40%)]
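The result tables report Recall, Precision and F-measure. With TP, FP, FN and TN denoting true positives, false positives, false negatives and true negatives, Precision = TP/(TP+FP), Recall = TP/(TP+FN), and F-measure = 2PR/(P+R). A minimal Python sketch of these formulas follows; the helper names are hypothetical, and reporting a zero-denominator ratio as 0.0 is a convention chosen here, not something the slides specify.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, recall, precision and F-measure from confusion-matrix counts."""
    total = tp + fp + fn + tn
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure(precision, recall),
    }


if __name__ == "__main__":
    # F-measure from the first table's precision (92.53%) and recall (94.81%).
    print(round(f_measure(0.9253, 0.9481), 4))
    # Majority-class baseline, assuming "1:3" means positives:negatives.
    print(classification_metrics(tp=0, fp=0, fn=100, tn=300))
```

Plugging in the first table's precision and recall gives about 0.9366, close to the reported 93.65%. The second call shows why accuracy alone can mislead: under the stated 1:3 assumption, a model that labels every case negative still scores 75% accuracy with 0% recall, which resembles the results with 0.18% recall and 74.82% in the first row.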