RRE (Revolution R Enterprise) vs. R at PC Cluster
112/04/19 1
RRE (Revolution R Enterprise) vs. R at PC Cluster
Edward Cheng, 2.18.2014
Environment
• node01 to node36 and stathpc: RHEL 5 + RRE 6.1 (R-2.14.2)
• node51 to node60 and himemhpc: RHEL 6 + RRE 7.0 (R-3.0.2)
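History
R originated in 1993 with Professors Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand.
Revolution Analytics (www.revolutionanalytics.com) received venture capital funding in 2008 from investors including Intel Capital. Its board members include Professor Robert Gentleman (co-creator of R) and adviser Norman H. Nie (former CEO of SPSS).
Revolution R Enterprise (the enterprise edition of R)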
R
• R is the world's most widely used statistical programming language.
• Free and open-source software
Performance
Operation (elapsed seconds)       R-2.14.2   RRE 6.1   R-3.0.1   RRE 7.0
Matrix multiply (10000 x 10000)        751        35       568        20
SVD (10000 x 10000)                   5746       374      4549       256
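To make the table easier to read, the short Python sketch below turns the elapsed times into speedup factors (baseline time divided by RRE time), pairing each RRE release with the plain R version listed beside it. The names TIMINGS and speedups are hypothetical, chosen for illustration. The slide does not say where the speedup comes from; a commonly cited reason is that RRE ships with Intel's Math Kernel Library (MKL) for multithreaded linear algebra, but treat that as an assumption here.

```python
from typing import Dict, Tuple

# Elapsed times in seconds, copied from the Performance table.
# TIMINGS and speedups are hypothetical names used for illustration only.
TIMINGS: Dict[str, Dict[str, float]] = {
    "Matrix multiply": {"R-2.14.2": 751, "RRE 6.1": 35, "R-3.0.1": 568, "RRE 7.0": 20},
    "SVD": {"R-2.14.2": 5746, "RRE 6.1": 374, "R-3.0.1": 4549, "RRE 7.0": 256},
}

# Each RRE release is compared with the plain R version beside it in the table.
PAIRS = [("R-2.14.2", "RRE 6.1"), ("R-3.0.1", "RRE 7.0")]


def speedups(timings: Dict[str, Dict[str, float]]) -> Dict[Tuple[str, str, str], float]:
    """Return {(operation, baseline, rre): baseline_time / rre_time}."""
    result = {}
    for op, times in timings.items():
        for base, rre in PAIRS:
            result[(op, base, rre)] = times[base] / times[rre]
    return result


if __name__ == "__main__":
    for (op, base, rre), factor in speedups(TIMINGS).items():
        print(f"{op}: {rre} is {factor:.1f}x faster than {base}")
```

On these numbers the RRE builds come out roughly 15x to 28x faster. The exact factors depend on the hardware and thread count of the node each run used, which the slide does not record.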
Definition
• “Big Data” is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it…
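Big Data
• In 2011, the world used about 1.8 ZB of digital data (1 ZB = 10^21 bytes, roughly 2^70 bytes). A study by IDC (International Data Corporation) forecasts that the total will reach about 35.2 ZB by 2020, 44 times the 2009 volume of about 0.8 ZB.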
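Checking the arithmetic: 44 x 1.8 ZB would be about 79 ZB, so a 44-fold factor only matches 35.2 ZB when measured from a base of 0.8 ZB. Measured from the 2011 figure, the growth is about 19.6-fold, or roughly 39% per year compounded over nine years. The Python sketch below does these calculations; the function names are hypothetical, and the compound annual growth rate formula (end / start)^(1 / years) - 1 is the standard one.

```python
# Hypothetical helpers for checking the growth figures, for illustration only.
ZB_DECIMAL = 10 ** 21  # SI zettabyte, in bytes
ZB_BINARY = 2 ** 70    # zebibyte (ZiB), the binary approximation


def growth_factor(start: float, end: float) -> float:
    """Total growth multiple between two volumes."""
    return end / start


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    print(f"Base implied by a 44-fold factor: {35.2 / 44:.1f} ZB")
    print(f"2011 to 2020 growth: {growth_factor(1.8, 35.2):.1f}x")
    print(f"2011 to 2020 CAGR: {cagr(1.8, 35.2, 9):.1%}")
    print(f"2^70 / 10^21 = {ZB_BINARY / ZB_DECIMAL:.3f}")
```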
BIG DATA
The tsunami is coming
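• 2006: 850 TB of web page data stored in total
• 2009: about 220 million photos uploaded per week, requiring 25 TB of storage
• 2011: about 48 hours (48 GB) of video uploaded every minute (about 70 TB per day)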
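The per-day and per-photo figures can be checked with a little unit arithmetic. This Python sketch assumes decimal units (1 TB = 1000 GB, 1 KB = 1000 bytes) and reads the slide's "48 hours (48 GB)" as about 1 GB per hour of video; the function names are hypothetical.

```python
# Decimal storage units (assumption): 1 GB = 10**9 bytes, 1 TB = 1000 GB.
GB = 10 ** 9
TB = 1000 * GB
MINUTES_PER_DAY = 24 * 60


def daily_video_tb(gb_per_minute: float) -> float:
    """Daily upload volume in TB for a per-minute upload rate in GB (hypothetical helper)."""
    return gb_per_minute * GB * MINUTES_PER_DAY / TB


def avg_photo_kb(total_tb: float, photos: float) -> float:
    """Average size per photo in KB (1 KB = 1000 bytes) (hypothetical helper)."""
    return total_tb * TB / photos / 1000


if __name__ == "__main__":
    print(f"Video: {daily_video_tb(48):.1f} TB per day")
    print(f"Photos: {avg_photo_kb(25, 220e6):.0f} KB each on average")
```

48 GB per minute works out to about 69 TB per day, consistent with the slide's 70 TB, and 25 TB spread over 220 million photos is roughly 114 KB per photo.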
eBay
The world's largest online marketplace:
• We have over 50 petabytes of data
• We have over 400 million items for sale
• We process more than 250 million user queries per day
• We have over 112 million active users
• We sold over US$75 billion in merchandise in 2012
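For capacity planning it can help to turn eBay's daily totals into per-second and per-user figures. A minimal Python sketch, assuming decimal petabytes (10^15 bytes) and load spread evenly over the day (real traffic peaks higher); the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
PB = 10 ** 15  # decimal petabyte, in bytes (assumption)


def average_qps(queries_per_day: float) -> float:
    """Average queries per second if load were spread evenly over the day (hypothetical helper)."""
    return queries_per_day / SECONDS_PER_DAY


def bytes_per_user(total_pb: float, users: float) -> float:
    """Stored bytes per active user (hypothetical helper)."""
    return total_pb * PB / users


if __name__ == "__main__":
    print(f"{average_qps(250e6):,.0f} queries per second on average")
    print(f"{bytes_per_user(50, 112e6) / 1e6:,.0f} MB of stored data per active user")
```

250 million queries a day averages to roughly 2,900 queries per second, and 50 PB over 112 million active users is on the order of 450 MB per user. Because the slide's figures are lower bounds ("over", "more than"), so are these.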
Big Problems
• Capacity: data too big to fit into memory
• Speed: computation may be too slow to be useful
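One standard answer to both problems is to process data in chunks: each chunk fits in memory (capacity), and chunks can be summarized independently, for example on different cluster nodes, and then merged (speed). The Python sketch below computes a mean and sample variance this way, using the standard pairwise merge formula for counts, means, and sums of squared deviations (often attributed to Chan et al.). It illustrates the general technique only; the names are hypothetical and it is not how any particular R package is implemented.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Moments:
    """Summary of one chunk: count, mean, and sum of squared deviations (m2)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0


def chunk_moments(chunk: List[float]) -> Moments:
    """Compute the moments of a single in-memory chunk."""
    n = len(chunk)
    if n == 0:
        return Moments()
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return Moments(n, mean, m2)


def merge(a: Moments, b: Moments) -> Moments:
    """Combine two partial results without revisiting the raw data."""
    if a.n == 0:
        return b
    if b.n == 0:
        return a
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)


def chunks(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield lists of at most `size` values, so only one chunk is held at a time."""
    buf: List[float] = []
    for v in values:
        buf.append(v)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf


def streaming_mean_var(values: Iterable[float], chunk_size: int = 100_000) -> Tuple[float, float]:
    """Return (mean, sample variance) of a stream processed chunk by chunk."""
    total = reduce(merge, (chunk_moments(c) for c in chunks(values, chunk_size)), Moments())
    if total.n < 2:
        raise ValueError("need at least two values")
    return total.mean, total.m2 / (total.n - 1)


if __name__ == "__main__":
    data = (float(i) for i in range(1, 11))  # a generator stands in for a large file
    print(streaming_mean_var(data, chunk_size=3))
```

Only one chunk is held in memory at a time, and because the merge gives the same result (up to rounding) in any grouping, the per-chunk summaries could be computed in parallel before being combined.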
RevoScaleR
• RevoScaleR package: RevoScaleR analysis functions such as rxCube, rxLinMod, rxCovCor, rxLogit, and rxGlm are designed to provide significant speed improvements over alternatives such as base R's lm() and glm(). These algorithms are all optimized for handling big data.
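Block-wise regression can be sketched by noting that ordinary least squares needs only a few running sums, which can be accumulated one chunk at a time. The Python sketch below fits a one-predictor model y = a + b*x this way. It illustrates the general idea behind chunked algorithms such as rxLinMod but is not RevoScaleR's actual implementation, and fit_simple_linear_chunked is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def fit_simple_linear_chunked(chunks: Iterable[List[Tuple[float, float]]]) -> Tuple[float, float]:
    """Fit y = a + b*x by accumulating sufficient statistics chunk by chunk.

    Hypothetical helper: only five running sums are kept, so memory use
    does not grow with the number of rows.
    """
    n = sx = sy = sxx = sxy = 0.0
    for chunk in chunks:          # e.g. one block of rows read from disk
        for x, y in chunk:
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x has zero variance or too few rows")
    b = (n * sxy - sx * sy) / denom   # slope from the normal equations
    a = (sy - b * sx) / n             # intercept
    return a, b


if __name__ == "__main__":
    blocks = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
    print(fit_simple_linear_chunked(blocks))
```

With several predictors the same approach accumulates the cross-product matrices X'X and X'y chunk by chunk and solves the normal equations at the end. The raw-sum formulas used here can lose precision when the x values are large, so a production implementation would typically use centered or otherwise more stable updates.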