Posted on 18-Mar-2018
Web Computing for Information Island Crisis in the Era of Big Data
Gang Huang
Peking University
2016.10.20, Taiyuan, China
The Web Way to Opening Up Data Islands
Agenda
• Information Island Crisis in the Era of Big Data
• Web Computing Paradigm as a Silver Bullet
• 10 Years of Research on Web Computing
• Future of Web Computing for Big Data
Web Computing for Big Data - Gang Huang
Data as a Resource
Figure: data sources (Enterprise Information System, Mobile App, Desktop/Web App, Embedded System)
“Surface” Data from the World Wide Web
• Data can be retrieved by standard web crawlers or search engines such as Google, Baidu, and Bing
• As of June 2016, 4.5+ million websites with 200+ billion pages
“Deep” Data from the Service-Oriented Web
• Source: enterprise/organization information systems, business systems such as Amazon, Ctrip, CRM, and SCM, and countless desktop/mobile apps
• Such data is generated dynamically through service interaction and CANNOT be accessed by crawlers!
• Volume: 10x to 100x that of surface data (excluding video/audio)
• Value: considerably higher than that of surface data
Big Data is generated by billions of Information Systems
Deep Data Collection
In 2012, Google announced “In-App Search” for deep data exploration
In 2015, Apple iOS 9 added deep data search for Apple apps and search over the cached data of other apps
Surface data collection is the core competence of WWW
Deep data collection is the core competence of Big Data
Information Island Crisis in the Era of Big Data
The In-App Search supports only 1000+ apps
iOS supports only the local cache of third-party apps
China Big Data Industry Summit (May 25, 2016)
Collecting data from 100,000 e-Gov systems: 50,000,000 man-months, 100,000,000,000 RMB
* from Digital China, Neusoft, Taiji, CS&S, etc.
Silver Bullet to Information Islands
Figure: existing approaches by architecture (B/S, C/S, A/S) and layer (DB, Application Logic, Network/Data): export from closed DB, package interception on HTTPS, crawler on C/S, crawler on A/S
• Specific or ad hoc solutions for different levels and scenarios
• Typically include DB exporters/importers, crawlers, and refactoring
• Heavily dependent on the application infrastructure, e.g., hardware, OS, and security policies
• High difficulty, risk, and cost; labor-intensive and error-prone
Figure labels: refactoring without source code or developers?; ETL (DB), refactoring, crawler, interception
YanCloud for Data as a Service
Desktop/Web/Mobile Application Systems
Client / API / Cloud
① Data API Learning and Construction Platform
② Data API Runtime and Management Platform
• Data catalog
• API composition
• Online deployment and evolution
• Data accounting
③ Data API Store
• Domain-specific APIs
• API production and consumption
• General at the memory level
• Read, but can write back
• Real-time data manipulation
• WYSIWYG data visualization
• The unique product supporting deep data collection from Web, PC, and mobile apps
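The catalog and API composition features of the runtime platform can be pictured with a minimal sketch. It assumes each data API is a Python callable returning a list of records and that "composition" means joining two APIs on a shared key; `DataApiCatalog`, its methods, and the two sample APIs are hypothetical and do not describe YanCloud's actual interface.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
DataApi = Callable[[], List[Record]]


class DataApiCatalog:
    """Hypothetical in-memory catalog of data APIs (not YanCloud's real API)."""

    def __init__(self) -> None:
        self._apis: Dict[str, DataApi] = {}

    def register(self, name: str, api: DataApi) -> None:
        # Each catalog entry maps a name to a callable that yields records.
        self._apis[name] = api

    def names(self) -> List[str]:
        return sorted(self._apis)

    def compose(self, left: str, right: str, key: str) -> List[Record]:
        # Join the records of two registered APIs on a shared key field.
        index = {rec[key]: rec for rec in self._apis[right]()}
        return [{**row, **index[row[key]]}
                for row in self._apis[left]()
                if row[key] in index]


# Illustrative stand-ins for APIs extracted from two legacy systems.
def water_bills() -> List[Record]:
    return [{"account": "A1", "water_due": 35.0},
            {"account": "A2", "water_due": 12.5}]


def power_bills() -> List[Record]:
    return [{"account": "A1", "power_due": 80.0}]


catalog = DataApiCatalog()
catalog.register("water", water_bills)
catalog.register("power", power_bills)
print(catalog.compose("water", "power", key="account"))
# [{'account': 'A1', 'water_due': 35.0, 'power_due': 80.0}]
```

An inner join is only one possible composition operator; a real platform would likely also offer filtering, projection, and scheduling.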
YanCloud Applications on Smart City
Data API
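1. Ma'anshan; 2. Beijing Local News; 3. Beijing Morning Post; 4. Beijing News; 5. Benxi Portal; 6. Benxi Tong; 7. Benxi Net; 8. Guangfo Metropolis Net; 9. Sichuan News Net; 10. Sichuan Online; 11. Weihai Net; 12. Weihai Information Port; 13. Xi'an News Net; 14. Zhangjiagang Online; 15. Zhangjiagang Online; 16. China Benxi; 17. China Capital Net; 18. Beijing Government Conduct Hotline; 19. Chengdu Mayor's Mailbox; 20. Zhangjiagang Public Convenience Service Net; 21. Beijing 12345 Weibo; 22. Beijing Release Weibo; 23. Beijing Traffic Police Weibo; 24. Benxi Release Hall Weibo; 25. Safe Benxi Weibo; 26. Weihai Broadcast Weibo; 27. Weihai Release Weibo; 28. Weihai Police Online; 29. Beijing Traffic Violations; 30. Beijing Driver's License Points; 31. Fuzhou Driver's License Points; 32. Shandong Weihai Traffic Violations; 33. Weihai Water Bill Management System; 34. Weihai Electricity Bill System
35. Xi'an Housing Provident Fund System; 36. Xi'an Driver's License Points; 37. Xi'an Social Security; 38. Beijing Transit Card System; 39. Chengdu Motor Vehicle Violation System; 40. Chengdu Local Tax System; 41. Chengdu Property Sales System; 42. Chongqing Motor Vehicle Violation Query; 43. Chongqing Driver Violation Query; 44. Chongqing Traffic Management Information Net; 45. Chongqing Driver's License Points System; 46. Chongqing Library System; 47. Chengdu Water Bill System; 48. Beijing License Plate Lottery System; 49. Public Convenience Query Net System; 50. Beijing Traffic Violation System; 51. Beijing Housing Provident Fund System; 52. Beijing Social Security System; 53. Yangzhou Driver Violation System; 54. Yangzhou Electricity Usage System; 55. Wuhan Water Usage System; 56. Wuhan Electricity Usage System; 57. Zhuhai (China Southern Power Grid); 58. National Traffic Violation Query System; 59. China Yangzhou System; 60. Yangzhou Gas System; 61. Xuzhou Social Security, Provident Fund, and Water Bill Query; 62. Nantong Provident Fund and Water Bill Query; 63. Benxi Traffic and Water Bill Query; 64. Yangzhou Price Cloud Management System; 65. Guiyang Electricity Bill System; 66. Shijiazhuang Violation System; 67. Shijiazhuang Driver Points System; 68. Zhuhai Driver System…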
From 60 man-months to 1 man-day
315 data APIs for 121 systems from 43 cities
YanCloud Applications on Data Collection
Tax Management Systems: from impossible to 5 man-days using YanCloud
HR Management System: from unsolvable to 3 man-days using YanCloud
YanCloud Applications on Data Collection
500+ Systems in 20+ Provinces and Ministries across China in 2016
Engineering efficiency up 100X, labor cost down 90%
Sharing and crowdsourcing of data, algorithms, applications, and stakeholders
YanCloud Applications on Mobilization
Generating a mobile app from a legacy visa application system
Generating a WeChat Public Account from a legacy journal portal
DaaS Applications on Mobile Intelligence
Tencent News, Maoyan Movies, Meituan, DiDi Chuxing
Deep Sensing Deep Searching Deep Linking
Our Vision of Internet Computing
Technical trend: Pervasive Computing, Internet of Things, Service Computing, Semantic Web, Social Computing, System of Systems, Grid/Cloud Computing
Business trend: Digital Economy, E-government, Modern Service, Smarter Planet, Internet Culture, Social Network, Virtual World
Big trend: Internet as a Computer
• Grid/cloud computing proposes a new model of networked applications from the perspective of resource sharing and management.
• Pervasive computing addresses a new situation of networked applications from the perspective of human-computer interaction.
• Service-oriented computing focuses on a new form of software that emphasizes collaboration and dynamism, following the philosophy of software as a service.
• …
Internetware for Internet Computer
“Internet Computer” requires substantial improvements in software
characteristics for implementing new business naturally with new technology.
Internetware: A New Software Paradigm for Internet Computing, IEEE Computer 2012
IBM GTO (Global Technology Outlook) 2012
Web Pages as Web Services
Figure: Internetware Rich Client, a browser-based middleware
• Applications: MaaS, iMashup (Mashup Environment), Service-Oriented Rich Client
• Application Programming Interface
• Component Container, with Intra-Browser Communication Mechanisms (Event Bus, UI Composition) and Browser-Server Communication Mechanisms (Service Data Cache, Business Process Integration)
• Advanced features: On-the-fly Composition, Model Checking for Quality, Cross-Domain OAuth
• Web Browser and Web technology mechanisms (HTML, JavaScript, CSS)
• Services: SOAP Service, RESTful Service, JavaScript API, RSS/Atom
CyberC 2009 Best Paper
IEEE Transactions on Services Computing 2009
Service mashup is a data flow that integrates multiple interactive web services.
Q 1: Are there very few service mashup components?
A: Any web page can become a web mashup component if we break through the security mechanism of standard web pages, i.e., the sandbox.
Silver Bullet Part 1: We control the web pages to open up information islands!
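To illustrate how a web page can serve as a mashup component, the sketch below uses Python's standard `html.parser` to turn a static HTML table into records and then feeds them into a second processing step, forming a tiny mashup data flow. The page, `TableScraper`, and `page_as_service` are hypothetical; the sketch only parses markup and does not reproduce the browser-side sandbox handling described above.

```python
from html.parser import HTMLParser
from typing import Dict, List


class TableScraper(HTMLParser):
    """Collect the text of <td>/<th> cells, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag in ("td", "th"):
            self._in_cell = True
            self.rows[-1].append("")      # start a new cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:                 # ignore whitespace between tags
            self.rows[-1][-1] += data.strip()


def page_as_service(html: str) -> List[Dict[str, str]]:
    # Treat the first row as the header and every later row as a record.
    scraper = TableScraper()
    scraper.feed(html)
    header, *body = scraper.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical legacy page that offers no API of its own.
PAGE = """
<table>
  <tr><th>plate</th><th>violations</th></tr>
  <tr><td>A123</td><td>2</td></tr>
  <tr><td>B456</td><td>0</td></tr>
</table>
"""

# Mashup data flow: page -> records -> filter -> downstream consumer.
records = page_as_service(PAGE)
flagged = [r["plate"] for r in records if int(r["violations"]) > 0]
print(flagged)  # ['A123']
```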
In-Depth Analysis of Service Mashups
Figure: ① Generating Behavior Model, ② Constraints and Refinements Specification, ③ Verification of Behavior Model
• Inputs: behavior of the application, behavior of the environment, specification of constraints and refinements
• Models: behavior meta-model, generation template for the runtime environment, synthesized behavior model, behavior model (UML sequence diagram), verification model (in Promela)
• Model checker: SPIN
• Results: trace sets and violations
ICSS 2010 Best Paper Award
Q 2: Does the web browser control the behavior of web pages?
A: We analyze the source code of the web browser and model-check its runtime behavior to understand browser-based service mashups as a whole.
Figure: performance evaluation
Silver Bullet Part 2: We control the web browser to open up information islands!
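SPIN exhaustively explores the reachable states of a Promela model and reports traces that violate a property. A minimal Python sketch of that idea (explicit-state breadth-first search with counterexample reconstruction) is shown below, applied to a hypothetical two-component mashup in which a widget may read a shared cache before a loader fills it. The model and all names are illustrative assumptions; this is neither the authors' browser behavior model nor SPIN itself.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

# State: (loader_done, widget_done, cache_filled, widget_read_empty_cache)
State = Tuple[bool, bool, bool, bool]


def successors(s: State) -> List[Tuple[str, State]]:
    """Interleave the two components: either may take its single step."""
    loader_done, widget_done, filled, read_empty = s
    moves: List[Tuple[str, State]] = []
    if not loader_done:
        # The loader fetches remote data and fills the shared cache.
        moves.append(("loader fills cache",
                      (True, widget_done, True, read_empty)))
    if not widget_done:
        # The widget reads the cache; reading it while empty is the bug.
        moves.append(("widget reads cache",
                      (loader_done, True, filled, read_empty or not filled)))
    return moves


def check(initial: State,
          invariant: Callable[[State], bool]) -> Optional[List[str]]:
    """Breadth-first search; return the shortest trace that breaks invariant."""
    parent: Dict[State, Optional[Tuple[State, str]]] = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace: List[str] = []
            step = parent[state]
            while step is not None:       # walk back to the initial state
                prev, label = step
                trace.append(label)
                step = parent[prev]
            return list(reversed(trace))
        for label, nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = (state, label)
                queue.append(nxt)
    return None  # the invariant holds in every reachable state


start: State = (False, False, False, False)
print(check(start, lambda s: not s[3]))  # ['widget reads cache']
```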
Data Cache for Service Mashups
Cache workflow (figure): 1. Intercept user requests; 2. Query; 3. Invoke; 4. Respond; 5. Cache; 6. Respond; 7. Validate
Components: Logic, Instance Repository, Application Programming Interface, Cache Strategy, Data Model, Component
Context vs. desired cache strategy (Google Weather's own cache strategy: the cache expires immediately):
A: uses the weather data in a real-time application. Desired strategy: the same as Google's.
B: uses the weather data to feed other services that care less about accuracy. Desired strategy (frequency): cached data does not expire within five minutes of the last response.
C: only needs today's weather from the responses, which varies less frequently. Desired strategy (granularity): caching is done on fine-grained structures within the responses.
Q 3: Are standard browser/server interactions unfit for service mashups?
A: We control the cache strategies of HTTP and HTML.
SOCA 2010 Best Paper Nomination, WWW 2015
Silver Bullet Part 3: We control the interaction between the web browser and the web server to open up information islands!
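Contexts B and C in the weather example map onto two familiar cache policies: time-based expiry and field-level (fine-grained) caching. The sketch below assumes a hypothetical weather response shaped as a dict with a `today` field and uses an injectable clock so it can be tested; `FrequencyCache` and `GranularCache` are illustrative names, not the authors' framework. Context A would bypass caching entirely, matching Google's immediate expiry.

```python
import time
from typing import Any, Callable, Dict, Optional

Fetch = Callable[[], Dict[str, Any]]


class FrequencyCache:
    """Strategy B: reuse the last response until `ttl` seconds have passed."""

    def __init__(self, fetch: Fetch, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch, self.ttl, self.clock = fetch, ttl, clock
        self._value: Optional[Dict[str, Any]] = None
        self._stamp = 0.0

    def get(self) -> Dict[str, Any]:
        now = self.clock()
        if self._value is None or now - self._stamp >= self.ttl:
            self._value, self._stamp = self.fetch(), now  # refetch on expiry
        return self._value


class GranularCache:
    """Strategy C: keep only one field of the response, once per date."""

    def __init__(self, fetch: Fetch, field: str) -> None:
        self.fetch, self.field = fetch, field
        self._by_date: Dict[str, Any] = {}

    def get(self, date: str) -> Any:
        if date not in self._by_date:
            self._by_date[date] = self.fetch()[self.field]
        return self._by_date[date]


# Usage with a fake weather service and a controllable clock.
calls = []


def fetch_weather() -> Dict[str, Any]:
    calls.append(1)
    return {"now": 21.5, "today": "sunny"}


fake_now = [0.0]
b = FrequencyCache(fetch_weather, ttl=300, clock=lambda: fake_now[0])
b.get()               # t=0s: fetch
fake_now[0] = 120.0
b.get()               # t=120s: served from cache
fake_now[0] = 400.0
b.get()               # t=400s: expired, fetch again
c = GranularCache(fetch_weather, field="today")
c.get("2016-10-20")   # fetch and keep only "today"
c.get("2016-10-20")   # served from cache
print(len(calls))     # 3
```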
Offloading JavaScript Programs
• Rich web mashups (e.g., chess games, 3D graphics, RPGs) do not work well on mobile devices
• The mobile web can leverage cloud-side resources
49x page load time improvement
92% energy saving
Generally applicable to major browsers: Chrome, Safari, and Firefox
SPLASH 2012, WWW 2016, IEEE Transactions on Mobile Computing 2016
Q 4: Does JavaScript make web pages much harder to understand and control?
A: We offload JavaScript programs from the mobile browser to the cloud.
Silver Bullet Part 4: We control the JavaScript programs to open up information islands!
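Whether offloading a computation pays off depends on comparing local execution time against remote execution time plus network cost. The sketch below uses a simple back-of-the-envelope cost model, an assumption rather than the decision procedure of the cited papers; `Task`, `should_offload`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical profile of one JavaScript computation."""
    local_ms: float      # estimated run time on the phone
    remote_ms: float     # estimated run time on the cloud server
    payload_bytes: int   # state shipped to and from the cloud


def should_offload(task: Task, bandwidth_bytes_per_ms: float,
                   rtt_ms: float) -> bool:
    # Offload only if remote execution plus one round trip plus data
    # transfer is faster than running the task locally.
    transfer_ms = task.payload_bytes / bandwidth_bytes_per_ms
    return task.remote_ms + rtt_ms + transfer_ms < task.local_ms


chess_ai = Task(local_ms=2000.0, remote_ms=100.0, payload_bytes=20_000)
ui_tweak = Task(local_ms=5.0, remote_ms=1.0, payload_bytes=2_000)
for task in (chess_ai, ui_tweak):
    print(should_offload(task, bandwidth_bytes_per_ms=1_000.0, rtt_ms=80.0))
# True
# False
```

The heavy game-tree search wins from offloading (200 ms remote vs. 2000 ms local), while a small UI update does not, because the round trip alone dominates.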
All-in-One by SM@RT
Science China 2013 & IEEE Transactions on Services Computing 2016
SM@RT SMVC Model, SM@RT Client-Cloud-Convergence Platform
SM@RT for Java-based Information Islands
Figure: offloading pipeline for an Android app (input: the app's Java bytecode)
1. Detect which classes are movable (classified app classes: location-anchored vs. movable)
2. Make movable classes able to be offloaded (app classes transformed with proxy classes)
3. Detect which classes should be offloaded as a whole (clustered app classes: location-anchored vs. movable app class clusters)
4. Package deployable files: a deployable Android app plus the movable app classes packed in an executable JAR file
OOPSLA 2012
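Steps 1 and 3 of the pipeline can be sketched as graph operations under two simplifying assumptions: a class is location-anchored if it directly uses a device-bound API, and movable classes that call each other should move together. The class names, the `ANCHORING_APIS` set, and the use of connected components are illustrative assumptions, not the algorithm of the OOPSLA 2012 paper.

```python
from typing import Dict, List, Set

# Hypothetical app: class -> classes or platform APIs it calls.
CALLS: Dict[str, Set[str]] = {
    "MainActivity": {"android.widget", "GameEngine"},
    "GameEngine": {"Board", "Search"},
    "Board": set(),
    "Search": {"Board"},
    "GpsTracker": {"android.location"},
    "Stats": set(),
}
ANCHORING_APIS = {"android.widget", "android.location"}  # assumed device-bound


def movable_classes(calls: Dict[str, Set[str]]) -> Set[str]:
    # Step 1: a class is anchored if it touches a device-bound API.
    return {c for c, deps in calls.items() if not deps & ANCHORING_APIS}


def clusters(calls: Dict[str, Set[str]], movable: Set[str]) -> List[Set[str]]:
    # Step 3: group movable classes linked by calls (undirected components).
    adj: Dict[str, Set[str]] = {c: set() for c in movable}
    for c in movable:
        for d in calls[c] & movable:
            adj[c].add(d)
            adj[d].add(c)
    seen: Set[str] = set()
    result: List[Set[str]] = []
    for start in sorted(movable):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:                      # depth-first traversal
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        result.append(comp)
    return result


movable = movable_classes(CALLS)
print(sorted(movable))  # ['Board', 'GameEngine', 'Search', 'Stats']
print([sorted(c) for c in clusters(CALLS, movable)])
# [['Board', 'GameEngine', 'Search'], ['Stats']]
```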
Re-implementing the silver bullet to open up Java-based information islands
Runtime model of an offloaded Android app
97% execution time and 83% energy saving
SM@RT SMVC programming abstraction: Java bytecode, Java VM, Java invocation, VM in cloud
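Step 2 of the pipeline makes movable classes offloadable through proxy classes, and the programming abstraction lists Java invocation and a VM in the cloud. A Python analogue of that proxy idea forwards every method call through a handle to an object living elsewhere. `RemoteVM` is a hypothetical in-process stand-in for the cloud-side VM so that the example stays self-contained; a real system would marshal each call over the network.

```python
from typing import Any, Dict


class RemoteVM:
    """Stand-in for a cloud VM; a real one would sit behind a network call."""

    def __init__(self) -> None:
        self.objects: Dict[int, Any] = {}
        self.invocations = 0

    def create(self, cls: type, *args: Any) -> int:
        handle = len(self.objects)
        self.objects[handle] = cls(*args)
        return handle

    def invoke(self, handle: int, method: str, *args: Any) -> Any:
        self.invocations += 1
        return getattr(self.objects[handle], method)(*args)


class Proxy:
    """Client-side proxy: forwards every method call to the remote object."""

    def __init__(self, vm: RemoteVM, cls: type, *args: Any) -> None:
        self._vm = vm
        self._handle = vm.create(cls, *args)

    def __getattr__(self, name: str):
        # Only called for attributes not found on the proxy itself.
        def forward(*args: Any) -> Any:
            return self._vm.invoke(self._handle, name, *args)
        return forward


class Search:
    """A movable class: pure computation, no device-bound APIs."""

    def __init__(self, depth: int) -> None:
        self.depth = depth

    def best_score(self, scores: list) -> int:
        return max(scores) * self.depth


vm = RemoteVM()
search = Proxy(vm, Search, 3)
print(search.best_score([4, 9, 2]), vm.invocations)  # 27 1
```

The caller uses `search` exactly as it would a local `Search` object, which is the property that lets an offloading tool swap proxies in without touching the calling code.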
Our Silver Bullet to Information Island Crisis
Figure: Networked Software (Apps/Server, Client/Server, Browser/Server)
• Stack: Architecture, Dev Framework, Middleware, Host OS, Network
• Code and platforms: HTML/CSS, JavaScript, Java bytecode, Assembly; Browser, JDK/JVM, GUI Widget, HTTP Stack, Android/Linux
• Code/Data Analytics, Recovery and Refactoring
• Micro Service, Service Oriented Software Architecture, Data and Service Innovation
• Self-Organize, Self-Optimize, Self-Evolve, Self-Configure, Self-Healing, Self-Protect
• SM@RT: Model, View, Controller; Presentation, Business, Data
Summary of Our Web Computing for Big Data
• The ONLY Silver Bullet for Web/Desktop/Mobile Information Islands
• 500+ Government and Enterprise Applications
• 100X Engineering Efficiency Improvement
• 90% Labor Cost Saving
• 80,000,000 RMB Patent Royalties
• 10 years of research and practice
Intra-Organization Deep Data Sharing
Palantir: real-time inspection of tens of government systems
$20 billion assessed value
DOMO: real-time collection from hundreds of EIS, with BI support
$2 billion assessed value
API Economy for Big Data
Intra-Organization Deep Data:
• Palantir: real-time inspection of tens of government systems, $20 billion assessed value
• DOMO: real-time collection from hundreds of EIS, with BI support, $2,000M assessed value
Inter-Organization Deep Data:
• API-based data trading (a market of 10+ billion RMB)
Situational Deep Data:
• API economy for situational applications
• 5 billion API requests at Google and Facebook
• 3 billion API requests at Twitter (75% of total traffic)
• 25+ billion USD market (Gartner)
API Economy by Web Computing
API Specification(Data)
API Management(Data)
API Invocation(Data)
API Consumption(Data)
API Economy: $625M, $2,000M, $2,800M, $600M
Web 1.0 (HTML+HTTP), Web 2.0 (REST+XML), Web 3.0 (Semantics)
Web 1.0 (HTML+HTTP), Web 2.0 (REST+API), Web 3.0 (Big Data)
HTML vs. API/Data Spec
Web Search vs. API/Data Search
HTTP/SSL for Web Pages vs. HTTP/Blockchain for Data
RESTful vs. Micro-Services
1 ZB of annual Internet traffic
Thanks
Web Computing for Information Island Crisis in the Era of Big Data
Gang Huang, Peking University, hg@pku.edu.cn
The Web Way to Opening Up Data Islands, Gang Huang, Peking University