The Role of Data Science in the Whoscall Product Ecosystem
Posted: 22-Nov-2014 · Category: Technology
Transcript of The Role of Data Science in the Whoscall Product Ecosystem
Gogolook Confidential
How did it start?
The Best App for Identifying and Blocking Calls: LINE whoscall
Key Features
★ Instant Caller Identification
LINE whoscall identifies background information of incoming unknown calls in seconds through tags reported by other users, Internet search results, and our comprehensive global database.
★ Database with over 600 Million Phone Numbers
LINE whoscall boasts an online database with over 600 million phone numbers. It covers yellow pages, spammers, telemarketers, customer services, etc., with numerous community tags contributed by users and comments based on real users' experiences.
Database & Number Details
Incoming Call Dialogue
[Screenshots: incoming-call dialogues for a fraud call, a business corporation, and a restaurant.]
★ Community Tag
★ Block unwanted calls & SMSs
Contributions from the global user community have always been the pillar of LINE whoscall's service. LINE whoscall users can tag a phone number and share it with others, which creates an integrated phone number database and a reliable communication network for everyone.
Block calls and SMSs intelligently to ensure a harassment-free calling experience.
Tag & Block
★ World’s Largest Yellow Page Database
★ Offline Database Available for Free
LINE whoscall owns one of the world's largest online phone number databases, covering most of the businesses and service providers essential to your daily life.
The database is available not only online but also offline, and it is completely free. Unlimited use of a database with over 600 million phone numbers is available only on LINE whoscall.
Database Usage
Number Identification (as of July 2014)
• 3 of every 5 calls from strangers can be identified by LINE whoscall.
• Over 400 million phone calls are identified by LINE whoscall every month.
• 3,000 spammer numbers are reported by LINE whoscall users every day.
Market
Honors
What we will be…
Vision
Applications of Data Science at whoscall
Gogolook Data Scientist 高義銘
★ A problem we often run into in daily life
★ When people face the unknown, they get a kind of…
"I have a bad feeling about this!"
★ Many apps out there try to solve this problem
小熊來電通知 (a caller-notification app)
★ Why whoscall?
Because… it's the app that even Google's CEO gave a thumbs-up!
"Wow, nice!"
So how does whoscall solve the problem of unknown callers?
★ Technologies adopted
1. Yellow pages: HiPage, Yelp, Zenrin…
2. Google search
3. Other sources
4. User reports and tags
★ whoscall, I have a problem…
If we cannot get any information about an unknown number from these sources, is it game over (GG)?
Yes. Game over; might as well wash up and go to bed…
Of course we can't just go to bed; otherwise, why would I be standing up here?
★ Problem we want to solve
For an unknown phone number:
• No Google result
• No user tag / report
• Not a whoscall user
Can we determine if it’s a spam number?
Telemarketing? Fraud? Harassment?
Wrong number?
(I'm not a god!!)
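★ Scenario
[Diagram: 小明 (Xiao Ming), his younger sister (小明妹), and his older brother (小明哥) each receive a call from the same unidentified number (?), labeled OO推銷 ("OO telemarketing").]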
★ We think it should work because…
The whoscall user base (= potential sensors):
• > 10 million installations
• > 10 thousand tags (daily)
• > 30 million phone calls (daily)
Analysis procedures
1. Collect call logs
2. Compare with user tags
3. Explore call behaviors
4. Extract features
5. Classify unknown numbers using machine learning techniques
★ Collect call logs
• Recruit a group of volunteer whoscall users to act as our sensors.
• Collect phone call logs from these sensors for one month.
★ User privacy
User privacy is our highest priority. Phone numbers are stored as one-way hash codes, so they cannot be reversed.
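A minimal sketch of storing phone numbers as one-way hash codes, using only Python's standard library. The slides do not say which hash function was used; HMAC-SHA256, the `SECRET_KEY` value, and the digits-only normalization are assumptions for illustration. A keyed hash is chosen because phone numbers come from a small space: an unsalted hash of a number could be recovered by hashing every possible number and comparing, which does not work without the key.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real deployment would load it from a secrets store.
SECRET_KEY = b"replace-with-a-long-random-secret"


def normalize(number: str) -> str:
    """Keep ASCII digits only, so '0912-345-678' and '0912345678' hash identically."""
    return "".join(ch for ch in number if ch in "0123456789")


def hash_number(number: str) -> str:
    """Return the HMAC-SHA256 hex digest (64 hex characters) of a normalized number."""
    mac = hmac.new(SECRET_KEY, normalize(number).encode("ascii"), hashlib.sha256)
    return mac.hexdigest()


if __name__ == "__main__":
    # Same number, different formatting -> same hash code.
    print(hash_number("0912-345-678") == hash_number("0912345678"))  # True
```

Because every component hashes with the same key, call logs and user reports can still be joined on the hashed values.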
Analysis procedures (step 2 of 5: Compare with user tags)
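★ List of user tags
A sample of raw user tags, shown as translation (original); typos are kept as users entered them.
• Hangs up as soon as I answer (一接就掛斷)
• Hangs up right after calling (一打來就掛掉)
• Caller hangs up the moment I answer (一接對方馬上掛斷)
• Hangs up as soon as I pick up (一接就掛電話)
• Hangs up as soon as the call is answered (一接起來就掛斷電話)
• Says "wrong number" as soon as I answer (一接起來,就說打錯)
• Keeps sending ad text messages (一直傳廣告簡訊)
• Keeps calling the wrong number (一直打錯電話)
• Keeps getting an APP with nothing displayed (一直收到沒顯示的APP)
• Keeps frantically calling the wrong number (一直狂打錯電話)
• Rings once or not at all, then hangs up; suspicious (一聲一聲不響,就掛掉,有問題)
• Hangs up after one ring (一聲就掛)
• One ring, then hangs up (一聲掛斷)
• Hangs up on answer (一聽收線)
• Serious harassment (嚴重騷擾)
• Mysterious call from abroad (國外莫名來電)
• International call disguised as a Taipei number??? (國際電話偽裝台北碼???)
• Loan shark (地下錢莊)
• Loan shark marketing (地下錢莊推銷)
• Illegal underground futures(?) company (地下非法期公司)
• Real estate (地產)
• Junk (垃圾)
• Junk SMS (垃圾簡訊)
• Junk messages (垃圾訊息)
• Keelung hair salon (基隆美髮)
• Life insurance (壽險)
• Foreign workers (外勞)
• Calls strangers at midnight to mess around (夜半打給不認識的在亂)
• Sex dating (色情交友)
• Sex dating calls (色情交友電話)
• Sex trafficking (色情人肉市場)
• Pornographic junk SMS (色情垃圾簡訊)
• Escort delivery (色情外送)
• Call-girl calls (色情妹妹電話)
• Sexual nuisance (色情干擾)
• Sex ad SMS (色情廣告簡訊)
• Prostitute solicitation (色情拉客妹)
• Erotic massage (色情按摩)
• Sex-service marketing (色情推銷)
• Sex-service sales calls (色情推銷電話)
• Paid-dating escort delivery (色情援交外送)
• Sleazy scum (色情敗類)
• Mormons (摩門)
• Hangs up right after dialing (撥了馬上掛掉)
• Nuisance call (擾亂電話)
• Ratings survey, misspelled (收數率調查)
• TV ratings survey (收視率調查)
• Loan offer SMS (放款簡訊)
• Government announcement (政府宣導)
• Just one ring (敲一聲而已)
• Prank call (整人電話)
• Shin Kong Security (新光保全)
• DBS loans (星展借貸)
• DBS marketing, misspelled (星展推消)
• DBS Bank (星展銀行)
• Pimp / sex broker (淫媒仲介)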
★ Compare with user tags
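• Compare these phone numbers against user reports in the whoscall database (block records, 封鎖記錄).
[Figure: masked example numbers sorted into groups.]
Normal numbers: 0987-991-XXX, 0986-225-XXX, 02-2675-XXXX, 03-862-XXXX, …
Spam numbers, split into telemarketing (推銷電話) and malicious (惡意電話) calls: 02-2543-XXXX, 03-556-XXXX, 886-XXXX, …; 02-2783-XXXX, 886-903-XXXX, 0800-000-XXX, …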
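The comparison step can be sketched as a join on hashed numbers. This assumes, hypothetically, that user reports are available as a mapping from hashed number to the list of categories users assigned, and it labels each number with its most frequent category. Numbers nobody reported stay unlabeled (`None`); the slides do not describe how the normal-number sample was chosen, so this sketch does not label them as normal.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def label_numbers(
    call_log_hashes: Iterable[str],
    reports: Dict[str, List[str]],
) -> Dict[str, Optional[str]]:
    """Label each hashed number from the call logs with its most-reported category."""
    labels: Dict[str, Optional[str]] = {}
    # dict.fromkeys de-duplicates while keeping first-seen order.
    for h in dict.fromkeys(call_log_hashes):
        categories = reports.get(h)
        if categories:
            # most_common(1) -> [(category, count)]; ties go to the first-listed category.
            labels[h] = Counter(categories).most_common(1)[0][0]
        else:
            labels[h] = None  # nobody reported this number: unlabeled
    return labels


if __name__ == "__main__":
    reports = {"h1": ["telemarketing", "telemarketing", "fraud"], "h2": ["fraud"]}
    print(label_numbers(["h1", "h2", "h3", "h1"], reports))
    # {'h1': 'telemarketing', 'h2': 'fraud', 'h3': None}
```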
★ Data summary
Samples: 7,854 (Normal: 4,000; Spam: 3,854)
Spam breakdown: telemarketing (推銷電話) 70%, polling centers (民調中心) 1%, harassment (騷擾電話) 5%, fraud (詐騙電話) 24%
Analysis procedures (step 3 of 5: Explore call behaviors)
★ Normal numbers
[Figure: call activity of an example normal number.]
Calls = 195 (in 66, out 129); opponents (distinct counterpart numbers) = 72 (in 21, out 58)
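★ Spam numbers
[Figure: call activity of an example spam number.]
Calls = 471 (in 15, out 456); opponents = 186 (in 11, out 183)
User tags (count): XX信用卡行銷 ("XX credit card marketing", 7), OOO,XXXX行銷 ("OOO, XXXX marketing", 6), 電話行銷 ("telemarketing", 3)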
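The counts on these two slides can be computed from call logs roughly as follows. The `CallRecord` layout is hypothetical, and directions are taken from the point of view of the number being studied, so a telemarketer placing calls shows up as `"out"`. Counting opponents as distinct numbers explains why the in and out opponent counts can sum to more than the total (21 + 58 > 72 above).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CallRecord:
    opponent: str   # hashed number on the other end of the call
    direction: str  # "in": the studied number received the call; "out": it placed the call


def call_summary(records: List[CallRecord]) -> Dict[str, int]:
    """Count calls and distinct opponents, overall and by direction."""
    incoming = [r for r in records if r.direction == "in"]
    outgoing = [r for r in records if r.direction == "out"]
    return {
        "calls": len(records),
        "calls_in": len(incoming),
        "calls_out": len(outgoing),
        # An opponent seen in both directions counts once in "opponents",
        # so opponents_in + opponents_out can exceed opponents.
        "opponents": len({r.opponent for r in records}),
        "opponents_in": len({r.opponent for r in incoming}),
        "opponents_out": len({r.opponent for r in outgoing}),
    }


if __name__ == "__main__":
    log = [CallRecord("a", "out"), CallRecord("a", "in"), CallRecord("b", "out")]
    print(call_summary(log))
```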
Analysis procedures (step 4 of 5: Extract features)
★ What is a feature?
“Feature” is a measurable property of a phenomenon being observed.
★ Example
For example, to analyze a company, we might look at features such as:
• Number of employees
• Number of engineers
• Share of Python engineers in the company
• Company cohesion
• How handsome the CEO is
Features for call patterns
[Figure: ratio of out calls, compared across Fraud, Marketing, and Normal numbers.]
Features for call patterns
[Figure: ratio of recurring opponents, compared across Fraud, Marketing, and Normal numbers.]
Features for call patterns
[Figure: ratio of missed out calls, compared across Fraud, Marketing, and Normal numbers.]
Features for call patterns
[Figure: ratio of working-time calls, compared across Fraud, Marketing, and Normal numbers.]
Features for call patterns
[Figure: median call duration in seconds, compared across Fraud, Marketing, and Normal numbers.]
Features for call patterns
[Figure: ratio of out calls in contact book, compared across Fraud, Marketing, and Normal numbers.]
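A sketch of computing the six features above for one number. The slides name the features but not their exact definitions, so every definition here is an assumption: "recurring" opponents are those seen in more than one call, a "missed" out call has zero talk time, "working time" is Monday to Friday 09:00 to 18:00, and "in contact book" means the callee has the number saved. The `Call` fields are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Dict, List


@dataclass
class Call:
    opponent: str      # hashed counterpart number
    direction: str     # "in" or "out", from the profiled number's point of view
    start: datetime    # call start time
    duration: float    # talk time in seconds; 0 means the call was not answered
    in_contacts: bool  # the counterpart has the profiled number in their contact book


def _ratio(part: int, whole: int) -> float:
    """Safe division: 0.0 when there is nothing to divide by."""
    return part / whole if whole else 0.0


def extract_features(calls: List[Call]) -> Dict[str, float]:
    """Call-pattern features for one number (all definitions are assumptions)."""
    out_calls = [c for c in calls if c.direction == "out"]
    calls_per_opponent = Counter(c.opponent for c in calls)
    # Working time assumed to be Monday-Friday, 09:00-17:59.
    working = [c for c in calls if c.start.weekday() < 5 and 9 <= c.start.hour < 18]
    return {
        "ratio_out_calls": _ratio(len(out_calls), len(calls)),
        "ratio_recurring_opponents": _ratio(
            sum(1 for n in calls_per_opponent.values() if n > 1), len(calls_per_opponent)
        ),
        "ratio_missed_out_calls": _ratio(
            sum(1 for c in out_calls if c.duration == 0), len(out_calls)
        ),
        "ratio_working_time_calls": _ratio(len(working), len(calls)),
        "median_call_duration": float(median(c.duration for c in calls)) if calls else 0.0,
        "ratio_out_calls_in_contact_book": _ratio(
            sum(1 for c in out_calls if c.in_contacts), len(out_calls)
        ),
    }
```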
Analysis procedures (step 5 of 5: Classify unknown numbers using machine learning techniques)
★ Naïve method
Intuitively, we can classify an unknown number with rules such as: if
• the ratio of recurring opponents is less than 40%,
• the ratio of out calls is more than 60%, and
• the ratio of in calls is less than 20%,
then we claim the number is a spam number.
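The hand-written rule translates directly into code. The feature names are hypothetical. The in/out ratios in the usage lines come from the example numbers shown earlier (normal: 66 in and 129 out of 195 calls; spam: 15 in and 456 out of 471 calls), while the recurring-opponent ratios are made-up values, since the slides do not give them.

```python
def naive_is_spam(features: dict) -> bool:
    """Claim spam only if all three hand-picked thresholds are met."""
    return (
        features["ratio_recurring_opponents"] < 0.40
        and features["ratio_out_calls"] > 0.60
        and features["ratio_in_calls"] < 0.20
    )


if __name__ == "__main__":
    normal = {"ratio_recurring_opponents": 0.5,  # assumed value
              "ratio_out_calls": 129 / 195, "ratio_in_calls": 66 / 195}
    spam = {"ratio_recurring_opponents": 0.1,  # assumed value
            "ratio_out_calls": 456 / 471, "ratio_in_calls": 15 / 471}
    print(naive_is_spam(normal), naive_is_spam(spam))  # False True
```

The normal example already has more than 60% out calls; it escapes only because of the other two thresholds, which hints at how fragile hand-tuned cut-offs are.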
★ Problem 1
Too many features…
★ Problem 2
How to determine the rule?
★ Solution
Machine learning
Let the machine learn from the data
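★ What is machine learning?
Machine learning constructs a model from past data or experience. Learning means running this model as a program; once it has learned enough, it can make predictions (guesses). These guesses are grounded in evidence and have a high hit rate.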
★ Machine learning techniques for classification
• Support vector machine
• Logistic regression
• Decision tree
• Neural networks
• Naïve Bayes
• Nonparametric Bayesian methods
★ Support vector machine for binary classification
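The SVM slides are figures only. As a hedged illustration, here is a from-scratch linear SVM trained with the Pegasos stochastic subgradient method, using only the standard library. The talk does not say which SVM implementation, kernel, or hyperparameters were used; the linear kernel, `lam`, and `epochs` are assumptions, and in practice a library such as scikit-learn (`sklearn.svm.SVC` or `LinearSVC`) would be the usual choice.

```python
import random
from typing import List, Sequence


def _augment(x: Sequence[float]) -> List[float]:
    """Append a constant 1.0 so the bias is learned as an ordinary weight."""
    return list(x) + [1.0]


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    epochs: int = 200,
    seed: int = 0,
) -> List[float]:
    """Linear soft-margin SVM trained with Pegasos (stochastic subgradient descent).

    X: feature vectors; y: labels in {-1, +1} (e.g. +1 = spam, -1 = normal).
    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)) over augmented x_i,
    so the bias (last weight) is regularized too, a common simplification.
    """
    rng = random.Random(seed)
    data = [(_augment(x), yi) for x, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, yi in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            violates_margin = yi * _dot(w, x) < 1
            # Gradient step on the L2 penalty: shrink every weight.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Subgradient step on the hinge loss, only for margin violators.
            if violates_margin:
                w = [wi + eta * yi * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if _dot(w, _augment(x)) >= 0 else -1
```

Feature vectors would come from the call-pattern features, with +1 for spam and -1 for normal numbers. Features on very different scales (ratios vs. seconds) should be standardized first, since the penalty treats all weights alike.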
Is that enough?
★ Real-life scenario
When do we need a spam-number prediction?
Answer: at the moment a phone call reaches a whoscall user.
We want to predict whether a number is spam as EARLY as possible, to prevent further victims…
★ Real-life scenario
[Diagram: a telemarketing number (XX推銷) calls Victim 1, Victim 2, and Victim 3 in turn; axes: time and number of recent calls.]
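One way to read "number of recent calls" is: for each number, keep only its k most recent calls and classify from those, so a prediction is available after just k calls. The sketch below assumes that reading; `classify` stands in for any trained model (for example an SVM over features computed from the window), and the call representation is left generic.

```python
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, List, Optional


class RecentCallWindow:
    """Track the k most recent calls per number and classify once k calls have been seen."""

    def __init__(self, k: int, classify: Callable[[List[Any]], str]) -> None:
        self.k = k
        self.classify = classify
        # deque(maxlen=k) silently drops the oldest call when a new one arrives.
        self.windows: Dict[str, Deque[Any]] = defaultdict(lambda: deque(maxlen=k))

    def observe(self, number: str, call: Any) -> Optional[str]:
        """Record a call; return a label, or None while fewer than k calls are known."""
        window = self.windows[number]
        window.append(call)
        if len(window) < self.k:
            return None
        return self.classify(list(window))


if __name__ == "__main__":
    # Toy classifier standing in for a trained model: spam if every recent call was outgoing.
    tracker = RecentCallWindow(3, lambda calls: "spam" if all(c == "out" for c in calls) else "normal")
    for call in ["out", "out", "out", "in"]:
        print(tracker.observe("number-A", call))  # None, None, spam, normal
```

A smaller k gives earlier warnings at the cost of less evidence per decision, which is the trade-off the following accuracy plots explore.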
Let's look at how the SVM performs with different numbers of recent calls.
★ SVM for binary classification
[Figure: accuracy versus number of recent calls (3 to 10).]
Hmm… not bad, but… can it be any faster?
★ Reduce the number of features
Feature computation is time-consuming, so we want to reduce the number of features before classification.
Of course, we don't pick them by hand…
Feature selection methods:
• Regularization methods
• Backward, forward, and stepwise selection
• Bayesian feature selection
• Random forest methods
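Of the methods listed, forward selection is the simplest to sketch. The talk does not say which method was used, so this is only an illustration: `score` is a caller-supplied function, assumed to return something like cross-validated accuracy for a feature subset, and the toy scorer in the usage lines is made up.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: int,
) -> Tuple[List[str], float]:
    """Greedy forward selection: repeatedly add the feature that most improves `score`.

    Stops when no remaining feature improves the score or `max_features` is reached.
    """
    selected: List[str] = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining and len(selected) < max_features:
        # Ties in score are broken by feature name, so results are deterministic.
        cand_score, cand = max((score(frozenset(selected + [f])), f) for f in remaining)
        if cand_score <= best_score:
            break  # no remaining feature helps
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected, best_score


if __name__ == "__main__":
    # Made-up scorer: "a" helps a lot, "b" a little, and every extra feature costs 0.01.
    def toy_score(subset: FrozenSet[str]) -> float:
        return 0.5 + 0.3 * ("a" in subset) + 0.1 * ("b" in subset) - 0.01 * len(subset)

    print(forward_selection(["a", "b", "c"], toy_score, max_features=10))
```

Each round retrains and scores one model per remaining feature, so with about 30 candidate features this is on the order of hundreds of fits, which is affordable offline.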
★ Feature selection results
[Figure: accuracy versus number of features (10 to 30), with curves for 3, 5, and 10 recent calls.]
★ Selected features
• Ratio of out calls
• Rate of out calls
• Ratio of out calls in contact book
• Ratio of reciprocal opponents
• Ratio of recurring opponents
• Median call duration of in calls
• Ring duration of answered calls
• Ratio of missed calls
• Rate of new opponents
• Ratio of in calls in contact book
• and more…
★ Comparison with and without feature selection
[Figure: accuracy versus number of recent calls (3 to 10), with and without feature selection.]
Done?
"Well, aren't we just great?"
★ What is power?
Power of class A: the probability of correctly classifying a class A sample as class A.
Illustration: a gender classifier labels a sample "97.5% this is a male".
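Power of a class, as defined above, is the quantity usually called recall or true positive rate for that class. A minimal sketch, assuming labels are plain strings such as `"spam"` and `"normal"`:

```python
from typing import Hashable, Sequence


def power(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], cls: Hashable) -> float:
    """Fraction of true `cls` samples that the classifier also labels as `cls` (recall)."""
    preds_for_cls = [p for t, p in zip(y_true, y_pred) if t == cls]
    if not preds_for_cls:
        raise ValueError(f"no samples of class {cls!r}")
    return sum(p == cls for p in preds_for_cls) / len(preds_for_cls)


if __name__ == "__main__":
    y_true = ["spam", "spam", "spam", "spam", "normal"]
    y_pred = ["spam", "spam", "spam", "normal", "spam"]
    print(power(y_true, y_pred, "spam"))  # 0.75
```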
★ Power of our classifier
[Figure: power versus number of recent calls (3 to 10).]
義銘, come on, try harder, okay?
★ Marketing numbers vs. normal numbers
[Figure: accuracy versus number of recent calls (3 to 10).]
★ Fraud numbers vs. normal numbers
[Figure: accuracy versus number of recent calls (3 to 10).]
Let's mix them all together, "pissing beef balls" style (a nod to the film The God of Cookery)…
★ Power of SVM for multi-class classification
[Figure: power versus number of recent calls (3 to 10).]
★ Power of SVM for binary classification
[Figure: power versus number of recent calls (3 to 10).]
★ What is the type I error rate?
Type I error: the probability of misclassifying a class B sample as class A.
Illustration: a gender classifier labels a sample "5% this is a male".
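The type I error rate for class A can be read off confusion counts: among samples whose true class is B, the fraction predicted as A. With A = spam and B = normal, it is the share of normal numbers wrongly flagged as spam. A minimal sketch with hypothetical string labels:

```python
from collections import Counter
from typing import Hashable, Sequence


def confusion_counts(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Counter:
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


def type_i_error(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                 cls_a: Hashable, cls_b: Hashable) -> float:
    """Fraction of true class-B samples that are predicted as class A."""
    counts = confusion_counts(y_true, y_pred)
    n_b = sum(n for (true_label, _), n in counts.items() if true_label == cls_b)
    if n_b == 0:
        raise ValueError(f"no samples of class {cls_b!r}")
    return counts[(cls_b, cls_a)] / n_b  # a missing pair counts as 0


if __name__ == "__main__":
    y_true = ["normal", "normal", "normal", "normal", "spam", "spam"]
    y_pred = ["normal", "normal", "normal", "spam", "spam", "normal"]
    print(type_i_error(y_true, y_pred, cls_a="spam", cls_b="normal"))  # 0.25
```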
★ Type I error comparison
[Figure: type I error versus number of recent calls (3 to 10).]
This small success let me relax a little and go shopping. Suddenly my phone rang once, and I happily answered…
…and the caller hung up.
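★ Malicious one-ring calls
A "one-ring call" (響一聲掛斷) is a malicious call designed to lure the recipient into calling back, usually to a high-cost premium-rate number.
So we first looked at the call patterns of numbers that make one-ring calls.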
★ Call patterns of one-ring calls

| Number | Mean ringing duration (s) | Mean duration of outgoing calls (s) |
|---|---|---|
| 0982-415-XXX | 1.6 | 0 |
| 0982-420-XXX | 3.6 | 0 |
| 0982-495-XXX | 5.2 | 1.25 |
| 04-3-704-XXXX | 0.9 | 0 |
| 0923-931-XXX | 6.7 | 2.6 |
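Each row of the table is an aggregate over many calls involving one number. A minimal sketch of that aggregation, assuming a hypothetical `CallRecord` layout (the deck does not describe how call logs are stored), could look like this:

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List, Tuple


@dataclass
class CallRecord:
    # Hypothetical schema; the deck does not describe how call logs are stored.
    number: str           # the number being profiled
    direction: str        # "out" if `number` placed the call, "in" if it received it
    ring_seconds: float   # how long the call rang
    talk_seconds: float   # connected duration, 0 if never answered


def call_patterns(records: Iterable[CallRecord]) -> Dict[str, Tuple[float, float]]:
    """Map each number to (mean ringing duration, mean duration of outgoing calls)."""
    rings: Dict[str, List[float]] = defaultdict(list)
    out_talk: Dict[str, List[float]] = defaultdict(list)
    for r in records:
        rings[r.number].append(r.ring_seconds)
        if r.direction == "out":
            out_talk[r.number].append(r.talk_seconds)
    # A number with no outgoing calls gets 0.0 in the second column.
    return {n: (mean(rs), mean(out_talk[n]) if out_talk[n] else 0.0)
            for n, rs in rings.items()}


# Made-up records for a hypothetical number.
records = [
    CallRecord("0900-000-XXX", "out", 1.0, 0.0),
    CallRecord("0900-000-XXX", "out", 2.0, 0.0),
]
print(call_patterns(records))  # {'0900-000-XXX': (1.5, 0.0)}
```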
![Page 112: 資料科學在 Whoscall 產品體系中的角色](https://reader030.fdocument.pub/reader030/viewer/2022012401/54701093af795996308b4601/html5/thumbnails/112.jpg)
Feature comparison
[Figure: ratio of new opponents (0 to 0.8 scale) for Fraud, Marketing, Normal, and One-ring numbers]
![Page 113: 資料科學在 Whoscall 產品體系中的角色](https://reader030.fdocument.pub/reader030/viewer/2022012401/54701093af795996308b4601/html5/thumbnails/113.jpg)
Feature comparison
[Figure: ratio of incoming calls (0 to 0.5 scale) for Fraud, Marketing, Normal, and One-ring numbers]
![Page 114: 資料科學在 Whoscall 產品體系中的角色](https://reader030.fdocument.pub/reader030/viewer/2022012401/54701093af795996308b4601/html5/thumbnails/114.jpg)
Feature comparison
[Figure: ratio of missed calls (0 to 0.8 scale) for Fraud, Marketing, Normal, and One-ring numbers]
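The three ratio features compared above can be computed from one number's call log. The deck does not define them precisely, so the sketch below assumes simple definitions (stated in the docstring) and a hypothetical `Call` record; treat it as an illustration rather than Whoscall's feature pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Call:
    # Hypothetical per-call record from one number's log, in time order.
    opponent: str    # the other party's number
    incoming: bool   # True if the profiled number received the call
    missed: bool     # True if the call was never answered


def ratio_features(calls: List[Call]) -> Dict[str, float]:
    """Compute the three ratio features; all definitions here are assumptions.

    new_opponents: share of calls whose opponent has not appeared earlier in the log
    in_calls:      share of calls that are incoming
    missed_calls:  share of calls that were missed
    """
    if not calls:
        raise ValueError("empty call log")
    seen = set()
    new = 0
    for c in calls:
        if c.opponent not in seen:  # first time this opponent shows up
            new += 1
            seen.add(c.opponent)
    n = len(calls)
    return {
        "new_opponents": new / n,
        "in_calls": sum(c.incoming for c in calls) / n,
        "missed_calls": sum(c.missed for c in calls) / n,
    }


# Made-up log: three distinct opponents over four calls.
log = [Call("A", False, True), Call("B", False, True),
       Call("A", True, False), Call("C", False, True)]
print(ratio_features(log))
# {'new_opponents': 0.75, 'in_calls': 0.25, 'missed_calls': 0.75}
```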
![Page 116: 資料科學在 Whoscall 產品體系中的角色](https://reader030.fdocument.pub/reader030/viewer/2022012401/54701093af795996308b4601/html5/thumbnails/116.jpg)
★ Naïve method
Similarly, without machine learning we can design rules such as:
Rule 1: the mean ringing duration is less than 7 seconds, and
Rule 2: the mean duration of outgoing calls is less than 3 seconds.
If both rules hold, we classify the number as a one-ring spam call.
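The two hand-written rules translate directly into a predicate. The sketch below applies it to the values from the call-pattern table, plus the 04-2-676-XXXX row listed under Problem 3; the function name is illustrative.

```python
def is_one_ring_by_rules(mean_ring_s: float, mean_out_s: float) -> bool:
    """Naive rule-based check: both hand-picked thresholds must hold."""
    rule1 = mean_ring_s < 7.0   # Rule 1: mean ringing duration < 7 s
    rule2 = mean_out_s < 3.0    # Rule 2: mean outgoing-call duration < 3 s
    return rule1 and rule2


# (mean ringing duration, mean outgoing-call duration) in seconds, from the slides.
table = {
    "0982-415-XXX": (1.6, 0.0),
    "0982-420-XXX": (3.6, 0.0),
    "0982-495-XXX": (5.2, 1.25),
    "04-3-704-XXXX": (0.9, 0.0),
    "0923-931-XXX": (6.7, 2.6),
    "04-2-676-XXXX": (15.7, 1.4),
}
results = {number: is_one_ring_by_rules(ring, out)
           for number, (ring, out) in table.items()}
print(results)
```

The first five numbers pass both rules, while 04-2-676-XXXX fails Rule 1 because its mean ringing duration is 15.7 s. Fixed thresholds like these have to be revisited by hand whenever such a new observation appears.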
![Page 117: 資料科學在 Whoscall 產品體系中的角色](https://reader030.fdocument.pub/reader030/viewer/2022012401/54701093af795996308b4601/html5/thumbnails/117.jpg)
★ Problems
1. Too many features…
2. How to determine the rule?
3. New observations.
![Page 119: 資料科學在 Whoscall 產品體系中的角色](https://reader030.fdocument.pub/reader030/viewer/2022012401/54701093af795996308b4601/html5/thumbnails/119.jpg)
★ Problem 3: new observations

| Number | Mean ringing duration (s) | Mean duration of outgoing calls (s) |
|---|---|---|
| 0982-415-XXX | 1.6 | 0 |
| 0982-420-XXX | 3.6 | 0 |
| 0982-495-XXX | 5.2 | 1.25 |
| 04-3-704-XXXX | 0.9 | 0 |
| 0923-931-XXX | 6.7 | 2.6 |
| 04-2-676-XXXX (new observation) | 15.7 (S.D. = 10.7) | 1.4 |
![Page 121: 資料科學在 Whoscall 產品體系中的角色](https://reader030.fdocument.pub/reader030/viewer/2022012401/54701093af795996308b4601/html5/thumbnails/121.jpg)
Machine learning can efficiently “learn” from new data and create rules for us.
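As a toy illustration of letting data set a rule instead of hand-picking it, the sketch below learns a single threshold on one feature (a decision stump) by minimizing training errors, then retrains when a new labeled observation arrives. This is a deliberately simple stand-in chosen for clarity, not the deck's model (the following slides use an SVM), and the training values are made up.

```python
from typing import List, Tuple


def learn_threshold(values: List[float], labels: List[bool]) -> Tuple[float, int]:
    """Learn the rule "positive if value < t" with the fewest training errors.

    Candidate thresholds: one below the minimum, the midpoints between
    consecutive distinct values, and one above the maximum.
    Returns (t, number of training errors); ties keep the smallest t.
    """
    if not values or len(values) != len(labels):
        raise ValueError("need non-empty inputs of equal length")
    xs = sorted(set(values))
    candidates = ([xs[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
                  + [xs[-1] + 1.0])
    best_t, best_err = candidates[0], len(values) + 1
    for t in candidates:
        errors = sum((v < t) != y for v, y in zip(values, labels))
        if errors < best_err:
            best_t, best_err = t, errors
    return best_t, best_err


# Made-up mean ringing durations (seconds); True = labeled one-ring.
ring = [1.0, 2.0, 5.0, 6.0, 20.0, 30.0]
is_one_ring = [True, True, True, True, False, False]
print(learn_threshold(ring, is_one_ring))  # (13.0, 0)

# A new labeled observation arrives; retraining moves the threshold.
print(learn_threshold(ring + [16.0], is_one_ring + [True]))  # (18.0, 0)
```

Real classifiers combine many features at once, which is why a multi-feature model such as an SVM is preferable to tuning one threshold per feature.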
![Page 122: 資料科學在 Whoscall 產品體系中的角色](https://reader030.fdocument.pub/reader030/viewer/2022012401/54701093af795996308b4601/html5/thumbnails/122.jpg)
★ Power of SVM for multi-class classification
[Figure: power (0.8 to 1.0) vs. number of recent calls (3 to 10)]
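For a multi-class classifier, power can be reported per class. Assuming "power" carries its usual statistical meaning (1 minus the type II error rate, i.e. the share of true class-k samples that are predicted as k), a minimal sketch with hypothetical labels is:

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_power(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """For each true class k, the share of its samples predicted as k (1 - type II error)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)                                   # samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct detections
    return {k: hits[k] / n for k, n in totals.items()}


# Hypothetical labels for 8 numbers.
y_true = ["fraud", "fraud", "marketing", "marketing",
          "normal", "normal", "normal", "normal"]
y_pred = ["fraud", "normal", "marketing", "marketing",
          "normal", "normal", "normal", "fraud"]
print(per_class_power(y_true, y_pred))
# {'fraud': 0.5, 'marketing': 1.0, 'normal': 0.75}
```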
![Page 123: 資料科學在 Whoscall 產品體系中的角色](https://reader030.fdocument.pub/reader030/viewer/2022012401/54701093af795996308b4601/html5/thumbnails/123.jpg)
★ Accuracy comparison
[Figure: type I error (0 to 0.3) vs. number of recent calls (3 to 10)]
![Page 124: 資料科學在 Whoscall 產品體系中的角色](https://reader030.fdocument.pub/reader030/viewer/2022012401/54701093af795996308b4601/html5/thumbnails/124.jpg)
Deployment
All the algorithms have been implemented in the whoscall app. So how does it work?
![Page 125: 資料科學在 Whoscall 產品體系中的角色](https://reader030.fdocument.pub/reader030/viewer/2022012401/54701093af795996308b4601/html5/thumbnails/125.jpg)
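[Diagram: an incoming call from 0984-003-XXX ("OO telemarketing") to the user 小明 (Xiaoming); the number is sent to the data center, where the classifier is calculating…]
Returned: "This number may be a telemarketing call."
Time required: 50-100 milliseconds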
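The round trip in the diagram (the app sends the incoming number to the data center, the classifier evaluates it, and a label comes back) can be sketched as a small lookup-and-classify handler. Everything here is hypothetical: the precomputed `FEATURE_STORE`, the `toy_classifier` stand-in, and the message strings are illustrations, not Whoscall's actual service.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Features = Dict[str, float]

# Hypothetical feature store, assumed to be refreshed offline so that a request
# needs only a dictionary lookup plus one model evaluation.
FEATURE_STORE: Dict[str, Features] = {
    "0984-003-XXX": {"ratio_new_opponents": 0.9, "ratio_in_calls": 0.05},  # made up
}

MESSAGES = {
    "telemarketing": "This number may be a telemarketing call",
    "normal": "No warning",
}


def handle_lookup(number: str,
                  classify: Callable[[Features], str]) -> Tuple[Optional[str], float]:
    """Return (message, or None for an unknown number; server-side time in ms)."""
    start = time.perf_counter()
    features = FEATURE_STORE.get(number)
    message = MESSAGES[classify(features)] if features is not None else None
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return message, elapsed_ms


def toy_classifier(features: Features) -> str:
    # Stand-in for the trained model: a made-up rule, not a real decision boundary.
    return "telemarketing" if features["ratio_new_opponents"] > 0.5 else "normal"


message, elapsed_ms = handle_lookup("0984-003-XXX", toy_classifier)
print(message)  # This number may be a telemarketing call
```

The measured time covers only the server-side lookup and classification; the 50-100 ms on the slide would also include the network round trip between the app and the data center.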
![Page 126: 資料科學在 Whoscall 產品體系中的角色](https://reader030.fdocument.pub/reader030/viewer/2022012401/54701093af795996308b4601/html5/thumbnails/126.jpg)
What’s next?
![Page 127: 資料科學在 Whoscall 產品體系中的角色](https://reader030.fdocument.pub/reader030/viewer/2022012401/54701093af795996308b4601/html5/thumbnails/127.jpg)
★ Improvements of the classification model
1. Fraud-number analysis
2. Fuzzy classification algorithm
3. Spam-category scores
4. Cooperation with more reliable outside sources
5. Generalization to other countries
And much more…
![Page 128: 資料科學在 Whoscall 產品體系中的角色](https://reader030.fdocument.pub/reader030/viewer/2022012401/54701093af795996308b4601/html5/thumbnails/128.jpg)
★ Future perspectives
1. User tag-correction mechanisms
2. Personalized penalty settings
3. Anti-countermeasures
4. Extension to SMS spam detection
5. Clustering vs. user tags
6. From spam detection to scam detection
![Page 129: 資料科學在 Whoscall 產品體系中的角色](https://reader030.fdocument.pub/reader030/viewer/2022012401/54701093af795996308b4601/html5/thumbnails/129.jpg)
Creating a contact network of trust
Thank you all for your valuable time.