Big Data Analysis Using Kinesis / Lambda / EMR / Redshift - 이상현 (Vingle)
Big Data Analysis With API Gateway / Lambda
이상현 (Kurt Lee)
Vingle Inc, https://www.vingle.net
iOS / Frontend / Backend Technical Lead, [email protected], https://github.com/breath103
1. We collect a really large amount of data..

Impression event:

    {
      content: {
        type: 'post',
        id: 12345,
        position_x: 2,
        position_y: 2
      },
      referral: {
        category: 'newsfeed',
        area: 'newsfeed'
      },
      action: {
        type: 'impression'
      }
    }

Read event:

    {
      content: {
        type: 'post',
        id: 12345
      },
      referral: {
        category: 'newsfeed',
        area: 'newsfeed',
        resource_id: '12345'
      },
      action: {
        type: 'read'
      }
    }

Read event for a web page, with duration:

    {
      content: {
        type: 'webpage',
        id: 'http://www.rog..'
      },
      referral: {
        category: 'card_show',
        area: 'card_show',
        resource_id: '12345'
      },
      action: {
        type: 'read',
        duration: 5.6
      }
    }
30,000 records per minute
24,000,000 bytes per minute
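
To put the two figures above in perspective, here is a small back-of-the-envelope calculation. Only the two per-minute numbers come from the slide; the derived values are plain arithmetic and assume the rate is constant over the day, which the slides do not claim.

```javascript
// Figures taken from the slide.
const recordsPerMinute = 30000;
const bytesPerMinute = 24000000;

// Derived values (simple division and multiplication, not measurements).
const avgBytesPerRecord = bytesPerMinute / recordsPerMinute;
const recordsPerSecond = recordsPerMinute / 60;
const bytesPerDay = bytesPerMinute * 60 * 24; // assumes a constant rate all day

console.log(`${avgBytesPerRecord} bytes per record on average`);
console.log(`${recordsPerSecond} records per second`);
console.log(`${(bytesPerDay / 1e9).toFixed(2)} GB per day`);
```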
Architecture 1
Results (example)
Requirements for the new architecture
Technically:
1) Secure scalability
2) Do not affect the main web server
3) Allow simple data validation / formatting before data is written to S3
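
As an illustration of requirement 3, here is a minimal sketch of what "simple validation / formatting" could look like for events shaped like the examples earlier in the deck. The function names, the specific rules, and the `received_at` field are assumptions made for this example; the slides do not show the actual checks.

```javascript
// Hypothetical validator for tracking events with the three sections seen
// in the example payloads: content, referral and action.
const REQUIRED_SECTIONS = ['content', 'referral', 'action'];

// Returns a list of problems; an empty list means the event is acceptable.
function validateEvent(event) {
  if (event === null || typeof event !== 'object') {
    return ['event must be an object'];
  }
  const errors = [];
  for (const section of REQUIRED_SECTIONS) {
    if (event[section] === null || typeof event[section] !== 'object') {
      errors.push(`missing section: ${section}`);
    }
  }
  if (errors.length === 0) {
    if (typeof event.content.type !== 'string') {
      errors.push('content.type must be a string');
    }
    if (typeof event.action.type !== 'string') {
      errors.push('action.type must be a string');
    }
  }
  return errors;
}

// Formats an event as one newline-terminated JSON line.
// `received_at` is a hypothetical field added for illustration.
function formatEvent(event, receivedAt) {
  const enriched = { ...event, received_at: receivedAt.toISOString() };
  return `${JSON.stringify(enriched)}\n`;
}
```

Rejecting malformed events at this stage keeps bad rows out of S3, so later loading steps do not have to deal with them.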
From a BI standpoint:
1) Loading all of the data into Redshift is pointless. Data aggregated by date, by cohort, and by user group is far more important.
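
To make "aggregated data" concrete, the sketch below counts events per day and action type in memory. It is only meant to show the shape of the output; it is not how the deck's pipeline performs aggregation. The `timestamp` field is an assumption, since the example events in the slides do not show one.

```javascript
// Hypothetical pre-aggregation: count events per (day, action type).
// Each event is assumed to carry an ISO 8601 `timestamp` string.
function aggregateByDayAndAction(events) {
  const counts = new Map();
  for (const e of events) {
    const day = e.timestamp.slice(0, 10); // 'YYYY-MM-DD'
    const key = `${day}|${e.action.type}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Map keeps insertion order, so rows come out in first-seen order.
  return [...counts].map(([key, count]) => {
    const [day, actionType] = key.split('|');
    return { day, actionType, count };
  });
}
```

A table of such rows is far smaller than the raw event stream, which is the point the slide makes about what belongs in Redshift.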
Architecture 2
Lambda Code

    import AWS from 'aws-sdk';

    // HTTPHeader, TicketFormatter, User and streamName are defined elsewhere
    // in the project and are not shown on the slide.
    export default function (event, context, callback) {
      const httpHeader = new HTTPHeader(event.headers);
      const formattedTickets = TicketFormatter.format(event.body.data);

      // Look up the user for the request token before writing any records.
      User.fetchUserByToken(httpHeader.token)
        .then((user) => {
          const firehoseClient = new AWS.Firehose({
            region: 'us-east-1',
            httpOptions: { timeout: 5000 },
          });
          firehoseClient.putRecordBatch({
            DeliveryStreamName: streamName,
            // One newline-terminated JSON document per record.
            Records: formattedTickets.map((ticket) => ({
              Data: `${JSON.stringify(ticket)}\n`,
            })),
          }, (err, data) => {
            if (err) {
              callback(err);
              return;
            }
            // Node.js callback convention: error first, result second.
            callback(null, {
              failedPutCount: data.FailedPutCount,
              succeedPutCount: data.RequestResponses.length - data.FailedPutCount,
            });
          });
        })
        .catch(callback); // report lookup failures instead of hanging
    }
Pricing
=> Surprisingly cheap…
Monitoring
With a CloudWatch Dashboard you can see Lambda Invocations, Errors, and Duration at a glance.
With CloudWatch Alarms, real-time monitoring through Slack is also possible.
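
The slides do not show how the alarms reach Slack. One common wiring, assumed here, is CloudWatch Alarm -> SNS topic -> a small Lambda that posts to a Slack incoming webhook. The sketch below covers only the message conversion step as a pure function; the function name is hypothetical and the HTTP POST to the webhook URL is left out.

```javascript
// Converts an SNS event carrying a CloudWatch alarm notification into the
// JSON body of a Slack incoming-webhook request. No network access here.
function alarmToSlackPayload(snsEvent) {
  // SNS delivers the alarm as a JSON string in Records[0].Sns.Message.
  const alarm = JSON.parse(snsEvent.Records[0].Sns.Message);
  return {
    text: `[${alarm.NewStateValue}] ${alarm.AlarmName}: ${alarm.NewStateReason}`,
  };
}
```

The returned object can be serialized with `JSON.stringify` and sent as the body of a POST to the webhook URL.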
Deployment
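Process (for Node.js + Babel ES6):
1. Babel compile
2. npm install --production
3. Zip
4. Upload to Lambda
5. Done ^^
=> Easy and simple. Deployment is also fairly fast.
   Changes take effect within 1-2 minutes of upload.
   If you have worked with the AWS SDK a little, automating it is simple too.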
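
Step 4 of the process above can be sketched as a call to `updateFunctionCode` in the AWS SDK for JavaScript (v2). This is an illustration under assumptions, not the author's deploy script: the function name `deployZip` is hypothetical, and the Lambda client is passed in as a parameter so the sketch can be run against a stub.

```javascript
// Uploads a zip archive as new function code and publishes a new version.
// `lambda` is expected to be an AWS SDK v2 Lambda client, for example
// `new AWS.Lambda({ region: 'us-east-1' })`.
function deployZip(lambda, functionName, zipBuffer) {
  return lambda
    .updateFunctionCode({
      FunctionName: functionName,
      ZipFile: zipBuffer, // contents of the zip produced in step 3
      Publish: true,      // publish a numbered version with each upload
    })
    .promise()
    .then((res) => res.Version); // the newly published version number
}
```

In a build script, `zipBuffer` would typically come from `fs.readFileSync` on the archive created in step 3.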
Deployment
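Build master automatically, upload a new version to Lambda on every build, and record that version as a git tag.
Rollback
-> If you use an Alias, you only need to change the version the Alias points to.
   This also takes effect immediately.
-> The AWS SDK supports this as an API call, so it can be automated.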
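
The alias-based rollback described above maps to the `updateAlias` call in the AWS SDK for JavaScript (v2). A minimal sketch follows; the function name `pointAliasAt` is hypothetical, and the client is again passed in so the code can be tried with a stub.

```javascript
// Points an existing alias at a specific published version.
// Rolling back means calling this with an earlier version number.
// `lambda` is expected to be an AWS SDK v2 Lambda client.
function pointAliasAt(lambda, functionName, aliasName, version) {
  return lambda
    .updateAlias({
      FunctionName: functionName,
      Name: aliasName,
      FunctionVersion: String(version), // the API expects a string
    })
    .promise()
    .then((res) => res.FunctionVersion);
}
```

For example, `pointAliasAt(lambda, 'my-function', 'production', 41)` would move a hypothetical `production` alias back to version 41.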
TIP-1
1. A Lambda container has a lifecycle (although it is not clearly documented)
ex) When you need a DB connection, do not create one on every invocation.
    Create it as a container-level global variable and reuse it.
    (Duration is part of Lambda pricing)
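
The tip above can be sketched as follows. The connection is created lazily at module scope and reused by later invocations that land in the same container. `openConnection` is a hypothetical stand-in for a real database driver, and `connectionsOpened` exists only to make the reuse visible.

```javascript
// Module scope: these values survive between invocations for as long as
// the container is reused.
let connectionPromise = null;
let connectionsOpened = 0;

// Hypothetical stand-in for a real driver's connect call.
function openConnection() {
  connectionsOpened += 1;
  return Promise.resolve({
    query: (sql) => Promise.resolve(`result of ${sql}`),
  });
}

// Opens the connection on first use, then returns the cached one.
function getConnection() {
  if (connectionPromise === null) {
    connectionPromise = openConnection();
  }
  return connectionPromise;
}

function handler(event, context, callback) {
  getConnection()
    .then((conn) => conn.query('SELECT 1'))
    .then(
      (result) => callback(null, result),
      (err) => callback(err)
    );
}
```

With a real driver that keeps sockets open, the callback-style handler may also need `context.callbackWaitsForEmptyEventLoop = false`, otherwise the invocation can wait for the open connection before returning.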
TIP-2
1. Do not hard-code environment variables in the Lambda code; put them in API Gateway and on to Lambda…
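Summary
1. Lambda is good. Especially for microservices
2. Deployment / pricing / scalability / monitorability are all better than with the usual server provisioning tools (OpsWorks / ECS / Elastic Beanstalk)
3. However, you need to understand its characteristics properly before using it. More of them go unexplained in the documentation than you would expect
4. The same goes for API Gateway. It is good, but many of its features / characteristics are not explained in the documentation
5. Making active use of automation with aws-sdk is recommended
6. The Amazon resources involved are so varied and numerous that it is hard to see them all at a glance. I hope AWS will build a single portal for this…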