To Have Own Data Analytics Platform, Or NOT To

Aoyama Engineers' Study Meetup (青山エンジニア勉強交流会), April 24, 2017

Satoshi Tagomori (@tagomoris)

Satoshi "Moris" Tagomori (@tagomoris)

Fluentd, MessagePack-Ruby, Norikra, ...

Treasure Data, Inc.

http://tsuchinoko.dmmlabs.com/?p=1770

On Feb 23, 2015

• To Have Own Data Analytics Platform, Or NOT To

In Startup Companies:

• "NOT To, in general"

• Data analytics services:
  • AWS EMR, Redshift
  • Google BigQuery
  • Treasure Data

Options In 2017

• On Premise
  • Cloudera CDH, Hortonworks HDP, ...

• Services
  • AWS EMR, Redshift, Athena, Kinesis Analytics, ...
  • Google BigQuery, Cloud Dataflow, Cloud Dataproc, ...
  • MS Azure SQL Data Warehouse, Stream Analytics, Data Lake Analytics, ...
  • Treasure Data

TO HAVE OR NOT TO HAVE?

DO NOT

Anyway,

NO FINE CONCLUSION IN THIS PRESENTATION

On Premise Platform In Past

• 2011-2014: On-premise Hadoop & Presto cluster
  • w/ Fluentd stream processing cluster
  • w/ Norikra stream processing
  • w/ Web UI (Shib)

https://www.slideshare.net/tagomoris/lambda-architecture-using-sql-hadoopcon-2014-taiwan

To Be Considered

• Distributed Processing Platform

• Data Management

• Process Management

• Platform Management

• Visualization and BI

• Connecting Data

Distributed Processing Platform

• Hadoop, Presto, Spark, Flink, Storm, ...
  • + Servers

• EMR, Redshift, Dataproc, ...
  • Cost per instance

• BigQuery, Athena, Treasure Data, ...
  • Cost per data/queries/...
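
The difference between paying per instance and paying per data/queries can be made concrete with a rough break-even calculation. The following is only a sketch: every price in it is a made-up placeholder, and the function names are hypothetical, not taken from any vendor's price list or API.

```python
# All prices used here are hypothetical placeholders, not real vendor pricing.

HOURS_PER_MONTH = 730  # average number of hours in a month


def instance_cost(nodes: int, price_per_node_hour: float) -> float:
    """Monthly cost when paying per running instance."""
    return nodes * price_per_node_hour * HOURS_PER_MONTH


def scan_cost(tb_scanned: float, price_per_tb: float) -> float:
    """Monthly cost when paying per terabyte scanned by queries."""
    return tb_scanned * price_per_tb


def break_even_tb(nodes: int, price_per_node_hour: float, price_per_tb: float) -> float:
    """Monthly scan volume at which both pricing models cost the same."""
    return instance_cost(nodes, price_per_node_hour) / price_per_tb


if __name__ == "__main__":
    # Assumed example: 10 nodes at $0.50/hour vs. $5.00 per TB scanned.
    print(f"cluster: ${instance_cost(10, 0.5):.2f} per month")
    print(f"break-even: {break_even_tb(10, 0.5, 5.0):.1f} TB scanned per month")
```

Below the break-even volume the per-query model is cheaper under these assumptions; above it, the always-on cluster is. Real comparisons also need storage, operations, and staffing costs, which this sketch ignores.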

Data Management

• How to collect data?

• How to ingest data?

• How to manage schema?

• How to move data from here to there?

Process Management

• How to run queries on schedule?

• How to build workflow between queries?

• How to run queries after data ingestion?

• How to move data from the platform to elsewhere after queries?
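
One way to picture "workflow between queries" is as a dependency graph that is run in topological order. The sketch below uses Python's standard `graphlib` module; the task names (`ingest`, `daily_summary`, `export`) and the `run_workflow` helper are hypothetical, and each task is a stand-in for submitting a real query or export job.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, deps):
    """Run every task after all of its dependencies; return the run order.

    tasks: dict of task name -> zero-argument callable
    deps:  dict of task name -> set of task names it depends on
    """
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        tasks[name]()
    return order


if __name__ == "__main__":
    log = []
    # Hypothetical tasks; a real one would submit a query and wait for it.
    tasks = {
        "ingest": lambda: log.append("ingested data"),
        "daily_summary": lambda: log.append("ran summary query"),
        "export": lambda: log.append("exported results"),
    }
    deps = {
        "ingest": set(),
        "daily_summary": {"ingest"},
        "export": {"daily_summary"},
    }
    print(run_workflow(tasks, deps))
    print(log)
```

A production scheduler also has to handle retries, timeouts, and partial failures, which is where most of the operational cost of process management comes from.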

Platform Management

• How to upgrade software?

• How to add nodes?

• How to manage failures / downtime?

• How to replace hardware?

• How to switch platforms?

• How to provide compatibility for queries?

Visualization and BI

• How to show query results graphically?

• How to show relations between data graphically?

• How to query data interactively?

Connecting Data

• How to join logs and master data?

• How to join logs and user list?

• How to join logs and CRM data?

• How to push query results to marketing tools/services?

• How to send notifications using query results?
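
Joining logs with a user list usually comes down to a SQL join once both datasets live in the same place. A minimal sketch using an in-memory SQLite database as a stand-in for the analytics platform; the table names, columns, and the `join_logs_with_users` helper are hypothetical.

```python
import sqlite3


def join_logs_with_users(logs, users):
    """Count log records per user plan by joining logs with a user list.

    logs:  iterable of (user_id, path) tuples
    users: iterable of (user_id, plan) tuples
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE access_log (user_id INTEGER, path TEXT)")
    conn.execute("CREATE TABLE users (user_id INTEGER, plan TEXT)")
    conn.executemany("INSERT INTO access_log VALUES (?, ?)", logs)
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    rows = conn.execute(
        "SELECT u.plan, COUNT(*) FROM access_log l "
        "JOIN users u ON l.user_id = u.user_id "
        "GROUP BY u.plan ORDER BY u.plan"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    logs = [(1, "/"), (2, "/buy"), (1, "/buy")]
    users = [(1, "free"), (2, "paid")]
    print(join_logs_with_users(logs, users))
```

The query itself is the easy part; the hard part the questions above point at is getting the user list or CRM data into the platform, and the results back out, on a regular basis.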

Additional Topics

• Stream Processing Platform

• Machine Learning Platform

• AI(?) Services

In My Past Case:

• Distributed Processing Platform
  • Hadoop & Presto (& Norikra)

• Data Management
  • Hive schema & custom-made UI (Shib)
  • Managed by engineers of each service

• Process Management
  • Custom-made query scheduler (ShibUI)

• Platform Management
  • By tagomoris

• Visualization, BI: N/A

• Connecting Data: N/A

About Treasure Data

• Distributed Processing Platform: Hive, Presto

• Data Management: Fluentd & Schema-less DB

• Process Management: Digdag / Treasure Workflow

• Platform Management: Automatic

• Visualization and BI: Treasure BI

• Connecting Data: Embulk / Data Connector

Recent Improvements around Data Analytics

• Improvements of CDH/HDP to manage clusters
  • Online Upgrade
  • Support many processing frameworks

• Many new data processing software/frameworks
  • Apache Flink, Apache Arrow, Apache Beam, ...

• Many new services available
  • Stream processing, Machine learning, ...

• Saving money is important - it's true.

• Saving money introduces many issues - it's true!

• Money solves many problems - is it true?

Complexity

• Connecting data / processing with applications

• Connecting data / processing with services

• Connecting data / processing with people

Chasing the World

• Many new software / services / platforms / paradigms, day by day

• Data sizes are growing day by day

• Complexity is growing day by day

• A data platform CANNOT live as-is for 5 years!

Finding Treasure From Data

• "Data Processing" is:
  • NOT the purpose
  • just a tool to get something great

• Use developers and their time to find treasures!

Thank you! @tagomoris
