Building Data-Centric Businesses

Posted on 10-Jan-2017

Transcript of Building Data-Centric Businesses

Daniel Aragao (@dear_dr_dan) & Simon Hope (@mapbutcher)

REALESTATE.COM.AU

6B Market Cap

11M Australian Properties

55M Visits in September

4.7M App Downloads …and counting

3,500 PEOPLE

13 COUNTRIES

34 OFFICES

TECHNOLOGY & SOCIAL JUSTICE

• In the beginning…

• Organising our Data

• Implementation approaches

• Hipster Batches

• Reactify

• Bring Your Own Data

• Finding the Data

• What we have learned so far

THIS IS WHAT THE STORY IS ABOUT

SORRY… IT’S OK TO LEAVE NOW

• Nope, we didn’t create a new Hadoop

• No hardcore Data Science

• There are some implementation details

• REA embraced the cloud: AWS everywhere

• Under construction

IN THE BEGINNING…

ORGANISING OUR DATA

Increasingly, content is being distributed through search and social platforms... There’s less visiting of publishers as destinations.

Jeff Weiner, CEO, LinkedIn

Data sources

Data warehouse

PROBLEM…

STRATEGY…

Data Warehouse

Staging, SSIS, Dim, Fact

PROBLEM…

Star schema: leaky details

No Data Warehouse

Staging, SSIS, Dim, Fact

STRATEGY…

Data Warehouse Facade

Staging, SSIS, Dim, Fact

???

WHAT’S IN THE BOX?

Good things come in small packages, er, services

THE HIPSTER BATCH

???

Hipster Batch

THE HIPSTER BATCH

• Small and short lived

• Decoupled via flat files on S3

• Single purpose

• Idempotent

• Polyglot

• Minimal runtime dependencies

• Discoverable
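To make "small, single purpose, idempotent, decoupled via flat files" concrete, here is a minimal sketch of one such job. It assumes a hypothetical in-memory store standing in for an S3 bucket and a hypothetical task (keep listing rows whose `status` column is `active`); neither comes from the talk. Idempotence is obtained by deriving the output key from a hash of the input file, so re-running the job on the same input is a no-op.

```python
import csv
import hashlib
import io
from typing import Dict, Optional


class InMemoryStore:
    """Hypothetical stand-in for an S3 bucket: maps object keys to bytes."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def put(self, key: str, body: bytes) -> None:
        self.objects[key] = body

    def exists(self, key: str) -> bool:
        return key in self.objects


def run_batch(store: InMemoryStore, input_key: str) -> Optional[str]:
    """Single-purpose job (hypothetical): keep rows whose 'status' is 'active'.

    Returns the output key it wrote, or None if that output already existed.
    """
    body = store.get(input_key)
    # Same input content -> same output key, so re-runs are harmless.
    digest = hashlib.sha256(body).hexdigest()[:12]
    output_key = f"active-listings/{digest}.csv"
    if store.exists(output_key):
        return None  # already processed: nothing to do

    reader = csv.DictReader(io.StringIO(body.decode("utf-8")))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row["status"] == "active":
            writer.writerow(row)
    # The flat file is the only contract with downstream jobs.
    store.put(output_key, out.getvalue().encode("utf-8"))
    return output_key
```

Keying the output by content rather than by run time means a retried or duplicated trigger cannot produce a second, conflicting file.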

A ‘TYPICAL’ IMPLEMENTATION: Hipster Batch

• Data

• SNS, SQS

• ASG, ECS, Lambda

• KMS

• Logs

• CloudWatch

• S3 buckets
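The slide lists the AWS pieces but not how they are wired. One common arrangement, assumed here rather than taken from the talk, is S3 publishing "object created" notifications to an SNS topic, SNS fanning out to an SQS queue per batch, and a Lambda function consuming the queue. A minimal sketch of unwrapping those nested messages (SQS record, then SNS envelope with raw message delivery off, then S3 event) might look like this; `handler` is a hypothetical entry point.

```python
import json
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def s3_objects_from_sqs_event(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for objects created, from an SQS-triggered event.

    Assumes each SQS message body is an SNS envelope whose "Message" field
    holds an S3 event notification.
    """
    objects = []
    for record in event.get("Records", []):
        envelope = json.loads(record["body"])       # SNS envelope
        s3_event = json.loads(envelope["Message"])  # S3 notification
        # S3 test events have no "Records", so they are skipped here.
        for s3_record in s3_event.get("Records", []):
            if not s3_record.get("eventName", "").startswith("ObjectCreated"):
                continue
            bucket = s3_record["s3"]["bucket"]["name"]
            # Keys arrive URL-encoded (spaces become '+').
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects


def handler(event, context):
    """Hypothetical Lambda entry point; a real job would fetch and process each object."""
    for bucket, key in s3_objects_from_sqs_event(event):
        print(f"would process s3://{bucket}/{key}")
```

Putting SQS between SNS and the job gives each batch its own buffer and retry behaviour, which fits the "decoupled" property above.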

Hipster Batch

HIPSTER BATCH DOES SCIENCE

• Behavioural models for targeted marketing

• Recommendation engine

• External channels

SCIENCE!

• Hipster Batch x 20

• Stats models

• API

• Google Now API

From legacy to reactive

REACTIFY

Reactify

???

Reactify

http://www.reactivemanifesto.org

REACTIFY

• Manage data flow with messages

• Protect consumers and care about isolation

• Resilience is important, and data replication is just fine

• Demand is elastic, and your components should be too

Reactify

Listings

Data coupling

No resilience or elasticity

Coupling

PROBLEM…

Reactify

Listings, Reactify, Hipster Batch

• Shielded consumers

• Isolation

• Decoupled

SOLUTION…

Reactify

• Listings

• REST API

• Dynamo

• Event Maker

• Event Differ

• Kinesis

IMPLEMENTATION…

• Exposes current state only

• Stream of change notifications

• Hypertext Application Language (HAL)

• Clear entity types

• Linking over embedding

• Cacheable and discoverable

REST API
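As an illustration of "HAL, linking over embedding, cacheable", here is a minimal sketch of how a listing resource could be rendered. The host and `/combined-listings/{id}` path come from the slides; the `/agencies/{id}` link, the state fields, and the 60-second `Cache-Control` window are hypothetical. Related entities appear only as `_links`, and a content-hash `ETag` lets caches revalidate cheaply.

```python
import hashlib
import json
from typing import Any, Dict, Tuple

BASE_URL = "https://feeds.listings.realestate.com.au"  # host shown on the slides


def listing_resource(listing_id: str, state: Dict[str, Any], agency_id: str) -> Dict[str, Any]:
    """Render the current state of a listing as a HAL document.

    The agency is linked, not embedded, so each resource stays small.
    The /agencies path is hypothetical.
    """
    return {
        "_links": {
            "self": {"href": f"{BASE_URL}/combined-listings/{listing_id}"},
            "agency": {"href": f"{BASE_URL}/agencies/{agency_id}"},
        },
        **state,
    }


def to_response(resource: Dict[str, Any]) -> Tuple[Dict[str, str], bytes]:
    """Serialise a resource with caching headers (values are assumptions)."""
    body = json.dumps(resource, sort_keys=True).encode("utf-8")
    headers = {
        "Content-Type": "application/hal+json",
        # Identical state -> identical ETag, so conditional GETs can return 304.
        "ETag": '"%s"' % hashlib.sha256(body).hexdigest()[:16],
        "Cache-Control": "max-age=60",  # hypothetical freshness window
    }
    return headers, body
```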

REACTIFY REST API

REST API

https://feeds.listings.realestate.com.au/combined-listings/120449689

REST API

Event Maker

https://feeds.listings.realestate.com.au/combined-listings/-/changes
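A consumer of the `/-/changes` feed has to walk it page by page. The sketch below assumes a page format that the talk does not show: each page is HAL with an `item` link (or array of links) per changed listing and a `next` link to the following page. `fetch` is any function returning the parsed JSON for a URL, which keeps the sketch free of HTTP details.

```python
from typing import Any, Callable, Dict, Iterator, Optional, Tuple

Fetch = Callable[[str], Dict[str, Any]]


def changed_listing_urls(fetch: Fetch, start_url: str) -> Iterator[Tuple[str, str]]:
    """Yield (page_url, listing_url) for each change, following HAL 'next' links.

    The 'item'/'next' link relations are an assumed page shape, not the
    documented format of the feed. Yielding the page URL lets a caller
    checkpoint where it got to.
    """
    url: Optional[str] = start_url
    while url:
        page = fetch(url)
        links = page.get("_links", {})
        items = links.get("item", [])
        # HAL allows either a single link object or an array of them.
        if isinstance(items, dict):
            items = [items]
        for item in items:
            yield url, item["href"]
        next_link = links.get("next")
        url = next_link["href"] if next_link else None
```

Each yielded listing URL can then be fetched from the REST API, so the feed only needs to say *what* changed, not *how*.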

Reactify

Event Differ
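The Event Differ's job can be sketched as: remember the last state seen for each listing, compare it with the current state, and emit an event only when something actually changed. In this minimal sketch a plain dict stands in for the snapshot store (the slides show Dynamo in the architecture), and the event shape (`type`, `changes`, `id`) is hypothetical; publishing to Kinesis is left out.

```python
from typing import Any, Dict, Optional


def diff_listing(previous: Optional[Dict[str, Any]], current: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return a field-level change event, or None if nothing changed."""
    if previous is None:
        return {"type": "created", "changes": {k: [None, v] for k, v in current.items()}}
    changes = {}
    for field in sorted(set(previous) | set(current)):
        old, new = previous.get(field), current.get(field)
        if old != new:
            changes[field] = [old, new]  # [before, after]
    if not changes:
        return None
    return {"type": "updated", "changes": changes}


class EventDiffer:
    """Keeps last-seen snapshots (a dict here, standing in for a table)."""

    def __init__(self) -> None:
        self.snapshots: Dict[str, Dict[str, Any]] = {}

    def process(self, listing_id: str, current: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        event = diff_listing(self.snapshots.get(listing_id), current)
        if event is not None:
            event["id"] = listing_id
            # Only advance the snapshot when an event is emitted.
            self.snapshots[listing_id] = dict(current)
        return event
```

Suppressing no-op updates here is what shields downstream consumers from noise in the source system.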

The octopus in the box

— Did you use that data set? — Errr… No, we have another one

BRING YOUR OWN DATA

BRING YOUR OWN DATA - BYOD

• Allow data to flow freely

• Help the business to get what they need when they need it

• Self-service

BYOD

• CSV (x 5)

• Smarts on datatypes

• Tableau Server

• Audit, auth, share…
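"Smarts on datatypes" suggests inferring column types from an uploaded CSV instead of asking the user. The talk does not describe the rules used; this minimal sketch assumes a simple heuristic: ignore empty cells, then pick the first of integer, decimal, or ISO date that every remaining value parses as, falling back to string.

```python
import csv
import io
from datetime import date
from typing import Callable, Dict, List, Tuple


def _parses(parser: Callable[[str], object], value: str) -> bool:
    try:
        parser(value)
        return True
    except ValueError:
        return False


# Checked from most to least specific (an assumed ordering).
CANDIDATES: List[Tuple[str, Callable[[str], object]]] = [
    ("integer", int),
    ("decimal", float),
    ("date", date.fromisoformat),
]


def infer_column_types(csv_text: str) -> Dict[str, str]:
    """Guess a type name for each column of a CSV with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    rows = list(reader)
    types: Dict[str, str] = {}
    for column in fieldnames:
        # Missing cells come back as None, empty ones as "".
        values = [row[column] for row in rows if row.get(column) not in ("", None)]
        types[column] = "string"
        if not values:
            continue
        for name, parser in CANDIDATES:
            if all(_parses(parser, v) for v in values):
                types[column] = name
                break
    return types
```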

These were the implementation approaches. Now to…

FIND THE DATA

Meaningful, automated, and easy-to-search metadata

WE TRIED

MORE THAN DATA: Hipster Batch

• SNS, SQS

• ASG, ECS, Lambda

• KMS

• CloudWatch

• Logs

• Dataz

• Ancestry

• Metadata
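One way to have every batch emit "more than data" is to write a metadata record next to each output file, recording its ancestry (the inputs it was derived from) and which job produced it. The field names and the `.metadata.json` sidecar convention below are hypothetical; the talk only names Ancestry and Metadata as outputs.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def sidecar_key(data_key: str) -> str:
    """Hypothetical convention: metadata lives beside the data file."""
    return data_key + ".metadata.json"


def metadata_record(output_uri: str, input_uris: List[str], job: str,
                    version: str, body: bytes) -> Dict[str, Any]:
    """Describe one output file: what produced it and what it came from."""
    return {
        "dataset": output_uri,
        "producer": {"job": job, "version": version},
        "ancestry": sorted(input_uris),  # upstream datasets
        "sha256": hashlib.sha256(body).hexdigest(),  # ties metadata to exact bytes
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```

Because the record is generated by the job itself, the metadata stays automated rather than hand-maintained.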

METADATA PIPELINE

• Producers

• Ancestry

• REST API

• Scrapy
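Once producers' ancestry records are collected in one place, "finding the data" includes answering lineage questions: what a dataset was built from, and what would be affected if it changed. A minimal sketch, assuming ancestry is gathered into a dict from dataset name to its direct inputs (dataset names below are hypothetical), is a breadth-first walk over that graph and over its inverse.

```python
from collections import deque
from typing import Dict, List


def reachable(start: str, edges: Dict[str, List[str]]) -> List[str]:
    """All nodes reachable from start, in breadth-first order, each listed once."""
    seen = set()
    order: List[str] = []
    queue = deque(edges.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # shared ancestors are visited only once
        seen.add(node)
        order.append(node)
        queue.extend(edges.get(node, []))
    return order


def invert(ancestry: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Turn 'child -> parents' into 'parent -> children'."""
    children: Dict[str, List[str]] = {}
    for child, parents in ancestry.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    return children


def upstream(dataset: str, ancestry: Dict[str, List[str]]) -> List[str]:
    return reachable(dataset, ancestry)


def downstream(dataset: str, ancestry: Dict[str, List[str]]) -> List[str]:
    return reachable(dataset, invert(ancestry))
```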

WHAT WE HAVE LEARNED SO FAR

• Consumers create the last-mile data as needed

• We must work with external, independent delivery channels

• Push quality back to source/producer systems

• Data belongs to the entire organisation, not to a single team

I’ll give you my Data Warehouse when you can pry it from my cold dead hands.

THANK YOU
