Valentine Gogichashvili
Head of Data Engineering @ZalandoTech · twitter: @valgog · google+: +valgog · email: [email protected]
15 countries
4 fulfillment centers
18+ million active customers
~3 billion € revenue
150,000+ products
10,000+ employees
135 million visits per month
One of Europe's largest online fashion retailers
Good old small world
Started as a tiny online shop
Prototyped on Magento (PHP)
Used MySQL as a database
Web Application
Backend
Database
REBOOT
5½ years ago
• Java
• macro service architecture with SOAP as the RPC layer
• PostgreSQL
• Heavy usage of Stored Procedures
• 4 databases + 1 sharded database on 2 shards
• Python for tooling (e.g. code deploy automation)
REBOOT
[Diagram: a Java web frontend and several Java backend "macro" services, each backed by its own PostgreSQL database (PostgreSQL 9.0 RC1 at the time)]
REBOOT
• PostgreSQL
• Heavy usage of Stored Procedures
• clean transaction scope
• very clean data
• processing close to data
REBOOT
• PostgreSQL
• Java Sproc Wrapper
• complex type mapping
• transparent sharding
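The Java Sproc Wrapper routes a stored-procedure call to the right shard without the caller having to know about sharding. The real library is Java; the following is a minimal Python sketch of the idea, with hypothetical names (`shard_for`, `ShardedProcedureCaller`) that do not come from the actual wrapper:

```python
import hashlib


def shard_for(key: str, num_shards: int = 2) -> int:
    """Map a sharding key (e.g. a customer number) to a shard index
    deterministically, so every caller agrees on where the data lives."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedProcedureCaller:
    """Toy stand-in for the Java Sproc Wrapper: pick a connection by
    sharding key, then invoke the stored procedure on that shard."""

    def __init__(self, connections):
        self.connections = connections  # one DB connection per shard

    def call(self, proc_name: str, sharding_key: str):
        shard = shard_for(sharding_key, len(self.connections))
        # In real code this would run: SELECT * FROM proc_name(...) on
        # self.connections[shard]; here we just return what would happen.
        return shard, f"SELECT * FROM {proc_name}(...) -- on shard {shard}"
```

The key property is that the routing is a pure function of the sharding key, so reads and writes for the same entity always land on the same shard.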
Live long and prosper…
Very stable architecture that is still in use in the oldest (vintage) components.
We implemented everything ourselves, from warehouse and order management to the Web Shop and mobile applications.
Live long and prosper…
"I want to code in Scala/Clojure/Haskell because it is cool and compact"
"But nobody will be able to support your code if you leave the company,
everybody should use Java, learn SQL and write Stored Procedures"
"I like the team and people but I am moving on to another company where I
can use cooler technologies!"
Autonomous teams
● can choose their own technology stack
● including the persistence layer
● are responsible for operations
● should use isolated AWS accounts
Supporting autonomy — Microservices
[Diagram: Team A and Team B each own their Business Logic and Data Storage; only REST APIs face the public Internet]
Applications communicate using REST APIs.
Databases are hidden behind the walls of an AWS VPC.
Supporting autonomy — Microservices
Classical ETL process is impossible!
Data integration in the classical world
[Diagram: a DBA runs the ETL process that moves data from the application database into the Data Warehouse (DWH), where Business Intelligence works with it]
Classical ETL process
[Diagram: many application databases feed one Data Warehouse through DBA-managed ETL processes]
Classical ETL process
• Use-case specific: a lot of manual work
• Usually outputs data into a Data Warehouse
  • well structured
  • easy to use by the end user (SQL)
Data integration in the world of microservices
[Diagram: many microservices, each with its own Business Logic and often its own Database, exposing only REST APIs; the databases are no longer reachable for ETL]
Data integration in the world of microservices
Queues to the rescue!
But will there be a unified bus?
What about form and schema?
[Diagram: the same microservices now publish events to a unified bus; consumers App A, App B, App C, App D and Business Intelligence read from it]
[Diagram: the same event bus also feeds data-heavy services (ML and data-driven decision making, DDDM)]
Nakadi Event Bus
• A secured HTTP API
This allows microservices teams to maintain service boundaries, and not
directly depend on any specific message broker technology. Access to the API
can be managed and secured using OAuth scopes;
• An event type registry
Events sent to Nakadi can be defined with a schema and managed via a
registry. Events can be validated before they are distributed to consumers;
• Inbuilt event types
Nakadi has optional support for events describing business processes and data changes, using standard primitives for identity, timestamps, event types, and causality.
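The "data change" event category described above wraps the payload with standard metadata primitives. A minimal Python sketch of assembling such an event (the field names follow the Nakadi documentation loosely and should be checked against the real API; `"order"` and the payload are made-up examples):

```python
import uuid
from datetime import datetime, timezone


def make_data_change_event(data_type: str, data_op: str, payload: dict) -> dict:
    """Assemble an event in the spirit of Nakadi's data-change category:
    the payload plus standard primitives for identity and time."""
    return {
        "metadata": {
            "eid": str(uuid.uuid4()),                          # event identity
            "occurred_at": datetime.now(timezone.utc).isoformat(),
        },
        "data_type": data_type,  # e.g. "order"
        "data_op": data_op,      # C(reate), U(pdate), D(elete), S(napshot)
        "data": payload,         # the changed entity itself
    }
```

Because every event carries its own identity and timestamp, consumers can deduplicate and order events without talking back to the producer.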
Nakadi Event Bus
• Schema versioning
Enables schema evolution in a backwards-compatible way;
• Low latency event delivery
Once a publisher sends an event with a simple HTTP POST, it is pushed to consumers via a streaming HTTP connection, allowing near-real-time event processing;
• Built on proven infrastructure
Nakadi uses Apache Kafka as its internal message broker and PostgreSQL as a
backing database for storing metadata and schema information.
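Publishing is just an HTTP POST of a JSON batch to the event type's endpoint (`POST /event-types/{name}/events` in Nakadi's published API). A small sketch using only the standard library; the base URL, token, and event type name `order.created` are made-up placeholders:

```python
import json
from urllib import request


def build_publish_request(base_url: str, event_type: str,
                          events: list, token: str) -> request.Request:
    """Build the HTTP POST that publishes a batch of events to Nakadi.
    The path follows Nakadi's API; base_url and token are deployment-specific."""
    url = f"{base_url}/event-types/{event_type}/events"
    body = json.dumps(events).encode("utf-8")
    return request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # the API is OAuth-secured
        },
    )


# req = build_publish_request("https://nakadi.example.org", "order.created",
#                             [{"order_number": "42"}], token="...")
# request.urlopen(req)  # not executed here: needs a running Nakadi instance
```

Consumers read the same event type back over a long-lived streaming GET, which is what gives the near-real-time delivery mentioned above.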
Nakadi Cluster
• Schema validation
• Schema metadata store
• Event metadata enrichment
• ZooKeeper config store
• Messaging backend (Kafka now; Kinesis and others later)
• Nakadi Broker
• Management of offsets for the high-level API
Nakadi Event Bus — next steps
• Nakadi Cluster (physically separated per event type)
• Nakadi UI
• Nakadi Schema Manager
• Nakadi Schema Crawler
• Nakadi Topology Manager
Bubuku: a Kafka supervisor for AWS
• IP address discovery using AWS
• Exhibitor discovery using AWS load balancer name
• Rebalance partitions on different events
• React on Exhibitor topology changes
• Automatic Kafka restart in case the broker is considered dead
• Graceful broker termination when the supervisor is stopped
• Broker start/stop/restart synchronization across the cluster
Open Source at Zalando
https://zalando.github.io/
https://github.com/zalando/zalando-howto-open-source
https://github.com/zalando/nakadi
https://github.com/zalando-incubator/bubuku
STUPS.io for responsible organizations in AWS