Tokyo Azure Meetup #2: Big Data Made Easy

Big Data Made Easy with Azure Data Lake

Kanio Dimitrov, Tokyo Azure Meetup

https://www.getpostman.com/collections/2449c4125d7af478aed8

http://azjobsdemo.azurewebsites.net/

https://searchsamples.azurewebsites.net/#/

About Me

Azure Architect & Advisor

Tokyo Azure Meetup Host

twitter: @azurekanio

blog: https://azurekan.wordpress.com/

Big Data Made Easy When

Easy to manage

Easy to debug

Easy to optimize

Key Points

Any Data

Enterprise

Developers

What is Azure Data Lake?

Cosmos

Internal Microsoft System

10 000 Developers

Hundreds of thousands of interactive jobs per day

Exabytes of data

Microsoft Core System

From Cosmos to Azure Data Lake

Azure Data Lake Store

Azure Data Lake Analytics

Ease of use

Ability to Scale

Offered to the public

Azure Data Lake Based on Open Source

Easy to Start

Create ADL Store Account

Create ADL Analytics Account (90 seconds, free)

Write & Submit U-SQL script

U-SQL job executes

ADL Analytics

Distributed analysis service

Built on Apache YARN

Dynamic scaling

ADL Analytics

Pay per query

Scale per query

Federated query

ADL Analytics

Uses U-SQL - C# & SQL

No Scale limits

Optimized to work with ADL Store

Data Source                  Read       Write
ADL Store                    Yes        Yes
Storage Blob                 Yes        Yes
Azure SQL                    Yes        In Future
Azure SQL Data Warehouse     Yes        In Future
Azure SQL DB in VM           Yes        In Future
On Premise Data Sources      In Future  In Future

ADL Analytics Administration

Web-based management in Azure Portal

Automation with PowerShell

Role-Based Access Control with Azure Active Directory

Monitor service operations and activity

Development

Author, debug & optimize Big Data applications in Visual Studio

Languages: U-SQL & Hive (coming soon)

.NET integration with U-SQL

ADL Analytics SDKs

Languages: Java, C++, .NET, Node.js, Python

U-SQL Extensibility: Yes

Management Operations: By GA, Yes, Yes, By GA

U-SQL

SQL

Support of familiar SQL clauses

Structured and Unstructured Data

Relational metadata objects

.NET

U-SQL - full C# expressions

Reuse .NET code

Use C# for defining: Types, Functions, Joins, Aggregations, I/O (Extractors, Outputters)

Logical Plan -> Physical Plan

One node perspective

Physical plan created

Defines level of parallelism

HDInsight

Managed Hadoop Cluster in the Cloud

Deploy Storm jobs from Visual Studio

Use C# to author event processing logic

Integrate existing packages & code

ADLA vs HDInsight

Azure Data Lake Analytics

Automatically scale

Start quickly with C#, SQL, Visual Studio

Jobs: convenient, efficient, automatic scale

HDInsight

Leverage open source tech: Java, Eclipse, Hive

Manage clusters: customization, control and flexibility

Azure Data Lake Store

Hyper Scale Web HDFS store in the cloud

Store any data in native format

Enterprise grade

No limits to Scale

Optimized for analytic workload performance
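
Because the store is described as WebHDFS-compatible, requests against it follow the standard WebHDFS REST URL layout (`/webhdfs/v1/<path>?op=<OPERATION>`). The sketch below only builds such a URL; it does not send a request or handle authentication. The host name and path are hypothetical placeholders, not values from the talk, and `webhdfs_url` is an illustrative helper rather than part of any SDK.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, **params):
    """Build a WebHDFS REST URL for the given host, path and operation.

    `host` is a placeholder for the store endpoint (an assumption here).
    `op` is a standard WebHDFS operation name such as LISTSTATUS or OPEN.
    """
    query = urlencode({"op": op, **params})
    # quote() keeps "/" intact by default, so directory separators survive.
    return f"https://{host}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


# Hypothetical host and folder, used only for illustration.
url = webhdfs_url("example-store.example.net", "/logs/2016/01", "LISTSTATUS")
print(url)
```

A real call would also need an Azure Active Directory bearer token in the `Authorization` header, which is left out of this sketch.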

Azure Data Lake Store

Unlimited Storage (petabytes)

Optimized for Analytics

Parallel computing optimized

Auto optimization for any throughput

Reliable

Automatically replicates data (3 copies)

Highly available

Integration

ADL Store

HDInsight

ADL Store SDKs

Languages: Java, C++, .NET, Node.js, Python

U-SQL Extensibility

WebHDFS Client: LibWebHDFS, Yes, Yes, By GA

Management Operations: By GA, Yes, Yes, By GA

Visual Studio Tools

ADL Analytics Billing Model

Account is free

Pay for compute nodes for the duration of query

Formula (GA) = 5 cents + (minutes x parallelism x Analytics Unit Price)

Analytics Unit Price - $0.017 / minute

Preview 50% discount
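
The formula above can be worked through with a small calculator. This is a minimal sketch using the prices quoted on the slide. The function name is made up for illustration, and applying the preview discount to the whole total (flat fee included) is an assumption, since the slide does not say which part is discounted.

```python
def adla_job_cost(minutes, parallelism, unit_price=0.017, job_fee=0.05,
                  preview_discount=0.0):
    """Estimate the cost in USD of one U-SQL job.

    cost = job_fee + minutes * parallelism * unit_price
    `preview_discount` is a fraction (0.5 for 50%); applying it to the
    whole total is an assumption, not something the slide states.
    """
    if minutes < 0 or parallelism < 1:
        raise ValueError("minutes must be >= 0 and parallelism >= 1")
    total = job_fee + minutes * parallelism * unit_price
    return total * (1.0 - preview_discount)


# A 10-minute job at parallelism 20: 0.05 + 10 * 20 * 0.017
print(f"GA price:      ${adla_job_cost(10, 20):.2f}")
print(f"Preview price: ${adla_job_cost(10, 20, preview_discount=0.5):.3f}")
```

Since cost grows with minutes times parallelism, doubling parallelism only saves money if it cuts the run time by more than half.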

ADL Store Billing Model

Account is free

Pay for amount of data - $0.08 / GB per month

Pay for number of I/O operations - $0.14 / million transactions

Preview 50% discount
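
The two storage charges add up as follows. This is a rough sketch using the slide's prices; the function name and the example volumes are illustrative, and applying the preview discount to the combined total is an assumption.

```python
def adl_store_monthly_cost(stored_gb, transactions, price_per_gb=0.08,
                           price_per_million_tx=0.14, preview_discount=0.0):
    """Estimate the monthly cost in USD of an ADL Store account.

    Storage is billed per GB per month, I/O per million transactions.
    `preview_discount` is a fraction (0.5 for 50%), applied to the total
    as an assumption.
    """
    if stored_gb < 0 or transactions < 0:
        raise ValueError("stored_gb and transactions must be non-negative")
    storage = stored_gb * price_per_gb
    io = (transactions / 1_000_000) * price_per_million_tx
    return (storage + io) * (1.0 - preview_discount)


# 1,000 GB stored and 5 million transactions in a month:
# 1000 * 0.08 + 5 * 0.14
print(f"GA price: ${adl_store_monthly_cost(1000, 5_000_000):.2f}")
```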

Security

Based on Azure Active Directory

Federate with Enterprise Active Directory

Two factor authentication

Security - Access

Role Based Access Control

Custom access with POSIX ACLs

Permissions for specific named users or groups
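
To illustrate what permissions for named users or groups mean in a POSIX-style ACL, here is a simplified model of the usual evaluation order: owner, then named users, then groups, then everyone else, with the mask limiting named and group entries. The class and function names are hypothetical, and the slides do not say whether ADL Store follows this order in every detail, so treat it as a sketch of the general POSIX approach only.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    owner: str
    owning_group: str
    owner_perms: str            # e.g. "rwx" or "r--"
    group_perms: str
    other_perms: str
    named_users: dict = field(default_factory=dict)   # user -> perms
    named_groups: dict = field(default_factory=dict)  # group -> perms
    mask: str = "rwx"           # upper bound for named and group entries


def is_allowed(acl, user, groups, wanted):
    """Return True if `user` (member of `groups`) has permission `wanted`
    ("r", "w" or "x") under the simplified POSIX ACL evaluation order."""
    def grants(perms, masked):
        effective = set(perms) - {"-"}
        if masked:
            effective &= set(acl.mask)
        return wanted in effective

    if user == acl.owner:
        return grants(acl.owner_perms, masked=False)
    if user in acl.named_users:
        return grants(acl.named_users[user], masked=True)
    # Collect the owning-group entry and every matching named-group entry.
    group_entries = []
    if acl.owning_group in groups:
        group_entries.append(acl.group_perms)
    group_entries += [p for g, p in acl.named_groups.items() if g in groups]
    if group_entries:
        return any(grants(p, masked=True) for p in group_entries)
    return grants(acl.other_perms, masked=False)


# Hypothetical users and groups, for illustration only.
acl = Acl(owner="alice", owning_group="analysts",
          owner_perms="rwx", group_perms="r-x", other_perms="---",
          named_users={"bob": "rw-"}, named_groups={"auditors": "r--"},
          mask="r-x")

print(is_allowed(acl, "bob", [], "r"))             # named user, read
print(is_allowed(acl, "bob", [], "w"))             # write is cut by the mask
print(is_allowed(acl, "carol", ["auditors"], "r")) # named group, read
print(is_allowed(acl, "dave", [], "r"))            # falls through to "other"
```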

Security - Runtime

All user code runs in a VM

VMs are locked down

Detailed audit records: who, when, what, how long

Audit logs available out of the box

Security - Encryption

Encryption on the wire - Data uploaded via HTTPS

Encryption at rest after public preview

Integration with Azure Key Vault for keys

Encryption is optional

Security - Compliance

Certification

Azure compliance requirements

External auditing

DEMO

Tokyo Azure Meetup Learn | Share | Enjoy Cool Demos!