PRASHANT AGRAWAL
Email: [email protected]
Mobile: +91-8097606642

Professional Summary:

● Innovative software professional with 5+ years of progressive experience and continued success as a Big Data Analyst.
● Day-to-day experience working with the Agile/Scrum methodology.
● Vast experience in search engine solutions, viz. e-commerce search, enterprise search, and log analytics and monitoring.
● Hands-on exposure to log analytics for various logs (syslog, authlog, Postfix logs, router logs, Apache logs, NetFlow logs, application logs) using ELK.
● Hands-on experience in ETL using Spark, Spark Streaming and Spark SQL.
● Built full-text search solutions with analytics and visualization using Elasticsearch, Logstash and Kibana.
● Good knowledge of distributed computing systems such as the Hadoop ecosystem, Spark and Flume for analyzing network logs and performing ETL on various data sets.
● Hands-on experience working with HDP (Hortonworks) clusters and their components: Spark, HDFS, Flume, Kafka, YARN, Oozie, Phoenix, Presto etc.
● Good hands-on with SVN, VSS, Git and build tools like Maven.
● Hands-on experience developing a product for capturing, intercepting and monitoring internet traffic for LEAs (Law Enforcement Agencies).
● Good exposure to handling big data (in TBs) with Elasticsearch using a 12-node cluster at deployment level.
● Good exposure to writing Spark applications in Scala.

Domain and Skill Set:

Domain: Engineering and Network Forensics, Big Data Analytics, Digital Marketing and Advertisement
Programming Languages: Core Java, Scala, C#, PHP
Operating Systems: Mac, Linux (Ubuntu, Red Hat, CentOS), Windows 7
Tools/DB/Packages: Elasticsearch, Logstash, Kibana, X-Pack, Spark, Flume, Oozie, IntelliJ IDEA, Visual Studio 2010, MySQL, Version Control (Perforce, SVN, VSS, Git), Defect Tracker (Jira, FogBugz)


Professional Project Details:

Project - 1

Project Name: Log Analytics and Visualization using ELK

Team Size: 1

Start Date: Jan 2016    End Date: Present

Description: This project involves log analytics using ELK, and also covers writing Elasticsearch queries for e-commerce and enterprise search.

Role & Contribution:

● ELK 5.x setup and configuration on various OSes such as Mac, Linux and Windows.
● Migration of data from ELK 2.x to 5.x.
● Well versed in setting up the various node types in a production cluster: master, data and client nodes.
● Implementation of shards and replicas (to avoid single-node failure) for better management of indexes.
● Preparation of schemas and analyzers (through templates) to store the data in Elasticsearch.
● Wrote Elasticsearch queries to support various search features such as autocomplete, synonyms, grammar-based search, exact and non-exact search, misspelled-term search, Boolean search and aggregations (see the query sketch after this list).
● Built and developed Logstash plugins to support specific requirements or features.
● Data extraction using various Logstash input plugins such as JDBC, File, TCP, UDP and S3.
● Data filtration using Logstash filter plugins (CSV, Grok, Mutate, Date, Geo etc.).
● Data indexing to Elasticsearch using the Logstash output plugin.
● Used various Beats, such as Filebeat and Metricbeat, as data shippers to Logstash.
● Data visualization and dashboard reporting using Kibana.
● Set up backup and restore using snapshot and restore.
● Fine-tuning and optimization of queries for faster responses.
● Preparation of a multi-index architecture (time-series indexing) to perform faster searches and return responses as quickly as possible.
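A minimal sketch of the kind of bool query behind the exact plus misspelled-term search, assuming an Elasticsearch 5.x node on localhost:9200; the products index and title field are hypothetical stand-ins, not names from the project:

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import scala.io.Source

object SearchSketch {
  // Hypothetical index/field; any 5.x index with a text field would do.
  val endpoint = "http://localhost:9200/products/_search"

  // Bool query: a boosted exact match ranks precise hits first, while the
  // fuzzy clause also matches misspellings such as "lapotp".
  val query =
    """{
      |  "query": {
      |    "bool": {
      |      "should": [
      |        { "match": { "title": { "query": "laptop", "boost": 2.0 } } },
      |        { "match": { "title": { "query": "laptop", "fuzziness": "AUTO" } } }
      |      ],
      |      "minimum_should_match": 1
      |    }
      |  }
      |}""".stripMargin

  def main(args: Array[String]): Unit = {
    val conn = new URL(endpoint).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    conn.getOutputStream.write(query.getBytes(StandardCharsets.UTF_8))
    println(Source.fromInputStream(conn.getInputStream).mkString)
  }
}
```

Synonyms and autocomplete would be handled at the analyzer level (via the index templates mentioned above) rather than in the query itself.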

Technologies: JSON

Tools/Tool chain: Elasticsearch 5.x/2.x, Logstash 5.x/2.x, Kibana 5.x/4.x, Beats, X-Pack, Head plug-in, Kopf plug-in, Carrot2, Lingo3G, PuTTY, WinSCP

Project - 2

Project Name: Data Lake Modules in Spark
Team Size: 1

Start Date: June 2016    End Date: Present

Description: This project involves creating a generic data lake solution to migrate data from one SQL/NoSQL database to another.

Role & Contribution:

● Created a Spark module that reads data from SQL or HBase and dumps it to HDFS as Avro or ORC (a minimal sketch follows this list).
● The module lets the user specify the input type (HBase or SQL) and the output type (ORC or Avro) on HDFS.
● Implemented the data lake modules to run periodically using the Oozie scheduler; the job runs every hour and dumps the data.
● Added functionality to auto-clean older dumps that are more than X versions or Y days old.
● Created an offline index module that acts as a secondary index to an HBase table, containing only the required fields, to speed up select queries.
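A minimal sketch of the SQL-to-HDFS path, assuming Spark 2.x, a reachable MySQL instance, and the Databricks spark-avro package on the classpath; the connection details, table and output paths are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object DataLakeDump {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("DataLakeDump").getOrCreate()

    // Input type "sql": read the source table over JDBC.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://dbhost:3306/sales") // hypothetical source
      .option("dbtable", "orders")
      .option("user", "etl")
      .option("password", sys.env("DB_PASSWORD"))      // fails fast if unset
      .load()

    // Output type "orc" or "avro", chosen by a job parameter.
    val outputType = if (args.nonEmpty) args(0) else "orc"
    outputType match {
      case "orc"  => df.write.orc("hdfs:///datalake/orders_orc")
      case "avro" => df.write.format("com.databricks.spark.avro")
                       .save("hdfs:///datalake/orders_avro")
      case other  => sys.error(s"unknown output type: $other")
    }
    spark.stop()
  }
}
```

Keeping the input and output formats behind simple switches like this is what makes the module generic: a new source or sink adds a case, not a new job.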

Technologies: Scala

Tools/Tool chain: Kafka, Spark, Spark Streaming and Spark SQL, Phoenix, HBase, Hive, Maven, Git

Project - 3

Project Name: Spark ETL for fact and dim types of data

Team Size: 2


Start Date: Jan 2016    End Date: May 2016

Description: This project involves ETL processing with Spark: information is pulled from a variety of sources, transformed, and then pushed to Presto for OLTP/OLAP analytics on dim and fact types of data.

Role & Contribution:

● Consumed the dim/fact Kafka messages in Spark using a Kafka consumer (the messages are produced by a Kafka producer); see the streaming sketch after this list.
● Extracted the messages consumed by the Kafka consumer.
● Performed business logic and transformations on those messages.
● Pushed the transformed data to either Hive or Presto for further OLTP and OLAP analytics.
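A minimal sketch of the consume-transform step, assuming Spark 2.x with the spark-streaming-kafka-0-10 integration; the broker address, group id and topic names are hypothetical:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object FactDimEtl {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FactDimEtl")
    val ssc = new StreamingContext(conf, Seconds(60))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "kafkahost:9092",          // hypothetical broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "fact-dim-etl",
      "auto.offset.reset" -> "latest"
    )

    // Subscribe to hypothetical fact and dim topics.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Seq("fact_events", "dim_updates"), kafkaParams))

    // The normalization here stands in for the project's business logic.
    stream.map(record => record.value.trim.toLowerCase)
      .foreachRDD { rdd =>
        // In the real job the transformed rows are written to Hive/Presto;
        // here we just count per batch.
        println(s"batch size: ${rdd.count()}")
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The direct stream maps Kafka partitions one-to-one onto Spark partitions, which keeps parallelism and offset handling simple.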

Technologies: Scala

Tools/Tool chain: Maven, Git, Kafka, Spark, Spark Streaming and Spark SQL, Phoenix, Presto

Project - 4

Project Name: Deployment of Elasticsearch to handle big data in IIMS

Team Size: 2

Start Date: Feb 2014    End Date: Dec 2015

Description: This project involves the deployment of Elasticsearch to handle data in GBs per day.

Role & Contribution:

● Prepared a deployment architecture to handle such big data using a 12-node cluster.
● Data visualization and dashboard loading using Kibana.
● Well versed in setting up the various node types in the deployment: master, data and client nodes.
● Hands-on experience with various plug-ins such as Mapper Attachments, Head, BigDesk and Carrot2 (with the Lingo3G clustering algorithm).
● Preparation of schemas to store the data in Elasticsearch.
● Preparation of search queries for retrieving data from Elasticsearch using Query String, Match, Boolean, Aggregation etc.
● Even at this data volume, backup and restore is required in case of failure, so set up the snapshot-and-restore process to take daily backups (a repository sketch follows this list).
● Implementation of shards and replicas (to avoid single-node failure).
● Fine-tuning and optimization of queries for faster responses.
● Preparation of a multi-index architecture to perform faster searches and return responses as quickly as possible.
● Implementation of features like synonyms, stemming, grammar extension and wildcard search.
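A minimal sketch of registering a shared-filesystem snapshot repository and taking a dated snapshot over the REST API, assuming an Elasticsearch node on localhost:9200 that can write to the hypothetical /mount/backups path:

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

object SnapshotSetup {
  def put(url: String, body: String): Int = {
    val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("PUT")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    conn.getOutputStream.write(body.getBytes(StandardCharsets.UTF_8))
    conn.getResponseCode // triggers the request
  }

  def main(args: Array[String]): Unit = {
    // Register a shared-filesystem repository (path is hypothetical and must
    // be writable by every node in the cluster).
    put("http://localhost:9200/_snapshot/daily_backup",
      """{ "type": "fs", "settings": { "location": "/mount/backups" } }""")

    // Take a snapshot named after the current date, e.g. from a daily cron job.
    val name = java.time.LocalDate.now.toString
    put(s"http://localhost:9200/_snapshot/daily_backup/$name?wait_for_completion=true", "")
  }
}
```

Run daily, this produces one incremental snapshot per day; restore is the mirror POST to _snapshot/daily_backup/&lt;name&gt;/_restore.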

Technologies: C#, Elasticsearch (NoSQL database)

Tools/Tool chain: Elasticsearch 1.5.x, Mapper Attachments, Head plug-in, BigDesk plug-in, Carrot2, Lingo3G, PuTTY, WinSCP

Project - 5

Project Name: Big Data Platform Development
Team Size: 2

Start Date: June 2015    End Date: Dec 2015

Description: This project involves processing the logs generated by various systems and devices and performing predictive analysis to identify patterns and trace the attacking device or user.

Role & Contribution:

● Collected the high-speed logs coming from various devices and ingested them using Flume.
● Passed the log data from Flume to Spark Streaming and Spark SQL so as to keep it in memory (Spark is known for in-memory data processing); see the sketch after this list.
● Performed predictive analysis on the logs per the defined algorithm, and also performed computation with a self-derived algorithm.
● Persisted the logs in memory for a specific duration using Spark Streaming, then persisted them permanently to Elasticsearch.
● Performed all of the above operations and computation on a distributed setup, which included standing up a 5-node HDFS cluster using the Hortonworks Data Platform.
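A minimal sketch of the Flume-to-Spark-to-Elasticsearch flow, assuming Spark with the spark-streaming-flume and elasticsearch-hadoop packages on the classpath; the host, port, window length and index name are hypothetical:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils
import org.elasticsearch.spark._ // adds saveToEs to RDDs

object LogPipeline {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("LogPipeline")
      .set("es.nodes", "eshost:9200") // hypothetical Elasticsearch node

    val ssc = new StreamingContext(conf, Seconds(30))

    // Receive events pushed by a Flume Avro sink to this host/port.
    val flumeStream = FlumeUtils.createStream(ssc, "0.0.0.0", 4545)

    flumeStream
      .map(event => new String(event.event.getBody.array()))
      // Hold five minutes of logs in memory, then flush each window once
      // (slide == window, so batches are not re-written).
      .window(Seconds(300), Seconds(300))
      .foreachRDD { rdd =>
        rdd.map(line => Map("message" -> line)).saveToEs("logs/event")
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The predictive-analysis step would sit between the map and the write; this sketch shows only the transport and persistence path.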

Technologies: Core Java, Flume, HDFS, HDP clustering, Spark, Spark Streaming and Spark SQL, Elasticsearch


Tools/Tool chain: Maven, Git, MySQL

Educational Qualifications:

Course                     Board/University    Year of Passing    Percentage
10th                       CBSE                2005               79.20
12th                       CBSE                2007               76.00
B.E. (Computer Science)    RGPV Bhopal         2011               76.44

GATE - 2011: 91 Percentile
CAT - 2010: 85 Percentile

Personal Profile:

Date of Birth: 23rd Nov, 1989
Passport No: J3031277
Willing to relocate: Depends upon location
Willingness for onsite: Yes
PAN: ATXPA9120F

Declaration:

I hereby declare that the information provided above is correct and true to the best of my knowledge and belief.


Date: 09 Jan 2017
Place: Pune

(Prashant Agrawal)
