Prashant_Agrawal_CV
PRASHANT AGRAWAL
Email: [email protected]
Mobile: +91-8097606642
Professional Summary:
● Innovative software professional with 5+ years of progressive experience and continued success as a Big Data Analyst.
● Day-to-day experience working with the Agile/Scrum methodology.
● Vast experience in search engine solutions, viz. e-commerce search, enterprise search, log analytics and monitoring.
● Hands-on exposure to log analytics for various logs such as syslog, auth log, Postfix logs, router logs, Apache logs, NetFlow logs and application logs using ELK.
● Hands-on experience with ETL using Spark, Spark Streaming and Spark SQL.
● Full-text search solutions with analytics and visualization using Elasticsearch, Logstash and Kibana.
● Good knowledge of distributed computing systems such as the Hadoop ecosystem, Spark, Flume, etc., used to analyze network logs and perform ETL on various data sets.
● Hands-on experience working with an HDP (Hortonworks) cluster with various components such as Spark, HDFS, Flume, Kafka, YARN, Oozie, Phoenix, Presto, etc.
● Good hands-on with SVN, VSS, Git and build tools like Maven.
● Hands-on experience developing a product for capturing, intercepting and monitoring internet traffic for LEAs (Law Enforcement Agencies).
● Good exposure to handling big data (in TBs) with Elasticsearch using a 12-node cluster at deployment level.
● Good exposure to writing Spark applications using Scala.
Domain and Skill Set:
Domain: Engineering and Network Forensics, Big Data Analytics, Digital Marketing and Advertisement
Programming Languages: Core Java, Scala, C#, PHP
Operating Systems: Mac, Linux (Ubuntu, Red Hat, CentOS), Windows 7
Tools/DB/Packages: Elasticsearch, Logstash, Kibana, X-Pack, Spark, Flume, Oozie, IntelliJ IDEA, Visual Studio 2010, MySQL, version control (Perforce, SVN, VSS, Git), defect trackers (Jira, FogBugz)
Professional Project Details:
Project - 1
Project Name: Log Analytics and Visualization using ELK
Team Size: 1
Start Date: Jan 2016    End Date: Till Now
Description: This project involves log analytics using ELK, and also covers writing Elasticsearch queries for e-commerce and enterprise search.
Role & Contribution
● ELK 5.x setup and configuration on various OSes such as Mac, Linux and Windows
● Migration of data from ELK 2.x to 5.x
● Well versed in setting up the various nodes in a prod cluster, such as master, data and client nodes
● Implementation of shards and replicas (to avoid single-node failure) for better management of indexes
● Preparation of schemas and analyzers (through templates) to store the data in Elasticsearch
● Wrote Elasticsearch queries to support various search features such as autocomplete, synonyms, grammar-based search, exact and non-exact search, misspelled-term search, Boolean search, aggregations, etc.
● Build and development of Logstash plugins to support specific requirements or features
● Data extraction using various Logstash input plugins such as JDBC, File, TCP, UDP, S3, etc.
● Data filtration using Logstash filter plugins (CSV, Grok, Mutate, Date, Geo, etc.)
● Data indexing into Elasticsearch using the Logstash output plugin
● Used various Beats (Filebeat, Metricbeat, etc.) as data shippers to Logstash
● Data visualization and dashboard reporting using Kibana
● Setting up backup and restore using snapshot and restore
● Fine tuning and optimization of queries to get faster responses
● Preparation of a multi-index architecture (time-series indexing) to perform faster searches and return responses as quickly as possible
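As an illustration of the query features above, a minimal sketch of an Elasticsearch 5.x bool query combining a fuzzy (misspelled-term) match with a terms aggregation might look like the following; the index fields (title, in_stock, brand) are hypothetical, not taken from the project:

```json
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": { "query": "wireles speaker", "fuzziness": "AUTO" } } }
      ],
      "filter": [
        { "term": { "in_stock": true } }
      ]
    }
  },
  "aggs": {
    "by_brand": { "terms": { "field": "brand.keyword" } }
  }
}
```

Sent as the body of a _search request, this would return fuzzy matches on the title together with a per-brand bucket count for the matching documents.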
Technologies: JSON
Tools/Toolchain: Elasticsearch 5.x/2.x, Logstash 5.x/2.x, Kibana 5.x/4.x, Beats, X-Pack, Head plug-in, Kopf plug-in, Carrot2, Lingo3g, PuTTY, WinSCP
Project - 2
Project Name: Data Lake Modules in Spark
Team Size: 1
Start Date: June 2016    End Date: Till Now
Description: This project involves creating a generic data lake solution to migrate data from one SQL/NoSQL database to another.
Role & Contribution
● Created a module on Spark which reads data from SQL or HBase and dumps it to HDFS as Avro or ORC
● The module is designed so the user can specify whether the input type is HBase or SQL, and whether the output type in HDFS is ORC or Avro
● Implemented the data lake modules to run periodically using the Oozie scheduler, where the job runs every hour and dumps the data
● Added functionality to auto-clean older dumps that are more than X versions or Y days old
● Created an offline index module which acts as a secondary index to the HBase table, containing only the required fields, to speed up select queries
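The SQL-to-HDFS leg of such a module could be sketched roughly as follows in Spark 2.x Scala; this is a sketch, not the project's code — the JDBC URL, table name and paths are placeholders, and the HBase input branch and Oozie scheduling are omitted:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Sketch of a SQL -> HDFS dump module (requires Spark on the classpath).
object DataLakeDump {
  def main(args: Array[String]): Unit = {
    val outputFormat = args(0) // "orc" or "avro", chosen by the user
    val outputPath   = args(1) // e.g. "hdfs:///datalake/orders"

    val spark = SparkSession.builder().appName("DataLakeDump").getOrCreate()

    // Pull the source table over JDBC (URL and table are hypothetical)
    val df = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/sales")
      .option("dbtable", "orders")
      .load()

    // Dump to HDFS in the requested format
    // ("avro" needs the spark-avro package on the classpath)
    df.write.mode(SaveMode.Overwrite).format(outputFormat).save(outputPath)

    spark.stop()
  }
}
```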
Technologies: Scala
Tools/Toolchain: Kafka, Spark, Spark Streaming and Spark SQL, Phoenix, HBase, Hive, Maven, Git
Project - 3
Project Name: Spark ETL for fact and dim types of data
Team Size: 2
Start Date: Jan 2016    End Date: May 2016
Description: This project involves ETL processing with Spark: pulling information from a variety of sources, transforming the data, and pushing it to Presto for OLTP/OLAP analytics on dim and fact data.
Role & Contribution
● Consumed the dim/fact Kafka messages in Spark using a Kafka consumer (messages produced by a Kafka producer)
● Extracted the messages consumed by the Kafka consumer
● Performed business logic and transformations on those messages
● Pushed the transformed data to either Hive or Presto for further OLTP and OLAP analytics
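A minimal sketch of this consume-transform-push loop, assuming Spark Streaming with the kafka-0-10 integration; the broker address, topic names and the transformation are placeholders, not the project's actual values:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object FactDimEtl {
  // Placeholder for the real business transformation
  def transform(msg: String): String = msg.trim

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("FactDimEtl"), Seconds(60))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "kafka-host:9092", // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "fact-dim-etl")

    // 1. Consume the dim/fact messages produced upstream
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("facts", "dims"), kafkaParams))

    // 2. Extract the values, 3. apply business logic, 4. push each batch onward
    stream.map(_.value()).map(transform).foreachRDD { rdd =>
      // write rdd to Hive/Presto here (e.g. via Spark SQL)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```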
Technologies: Scala
Tools/Toolchain: Maven, Git, Kafka, Spark, Spark Streaming and Spark SQL, Phoenix, Presto
Project - 4
Project Name: Deployment of Elasticsearch to handle big data in IIMS
Team Size: 2
Start Date: Feb 2014    End Date: Dec 2015
Description: This project involves deployment of Elasticsearch to handle data in GBs per day.
Role & Contribution
● Prepared the deployment architecture to handle such big data using a 12-node cluster
● Data visualization and dashboard loading using Kibana
● Well versed in setting up the various nodes in the deployment, such as master, data and client nodes
● Hands-on experience with various plug-ins such as Mapper Attachment, Head, BigDesk and Carrot2 (with the Lingo3g categorization algorithm)
● Preparation of schemas to store the data in Elasticsearch
● Preparation of search queries for retrieval of data from Elasticsearch using Query String, Match, Boolean, Aggregation, etc.
● Although it is big data, backup and restore is still required in case of any failure; hence set up the snapshot-and-restore process to take daily backups
● Implementation of shards and replicas (to avoid single-node failure)
● Fine tuning and optimization of queries to get faster responses
● Preparation of a multi-index architecture to perform faster searches and return responses as quickly as possible
● Implementation of features like synonyms, stemming, grammar extension, wildcard search, etc.
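Synonyms and stemming of the kind mentioned above are typically wired into Elasticsearch through index settings; a sketch under assumed names (the filter and analyzer names are hypothetical, while the synonym and stemmer token filters are standard Elasticsearch features):

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms": ["laptop, notebook", "tv, television"]
        },
        "english_stemmer": { "type": "stemmer", "language": "english" }
      },
      "analyzer": {
        "search_text": {
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonyms", "english_stemmer"]
        }
      }
    }
  }
}
```

A field mapped with this analyzer would then match "notebook" for a query on "laptop" and "running" for "run".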
Technologies: C#, Elasticsearch (NoSQL database)
Tools/Toolchain: Elasticsearch 1.5.x, Mapper Attachment, Head plug-in, BigDesk plug-in, Carrot2, Lingo3g, PuTTY, WinSCP
Project - 5
Project Name: Big Data Platform Development
Team Size: 2
Start Date: June 2015    End Date: Dec 2015
Description: This project involves processing logs generated from various systems and devices and performing predictive analysis to identify patterns and trace the attacking device or user.
Role & Contribution
● Collected the high-speed logs coming from various devices and ingested them using Flume
● Passed the log data from Flume to Spark Streaming and Spark SQL to store it in memory (Spark being known for in-memory data processing)
● Performed predictive analysis on the logs per the defined algorithm, and also performed computation with a self-derived algorithm
● Persisted the logs in memory for a specific duration using Spark Streaming, then persisted them permanently to Elasticsearch
● Performed all of the above operations and computations using distributed computing, which included setting up a 5-node HDFS cluster using the Hortonworks Data Platform
Technologies: Core Java, Flume, HDFS, HDP clustering, Spark, Spark Streaming and Spark SQL, Elasticsearch
Tools/Toolchain: Maven, Git, MySQL
Educational Qualifications:
Course                     Board/University    Year of Passing    Percentage
10th                       CBSE                2005               79.20
12th                       CBSE                2007               76.00
B.E. (Computer Science)    RGPV Bhopal         2011               76.44

GATE 2011: 91 percentile
CAT 2010: 85 percentile
Personal Profile:
Date of Birth: 23rd Nov, 1989
Passport No: J3031277
Willing to relocate: Depends upon location
Willingness for onsite: Yes
PAN: ATXPA9120F
Declaration:
I hereby declare that the information provided above is correct and true to the best of my knowledge and belief.
Date: 09 Jan 2017
Place: Pune
(Prashant Agrawal)