Splunk as a Big Data Platform for Developers (SpringOne 2GX)

Post on 27-Jan-2015


A Big Data Platform for Developers Damien Dallimore

Developer Evangelist at Splunk

© 2012 SpringOne 2GX. All rights reserved. Do not distribute without permission.

•  Developer Evangelist at Splunk since July 2012 •  Splunk Community Member

•  Splunk for JMX •  SplunkJavaLogging •  SplunkBase – Apps and Answers

•  Splunk Architect and Administrator •  Coder

•  Been paying my mortgage developing Enterprise Java solutions most of my career •  Kia Ora

•  I do not have a speech impediment, I am from Aotearoa, so please restrain all your sheep, Lord of the Rings and Kim Dotcom heckles until beer o’clock !!

About me

2

•  Overview of the Splunk platform •  Splunk for Developers

•  Custom Visualization Demo

•  Splunk Java SDK

•  Spring Integration Splunk Extensions •  Integration Adaptors Demo

•  Some other JVM/Java related tools •  SplunkJavaLogging •  Splunk for JMX

•  Questions

Agenda

3

What is Splunk?

•  Splunk is an engine for machine data
•  Provides visibility, reporting and search across all your IT systems and infrastructure
•  Doesn’t lock you into a fixed schema

So What is Splunk, Exactly?

5  

•  It’s software – download and install it in 5 minutes, “freemium” model

•  Runs on all modern platforms •  Open and extensible architecture

•  Capture events from logs in real time •  Run scripts to gather system metrics, connect to APIs and databases •  Listen to syslog, raw TCP/UDP, gather Windows events •  Universally indexes any data format so it doesn’t need adapters, “schema on the fly” •  Stream in data directly from your application code •  Decode binary data and feed in

Indexes any Machine Data

6  

Windows • Registry • Event logs • File system •  sysinternals

Linux/Unix • Configurations • Syslog • File system • Ps, iostat, top

Virtualization • Hypervisor • Guest OS • Guest Apps

Applications • Web logs • Log4J, JMS, JMX

•  .NET events • Code and scripts

Databases • Configurations • Audit/query logs

• Tables • Schemas

Network • Configurations •  syslog • SNMP • netflow

Centralizes Data Across the Environment

7  

Indexing/Search  Server  

Splunk  Forwarders  

•  Splunk Universal Forwarder sends data to Splunk Indexer from remote systems •  Uses minimal system resources, easy to install and deploy •  Delivers secure, distributed, real-time universal data collection for tens of thousands of endpoints

Scales to TBs/day and Thousands of Users

8  

•  Automatic load balancing linearly scales indexing •  Distributed search and MapReduce linearly scales search and reporting

Provides Strong Machine Data Governance

9  

•  Provides comprehensive controls for data security, retention and integrity

•  Single sign-on integration enables pass-through authentication of user credentials

•  Splunk is an implementation of the MapReduce algorithmic approach
•  It is not Apache Hadoop MapReduce (MR), the product
•  Splunk is not agnostic of its underlying data source; it is optimized for Splunk index files
•  Real time vs. batch jobs
•  Optimal for time-series data
•  End-to-end integrated Big Data solution
•  Fine-grained protection of access and data using role-based permissions
•  Data retention and aging controls
•  Users can submit “MapReduce” jobs without needing to know how to code a job
•  Splunk Search Language vs. Pig/Sawzall
•  But why not get the best of both worlds?
    –  Splunk Hadoop Ops
    –  Splunk Hadoop Connect
    –  Shuttl (archiving to HDFS / S3)

Splunk and Apache Hadoop MR/HDFS

10

•  Searching and Reporting (Search Head)

•  Indexing and Search Services (Indexer)

•  Local and Distributed Management (Deployment Server)

•  Data Collection and Forwarding (Forwarder)

Splunk Has Four Primary Functions

11  

A  Splunk  install  can  be  one  or  all  roles…    

Agent and Agent-less Approach for Flexibility.

Getting Data into Splunk

12  

Agent-less Data Input
•  perf, shell, code
•  Mounted file systems (\\hostname\mount)
•  syslog (TCP/UDP)
•  WMI: event logs, performance
•  Active Directory
•  Sources: syslog-compatible hosts and network devices; Unix, Linux and Windows hosts; Windows hosts; custom apps and scripted API connections

Splunk Forwarder
•  Local file monitoring: log files, config files, dumps and trace files
•  Windows inputs: event logs, performance counters, registry monitoring, Active Directory monitoring
•  Scripted inputs: shell scripts, custom parsers, batch loading
•  Hosts: virtual host, Windows hosts

•  Delivers secure, distributed, real-time universal data collection for tens of thousands of endpoints

•  Extends Splunk data fabric to large scale private cloud and desktop environments

•  Uses minimal system resources, easy to install and deploy

–  < half memory and footprint of Splunk 4.1; <1% of single core

Universal Data Forwarder

Scripts  

Universal  Forwarder  Deployment  

Logs   Configurations   Messages   Metrics

Central  Deployment  Management  

13

Forward data without negatively impacting production performance.

Monitor  files,  changes  and  the  system  registry;  capture  metrics  and  status.  

Load balanced search and indexing for massive, linear scale out.

Horizontal Scaling

14  

Forwarder      Auto  Load  Balancing  

Distributed  Search  

Index and store locally. Distribute searches to datacenters, networks & geographies.

Multiple Datacenters

15  

Headquarters  

London   Hong  Kong   Tokyo   New  York  

Distributed Search

Problem Investigation

Service  Desk  

Event  Console  

SIEM  

Route raw data in real time or send alerts based on searches.

Send Data to Other Systems

High Availability / DR

17

Combine auto load balancing and data replication.

Splunk  Forwarders  Auto  Load  Balancing  

Distributed  Search  

Primary  Cluster   Secondary  Cluster  Data  Clone  

Extend search with lookups to external data sources.

Integrate External Data

18  

LDAP,  AD   Watch    Lists  

CRM/ERP  

CMDB  

Correlate IP addresses with locations, accounts with regions

Integrate authentication with LDAP and Active Directory.

Integrate Users and Roles

19  

Problem Investigation   Problem Investigation   Problem Investigation

Save  Searches  

Share  Searches  

LDAP,  AD    Users  and  Groups  

Splunk  Flexible  Roles  

Manage  Users  

Manage  Indexes  

Capabilities & Filters

NOT  tag=PCI  

App=ERP   …  

Map LDAP & AD groups to flexible Splunk roles. Define any search as a filter.

Groups, Stacks, and Pools for Enterprise Deployments.

Centralized Licensing Management

20  

Problem Investigation

Keep Tabs On Your Splunk Enterprise Deployment. Deployment Monitoring

21  

Forwarders  Indexers  Sourcetypes  Licenses  

Real-time Search

22  

[Diagram: real-time search data flow]
•  Inputs: Monitor Input, TCP/UDP Input, Scripted Input
•  Stages: Data, Parsing Queue, Parsing Pipeline, Index Queue, Indexing Pipeline, Real-time Buffer, Index (raw data, index files), Real-time Search Process
•  Parsing Pipeline: source and event typing; character set normalization; line breaking; timestamp identification; regex transforms

Real-time Alerting

23  

[Diagram: real-time alerting data flow]
•  Inputs: Monitor Input, TCP/UDP Input, Scripted Input
•  Stages: Data, Parsing Queue, Parsing Pipeline, Index Queue, Indexing Pipeline, Real-time Buffer, Index (raw data, index files), Real-time Search Process
•  Parsing Pipeline: source and event typing; character set normalization; line breaking; timestamp identification; regex transforms
•  Example alert search: source="/var/log/secure.log" "BAD SU"

New Approach to Heterogeneous Data

24  

Universal Indexing
•  No data normalization
•  Automatically handles timestamps
•  Parsers not required
•  Index every term & pattern “blindly”
•  No attempt to “understand” up front

Search-time Knowledge
•  Knowledge applied at search-time
•  No brittle schema to work around
•  Multiple views into the same data
•  Splunk helps find transactions, patterns and trends
•  Normalization as it’s needed

Flexibility and Fast Time to Value
•  Faster implementation
•  Easy search language
•  Multiple views into the same data
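To make "knowledge applied at search-time" concrete, here is a minimal Java sketch of the idea: the raw event is stored untouched and fields are only derived when it is read. This is a toy illustration, not how Splunk implements extraction; the class name `SearchTimeFields`, the regular expression and the sample event are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a toy "late-binding schema", not Splunk's implementation.
class SearchTimeFields {

    // Matches key=value pairs; a value is either a double-quoted string or a bare token.
    private static final Pattern KEY_VALUE =
            Pattern.compile("(\\w+)=(\"([^\"]*)\"|\\S+)");

    // Derives fields from one raw event at read time; the raw text is never modified.
    static Map<String, String> extract(String rawEvent) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = KEY_VALUE.matcher(rawEvent);
        while (m.find()) {
            // Group 3 is set only for quoted values; otherwise use the bare token.
            String value = m.group(3) != null ? m.group(3) : m.group(2);
            fields.put(m.group(1), value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String raw = "2012-10-16 09:15:02 level=ERROR user=\"jane doe\" status=500";
        // prints {level=ERROR, user=jane doe, status=500}
        System.out.println(extract(raw));
    }
}
```

Because nothing is decided at write time, a different pattern can be applied to the same stored events later, which is the "multiple views into the same data" point.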

Inside Universal Indexing

25  

...enable accurate searching and trending by time across all data:

Automatic event boundary identification

Automatic timestamp normalization

Inside Search-time Knowledge Extraction

26  

Automatically discovered fields and user-defined fields

... enable statistics and precise search on specific fields:

Inside Search-time Knowledge Extraction

27  

Searches saved as event types

Plus tagging of event types, hosts and other fields

... enable normalized reporting, knowledge sharing and granular access control.

Splunk for Developers

28

Accelerate development & testing Integrate data from Splunk into your existing IT environment for operational visibility Build custom solutions to deliver real-time business insights from Big Data

Splunk  &  Developers  

29

REST API

Custom/Existing

Applications

SDKs Search, chart and graph

Save and schedule searches as alerts Export search results

Manage inputs and indexes Add & remove users and roles

SplunkUI (Splunk Apps)

Machine  Data  

Engine

•  Over 1,000 unique visitors per week to dev.splunk.com •  Over 500 followers on Twitter @splunkdev •  Over 350 enterprise developer trial licenses granted

Splunk in the Developer Community

Accelerate development & testing

•  Splunk frees you from upfront database design for analytics •  late binding schema

•  Developers and QA/test engineers don’t have to ask IT/Ops to get logs off machines

•  Role-based access to all data within one console without having to log into production systems

•  All events are indexed and accessible in real-time in one place. •  Ad-Hoc real-time monitoring and historical investigation searchable from one place •  Correlations and insights across multiple tiers.

•  Splunk lets you find issues quickly, so you can fix issues quickly •  Integrate Splunk search results into testing assertions

How does Splunk Accelerate Dev/Test?

32

StubHub & Splunk

33

“Splunk filled a vacuum we didn’t know we had.” - Nathan Pratt, Tech Lead, Tools & Automation, StubHub

Engineering uses Splunk to investigate bugs QA uses it during dev cycles

High-level view of application errors - used by site operations, engineering, and upper management

•  Started with Site Operations to resolve issues

•  Grew to engineers, QA, upper management in technology

•  Release requirement – Projects are required to certify that all logs are Splunk-friendly

Integrate Splunk into your IT environment

splunkd REST API

Splunk UI (Splunk Apps)

Your application

SDKs

The Splunk development platform is optimized for core enterprise developer skills

REST API communicates directly with a Splunk instance for search, management and admin
•  Provides full control to the developer
•  Use any language or tool that supports HTTP

SDKs provide broad coverage of the REST API in popular languages
•  Log directly to Splunk from any app
•  Build a UI on any web stack
•  Integrate into existing infrastructure

Integration into existing IT tools

35  

•  Exposes an API method for every feature in the product •  Whatever you can do in the UI – you can do through the API. •  Run searches •  Manage Splunk configurations

•  API is RESTful •  Endpoints are served by splunkd •  Requests are GET, POST, and DELETE HTTP methods •  Responses are Atom XML Feeds

•  JSON coming in 5.0 •  Search results can be output in CSV/JSON/XML/Raw

Splunk REST API

36
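As a rough illustration of "use any language or tool that supports HTTP", the sketch below builds (but does not send) the POST that would create a search job on splunkd. It assumes Java 11+ for `java.net.http`, the management port 8089 used elsewhere in this deck, the `services/search/jobs` endpoint, and a session key obtained from an earlier login. The class name `SplunkRestSketch` and the placeholder host and key are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Sketch only: shows the shape of a REST call, it does not contact a server.
class SplunkRestSketch {

    // Form-encodes the search string as the body of the POST.
    static String formBody(String searchQuery) {
        return "search=" + URLEncoder.encode(searchQuery, StandardCharsets.UTF_8);
    }

    // Builds the request that would create a search job on splunkd.
    static HttpRequest createSearchJob(String host, int port, String sessionKey, String searchQuery) {
        return HttpRequest.newBuilder()
                .uri(URI.create("https://" + host + ":" + port + "/services/search/jobs"))
                // The session key from login travels in the Authorization header.
                .header("Authorization", "Splunk " + sessionKey)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(searchQuery)))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder host and session key, for illustration only.
        HttpRequest request = createSearchJob("somehost", 8089, "PLACEHOLDER_SESSION_KEY",
                "search index=spring error OR exception | head 10");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request would additionally need an `HttpClient` that trusts the server's certificate; the SDKs wrap these details.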

•  We want to make it as easy as possible for developers to build Big Data apps on top of the Splunk platform
•  Several different language offerings, Software Development Kits (SDKs)
    –  JavaScript, Java, Python, PHP, C# (private), Ruby (private)
•  All Splunk functionality is accessible via our SDKs
    –  Get data into Splunk
    –  Execute Splunk searches, get data out of Splunk
    –  Manage Splunk
    –  Customized user interfaces

Developer Platform SDKs

37

Comcast & Splunk

38

Content browsed, purchased and watched

All tracked by time and MAC address

Customer profile and MAC address / device assignments

+

Correlate usage and profile data to analyze customer behavior: •  Revenues driven by content browsed •  Improving local content mix •  Better search results •  Tailor content promotion

Bosch & Splunk

39

Healthcare Management

Evidence-based Telehealth

Cardiac Rhythm Monitoring

Splunking data sent from ARM-based devices
•  Uses the Java SDK to send data to Splunk

Splunk as an integrated, enterprise-ready Big Data platform

•  No need to write MapReduce jobs, just get data into Splunk and analyze

•  Splunk delivers real-time insight – like clickstream analysis, IT early-warning systems, security and fraud protection

•  Late-binding schema allows for faster, more flexible data insight gathering

•  Data collection is integrated
•  Distributed architecture offers scale-out capabilities with access control
•  Out-of-the-box reporting and analytics capabilities
•  SDKs cover over 170 REST API endpoints

Splunk = Integrated, Enterprise-ready Big Data Platform

41  

Socialize & Splunk

42

“Splunk eliminates the need to write large MapReduce jobs to get meaningful information out of our data. This means we can get powerful stats and information to our key stakeholders in a fraction of the time.” - Isaac Mosquera, CTO, Socialize

•  Splunkweb has rich, but sometimes limited, visualization options

•  You can use the SDKs to extract data from Splunk using a search, and visualize it

•  Real-time searches can be especially powerful
•  Using the JavaScript SDK you can integrate with third-party charting libraries like Google Charts and D3

Visualizing Splunk with the SDKs

43

•  Twitter feeds being “firehosed” into Splunk and searched over in real time
•  Uses the Splunk JavaScript SDK to stream the real-time search results from Splunk into a totally customized web-based user interface
•  Visualization of most popular hashtags with interactive pie chart, word cloud and geo heatmap using D3

Realtime Twitter Visualization Demo

45

Javascript SDK Browser

Realtime Twitter Demo

46

Splunk Java SDK(Software Development Kit)

47

•  Open sourced under the Apache v2.0 license
•  Clone from GitHub: git clone https://github.com/splunk/splunk-sdk-java.git
•  Project-level support for Eclipse and IntelliJ IDEs
•  Prerequisites
    –  JRE 6+
    –  Ant (Maven support is in the works)
    –  Splunk installed
•  Loads of code examples
    –  Project examples folder
    –  Unit tests
    –  http://dev.splunk.com
    –  http://gist.github.com/damiendallimore
•  Comprehensive coverage of the REST API

Get the Java SDK

48

Java SDK Class Model

49

Service

Resource

ResourceCollection Entity

EntityCollection Application Index

HTTPService

Input

InputCollection SavedSearchCollection

•  Collections use a common mechanism to create and remove entities •  Entities use a common mechanism to retrieve and update property values, and access entity metadata •  Service is a wrapper that facilitates access to all Splunk REST endpoints

•  Connect and Authenticate •  Manage •  Input Events •  Search

Key Java SDK Use cases

50

Connect and Authenticate

51

public static Service connectAndLoginToSplunkExample() {
    Map<String, Object> connectionArgs = new HashMap<String, Object>();
    connectionArgs.put("host", "somehost");
    connectionArgs.put("username", "spring");
    connectionArgs.put("password", "integration");
    connectionArgs.put("port", 8089);
    connectionArgs.put("scheme", "https");
    // will login and save the session key which gets put in the HTTP Authorization header
    Service splunkService = Service.connect(connectionArgs);
    return splunkService;
}

Manage

52

public static void getServerInfoExample() {
    Service splunkService = connectAndLoginToSplunkExample();
    ServiceInfo info = splunkService.getInfo();
    System.out.println("Info:");
    for (String key : info.keySet())
        System.out.println(" " + key + ": " + info.get(key));
    Entity settings = splunkService.getSettings();
    System.out.println("\nSettings:");
    for (String key : settings.keySet())
        System.out.println(" " + key + ": " + settings.get(key));
}

Input Events

53

public static void logEventToSplunkExample() {
    Service splunkService = connectAndLoginToSplunkExample();
    // Get a Receiver object
    Receiver receiver = splunkService.getReceiver();
    // Set the source and sourcetype
    Args logArgs = new Args();
    logArgs.put("source", "http-rest");
    logArgs.put("sourcetype", "spring-example");
    // Log an event into the spring index
    receiver.log("spring", logArgs, "SpringOne 2GX rocks");
}

•  Other Input transports •  HTTP REST Streaming •  Raw TCP Oneshot & Streaming •  Raw UDP & Syslog
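For the raw TCP transport, no SDK is strictly required: an event is just a line of text written to a socket. The sketch below uses only the JDK and assumes a raw TCP input has already been configured on the Splunk side. The class name `RawTcpEventSender` and the host, port and event text are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sketch only: one connection per event, which is simple but not efficient.
class RawTcpEventSender {

    // Opens a connection, writes one newline-terminated event, then closes the socket.
    static void sendEvent(String host, int port, String event) throws IOException {
        try (Socket socket = new Socket(host, port);
             OutputStream out = socket.getOutputStream()) {
            out.write((event + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: RawTcpEventSender <host> <port>");
            return;
        }
        // A timestamp followed by key=value pairs keeps the event easy to search.
        sendEvent(args[0], Integer.parseInt(args[1]),
                "2012-10-16 09:15:02 level=INFO msg=\"SpringOne 2GX rocks\"");
    }
}
```

A streaming variant would keep the socket open and write many events over it instead of reconnecting each time.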

Search

54

•  Search query
    –  a set of commands and functions you use to retrieve events from an index or a real-time stream, e.g. "search index=spring error OR exception | head 10"
•  Saved search
    –  a search query that has been saved to be used again and can be set up to run on a regular schedule
•  Search job
    –  an instance of a completed or still-running search operation. Using a search ID you can access the results of the search when they become available. Job results are saved for a period of time on the server and can be retrieved
•  Search modes
    –  Normal: asynchronous, poll job for status and results
    –  Realtime: same as normal, but the stream is kept open and results are streamed in real time
    –  Blocking: synchronous, a job handle is returned when the search is completed
    –  Oneshot: synchronous, no job handle is returned, results are streamed
    –  Export: synchronous, not a search per se, doesn’t create a job, results are streamed oldest to newest

Blocking Searches

55

public static void exportSearchExample() {
    Service splunkService = connectAndLoginToSplunkExample();
    String searchQuery = "search error OR exception | head 10";
    Args queryArgs = new Args();
    queryArgs.put("earliest_time", "-1d@d");
    queryArgs.put("latest_time", "now");
    // perform the export, blocks here
    InputStream stream = splunkService.export(searchQuery, queryArgs);
    processInputStream(stream);
}

public static void simpleSearchExample() {
    Service splunkService = connectAndLoginToSplunkExample();
    String searchQuery = "search error OR exception | head 10";
    Args queryArgs = new Args();
    queryArgs.put("earliest_time", "-3d@d");
    queryArgs.put("latest_time", "-1d@d");
    // perform the search, blocks here
    InputStream stream = splunkService.search(searchQuery, queryArgs);
    processInputStream(stream);
}

Non Blocking Search

56

public static void searchJobExample() {
    Service splunkService = connectAndLoginToSplunkExample();
    String outputMode = "csv"; // xml, json, csv
    // submit the job
    Job job = splunkService.getJobs().create("search index=spring error OR fatal | head 10");
    // poll until the job has finished
    while (!job.isDone()) {
        try { Thread.sleep(500); } catch (Exception e) {}
    }
    Args outputArgs = new Args();
    outputArgs.put("output_mode", outputMode);
    InputStream stream = job.getResults(outputArgs);
    processInputStream(stream, outputMode); // uses xml stream, opencsv and gson
}

Realtime Search

57

public static void realTimeSearchExample() {
    Service splunkService = connectAndLoginToSplunkExample();
    Args queryArgs = new Args();
    queryArgs.put("earliest_time", "rt-5m");
    queryArgs.put("latest_time", "rt");
    // submit the job
    Job job = splunkService.getJobs().create("search index=spring exception OR error", queryArgs);
    …
}

Scala Groovy Clojure

Javascript(Rhino) JRuby PHP(Quercus)

Ceylon Kotlin Jython

Alternate JVM Languages

58

We don’t need SDKs for these languages, we can just use the Java SDK!

Groovy

59

class SplunkJavaSDKWrapper {
    static main(args) {
        // connect and login
        def connectionParameters = [host: "somehost", username: "spring", password: "integration"]
        Service service = Service.connect(connectionParameters)
        // get Splunk Server info
        ServiceInfo info = service.getInfo()
        def splunkInfo = [:]
        for (key in info.keySet())
            splunkInfo.put(key, info.get(key))
        printSplunkInfo(splunkInfo)
    }

    static printSplunkInfo(splunkInfo) {
        println "Info"
        splunkInfo.each { key, value -> println key + " : " + value }
    }
}

import com.splunk.Service._
import scala.collection.mutable.HashMap
import scala.collection.JavaConversions._

object SplunkJavaSDKWrapper {
    def main(args: Array[String]) = {
        // connect and login
        val connectionArgs = HashMap[String, Object]("host" -> "somehost", "username" -> "me", "password" -> "foo")
        val service = connect(connectionArgs)
        // get Splunk Server info
        val info = service.getInfo
        // Scala/Java conversion
        val javaSet = info.keySet
        val scalaSet = javaSet.toSet
        // print out Splunk Server info
        for (key <- scalaSet)
            println(key + ":" + info.get(key))
    }
}

Scala

60

Spring Integration Splunk Extensions

61

Special thanks to Jianwei Li(Jarred) & Mark Pollack for creating this !

•  Spring Integration is an extension to core Spring •  Based on “Enterprise Integration Patterns” model •  Messaging model and Declarative Adaptors •  Makes it easier to build integration solutions

Spring Integration

62

•  Splunk Java SDK makes it easier to use the REST API
•  Building on this, the Spring Integration Adaptors make it easier for Spring/Java developers to declaratively build data integration solutions and utilize the power of the Splunk platform
•  https://github.com/SpringSource/spring-integration-extensions
•  Inbound Adaptor
    –  Search and export the data from Splunk and push into message channels
    –  Filter, transform, export to other destinations
•  Outbound Adaptor
    –  Can consume data acquired by other Integration adaptors (Twitter, JDBC…) and push it into Splunk for indexing, searching and visualization

Spring Integration Splunk Adaptors

63

Spring Integration Splunk Inbound Adaptor

64

•  Blocking, Non Blocking, Saved & Realtime Searches •  Exporting

Spring Integration Splunk Outbound Adaptor

65

•  HTTP REST Input •  TCP Input

XML Configuration

66

<int-splunk:server id="splunkServer" host="somehost" port="8089"
        userName="damien" password="foobar"/>

<int-splunk:inbound-channel-adapter id="splunkInboundChannelAdapter"
        auto-startup="true"
        search="search index=spring error OR exception"
        splunk-server-ref="splunkServer"
        channel="inputFromSplunk"
        mode="blocking"
        initEarliestTime="-1d">
    <int:poller fixed-rate="5" time-unit="SECONDS"/>
</int-splunk:inbound-channel-adapter>

<int-splunk:outbound-channel-adapter id="splunkOutboundChannelAdapter"
        auto-startup="true"
        order="1"
        channel="outputToSplunkWithMessageStore"
        splunk-server-ref="splunkServer"
        pool-server-connection="true"
        index="spring"
        sourceType="twitter-feed"
        source="spring-integration-httprest"
        ingest="submit">
</int-splunk:outbound-channel-adapter>

Common Splunk settings

Searching/exporting from Splunk

Inputting events to Splunk

Spring Integration Splunk Twitter Demo

67

SplunkJavaLogging

68

•  A logging framework that lets developers integrate Splunk best-practice logging semantics into their code as seamlessly as possible and transport events directly to Splunk.
•  Custom handler/appender implementations (REST and raw TCP) for the 3 most prevalent Java logging frameworks in play. Splunk events directly from your code.
    –  LogBack
    –  Log4j
    –  java.util.logging
•  Better handling of stacktraces
•  All code and examples are on GitHub

SplunkJavaLogging

69
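The slides do not spell out what "best practice logging semantics" look like. One common reading is a timestamp followed by key=value pairs, with any exception kept on the same line as its event. The sketch below shows that idea as a plain `java.util.logging` formatter. It is not SplunkJavaLogging's code; the class name `KeyValueFormatter` and the field names are assumptions for illustration.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Sketch only: renders each log record as one line of key=value pairs.
class KeyValueFormatter extends Formatter {

    @Override
    public String format(LogRecord record) {
        String message = formatMessage(record);
        if (message == null) {
            message = "";
        }
        StringBuilder event = new StringBuilder();
        event.append(record.getInstant()); // ISO-8601 timestamp comes first
        event.append(" level=").append(record.getLevel().getName());
        event.append(" logger=\"").append(record.getLoggerName()).append('"');
        event.append(" message=\"").append(escape(message)).append('"');
        Throwable thrown = record.getThrown();
        if (thrown != null) {
            // Keep the exception on the same line so the event is not split up.
            event.append(" exception=\"").append(escape(thrown.toString())).append('"');
        }
        return event.append('\n').toString();
    }

    // Escapes backslashes, quotes and newlines so each value stays on one line.
    private static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("com.example.Checkout");
        logger.setUseParentHandlers(false);
        ConsoleHandler handler = new ConsoleHandler();
        handler.setFormatter(new KeyValueFormatter());
        logger.addHandler(handler);
        logger.log(Level.SEVERE, "payment failed", new IllegalStateException("card declined"));
    }
}
```

Swapping the `ConsoleHandler` for a handler that writes to a REST or TCP endpoint is the part the SplunkJavaLogging appenders take care of.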

Splunk for JMX

70

•  SplunkBase App for monitoring JVM applications
•  Out-of-the-box dashboards for JVM-level monitoring (java.lang domain)
    –  Memory, threading, GC, CPU etc.
•  Very simple configuration to wire up monitoring of any MBeans from applications (Tomcat, JBoss, Cassandra, Coherence etc.)
•  HotSpot, JRockit, IBM J9, OpenJDK
•  Poll JMX attributes and operations, index data over time, correlate with other data
•  Supports large-scale deployments of JVMs
•  Extensible and customizable
•  Many connectivity options
    –  RMI, IIOP
    –  Direct process attachment
    –  MX4J Hessian, Burlap and SOAP
•  Freely available download from SplunkBase; all code is on GitHub

Splunk for JMX

71
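To give a feel for "poll JMX attributes ... index data over time", the sketch below reads a few java.lang domain values from the local JVM and renders them as one key=value event per poll. It is not the Splunk for JMX app, which connects to remote JVMs over the transports listed above. The class name `JvmMetricsPoller`, the field names and the polling interval are assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

// Sketch only: polls the JVM it runs in rather than a remote JMX endpoint.
class JvmMetricsPoller {

    // Reads a few java.lang domain attributes and renders them as one event.
    static String snapshot() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return "source=jmx"
                + " heapUsed=" + heap.getUsed()
                + " heapCommitted=" + heap.getCommitted()
                + " threadCount=" + threads.getThreadCount();
    }

    public static void main(String[] args) throws InterruptedException {
        // Three polls, half a second apart; each line could be sent on as an event.
        for (int i = 0; i < 3; i++) {
            System.out.println(System.currentTimeMillis() + " " + snapshot());
            Thread.sleep(500);
        }
    }
}
```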

72

At SpringOne 2GX : •  Come by our booth

•  Splunk demos ,Q & A •  SDK code

•  Tee Shirts !!

Web : •  Developer Platform : http://dev.splunk.com •  SplunkBase : http://splunk-base.splunk.com •  Twitter : @splunkdev , @damiendallimore •  Email : devinfo@splunk.com , ddallimore@splunk.com •  Blog : http://blogs.splunk.com/dev •  Github : http://github.com/splunk •  Splunk Live! Events and Online Videos at http://www.splunk.com

Learn More. Stay Connected.

Thanks for coming.

73