A Big Data Platform for Developers
Damien Dallimore
Developer Evangelist at Splunk
© 2012 SpringOne 2GX. All rights reserved. Do not distribute without permission.
• Developer Evangelist at Splunk since July 2012
• Splunk Community Member
• Splunk for JMX
• SplunkJavaLogging
• SplunkBase – Apps and Answers
• Splunk Architect and Administrator
• Coder
• Been paying my mortgage developing Enterprise Java solutions most of my career
• Kia Ora
• I do not have a speech impediment, I am from Aotearoa, so please restrain all your sheep, Lord of the Rings and Kim Dotcom heckles until beer o’clock!
About me
2
• Overview of the Splunk platform
• Splunk for Developers
• Custom Visualization Demo
• Splunk Java SDK
• Spring Integration Splunk Extensions
• Integration Adaptors Demo
• Some other JVM/Java related tools
– SplunkJavaLogging
– Splunk for JMX
• Questions
Agenda
3
What is
• Splunk is an engine for machine data
• Provides visibility, reporting and search across all your IT systems and infrastructure
• Doesn’t lock you into a fixed schema
So What is Splunk, Exactly?
5
• It’s software – download and install it in 5 minutes, “freemium” model
• Runs on all modern platforms
• Open and extensible architecture
• Capture events from logs in real time
• Run scripts to gather system metrics, connect to APIs and databases
• Listen to syslog, raw TCP/UDP, gather Windows events
• Universally indexes any data format so it doesn’t need adapters, “schema on the fly”
• Stream in data directly from your application code
• Decode binary data and feed it in
Indexes any Machine Data
6
Windows: Registry, Event logs, File system, sysinternals
Linux/Unix: Configurations, Syslog, File system, ps, iostat, top
Virtualization: Hypervisor, Guest OS, Guest Apps
Applications: Web logs, Log4J, JMS, JMX, .NET events, Code and scripts
Databases: Configurations, Audit/query logs, Tables, Schemas
Network: Configurations, syslog, SNMP, netflow
Centralizes Data Across the Environment
7
Indexing/Search Server
Splunk Forwarders
• Splunk Universal Forwarder sends data to Splunk Indexer from remote systems
• Uses minimal system resources, easy to install and deploy
• Delivers secure, distributed, real-time universal data collection for tens of thousands of endpoints
Scales to TBs/day and Thousands of Users
8
• Automatic load balancing linearly scales indexing
• Distributed search and MapReduce linearly scales search and reporting
Provides Strong Machine Data Governance
9
• Provides comprehensive controls for data security, retention and integrity
• Single sign-on integration enables pass-through authentication of user credentials
• Splunk is an implementation of the MapReduce algorithmic approach
• It is not Apache Hadoop MapReduce (MR) the product
• Splunk is not agnostic of its underlying data source; it is optimized for Splunk index files
• Real time vs batch jobs
• Optimal for time series based data
• End to end integrated Big Data solution
• Fine grained protection of access and data using role based permissions
• Data retention and aging controls
• Users can submit “Map Reduce” jobs without needing to know how to code a job
• Splunk Search Language vs Pig/Sawzall
• But why not get the best of both worlds
• Splunk Hadoop Ops
• Splunk Hadoop Connect
• Shuttl (archiving to HDFS / S3)
Splunk and Apache Hadoop MR/HDFS
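The MapReduce idea mentioned above can be sketched in a few lines of plain Java. This is only an illustration of the general approach (partial counts computed next to the data, then merged centrally), not a description of Splunk internals. The class name, the method names and the "status=" event format are assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class MapReduceSketch {
    // "Map" phase: each indexer counts events per status over its own local slice of data.
    static Map<String, Long> mapPhase(List<String> localEvents) {
        return localEvents.stream()
                .collect(Collectors.groupingBy(MapReduceSketch::statusOf, Collectors.counting()));
    }

    // "Reduce" phase: the search head merges the partial counts from all indexers.
    static Map<String, Long> reducePhase(List<Map<String, Long>> partials) {
        Map<String, Long> merged = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, count) -> merged.merge(key, count, Long::sum));
        }
        return merged;
    }

    // Hypothetical event format: "<timestamp> status=<code> ...".
    static String statusOf(String event) {
        for (String token : event.split("\\s+")) {
            if (token.startsWith("status=")) {
                return token.substring("status=".length());
            }
        }
        return "unknown";
    }
}
```

In Splunk itself the equivalent work is expressed in the search language (for example a stats count by a field), so the user never writes this code by hand.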
10
• Searching and Reporting (Search Head)
• Indexing and Search Services (Indexer)
• Local and Distributed Management (Deployment Server)
• Data Collection and Forwarding (Forwarder)
Splunk Has Four Primary Functions
11
A Splunk install can be one or all roles…
Agent and Agent-less Approach for Flexibility.
Getting Data into Splunk
12
Agent-less Data Input
• syslog (TCP/UDP): syslog compatible hosts and network devices
• Mounted File Systems (\\hostname\mount): Unix, Linux and Windows hosts
• WMI (Event Logs, Performance, Active Directory): Windows hosts
• Custom apps and scripted API connections (perf, shell, code)
Splunk Forwarder
• Local File Monitoring: log files, config files, dumps and trace files
• Windows Inputs: Event Logs, performance counters, registry monitoring, Active Directory monitoring (Windows hosts, virtual host)
• Scripted Inputs: shell scripts, custom parsers, batch loading
• Delivers secure, distributed, real-time universal data collection for tens of thousands of endpoints
• Extends Splunk data fabric to large scale private cloud and desktop environments
• Uses minimal system resources, easy to install and deploy
– < half memory and footprint of Splunk 4.1; <1% of single core
Universal Data Forwarder
Scripts
Universal Forwarder Deployment
Logs Configurations Messages Metrics
Central Deployment Management
13
Forward data without negatively impacting production performance.
Monitor files, changes and the system registry; capture metrics and status.
Load balanced search and indexing for massive, linear scale out.
Horizontal Scaling
14
Forwarder Auto Load Balancing
Distributed Search
Index and store locally. Distribute searches to datacenters, networks & geographies.
Multiple Datacenters
15
Headquarters
London Hong Kong Tokyo New York
Distributed Search
Problem Investigation
Service Desk
Event Console
SIEM
Route raw data in real time or send alerts based on searches.
Send Data to Other Systems
High Availability / DR
17
Combine auto load balancing and data replication.
Splunk Forwarders Auto Load Balancing
Distributed Search
Primary Cluster Secondary Cluster Data Clone
Extend search with lookups to external data sources.
Integrate External Data
18
LDAP, AD Watch Lists
CRM/ERP
CMDB
Correlate IP addresses with locations, accounts with regions
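A lookup enriches each event with fields from an external table at search time. The sketch below shows the idea in plain Java, assuming events are already available as field maps and the external source is a simple IP-to-location table. The class name, field names and the "unknown" default are hypothetical choices for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LookupSketch {
    // Returns a copy of the event with a "location" field added,
    // looked up from the external table by the event's "ip" field.
    static Map<String, String> enrich(Map<String, String> event, Map<String, String> ipToLocation) {
        Map<String, String> enriched = new LinkedHashMap<>(event);
        String ip = event.get("ip");
        if (ip != null) {
            enriched.put("location", ipToLocation.getOrDefault(ip, "unknown"));
        }
        return enriched;
    }
}
```

The original event is left untouched, which mirrors the point that the indexed data is not rewritten when a lookup is applied.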
Integrate authentication with LDAP and Active Directory.
Integrate Users and Roles
19
Problem Investigation
Save Searches
Share Searches
LDAP, AD Users and Groups
Splunk Flexible Roles
Manage Users
Manage Indexes
Capabilities & Filters
NOT tag=PCI
App=ERP …
Map LDAP & AD groups to flexible Splunk roles. Define any search as a filter.
Groups, Stacks, and Pools for Enterprise Deployments.
Centralized Licensing Management
20
Problem Investigation
Keep Tabs On Your Splunk Enterprise Deployment. Deployment Monitoring
21
Forwarders Indexers Sourcetypes Licenses
Real-time Search
22
Data inputs: Monitor Input, TCP/UDP Input, Scripted Input
Parsing Queue
Parsing Pipeline
• Source, event typing
• Character set normalization
• Line breaking
• Timestamp identification
• Regex transforms
Index Queue
Indexing Pipeline
Index: raw data and index files
Real-time Buffer
Real-time Search Process
Real-time Alerting
23
Data inputs: Monitor Input, TCP/UDP Input, Scripted Input
Parsing Queue
Parsing Pipeline
• Source, event typing
• Character set normalization
• Line breaking
• Timestamp identification
• Regex transforms
Index Queue
Indexing Pipeline
Index: raw data and index files
Real-time Buffer
Real-time Search Process
source="/var/log/secure.log" "BAD SU"
New Approach to Heterogeneous Data
24
Universal Indexing
• No data normalization
• Automatically handles timestamps
• Parsers not required
• Index every term & pattern “blindly”
• No attempt to “understand” up front
Search-time Knowledge
• Knowledge applied at search-time
• No brittle schema to work around
• Multiple views into the same data
• Splunk helps find transactions, patterns and trends
• Normalization as it’s needed
Flexibility and Fast Time to Value
• Faster implementation
• Easy search language
• Multiple views into the same data
Inside Universal Indexing
25
...enable accurate searching and trending by time across all data:
Automatic event boundary identification
Automatic timestamp normalization
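Timestamp normalization means that differently formatted timestamps end up on one common timeline. The following sketch tries a few known formats in turn and converts each to an Instant. It only illustrates the concept and is not how Splunk does it. The class name, the list of formats and the choice of UTC for timestamps without a zone are assumptions.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Optional;

final class TimestampSketch {
    // Web access log style, e.g. 16/Oct/2012:14:03:22 +0000
    private static final DateTimeFormatter ACCESS_LOG =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);
    // Plain style without a zone, e.g. 2012-10-16 14:03:22 (assumed to be UTC here)
    private static final DateTimeFormatter PLAIN =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss", Locale.ENGLISH);

    // Tries each known format in turn; returns empty if none matches.
    static Optional<Instant> normalize(String text) {
        try {
            return Optional.of(Instant.parse(text)); // ISO-8601, e.g. 2012-10-16T14:03:22Z
        } catch (DateTimeParseException ignored) {
            // fall through to the next format
        }
        try {
            return Optional.of(OffsetDateTime.parse(text, ACCESS_LOG).toInstant());
        } catch (DateTimeParseException ignored) {
            // fall through to the next format
        }
        try {
            return Optional.of(LocalDateTime.parse(text, PLAIN).toInstant(ZoneOffset.UTC));
        } catch (DateTimeParseException ignored) {
            return Optional.empty();
        }
    }
}
```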
Inside Search-time Knowledge Extraction
26
Automatically discovered fields
And user-defined fields
... enable statistics and precise search on specific fields:
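Search-time extraction means the raw event is stored as-is and fields are pulled out only when a search needs them. A minimal sketch of discovering key=value pairs in a raw event is shown here. The regular expression and the class name are assumptions for illustration, and real extraction rules are considerably richer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SearchTimeExtractionSketch {
    // Matches key=value or key="quoted value".
    private static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=(\"[^\"]*\"|\\S+)");

    // Extracts fields from the raw text at read time; nothing is decided up front.
    static Map<String, String> extractFields(String rawEvent) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher matcher = KEY_VALUE.matcher(rawEvent);
        while (matcher.find()) {
            String value = matcher.group(2);
            // Strip surrounding quotes from quoted values.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            fields.put(matcher.group(1), value);
        }
        return fields;
    }
}
```

Because the raw text is kept, a different extraction rule can be applied to the same events later without re-indexing, which is the "multiple views into the same data" point.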
Inside Search-time Knowledge Extraction
27
Searches saved as event types
Plus tagging of event types, hosts and other fields
... enable normalized reporting, knowledge sharing and granular access control.
Splunk for Developers
28
Accelerate development & testing
Integrate data from Splunk into your existing IT environment for operational visibility
Build custom solutions to deliver real-time business insights from Big Data
Splunk & Developers
29
REST API
Custom/Existing
Applications
SDKs Search, chart and graph
Save and schedule searches as alerts Export search results
Manage inputs and indexes Add & remove users and roles
SplunkUI (Splunk Apps)
Machine Data
Engine
• Over 1,000 unique visitors per week to dev.splunk.com
• Over 500 followers on Twitter @splunkdev
• Over 350 enterprise developer trial licenses granted
Splunk in the Developer Community
Accelerate development & testing
• Splunk frees you from upfront database design for analytics
– Late binding schema
• Developers and QA/test engineers don’t have to ask IT/Ops to get logs off machines
• Role-based access to all data within one console without having to log into production systems
• All events are indexed and accessible in real time in one place
• Ad hoc real-time monitoring and historical investigation searchable from one place
• Correlations and insights across multiple tiers
• Splunk lets you find issues quickly, so you can fix issues quickly
• Integrate Splunk search results into testing assertions
How does Splunk Accelerate Dev/Test?
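Using search results in a test assertion can be as simple as failing the test when a search finds any matching events. In the sketch below the ToIntFunction stands in for whatever client call returns the event count for a query. That function, the class name and the method name are hypothetical and not part of any Splunk SDK.

```java
import java.util.function.ToIntFunction;

final class SearchAssertions {
    // Fails if the search returns any events.
    // eventCounter is a stand-in for a real client call that runs the query and counts results.
    static void assertNoEvents(ToIntFunction<String> eventCounter, String query) {
        int count = eventCounter.applyAsInt(query);
        if (count != 0) {
            throw new AssertionError(
                    "expected no events for [" + query + "] but found " + count);
        }
    }
}
```

A test run could exercise the application and then call assertNoEvents with a query such as "search index=spring error OR exception", restricted to the time window of the test.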
32
StubHub & Splunk
33
“Splunk filled a vacuum we didn’t know we had.” - Nathan Pratt, Tech Lead, Tools & Automation, StubHub
Engineering uses Splunk to investigate bugs QA uses it during dev cycles
High-level view of application errors - used by site operations, engineering, and upper management
• Started with Site Operations to resolve issues
• Grew to engineers, QA, upper management in technology
• Release requirement – Projects are required to certify that all logs are Splunk-friendly
Integrate Splunk into your IT environment
splunkd REST API
Splunk UI (Splunk Apps)
Your application
SDKs
The Splunk development platform is optimized for core enterprise developer skills
REST API communicates directly with a Splunk instance for search, management and admin
• Provides full control to the developer
• Use any language or tool that supports HTTP
SDKs provide broad coverage of the REST API in popular languages
• Log directly to Splunk from any app
• Build a UI on any web stack
• Integrate into existing infrastructure
Integration into existing IT tools
35
• Exposes an API method for every feature in the product
– Whatever you can do in the UI, you can do through the API
– Run searches
– Manage Splunk configurations
• API is RESTful
– Endpoints are served by splunkd
– Requests are GET, POST, and DELETE HTTP methods
– Responses are Atom XML feeds
– JSON coming in 5.0
– Search results can be output in CSV/JSON/XML/Raw
Splunk REST API
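Because the API is plain HTTP, a search can be driven with nothing but the JDK. The sketch below builds (but does not send) two requests: a POST to the search jobs endpoint to create a job, and a GET for that job's results in CSV. The base URL, the session key value and the class and method names are assumptions for the example. The session key is assumed to have been obtained beforehand from the login endpoint.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class RestSearchRequestSketch {
    // Form-encodes the search string for the POST body.
    static String formBody(String search) {
        return "search=" + URLEncoder.encode(search, StandardCharsets.UTF_8);
    }

    // POST /services/search/jobs creates a search job.
    static HttpRequest createSearchJob(String baseUrl, String sessionKey, String search) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/services/search/jobs"))
                .header("Authorization", "Splunk " + sessionKey)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(search)))
                .build();
    }

    // GET /services/search/jobs/{sid}/results fetches the results of a finished job.
    static HttpRequest fetchResults(String baseUrl, String sessionKey, String searchId) {
        return HttpRequest.newBuilder(
                        URI.create(baseUrl + "/services/search/jobs/" + searchId + "/results?output_mode=csv"))
                .header("Authorization", "Splunk " + sessionKey)
                .GET()
                .build();
    }
}
```

Sending the requests would be done with java.net.http.HttpClient against the splunkd management port (8089 in the examples in this deck). The SDKs wrap exactly this kind of plumbing.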
36
• We want to make it as easy as possible for developers to build Big Data apps on top of the Splunk platform
• Several different language offerings, Software Development Kits (SDKs)
– JavaScript, Java, Python, PHP, C# (private), Ruby (private)
• All Splunk functionality is accessible via our SDKs
– Get data into Splunk
– Execute Splunk searches, get data out of Splunk
– Manage Splunk
– Customized user interfaces
Developer Platform SDKs
37
Comcast & Splunk
38
Content browsed, purchased and watched
All tracked by time and MAC address
+
Customer profile and MAC address / device assignments
Correlate usage and profile data to analyze customer behavior:
• Revenues driven by content browsed
• Improving local content mix
• Better search results
• Tailor content promotion
Bosch & Splunk
39
Healthcare Management
Evidence-based Telehealth
Cardiac Rhythm Monitoring
Splunking data sent from ARM-based devices
• Uses the Java SDK to send data to Splunk
Splunk as an integrated, enterprise-ready Big Data platform
• No need to write MapReduce jobs, just get data into Splunk and analyze
• Splunk delivers real-time insight – like clickstream analysis, IT early-warning systems, security and fraud protection
• Late-binding schema allows for faster, more flexible data insight gathering
• Data collection is integrated
• Distributed architecture offers scale-out capabilities with access control
• Out-of-the-box reporting and analytics capabilities
• SDKs cover over 170 REST API endpoints
Splunk = Integrated, Enterprise-ready Big Data Platform
41
Socialize & Splunk
42
“Splunk eliminates the need to write large MapReduce jobs to get meaningful information out of our data. This means we can get powerful stats and information to our key stakeholders in a fraction of the time.” - Isaac Mosquera, CTO, Socialize
• Splunkweb has rich, but sometimes limited, visualization options
• You can use the SDKs to extract data from Splunk using a search, and visualize it
• Real-time searches can be especially powerful
• Using the JavaScript SDK you can integrate with third party charting libraries like Google Charts & D3
Visualizing Splunk with the SDKs
43
• Twitter feeds being “firehosed” into Splunk and searched over in real time
• Uses the Splunk JavaScript SDK to stream the real-time search results from Splunk into a totally customized web based user interface
• Visualization of most popular hashtags with interactive pie chart, word cloud and geo heatmap using D3
Realtime Twitter Visualization Demo
45
Javascript SDK Browser
Realtime Twitter Demo
46
Splunk Java SDK(Software Development Kit)
47
• Open sourced under the Apache v2.0 license
• Clone from GitHub: git clone https://github.com/splunk/splunk-sdk-java.git
• Project level support for Eclipse and IntelliJ IDEs
• Pre-requisites
– JRE 6+
– Ant (Maven support is in the works)
– Splunk installed
• Loads of code examples
– Project examples folder
– Unit tests
– http://dev.splunk.com
– http://gist.github.com/damiendallimore
• Comprehensive coverage of the REST API
Get the Java SDK
48
Java SDK Class Model
49
Service
Resource
ResourceCollection Entity
EntityCollection Application Index
HTTPService
Input
InputCollection SavedSearchCollection
• Collections use a common mechanism to create and remove entities
• Entities use a common mechanism to retrieve and update property values, and access entity metadata
• Service is a wrapper that facilitates access to all Splunk REST endpoints
• Connect and Authenticate
• Manage
• Input Events
• Search
Key Java SDK Use cases
50
Connect and Authenticate
51
import com.splunk.Service;
import java.util.HashMap;
import java.util.Map;

public static Service connectAndLoginToSplunkExample() {
    Map<String, Object> connectionArgs = new HashMap<String, Object>();
    connectionArgs.put("host", "somehost");
    connectionArgs.put("username", "spring");
    connectionArgs.put("password", "integration");
    connectionArgs.put("port", 8089);
    connectionArgs.put("scheme", "https");
    // logs in and saves the session key, which is then sent in the HTTP Authorization header
    Service splunkService = Service.connect(connectionArgs);
    return splunkService;
}
Manage
52
import com.splunk.Entity;
import com.splunk.Service;
import com.splunk.ServiceInfo;

public static void getServerInfoExample() {
    Service splunkService = connectAndLoginToSplunkExample();
    // server info: version, build, OS and so on
    ServiceInfo info = splunkService.getInfo();
    System.out.println("Info:");
    for (String key : info.keySet())
        System.out.println(" " + key + ": " + info.get(key));
    // server settings
    Entity settings = splunkService.getSettings();
    System.out.println("\nSettings:");
    for (String key : settings.keySet())
        System.out.println(" " + key + ": " + settings.get(key));
}
Input Events
53
import com.splunk.Args;
import com.splunk.Receiver;
import com.splunk.Service;

public static void logEventToSplunkExample() {
    Service splunkService = connectAndLoginToSplunkExample();
    // Get a Receiver object
    Receiver receiver = splunkService.getReceiver();
    // Set the source and sourcetype
    Args logArgs = new Args();
    logArgs.put("source", "http-rest");
    logArgs.put("sourcetype", "spring-example");
    // Log an event into the spring index
    receiver.log("spring", logArgs, "SpringOne 2GX rocks");
}
• Other Input transports
– HTTP REST Streaming
– Raw TCP Oneshot & Streaming
– Raw UDP & Syslog
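The raw TCP transport needs no SDK at all: an event is just a line of text written to a socket. The sketch below formats fields as key="value" pairs and writes one event per line. It assumes a raw TCP input has already been configured on the Splunk side for the given host and port. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

final class RawTcpInputSketch {
    // Formats fields as space-separated key="value" pairs, escaping embedded quotes.
    static String formatEvent(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner(" ");
        fields.forEach((key, value) ->
                joiner.add(key + "=\"" + value.replace("\"", "\\\"") + "\""));
        return joiner.toString();
    }

    // Opens a connection, writes a single newline-terminated event and closes it ("oneshot" style).
    static void send(String host, int port, String event) throws IOException {
        try (Socket socket = new Socket(host, port);
             OutputStream out = socket.getOutputStream()) {
            out.write((event + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }
}
```

For streaming, the socket would be kept open and reused across events instead of being opened per event.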
Search
54
• Search query
– a set of commands and functions you use to retrieve events from an index or a real-time stream, e.g. "search index=spring error OR exception | head 10"
• Saved search
– a search query that has been saved to be used again and can be set up to run on a regular schedule
• Search job
– an instance of a completed or still-running search operation. Using a search ID you can access the results of the search when they become available. Job results are saved for a period of time on the server and can be retrieved
• Search modes
– Normal: asynchronous; poll the job for status and results
– Realtime: same as normal, but the stream is kept open and results are streamed in real time
– Blocking: synchronous; a job handle is returned when the search is completed
– Oneshot: synchronous; no job handle is returned, results are streamed
– Export: synchronous; not a search per se, doesn’t create a job, results are streamed oldest to newest
Blocking Searches
55
import com.splunk.Args;
import com.splunk.Service;
import java.io.InputStream;

public static void exportSearchExample() {
    Service splunkService = connectAndLoginToSplunkExample();
    String searchQuery = "search error OR exception | head 10";
    Args queryArgs = new Args();
    queryArgs.put("earliest_time", "-1d@d");
    queryArgs.put("latest_time", "now");
    // perform the export, blocks here
    InputStream stream = splunkService.export(searchQuery, queryArgs);
    processInputStream(stream); // helper not shown on the slide
}

public static void simpleSearchExample() {
    Service splunkService = connectAndLoginToSplunkExample();
    String searchQuery = "search error OR exception | head 10";
    Args queryArgs = new Args();
    queryArgs.put("earliest_time", "-3d@d");
    queryArgs.put("latest_time", "-1d@d");
    // perform the search, blocks here
    InputStream stream = splunkService.search(searchQuery, queryArgs);
    processInputStream(stream); // helper not shown on the slide
}
Non Blocking Search
56
import com.splunk.Args;
import com.splunk.Job;
import com.splunk.Service;
import java.io.InputStream;

public static void searchJobExample() {
    Service splunkService = connectAndLoginToSplunkExample();
    String outputMode = "csv"; // xml, json, csv
    // submit the job
    Job job = splunkService.getJobs().create("search index=spring error OR fatal | head 10");
    // poll until the job has finished
    while (!job.isDone()) {
        try { Thread.sleep(500); } catch (Exception e) {}
    }
    Args outputArgs = new Args();
    outputArgs.put("output_mode", outputMode);
    InputStream stream = job.getResults(outputArgs);
    // helper not shown on the slide; uses an XML stream reader, opencsv and gson
    processInputStream(stream, outputMode);
}
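The polling loop above waits forever if the job never completes and silently swallows interruption. A slightly more defensive variant is sketched below, using a BooleanSupplier so that it does not depend on the SDK. The class name, method name and timeout behavior are assumptions for illustration.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class JobPollingSketch {
    // Polls isDone until it returns true or the timeout elapses.
    // Returns true if the job finished, false on timeout or interruption.
    static boolean awaitDone(BooleanSupplier isDone, Duration pollInterval, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!isDone.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

With the SDK objects from the slide this could be called as awaitDone(job::isDone, Duration.ofMillis(500), Duration.ofMinutes(1)), fetching results only when it returns true.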
Realtime Search
57
public static void realTimeSearchExample() {
    Service splunkService = connectAndLoginToSplunkExample();
    Args queryArgs = new Args();
    // real-time window covering the last 5 minutes
    queryArgs.put("earliest_time", "rt-5m");
    queryArgs.put("latest_time", "rt");
    // submit the job
    Job job = splunkService.getJobs().create("search index=spring exception OR error", queryArgs);
    …
}
Scala Groovy Clojure
Javascript(Rhino) JRuby PHP(Quercus)
Ceylon Kotlin Jython
Alternate JVM Languages
58
We don’t need SDKs for these languages, we can just use the Java SDK!
Groovy
59
import com.splunk.Service
import com.splunk.ServiceInfo

class SplunkJavaSDKWrapper {
    static main(args) {
        // connect and login
        def connectionParameters = [host: "somehost", username: "spring", password: "integration"]
        Service service = Service.connect(connectionParameters)
        // get Splunk Server info
        ServiceInfo info = service.getInfo()
        def splunkInfo = [:]
        for (key in info.keySet())
            splunkInfo.put(key, info.get(key))
        printSplunkInfo(splunkInfo)
    }

    static printSplunkInfo(splunkInfo) {
        println "Info"
        splunkInfo.each { key, value -> println key + " : " + value }
    }
}
import com.splunk.Service._
import scala.collection.mutable.HashMap
import scala.collection.JavaConversions._

object SplunkJavaSDKWrapper {
  def main(args: Array[String]) = {
    // connect and login
    val connectionArgs = HashMap[String, Object]("host" -> "somehost", "username" -> "me", "password" -> "foo")
    val service = connect(connectionArgs)
    // get Splunk Server info
    val info = service.getInfo
    // Scala/Java conversion
    val javaSet = info.keySet
    val scalaSet = javaSet.toSet
    // print out Splunk Server info
    for (key <- scalaSet)
      println(key + ":" + info.get(key))
  }
}
Scala
60
Spring Integration Splunk Extensions
61
Special thanks to Jianwei Li (Jarred) & Mark Pollack for creating this!
• Spring Integration is an extension to core Spring
• Based on “Enterprise Integration Patterns” model
• Messaging model and Declarative Adaptors
• Makes it easier to build integration solutions
Spring Integration
62
• Splunk Java SDK makes it easier to use the REST API
• Building on this, the Spring Integration Adaptors make it easier for Spring/Java developers to declaratively build data integration solutions and utilize the power of the Splunk platform
• https://github.com/SpringSource/spring-integration-extensions
• Inbound Adaptor
– Search and export the data from Splunk and push into message channels
– Filter, transform, export to other destinations
• Outbound Adaptor
– Can consume data acquired by other Integration adaptors (Twitter, JDBC…) and push it into Splunk for indexing, searching and visualization
Spring Integration Splunk Adaptors
63
Spring Integration Splunk Inbound Adaptor
64
• Blocking, Non Blocking, Saved & Realtime Searches • Exporting
Spring Integration Splunk Outbound Adaptor
65
• HTTP REST Input • TCP Input
XML Configuration
66
<!-- Common Splunk settings -->
<int-splunk:server id="splunkServer" host="somehost" port="8089" userName="damien" password="foobar"/>

<!-- Searching/exporting from Splunk -->
<int-splunk:inbound-channel-adapter id="splunkInboundChannelAdapter"
    auto-startup="true"
    search="search index=spring error OR exception"
    splunk-server-ref="splunkServer"
    channel="inputFromSplunk"
    mode="blocking"
    initEarliestTime="-1d">
  <int:poller fixed-rate="5" time-unit="SECONDS"/>
</int-splunk:inbound-channel-adapter>

<!-- Inputting events to Splunk -->
<int-splunk:outbound-channel-adapter id="splunkOutboundChannelAdapter"
    auto-startup="true"
    order="1"
    channel="outputToSplunkWithMessageStore"
    splunk-server-ref="splunkServer"
    pool-server-connection="true"
    index="spring"
    sourceType="twitter-feed"
    source="spring-integration-httprest"
    ingest="submit">
</int-splunk:outbound-channel-adapter>
Spring Integration Splunk Twitter Demo
67
SplunkJavaLogging
68
• A logging framework that lets developers integrate Splunk best practice logging semantics into their code as seamlessly as possible and transport events directly to Splunk
• Custom handler/appender implementations (REST and Raw TCP) for the 3 most prevalent Java logging frameworks in play. Splunk events directly from your code.
– LogBack
– Log4j
– java.util.logging
• Better handling of stacktraces
• All code and examples are on GitHub
SplunkJavaLogging
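To give a feel for what a custom java.util.logging handler involves, here is a minimal sketch that formats each log record as key=value pairs and writes it to an output stream. It is not the SplunkJavaLogging implementation. The class name, the field names and the use of a generic OutputStream (which could be a socket stream to a raw TCP input) are assumptions for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.logging.ErrorManager;
import java.util.logging.Handler;
import java.util.logging.LogRecord;

final class KeyValueStreamHandler extends Handler {
    private final Writer writer;

    KeyValueStreamHandler(OutputStream target) {
        this.writer = new OutputStreamWriter(target, StandardCharsets.UTF_8);
    }

    // Renders a record as key=value pairs so fields are easy to extract at search time.
    static String toKeyValue(LogRecord record) {
        return "time=" + record.getMillis()
                + " level=" + record.getLevel().getName()
                + " logger=\"" + record.getLoggerName() + "\""
                + " message=\"" + String.valueOf(record.getMessage()).replace("\"", "'") + "\"";
    }

    @Override
    public synchronized void publish(LogRecord record) {
        if (!isLoggable(record)) {
            return;
        }
        try {
            writer.write(toKeyValue(record));
            writer.write('\n');
            writer.flush();
        } catch (IOException e) {
            reportError("could not write event", e, ErrorManager.WRITE_FAILURE);
        }
    }

    @Override
    public synchronized void flush() {
        try {
            writer.flush();
        } catch (IOException e) {
            reportError("could not flush", e, ErrorManager.FLUSH_FAILURE);
        }
    }

    @Override
    public synchronized void close() {
        try {
            writer.close();
        } catch (IOException e) {
            reportError("could not close", e, ErrorManager.CLOSE_FAILURE);
        }
    }
}
```

A handler like this is attached with Logger.addHandler, so application code keeps logging through the standard API and does not need to know where events end up.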
69
Splunk for JMX
70
• SplunkBase App for monitoring JVM Applications
• Out of the box dashboards for JVM level monitoring (java.lang domain)
– Memory, Threading, GC, CPU etc…
• Very simple configuration to wire up monitoring of any MBeans from applications (Tomcat, JBoss, Cassandra, Coherence etc…)
• Hotspot, JRockit, IBM J9, OpenJDK
• Poll JMX attributes and operations, index data over time, correlate with other data
• Supports large scale deployments of JVMs
• Extensible and Customizable
• Many connectivity options
– RMI, IIOP
– Direct Process Attachment
– MX4J Hessian, Burlap and SOAP
• Freely available download from SplunkBase & all code is on GitHub
Splunk for JMX
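Polling a JMX attribute looks roughly like the sketch below, which reads attributes from the java.lang domain of the local JVM's platform MBean server. It is not code from the Splunk for JMX app. The class and method names are hypothetical, and a remote JVM would be reached through a JMX connector rather than the local platform server.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class JmxPollSketch {
    // Reads one attribute of one MBean from the local platform MBean server.
    static Object readAttribute(String objectName, String attribute) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return server.getAttribute(new ObjectName(objectName), attribute);
        } catch (JMException e) {
            throw new IllegalStateException(
                    "could not read " + attribute + " from " + objectName, e);
        }
    }

    // Renders a polled value as a key=value event line ready for indexing.
    static String toEvent(String objectName, String attribute) {
        return "mbean=\"" + objectName + "\" " + attribute + "=" + readAttribute(objectName, attribute);
    }
}
```

Calling toEvent("java.lang:type=Threading", "ThreadCount") on a schedule and sending each line to Splunk gives a time series that can be correlated with application logs.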
71
72
At SpringOne 2GX:
• Come by our booth
– Splunk demos, Q & A
– SDK code
– Tee shirts!
Web:
• Developer Platform: http://dev.splunk.com
• SplunkBase: http://splunk-base.splunk.com
• Twitter: @splunkdev, @damiendallimore
• Email : [email protected] , [email protected]
• Blog: http://blogs.splunk.com/dev
• GitHub: http://github.com/splunk
• Splunk Live! Events and Online Videos at http://www.splunk.com
Learn More. Stay Connected.
Thanks for coming.
73