Data Onboarding

Copyright © 2014 Splunk Inc. Data Onboarding: Ingestion without the Indigestion. Jeff Meyers, Sales Engineer

Transcript of Data Onboarding

Page 1: Data Onboarding


Data Onboarding: Ingestion without the Indigestion

Jeff Meyers, Sales Engineer

Page 2: Data Onboarding

~60 minutes from now...

• Major components involved in data indexing
• What happens to data within Splunk
  • What the data pipeline is & how to influence it
  • Shaping data understanding via props.conf
  • Configuring data inputs via inputs.conf
• What goes where
  • Heavy Forwarders vs. Universal Forwarders
  • How to get your data into Splunk (mostly correctly)

Page 3: Data Onboarding

What is the Data Onboarding Process?

• Systematic way to bring new data sources into Splunk
• Make sure that new data is instantly usable & has maximum value for users
• Goes hand-in-hand with the User Onboarding process (sold separately)

Page 4: Data Onboarding

Machine Data > Business Value

Index Untapped Data: Any Source, Type, Volume

Online Services, Web Services, Servers, Security, GPS Location, Storage, Desktops, Networks, Packaged Applications, Custom Applications, Messaging, Telecoms, Online Shopping Cart, Web Clickstreams, Databases, Energy Meters, Call Detail Records, Smartphones and Devices, RFID

On-Premises, Private Cloud, Public Cloud

Ask Any Question: Application Delivery; Security, Compliance and Fraud; IT Operations; Business Analytics; Industrial Data and the Internet of Things

Page 5: Data Onboarding

Flavors of Machine Data

Order Processing, Twitter, Care IVR, Middleware Error

Page 6: Data Onboarding

Getting Data Into Splunk

Agent and Agent-less Approach for Flexibility

Agent-less Data Input:
• syslog TCP/UDP: syslog-compatible hosts and network devices
• Mounted File Systems: \\hostname\mount
• WMI: Event Logs, Performance, Active Directory (Windows hosts)
• Custom apps and scripted API connections (perf, shell, code)

Splunk Forwarder:
• Local File Monitoring: log files, config files, dumps and trace files (Unix, Linux and Windows hosts; virtual hosts)
• Windows Inputs: Event Logs, performance counters, registry monitoring, Active Directory monitoring (Windows hosts)
• Scripted Inputs: shell scripts, custom parsers, batch loading

Page 7: Data Onboarding

Splunk Data Ingest

Diagram: forwarders (UF, UF, HF, UF) send data to an indexer (IDX), which is searched from a search head (SH). Legend: Splunk Enterprise (with optional configs); Splunk Universal Forwarder.

Summary: when it comes to "core" Splunk, there are two distinct products: Splunk Universal Forwarder and Splunk Enterprise. "Everything else" (Indexer, Search Head, License Server, Deployment Server, Cluster Master, Deployer, Heavy Forwarder, etc.) are all instances of Splunk Enterprise with varying configs.

Page 8: Data Onboarding

Data Pipeline (what the what?)

Page 9: Data Onboarding

The Data Pipeline

Page 10: Data Onboarding

The Data Pipeline

Any Questions?

Page 11: Data Onboarding

The Data Pipeline

Page 12: Data Onboarding

Inputs: Where it all starts

• Input processors: Monitor, FIFO, UDP, TCP, Scripted
• No events yet, just a stream of bytes
• Break data stream into 64KB blocks
• Annotate stream with metadata keys (host, source, sourcetype, index, etc.)
• Can happen on UF, HF or indexer
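The Inputs stage can be pictured with a small Python model. This is a toy sketch, not Splunk's implementation: it reads raw bytes in 64KB blocks (the size given on the slide) and stamps each block with the same metadata keys, paying no attention to line or event boundaries. The metadata values in the usage example (`host=bar`, `index=foo`, the source path) are hypothetical.

```python
import io
from dataclasses import dataclass, field
from typing import BinaryIO, Dict, Iterator

BLOCK_SIZE = 64 * 1024  # 64KB blocks, as described on the Inputs slide


@dataclass
class Block:
    data: bytes                                   # raw bytes; may end mid-line
    metadata: Dict[str, str] = field(default_factory=dict)


def read_blocks(stream: BinaryIO, metadata: Dict[str, str],
                block_size: int = BLOCK_SIZE) -> Iterator[Block]:
    """Yield fixed-size chunks of the input stream, each annotated with the
    same metadata keys (host, source, sourcetype, index, ...).
    No events exist yet: nothing here looks at newlines or timestamps."""
    while True:
        chunk = stream.read(block_size)
        if not chunk:
            break
        yield Block(chunk, dict(metadata))  # copy so blocks don't share one dict


if __name__ == "__main__":
    raw = b"line one\nline two\n" * 10000          # 180,000 bytes
    meta = {"host": "bar", "source": "/var/log/fubar.log",
            "sourcetype": "access_combined", "index": "foo"}
    sizes = [len(b.data) for b in read_blocks(io.BytesIO(raw), meta)]
    print(sizes)  # [65536, 65536, 48928]
```

Note that the second block boundary falls wherever 64KB runs out, which is exactly why later pipeline stages have to re-find line and event boundaries.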

Page 13: Data Onboarding

Parsing

• Check character set
• Break lines
• Process headers
• Can happen on HF or indexer
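A rough Python analogue of the first two parsing steps, offered as a sketch rather than Splunk's code: decode the bytes using a character set (props.conf exposes this as CHARSET), then split on a LINE_BREAKER regex. The default pattern below, `([\r\n]+)`, mirrors the documented props.conf default; as in Splunk, the text matched by the first capture group is treated as the boundary and discarded.

```python
import re
from typing import List

# Same regex as the documented props.conf default, LINE_BREAKER = ([\r\n]+).
# Whatever the first capture group matches is treated as the boundary and dropped.
DEFAULT_LINE_BREAKER = r"([\r\n]+)"


def break_lines(raw: bytes, charset: str = "utf-8",
                line_breaker: str = DEFAULT_LINE_BREAKER) -> List[str]:
    """Decode raw bytes with the given character set, then split on LINE_BREAKER."""
    text = raw.decode(charset, errors="replace")  # "check character set"
    lines: List[str] = []
    start = 0
    for match in re.finditer(line_breaker, text):
        lines.append(text[start:match.start(1)])  # text before the boundary
        start = match.end(1)                      # next line starts after group 1
    lines.append(text[start:])                    # trailing text, if any
    return [line for line in lines if line]       # drop empty pieces


if __name__ == "__main__":
    print(break_lines(b"first line\r\nsecond line\n\nthird line\n"))
    # ['first line', 'second line', 'third line']
```

Decoding with the wrong charset is where mojibake enters the index, which is why character-set checking happens before any line breaking.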

Page 14: Data Onboarding

Aggregation/Merging

• Merge lines for multi-line events
• Identify events (finally!)
• Extract timestamps
• Exclude events based on timestamp (MAX_DAYS_AGO, ...)
• Can happen on HF or indexer
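Timestamp extraction is driven by a handful of props.conf settings. The Python sketch below imitates them under simplifying assumptions: TIME_PREFIX is a regex that marks where the timestamp starts, MAX_TIMESTAMP_LOOKAHEAD caps how many characters after it are examined, TIME_FORMAT is treated as a Python `strptime` format (Splunk's format language is similar but not identical), and a fixed `tz` stands in for the TZ setting. The MAX_DAYS_AGO check only flags old timestamps; how Splunk actually handles an out-of-range timestamp is documented in props.conf.spec and is not modeled here.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional


def extract_timestamp(event: str, time_prefix: str, time_format: str,
                      lookahead: int,
                      tz: timezone = timezone.utc) -> Optional[datetime]:
    """Find TIME_PREFIX, then parse at most `lookahead` characters after it
    with a strptime-style TIME_FORMAT. Returns None if nothing parses."""
    prefix = re.search(time_prefix, event)
    if prefix is None:
        return None
    window = event[prefix.end():prefix.end() + lookahead]
    # strptime wants an exact match, so try the longest candidate first.
    for end in range(len(window), 0, -1):
        try:
            return datetime.strptime(window[:end], time_format).replace(tzinfo=tz)
        except ValueError:
            continue
    return None


def within_max_days_ago(ts: datetime, now: datetime, max_days_ago: int) -> bool:
    """True if ts is no more than max_days_ago days before now."""
    return now - ts <= timedelta(days=max_days_ago)


if __name__ == "__main__":
    line = '209.160.24.63 - - [23/Feb/2016:18:22:16] "GET /oldlink?itemId=EST-6'
    ts = extract_timestamp(line, r"\[", "%d/%b/%Y:%H:%M:%S", 20)
    print(ts, int(ts.timestamp()))  # 2016-02-23 18:22:16+00:00 1456251736
```

For the access_combined lines used elsewhere in this deck, a lookahead of 20 covers `23/Feb/2016:18:22:16` exactly; a tight lookahead keeps Splunk from mistaking some later number in the event for a date.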

Page 15: Data Onboarding

Typing

• Do regex replacement (field extraction, punctuation extraction, event routing, host/source/sourcetype overrides)
• Annotate events with metadata keys (host, source, sourcetype, ...)
• Can happen on HF or indexer
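The override and routing part of the Typing stage is configured in transforms.conf with REGEX, DEST_KEY and FORMAT stanzas (for example, DEST_KEY = MetaData:Sourcetype for a sourcetype override, or _MetaData:Index for index routing). The Python model below is a deliberately simplified stand-in: each hypothetical `Transform` says "if this regex matches the raw event, set this metadata key to this value". The rule names `access_purchase` and `restricted` are invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Transform:
    """Simplified stand-in for one transforms.conf stanza."""
    regex: str      # like REGEX
    dest_key: str   # metadata key to overwrite, e.g. "sourcetype" or "index"
    value: str      # like FORMAT


def apply_transforms(raw: str, metadata: Dict[str, str],
                     transforms: List[Transform]) -> Dict[str, str]:
    """Return a copy of the metadata with every matching override applied in order."""
    result = dict(metadata)
    for t in transforms:
        if re.search(t.regex, raw):
            result[t.dest_key] = t.value
    return result


# Hypothetical rules: retype purchase events, route anything with an SSN elsewhere.
RULES = [
    Transform(r'"POST /cart\.do\?action=purchase', "sourcetype", "access_purchase"),
    Transform(r"SSN=", "index", "restricted"),
]

if __name__ == "__main__":
    meta = {"sourcetype": "access_combined", "index": "foo", "host": "bar"}
    event = ('209.160.24.63 - - [23/Feb/2016:18:22:21] '
             '"POST /cart.do?action=purchase&itemId=EST-21')
    print(apply_transforms(event, meta, RULES))
```

Because these rewrites happen at parse time, they run wherever parsing runs, which is why the HF or indexer (not the UF) is usually where such stanzas live.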

Page 16: Data Onboarding

Indexing

• Output processors: TCP, syslog, HTTP
• indexAndForward
• Sign blocks
• Calculate license volume and throughput metrics
• Index
• [Write to disk] / [forward elsewhere] / ...
• Can happen on HF or indexer
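To make "calculate license volume" concrete, here is a hedged Python sketch that tallies bytes per index and sourcetype. It assumes volume is simply the UTF-8 byte length of each raw event; Splunk's exact metering rules are covered in its licensing documentation and are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[str, str, str]  # (index, sourcetype, raw event text)


def license_volume(events: Iterable[Event]) -> Dict[Tuple[str, str], int]:
    """Total raw bytes per (index, sourcetype), counting the UTF-8 encoded size."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for index, sourcetype, raw in events:
        totals[(index, sourcetype)] += len(raw.encode("utf-8"))
    return dict(totals)


if __name__ == "__main__":
    sample = [
        ("foo", "access_combined", '209.160.24.63 - - [23/Feb/2016:18:22:16] "GET /oldlink'),
        ("foo", "access_combined", '209.160.24.63 - - [23/Feb/2016:18:22:17] "GET /product.screen'),
    ]
    print(license_volume(sample))
```

Grouping by index and sourcetype mirrors the "determine the indexing volume" question in the onboarding checklist: you want the estimate per data source, not just a grand total.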

Page 17: Data Onboarding

The Data Pipeline

Page 18: Data Onboarding

Data Pipeline: UF & Indexer

Page 19: Data Onboarding

Data Pipeline: HF & Indexer

Page 20: Data Onboarding

Data Pipeline: UF, Intermediate Forwarder (IF) & Indexer

Page 21: Data Onboarding

UF vs. HF

UF emits chunks of data:

sourcetype=access_combined, index=foo, host=bar, …
209.160.24.63 - - [23/Feb/2016:18:22:16] "GET /oldlink?itemId=EST-6&JSESSIONID=SD0SL6FF7AD...
209.160.24.63 - - [23/Feb/2016:18:22:17] "GET /product.screen?productId=BS-AG-G09&JSESSION...
209.160.24.63 - - [23/Feb/2016:18:22:19] "POST /category.screen?categoryId=STRATEGY&JSESSI...
209.160.24.63 - - [23/Feb/2016:18:22:20] "GET /product.screen?productId=FS-SG-G03&JSESSION...
209.160.24.63 - - [23/Feb/2016:18:22:20] "POST /cart.do?action=addtocart&itemId=EST-21&pro...
209.160.24.63 - - [23/Feb/2016:18:22:21] "POST /cart.do?action=purchase&itemId=EST-21&JSES...
209.160.24.63 - - [23/Feb/2016:18:22:22] "POST /cart/success.do?JSESSIONID=SD0SL6FF7ADFF49...
209.160.24.63 - - [23/Feb/2016:18:22:21] "GET /cart.do?action=remove&itemId=EST-11&product...
209.160.24.63 - - [23/Feb/2016:18:22:22] "GET /oldlink?itemId=EST-14&JSESSIONID=SD0SL6FF7A...
112.111.162.4 - - [23/Feb/2016:18:26:36] "GET /product.screen?productId=WC-SH-G04&JSESSION...

HF emits events:

sourcetype=access_combined, _time=1456251739, index=foo, host=bar, …
209.160.24.63 - - [23/Feb/2016:18:22:16] "GET /oldlink?itemId=EST-6&JSESSIONID=SD0SL6FF7AD...

sourcetype=access_combined, _time=1456251739, index=foo, host=bar, …
209.160.24.63 - - [23/Feb/2016:18:22:17] "GET /product.screen?productId=BS-AG-G09&SSN=xxxyyyzzz...

Page 22: Data Onboarding

Splunk Data Ingest

Diagram: UF, UF, HF, UF, IDX and SH, with each data-handling component marked Parsing (HF, IDX) or Not Parsing (UF).

Note: the data is parsed at the first component that has a parsing engine, and not again. This affects where you put certain props.conf and transforms.conf files (i.e., sometimes they go on the forwarder).
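The "first component with a parsing engine" rule can be written as a tiny lookup. This Python sketch encodes only what the pipeline slides say (a UF forwards unparsed chunks; an HF or an indexer parses) and ignores edge cases such as structured-data inputs; the path lists are hypothetical examples of forwarding topologies.

```python
from typing import List, Optional

# Which component types run the parsing pipeline, per the slides:
# a UF forwards unparsed chunks, while an HF or an indexer parses.
HAS_PARSING_ENGINE = {"UF": False, "HF": True, "IDX": True}


def parsing_component(path: List[str]) -> Optional[str]:
    """Return the first hop in a forwarding path that parses the data.
    Index-time props.conf/transforms.conf settings belong on that hop."""
    for hop in path:
        if HAS_PARSING_ENGINE[hop]:
            return hop
    return None  # nothing on this path parses


if __name__ == "__main__":
    for path in (["UF", "IDX"], ["HF", "IDX"], ["UF", "HF", "IDX"]):
        print(" -> ".join(path), "parsed at", parsing_component(path))
```

The third path is the one that surprises people: once an intermediate HF has parsed the data, index-time settings placed only on the indexer are never consulted.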

Page 23: Data Onboarding

Data Onboarding Process (bringing it together)

Page 24: Data Onboarding

On-boarding Process

• Identify the specific sourcetype(s); onboard each separately
• Check for a pre-existing app/TA on splunk.com; don't reinvent the wheel!
• Gather info
  • Where does this data originate/reside? How will Splunk collect it?
  • Which users/groups will need access to this data? Access controls?
  • Determine the indexing volume and data retention requirements
  • Will this data need to drive existing dashboards (ES, PCI, etc.)?
  • Who is the SME for this data?
• Map it out
  • Get a "big enough" sample of the event data
  • Identify and map out fields
  • Assign sourcetype and TA names according to CIM conventions

Page 25: Data Onboarding

On-boarding Process

1. Dev
  • Create (or use) an app
  • Props / inputs definition
  • Sourcetype definition
  • Use data import wizard
    • Import, tweak, repeat
  • Oneshot
  • [hook up monitor]
2. Test
  • Deploy app
  • Oneshot
  • Validate
  • Hook up monitor
  • Validate
3. Prod
  • Deploy app
  • Validate
  • Monitor

Page 26: Data Onboarding

Good Hygiene

• General:
  • Use apps for configs
  • Use TAs / add-ons from Splunk if possible
  • Use dev, test, prod
    • Dev can be a laptop, test can be ephemeral
  • UF when possible
    • HF only if filtering / transforming is required in foreign land
  • Unique sourcetype per event stream
  • Don't send data through Search Heads
  • Don't send data direct to Indexers

Page 27: Data Onboarding

Good Hygiene

• inputs.conf
  • As specific as possible
  • Set sourcetype, if possible
    • Don't let Splunk auto-assign the sourcetype (no ...-too_small sourcetypes)
  • Specify index if possible
• props.conf
  • Set: TIME_PREFIX, TIME_FORMAT, MAX_TIMESTAMP_LOOKAHEAD
  • Optimally: SHOULD_LINEMERGE = false, LINE_BREAKER, TRUNCATE
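As an illustration of the hygiene list, this Python snippet renders one inputs.conf stanza and one props.conf stanza that set every item named above. The stanza contents are assumptions for the example: the sourcetype name `fubar`, the `scratch` index, and the TRUNCATE value are hypothetical, and the timestamp settings assume access-log style `[23/Feb/2016:18:22:16]` timestamps. Python's `configparser` is used only as a convenient writer of `key = value` stanzas; it is not a Splunk tool and does not validate Splunk settings.

```python
import configparser
import io
from typing import Dict

# Hypothetical stanzas for the /var/log/fubar.log example; the sourcetype
# name "fubar" and the "scratch" index are illustrative, not prescribed.
INPUTS = {
    "monitor:///var/log/fubar.log": {
        "sourcetype": "fubar",   # set explicitly; no auto-sourcetyping
        "index": "scratch",      # scratch index while testing
    },
}
PROPS = {
    "fubar": {
        "TIME_PREFIX": r"\[",
        "TIME_FORMAT": "%d/%b/%Y:%H:%M:%S",
        "MAX_TIMESTAMP_LOOKAHEAD": "20",
        "SHOULD_LINEMERGE": "false",
        "LINE_BREAKER": r"([\r\n]+)",
        "TRUNCATE": "10000",
    },
}


def render_conf(stanzas: Dict[str, Dict[str, str]]) -> str:
    """Render {stanza: {setting: value}} as .conf-style text."""
    parser = configparser.ConfigParser(interpolation=None)  # TIME_FORMAT contains '%'
    parser.optionxform = str  # keep upper-case setting names such as TIME_FORMAT
    for name, settings in stanzas.items():
        parser[name] = settings
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_conf(INPUTS))
    print(render_conf(PROPS))
```

Disabling interpolation and overriding `optionxform` are the two `configparser` defaults that would otherwise mangle Splunk-style values (the `%` signs) and setting names (upper case).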

Page 28: Data Onboarding

Data Onboarding Process (details)

Page 29: Data Onboarding

Pre-Board

• Identify the specific sourcetype(s); onboard each separately
• Check for a pre-existing app/TA on splunk.com; don't reinvent the wheel!
• Gather info
  • Where does this data originate/reside? How will Splunk collect it?
  • Which users/groups will need access to this data? Access controls?
  • Determine the indexing volume and data retention requirements
  • Will this data need to drive existing dashboards (ES, PCI, etc.)?
  • Who is the SME for this data?
• Map it out
  • Get a "big enough" sample of the event data
  • Identify and map out fields
  • Assign sourcetype and TA names according to CIM conventions

Page 30: Data Onboarding

Tangent: What is the CIM and why should I care?

• The Common Information Model (CIM) defines relationships in the underlying data, while leaving the raw machine data intact
• A naming convention for fields, eventtypes & tags
• More advanced reporting and correlation requires that the data be normalized, categorized, and parsed
• CIM-compliant data sources can drive CIM-based dashboards (ES, PCI, others)

Page 31: Data Onboarding

Build the index-time configs

• Identify the necessary configs (inputs, props and transforms) to properly handle:
  • timestamp extraction, timezone, event breaking, sourcetype/host/source assignments
• Do events contain sensitive data (e.g., PII, PAN, etc.)? Create masking transforms if necessary
• Package all index-time configs into the TA
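One common way to mask sensitive data at index time is a props.conf `SEDCMD-<class>` setting, which takes a sed-style `s/regex/replacement/flags` expression (transforms.conf rewrites of `_raw` are another option). The Python sketch below emulates only the minimal `s///` form so the effect can be tested before it touches an index. The SSN and PAN patterns are illustrative assumptions, not vetted detectors; the `xxxyyyzzz` mask echoes the HF example earlier in the deck.

```python
import re
from typing import Tuple


def parse_sed(expr: str) -> Tuple[re.Pattern, str, int]:
    """Parse a minimal sed-style 's/regex/replacement/flags' expression.
    Limitation of this sketch: '/' may not appear inside regex or replacement."""
    parts = expr.split("/")
    if len(parts) != 4 or parts[0] != "s":
        raise ValueError(f"unsupported expression: {expr!r}")
    _, pattern, replacement, flags = parts
    count = 0 if "g" in flags else 1  # re.sub: count=0 means replace all
    return re.compile(pattern), replacement, count


def mask(raw: str, expr: str) -> str:
    """Apply one sed-style substitution to a raw event."""
    pattern, replacement, count = parse_sed(expr)
    return pattern.sub(replacement, raw, count=count)


# Illustrative masks: a US-style SSN, and a 16-digit card number keeping first/last 4.
SSN_MASK = r"s/SSN=\d{3}-?\d{2}-?\d{4}/SSN=xxxyyyzzz/g"
PAN_MASK = r"s/(\d{4})\d{8}(\d{4})/\1XXXXXXXX\2/g"

if __name__ == "__main__":
    event = ('209.160.24.63 - - [23/Feb/2016:18:22:17] '
             '"GET /product.screen?productId=BS-AG-G09&SSN=123-45-6789')
    print(mask(event, SSN_MASK))
    print(mask("card=4111111111111111", PAN_MASK))  # card=4111XXXXXXXX1111
```

Masking must happen at parse time, before the event is written: once unmasked data is in an index, it stays there until the bucket ages out or is deleted.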

Page 32: Data Onboarding

Tangent: Best & Worst Practices

• Assign sourcetype according to event format; events with similar format should have the same sourcetype
• When do I need a separate index?
  • When the data volume will be very large, or when the data will frequently be searched on its own
  • When access to the data needs to be controlled
  • When the data requires a specific data retention policy
• Resist the temptation to create lots of indexes

Page 33: Data Onboarding

Best & Worst Practices: [monitor]

• Always specify a sourcetype and index
• Be as specific as possible: use /var/log/fubar.log, not /var/log/
• Arrange your monitored filesystems to minimize unnecessary monitored logfiles
• Use a scratch index while testing new inputs
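Why specificity matters can be shown by counting what each path would pick up. This Python sketch assumes the simplest behavior: a file path matches that one file, and a directory path matches every file beneath it recursively (the default for a directory monitor). It ignores whitelist/blacklist filters, rotation handling, and compressed archives, and the directory layout it builds is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def files_under(monitor_path: str) -> List[str]:
    """Rough model of what a monitor stanza picks up: a single file,
    or every file below a directory (recursively)."""
    p = Path(monitor_path)
    if p.is_file():
        return [str(p)]
    return sorted(str(f) for f in p.rglob("*") if f.is_file())


def demo() -> Tuple[int, int]:
    """Build a throwaway /var/log-like tree and compare a broad vs. a specific path."""
    with tempfile.TemporaryDirectory() as root:
        log_dir = Path(root, "var", "log")
        (log_dir / "old").mkdir(parents=True)
        for name in ("fubar.log", "other.log", "old/archived.log.1"):
            (log_dir / name).write_text("example line\n")
        broad = files_under(str(log_dir))
        specific = files_under(str(log_dir / "fubar.log"))
        return len(broad), len(specific)


if __name__ == "__main__":
    print(demo())  # (3, 1)
```

On a real host the broad count is rarely 3; rotated and archived logs under /var/log/ are exactly how a monitor stanza quietly ends up tracking thousands of files.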

Page 34: Data Onboarding

Best & Worst Practices: [monitor]

• Look out for inadvertent, runaway monitor clauses
• Don't monitor thousands of files unnecessarily; that's the NSA's job
• From the CLI: splunk show monitor
• From your browser: https://your_splunkd:8089/services/admin/inputstatus/TailingProcessor:FileStatus

Page 35: Data Onboarding

Your friend, the Data Previewer (Another Tangent!)

• Find & fix index-time problems BEFORE polluting your index
• A try-it-before-you-fry-it interface for figuring out:
  • Event breaking
  • Timestamp recognition
  • Timezone assignment
• Provides the necessary props.conf parameter settings

Page 36: Data Onboarding

Data Onboarding Process, continued

Page 37: Data Onboarding

Build the search-time configs: eventtypes & tags

• Identify "interesting" events which should be tagged with an existing CIM tag (http://docs.splunk.com/Documentation/CIM/latest/User/Alerts)
• Get a list of all current tags:
    | rest splunk_server=local /services/admin/tags | rename tag_name as tag, field_name_value AS definition, eai:acl.app AS app | eval definition_and_app=definition . " (" . app . ")" | stats values(definition_and_app) as "definitions (app)" by tag | sort +tag
• Get a list of all eventtypes (with associated tags):
    | rest splunk_server=local /services/admin/eventtypes | rename title as eventtype, search AS definition, eai:acl.app AS app | table eventtype definition app tags | sort +eventtype
• Examine the current list of CIM tags. For each "interesting" event, identify which tags should be applied. A particular event may have multiple tags.
• Are there new tags which should be created, beyond those in the current CIM tag library? If so, add them to the CIM library
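The eventtype-and-tag relationship can be modeled in a few lines of Python. In Splunk an eventtype is a saved search and tags are attached to eventtypes (or field values); this sketch simplifies each eventtype to a set of required field/value pairs. All eventtype names and the tags `purchase` and `error` are hypothetical; `web` is the tag the CIM Web data model keys on.

```python
from typing import Dict, List, Set

# Hypothetical eventtypes, simplified to required field/value pairs
# (a real eventtype is a search string, e.g. sourcetype=access_combined action=purchase).
EVENTTYPES: Dict[str, Dict[str, str]] = {
    "web_purchase": {"sourcetype": "access_combined", "action": "purchase"},
    "web_error": {"sourcetype": "access_combined", "status": "500"},
}
# Hypothetical eventtype -> tags mapping; one event can collect several tags.
TAGS: Dict[str, List[str]] = {
    "web_purchase": ["web", "purchase"],
    "web_error": ["web", "error"],
}


def tags_for(event: Dict[str, str]) -> Set[str]:
    """Union of the tags of every eventtype whose conditions the event satisfies."""
    tags: Set[str] = set()
    for name, conditions in EVENTTYPES.items():
        if all(event.get(field) == value for field, value in conditions.items()):
            tags.update(TAGS.get(name, []))
    return tags


if __name__ == "__main__":
    print(sorted(tags_for({"sourcetype": "access_combined",
                           "action": "purchase", "status": "200"})))
```

The union step is the point: because tags accumulate across eventtypes, a single event can satisfy several CIM data models at once without any per-dashboard logic.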

Page 38: Data Onboarding

Build the search-time configs: extractions & lookups

• Extract "interesting" fields
  • If already in your CIM library, name or alias appropriately
  • If not already in your CIM library, name according to CIM conventions
• Add lookups for missing/desirable fields
  • Lookups may be required to supply CIM-compliant fields/field values (for example, to convert 'sev=42' to 'severity=medium')
  • Make the values more readable for humans
• Put everything into the TA package
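A Python sketch of the alias-plus-lookup step, under stated assumptions: the lookup is a small CSV (in a TA this would be a lookup file), and aliasing adds a CIM-style name alongside the original field rather than replacing it. Only the `42 -> medium` row comes from the slide; the other rows and the `src_ip -> src` alias are hypothetical.

```python
import csv
import io
from typing import Dict

# Hypothetical lookup table; only the 42 -> medium row comes from the slide.
SEVERITY_CSV = """sev,severity
10,low
42,medium
90,high
"""
# Hypothetical alias from a vendor field name to a CIM-style field name.
FIELD_ALIASES = {"src_ip": "src"}


def load_lookup(text: str, key: str, value: str) -> Dict[str, str]:
    """Read a CSV lookup into a key -> value dict."""
    return {row[key]: row[value] for row in csv.DictReader(io.StringIO(text))}


def enrich(event: Dict[str, str], lookup: Dict[str, str]) -> Dict[str, str]:
    """Add aliased fields and the looked-up severity; the original fields are kept."""
    out = dict(event)
    for original, alias in FIELD_ALIASES.items():
        if original in out:
            out[alias] = out[original]
    if out.get("sev") in lookup:
        out["severity"] = lookup[out["sev"]]
    return out


if __name__ == "__main__":
    severity = load_lookup(SEVERITY_CSV, "sev", "severity")
    print(enrich({"sev": "42", "src_ip": "209.160.24.63"}, severity))
```

Unknown codes (here, anything not in the table) pass through unchanged, which makes gaps in the lookup easy to spot during the Test & Validate step.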

Page 39: Data Onboarding

Keep Going

• Create data models. What will be interesting for end users?
• Document! (Especially the fields, eventtypes & tags)
• Test
  • Does this data drive relevant existing dashboards correctly?
  • Do the data models work properly / produce correct results?
  • Is the TA packaged properly?
  • Check with the originating user/group; is it OK?

Page 40: Data Onboarding

Get ready to deploy

• Determine additional Splunk infrastructure required; can existing infrastructure & license support this?
• Will new forwarders be required? If so, initiate CR process(es)
• Will firewall changes be required? If so, initiate CR process(es)
• Will new Splunk roles be required? Create & map to AD roles
• Will new app contexts be required? Create app(s) as necessary
• Will new users be added? Create the accounts

Page 41: Data Onboarding

Bring it!

• Deploy new search heads & indexers as needed
• Install new forwarders as needed
• Deploy new app & TA to search heads & indexers
• Deploy new TA to relevant forwarders

Page 42: Data Onboarding

Test & Validate

• All sources reporting?
• Event breaking, timestamp, timezone, host, source, sourcetype?
• Field extractions, aliases, lookups?
• Eventtypes, tags?
• Data model(s)?
• User access?
• Confirm with the original requesting user/group: looks OK?

Page 43: Data Onboarding

Done!

Page 44: Data Onboarding

Gee, this seems like a lot of work…

• Bring new data sources in correctly the first time
• Reduce the amount of "bad" data in your indexes, and the time spent dealing with it
• Make the new data immediately useful to ALL users, not just the ones who originally requested it
• Allow the data to drive all sorts of dashboards without extra modifications

Page 45: Data Onboarding

Reference

• What Splunk can monitor: http://docs.splunk.com/Documentation/Splunk/latest/Data/WhatSplunkcanmonitor
• How data moves through Splunk: http://docs.splunk.com/Documentation/Splunk/latest/Deploy/Datapipeline
• Components of the data pipeline: http://docs.splunk.com/Documentation/Splunk/latest/Deploy/Componentsofadistributedenvironment
• Common Information Model app: https://splunkbase.splunk.com/app/1621
• Common Information Model docs: http://docs.splunk.com/Documentation/CIM/latest/User/Overview
• Where do I put configs: http://wiki.splunk.com/Where_do_I_configure_my_Splunk_settings

Page 46: Data Onboarding


Thank You!

Jeff Meyers