Splunk

Intro. What is Splunk

Splunk turns Machine Data Into Answers

  • Real-Time – Splunk gives you the real-time answers you need to meet customer expectations and business goals.
  • Machine Data – Use Splunk to connect your machine data and gain insights into opportunities and risks for your business.
  • Scale – Splunk scales to meet modern data needs — embrace the complexity, get the answers.
  • AI and Machine Learning  – Leverage artificial intelligence (AI) powered by machine learning for actionable and predictive insights.
  • Reporting health conditions in real time
  • Delve deeper into the patient’s health record and analyze patterns
  • Alarms / Alerts to both the doctor and patient when the patient’s health degrades

Splunk is the engine for machine data

  • Machine data is more than just logs: it's configuration data, data from APIs and message queues, change events, the output of diagnostic commands and more
  • Log types: Application, Web Access and Proxy, Call Detail Records (CDR), Clickstream, Message Queues, Packet, Database audit and tables, File audit, Syslog, WMI, PerfMon

Quick and easy way to…

  • Easily visualize the data as events rather than lines of text
  • Quickly get the data properly broken into events
  • Accurately get the timestamp extracted
  • All in a wicked cool GUI. Once everything looks good, you take your props.conf settings and deploy them

Splunk structure

Test Environment

  • Every Splunk deployment should have a test environment
  • It can be a laptop, virtual machine or spare server
  • It should run the same version of Splunk as production
  • It should be accessible to other Splunk developers and administrators

CONSIDERATIONS TO KEEP IN MIND when installing Splunk

The following considerations need to be taken into account before installing and configuring Splunk:

  • 1. Disk capacity
  • 2. CPU performance
  • 3. SSH as best practice for app configurations
  • 4. SE/CIM setup
  • 5. Universal Forwarder config/install

Planning for Splunk setup

Setting up a Splunk AWS instance. Instance details: Instance URL: ec2-1-2-3-4.eu-west-1.compute.amazonaws.com

Diagram of systems with a single EC2 instance acting as the All-In-One (AIO) server. Only the UF agent (installed manually on clients) and the TA (pushed to clients via deployment apps on the server, no manual install) are installed on remote clients/hosts.

The AIO server comprises all of these modules:

  • Search Indexer
  • Deployment Apps
  • SE
  • CIM
  • Generally Splunk is set up here to keep 14 days of logs; keeping 6 to 12 months is overkill, since it is measured in TB and that storage volume is hard to justify
  • Data freezing: there are HOT/WARM buckets, COLD buckets and FROZEN (archive) buckets
  • Capacity planning is key for a healthy Splunk
  • The Monitoring Console is the health check area
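
A rough way to reason about the capacity point above is a back-of-the-envelope disk estimate. The sketch below is only illustrative: the function name is made up, and the default on-disk ratio of 0.5 is an assumed rule of thumb, not a figure from these notes; real ratios depend heavily on the data.

```python
def estimate_disk_gb(daily_ingest_gb: float, retention_days: int,
                     on_disk_ratio: float = 0.5) -> float:
    """Estimate index disk usage in GB.

    on_disk_ratio is an ASSUMED rule-of-thumb factor for how much disk the
    indexed data takes relative to the raw data; measure it on real data.
    """
    return daily_ingest_gb * retention_days * on_disk_ratio


# 14 days of retention versus a full year, for a hypothetical 10 GB/day.
print(estimate_disk_gb(10, 14))    # two weeks
print(estimate_disk_gb(10, 365))   # twelve months
```

Comparing the two numbers makes the 14-day versus 12-month trade-off concrete before drives are sized.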

Apps to Install:

  • Common Information Model (SE/CIM)
  • Indexes volume indexer  # Always use the local folder, do not edit the default folder. The config file is indexes.conf
  • Splunk gives the LOCAL folder precedence over the DEFAULT folder.
  • Install apps via SSH as best practice, with configs always in the LOCAL folder (create one if it is missing) rather than in the DEFAULT one.
  • It's best to test configs/installs on a DEV-SPLUNK box. The Trial license lasts 60 days; after that the Free license allows indexing 500 MB of data per day.
  • Data is stored in index buckets (compressed raw data plus .tsidx index files), not in a SQL database.
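
The LOCAL-over-DEFAULT rule can be sketched with Python's configparser. This is a simplified model, not Splunk's actual loader: it only layers one local file over one default file, while real Splunk precedence also involves system and app directories. The stanza contents are illustrative values.

```python
import configparser

# Illustrative contents of default/indexes.conf and local/indexes.conf.
DEFAULT_CONF = """
[main]
homePath = $SPLUNK_DB/defaultdb/db
maxTotalDataSizeMB = 500000
"""

LOCAL_CONF = """
[main]
maxTotalDataSizeMB = 250000
"""


def effective_settings(default_text: str, local_text: str) -> dict:
    """Merge two conf files; settings read later (local) win per key."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep setting names case-sensitive
    parser.read_string(default_text)
    parser.read_string(local_text)
    return {name: dict(parser[name]) for name in parser.sections()}


print(effective_settings(DEFAULT_CONF, LOCAL_CONF))
```

Only the key set in local is overridden; everything else still comes from default, which is why the default folder can be left untouched.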

PREPARATIONS

1. Prepare Drives

Live-Splunk-App1 has the following:

  • system drive – 20GB (system)
  • primary drive – 300GB (data drive holding hot data)
  • secondary drive – 100GB (holding FROZEN data, past 10/14 days as configured)

List of apps command:

cd /opt/splunk/etc/apps/
ls -lrt

/mnt/data is the 300GB DATA drive. A splunkdata folder needs to be created, and the splunk user needs ownership so it can manage the folder:

chown -R splunk:splunk /splunkdata

Restart Splunk to reload the config:

/opt/splunk/bin/splunk restart

2. Prepare indexerbase configs

  • Editing indexes and configs usually requires a restart of the Splunk service
  • Time-based settings in Splunk are generally expressed in seconds
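
Since retention is configured in seconds, a small helper avoids arithmetic slips. The sketch below renders an indexes.conf stanza for a given retention in days. The index name web_logs and the /mnt/data/splunkdata root are hypothetical; the setting names (homePath, coldPath, thawedPath, frozenTimePeriodInSecs) are standard indexes.conf settings.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def retention_stanza(index_name: str, retention_days: int, data_root: str) -> str:
    """Render an indexes.conf stanza with retention converted to seconds."""
    frozen_secs = retention_days * SECONDS_PER_DAY
    return "\n".join([
        f"[{index_name}]",
        f"homePath = {data_root}/{index_name}/db",
        f"coldPath = {data_root}/{index_name}/colddb",
        f"thawedPath = {data_root}/{index_name}/thaweddb",
        f"frozenTimePeriodInSecs = {frozen_secs}",
    ])


# Hypothetical index kept for 14 days on the data drive.
print(retention_stanza("web_logs", 14, "/mnt/data/splunkdata"))
```

The output would go into a local/indexes.conf, followed by a restart.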

3. Prepare SE / CIM

  • Lookup Editor
  • SA CIM
  • Splunk_TA_nix
  • Security Essentials.zip
  • Permissions need to be set up for the TAs (Technology Add-ons), which are essentially scripts:
chmod -R u+x /opt/splunk/etc/apps/Splunk_TA_nix/bin
chown -R splunk:splunk /opt/splunk/etc/apps

Then restart Splunk. The apps are now visible on the left, along with DATA MODELS.

4. Prepare Universal Forwarders

DOMAIN_all_deployments

DOMAIN_all forwarders

PORTS need to be whitelisted – 8089, 8081, 8082, 9997 etc. (see further below for common ports)

The agent is installed with a quiet CMD command


5. Prepare SPLUNK APPS

Splunk Server is v7

  • Agents should match the server version or be older. The latest v7.1 is a bit risky to use; it might work, but keep that in mind
  • Agents are downloaded and copied to the web servers. Installation is run by a quiet CMD command (see the msiexec command under the generic app installation steps)

Server Classes:

Create a class such as all_windows_server_test. Then edit the class to include the relevant clients (whitelist by IP/hostname/DNS). Then add APPS: edit the app, click to include it and then SAVE.

Deploying RESTARTS the agent

Forwarding agent installation: once installed, click EDIT to check whether the app is installed

Once installed, internal logs will start being pushed (used for troubleshooting and as proof)

6. Prepare TA_AGENTS

TA agents are important: they define what the Universal Forwarder agent collects and pushes to Splunk

Unzip the file in /deployment-apps/

Then set the ownership:

chown -R splunk:splunk /opt/splunk/etc/splunk/apps/

su splunk

pwd

cd splunk_TA_windows

DEPLOYMENT

Forwarder Management – Edit – Click Move to right – Now we have 3 apps deployed

Then check whether the TA works in Splunk > Volume.Instances, which confirms that Windows logs are being logged

Changes need to be applied:

OVERALL>SETTINGS>MONITORING CONSOLE> APPLY settings

If Windows Security events are not showing, Windows audit logs need to be enabled in MMC

7. Review by running some Searches/Reports

Generic APP installation steps

1. Splunk Admin

Settings > Forwarder Management (top right)

Server classes > Create new class: LIVE (this is a new group for LIVE servers) # This is needed for new GROUPS of servers

Then we have two areas:

ADD APPS – All three apps – selected to be installed

ADD CLASSES – defines which servers to add

(include) – whitelist – preferably allow the whole VPC or the server IP, e.g. by adding 10.1.100.* (NOTE: DNS does not work; Splunk cannot ping the hostname, even when it is visible in the GUI)

Note: for a server with a private IP, the AWS GATEWAY must be whitelisted using the VPC gateway's public IP
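
To sanity-check which clients a wildcard entry such as 10.1.100.* would cover before saving the server class, the matching can be approximated locally. This sketch uses shell-style wildcards from Python's fnmatch as a stand-in; it is an approximation of the idea, not Splunk's own matcher, and the function name is made up.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_whitelisted(client: str, patterns: Iterable[str]) -> bool:
    """Return True if the client IP/hostname matches any wildcard pattern."""
    return any(fnmatchcase(client, pattern) for pattern in patterns)


whitelist = ["10.1.100.*"]
for client in ["10.1.100.27", "10.1.101.27"]:
    print(client, is_whitelisted(client, whitelist))
```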

2. INSTALL THE AGENT

2.1. The agent is downloaded and silently installed via a command. Go to the download folder and execute the following:

msiexec.exe /i splunkforwarder-7.1.1-8f0ead9ec3db-x64-release.msi DEPLOYMENT_SERVER="1.2.3.4:8089" AGREETOLICENSE=Yes SPLUNKPASSWORD=RELEVANT_CONPL /quiet

2.2. In the firewall, whitelist the Program Files > bin/splunkd.exe file

2.3. Enable Windows Security logs in Local Security Policy!!! (choose the preferred success/failure audits)

2.4. Note: the AWS GATEWAY must be whitelisted in SPLUNK ADMIN

2.5. SPL management – Forwarder Management – the new server now shows as listed

2.6. Then, to push apps to the agent servers, a deploy-server reload needs to be executed:

su splunk
/opt/splunk/bin/splunk reload deploy-server

or, without switching user first:

sudo -u splunk /opt/splunk/bin/splunk reload deploy-server

2.7. Troubleshoot if the agent is not connecting

    Open the logs in C:/ProgramFiles/UniversalForwarder/var/logs .. and read them

   In this case the logs listed the Splunk server by an internal IP, which the agent could not resolve. Thus SPLUNK required an additional outputs.conf edit to add the Splunk server identified by its PUBLIC IP as well!!!

3. Once installed, a verification can be done via SEARCH:

index=_internal | stats count by host

Handy Info

Diagrams – Overview of Splunk systems

Optimisation

  • Whitelist or blacklist Windows events
  • This will selectively include or exclude events from collection on a Windows forwarder
  • Available on 6.x or greater Windows forwarders
  • All controlled through inputs.conf on the Windows forwarders

Example:

[WinEventLog://Security]
whitelist = 4,5,7,100-200

[WinEventLog://Security]
blacklist = EventCode=%^200$% User=%duca%
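
To double-check what a whitelist value like 4,5,7,100-200 actually covers, the list can be expanded offline. The sketch below only handles the simple comma-and-range form shown above (not the key=regex form used in the blacklist example), and the function name is made up for illustration.

```python
def parse_event_codes(spec: str) -> set:
    """Expand a spec such as '4,5,7,100-200' into a set of event codes."""
    codes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            codes.update(range(int(low), int(high) + 1))  # ranges are inclusive
        else:
            codes.add(int(part))
    return codes


allowed = parse_event_codes("4,5,7,100-200")
print(150 in allowed, 201 in allowed)
```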

  • Indexed extractions provide reliable and consistent indexing of data with headers
  • Addressed on the forwarder via props.conf:

  INDEXED_EXTRACTIONS = {CSV | W3C | TSV | PSV | JSON}

  • Supports custom header parsing and an easy mode for common formats
  • Extract IIS fields using props.conf on the Windows forwarder:

[IIS]
INDEXED_EXTRACTIONS = w3c

  • Modular Inputs – A Splunk Enterprise app or add-on that extends the Splunk Enterprise framework to define a custom input capability. Examples: Checkpoint OPSEC, Twitter, Stream, Amazon S3 online storage
  • Scripted Inputs – A scripted input is used to get data from application program interfaces (APIs) and other remote data interfaces and message queues. Examples: vmstat, top, iostat
  • Scripted Inputs Example – A shell script saved in /opt/splunk/bin/scripts/ OR in a specific app. It allows you to execute any program on a Splunk forwarder and index its STDOUT data

  • Splunk DB Connect is also an option – Allows for indexing data directly from database queries.
  • DB Connect Best Practice:
    - Normalize timestamps natively inside the SQL query
    - Filter results down in the SQL query to reduce garbage in the Splunk index
    - Repeated DBLookups should be converted to static lookups
    - Search Head Pooling requires encrypted password replication
    - Search Head Clustering is supported

  • Splunk App For Stream – Provides the ability to capture real-time streaming wire data from anywhere in your datacenter or from any public cloud infrastructure (Win, Mac, Unix)
  • Splunk Stream DNS Capture – Full DNS queries without logging enabled

Ports used by Splunk

Common ports listed below (All ports are TCP)

  • 9997 for forwarders to the Splunk indexer. 9997 is not a default, just a convention; you need to set it explicitly on the receiving instance (indexer). Port 9997 also carries an optional flow from the search heads, deployment server, license server, and cluster master to the indexers, used for forwarding Splunk's internal indexes (a recommended best practice).
  • 8000 for clients to the Splunk Search page
  • 8089 for splunkd (also used by deployment server).

Optional ports for distributed systems:

  • 8080 – Indexer Replication port
  • 514 – Network input port (commonly syslog)
  • 8191 – KV store port (since v6.2)
  • Search Head Clustering uses a replication port that you can pick, e.g. 8181. With SHC, the KV store port (by default, 8191) must also be available to all other members. You can use the CLI command splunk show kvstore-port to identify the port number. The replication port must be available to all other members as well.

Note: There is some confusion about the port required from UFs to an HF. It is 9997 too. Many use the same server as both HF and DS.

UFs --9997--> HF --9997--> Indexers
UFs, Indexers, SHs --8089--> DS

Direction of ports, generally as below. Use tcpdump to verify.

  • 8089 for the deployment server is only needed from the client to the deployment server. Client being indexer, UF, etc.
  • 9997 from the forwarder to the indexer. No connection is needed back from the indexers.
  • 8089 is also used from a Search Head to your indexers. Again only single direction.
  • port 8089 for the license-master (from license-slave to license-master)
  • port XXXX for the replication cluster master, and slaves.
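
Before reaching for tcpdump, a quick TCP connect test from the client side can confirm whether a port is reachable in the expected direction. This is a minimal sketch using only the standard library; the helper names are made up, and a successful connect only shows the port is open, not that Splunk is the service answering.

```python
import socket

# Conventional ports from the list above (9997 is a convention, not a default).
SPLUNK_PORTS = {"web": 8000, "management": 8089, "receiving": 9997}


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host: str, ports: dict = SPLUNK_PORTS) -> dict:
    """Map each named port to whether it is reachable from this machine."""
    return {name: can_connect(host, port) for name, port in ports.items()}
```

Running check_ports("1.2.3.4") from a forwarder host (with the real server address substituted) gives a quick view of which flows the firewall is letting through.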

Source: https://answers.splunk.com/answers/58888/what-are-the-ports-that-i-need-to-open.html

Writing Effective Queries for Splunk with SPL

Source: https://www.zeroex00.com/2018/06/writing-effective-queries-for-splunk.html

Splunk is arguably one of the most popular and powerful tools across the security space at the moment, and for good reason. It is an incredibly powerful way to sift through and analyze big sets of data in an intuitive manner. SPL is the Splunk Processing Language which is used to generate queries for searching through data within Splunk.

The organization I have in mind when writing this is a SOC or CSIRT, in which large scale hunting via Splunk is likely to be conducted, though it can apply just about anywhere. It is key to have relevant data sets against which to properly vet queries. Fortunately, there are many example data sets available for testing on GitHub, from Splunk, and some mentioned below. There are also “data generators” which can generate noise for testing. Best of all would be to create your own though :).

I was fortunate to have had the enjoyable experience of participating in a Boss of the SOC CTF a few years back, which had some pretty good exemplar security related data. Earlier this year, they released the data set publicly here.

This guide is not meant to be a deep dive into the structuring of a query using the SPL. The best place for that is the Splunk documentation itself, starting with this. This is geared more towards operations in which multiple queries are written, maintained, and used in an operational capacity. Many of these concepts can be generalized and applied to other signatures, rules, code or programmatic functions, such as Snort, YARA, or ELK, in which a large quantity of multi-version discrete units must be maintained.

1. Balance efficiency with enough specificity to minimize false positives

The ultimate goal of any Splunk query is to search and present data in order to answer some question(s). There are many right ways to search in Splunk, but there are often far fewer best ways (yes, multiple bests, see next sentence). Before formulating a search query, a couple of considerations should be weighed and prioritized, such as accuracy, efficiency, clarity, integrity, and duration. It is easy to get spoiled by simply doing wildcard searches, but also just as easy to unnecessarily bog down a search with superfluous key value mappings. An overreliance on either can lead to problems.

Accuracy – are there multiple sources which can answer the question? If so, which is more reliable and authoritative? More importantly, how important is it to reduce or eliminate false positives from your results? There is a heavy inverse correlation between accuracy and efficiency.

Clarity – filtering down to the most relevant information needed to answer the question is only half of the battle – you still need to interpret it. It may be fine to view the results as raw data if there are only one or two results of non-complex data, but when there are rows of deeply structured data, taking the time to present it in the most appropriate manner will go a long way.

Duration – the length required for the query to complete. Is this a search that will be run often, and so delays are additive and add to total inefficiency; is there an urgent need to answer something ASAP; is a longer duration eating up resources on other running functions on the search head? Sometimes it is necessary to break a search into smaller sub-searches or to target smaller sets of data and then pivot from there.

Efficiency – closely tied to duration, an inefficient query will lead to unnecessary delays, excessive resource consumption, and could even affect the integrity of the data (pay close attention to implicit limitations of results on certain commands!). Paying attention to efficiency is especially important if there are per-user limitations on number of searches, memory usage, or other constraints. Too many explicitly defined wildcard placeholders could become very expensive, and the atomicity of a formulated query should always be considered.

Integrity – will you be manipulating any data as part of your search? If so, understand the risks to compromising the integrity of your results in doing so. The more pivots made on returned data, the more susceptible to loss of integrity the search becomes.

2. Make it readable

Write queries in a consistent and clear manner. Sometimes it is better to have a query take up many additional lines for the sake of better readability. Breaking into newlines on pipes is the de facto standard for readability purposes, as can be seen below.

event_simpleName IN (SyntheticProcessRollup2, ProcessRollup2) ImageFileName="*Windows\\System32\\regsvr32.exe" CommandLine="*/i:http*" AND ParentCommandLine="*scrobj.dll*"
| rex field=CommandLine "/i:(?<sct_file_tmp>\S+)"
| eval sct_file=replace(sct_file_tmp, ":", "[:]")
| eval ParentProcess=ImageFileName
| eval ParentCLI=CommandLine
| eval ParentUser=UserName
| rename TargetProcessId_decimal AS ParentProcessId_decimal
| join ParentProcessId_decimal 
	[search event_simpleName IN (SyntheticProcessRollup2, ProcessRollup2)
	| eval ChildProcess=ImageFileName
	| eval ChildCLI=CommandLine
	| eval ChildUser=UserName]
| table _time ParentUser ParentCLI ChildProcess ChildCLI sct_file
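
For queries that arrive as a single long line, the pipe-per-line convention can be applied mechanically. The following is a rough sketch, not a full SPL parser: it splits only on pipes that sit outside double quotes and outside [subsearch] brackets, and the function names are made up for illustration.

```python
def split_on_pipes(query: str) -> list:
    """Split SPL on top-level pipes, ignoring pipes in quotes or [subsearches]."""
    parts, current = [], []
    depth = 0           # nesting level of [ ... ] subsearches
    in_quotes = False
    escaped = False     # previous character was a backslash inside quotes
    for ch in query:
        if escaped:
            escaped = False
        elif ch == "\\" and in_quotes:
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == "[":
                depth += 1
            elif ch == "]" and depth > 0:
                depth -= 1
        if ch == "|" and not in_quotes and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def format_spl(query: str) -> str:
    """Put each top-level pipe segment on its own line."""
    parts = split_on_pipes(query)
    if not parts:
        return ""
    if query.lstrip().startswith("|"):   # generating command such as | tstats
        return "\n".join("| " + p for p in parts)
    return "\n".join([parts[0]] + ["| " + p for p in parts[1:]])


print(format_spl('index=main error | stats count by host | where count > 3'))
```

Subsearches are left on one line here; indenting them as in the example above would need a little more bookkeeping.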

3. Make it extensible

Queries should be written in such a way that other people can modify them for their own adaptations or update or expand existing ones. Some ways to accomplish this would be using obvious variable names, readability, or even leaving in inexpensive functionality or variables which can be used for other purposes.

4. Make it modular

Modularity will lead to extensibility, maintainability, and resiliency. This will also increase efficiency as code reuse will be much simpler.

5. Make it feasible

If the query is written for the purpose of manual sifting and analysis, then 50k results is not very reasonable. However, if it is for stateful preservation, alerts, or lookups, then that is more acceptable. Incorporating pivots on the information with subsearches and filtering or even, if necessary, breaking it up into multiple different queries will make managing the results a surmountable task.

6. Make it resilient

The data can change and so can the SPL itself (or even custom commands if used), so writing queries that are less affected by potential changes is important, especially if the effects of the changes are not obvious, which could lead to a loss of integrity in the results. (This is where testing is also important.)

7. Make it consistent

Having a style guide may seem like overkill, but if your operation is highly dependent on maintaining a repository of queries, it can go a long way. Naming conventions, spacing, line breaks, use of quotations, ordering, and style are some of the things to standardize to help with consistency.

8. Make it identifiable

Something as simple as:

 | eval queryID="wxp-110"

This ID can then be printed out with the results if needed or purely used as a means to categorize and quickly identify. Naming conventions should be obvious or recognizable (wxp = Windows XP, query 110), or even mappable to the repository itself. 

9. Make it noob friendly

This is obviously highly dependent on your usage and organizational structure, however, it never hurts to keep queries as simple as can be, since there is always the chance that someone else will need to maintain or interpret them. Bonus* less time needing to train people on their purpose!

10. RTFM!

I am a huge proponent of RTFM (F!=field, btw) for both myself and others. Splunk has put a lot of effort into its documentation, which is detailed, thorough, and meticulous. With regards to writing SPL queries, the search reference is your absolute best friend!

11. Know your data

The first two things that I tell anyone who is new to Splunk to do are to get familiar with the syntax of SPL (#10) and, just as importantly, to get to know how the data is structured. The simplest way to do this is to do a wildcard search (*) and start reviewing the raw results under the events tab. The data will usually be structured in XML or JSON. Initially, it will be less important to know which data was structured from indexing, field extractions, or other transforms, but that may become important with more advanced searches.

12. Test it

Do not ever merge a query into production ops, bless off on it, trust it, or whatever it is you do to give it legitimacy without first testing and confirmation of positive results. Regardless of how simple the query is, you can never guarantee that some other confounding issue isn’t occurring. If it is a matter of missing the applicable data, well then, Try Harder! There are many great products out there to help with this at scale, such as Red Canary’s atomic red team or Mitre’s caldera.

13. Build it out piecemeal 

It can get stressful spending a lot of time on a query, only for it to not return the correct or any results, regardless of tweaking. The best way to build complex queries is to build them in pieces, testing as you go along. This is especially convenient because you can point to available data for the sake of testing to ensure positive results, and then change it as it is built out.

# ensure you have data for the computer
host=ComputerA

# ensure you have data being parsed from that computer to the CommandLine field
host=ComputerA CommandLine=*

# search for all occurrences of python in command line activity for the computer
host=ComputerA CommandLine="*python*"

...

# search for all systems where powershell spawned a python program in which 3 or more parameters are passed
host=* ParentProcess="powershell.exe" process="python.exe"
| rex field=CommandLine "(\s-{1,2})(?<flags>\S+)" max_match=0
| where mvcount(flags)>=3
| stats count values(flags) by host
| sort 0 host

14. Implement version control

The necessity of this is really dependent on the number of queries and modifications, though it makes sense even for small quantities. This can be accomplished as simply as baking a version into the query itself, such as from #8 with revisions tacked on with periods (wxp-110.3) or even in its own field:

 | eval version=3

Even better than that would be to maintain them in a database or repository such as GitHub, which gives the added benefit of stateful change representations. It is also possible to save searches directly in Splunk, though version control is less intuitive that way.
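
If IDs follow the wxp-110.3 pattern, a repository of queries can be sorted and checked with a few lines of code. The sketch below assumes a strict lowercase-prefix, number, optional-revision format; that exact format is an assumption made for illustration, so adjust the pattern to whatever convention is actually adopted.

```python
import re

# Assumed ID format: <family>-<number>[.<revision>], e.g. wxp-110.3
QUERY_ID = re.compile(r"^(?P<family>[a-z]+)-(?P<number>\d+)(?:\.(?P<revision>\d+))?$")


def parse_query_id(query_id: str) -> tuple:
    """Split an ID into (family, number, revision); revision defaults to 0."""
    match = QUERY_ID.match(query_id)
    if match is None:
        raise ValueError(f"unrecognized query ID: {query_id!r}")
    return (match["family"], int(match["number"]), int(match["revision"] or 0))


ids = ["wxp-110.3", "wxp-110", "wxp-20.1"]
print(sorted(ids, key=parse_query_id))
```

Sorting on the parsed tuple orders revisions numerically rather than alphabetically, which plain string sorting would get wrong.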

15. Maintain multiple versions of the same thing

This doesn’t just apply to older versions of the same query, but queries which may search the same thing but present it in a different manner, search a different data set, or search a different time window.

16. Don’t reinvent the wheel

It is all too easy to blow a full 12 hour shift perfecting a query, which may not even end up working at all. While it is important to have these search queries catered to your specific need, it is not always necessary to MacGyver it alone. There are lots of great resources available to borrow ideas or techniques from, such as the Splunk blogs and forums, or you can even work with a co-worker.

17. Don’t depend on the wheel

Counter to #16, you do not want to become over reliant on searching for help, as this could lead to running queries which may not be working as you think they are. This could also potentially compromise the integrity of the results. Worse yet, it could be an inefficient way of doing something which has caught on and persisted through the forums.

18. Share it

If you have written a gem or come up with a novel approach to something, share it back with the community. Even if the data set is different, there may still be much which can be gleaned from it. It also helps to drive conversations which benefit the community as a whole. 

19. Save it

This is such an obvious one, but in spite of that, I still constantly find myself rewriting queries that I had previously written over and over again…

20. REGEX! 

I don’t know why I have this all the way down at #20, because this is easily one of the most powerful and important concepts for pivoting on results. There are several commands where regex can be leveraged, but the two most significant are regex and rex.

Regex does exactly what it says: it allows you to filter on respective fields (or _raw) using regex, which in Splunk is a slimmed down version of PCRE. The rex command is much more powerful, in that it allows you to create fields based on the parsed data, which can then be used to pivot your searches on. You can even build it as a multivalued field if more than one match occurs. An example of the rex command (and potentially more than one value) can be seen in the example from #13.
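
It can help to prototype a rex pattern outside Splunk first. The sketch below tries the pattern from #13 in Python; note that Python spells named groups as (?P<name>...) rather than (?<name>...), and that collecting every match with finditer only roughly mirrors max_match=0. The sample command line is invented.

```python
import re

# Same pattern as the rex call in #13, with Python's named-group syntax.
FLAG_PATTERN = re.compile(r"(\s-{1,2})(?P<flags>\S+)")


def extract_flags(command_line: str) -> list:
    """Return every flag name found, like a multivalued 'flags' field."""
    return [m.group("flags") for m in FLAG_PATTERN.finditer(command_line)]


sample = "python.exe scan.py -v --target 10.0.0.5 -o out.txt"
print(extract_flags(sample))
```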

21. Know when it's better to go beyond just using a search with SPL

Finally, we made it all the way to #21! Sometimes, depending on circumstance, function, and operational usage, manual searching with SPL queries is just not the best answer. Splunk has a lot of other functionality which can accomplish many of the same things, with less manual requirements. Alerts, scheduled reports, dashboards, and any of a number of apps built within or against the API allow for almost limitless capability. If you are struggling to maintain or achieve some of the topics annotated here, it may mean it is time to explore some of these alternative options.

Overall

This is certainly not an all inclusive list, as there are many more practices which can apply here. Ultimately, it depends on the specific deployment, implementation, and usage of Splunk which should dictate exactly how you create and maintain search queries. This was also not meant to go too deep in the weeds on generating advanced queries (though that may come in the future), but rather a high level approach to maintaining quality and standards. There are many other people who are far more experienced and with much greater Splunk-fu out there, so if you have any input or insight, please feel free to reach out.