Intro. What is Splunk
Splunk turns Machine Data Into Answers
- Real-Time – Splunk gives you the real-time answers you need to meet customer expectations and business goals.
- Machine Data – Use Splunk to connect your machine data and gain insights into opportunities and risks for your business.
- Scale – Splunk scales to meet modern data needs: embrace the complexity, get the answers.
- AI and Machine Learning – Leverage artificial intelligence (AI) powered by machine learning for actionable and predictive insights.
- Reporting health conditions in real time
- Delve deeper into the patient’s health record and analyze patterns
- Alarms / Alerts to both the doctor and patient when the patient’s health degrades
Splunk is the engine for machine data
- Machine data is more than just logs: it’s configuration data, data from APIs and message queues, change events, the output of diagnostic commands and more
- Log types: Application, Web Access and Proxy, Call Detail Records (CDR), Clickstream, Message Queues, Packet, Database audit and tables, File audit, Syslog, WMI, PerfMon
Quick and easy way to…
- Easily visualize the data as events rather than lines of text
- Quickly get the data properly broken into events
- Accurately get the timestamp extracted
- All in a wicked cool GUI… – Once everything is good, you take your PROPS settings and deploy
Splunk structure
Test Environment
- Every Splunk deployment should have a test environment
- It can be a laptop, virtual machine or spare server
- Should run the same version of Splunk as production
- Accessible to other Splunk developers and administrators
CONSIDERATIONS when installing Splunk
The following considerations need to be taken into account before installing and configuring:
- 1. Disk capacity
- 2. CPU performance
- 3. SSH as best practice for app configurations
- 4. SE/CIM setup
- 5. Universal Forwarder config/install
Planning for Splunk setup
Setting up a Splunk AWS instance details: Instance URL: ec2-1-2-3-4.eu-west-1.compute.amazonaws.com
Diagram of systems with a single EC2 instance acting as the All-In-One (AIO) server. Only the UF agent (installed manually on clients) and the TA (pushed to clients via Deployment Apps on the server, no manual install) are installed on remote clients/hosts.
The AIO server comprises all of these modules:
- Search Indexer
- Deployment Apps
- SE
- CIM
- Generally Splunk is set to keep 14 days of logs; keeping 6/12 months is overkill, since that is measured in TB of storage, which is not justified
- Data freezing: there are HOT/WARM buckets, COLD buckets and FROZEN (archive) buckets
- Capacity planning is key for a healthy Splunk
- The Monitoring Console is the health check area
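The capacity planning point above can be roughed out with a few lines of Python. This is only a back-of-the-envelope sketch: the 10 GB/day ingest figure is a made-up example, and the 0.5 on-disk ratio is an assumed rule of thumb (compressed raw data plus index files at roughly half the size of the incoming data), so measure your own ratio before trusting the result.

```python
def estimate_index_storage_gb(daily_ingest_gb, retention_days, on_disk_ratio=0.5):
    """Rough disk need for searchable data.

    on_disk_ratio is an assumption (about half the raw size ends up on disk);
    replace it with a ratio measured on your own indexes.
    """
    if daily_ingest_gb < 0 or retention_days < 0:
        raise ValueError("ingest and retention must be non-negative")
    return daily_ingest_gb * retention_days * on_disk_ratio


# Hypothetical example: 10 GB/day kept for the 14 days mentioned above.
needed_gb = estimate_index_storage_gb(daily_ingest_gb=10, retention_days=14)
print("Estimated searchable storage: {:.0f} GB".format(needed_gb))
```

Comparing the result against the size of the data drive gives a quick sanity check on whether a longer retention period is realistic.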
Apps to Install:
- Security Essentials / Common Information Model (SE/CIM)
- Indexes volume indexer # Always use local, do not edit the default folder. The config file is indexes.conf
- Splunk gives the LOCAL folder precedence over the DEFAULT folder.
- Install apps via SSH as best practice, with configs always in the LOCAL folder (create one to store the configs if it is missing) as opposed to the DEFAULT one.
- It’s best to test configs/installs on a DEV-SPLUNK box using the 60-day Trial; after that it’s free with 500 MB/day of indexed data.
- Data is stored in index buckets and not in a SQL db: the compressed raw data sits alongside .tsidx index files that point into it
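To picture why edits belong in LOCAL, here is a small Python sketch of the layering idea: settings from local win over settings from default, stanza by stanza and key by key. It is a simplified model only (real Splunk precedence also involves app and system scopes), and the stanza values are illustrative.

```python
def merge_conf(default, local):
    """Overlay local settings on top of default ones, per stanza and per key."""
    merged = {stanza: dict(settings) for stanza, settings in default.items()}
    for stanza, settings in local.items():
        merged.setdefault(stanza, {}).update(settings)
    return merged


# Illustrative values for an indexes.conf stanza (not taken from a real box).
default_conf = {"main": {"maxTotalDataSizeMB": "500000",
                         "frozenTimePeriodInSecs": "188697600"}}
local_conf = {"main": {"frozenTimePeriodInSecs": "1209600"}}

effective = merge_conf(default_conf, local_conf)
print(effective["main"])
```

Only the key that local overrides changes; everything else still comes from default, which is why the default folder can be left untouched.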
PREPARATIONS
1. Prepare Drives
Live-Splunk-App1 has the following:
- system drive – 20GB (system)
- primary drive – 300GB (data drive, holding HOT data)
- secondary drive – 100GB (holding FROZEN data, past 10/14 days as configured)
List of apps command:
cd /opt/splunk/etc/apps/
ls -lrt
/MNT/DATA is the 300GB DATA drive. A splunkdata folder needs to be created, and the splunk user needs ownership so it can manage the folder:
chown -R splunk:splunk /splunkdata
Restart Splunk to refresh the config:
/opt/splunk/bin/splunk restart
2. Prepare indexerbase configs
- Editing indexes and configs mostly needs a restart of the Splunk service
- Time values in Splunk index configs (retention, for example) are measured in seconds
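Since retention is written in seconds, it helps to convert from days before editing indexes.conf. A tiny Python sketch, assuming the 14-day retention mentioned earlier and the frozenTimePeriodInSecs setting of indexes.conf:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_seconds(days):
    """Convert a retention period in days to the seconds value the config expects."""
    return days * SECONDS_PER_DAY


retention_secs = days_to_seconds(14)
# Prints the line as it would appear in a local indexes.conf stanza.
print("frozenTimePeriodInSecs = {}".format(retention_secs))
```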
3. Prepare SE / CIM
- Lookup Editor
- SA CIM
- Splunk_TA_nix
- Security Essentials.zip
- We need permissions set up for the TAs (Technology Add-ons), which are actually scripts
chmod -R u+x /opt/splunk/etc/apps/Splunk_TA_nix/bin
chown -R splunk:splunk /opt/splunk/etc/apps
Then restart Splunk. The apps are now visible on the left, and so are the DATA MODELS
4. Prepare Universal Forwarders
DOMAIN_all_deployments
DOMAIN_all forwarders
PORTS need to be whitelisted – 8089, 8081, 8082, 9997 etc. (see further down for common ports)
The AGENT is installed with a quiet CMD command (the msiexec command is shown in the generic installation steps further down)
5. Prepare SPLUNK APPS
Splunk Server is v7
- Agents should ideally match the server version or be older. The latest v7.1 is a bit risky to use; it might work, but keep that in mind
- Agents are downloaded and copied to the web servers – installation is run by a quiet CMD command
Cluster Classes:
Create an all_windows_server_test class. Then edit the class to include the relevant IP/DNS/hostname (whitelist the IP/hostname/DNS). Then add APPS: edit the app, click to include it and then SAVE
Deploying RESTARTS the agent
Forwarding agent installation: once installed, click EDIT to check whether the app is installed
Once installed, the internal logs will start pushing (used for troubleshooting and proof)
6. Prepare TA_AGENTS
TA-agents are important: they define what is collected for the Universal Forwarder agent to push to Splunk
Unzip the file in /deployment-apps/
Then set the ownership:
chown -R splunk:splunk /opt/splunk/etc/deployment-apps/
su splunk
pwd
cd splunk_TA_windows
DEPLOYMENT
Forwarder Management – Edit – Click Move to right – Now we have 3 apps deployed
Then check that the TA works under Splunk > Volume.Instances, which confirms that Windows logs are being logged
Changes need to be applied:
OVERALL>SETTINGS>MONITORING CONSOLE> APPLY settings
In case Win Security is not showing – Windows Audit logs need to be enabled in MMC
7. Review by running some searches/reports
Generic APP installation steps
1. Splunk Admin
Settings > Forwarder Management (top right)
Server classes > Create new class: LIVE (this is a new group for LIVE servers) # This is needed for new GROUPS of servers
Then we have two areas:
ADD APPS – All three apps – selected to be installed
ADD CLASSES – defines which servers to add
(include) – whitelist – preferably allow the whole VPC or the server IP – adding 10.1.100.* (NOTE: DNS does not work; Splunk cannot ping the hostname, even when it is visible in the GUI)
Note: AWS GATEWAY must be whitelisted for server with Private IP and VPC GW public IP
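Before saving a server class it can be handy to check which clients a wildcard entry such as 10.1.100.* would actually catch. The Python sketch below uses shell-style matching from the standard library as a stand-in; treating it as equivalent to Splunk's own whitelist matching is an assumption, and the client addresses are invented.

```python
from fnmatch import fnmatchcase


def matches_whitelist(client, patterns):
    """Return True if the client IP or hostname matches any wildcard pattern."""
    return any(fnmatchcase(client, pattern) for pattern in patterns)


whitelist = ["10.1.100.*"]
# Hypothetical clients: one inside the whitelisted range, one outside it.
for client in ("10.1.100.23", "10.1.101.23"):
    verdict = "included" if matches_whitelist(client, whitelist) else "not included"
    print(client, verdict)
```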
2. INSTALL THE AGENT
2.1. The agent is downloaded and silently installed via a command. Go to the folder and execute the following:
msiexec.exe /i splunkforwarder-7.1.1-8f0ead9ec3db-x64-release.msi DEPLOYMENT_SERVER="1.2.3.4:8089" AGREETOLICENSE=Yes SPLUNKPASSWORD=RELEVANT_CONPL /quiet
2.2. Firewall: whitelist the ProgramFiles > bin/splunkd.exe file
2.3. Enable Windows Security Logs in Local Security Policy!!! (choose the preferred success/failure audits)
2.4. Note: AWS GATEWAY must be whitelisted in SPLUNK ADMIN
2.5. SPL management – Forwarder Management – the new server is now listed
2.6. Then, to push apps to the agent servers, a deploy-server command needs to be executed:
su splunk
/opt/splunk/bin/splunk reload deploy-server
or, without switching user first:
sudo -u splunk /opt/splunk/bin/splunk reload deploy-server
2.7. Troubleshoot if the agent is not connecting
Open the logs in C:/ProgramFiles/UniversalForwarder/var/logs .. and read them
In this case the logs listed the Splunk server by an internal IP, which the agent could not resolve. SPLUNK therefore required an additional outputs.conf edit to also identify the Splunk server by its PUBLIC IP!!!
3. Once installed, a verification can be done via SEARCH:
index=_internal | stats count by host
Handy Info
Diagrams – Overview of Splunk systems
Optimisation
- Whitelist or Blacklist Windows Events
- This will selectively include or exclude events from collection on a Windows forwarder
- Available feature on 6.x or greater Windows forwarders
- All controlled through inputs.conf on the Windows forwarders
Example:
[WinEventLog://Security]
whitelist = 4,5,7,100-200
…
[WinEventLog://Security]
blacklist = EventCode=%^200$% User=%duca%
…
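To sanity-check a simple whitelist value before deploying it, the list-and-range form can be expanded in a few lines of Python. This sketch only handles the plain ID form (comma-separated codes and ranges, as in the first stanza); the key=regex form of the second stanza is out of scope, and the function name is made up.

```python
def parse_event_codes(spec):
    """Expand a value like '4,5,7,100-200' into a set of event codes."""
    codes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            codes.update(range(int(low), int(high) + 1))  # ranges are inclusive
        else:
            codes.add(int(part))
    return codes


allowed = parse_event_codes("4,5,7,100-200")
print(150 in allowed, 6 in allowed)
```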
- Indexed extractions provide reliable and consistent indexing of data with headers
- Configure on the forwarder in props.conf:
INDEXED_EXTRACTIONS = {CSV | W3C | TSV | PSV | JSON}
- Supports custom header parsing and easy mode for common formats
- Extract IIS fields using props.conf on the Windows forwarder:
[IIS]
INDEXED_EXTRACTIONS = w3c
- Modular Inputs – A Splunk Enterprise app or add-on that extends the Splunk Enterprise framework to define a custom input capability. Examples: Checkpoint OPSEC, Twitter, Stream, Amazon S3 online storage
- Scripted Inputs – A scripted input is used to get data from application program interfaces (APIs) and other remote data interfaces and message queues. Examples: vmstat, top, iostat
- Scripted Inputs Example – A shell script saved in /opt/splunk/bin/scripts/ OR in a specific app. It allows you to execute any program on a Splunk forwarder and index its STDOUT data
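A scripted input does not have to be a shell script; anything that prints events to STDOUT will do. Below is a minimal Python sketch of the idea, assuming a Python interpreter is available on the forwarder host. The field names and the choice of disk usage as the metric are purely illustrative.

```python
#!/usr/bin/env python3
"""Hypothetical scripted input: print one key="value" event to STDOUT."""
import shutil
import time


def format_event(fields, now=None):
    """Render a dict as a timestamped key="value" line."""
    timestamp = time.strftime("%Y-%m-%dT%H:%M:%S%z", time.localtime(now))
    pairs = " ".join('{}="{}"'.format(key, value) for key, value in fields.items())
    return "{} {}".format(timestamp, pairs)


def collect_disk_usage(path="."):
    """Gather a few disk numbers for the given path (illustrative metric)."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return {
        "path": path,
        "total_gb": round(usage.total / gib, 1),
        "free_gb": round(usage.free / gib, 1),
    }


if __name__ == "__main__":
    # Whatever is printed here is what the forwarder would index.
    print(format_event(collect_disk_usage()))
```

Leading each line with a timestamp and using key="value" pairs keeps timestamp extraction and field extraction simple on the Splunk side.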
- Splunk DB Connect is also an option – Allows for indexing data directly from database queries.
- DB Connect Best Practice:
  - Normalize timestamps natively inside the SQL query
  - Filter results down in the SQL query to reduce garbage in the Splunk index
  - Repeated DBLookups should be converted to static lookups
  - Search Head Pooling requires encrypted password replication
  - Search Head Clustering is supported
- Splunk App for Stream – Provides the ability to capture real-time streaming wire data from anywhere in your datacenter or from any public cloud infrastructure (Win, Mac, Unix)
- Splunk Stream DNS Capture – Full DNS Queries without logging enabled
Ports used by Splunk
Common ports listed below (All ports are TCP)
- 9997 for forwarders to the Splunk indexer. 9997 is not a default, just a convention; you need to set it explicitly on the receiving instance (indexer). Search heads, the deployment server, the license server and the cluster master can also send to the indexers on port 9997; this is an optional flow used for forwarding Splunk’s internal indexes (a recommended best practice).
- 8000 for clients to the Splunk Search page
- 8089 for splunkd (also used by deployment server).
Optional ports for distributed systems:
- 8080 – Indexer Replication port
- 514 – Network input port (syslog)
- 8191 – KV store port (since v6.2)
- Search Head Clustering uses a separate replication port that you can pick, e.g. 8181. With SHC, the KV store port (by default 8191) and the replication port must both be available to all other members. You can use the CLI command splunk show kvstore-port to identify the KV store port number.
Note: There’s confusion about the port required from UFs to an HF. It is 9997 too. Many deployments use the same server as HF and DS.
UFs --9997--> HF --9997--> Indexers
UFs, Indexers, SHs --8089--> DS
Directions of ports. Generally as below. Use tcpdump to verify
- 8089 for the deployment server is only needed from the client to the deployment server. Client being indexer, UF, etc.
- 9997 from the forwarder to the indexer. No connection is needed back from the indexers.
- 8089 is also used from a Search Head to your indexers. Again only single direction.
- port 8089 for the license-master (from license-slave to license-master)
- port XXXX for the replication cluster master, and slaves.
Source: https://answers.splunk.com/answers/58888/what-are-the-ports-that-i-need-to-open.html
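Alongside tcpdump, a quick TCP connect test from the client side shows whether a port is reachable in the direction that matters. The Python sketch below uses only the standard library; the host value is a placeholder assumption and should be replaced with your own indexer or deployment server.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Placeholder host (assumption); the ports are the common ones listed above.
SPLUNK_HOST = "127.0.0.1"
PORTS = {8000: "Splunk Web", 8089: "splunkd / deployment server", 9997: "forwarder receiving"}

if __name__ == "__main__":
    for port, purpose in PORTS.items():
        state = "open" if port_is_open(SPLUNK_HOST, port, timeout=1.0) else "closed or filtered"
        print("{} ({}): {}".format(port, purpose, state))
```

Run it on the machine that initiates the connection (the forwarder for 9997, the deployment client for 8089), since the flows are single direction.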
Writing Effective Queries for Splunk with SPL
Source: https://www.zeroex00.com/2018/06/writing-effective-queries-for-splunk.html
Splunk is arguably one of the most popular and powerful tools across the security space at the moment, and for good reason. It is an incredibly powerful way to sift through and analyze big sets of data in an intuitive manner. SPL is the Splunk Processing Language which is used to generate queries for searching through data within Splunk.
The organization I have in mind when writing this is a SOC or CSIRT, in which large scale hunting via Splunk is likely to be conducted, though it can apply just about anywhere. It is key to have relevant data sets against which to properly vet queries. Fortunately, there are many example data sets available for testing on GitHub, from Splunk, and some mentioned below. There are also “data generators” which can generate noise for testing. Best of all would be to create your own though :).
I was fortunate to have had the enjoyable experience of participating in a Boss of the SOC CTF a few years back, which had some pretty good exemplar security related data. Earlier this year, they released the data set publicly here.
This guide is not meant to be a deep dive into the structuring of a query using the SPL. The best place for that is the Splunk documentation itself, starting with this. This is geared more towards operations in which multiple queries are written, maintained, and used in an operational capacity. Many of these concepts can be generalized and applied to other signatures, rules, code or programmatic functions, such as Snort, YARA, or ELK, in which a large quantity of multi-version discrete units must be maintained.
1. Balance efficiency with enough specificity to minimize false positives
The ultimate goal of any Splunk query is to search and present data in order to answer some question(s). There are many right ways to search in Splunk, but there are often far fewer best ways (yes, multiple bests, see next sentence). Before formulating a search query, a couple of considerations should be weighed and prioritized, such as accuracy, efficiency, clarity, integrity, and duration. It is easy to get spoiled by simply doing wildcard searches, but also just as easy to unnecessarily bog down a search with superfluous key value mappings. Over-reliance on either can lead to problems.
Accuracy – are there multiple sources which can answer the question? If so, which is more reliable and authoritative? More importantly, how important is it to reduce or eliminate false positives from your results? There is a heavy inverse correlation between accuracy and efficiency.
Clarity – filtering down to the most relevant information needed to answer the question is only half of the battle: you still need to interpret it. It may be fine to view the results as raw data if there are only one or two results of non-complex data, but when there are rows of deeply structured data, taking the time to present it in the most appropriate manner will go a long way.
Duration – the length required for the query to complete. Is this a search that will be run often, and so delays are additive and add to total inefficiency; is there an urgent need to answer something ASAP; is a longer duration eating up resources on other running functions on the search head? Sometimes it is necessary to break a search into smaller sub-searches or to target smaller sets of data and then pivot from there.
Efficiency – closely tied to duration, an inefficient query will lead to unnecessary delays, excessive resource consumption, and could even affect the integrity of the data (pay close attention to implicit limitations of results on certain commands!). Paying attention to efficiency is especially important if there are per-user limitations on number of searches, memory usage, or other constraints. Too many explicitly defined wildcard placeholders could become very expensive, and the atomicity of a formulated query should always be considered.
Integrity – will you be manipulating any data as part of your search? If so, understand the risks to compromising the integrity of your results in doing so. The more pivots made on returned data, the more susceptible to loss of integrity the search becomes.
2. Make it readable
Write queries in a consistent and clear manner. Sometimes it is better to have a query take up many additional lines for the sake of better readability. Breaking into newlines on pipes is the de facto standard for readability purposes, as can be seen below.
event_simpleName IN (SyntheticProcessRollup2, ProcessRollup2) ImageFileName="*Windows\\System32\\regsvr32.exe" CommandLine="*/i:http*" AND ParentCommandLine="*scrobj.dll*"
| rex field=CommandLine "/i:(?<sct_file_tmp>\S+)"
| eval sct_file=replace(sct_file_tmp, ":", "[:]")
| eval ParentProcess=ImageFileName
| eval ParentCLI=CommandLine
| eval ParentUser=UserName
| rename TargetProcessId_decimal AS ParentProcessId_decimal
| join ParentProcessId_decimal
    [search event_simpleName IN (SyntheticProcessRollup2, ProcessRollup2)
    | eval ChildProcess=ImageFileName
    | eval ChildCLI=CommandLine
    | eval ChildUser=UserName]
| table _time ParentUser ParentCLI ChildProcess ChildCLI sct_file
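If a repository holds many queries that were saved as one long line, breaking them on pipes can be done mechanically. The Python sketch below splits only on top-level pipes, leaving pipes inside quoted strings and [subsearches] alone. It is a rough helper under those assumptions and not a full SPL parser (it does not know about macros or comments, for instance).

```python
def split_on_pipes(query):
    """Split an SPL string on pipes that sit outside quotes and [subsearches]."""
    segments = []
    current = []
    depth = 0          # nesting level of [subsearch] brackets
    in_quotes = False
    escaped = False    # previous character was a backslash inside quotes
    for ch in query:
        if escaped:
            current.append(ch)
            escaped = False
            continue
        if ch == "\\" and in_quotes:
            current.append(ch)
            escaped = True
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth = max(0, depth - 1)
            elif ch == "|" and depth == 0:
                segments.append("".join(current).strip())
                current = []
                continue
        current.append(ch)
    segments.append("".join(current).strip())
    return segments


def format_spl(query):
    """Put every top-level pipe at the start of its own line."""
    return "\n| ".join(split_on_pipes(query)).strip()


flat = 'index=main "a|b" | join id [search x | eval y=1] | table id'
print(format_spl(flat))
```

Subsearches are kept on one line here; indenting them further is a style choice left to whatever convention the team agrees on.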
3. Make it extensible
Queries should be written in such a way that other people can modify them for their own adaptations or to update or expand a current one. Some ways to accomplish this would be using obvious variable names, readability, or even leaving in inexpensive functionality or variables which can be used for other purposes.
4. Make it modular
Modularity will lead to extensibility, maintainability, and resiliency. This will also increase efficiency as code reuse will be much simpler.
5. Make it feasible
If the query is written for the purpose of manual sifting and analysis, then 50k results is not very reasonable. However, if it is for stateful preservation, alerts, or lookups, then that is more acceptable. Incorporating pivots on the information with subsearches and filtering or even, if necessary, breaking it up into multiple different queries will make managing the results a surmountable task.
6. Make it resilient
The data can change and so can the SPL itself (or even custom commands if used), so writing queries that are less affected by potential changes is important, especially if the effects of the changes are not obvious, which could lead to a loss of integrity in the results. (This is where testing is also important)
7. Make it consistent
Having a style guide may seem like overkill, but if your operation is highly dependent on maintaining a repository of queries, it can go a long way. Naming conventions, spacing, line breaks, use of quotations, ordering, and style are some of the things to standardize to help with consistency.
8. Make it identifiable
Something as simple as:
| eval queryID="wxp-110"
This ID can then be printed out with the results if needed or purely used as a means to categorize and quickly identify. Naming conventions should be obvious or recognizable (wxp = Windows XP, query 110), or even mappable to the repository itself.
9. Make it noob friendly
This is obviously highly dependent on your usage and organizational structure, however, it never hurts to keep queries as simple as can be, since there is always the chance that someone else will need to maintain or interpret them. Bonus* less time needing to train people on their purpose!
10. RTFM!
I am a huge proponent of RTFM (F!=field, btw) for both myself and others. Splunk has put a lot of effort into its documentation, and it shows in how detailed and thorough it is. With regards to writing SPL queries, the search reference is your absolute best friend!
11. Know your data
The first two things that I tell anyone who is new to Splunk are to get familiar with the syntax of SPL (#10) and, just as importantly, to get to know how the data is structured. The simplest way to do this is to do a wildcard search (*) and start reviewing the raw results under the events tab. The data will usually be structured in XML or JSON. Initially, it will be less important to know which data was structured from indexing, field extractions, or other transforms, but it may become important with more advanced searches.
12. Test it
Do not ever merge a query into production ops, bless off on it, trust it, or whatever it is you do to give it legitimacy without first testing and confirmation of positive results. Regardless of how simple the query is, you can never guarantee that some other confounding issue isn’t occurring. If it is a matter of missing the applicable data, well then, Try Harder! There are many great products out there to help with this at scale, such as Red Canary’s atomic red team or Mitre’s caldera.
13. Build it out piecemeal
It can get stressful spending a lot of time on a query, only for it to not return the correct or any results, regardless of tweaking. The best way to build complex queries is to build them in pieces, testing as you go along. This is especially convenient because you can point to available data for the sake of testing to ensure positive results, and then change it as it is built out.
# ensure you have data for the computer
host=ComputerA
# ensure you have data being parsed from that computer to the CommandLine field
host=ComputerA CommandLine=*
# search for all occurrences of python in command line activity for the computer
host=ComputerA CommandLine="*python*"
...
# search for all systems where powershell spawned a python program in which 3 or more parameters are passed
host=* ParentProcess="powershell.exe" process="python.exe"
| rex field=CommandLine "(\s-{1,2})(?<flags>\S+)" max_match=0
| stats count values(flags) by host
| where count>=3
| sort 0 host
14. Implement version control
The necessity of this is really dependent on the number of queries and modifications, though it makes sense even for small quantities. This can be accomplished as simply as baking a version into the query itself, such as from #8 with revisions tacked on with periods (wxp-110.3) or even in its own field:
| eval version=3
Even better than that would be to maintain them in a database or repository such as GitHub, which gives the added benefit of stateful change representations. It is also possible to save searches directly in Splunk, though version control is less intuitive that way.
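If IDs follow a convention like wxp-110.3, a repository script can validate and sort them. A minimal Python sketch, assuming the format is a lowercase category prefix, a query number and an optional revision; the exact pattern and all names here are my own illustration and not a standard.

```python
import re
from typing import NamedTuple, Optional


class QueryId(NamedTuple):
    category: str            # e.g. "wxp"
    number: int              # e.g. 110
    revision: Optional[int]  # e.g. 3, or None when no revision is given


# Assumed convention: <category>-<number>[.<revision>]
_QUERY_ID = re.compile(r"^(?P<category>[a-z]+)-(?P<number>\d+)(?:\.(?P<revision>\d+))?$")


def parse_query_id(text):
    """Split an ID such as 'wxp-110.3' into its parts, or raise ValueError."""
    match = _QUERY_ID.match(text)
    if match is None:
        raise ValueError("not a valid query ID: {!r}".format(text))
    revision = match.group("revision")
    return QueryId(
        match.group("category"),
        int(match.group("number")),
        int(revision) if revision is not None else None,
    )


print(parse_query_id("wxp-110.3"))
```

A check like this could run as a pre-commit step so that malformed IDs never reach the repository.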
15. Maintain multiple versions of the same thing
This doesn’t just apply to older versions of the same query, but queries which may search the same thing but present it in a different manner, search a different data set, or search a different time window.
16. Don’t reinvent the wheel
It is all too easy to blow a full 12 hour shift perfecting a query, which may not even end up working at all. While it is important to have these search queries catered to your specific need, it is not always necessary to MacGyver it alone. There are lots of great resources available to borrow ideas or techniques from, such as the Splunk blogs and forums, or you can even work with a co-worker.
17. Don’t depend on the wheel
Counter to #16, you do not want to become over-reliant on searching for help, as this could lead to running queries which may not be working as you think they are. This could also potentially compromise the integrity of the results. Worse yet, it could be an inefficient way of doing something which has caught on and persisted through the forums.
18. Share it
If you have written a gem or come up with a novel approach to something, share it back with the community. Even if the data set is different, there may still be much which can be gleaned from it. It also helps to drive conversations which benefit the community as a whole.
19. Save it
This is such an obvious one, but in spite of that, I still constantly find myself rewriting queries that I had previously written over and over again…
20. REGEX!
I don’t know why I have this all the way down at #20, because this is easily one of the most powerful and important concepts for pivoting on results. There are several commands where regex can be leveraged, but the two most significant are regex and rex.
Regex does exactly what it says: it allows you to filter on respective fields (or _raw) using regex, which in Splunk is a slimmed down version of PCRE. The rex command is much more powerful, in that it allows you to create fields based on the parsed data, which can then be used to pivot your searches on. You can even build it as a multivalued field if more than one match occurs. An example of the rex command (and potentially more than one value) can be seen in the example from #13.
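The multivalued behaviour is easy to try outside Splunk. The Python sketch below runs the same pattern as the #13 example against a made-up command line and collects every match of the named group, roughly what rex does with max_match=0. Note that Python spells named groups (?P<name>...) where SPL uses (?<name>...).

```python
import re

# Same pattern as the rex example from #13, with Python's named-group syntax.
FLAG_PATTERN = re.compile(r"(\s-{1,2})(?P<flags>\S+)")


def extract_flags(command_line):
    """Return every 'flags' capture, like a multivalued field from rex."""
    return [match.group("flags") for match in FLAG_PATTERN.finditer(command_line)]


# Made-up command line for illustration.
sample = "python.exe script.py -v --output out.txt -n 3"
print(extract_flags(sample))
```

Testing a pattern this way before pasting it into rex saves a few search round trips.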
21. Know when it’s better to go beyond just using a search with SPL
Finally, we made it all the way to #21! Sometimes, depending on circumstance, function, and operational usage, manual searching with SPL queries is just not the best answer. Splunk has a lot of other functionality which can accomplish many of the same things, with less manual requirements. Alerts, scheduled reports, dashboards, and any of a number of apps built within or against the API allow for almost limitless capability. If you are struggling to maintain or achieve some of the topics annotated here, it may mean it is time to explore some of these alternative options.
Overall
This is certainly not an all-inclusive list, as there are many more practices which can apply here. Ultimately, the specific deployment, implementation, and usage of Splunk should dictate exactly how you create and maintain search queries. This was also not meant to go too deep into the weeds on generating advanced queries (though that may come in the future), but rather to give a high level approach to maintaining quality and standards. There are many other people who are far more experienced and with much greater Splunk-fu out there, so if you have any input or insight, please feel free to reach out.