Log Analysis Part 3: Using Greynoise with Logstash
2020-06-15
The Problem: A Lack of Context
Exposing anything to the public internet does not come without risks. Server takeover for cryptocurrency mining, malicious content hosting, unauthorized scanning activity, and website defacement are the primary threats faced by a blog hosted on the public internet.
When analyzing web server activity for exploit attempts in line with the above threats, the primary questions to answer are:
What activity do I need to pay attention to? - With systems fully patched, commodity malware or scanning activity does not pose a serious threat. Information sharing amongst devices on the internet can help identify if activity is seen by most or all devices, or only a targeted device. Targeted activity is often an indicator of a more advanced adversary, which can narrow a proactive search for malicious activity.
What activity can be ignored? - In line with the above question, if systems are fully patched, routine scanning activity or exploit attempts can be ignored. Once that activity is filtered out, the remaining activity warrants further investigation, as it is likely targeted activity.
Was this activity / exploit attempt successful? - Once targeted activity is identified, context of both an application's request/response handling and payload activity can be used to identify if malicious activity was successful.
Greynoise
Greynoise is a service that analyzes internet background activity. Their blog post describes a problem similar to the one above, asking the question "is this just regular Internet background noise or is machine actually targeting and attacking ME specifically?" With an "anti-threat intelligence" approach, this service can inform responders what activity is being seen by the rest of the internet, not just their own systems. This information can be used to filter out events that are not targeted activity because they are also being seen by systems within the Greynoise network.
Greynoise has a free front-end visualizer that allows users to view trends and search their dataset for specific IP addresses.
Greynoise also has an API, which integrates their dataset with existing security tools. Pricing is a bit steep for a non-enterprise user, but they offer a free 15-day trial for the API.
Enriching Logs with Greynoise in Logstash
As mentioned in the previous blog post Log Analysis Part 2: Using Logstash’s Grok Filter to Parse Docker Nginx Logs, the blakejarvis.com web server logs are being sent through a Logstash pipeline to an Elasticsearch database. The Greynoise community maintains an excellent Logstash Greynoise plugin, which allows log events to be enriched with the Greynoise dataset in a Logstash pipeline.
In less than 5 minutes, Greynoise can begin enriching Logstash data by performing the following steps:
Install the Greynoise Logstash Plugin. When using Docker, add the RUN logstash-plugin line within the Dockerfile.
Edit the logstash.conf pipeline to include Greynoise enrichment by adding a greynoise {} block within a filter {} block. To limit unnecessary Greynoise queries, the primary IP address used to publish this blog was excluded from Greynoise queries.
Rebuild the Docker container using docker-compose.
Verify Greynoise enrichment is occurring by using Kibana to filter on greynoise fields.
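The enrichment step, including the exclusion of the author's own IP address, can be pictured with a small Python sketch. This is only an illustration of the logic, not the plugin's implementation: the `clientip` field name, the `EXCLUDED_IPS` placeholder address, and the `fake_lookup` stub standing in for a real Greynoise query are all assumptions. Only the `seen`, `classification`, and `tags` field names are taken from the fields used later in this post.

```python
# Hypothetical placeholder (a documentation-range address), not the author's real IP.
EXCLUDED_IPS = {"203.0.113.10"}


def enrich_event(event, lookup, excluded_ips=EXCLUDED_IPS):
    """Return a copy of the event with a 'greynoise' field, unless the IP is excluded."""
    enriched = dict(event)
    ip = event.get("clientip")  # assumed field name for the client address
    if ip is None or ip in excluded_ips:
        # Skip the lookup entirely so no query is spent on known-good traffic.
        return enriched
    enriched["greynoise"] = lookup(ip)
    return enriched


def fake_lookup(ip):
    """Stand-in for a real Greynoise query; always returns the same canned answer."""
    return {"seen": True, "classification": "benign", "tags": ["Web Scanner"]}


events = [
    {"clientip": "203.0.113.10", "request": "/ghost/"},
    {"clientip": "198.51.100.7", "request": "/"},
]
enriched = [enrich_event(e, fake_lookup) for e in events]
```

The first event comes back unchanged because its IP is excluded, while the second gains a `greynoise` field.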
Visualizing Greynoise Data
Dashboards can be a useful tool when proactively searching for bad activity in line with the questions asked above. Five metrics are created in Kibana to visualize Greynoise data:
Log Count: Total number of log events seen over the time period.
Greynoise Classification Over Time: Timeline of the count of malicious, unknown, and benign scanning activity.
Greynoise Tags Over Time: Timeline of the count of the kinds of scanners observed. One IP address can be associated with one or more scanner types.
Greynoise Actor: Bar chart of Greynoise actor and classification. Both malicious actors (red) and unknown actors (orange) are classified within the unknown actor category.
Greynoise Tags: Bar chart of the most common tags associated with scanners seen accessing the blog. 4/10 tags are directly associated with HTTP/HTTPS scanners.
Scope: The timeline is set to 30 days and the "AWS Security Scanner" Greynoise actor and primary IP used to write this blog are filtered out.
Filtering the Noise
Validating no Known Bad Activity was Successful: Filtering for greynoise.classification.keyword : "malicious" and response codes of 2xx is a quick way to see the successful HTTP requests from bad actors.
There are only 8 requests from maliciously classified actors, all requesting the root directory /, which returns a 200. This is expected behavior. As explained in a previous blog post, this blog is configured to return a response code of 444 for all client activity attempting to access blakejarvis.com via direct IP access (i.e. 3.84.178.124/), which is common for internet scanners. This Nginx configuration significantly reduces the number of internet scanners making web requests from 399 to 8. The image below includes response codes of 444, which are attributed to scanning activity.
Filtering on the Unknowns: Filtering for (greynoise.classification.keyword : "unknown") OR (greynoise.seen: false) and response codes of 2xx returns the events of unknown actors, or actors not previously seen by Greynoise, who received a successful response code. A thorough review of this dataset finds most of this activity is either legitimate end-user activity or search engine crawlers indexing pages.
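The two Kibana filters above can also be expressed as plain predicates, which may help when reviewing exported events outside Kibana. The following is a minimal Python sketch under stated assumptions: events are dictionaries, the `response` field name is hypothetical, and the sample events are invented for illustration. The `classification` and `seen` fields mirror the Greynoise fields named in the filters.

```python
def is_success(event):
    """True when the response code is in the 2xx range."""
    return 200 <= int(event["response"]) < 300


def is_malicious_success(event):
    """Mirrors: classification is "malicious" AND the response code is 2xx."""
    gn = event.get("greynoise", {})
    return gn.get("classification") == "malicious" and is_success(event)


def is_unknown_success(event):
    """Mirrors: (classification is "unknown" OR seen is false) AND response is 2xx."""
    gn = event.get("greynoise", {})
    unknown = gn.get("classification") == "unknown" or gn.get("seen") is False
    return unknown and is_success(event)


# Invented sample events, for illustration only.
events = [
    {"response": 200, "greynoise": {"seen": True, "classification": "malicious"}},
    {"response": 444, "greynoise": {"seen": True, "classification": "malicious"}},
    {"response": 200, "greynoise": {"seen": False}},
    {"response": 200, "greynoise": {"seen": True, "classification": "benign"}},
]
malicious_hits = [e for e in events if is_malicious_success(e)]
unknown_hits = [e for e in events if is_unknown_success(e)]
```

Events that were never enriched have no `greynoise` field and match neither predicate, which is the behavior assumed here for the `greynoise.seen: false` clause.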
Who is Port Scanning?: In a previous blog post, Nginx logs are split into 2 categories: standard HTTP web requests and non-standard HTTP requests, which do not conform to the HTTP protocol specification. These requests are most likely port scans. This distinction allows for proper log parsing of both types, and a payload value is created in the dataset for all requests that do not conform to a standard HTTP request. The port scans seen by Nginx (i.e. the application layer) are scans that request some form of application layer data, such as HTTP headers or operating system version information. Stealthy port scans that complete part or all of the TCP three-way handshake will not appear in this dataset. Excluding the AWS scanners, 25 port scans have occurred over the past 15 days targeting the blakejarvis.com site, with 21 actors previously seen by Greynoise, all of which are classified as benign.
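The split between standard requests and likely port scans can be sketched in Python as follows. This is a stand-in for illustration, not the Grok patterns from the previous post: the regular expression is a simplified approximation of an HTTP request line, and the `method`, `request`, and `httpversion` field names are assumptions. Only the `payload` field name comes from the description above.

```python
import re

# Simplified request line: METHOD, a space, the target, a space, HTTP/version.
REQUEST_LINE = re.compile(r"[A-Z]+ \S+ HTTP/\d(?:\.\d)?")


def split_request(raw):
    """Parse a standard request line, or keep the raw bytes as a payload value."""
    if REQUEST_LINE.fullmatch(raw):
        method, target, version = raw.split(" ")
        return {"method": method, "request": target, "httpversion": version}
    # Anything that does not look like HTTP is kept whole for later review.
    return {"payload": raw}


parsed = split_request("GET /ghost/ HTTP/1.1")
scan = split_request("\\x16\\x03\\x01\\x00")
```

Events that carry a `payload` value are the candidates for port scan review.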
Initial Questions: Revisited
Any data enrichment needs to be understood in the context of the application and scope of the data source. First understand what normal application traffic looks like, so deviations from normal can be identified. External enrichment, whether threat intelligence or "anti-threat intelligence", serves to expedite this process by providing additional context.
After Greynoise enrichment, the following questions are revisited:
What activity do I need to pay attention to?
Port scan data followed by HTTP requests with the correct host header of blakejarvis.com. This is more targeted activity compared to scanning by IP address or requesting common URIs.
Hosts not seen by Greynoise, or hosts seen by Greynoise and classified as malicious, making requests that return a 2xx response for non-GET requests or for requests that include the /ghost/ URI.
Spikes in activity, as visualized in the "Greynoise Classification Over Time" and "Greynoise Tags Over Time" Kibana visuals. This site saw a significant increase in benign web crawler activity on May 19, 2020. Upon investigation, this spike was the result of this site having been submitted to Palo Alto Networks for domain classification. When classifying, Palo Alto crawls the site to identify which category it best fits.
Benign actors receiving a 2xx response for unusual URIs. This could identify misconfigurations that would appear in internet device search engines.
What activity can be ignored?
Activity targeting technology not used, according to Greynoise tags. For example, in the case of this blog, malicious actors focused on the exploitation of WordPress sites can be ignored. However, since a single IP address can contain multiple Greynoise tags indicating a variety of scanning activity, caution is required when filtering to not exclude IP addresses that are a partial tag match.
Benign web crawlers. Usually a web crawler will identify itself in its User Agent (e.g. Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)), but a pre-classified tag of greynoise.tags: Web Scanner is a quick way of filtering out the noise.
Most traffic classified as benign according to Greynoise can be ignored. However, as mentioned above, successful response codes sent to benign actors could help identify server misconfigurations.
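The caution about partial tag matches in the list above can be made concrete with a short Python sketch. It treats an IP address as ignorable only when every one of its tags is in the ignore list, so an address that also carries another tag is kept for review. The tag name `WordPress Scanner` and the `IGNORED_TAGS` set are hypothetical examples, not values confirmed from the Greynoise dataset.

```python
# Hypothetical tag for technology this blog does not run.
IGNORED_TAGS = {"WordPress Scanner"}


def can_ignore(tags, ignored=IGNORED_TAGS):
    """True only when the IP has tags and all of them are in the ignore list."""
    tag_set = set(tags)
    # A subset test avoids dropping IPs that are only a partial tag match.
    return bool(tag_set) and tag_set <= ignored


only_wordpress = can_ignore(["WordPress Scanner"])
mixed_tags = can_ignore(["WordPress Scanner", "Web Scanner"])
no_tags = can_ignore([])
```

An address with no tags at all is deliberately not ignored, since nothing is known about what it was scanning for.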
Was this activity / exploit attempt successful?
Unsurprisingly, a thorough review of the logs with Greynoise enrichment found no successful exploit attempt on this site.
The attack surface on this blog is small, and the results of the log review and Greynoise enrichment reflect this. However, the process of understanding expected application behavior, enriching data with additional context, and filtering noise holds true for larger, more robust applications.
With just a few simple lines of code, the Greynoise dataset can integrate into a Logstash pipeline and immediately begin adding context to noisy logs, giving responders direction on where to focus searches and further analysis.