<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>[blog.rayfoo] &#187; log analysis</title>
	<atom:link href="http://blog.rayfoo.info/tag/log-analysis/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.rayfoo.info</link>
	<description>Infosec, DFIR, tech geekery, thoughts and whatnot</description>
	<lastBuildDate>Wed, 25 Jan 2012 00:36:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Definitions</title>
		<link>http://blog.rayfoo.info/2012/01/definitions</link>
		<comments>http://blog.rayfoo.info/2012/01/definitions#comments</comments>
		<pubDate>Tue, 24 Jan 2012 22:47:42 +0000</pubDate>
		<dc:creator>ray</dc:creator>
				<category><![CDATA[Everything]]></category>
		<category><![CDATA[DFIR]]></category>
		<category><![CDATA[log analysis]]></category>
		<category><![CDATA[log forensics]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://blog.rayfoo.info/?p=986</guid>
		<description><![CDATA[Quoting from Anton Chuvakin's slides in his presentation in 2006 at FIRST: Log analysis is (the) trying to make sense of system and network logs. Computer forensics is (the) application of the scientific method to digital media in order to establish factual information for judicial review. So... Log forensics is (the) trying to make sense [...]]]></description>
			<content:encoded><![CDATA[<p>Quoting from Anton Chuvakin's slides in his presentation in 2006 at FIRST:</p>
<blockquote><p><strong>Log analysis</strong> is (the) trying to make sense of system and network logs. </p>
<p><strong>Computer forensics</strong> is (the) application of the scientific method to digital media in order to  establish factual information for judicial review. </p>
<p>So...</p>
<p><strong>Log forensics</strong> is (the) trying to make sense of system and network logs, in order to  establish factual information for judicial review. </p></blockquote>
<p>Makes sense, maybe I've been googling for the wrong keywords all this time! Until recently, I've been looking at this field largely from a data mining viewpoint.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.rayfoo.info/2012/01/definitions/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dynamic conversion of epoch timestamps in logs</title>
		<link>http://blog.rayfoo.info/2011/05/dynamic-conversion-of-epoch-timestamps-in-logs</link>
		<comments>http://blog.rayfoo.info/2011/05/dynamic-conversion-of-epoch-timestamps-in-logs#comments</comments>
		<pubDate>Fri, 27 May 2011 17:37:50 +0000</pubDate>
		<dc:creator>ray</dc:creator>
				<category><![CDATA[Everything]]></category>
		<category><![CDATA[CLI]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[log analysis]]></category>
		<category><![CDATA[security]]></category>

		<guid isPermaLink="false">http://blog.rayfoo.info/?p=870</guid>
		<description><![CDATA[In the course of your logs or text processing, you may come across certain timestamps in epoch format.  Whilst there's always online resources to assist with the conversion of such timestamps, it may not be the best way if you need to keep the timestamp "secret" during then, or if you have many timestamps to [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-871" title="timestamp" src="http://blog.rayfoo.info/wp-content/uploads/2011/05/timestamp_logo.jpg" alt="" width="250" height="278" />In the course of your logs or text processing, you may come across certain timestamps in <a href="http://en.wikipedia.org/wiki/Unix_time">epoch</a> format.  Whilst there's always <a href="http://www.google.com/search?q=online+convert+epoch+timestamp">online resources</a> to assist with the conversion of such timestamps, it may not be the best way if you need to keep the timestamp "secret" during then, or if you have many timestamps to convert going by the thousands, millions, etc.</p>
<p>Whilst there's always free tools like Splunk which is available for free to the masses (and yes, it does automatically convert epoch timestamps for you), there's always our "humble" awk. <img src='http://blog.rayfoo.info/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /> </p>
<p><span id="more-870"></span>The linux awk command has the ability to invoke other commands as part of its computation.  The date command can be used to convert epoch times to local times.  Putting both together would allow us to do just what we need here!</p>
<p>First some examples with the date command:</p>
<blockquote>
<pre>$ date -d @1280921130.313
Wed Aug  4 19:25:30 SGT 2010</pre>
</blockquote>
<p>Or, if we only want the date:</p>
<blockquote>
<pre>$ date -d @1280921130.313 +%D
08/04/10</pre>
</blockquote>
<p>Now, making use of awk to convert only one epoch timestamp:</p>
<blockquote>
<pre>$ echo -n "1280921130.313" | \
awk '{<span style="color: #00ff00;"><strong>"date -d @"$1" +%D"</strong></span> <span style="color: #ff00ff;"><strong>| getline</strong></span> <span style="color: #00ffff;"><strong>myvariable</strong></span>; print myvariable}'
08/04/10</pre>
</blockquote>
<p>The important part to note is that we must <span style="color: #00ff00;"><strong>enclose the "external" command in quotes</strong></span> (we use the unquoted $1 variable to pass the epoch timestamp from awk), and that we <span style="color: #ff00ff;"><strong>pipe the output of that command to the getline directive</strong></span> in awk.  getline by itself would replace the $0 variable in awk when referencing it subsequently, whereas specifying a <span style="color: #00ffff;"><strong>variable</strong></span> ("myvariable" in this example) would keep the $0 variable as it is, allowing you to use the variable to reference the output of the external command.</p>
<p>A final example showing what log preprocessing with these commands might look like:</p>
<blockquote>
<pre>$ cat sample.log
1280921130.313 logentry1
1280921131.313 logentry2
1280921132.313 logentry3
1280921133.313 logentry4
</pre>
<pre>$ cat sample.log | \
awk '{cmd = "date -d @"$1; cmd | getline myvariable2
close(cmd)  # close the pipe so every line gets a fresh date process
print myvariable2 "\t" $0}'
Wed Aug  4 19:25:30 SGT 2010    1280921130.313 logentry1
Wed Aug  4 19:25:31 SGT 2010    1280921131.313 logentry2
Wed Aug  4 19:25:32 SGT 2010    1280921132.313 logentry3
Wed Aug  4 19:25:33 SGT 2010    1280921133.313 logentry4
</pre>
</blockquote>
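<p>For really large logs, spawning one date process per line gets slow.  As a rough alternative, here's a minimal Python sketch that does the same prefixing in-process.  The function name convert_line and the fixed UTC+8 offset (chosen to match the SGT output above) are my own assumptions, as is the output format:</p>

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+8 offset, to match the SGT timestamps shown above.
SGT = timezone(timedelta(hours=8))


def convert_line(line, tz=SGT):
    """Prefix a log line with a readable form of its leading epoch timestamp."""
    epoch, _, _rest = line.partition(" ")
    stamp = datetime.fromtimestamp(float(epoch), tz)
    return stamp.strftime("%Y-%m-%d %H:%M:%S") + "\t" + line


# The first two entries of sample.log from the example above.
sample = [
    "1280921130.313 logentry1",
    "1280921131.313 logentry2",
]
for entry in sample:
    print(convert_line(entry))
```

<p>Like the awk version, this keeps the original line intact and only adds the converted time in front of it.</p>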
<p>Have fun!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.rayfoo.info/2011/05/dynamic-conversion-of-epoch-timestamps-in-logs/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Profiling of persistent SSHD brute force attack</title>
		<link>http://blog.rayfoo.info/2011/04/profiling-of-persistent-sshd-brute-force-attack</link>
		<comments>http://blog.rayfoo.info/2011/04/profiling-of-persistent-sshd-brute-force-attack#comments</comments>
		<pubDate>Sun, 03 Apr 2011 19:04:19 +0000</pubDate>
		<dc:creator>ray</dc:creator>
				<category><![CDATA[Everything]]></category>
		<category><![CDATA[brute forcing]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[log analysis]]></category>
		<category><![CDATA[log collection]]></category>
		<category><![CDATA[profiling]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[Splunk]]></category>
		<category><![CDATA[SSH]]></category>

		<guid isPermaLink="false">http://blog.rayfoo.info/?p=823</guid>
		<description><![CDATA[Proper setting up and regular monitoring of logs gives you the avenue to know what's really happening with your box sitting out there in the internets, and to anticipate when bad things are about to happen.  One of the warning signs would be that someone has been poking around your box, looking for an (easy?) [...]]]></description>
			<content:encoded><![CDATA[<p><strong><img class="alignright size-full wp-image-824" title="Brute Force" src="http://blog.rayfoo.info/wp-content/uploads/2011/04/BruteForce.jpg" alt="" width="300" height="240" /></strong></p>
<p>Setting up logging properly and monitoring the logs regularly lets you know what's really happening with your box sitting out there in the internets, and anticipate when <em>bad things</em> are about to happen.  One of the warning signs is that <em>someone</em> has been poking around your box, looking for an (easy?) way in.</p>
<p>The natural thing that would jump out at you then is that this <em>someone</em> has been accessing your box in far higher volumes or for far longer durations than usual, especially on services that should not be accessed by others.</p>
<p>This is one example of such accesses on a linux box: <em>SSHD brute forcing over long periods of time.</em></p>
<p><span id="more-823"></span>Note: This post is more to talk about the process of digging/profiling, rather than the actual setup processes/log sources involved.  Feel free to ping me/comment below if you wish to discuss though.</p>
<p>The first thing you may ask is: what is "persistent"?  This would be the opposite of the run-of-the-mill opportunistic attackers.  These guys tend to bang your machine for a bit, then leave you alone immediately after failing:</p>
<div id="attachment_826" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.rayfoo.info/wp-content/uploads/2011/04/01-opportunistic.png"><img class="size-medium wp-image-826" title="Opportunistic" src="http://blog.rayfoo.info/wp-content/uploads/2011/04/01-opportunistic-300x81.png" alt="" width="300" height="81" /></a><p class="wp-caption-text">Opportunistic attack: Tries and gives up.</p></div>
<p>This contrasts greatly with the persistent buggers:</p>
<div id="attachment_827" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.rayfoo.info/wp-content/uploads/2011/04/02-persistent.png"><img class="size-medium wp-image-827" title="Persistent" src="http://blog.rayfoo.info/wp-content/uploads/2011/04/02-persistent-300x83.png" alt="Persistent Bugger" width="300" height="83" /></a><p class="wp-caption-text">Whoa!</p></div>
<p>After digging around first on the IP and supposed country of origin, we want to find out what the attacker tried to do.  One of the logs (*cough*... p0f... *cough*...) records the ports that connections were attempted on, which could be a starting point:</p>
<p style="text-align: left;">
<div id="attachment_828" class="wp-caption aligncenter" style="width: 508px"><a href="http://blog.rayfoo.info/wp-content/uploads/2011/04/03-ports-accessed.png"><img class="size-full wp-image-828 " title="Ports Accessed" src="http://blog.rayfoo.info/wp-content/uploads/2011/04/03-ports-accessed.png" alt="" width="498" height="272" /></a><p class="wp-caption-text">Mostly port 22 (SSH), only 1 for port 80 (HTTP)?</p></div>
<p style="text-align: left;">Searching for and viewing the port 80 access attempt, by itself and in relation to the other activities shows the following:</p>
<p style="text-align: center;">
<div id="attachment_832" class="wp-caption aligncenter" style="width: 492px"><a href="http://blog.rayfoo.info/wp-content/uploads/2011/04/04-port-80-access.png"><img class="size-full wp-image-832  " title="04-port-80-access" src="http://blog.rayfoo.info/wp-content/uploads/2011/04/04-port-80-access.png" alt="" width="482" height="205" /></a><p class="wp-caption-text">Pinpointing the port 80 connection</p></div>
<p style="text-align: center;">
<div id="attachment_833" class="wp-caption aligncenter" style="width: 491px"><a href="http://blog.rayfoo.info/wp-content/uploads/2011/04/05-confirming-access-profile.png"><img class="size-full wp-image-833  " title="05-confirming-access-profile" src="http://blog.rayfoo.info/wp-content/uploads/2011/04/05-confirming-access-profile.png" alt="" width="481" height="208" /></a><p class="wp-caption-text">Viewing the logs in chronological order (Splunk defaults to reverse chronological)</p></div>
<p style="text-align: left;">Viewing the logs in chronological order (Splunk defaults to reverse chronological) shows that the port 80 connection preceded the many many many port 22 connections by 2 minutes.  What's going on here?  If <em>somebody</em> wanted to get at the SSH accounts, why not go straight for them, rather than accessing the web service only once?  Checking the web access logs might give the answer we're looking for:</p>
<p style="text-align: center;">
<div id="attachment_834" class="wp-caption aligncenter" style="width: 524px"><a href="http://blog.rayfoo.info/wp-content/uploads/2011/04/06-accessed-http-page.png"><img class="size-full wp-image-834 " title="06-accessed-http-page" src="http://blog.rayfoo.info/wp-content/uploads/2011/04/06-accessed-http-page.png" alt="" width="514" height="157" /></a><p class="wp-caption-text">So in that TCP/80 connection....NOTHING was retrieved</p></div>
<p style="text-align: left;">Accessing <em>nothing </em>in that (only) one connection makes this look like a ping of sorts, but we can't be certain.</p>
<p style="text-align: left;">The next thing is to look at what this <em>somebody</em> was doing over the past two weeks!  First we get an idea of the kinds of things that were happening:</p>
<p style="text-align: center;">
<div id="attachment_835" class="wp-caption aligncenter" style="width: 545px"><a href="http://blog.rayfoo.info/wp-content/uploads/2011/04/07a-sshd-invalid-user.png"><img class="size-full wp-image-835 " title="07a-sshd-invalid-user" src="http://blog.rayfoo.info/wp-content/uploads/2011/04/07a-sshd-invalid-user.png" alt="" width="535" height="301" /></a><p class="wp-caption-text">Mostly &quot;Attempts to login using a non-existent user&quot;, ala our dear Mr Force, Brute Force</p></div>
<div id="attachment_836" class="wp-caption aligncenter" style="width: 523px"><a href="http://blog.rayfoo.info/wp-content/uploads/2011/04/07b-ssh-scan.png"><img class="size-full wp-image-836" title="07b-ssh-scan" src="http://blog.rayfoo.info/wp-content/uploads/2011/04/07b-ssh-scan.png" alt="" width="513" height="205" /></a><p class="wp-caption-text">...and &quot;SSH scan&quot;</p></div>
<p style="text-align: left;">What do these SSH scans mean?</p>
<p style="text-align: center;">
<div id="attachment_837" class="wp-caption aligncenter" style="width: 491px"><a href="http://blog.rayfoo.info/wp-content/uploads/2011/04/08-ssh-scan-no-ident-str-received.png"><img class="size-full wp-image-837  " title="08-ssh-scan-no-ident-str-received" src="http://blog.rayfoo.info/wp-content/uploads/2011/04/08-ssh-scan-no-ident-str-received.png" alt="" width="481" height="217" /></a><p class="wp-caption-text">Just means that the SSH handshake was not properly done/completed.</p></div>
<p style="text-align: left;">Since we already know that this is a brute force attempt, judging by the frequency of the failed SSH handshakes per day we can assume for now that they're just resulting from either the connections being blocked, or just "normal" failures in the midst of thousands of attempts.  More can be done to confirm this by zooming into the times where these errors occur, but let's say we're not interested in confirming this fact for now.</p>
<p style="text-align: left;">Looking at the nature of the attack provides some clues on the tools being used too.  For that we extract some stats concerning the tool's attack:</p>
<p style="text-align: center;">
<div id="attachment_838" class="wp-caption aligncenter" style="width: 548px"><a href="http://blog.rayfoo.info/wp-content/uploads/2011/04/09-targeted-ssh-user-counts.png"><img class="size-full wp-image-838 " title="09-targeted-ssh-user-counts" src="http://blog.rayfoo.info/wp-content/uploads/2011/04/09-targeted-ssh-user-counts.png" alt="" width="538" height="313" /></a><p class="wp-caption-text">Extracting and counting the targeted SSH userids shows that 473 userids were attempted, each between 1 and 21 times</p></div>
<p style="text-align: center;">
<div id="attachment_839" class="wp-caption aligncenter" style="width: 491px"><a href="http://blog.rayfoo.info/wp-content/uploads/2011/04/11-distribution-first-occurrences-targeted-users.png"><img class="size-full wp-image-839  " title="11-distribution-first-occurrences-targeted-users" src="http://blog.rayfoo.info/wp-content/uploads/2011/04/11-distribution-first-occurrences-targeted-users.png" alt="" width="481" height="132" /></a><p class="wp-caption-text">First occurrence of each targeted userid is spread out fairly evenly throughout the time period...</p></div>
<div id="attachment_840" class="wp-caption aligncenter" style="width: 491px"><a href="http://blog.rayfoo.info/wp-content/uploads/2011/04/12-distribution-last-occurrences-targeted-users.png"><img class="size-full wp-image-840  " title="12-distribution-last-occurrences-targeted-users" src="http://blog.rayfoo.info/wp-content/uploads/2011/04/12-distribution-last-occurrences-targeted-users.png" alt="" width="481" height="126" /></a><p class="wp-caption-text">...and last occurrences of each userid being fairly even throughout too.</p></div>
<p style="text-align: left;">More stats would be needed depending on the theory you're trying to prove/disprove, but you get the picture.</p>
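<p style="text-align: left;">If you don't have Splunk handy, the same kind of per-userid stats can be roughed out in a few lines of Python.  This is only a sketch: it assumes the usual sshd "Invalid user NAME from IP" message, the function name profile_userids is mine, and the sample lines are made up (they are not from my logs):</p>

```python
import re
from collections import Counter

# sshd logs failed logins for unknown accounts as "Invalid user NAME from IP".
INVALID_USER = re.compile(r"Invalid user (\S+) from (\S+)")


def profile_userids(lines, attacker_ip):
    """Count attempts per userid for one source IP, keeping first/last line index."""
    counts = Counter()
    first_seen = {}
    last_seen = {}
    for index, line in enumerate(lines):
        match = INVALID_USER.search(line)
        if match is None or match.group(2) != attacker_ip:
            continue
        user = match.group(1)
        counts[user] += 1
        first_seen.setdefault(user, index)  # only the first hit is recorded
        last_seen[user] = index             # overwritten on every hit
    return counts, first_seen, last_seen


# Made-up sample lines (192.0.2.x is a documentation-only address range).
sample = [
    "Apr  1 10:00:01 box sshd[1234]: Invalid user akira from 192.0.2.10",
    "Apr  1 10:00:05 box sshd[1235]: Invalid user wei from 192.0.2.10",
    "Apr  1 10:00:09 box sshd[1236]: Invalid user akira from 192.0.2.10",
    "Apr  1 10:00:12 box sshd[1237]: Invalid user test from 192.0.2.99",
]
counts, first_seen, last_seen = profile_userids(sample, "192.0.2.10")
print(counts.most_common())
```

<p style="text-align: left;">The first/last indexes stand in for timestamps here; with real logs you would parse the syslog time instead to plot the distributions.</p>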
<p style="text-align: left;">One of the things I usually would want to see is the list of userids used to brute force.  In this case, it looks like a predominantly Japanese/Chinese wordlist/namelist being used.  Interesting.</p>
<p style="text-align: center;">
<div id="attachment_841" class="wp-caption aligncenter" style="width: 624px"><a href="http://blog.rayfoo.info/wp-content/uploads/2011/04/14-targeted-usernames.png"><img class="size-full wp-image-841 " title="14-targeted-usernames" src="http://blog.rayfoo.info/wp-content/uploads/2011/04/14-targeted-usernames.png" alt="" width="614" height="360" /></a><p class="wp-caption-text">Am I Japanese?  Am I Chinese? <img src='http://blog.rayfoo.info/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p></div>
<p>Maybe I should start blogging in other languages to see what kind of brute force wordlists turn up <img src='http://blog.rayfoo.info/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /> </p>
<p>For now, in any case, <span style="color: #ff0000;"><strong>122.166.127.116 (abts-kk-static-116.127.166.122.airtelbroadband.in), I AM WATCHING YOU</strong></span>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.rayfoo.info/2011/04/profiling-of-persistent-sshd-brute-force-attack/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Splunk 4.2</title>
		<link>http://blog.rayfoo.info/2011/03/splunk-4-2</link>
		<comments>http://blog.rayfoo.info/2011/03/splunk-4-2#comments</comments>
		<pubDate>Mon, 21 Mar 2011 11:14:07 +0000</pubDate>
		<dc:creator>ray</dc:creator>
				<category><![CDATA[Everything]]></category>
		<category><![CDATA[log analysis]]></category>
		<category><![CDATA[Splunk]]></category>

		<guid isPermaLink="false">http://blog.rayfoo.info/2011/03/splunk-4-2</guid>
		<description><![CDATA[The next version of Splunk is out! Amongst the new features that Splunk's advertising, a quick glance through the new version reveals that the revamped management interface might seem to make administering it/clusters easier. Also that the search and reporting features seem to have been beefed up too! More to come after I poke around [...]]]></description>
			<content:encoded><![CDATA[<p>The next version of Splunk is out!</p>
<p>Amongst the new features that Splunk's advertising, a quick glance through the new version suggests that the revamped management interface makes administering it (and clusters) easier, and that the search and reporting features have been beefed up too!</p>
<p>More to come after I poke around some more, and if I have the time to write something <img src='http://blog.rayfoo.info/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.rayfoo.info/2011/03/splunk-4-2/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Profiling client internet connections</title>
		<link>http://blog.rayfoo.info/2010/07/profiling-client-internet-connections</link>
		<comments>http://blog.rayfoo.info/2010/07/profiling-client-internet-connections#comments</comments>
		<pubDate>Thu, 08 Jul 2010 10:20:57 +0000</pubDate>
		<dc:creator>ray</dc:creator>
				<category><![CDATA[Everything]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[data visualization]]></category>
		<category><![CDATA[information gathering]]></category>
		<category><![CDATA[log analysis]]></category>
		<category><![CDATA[p0f]]></category>
		<category><![CDATA[Splunk]]></category>

		<guid isPermaLink="false">http://blog.rayfoo.info/?p=628</guid>
		<description><![CDATA[Some more fun with p0f and Splunk...Now with profiling of client internet connections! Setup of the p0f and logging is the same as in the OS Profiling post. The Splunk search string has been extended to extract the source's internet link as a field too (go for the portion in bold for the field extracting [...]]]></description>
			<content:encoded><![CDATA[<p>Some more fun with p0f and Splunk...Now with profiling of client internet connections!</p>
<p>The p0f setup and logging are the same as in the <a href="http://blog.rayfoo.info/2010/07/os-profiling">OS Profiling</a> post.</p>
<p>The Splunk search string has been extended to extract the source's internet link as a field too (go for the portion in <strong>bold</strong> for the field extracting commands):</p>
<p><span style="color: #339966;">| file /home/path/to/p0f.log | <strong>rex field=_raw "&gt; (?&lt;srcip&gt;[^:]+):(?&lt;srcport&gt;[^ ]+) - (?&lt;srcos&gt;.+?) \(" | rex field=_raw "-&gt; (?&lt;dstip&gt;[^:]+):(?&lt;dstport&gt;[^ ]+) " | rex field=_raw "link: (?&lt;srclink&gt;.*)\)$"</strong> |  regex srclink!="(unspecified|unknown)" | top limit=0 srclink</span></p>
<p>The fields that I extract with this:</p>
<ul>
<li>srcip -&gt; source IP</li>
<li>srcport -&gt; source TCP port</li>
<li>srcos -&gt; source's OS (woot!)</li>
<li>dstip -&gt; destination IP (which is my machine's)</li>
<li>dstport -&gt; the destination port which the TCP connection was initiated to</li>
<li>srclink -&gt; the internet link of the source machine</li>
</ul>
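<p>To see what the three rex patterns actually capture, here's a small Python sketch that applies equivalent regexes to the sample p0f event from the OS Profiling post.  The function names extract_fields and top_links are my own, and the counting step is only a rough stand-in for what Splunk's regex and top commands do:</p>

```python
import re
from collections import Counter

# Python equivalents of the three rex patterns (positional groups, not named ones).
SRC = re.compile(r"> ([^:]+):([^ ]+) - (.+?) \(")
DST = re.compile(r"-> ([^:]+):([^ ]+) ")
LINK = re.compile(r"link: (.*)\)$")
SKIP = re.compile(r"(unspecified|unknown)")


def extract_fields(raw):
    """Return the six fields for one p0f event, or None if any pattern fails."""
    src, dst, link = SRC.search(raw), DST.search(raw), LINK.search(raw)
    if src is None or dst is None or link is None:
        return None
    return {
        "srcip": src.group(1),
        "srcport": src.group(2),
        "srcos": src.group(3),
        "dstip": dst.group(1),
        "dstport": dst.group(2),
        "srclink": link.group(1),
    }


def top_links(events):
    """Count srclink values, skipping unspecified/unknown ones."""
    counts = Counter()
    for raw in events:
        fields = extract_fields(raw)
        if fields is None or SKIP.search(fields["srclink"]):
            continue
        counts[fields["srclink"]] += 1
    return counts.most_common()


# \x3c is the less-than sign, written as an escape so that this listing
# survives being embedded in HTML.
event = (
    "\x3cSat Jul  3 07:03:56 2010> 175.40.12.47:1095 - "
    "Windows 2000 SP2+, XP SP1+ (seldom 98)\n"
    "-> 74.207.229.183:80 (distance 12, link: sometimes DSL (2))"
)
print(extract_fields(event))
print(top_links([event, event]))
```

<p>Note how the greedy link pattern keeps the inner "(2)" and only drops the final closing bracket of the event.</p>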
<p>After filtering out the "unspecified" and "unknown" links, the list of detected links is as follows:</p>
<p style="text-align: center;"><a href="http://blog.rayfoo.info/wp-content/uploads/2010/07/p0fsplunk-connectionlink.png"><img class="size-full wp-image-629 aligncenter" title="p0fsplunk-connectionlink" src="http://blog.rayfoo.info/wp-content/uploads/2010/07/p0fsplunk-connectionlink.png" alt="" width="600" height="310" /></a></p>
<p style="text-align: left;">"ethernet/modem" points to mostly cable connections.  There're some interesting entries in the list though, like <a href="http://en.wikipedia.org/wiki/VTun">vtun</a>, <a href="http://en.wikipedia.org/wiki/Point-to-Point_Protocol_over_Ethernet">pppoe</a>, Google/AOL, <a href="http://en.wikipedia.org/wiki/IP_tunnel">IPv6</a>/<a href="http://www.linuxfoundation.org/collaborate/workgroups/networking/tunneling">IPIP</a> (early adopters? haha).  I have no idea what IPSec/GRE or vLAN mean in this context, though.</p>
<p style="text-align: left;">Just for the heck of it, here's the chart for this table, generated from the reports link in Splunk.</p>
<p style="text-align: left;"><a href="http://blog.rayfoo.info/wp-content/uploads/2010/07/p0fsplunk-connectionchart.png"><img class="aligncenter size-full wp-image-630" title="p0fsplunk-connectionchart" src="http://blog.rayfoo.info/wp-content/uploads/2010/07/p0fsplunk-connectionchart.png" alt="" width="600" height="377" /></a></p>
<p style="text-align: left;">I like the charts because they allow some interaction for simple datasets, but I digress <img src='http://blog.rayfoo.info/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /> </p>
<p style="text-align: center;"><a href="http://blog.rayfoo.info/wp-content/uploads/2010/07/p0fsplunk-connectionchartmouseover.png"><img class="aligncenter size-full wp-image-631" title="p0fsplunk-connectionchartmouseover" src="http://blog.rayfoo.info/wp-content/uploads/2010/07/p0fsplunk-connectionchartmouseover.png" alt="" width="600" height="369" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.rayfoo.info/2010/07/profiling-client-internet-connections/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OS Profiling</title>
		<link>http://blog.rayfoo.info/2010/07/os-profiling</link>
		<comments>http://blog.rayfoo.info/2010/07/os-profiling#comments</comments>
		<pubDate>Tue, 06 Jul 2010 16:00:24 +0000</pubDate>
		<dc:creator>ray</dc:creator>
				<category><![CDATA[Everything]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[information gathering]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[log analysis]]></category>
		<category><![CDATA[p0f]]></category>
		<category><![CDATA[Splunk]]></category>
		<category><![CDATA[tee]]></category>

		<guid isPermaLink="false">http://blog.rayfoo.info/?p=605</guid>
		<description><![CDATA[Trying out p0f along with Splunk.. p0f allows you to determine the OS of the remote machine based on the TCP fields characteristics.  It can also tell whether the machine is behind a firewall, what kind of internet connection it is running from...pretty useful for information junkies like me Here's what I did: ./p0f -t [...]]]></description>
			<content:encoded><![CDATA[<p>Trying out <a href="http://lcamtuf.coredump.cx/p0f.shtml">p0f</a> along with <a href="http://www.splunk.com/download">Splunk</a>..</p>
<p>p0f allows you to determine the OS of the remote machine based on the characteristics of its TCP fields.  It can also tell whether the machine is behind a firewall, what kind of internet connection it is running from...pretty useful for information junkies like me <img src='http://blog.rayfoo.info/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /> </p>
<p>Here's what I did:</p>
<p><span style="color: #339966;">./p0f -t -u MyUseridHere -i eth0 'src not MyIPAddressHere' | tee -a p0f.log</span></p>
<p>This runs p0f, logging with actual timestamps (-t), chroot and setuid to MyUseridHere (-u), listening on eth0 (-i), and filtering out packets for connections initiated from my machine itself (since I'm not interested in profiling my own machine).</p>
<p><a href="http://en.wikipedia.org/wiki/Tee_(command)">tee</a> is a (really nifty!) linux command.  It copies its input (stdin) to two destinations: stdout and the file specified.  The -a option tells it to append to the file instead of overwriting it.</p>
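<p>If the behaviour of tee -a isn't obvious, this minimal Python sketch mimics it: every line goes both to an output stream and to the end of a file.  The function name tee_append and the demo strings are made up for illustration; the real tee obviously isn't written this way:</p>

```python
import io
import os
import tempfile


def tee_append(source, path, out):
    """Copy every line from source to out and also append it to the file at path."""
    with open(path, "a") as log:  # "a" appends, like tee -a
        for line in source:
            out.write(line)
            log.write(line)
            log.flush()  # keep the file current while the producer is still running


# Two separate runs against the same file: the second run appends, not overwrites.
demo_path = os.path.join(tempfile.mkdtemp(), "p0f.log")
captured = io.StringIO()
tee_append(io.StringIO("first event\n"), demo_path, captured)
tee_append(io.StringIO("second event\n"), demo_path, captured)
with open(demo_path) as log:
    print(log.read(), end="")
```

<p>In real use, source would be standard input and out would be standard output, which is what lets you watch p0f live while still keeping a log for Splunk.</p>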
<p>Using this, p0f outputs logs like this one:</p>
<p><span style="color: #339966;">&lt;Sat Jul  3 07:03:56 2010&gt; 175.40.12.47:1095 - Windows 2000 SP2+, XP SP1+ (seldom 98)<br />
-&gt; 74.207.229.183:80 (distance 12, link: sometimes DSL (2))</span></p>
<p>One of the Splunk queries that I poked around with:</p>
<p><span style="color: #339966;">| file /path/to/p0f.log | rex field=_raw "&gt; (?&lt;srcip&gt;[^:]+):(?&lt;srcport&gt;[^ ]+) - (?&lt;srcos&gt;.+?) \(" | rex field=_raw "-&gt; (?&lt;dstip&gt;[^:]+):(?&lt;dstport&gt;[^ ]+) " | regex srcos!="UNKNOWN" | top limit=0 srcos</span></p>
<p>This query extracts the source and destination IP and port, and the source OS.  Then after filtering out the OS tagged with UNKNOWN, the remaining entries are ranked...</p>
<p>The resulting chart, not of much real interest by itself, just shows that the connections are predominantly from linux machines (hurhur), and that there's a connection from a really old Netware machine (<a href="http://en.wikipedia.org/wiki/Novell_NetWare#NetWare_5.x">5 was released in Oct 1998!</a>).</p>
<p style="text-align: center;"><a href="http://blog.rayfoo.info/wp-content/uploads/2010/07/p0fsplunk.png"><img class="aligncenter size-full wp-image-606" title="p0fsplunk" src="http://blog.rayfoo.info/wp-content/uploads/2010/07/p0fsplunk.png" alt="" width="480" height="250" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.rayfoo.info/2010/07/os-profiling/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Visualizing sshd brute-force attempts (part 2)</title>
		<link>http://blog.rayfoo.info/2010/06/visualizing-sshd-brute-force-attempts-part-2</link>
		<comments>http://blog.rayfoo.info/2010/06/visualizing-sshd-brute-force-attempts-part-2#comments</comments>
		<pubDate>Wed, 02 Jun 2010 16:42:57 +0000</pubDate>
		<dc:creator>ray</dc:creator>
				<category><![CDATA[Everything]]></category>
		<category><![CDATA[afterglow]]></category>
		<category><![CDATA[brute forcing]]></category>
		<category><![CDATA[data visualization]]></category>
		<category><![CDATA[graphviz]]></category>
		<category><![CDATA[log analysis]]></category>
		<category><![CDATA[sed]]></category>
		<category><![CDATA[Splunk]]></category>
		<category><![CDATA[SSH]]></category>

		<guid isPermaLink="false">http://blog.rayfoo.info/?p=581</guid>
		<description><![CDATA[It's always better to Read The Fine Manual (or run perl afterglow.pl -h for the more updated helpfile)...though it's not really that well documented  Afterglow allows for two column inputs, rather than us having to do weird tricks to make them 3-column. (Note to self: get the raw data with fields in the order that [...]]]></description>
			<content:encoded><![CDATA[<p>It's always better to Read The Fine <a href="http://afterglow.sourceforge.net/manual.html#6">Manual</a> (or run <span style="color: #339966;">perl afterglow.pl -h</span> for the more updated helpfile)...though it's not really <em>that</em> well documented <img src='http://blog.rayfoo.info/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' />   Afterglow allows for two column inputs, rather than us having to do weird tricks to make them 3-column.</p>
<p>(Note to self: get the raw data with fields in the order that you want where possible/faster, rather than pumping it through <span style="color: #339966;">sed</span>.  Makes for good practice though.)</p>
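<p>For the record, if the export does come out with the wrong columns or order, reshuffling it doesn't need sed either.  A minimal Python sketch, assuming a made-up export with columns (timestamp, userid, ip) and a hypothetical helper named reorder_columns:</p>

```python
import csv
import io


def reorder_columns(rows, order):
    """Yield each row with its columns rearranged to the given index order."""
    for row in rows:
        yield [row[i] for i in order]


# Made-up export with columns (timestamp, userid, ip); keep only (userid, ip).
raw = io.StringIO("2010-06-01,root,192.0.2.10\n2010-06-01,admin,192.0.2.10\n")
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(reorder_columns(csv.reader(raw), [1, 2]))
print(out.getvalue(), end="")
```

<p>The resulting two-column csv is the shape that the -t runs below expect.</p>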
<p>Using the csv file containing userids (visualized in yellow) and IPs (visualized in green) over the past few months from Splunk, here're the results of some of the experiments.</p>
<p>Oh, for the Windows users, you can use <span style="color: #339966;">type</span> instead of <span style="color: #339966;">cat</span> <img src='http://blog.rayfoo.info/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p>First test using <a href="http://www.graphviz.org/About.php">GraphViz's</a> neato for layout:</p>
<p style="text-align: center;"><span style="color: #339966;">perl afterglow.pl -b 1 -i &lt;infile&gt; -c color.properties -t | neato -Tgif -o output.gif</span></p>
<div class="wp-caption aligncenter" style="width: 410px"><a href="http://lh4.ggpht.com/_evPUEWAwFrY/TAaB9H39-rI/AAAAAAAAI_E/bjhxhWE5vUc/test-neato.png"><img class="    " title="test afterglow neato" src="http://lh4.ggpht.com/_evPUEWAwFrY/TAaB9H39-rI/AAAAAAAAI_E/bjhxhWE5vUc/s400/test-neato.png" alt="" width="400" height="356" /></a><p class="wp-caption-text">Huge, but better visualized with -e 5 option (Resulting image for that is too huge to upload though <img src='http://blog.rayfoo.info/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /> ).  Note the single IP in the middle (the yellow explosion) that had been trying a LOT of userids to date.</p></div>
<p>Second test using fdp:</p>
<p style="text-align: center;"><span style="color: #339966;">perl afterglow.pl -b 1 -i &lt;infile&gt; -c color.properties -t | fdp -Tgif -o output.gif</span></p>
<div class="wp-caption aligncenter" style="width: 226px"><a href="http://lh6.ggpht.com/_evPUEWAwFrY/TAaCCQCGs8I/AAAAAAAAI_I/Sogy7NxglyE/test-fdp.png"><img title="test afterglow fdp" src="http://lh6.ggpht.com/_evPUEWAwFrY/TAaCCQCGs8I/AAAAAAAAI_I/Sogy7NxglyE/s400/test-fdp.png" alt="" width="216" height="400" /></a><p class="wp-caption-text">fdp doesn&#39;t seem to be well suited for this</p></div>
<p>Third test using sfdp:</p>
<p>No command here, you should have noticed the pattern from the first two...</p>
<div class="wp-caption aligncenter" style="width: 410px"><a href="http://lh5.ggpht.com/_evPUEWAwFrY/TAaCESgte6I/AAAAAAAAI_M/Z-jVk3Xf3AE/test-sfdp.png"><img title="test afterglow sfdp" src="http://lh5.ggpht.com/_evPUEWAwFrY/TAaCESgte6I/AAAAAAAAI_M/Z-jVk3Xf3AE/s400/test-sfdp.png" alt="" width="400" height="394" /></a><p class="wp-caption-text">_even_ less suited for this type of data...</p></div>
<p>Last test using twopi:</p>
<p>According to the <a href="http://www.graphviz.org/About.php">GraphViz</a> site, twopi's more suited for visualizing stuff like telecommunications flows.</p>
<div class="wp-caption aligncenter" style="width: 386px"><a href="http://lh4.ggpht.com/_evPUEWAwFrY/TAaCFUsQLcI/AAAAAAAAI_Q/9Y9wHwDpzrI/test-twopi.png"><img title="test afterglow twopi" src="http://lh4.ggpht.com/_evPUEWAwFrY/TAaCFUsQLcI/AAAAAAAAI_Q/9Y9wHwDpzrI/s400/test-twopi.png" alt="" width="376" height="400" /></a><p class="wp-caption-text">twopi</p></div>
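<p>To spell out the pattern in the commands above for all four layout engines, here's a small Python sketch that only prints the command lines (it doesn't run anything).  The input filename and the per-engine output filenames are placeholders and not from the original runs; the afterglow and GraphViz options are the ones used earlier in this post.</p>

```python
import shlex

# The four GraphViz layout engines tried in this post.
ENGINES = ["neato", "fdp", "sfdp", "twopi"]

def render_command(infile, engine):
    """Build the afterglow-to-GraphViz pipeline for one layout engine."""
    afterglow = "perl afterglow.pl -b 1 -i {} -c color.properties -t".format(
        shlex.quote(infile)
    )
    # Output name is a placeholder: one GIF per engine.
    return "{} | {} -Tgif -o output-{}.gif".format(afterglow, engine, engine)

# "userid-ip.csv" is a hypothetical input file name.
for engine in ENGINES:
    print(render_command("userid-ip.csv", engine))
```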
]]></content:encoded>
			<wfw:commentRss>http://blog.rayfoo.info/2010/06/visualizing-sshd-brute-force-attempts-part-2/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting additional (IP/network/location) info along with your Splunk searches</title>
		<link>http://blog.rayfoo.info/2010/04/getting-additional-ipnetworklocation-info-along-with-your-splunk-searches</link>
		<comments>http://blog.rayfoo.info/2010/04/getting-additional-ipnetworklocation-info-along-with-your-splunk-searches#comments</comments>
		<pubDate>Mon, 19 Apr 2010 17:57:07 +0000</pubDate>
		<dc:creator>ray</dc:creator>
				<category><![CDATA[Everything]]></category>
		<category><![CDATA[commands]]></category>
		<category><![CDATA[geolocation]]></category>
		<category><![CDATA[HOWTO]]></category>
		<category><![CDATA[log analysis]]></category>
		<category><![CDATA[network]]></category>
		<category><![CDATA[Splunk]]></category>
		<category><![CDATA[tools]]></category>

		<guid isPermaLink="false">http://blog.rayfoo.info/?p=529</guid>
		<description><![CDATA[Chanced upon some of the info by accident (smack at the bottom of one part of the Splunk documentation...), but I can't find it now.  Going to share here anyway Some (or probably most/all) of your searches might involve public IP addresses, and more often than not we would want to have additional info along [...]]]></description>
			<content:encoded><![CDATA[<p>Chanced upon some of the info by accident (smack at the bottom of one part of the <a href="http://www.splunk.com/">Splunk</a> <a href="http://www.splunk.com/base/Documentation">documentation</a>...), but I can't find it now.  Going to share here anyway <img src='http://blog.rayfoo.info/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /> </p>
<p>Some (or probably most/all) of your searches might involve public IP addresses, and more often than not we would want to have additional info along with the IP address to work with.</p>
<p>Three of the things that we can get Splunk to do automatically are looking up IP-location info, reverse-resolving an IP to a domain, and resolving a domain to an IP.</p>
<p><span id="more-529"></span></p>
<h1>1. Geolocation</h1>
<p>There are two ways to geolocate IPs: using the iplocation command, or using the MAXMIND app.</p>
<h2>1a. iplocation</h2>
<p>The command iplocation is described as:</p>
<blockquote><p>Finds ips in _raw and looks up the IP location using the hostip.info database. IPs are extracted as ip1, ip2, etc. Cities and Countries are likewise extracted.</p></blockquote>
<p>All we need to do is pipe the search to iplocation and let it do the rest!  The lookups are done from the server on the fly, so make sure that the server is able to do whois/ns lookups on the network.</p>
<p style="text-align: center;"><span style="color: #339966;">index=myindex | iplocation</span></p>
<p><a href="http://blog.rayfoo.info/wp-content/uploads/2010/04/splunk-iplocation.png"><img class="aligncenter size-medium wp-image-530" title="splunk iplocation" src="http://blog.rayfoo.info/wp-content/uploads/2010/04/splunk-iplocation-300x138.png" alt="" width="300" height="138" /></a></p>
<h2>1b. MAXMIND app</h2>
<p>As mentioned before: install the <a href="http://www.splunkbase.com/apps/All/4.x/Add-On/app:Geo+Location+Lookup+Script">MAXMIND app</a>, then pipe the field containing IPs to the lookup.  The lookup expects a field named clientip; if your field has a different name, map it with <em>as</em> like in the second search below, otherwise this will not work.</p>
<p>This works even when the server has no internet connectivity, but the accuracy is entirely dependent on the cached MAXMIND database.</p>
<p style="text-align: center;"><span style="color: #339966;">index=myindex | lookup geoip clientip</span></p>
<p style="text-align: center;">or</p>
<p style="text-align: center;"><span style="color: #339966;">index=myindex2 | lookup geoip clientip as fieldwithip</span></p>
<p><a href="http://blog.rayfoo.info/wp-content/uploads/2010/04/splunk-geoiplookup.png"><img class="aligncenter size-medium wp-image-531" title="splunk geoiplookup" src="http://blog.rayfoo.info/wp-content/uploads/2010/04/splunk-geoiplookup-300x137.png" alt="" width="300" height="137" /></a></p>
<h1>2, 3. IP-hostname or hostname-IP</h1>
<p>These two items are pretty similar.  Splunk 4 comes with a lookup script called external_lookup.py, and the config is already in the default transforms.conf.  So we only need to use it!</p>
<p style="text-align: center;">Resolving IPs to hostnames:</p>
<p style="text-align: center;"><span style="color: #339966;">index=myindex | lookup dnslookup clientip</span></p>
<p><a href="http://blog.rayfoo.info/wp-content/uploads/2010/04/splunk-ip-to-hostname.png"><img class="aligncenter size-medium wp-image-532" title="splunk ip to hostname" src="http://blog.rayfoo.info/wp-content/uploads/2010/04/splunk-ip-to-hostname-300x136.png" alt="" width="300" height="136" /></a></p>
<p style="text-align: center;">Resolving hostnames to IPs:</p>
<p style="text-align: center;"><span style="color: #339966;">index=myindex | lookup dnslookup clienthost</span></p>
<p style="text-align: center;">(no screenshot, sorry <img src='http://blog.rayfoo.info/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /> )</p>
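<p>Outside of Splunk, the same two directions can be sketched in a few lines of Python with the standard socket module.  This is not the code in external_lookup.py, just a minimal illustration; the <em>resolve</em> function and its parameters are hypothetical names, and the resolver functions are passed in as arguments so they can be swapped out.</p>

```python
import ipaddress
import socket

def resolve(value, reverse=socket.gethostbyaddr, forward=socket.gethostbyname):
    """Return a hostname for an IP, or an IP for a hostname; None on failure."""
    try:
        ipaddress.ip_address(value)   # raises ValueError if value is not an IP
        is_ip = True
    except ValueError:
        is_ip = False
    try:
        # gethostbyaddr returns (hostname, aliases, addresses); keep the hostname
        return reverse(value)[0] if is_ip else forward(value)
    except OSError:
        return None
```

<p>Calling resolve with an IP does the reverse lookup, and calling it with a hostname does the forward one.  Both need working DNS on the machine running it, the same as the Splunk lookup does.</p>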
<p>Leave a comment if this helped, or if you want to ask anything!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.rayfoo.info/2010/04/getting-additional-ipnetworklocation-info-along-with-your-splunk-searches/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fun with Splunk: SSHD</title>
		<link>http://blog.rayfoo.info/2010/03/fun-with-splunk-sshd</link>
		<comments>http://blog.rayfoo.info/2010/03/fun-with-splunk-sshd#comments</comments>
		<pubDate>Sat, 13 Mar 2010 11:02:20 +0000</pubDate>
		<dc:creator>ray</dc:creator>
				<category><![CDATA[Everything]]></category>
		<category><![CDATA[brute forcing]]></category>
		<category><![CDATA[geolocation]]></category>
		<category><![CDATA[log analysis]]></category>
		<category><![CDATA[MaxMind]]></category>
		<category><![CDATA[security]]></category>
		<category><![CDATA[Splunk]]></category>
		<category><![CDATA[SSH]]></category>
		<category><![CDATA[tutorials]]></category>

		<guid isPermaLink="false">http://blog.rayfoo.info/?p=489</guid>
		<description><![CDATA[Thought I'd share a bit on the tip of the iceberg, on what can be done with Splunk.  Linux command line tools are still much needed for raw log analysis (since we can't have the luxury of having a Splunk installation around and ready whenever we need it), but if setup and running properly, Splunk [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-498" title="splunk search" src="http://blog.rayfoo.info/wp-content/uploads/2010/03/splunk-search1.png" alt="" width="164" height="38" />Thought I'd share just the tip of the iceberg of what can be done with Splunk.  Linux command line tools are still very much needed for raw log analysis (since we don't always have the luxury of a Splunk installation around and ready whenever we need it), but if set up and running properly, Splunk can be pretty helpful (not to mention faster) for some things.</p>
<p>(This post is pretty unpolished, partly because I can't be bothered to fiddle around with fitting the search strings into the width of the post, etc.  Nonetheless,  comments/discussions are always welcome heh)</p>
<p>One of my favourite tasks with log analysis is to get information on those people/bots which are brute forcing SSHD, so let's start with SSH attacks as an example.<span id="more-489"></span></p>
<h2>Prerequisites</h2>
<p>Before we start off, we'll need Splunk set up to monitor the appropriate logfiles.  I've configured and am running the <a href="http://www.splunkbase.com/apps/All/4.x/app:Splunk+for+OSSEC+(Splunk+v4+version)">OSSEC</a> and <a href="http://www.splunkbase.com/apps/All/4.x/app:Splunk+for+Unix+and+Linux">Linux</a> apps for Splunk, so that the data inputs are taken care of for me.  If you don't want to run these apps, just make sure you index the /var/log and OSSEC alert log locations.  If you want to do the geolocation stuff, then the <a href="http://www.splunkbase.com/apps/All/4.x/Add-On/app:Geo+Location+Lookup+Script">MaxMind</a> app for Splunk is needed too.</p>
<h2>List of SSH attacks</h2>
<p>Let's start off with a simple query to see the list of previous SSH attacks:</p>
<pre style="text-align: center;"><span style="color: #00ff00;">source=*auth* sshd invalid user from</span></pre>
<p>Using this search string with the needed time range set shows a pretty graph of how many attacks we've got over time, along with the list of log entries for the attack.</p>
<div id="attachment_499" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.rayfoo.info/wp-content/uploads/2010/03/splunk-sshd-1.png"><img class="size-medium wp-image-499" title="splunk listing of sshd attacks" src="http://blog.rayfoo.info/wp-content/uploads/2010/03/splunk-sshd-1-300x149.png" alt="" width="300" height="149" /></a><p class="wp-caption-text">Click to enlarge</p></div>
<p>The attacks each day seem to be few, probably due to OSSEC's active responses.  A quick search confirms that OSSEC is blocking the offending hosts.</p>
<pre style="text-align: center;"><span style="color: #00ff00;">sourcetype="ossec_alerts" </span></pre>
<pre style="text-align: center;"><span style="color: #00ff00;">action="SSHD brute force trying to get access to the system."</span></pre>
<div id="attachment_507" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.rayfoo.info/wp-content/uploads/2010/03/splunk-sshd-2.png"><img class="size-medium wp-image-507" title="splunk ossec active responses" src="http://blog.rayfoo.info/wp-content/uploads/2010/03/splunk-sshd-2-300x146.png" alt="" width="300" height="146" /></a><p class="wp-caption-text">Click to enlarge</p></div>
<h2>Drilling Down</h2>
<p>Now we know that the attacks were especially active on the 22nd Feb, and OSSEC was responding correctly by blocking them off.  Why the large numbers then?  Was it because the attacks came from many different IP addresses, or because a particular IP address was especially persistent that day?  We can find out by getting more information on the src_ips for the time range in question.  First we click on the bar for the 22nd Feb, then the src_ip field in the sidebar.</p>
<div id="attachment_509" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.rayfoo.info/wp-content/uploads/2010/03/splunk-sshd-31.png"><img class="size-medium wp-image-509" title="splunk ssh ossec src ips" src="http://blog.rayfoo.info/wp-content/uploads/2010/03/splunk-sshd-31-300x187.png" alt="" width="300" height="187" /></a><p class="wp-caption-text">Click to enlarge</p></div>
<p>With the time range fixed onto what we're interested in looking at, and the src_ip field showing the unique source IPs that were blocked, the results show that it was most likely a persistent attack by these two IPs.  A quick check with the auth logs tells the same story:</p>
<div id="attachment_510" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.rayfoo.info/wp-content/uploads/2010/03/splunk-sshd-4.png"><img class="size-medium wp-image-510" title="splunk sshd brute force src ips" src="http://blog.rayfoo.info/wp-content/uploads/2010/03/splunk-sshd-4-300x236.png" alt="" width="300" height="236" /></a><p class="wp-caption-text">Click to enlarge</p></div>
<h2>GeoIP Lookups</h2>
<p>Now that we know which two IPs were actively poking around, let's map them to a location.  The MaxMind app for Splunk helps nicely for this task.</p>
<pre style="text-align: center;"><span style="color: #00ff00;">source=*auth* sshd invalid user from | </span></pre>
<pre style="text-align: center;"><span style="color: #00ff00;">lookup geoip clientip as src_ip</span></pre>
<div id="attachment_511" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.rayfoo.info/wp-content/uploads/2010/03/splunk-sshd-5.png"><img class="size-medium wp-image-511" title="splunk srcip geoiplookup" src="http://blog.rayfoo.info/wp-content/uploads/2010/03/splunk-sshd-5-300x210.png" alt="" width="300" height="210" /></a><p class="wp-caption-text">Click to enlarge</p></div>
<p>The app and the local geoip database do the lookups for us nicely, mapping each IP to geolocation information like country, city, latitude, longitude and region.  The country information seems to be available for most (if not all) IPs; the rest is filled in when available.</p>
<h2>List/Count of attacked userids for SSH</h2>
<p>The search string for this depends on your SSHD config, but for me searching for the invalid users is enough.</p>
<pre style="text-align: center;"><span style="color: #00ff00;">source=*auth* sshd invalid user from | </span></pre>
<pre style="text-align: center;"><span style="color: #00ff00;">rex field=_raw "Invalid user (?&lt;atk_user_id&gt;\S+) from "</span></pre>
<p>Searching/sorting by the atk_user_id field would show us the attacked userids.  Click on the "Events Table" button to show the table of results with only the fields that you've selected.</p>
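<p>If there's no Splunk around, the same extraction can be approximated in Python.  This is only a sketch: it uses the same pattern as the rex above but with a plain capture group instead of a named one, and the two sample log lines (and the <em>attacked_userid</em> name) are made up to match the usual sshd auth log shape.</p>

```python
import re

# Same pattern as the rex above, but with a plain (unnamed) capture group.
INVALID_USER = re.compile(r"Invalid user (\S+) from ")

def attacked_userid(line):
    """Return the userid from an 'Invalid user ... from' line, or None."""
    match = INVALID_USER.search(line)
    return match.group(1) if match else None

# Made-up sample lines in the usual sshd auth log shape.
sample_lines = [
    "Mar 13 10:00:01 myhost sshd[1234]: Invalid user oracle from 192.0.2.10",
    "Mar 13 10:00:05 myhost sshd[1240]: Accepted password for ray from 192.0.2.20",
]
for line in sample_lines:
    print(attacked_userid(line))
```

<p>The first line yields the userid and the second yields None, since it isn't an invalid-user entry.</p>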
<div id="attachment_512" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.rayfoo.info/wp-content/uploads/2010/03/splunk-sshd-6.png"><img class="size-medium wp-image-512" title="splunk searching for attacked sshd userids" src="http://blog.rayfoo.info/wp-content/uploads/2010/03/splunk-sshd-6-300x153.png" alt="" width="300" height="153" /></a><p class="wp-caption-text">Click to enlarge</p></div>
<p>If we want a sorted list of the top attacked userids, pipe the search string to a top command.</p>
<pre style="text-align: center;"><span style="color: #00ff00;">source=*auth* sshd invalid user from | rex field=_raw </span></pre>
<pre style="text-align: center;"><span style="color: #00ff00;">"Invalid user (?&lt;atk_user_id&gt;\S+) from "</span></pre>
<pre style="text-align: center;"><span style="color: #00ff00;"> | top atk_user_id limit=1000</span></pre>
<p>The Results Table should show automatically for this search.</p>
<div id="attachment_513" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.rayfoo.info/wp-content/uploads/2010/03/splunk-sshd-7.png"><img class="size-medium wp-image-513" title="splunk sshd userids brute forced" src="http://blog.rayfoo.info/wp-content/uploads/2010/03/splunk-sshd-7-300x241.png" alt="" width="300" height="241" /></a><p class="wp-caption-text">Click to enlarge</p></div>
<p>Maybe we'd like an alphabetical list instead, so we just pipe the search to a sort command:</p>
<pre style="text-align: center;"><span style="color: #00ff00;">source=*auth* sshd invalid user from | rex field=_raw </span></pre>
<pre style="text-align: center;"><span style="color: #00ff00;">"Invalid user (?&lt;atk_user_id&gt;\S+) from "</span></pre>
<pre style="text-align: center;"><span style="color: #00ff00;"> | top atk_user_id limit=1000 | sort atk_user_id</span></pre>
<div id="attachment_518" class="wp-caption aligncenter" style="width: 310px"><a href="http://blog.rayfoo.info/wp-content/uploads/2010/03/splunk-sshd-8.png"><img class="size-medium wp-image-518" title="splunk sshd userids alphabetical sort" src="http://blog.rayfoo.info/wp-content/uploads/2010/03/splunk-sshd-8-300x221.png" alt="" width="300" height="221" /></a><p class="wp-caption-text">Click to enlarge</p></div>
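<p>For comparison, counting and sorting the extracted userids by hand looks roughly like this in Python.  It's a sketch of the idea behind top and sort, not of how Splunk does it (there's no percent column, for one), and the userids in the list are made up.</p>

```python
from collections import Counter

def top_userids(userids, limit=1000):
    """Return (userid, count) pairs, most frequent first, capped at limit."""
    return Counter(userids).most_common(limit)

# Made-up userids standing in for the extracted atk_user_id values.
seen = ["root", "admin", "root", "test", "admin", "root"]
ranked = top_userids(seen)
print(ranked)           # most attacked first
print(sorted(ranked))   # alphabetical by userid instead
```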
<p>Alright, that's all for now <img src='http://blog.rayfoo.info/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.rayfoo.info/2010/03/fun-with-splunk-sshd/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Troubleshooting Splunk</title>
		<link>http://blog.rayfoo.info/2010/03/troubleshooting-splunk</link>
		<comments>http://blog.rayfoo.info/2010/03/troubleshooting-splunk#comments</comments>
		<pubDate>Mon, 08 Mar 2010 14:27:51 +0000</pubDate>
		<dc:creator>ray</dc:creator>
				<category><![CDATA[Everything]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[log analysis]]></category>
		<category><![CDATA[log collection]]></category>
		<category><![CDATA[logs]]></category>
		<category><![CDATA[Splunk]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[troubleshooting]]></category>

		<guid isPermaLink="false">http://blog.rayfoo.info/?p=478</guid>
		<description><![CDATA[Have been fiddling around with Splunk lately.  Splunk's a really good tool to use for log collection and analysis (and that's oversimplifying it, I believe it can even do event correlation...), which really made my love for data mining go crazy of late:P  Best part is that it has a perpetual free license, nice! One [...]]]></description>
			<content:encoded><![CDATA[<p>Have been fiddling around with <a href="http://www.splunk.com/">Splunk</a> lately.  Splunk's a really good tool to use for log collection and analysis (and that's oversimplifying it, I believe it can even do event correlation...), which really made my love for data mining go crazy of late:P  Best part is that it has a perpetual free license, nice!</p>
<p>One of the things I encountered when using Splunk was that it didn't seem to be indexing all the log files that it was set to monitor.  After some reading up and experimenting the reason became clear: Splunk will not work properly if you set it to monitor too many files.</p>
<p>How many is too many?  For example, setting it to monitor a logfile directory which has only one active log and 100+ rotated logs is too many.  What should be done instead is to set it to monitor the active logfile only, and add the other logfiles to the index you want as oneshot inputs.</p>
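<p>With that many rotated logs, typing the oneshot commands by hand gets old fast, so here's a small Python sketch that prints one command per rotated file.  The file names and the index name are hypothetical, and the command shape (splunk add oneshot with -index) is what I'd expect from the Splunk CLI; check it against the CLI help of your version before running anything.</p>

```python
import shlex

def oneshot_commands(logfiles, active_log, index):
    """Build one 'splunk add oneshot' command for each rotated (non-active) log."""
    return [
        "splunk add oneshot {} -index {}".format(shlex.quote(path), shlex.quote(index))
        for path in sorted(logfiles)
        if path != active_log
    ]

# Hypothetical file names and index name, for illustration only.
logs = ["/var/log/auth.log", "/var/log/auth.log.1", "/var/log/auth.log.2"]
for command in oneshot_commands(logs, "/var/log/auth.log", "myindex"):
    print(command)
```

<p>The active logfile is skipped on purpose, since that one stays under the normal monitor input.</p>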
<p>Gonna do some more sharing/writeups about this crazily great tool.  There's really a lot that this thing can do man.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.rayfoo.info/2010/03/troubleshooting-splunk/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

