Profiling a persistent SSHD brute force attack

Properly setting up and regularly monitoring your logs lets you know what's really happening with your box sitting out there on the internets, and anticipate when bad things are about to happen. One warning sign is that someone has been poking around your box, looking for an (easy?) way in.
What naturally jumps out at you then is that this someone has been accessing your box in far higher volumes, or over far longer durations, than usual, especially on services that should not be accessed by others.
Here is one example of such access on a Linux box: SSHD brute forcing over a long period of time.
Note: This post is more about the process of digging and profiling than about the actual setup or log sources involved. Feel free to ping me or comment below if you wish to discuss those, though.
The first thing you may ask is: what is "persistent"? It is the opposite of the run-of-the-mill opportunistic attackers, who tend to bang on your machine for a bit, then leave you alone immediately after failing:
This contrasts greatly with the persistent buggers:
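One rough way to put numbers on "opportunistic" versus "persistent" is to count how many distinct days a source shows up on, and how long it is between its first and last failed login. The Python sketch below assumes you have already pulled (source IP, timestamp) pairs for failed SSH logins out of your logs. The thresholds `MIN_ACTIVE_DAYS` and `MIN_SPAN_DAYS` and the function `classify_sources` are hypothetical, picked only for illustration, and the sample IPs are documentation addresses, not the ones from this attack.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical thresholds (tune to your own traffic): a source seen on at least
# MIN_ACTIVE_DAYS distinct days, spread over at least MIN_SPAN_DAYS, counts as
# "persistent"; anything else is "opportunistic".
MIN_ACTIVE_DAYS = 3
MIN_SPAN_DAYS = 7


def classify_sources(events):
    """events: iterable of (source_ip, datetime) pairs for failed SSH logins.

    Returns {source_ip: "persistent" | "opportunistic"}.
    """
    seen = defaultdict(list)
    for ip, ts in events:
        seen[ip].append(ts)

    labels = {}
    for ip, stamps in seen.items():
        active_days = {ts.date() for ts in stamps}        # distinct calendar days
        span_days = (max(stamps) - min(stamps)).days      # first to last attempt
        if len(active_days) >= MIN_ACTIVE_DAYS and span_days >= MIN_SPAN_DAYS:
            labels[ip] = "persistent"
        else:
            labels[ip] = "opportunistic"
    return labels


if __name__ == "__main__":
    # Made-up sample data: one short burst, one source trickling in daily.
    burst = [("192.0.2.10", datetime(2012, 3, 1, 10, m)) for m in range(30)]
    slow = [("198.51.100.7", datetime(2012, 3, d, 4, 0)) for d in range(1, 15)]
    print(classify_sources(burst + slow))
    # {'192.0.2.10': 'opportunistic', '198.51.100.7': 'persistent'}
```

Counting distinct days rather than raw attempts keeps a single noisy burst from being mistaken for persistence.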
After first digging around on the IP and its supposed country of origin, we want to find out what the attacker tried to do. One of the logs (*cough*... p0f... *cough*...) feeds in info on the ports that connections were attempted on, which could be a starting point:
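If the p0f log is sitting in a plain text file rather than already indexed in Splunk, a per-source port tally is easy to script. This is a minimal sketch assuming p0f v3-style lines with `|`-separated `cli=IP/port` and `srv=IP/port` fields; check the format your p0f version actually writes before relying on it. `ports_by_source` is a hypothetical helper name and the sample lines are made up.

```python
import re
from collections import defaultdict

# Assumed p0f v3-style line, e.g.
# [2012/03/01 10:00:00] mod=syn|cli=192.0.2.10/40000|srv=203.0.113.5/22|subj=cli|...
CLI = re.compile(r"\bcli=([\d.]+)/\d+")   # client (attacker) IP
SRV = re.compile(r"\bsrv=[\d.]+/(\d+)")   # server (our box) port


def ports_by_source(lines):
    """Map each client IP to the set of server ports it tried to reach."""
    ports = defaultdict(set)
    for line in lines:
        cli, srv = CLI.search(line), SRV.search(line)
        if cli and srv:
            ports[cli.group(1)].add(int(srv.group(1)))
    return dict(ports)


if __name__ == "__main__":
    sample = [
        "[2012/03/01 10:00:00] mod=syn|cli=192.0.2.10/40000|srv=203.0.113.5/80|subj=cli|os=???",
        "[2012/03/01 10:02:00] mod=syn|cli=192.0.2.10/40001|srv=203.0.113.5/22|subj=cli|os=???",
        "[2012/03/01 10:02:05] mod=syn|cli=192.0.2.10/40002|srv=203.0.113.5/22|subj=cli|os=???",
    ]
    print({ip: sorted(p) for ip, p in ports_by_source(sample).items()})
    # {'192.0.2.10': [22, 80]}
```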
Searching for and viewing the port 80 access attempt, both by itself and in relation to the other activities, shows the following:
Viewing the logs in chronological order (Splunk defaults to reverse chronological) shows that the port 80 connection preceded the many many many port 22 connections by 2 minutes. What's going on here? If somebody wanted to get at the SSH accounts, why not go straight for them rather than access the web service only once? Checking the web access logs might give the answer we're looking for:
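The ordering question can also be answered outside Splunk by sorting events oldest-first and comparing when each port first appears. The sketch assumes the events for a single source IP are already available as (timestamp, destination port) pairs; `first_contact_gap` is a hypothetical name and the timestamps are invented.

```python
from datetime import datetime


def first_contact_gap(events, earlier_port, later_port):
    """events: iterable of (datetime, dst_port) pairs for one source IP.

    Sorts oldest-first (Splunk shows newest-first by default) and returns the
    timedelta between the first hit on earlier_port and the first hit on
    later_port, or None if either port never appears.
    """
    first = {}
    for ts, port in sorted(events):
        first.setdefault(port, ts)  # keep only the earliest time per port
    if earlier_port not in first or later_port not in first:
        return None
    return first[later_port] - first[earlier_port]


if __name__ == "__main__":
    events = [
        (datetime(2012, 3, 1, 10, 5), 22),
        (datetime(2012, 3, 1, 10, 0), 80),
        (datetime(2012, 3, 1, 10, 2), 22),
    ]
    print(first_contact_gap(events, 80, 22))  # 0:02:00
```

A negative result would mean the "later" port was actually hit first, which is itself worth knowing.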
Accessing nothing in that (only) one connection makes this look like a ping of sorts, but we can't be certain.
The next thing is to look at what this somebody was doing over the past two weeks! First we get an idea of the kinds of things that were happening:
What do these SSH scans mean?
Since we already know that this is a brute force attempt, and judging by how often the SSH handshakes fail per day, we can assume for now that these failures result either from the connections being blocked or from "normal" failures amid thousands of attempts. We could confirm this by zooming into the times when these errors occur, but let's say we're not interested in confirming it for now.
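To get a similar per-day picture straight from the auth log, sshd's syslog lines can be bucketed by day and message type. This is only a sketch: the category names and `daily_tally` are hypothetical, the message fragments are typical OpenSSH wording that varies between versions, and the classic syslog timestamp (`Mar  1 10:00:00`, no year) is assumed.

```python
import re
from collections import Counter, defaultdict

# Hypothetical categories mapped to typical OpenSSH message fragments.
# First match wins, so order matters.
CATEGORIES = [
    ("failed_password", re.compile(r"Failed password for ")),
    ("invalid_user", re.compile(r"Invalid user ")),
    ("handshake_failure", re.compile(
        r"Did not receive identification string|Bad protocol version|Unable to negotiate")),
]
DAY = re.compile(r"^(\w{3}\s+\d{1,2})\s")  # classic syslog "Mar  1" prefix


def daily_tally(lines):
    """Return {day: Counter(category -> count)} for sshd syslog lines."""
    per_day = defaultdict(Counter)
    for line in lines:
        day = DAY.match(line)
        if not day or "sshd[" not in line:
            continue  # not a syslog line, or not from sshd
        key = " ".join(day.group(1).split())  # "Mar  1" -> "Mar 1"
        for name, pattern in CATEGORIES:
            if pattern.search(line):
                per_day[key][name] += 1
                break
    return dict(per_day)


if __name__ == "__main__":
    with open("/var/log/auth.log") as fh:  # path varies by distro (e.g. /var/log/secure)
        for day, counts in daily_tally(fh).items():
            print(day, dict(counts))
```

A day where `handshake_failure` spikes relative to `failed_password` would be the place to zoom in if you did want to confirm the blocking theory.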
Looking at the nature of the attack also provides some clues about the tools being used. For that, we extract some stats about the attack:

Extracting and counting the targeted SSH userids shows that 473 distinct userids were attempted, each between 1 and 21 times.
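The userid count can be reproduced from raw auth logs with a regex over OpenSSH's "Failed password" lines. A minimal sketch, assuming the usual `Failed password for [invalid user ]NAME from IP port N ssh2` wording; `userid_stats` is a hypothetical helper and the sample lines are made up.

```python
import re
from collections import Counter

# Matches OpenSSH "Failed password" lines for both valid and invalid users, e.g.
#   Failed password for root from 192.0.2.10 port 51000 ssh2
#   Failed password for invalid user admin from 192.0.2.10 port 51001 ssh2
FAILED = re.compile(
    r"Failed password for (?:invalid user )?(\S+) from ([\d.]+) port \d+")


def userid_stats(lines, source_ip):
    """Count attempted userids for one source IP.

    Returns (counter, fewest_attempts, most_attempts) across userids.
    """
    counts = Counter()
    for line in lines:
        m = FAILED.search(line)
        if m and m.group(2) == source_ip:
            counts[m.group(1)] += 1
    if not counts:
        return counts, 0, 0
    return counts, min(counts.values()), max(counts.values())


if __name__ == "__main__":
    lines = [
        "Mar  1 10:00:01 box sshd[101]: Failed password for invalid user admin from 192.0.2.10 port 51000 ssh2",
        "Mar  1 10:00:03 box sshd[101]: Failed password for invalid user admin from 192.0.2.10 port 51002 ssh2",
        "Mar  1 10:00:05 box sshd[102]: Failed password for root from 192.0.2.10 port 51003 ssh2",
    ]
    counts, lo, hi = userid_stats(lines, "192.0.2.10")
    print(len(counts), lo, hi)   # 2 1 2
    print(counts.most_common())  # [('admin', 2), ('root', 1)]
```

`len(counts)` gives the number of distinct userids tried, the min/max give the per-userid range, and `counts.most_common()` dumps the wordlist itself.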
More stats would be needed depending on the theory you're trying to prove/disprove, but you get the picture.
One of the things I usually want to see is the list of userids used in the brute force. In this case, it looks like a predominantly Japanese/Chinese wordlist/namelist is being used. Interesting.
Maybe I should start blogging in other languages to see what kinds of brute force wordlists turn up.
For now, in any case, 122.166.127.116 (abts-kk-static-116.127.166.122.airtelbroadband.in), I AM WATCHING YOU.