One recent leak was especially noteworthy because it involved plaintext passwords. Plaintext passwords are extremely easy to analyze. It did not take long for people to analyze the leaked credentials and release powerful visual graphs.
Even though there has been a considerable amount of work done on these accounts, most of it doesn't apply directly to me. I (hopefully) don't assess organizations that would ever allow 99% of the passwords found in online dumps. For that reason, I like to weed out the simple passwords and look at the complex ones (<1%) for discernible patterns.
A friend of mine has been talking this concept up for what seems like years. As I have seen his dictionaries consistently crack over 50% of complex NT domain hashes, I have become a believer. We apply these same concepts to all passwords that are found. Passwords found in a random text document in one organization can help you crack hashes in another.
Hashes don't need to be cracked (they can be passed), but there is nothing like including a graph that shows how quickly they were cracked. Also, there is no way to use an old hash to figure out what a new hash will be, whereas a cracked password with a discernible pattern can hint at the next one. Passwords with a discernible pattern equate to months or years of persistent access, which is a concept that is easy to explain to most C-level decision-makers.
The first step in this case is to get to the actual passwords. For our purposes, we don't really care about tying the passwords to any other information. The leak is just a bunch of log files:
Disclaimer: I am using PowerShell for the sake of using PowerShell. The cmdlets that I chose to use really aren't suited for handling large data sets and I am by no means an expert. If you plan on querying the raw data more than once, you should probably create a database. If you don't have at least 8 GB of RAM, I would not recommend using "Sort-Object -Unique" like I do below. It would probably be a good idea to save any other work to be safe.
Let's look at how the log files are structured so that we can trim them:
So it seems pretty safe to target the lines with "password=" in them. We can pipe the contents of the log files through "Where-Object -match":
PS> Get-Content *.log | Where-Object {$_ -match "password="} > where.txt
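For anyone who would rather not push millions of lines through the PowerShell pipeline, the same filter can be sketched in Python (not PowerShell, and not what I actually ran). It streams the files line by line instead of loading them whole. The function names and the "*.log" default are my own placeholders, and the match is made case-insensitive to mirror the default behavior of -match.

```python
import glob
from typing import Iterable, Iterator


def lines_with_marker(lines: Iterable[str], marker: str = "password=") -> Iterator[str]:
    """Yield only the lines that contain the marker (case-insensitive)."""
    needle = marker.lower()
    for line in lines:
        if needle in line.lower():
            yield line


def read_logs(pattern: str = "*.log") -> Iterator[str]:
    """Stream every line of every file matching the glob pattern."""
    for path in glob.glob(pattern):
        # errors="replace" keeps going if a log contains undecodable bytes
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield line.rstrip("\r\n")


# Hypothetical usage, writing the matches to where.txt:
#   with open("where.txt", "w", encoding="utf-8") as out:
#       for hit in lines_with_marker(read_logs("*.log")):
#           out.write(hit + "\n")
```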
After a few minutes (hours) we are down to 5,836,719 lines that contain the string "password=" and a potential password. Next we need to cut out the passwords.
PS> Get-Content ./where.txt | ForEach-Object {$_.Split('=')[1]} > cut.txt
I am trying to avoid the use of aliases for those who are unfamiliar with PowerShell. I found it easier to learn the actual cmdlet names before using aliases. We are piping the contents of the file into ForEach-Object (typically expressed with the "foreach" or "%" alias), which runs the script block once per line with $_ holding the current line. The Split method with '=' as the argument breaks the line into fields. We want the second field, so we use [1] (arrays start at 0). I chose "=" for simplicity's sake, but it will affect the accuracy of the final list: a password that itself contains "=" gets cut short, and a line with an earlier "=" returns the wrong field.
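To make the accuracy trade-off concrete, here is a small Python sketch (again, not PowerShell) comparing the simple split with a cut that keeps everything after the marker. Both helper names are made up for illustration, and the second one assumes the password is the last thing on the line, which may not hold for every log format.

```python
from typing import Optional


def naive_field(line: str) -> Optional[str]:
    """Mirror of Split('=')[1]: the text between the first and second '='."""
    parts = line.split("=")
    return parts[1] if len(parts) > 1 else None


def after_marker(line: str, marker: str = "password=") -> Optional[str]:
    """Everything after the first occurrence of the marker, or None if absent."""
    _, found, rest = line.partition(marker)
    return rest if found else None


# A password containing '=' survives only with the second approach:
#   naive_field("password=abc=123")   gives "abc"
#   after_marker("password=abc=123")  gives "abc=123"
```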
Now we have a list of all the passwords, but the list contains a large number of duplicates. It is easy to reduce it to unique lines with PowerShell using the "Sort-Object" cmdlet with the "-Unique" argument, but Sort-Object has to collect every line in memory before it can sort, so you will see a major spike in memory usage. If you do decide to do this (you were warned), just exit PowerShell to get your RAM back.
PS> Get-Content ./cut.txt | Sort-Object -Unique > unique.txt
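If the memory spike is a problem, one alternative is to de-duplicate with a set, sketched here in Python rather than PowerShell. The set holds one copy of each distinct password, so memory grows with the number of unique values instead of the total line count. Note that this comparison is case-sensitive, which is what you want for passwords; as far as I know Sort-Object -Unique ignores case unless -CaseSensitive is added, so the two may not produce identical lists. The function name is a placeholder.

```python
from typing import Iterable, List


def unique_sorted(lines: Iterable[str]) -> List[str]:
    """Return the distinct lines in sorted order (case-sensitive comparison)."""
    return sorted(set(lines))


# Hypothetical usage against the file produced in the previous step,
# assuming it has already been saved as UTF-8:
#   with open("cut.txt", encoding="utf-8") as src:
#       passwords = unique_sorted(line.rstrip("\r\n") for line in src)
```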
At this point, I part ways with Windows and PowerShell (UPDATE: You don't have to!) to use a great tool included with THC-Hydra. Pw-Inspector can be used to trim this list to only the passwords that meet our complexity requirements (thanks Matt). Once we have transferred the files to our Linux distro, we need to convert the files from Unicode, because PowerShell's ">" wrote them as 16-bit little-endian text, which is what the "-el" switch of strings reads (thanks Skip):
# strings -el unique.txt > youpornpasswords.txt
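The same conversion can also be done without strings. This is a minimal Python sketch, assuming the file is UTF-16 with a byte-order mark (which is what Windows PowerShell's ">" produced for me; newer PowerShell versions may write UTF-8 instead). The function name and file names are placeholders.

```python
def utf16_to_utf8(src: str, dst: str) -> None:
    """Re-encode a UTF-16 text file as UTF-8 with Unix line endings."""
    # The "utf-16" codec reads the byte-order mark to pick the endianness
    # and drops it from the decoded text.
    with open(src, encoding="utf-16", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="\n") as fout:
        for line in fin:
            fout.write(line.rstrip("\r\n") + "\n")


# Hypothetical usage:
#   utf16_to_utf8("unique.txt", "unique-utf8.txt")
```

Unlike strings, this keeps every line regardless of length and does not drop non-ASCII characters.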
Now we can pipe them into pw-inspector to see what meets a generic complexity rule-set (at least 12 characters, containing lowercase, uppercase and symbol characters):
# cat youpornpasswords.txt | pw-inspector -m 12 -l -u -p > youporncomplex.txt
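A rule-set like this can also be approximated without pw-inspector. The Python sketch below is my own stand-in and not pw-inspector's actual logic. It assumes the intent is a minimum length of 12 plus at least one lowercase letter, one uppercase letter and one ASCII symbol. The function name and the min_each parameter are hypothetical.

```python
import string


def is_complex(password: str, min_len: int = 12, min_each: int = 1) -> bool:
    """True if the password is long enough and has enough of each character class."""
    lower = sum(c in string.ascii_lowercase for c in password)
    upper = sum(c in string.ascii_uppercase for c in password)
    symbol = sum(c in string.punctuation for c in password)
    return (len(password) >= min_len
            and lower >= min_each
            and upper >= min_each
            and symbol >= min_each)


# Hypothetical usage:
#   complex_only = [p for p in passwords if is_complex(p)]
```

Raising min_each to 2 tightens the rule to two characters of every class, at the cost of dropping passwords with a single capital letter.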
If you want to save some time, a similar list is on pastebin.
I don't have time to write about every password trend, but let's look at a few examples and see if we can find anything useful:
!QAZ@WSX1qaz2wsx
This is a typical waterfall keyboard pattern. These patterns are actually quite commonly used by administrators. They are easy to remember and meet the most stringent complexity requirements. What people who use these patterns don't realize is that it's quite simple to generate a dictionary which includes most keyboard patterns. There is a Perl script which can be piped directly into John to test for this. A friend of mine wrote a similar script to generate nearly every 2x2, 3x3, 4x4, 5x4 keyboard pattern and piped them into several sizable dictionaries. They are surprisingly effective on administrative and service account hashes.
!@#123QWEqweASDasd --This is a 3 character pattern
!@QWASZX12qwaszx --This is a 2 character pattern
!QAZ@WSX1qaz2wsx --This is a 4 character pattern
Let's look at some other patterns that can be used to generate useful dictionaries with simple changes to the layout array:
6yhnji9)IJNHY& --This is a V pattern
A!S@D#F$g5h6j7k8 --This is a variation of the 2 character pattern
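To give an idea of how little code a keyboard-pattern generator needs, here is a minimal Python sketch. It is not the Perl script or my friend's script mentioned above. The layout arrays assume a US QWERTY keyboard, and all names are made up for illustration. It covers only the column (waterfall) walk and the plain row walk, not the V pattern or the interleaved variations.

```python
# Four rows of a US QWERTY layout, unshifted and shifted (assumed layout array).
PLAIN = ["1234567890", "qwertyuiop", "asdfghjkl;", "zxcvbnm,./"]
SHIFT = ["!@#$%^&*()", "QWERTYUIOP", "ASDFGHJKL:", "ZXCVBNM<>?"]


def column_pattern(start: int, width: int) -> str:
    """Walk down `width` adjacent columns, shifted keys first, then unshifted."""
    def walk(rows):
        return "".join(row[col] for col in range(start, start + width) for row in rows)
    return walk(SHIFT) + walk(PLAIN)


def row_pattern(start: int, width: int) -> str:
    """Take `width` keys from each row, shifted rows first, then unshifted."""
    def walk(rows):
        return "".join(row[start:start + width] for row in rows)
    return walk(SHIFT) + walk(PLAIN)


def all_patterns(max_width: int = 4):
    """Yield every column and row pattern of width 2..max_width that fits."""
    for width in range(2, max_width + 1):
        for start in range(len(PLAIN[0]) - width + 1):
            yield column_pattern(start, width)
            yield row_pattern(start, width)
```

With these arrays, column_pattern(0, 2) produces !QAZ@WSX1qaz2wsx and row_pattern(0, 2) produces !@QWASZX12qwaszx, two of the examples above. Other walks would need their own function or a different layout array.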
Dictionaries based on the word "password" have always been effective. The following two examples would most likely already be included in a substitution, toggle and append dictionary with the word "password":
PA$$word4321
PASSW0rd1234!@#$
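As a rough illustration of what a substitution, toggle and append dictionary looks like, the Python sketch below expands a single word into every combination of case and common substitutions, then appends a few suffixes. The substitution table and the suffix list are small illustrative assumptions, not the contents of any real dictionary.

```python
from itertools import product

# Illustrative assumptions: one common substitution per letter, a few suffixes.
SUBSTITUTIONS = {"a": "@", "s": "$", "o": "0"}
SUFFIXES = ["", "1234", "4321", "1234!@#$"]


def variants(word: str = "password"):
    """Yield each case-toggle/substitution combination of word with each suffix."""
    options = []
    for ch in word.lower():
        alts = [ch, ch.upper()]          # toggle
        if ch in SUBSTITUTIONS:
            alts.append(SUBSTITUTIONS[ch])  # substitution
        options.append(alts)
    for combo in product(*options):
        base = "".join(combo)
        for suffix in SUFFIXES:          # append
            yield base + suffix
```

Even with this tiny table, both examples above fall out of the expansion of "password", which is why I don't count them as interesting finds.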
A good example of the types of passwords I am looking for is:
Amanda161083** --NameDate**
This password only has one capital letter, but with a last name it would have 2. I added this pattern to my list to generate at some point later.
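A generator for this pattern could look something like the Python sketch below. It assumes the digits are a DDMMYY date (161083 read as 16 October 1983), that the name is capitalized, and that the suffix comes from a short list. The function name, the name list and the year range are all placeholders.

```python
from datetime import date, timedelta


def name_date_candidates(names, first_year, last_year, suffixes=("**",)):
    """Yield Name + DDMMYY + suffix for every day in the given year range."""
    day = date(first_year, 1, 1)
    end = date(last_year, 12, 31)
    while day <= end:
        stamp = day.strftime("%d%m%y")
        for name in names:
            for suffix in suffixes:
                yield name.capitalize() + stamp + suffix
        day += timedelta(days=1)


# Hypothetical usage with a list of first names and plausible birth years:
#   for candidate in name_date_candidates(["amanda", "chris"], 1950, 2000):
#       print(candidate)
```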
Another interesting pattern:
PUmpkins2@2@2@ --Dictionary word with first 2 letters UC, 1 character pattern
This type of password can be the most significant when dumping hashes and histories from a large domain. This user "may" utilize "PUmpkins3#3#3#" next. Knowing a user's future password equates to months of possible access to a domain.
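Guessing the next password in a sequence like this is easy to automate. The Python sketch below assumes a US keyboard and simply moves every digit, and every symbol that sits above a digit, one key to the right. The function name is made up, and real users will of course not always rotate this neatly.

```python
DIGITS = "1234567890"
SHIFTED = "!@#$%^&*()"  # symbols above those digits on a US keyboard


def next_in_sequence(password: str) -> str:
    """Advance every digit, and every shifted-digit symbol, by one key."""
    out = []
    for ch in password:
        if ch in DIGITS:
            out.append(DIGITS[(DIGITS.index(ch) + 1) % 10])
        elif ch in SHIFTED:
            out.append(SHIFTED[(SHIFTED.index(ch) + 1) % 10])
        else:
            out.append(ch)
    return "".join(out)
```

Running this over each cracked entry in a password history gives a small candidate list to try against the current hash.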
There are lots of other interesting patterns, but realize that not every string that we pulled out was an actual valid password. Many of the log entries were generated by errors, but it would be interesting if people were actually using their email address as their password or "www..youporn.com."
Please comment if you see anything I missed or have any questions.
-Chris