It’s one of the most exciting moments in a security researcher’s work: while looking through an obscure log file, you see strings like “James1984” and “SecureMe!” scattered throughout the data. Upon closer inspection, you realize that you’ve uncovered hundreds if not thousands of cleartext username/password pairs!
Even as you celebrate your success, you are also tempted to use your victory to push for additional security reforms, such as a stronger password policy, or publish your results to educate other security professionals. But how, exactly, would you go about conducting and publishing a password analysis without exposing the company to harm, from insider threats or otherwise?
Step 1: Develop a Remediation Plan and Get It Approved
With a “minimize risk to company” hat affixed firmly to your head, the first thing you should be concerned about is removing passwords from the place they were detected, or at least restricting access to that area. Only after you have achieved this goal should you be concerned about creating a report regarding password quality.
Plan to Remove the Password Data
To remove passwords, you may end up altering files or database records, restricting access to them, or destroying them outright. Add whatever you think is the right way to mitigate this risk, and perhaps a second mitigation method, to your plan.
Plan to Stop the Application from Writing New Password Data
The application that caused the problem may still be writing usernames and passwords out to the location you found. Filing a high-priority security defect with the application developers to have this information redacted in logs should be another part of your plan.
Plan for the Time Gap Between Now and Deployment of Fix
You will also have to worry about the gap between the time you manually remove the current password data and the time your application developers deliver a fix. To address this gap, add to your plan a recommendation to turn off logging, retune logging, set up access controls, or automate the remediation steps you performed during the initial cleanup.
Plan to Analyze the Passwords—Safely and Somewhere Else
Finally, if you are going to perform a password analysis on your findings, you will need a secure copy of the original data and a secure workspace to develop your reports and intermediate artifacts. Add a section to your plan that states you will make an encrypted copy of the original file, perform analysis in a secure environment, publish a report that describes the quality of the passwords in use without revealing any information about any particular users, and destroy your original work. (We will flesh out each of these elements below.)
If you and your company are familiar with “chain of custody” (CoC) procedures regarding sensitive data or security findings, you may also want to build those steps into your plan. (CoC procedures go above and beyond the steps listed here; you would add your own CoC procedures to the steps listed in this article.)
Get Your Plan Approved
Now, with your four-part plan (remove passwords, submit a bug, remediate until fixed, and analyze securely) under your arm, approach the sponsor of your security work and get his or her approval to proceed with all parts.
If You Do Not Have Approval, Do Not Proceed
At this point it is quite likely that your sponsor will allow you to proceed with certain parts of your plan, but not the analysis. If this is the case, you may continue to lobby for inclusion, but please do not proceed down the path of using company data for an unapproved password analysis and absolutely never retain a “personal” copy of the original data for your own purposes.
Step 2: Preparing Your Password Analysis Lab
Make a Strongly Encrypted Copy of the Source Data
If you have permission to proceed, start by securing a copy of the original data in an encrypted file with an appropriate name. OpenPGP is a good choice for encrypting the data until you have your lab ready. Appropriate names are those that link the file to you, your project codename or the date, but don’t reveal the contents. For example, “jsmith_20131227.pgp” and “bluegoat_01.pgp” would be appropriate names, but “everyones_passwords.pgp” or “securityscan_findings.pgp” would not be appropriate.
Once you have a strongly encrypted copy of the original data, you may proceed with your mitigation plan to remove the original copies of the data. (Remember, mitigation should take a higher priority than password analysis.)
Set Up an Encrypted Folder for Your Workspace
Next, set up a folder that uses automatic encryption through the operating system or another piece of software. For example, on Microsoft operating systems, EFS is a good choice. This folder should be completely empty when you begin because you will delete it and all of its contents at the end of your analysis.
Unpack the Source Data into Your Encrypted Folder
Move (i.e., copy and then delete the original) your original encrypted file into your encrypted folder and then unpack its contents there.
Step 3: Strip the Data Down to Usernames and Passwords
Using scripts or an automated text processor, strip your original files down to just username/password combinations. Note that this step can be time-intensive, particularly if you need to obtain programming services from another part of the company.
Increase the Density of Your Password Data
It is easy to use Windows “findstr,” Unix “grep,” or command-line database commands to locate and filter interesting lines that may contain passwords from original data. Performing this initial filter yields “dense” files (i.e., more password data than before) that make the next step more accurate and efficient.
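For example, to quickly locate lines in a file that might contain passwords, you could filter for the word “pass” with grep (grep -i "pass" weblog.txt > dense.txt) or findstr (findstr /i "pass" weblog.txt > dense.txt). The file names here are placeholders.
Or, in SQL Server, you could pull possible passwords out of a table with a query such as SELECT * FROM AppLog WHERE Message LIKE '%pass%' (the table and column names are placeholders).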
Parse Your Password Data
Once you have some dense password files, variations on the “split()” function will help you parse the data. For example, imagine a web log in which the sixth element of each entry holds key/value pairs such as “OldPass=FFrr44”.
First, you would use a split() command to grab the sixth element. Then you would use a second split() command to grab key/value pairs such as “OldPass=FFrr44”, and a third split() command to break each key/value pair into a key (such as “OldPass”) and a value (such as “FFrr44”).
In this instance, your parsing code might look something like this (in C#):
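A minimal sketch of the three splits, assuming a space-delimited, IIS-style entry whose sixth field is a query string. The sample log line, the field names UserName, OldPass and NewPass, and the LogParser class are illustrative assumptions, not taken from a real application. Values are not URL-decoded here.

```csharp
using System;
using System.Collections.Generic;

static class LogParser
{
    // Returns the key/value pairs found in the sixth space-delimited field
    // of a log line, or an empty dictionary if the line has fewer fields.
    public static Dictionary<string, string> ParseLine(string line)
    {
        var pairs = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        // First split: break the line into fields and take the sixth one.
        string[] fields = line.Split(' ');
        if (fields.Length < 6) return pairs;

        // Second split: break that field into "key=value" pairs.
        foreach (string pair in fields[5].Split('&'))
        {
            // Third split: separate key from value. The limit of 2 keeps any
            // '=' characters that appear inside the value.
            string[] kv = pair.Split(new[] { '=' }, 2);
            if (kv.Length == 2) pairs[kv[0]] = kv[1];
        }
        return pairs;
    }

    static void Main()
    {
        // Hypothetical entry; real field positions depend on your log format.
        string line = "2013-12-27 10:15:02 10.0.0.5 POST /account/change.aspx " +
                      "UserName=jsmith&OldPass=FFrr44&NewPass=GGtt55 80";
        Dictionary<string, string> pairs = ParseLine(line);
        Console.WriteLine(pairs["UserName"] + "," + pairs["OldPass"]);
    }
}
```

If your log uses a different delimiter or field order, only the first split and the field index should need to change.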
Step 4: Analyze Your Username and Password Data
Now that you have a file that contains nothing but usernames and passwords, you are ready to analyze your data. You can conduct any number of experiments on your data, but producing some basic length, complexity, and predictability (or “guessability”) statistics is a great place to begin.
I recommend using a two-step approach to your analysis. Step one is to go line-by-line through your password file and calculate statistics for each line, writing a new password statistics line to a new CSV file as you go. Step two is to pull your password statistics CSV file into your favorite spreadsheet and run your final analysis against the individual statistics.
Calculating Password Length
To calculate password length, simply read in each password and write out its length. Most programming languages include a “len” or “length” method or property on strings; use it.
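As a rough sketch of that first statistics pass, the code below assumes an input file with one “username,password” pair per line (an assumed layout) and writes one length per row. The LengthStats class and its method name are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LengthStats
{
    // Builds a one-column CSV: a header row, then one password length per
    // well-formed "username,password" input line.
    public static List<string> BuildLengthLines(IEnumerable<string> credentialLines)
    {
        var output = new List<string> { "length" };
        foreach (string line in credentialLines)
        {
            // Split on the first comma only, so commas inside a password survive.
            string[] parts = line.Split(new[] { ',' }, 2);
            if (parts.Length != 2) continue; // skip malformed lines
            output.Add(parts[1].Length.ToString());
        }
        return output;
    }

    static void Main(string[] args)
    {
        if (args.Length == 2)
        {
            // Usage: LengthStats <credentials file> <output CSV>
            File.WriteAllLines(args[1], BuildLengthLines(File.ReadLines(args[0])));
            return;
        }
        // No arguments: run on two sample pairs instead.
        foreach (string row in BuildLengthLines(new[] { "jsmith,James1984", "mjones,SecureMe!" }))
            Console.WriteLine(row);
    }
}
```

Leaving the username out of the statistics file keeps the intermediate artifact a little less sensitive, although it still belongs in the encrypted workspace.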
Calculating Password Complexity
Regular expressions (“RegEx”) are the right tool to use when checking passwords for complexity. A simple test for upper-case letters, lower-case letters, numbers and special characters can be built from repeated calls to a single RegEx-powered function.
For example (in C#):
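A minimal sketch of such a function, assuming ASCII-only character classes (accented letters would be counted as special characters). The Complexity class and CountMatches name are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class Complexity
{
    // Counts how many characters in the password match the given pattern.
    public static int CountMatches(string password, string pattern)
    {
        return Regex.Matches(password, pattern).Count;
    }

    static void Main()
    {
        string password = "SecureMe!1984";
        int upper = CountMatches(password, "[A-Z]");
        int lower = CountMatches(password, "[a-z]");
        int digits = CountMatches(password, "[0-9]");
        int special = CountMatches(password, "[^A-Za-z0-9]");

        // One CSV row: upper,lower,digits,special
        Console.WriteLine("{0},{1},{2},{3}", upper, lower, digits, special); // 2,6,4,1
    }
}
```

A count greater than zero answers the “does it contain” question, and the counts themselves feed the per-password averages.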
To calculate if any two items are similar to each other, you will probably need to build a function. However, the time to build such a function is well worth it, since it will allow you to detect similarity in strings like “JohnSmith” and “j.smith45”.
The following function returns “true” if any run of iWindow consecutive characters appears in both phrases. It returns “false” if iWindow is longer than either of the two phrases or if no match is discovered.
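One way such a function might be written is a sliding window over the first phrase. This is a sketch under assumptions: the IsSimilar name, the case-insensitive comparison, and the window of four used in the demo are choices made here for illustration.

```csharp
using System;

static class Similarity
{
    // Returns true if any run of iWindow consecutive characters appears in
    // both phrases (ignoring case). Returns false if iWindow is less than 1,
    // longer than either phrase, or if no shared run exists.
    public static bool IsSimilar(string a, string b, int iWindow)
    {
        if (a == null || b == null || iWindow < 1) return false;
        if (iWindow > a.Length || iWindow > b.Length) return false;

        string lowerA = a.ToLowerInvariant();
        string lowerB = b.ToLowerInvariant();

        // Slide a window across the first phrase and look for it in the second.
        for (int i = 0; i + iWindow <= lowerA.Length; i++)
        {
            if (lowerB.Contains(lowerA.Substring(i, iWindow))) return true;
        }
        return false;
    }

    static void Main()
    {
        Console.WriteLine(IsSimilar("JohnSmith", "j.smith45", 4)); // True (shared run "smit")
        Console.WriteLine(IsSimilar("JohnSmith", "Tr0ub4dor", 4)); // False
    }
}
```

A smaller window catches more matches but also raises more false alarms, so the window size is worth tuning against a sample of your data.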
Use this function or one like it to conduct the rest of your statistical calculations.
Calculating Similarity Between Username and Password
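To calculate whether a username is similar to a password, simply feed both into your similarity function. For example, a username of “jsmith” and a password of “jsmith45” share the run “jsmith”, so the pair would be flagged as similar.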
Calculating Similarity Between Password and Current Year
To calculate whether a password is similar to the current year, simply feed the year (or year part) into your similarity function. For example, if the current year is 2013, test each password against “2013” (or just “13”).
Calculating Similarity Between Password and an Initial Password
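To calculate whether a password is similar to an initial static password, simply feed the initial password (or common piece of initial password) into your similarity function. For example, if new accounts were issued a password such as “Welcome1” (a made-up value), test each password against “Welcome”.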
Calculating Similarity Between Password and the Word “Pass”
To calculate whether a password is similar to the word “password” (or just “pass”), simply feed the phrase “pass” into your similarity function. For example, a password of “MyPass99” would be flagged.
You may want to run it again with the shorter phrase “pwd” as well.
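The four checks above can be sketched together as one row of flags per password. Everything named here is an assumption for illustration: the compact IsSimilar helper, the window sizes, and the initial password “Welcome1”.

```csharp
using System;

static class SimilarityChecks
{
    // Hypothetical sliding-window helper: true if any run of 'window'
    // characters appears in both strings, ignoring case.
    static bool IsSimilar(string a, string b, int window)
    {
        if (window < 1 || window > a.Length || window > b.Length) return false;
        a = a.ToLowerInvariant();
        b = b.ToLowerInvariant();
        for (int i = 0; i + window <= a.Length; i++)
            if (b.Contains(a.Substring(i, window))) return true;
        return false;
    }

    // Returns a CSV row of flags: likeUser,likeYear,likeInitial,likePass
    public static string Check(string username, string password, int year, string initialPassword)
    {
        bool likeUser = IsSimilar(username, password, 4);
        // Full four-digit year; a year part can be passed the same way.
        bool likeYear = IsSimilar(year.ToString(), password, 4);
        bool likeInitial = IsSimilar(initialPassword, password, 4);
        // "pass" first, then the shorter "pwd" with a matching window.
        bool likePass = IsSimilar("pass", password, 4) || IsSimilar("pwd", password, 3);
        return string.Join(",", likeUser, likeYear, likeInitial, likePass);
    }

    static void Main()
    {
        // For live data, DateTime.Now.Year could supply the year argument.
        Console.WriteLine(Check("jsmith", "Jsmith2013!", 2013, "Welcome1")); // True,True,False,False
    }
}
```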
Calculating Similarity Between Password and Dictionary Words
Finally—a challenge! Before we can perform this analysis, we need a dictionary full of words to test. There are many dictionaries available for free from the Internet, but many need to be pre-processed to strip out comments and extra columns before we can use them.
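As a sketch of that pre-processing, the code below assumes comment lines begin with “#” and that the word is the first whitespace-delimited column. Both are assumptions; check them against the dictionary you actually download. The DictionaryCleaner class is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DictionaryCleaner
{
    // Drops blank lines and '#' comments, keeps only the first column of
    // each remaining line, lower-cases it, and removes duplicates.
    public static List<string> Clean(IEnumerable<string> rawLines)
    {
        return rawLines
            .Select(l => l.Trim())
            .Where(l => l.Length > 0 && l[0] != '#')
            .Select(l => l.Split(new[] { ' ', '\t' }, 2)[0].ToLowerInvariant())
            .Distinct()
            .ToList();
    }

    static void Main()
    {
        var raw = new[]
        {
            "# sample header",
            "AARDVARK n. an African mammal",
            "zebra\t12",
            "",
            "Zebra"
        };
        foreach (string word in Clean(raw)) Console.WriteLine(word);
    }
}
```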
If you are running your analysis on Windows, an incredibly useful set of tools to download now is the “GnuWin32” tools, especially “wget” to download pages from the Internet and “grep” to parse the pages you download. These tools can be combined in a short batch file to download and prepare a dictionary file for our use.
Or, on Linux:
Now, we can open up the resulting dictionary file and check to see if each password contains a dictionary word. Remember to perform your “contains” comparison while ignoring case sensitivity. For better performance, you may also want to read the entire dictionary file into memory first (most computers can spare the room these days) and reuse it for each password entry.
You will probably also want to ignore any dictionary words shorter than three or four letters (I ignore anything shorter than four letters), which you can do in code (e.g., “if DictWord.Length > 3”) or by erasing the top entries from your dictionary if they are arranged from shortest to longest word.
For example (in C#):
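A minimal sketch of the dictionary check, assuming the word list is already in memory with one word per entry. The DictionaryCheck class and FindWords name are hypothetical, and the tiny inline word list stands in for a real dictionary file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DictionaryCheck
{
    // Returns the dictionary words of four or more letters that appear
    // inside the password, ignoring case.
    public static List<string> FindWords(string password, IEnumerable<string> dictionary)
    {
        return dictionary
            .Where(w => w.Length > 3)
            .Where(w => password.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }

    static void Main()
    {
        // In practice, load the word list once (for example with
        // File.ReadAllLines) and reuse it for every password.
        string[] dictionary = { "a", "me", "cure", "secure", "horse" };
        List<string> hits = FindWords("SecureMe!", dictionary);
        Console.WriteLine(hits.Count + ": " + string.Join(" ", hits)); // 2: cure secure
    }
}
```

Note that overlapping words are counted separately (“cure” sits inside “secure”), so the count is a rough signal rather than an exact number of distinct words.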
Note that simply flagging a password as bad because it contains a dictionary word is not a good idea in all cases. If the password is long enough to contain multiple (>2) dictionary words and a mix of upper-case and lower-case letters, it may still be a strong password. However, if a password only contains a single dictionary word and is as short as it could be, it probably is not a strong password.
Calculating Similarity Between Password and Keyboard Phrases
Unfortunately, there do not appear to be readily accessible lists of keyboard phrases on the Internet. (I hope I’m wrong; please let me know otherwise in the comments below.) With that in mind, you may need to write your own list of keyboard phrases for this test. The file should contain sequences such as “qwerty”, “asdfgh”, “zxcvbn”, “qazwsx” and “123456”. (Take a look at your keyboard while typing these if you don’t understand where they come from.)
Once your file is ready, use code similar to what you used to detect dictionary words to compare each password against every entry in the list. Do not worry about multiple uses of keyboard phrases; a single use of any of these common phrases should be enough to flag a password as weak.
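As a sketch, the check can stop at the first hit. The phrase list below is a short illustrative sample (an assumption, not the contents of any published file); in practice you would load your own list.

```csharp
using System;
using System.Linq;

static class KeyboardCheck
{
    // Illustrative sample of keyboard walks; replace with your own list.
    static readonly string[] KeyboardPhrases =
        { "qwerty", "asdf", "zxcv", "1234", "1qaz", "qazwsx" };

    // True if the password contains any listed phrase, ignoring case.
    public static bool ContainsKeyboardPhrase(string password)
    {
        return KeyboardPhrases.Any(
            p => password.IndexOf(p, StringComparison.OrdinalIgnoreCase) >= 0);
    }

    static void Main()
    {
        Console.WriteLine(ContainsKeyboardPhrase("Qwerty!9"));  // True
        Console.WriteLine(ContainsKeyboardPhrase("SecureMe!")); // False
    }
}
```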
Performing Statistical Analysis
Using the spreadsheet of your choice, load up your line-by-line statistics files and use the spreadsheet’s “MIN”, “MAX”, “MODE”, “AVERAGE” and “COUNTIF” functions to calculate:
- Minimum, maximum and average password length
- Most common password length
- % of passwords containing upper-case letters, lower-case letters, numbers and special characters
- Average number of upper-case letters, lower-case letters, numbers and special characters in each password
- % of passwords similar to their usernames
- % of passwords containing the current year
- % of passwords containing the phrase “pass”
- % of passwords containing a dictionary word
- % of passwords containing a keyboard phrase
- Optional: % of passwords similar to the initial static password
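If you would rather stay in code than open a spreadsheet, the same summary figures can be sketched with LINQ. This mirrors MIN, MAX, AVERAGE, MODE and COUNTIF under stated assumptions: the Summary class is hypothetical, the inputs are non-empty, and ties for the most common length go to the smaller value.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

static class Summary
{
    // lengths: one entry per password; flagged: one true/false per password
    // for any single test (for example, "contains a dictionary word").
    public static string Summarize(IList<int> lengths, IList<bool> flagged)
    {
        // MODE: the most frequent length (smallest length wins a tie).
        int mode = lengths.GroupBy(n => n)
                          .OrderByDescending(g => g.Count())
                          .ThenBy(g => g.Key)
                          .First().Key;

        // COUNTIF divided by the total, as a percentage.
        double percentFlagged = 100.0 * flagged.Count(f => f) / flagged.Count;

        return string.Format(CultureInfo.InvariantCulture,
            "min={0} max={1} avg={2:F1} mode={3} flagged={4:F0}%",
            lengths.Min(), lengths.Max(), lengths.Average(), mode, percentFlagged);
    }

    static void Main()
    {
        int[] lengths = { 8, 9, 8, 12, 6 };
        bool[] flagged = { true, false, true, false, false };
        Console.WriteLine(Summarize(lengths, flagged));
        // min=6 max=12 avg=8.6 mode=8 flagged=40%
    }
}
```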
Step 5: Publish the Report and Destroy Your Lab
Please do not forget to perform this step, and perform it completely.
When you are ready to publish your results, perform a final check on your report to make sure it doesn’t contain any personally identifiable information or specific usernames. When you are ready, move the final copy of your report out of your encrypted folder into a permanent location in your company’s file store.
Then, delete the entire lab folder. Make sure that it has really been deleted. For example, on a Windows operating system, make sure that it is not just sitting in the Recycle Bin.
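If you script the cleanup, a sketch like the following deletes the folder recursively and then confirms it is gone. Directory.Delete removes the folder directly instead of sending it to the Recycle Bin, but it does not overwrite the underlying disk blocks, which is one more reason to keep the lab on an encrypted folder. The LabCleanup class and the temporary stand-in folder are assumptions.

```csharp
using System;
using System.IO;

static class LabCleanup
{
    // Deletes the folder and everything in it, then reports whether it is gone.
    public static bool DeleteLab(string labFolder)
    {
        if (Directory.Exists(labFolder))
        {
            Directory.Delete(labFolder, true); // true = delete contents too
        }
        return !Directory.Exists(labFolder);
    }

    static void Main()
    {
        // Stand-in for the real lab folder so the sketch is safe to run.
        string lab = Path.Combine(Path.GetTempPath(), "lab_" + Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(lab);
        File.WriteAllText(Path.Combine(lab, "stats.csv"), "length\n8\n");

        Console.WriteLine(DeleteLab(lab)); // True when the folder no longer exists
    }
}
```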
Step 6: Sharing Your Results with the Security Community
If you want to share your analysis with the greater security community, please be prepared to wait a while, and to release your results “in waves.”
First, you will absolutely want to wait until all remediation is complete. In some cases this will mean publishing the fix to the affected application. In other cases this will require you to wait until end users have been forced to change their passwords after the fix enters production.
Second, you will want to wait until the political ramifications of the breach have shaken out. (This can vary widely by company.)
When (or if) you can safely satisfy both criteria, only then should you approach your sponsor about sharing your analysis with other security experts. Ideally you would tie your release in with a local security event that will bring prestige to your company as an industry leader, rather than in an online forum (which could be seen as a knock on its operations). Make sure specific company and application information is scrubbed from your report, although you may want to retain information such as the size of the user base and the application’s regulatory exposure (e.g., subject to HIPAA, PCI-DSS, SOX, etc.).
Once you have permission to proceed, go ahead and release your findings. Expect that a copy of your findings will be published on the Internet, so plan to use the web site resources of the organization you released the results through to publish your findings. Among other things, publishing through a security web site also gives you cover (e.g., “see, other security experts thought it was okay, too”) in case your sponsor changes his or her mind about publishing your results later.
- dictionary-en.txt is an English dictionary derived from the Zyzzyva open source project (license: http://www.gnu.org/licenses/gpl.html).
- keyboardpatterns.txt is a list of common keyboard patterns.
- makepassworddictionary.bat is a batch file to download an English dictionary; depends on GnuWin32’s grep and wget commands.
- PasswordAnalysis.cs is C# sample code to parse an IIS file and analyze passwords.
- README.txt contains additional installation and usage instructions.