cs50 week 2: cracking crack

8 minute read

Almost a month after finishing caesar and vigenere from CS50 pset2, I thought I’d give the final problem set crack a, well, crack.

Crack is a password cracker that takes a hash as its argument. Your program then brute forces its way through until it finds a password that, when encrypted, matches the argument.

As well as another interesting problem to solve, this week introduced us to important computer science topics of hashing, recursion and memory management.

In the brief, Zamyla throws in a a few things you can do to jazz it up a bit:

“Recall that sometimes, people use passwords that are actual words. Perhaps there’s an optimization that can be employed?”

“Recall that shorter passwords are usually easier to crack than longer ones.”

So she’s looking for a dictionary and brute force attack that starts with shorter words/character combinations. Challenge accepted!

Here are the five steps I broke crack down in to:

  1. Validate input. Make sure user’s input is a hashed password.

  2. Create dictionary. Import dictionary.txt to an array, remove words longer than 5 characters and then sort by length from shortest word to longest word.

  3. Create salt. Create an array and use loop and ASCII character numbers to fill it with values [a-z], [A-Z] and [0-9].

  4. Dictionary attack. Iterate through the dictionary for each possible salt, using crypt() to create a hash and strcmp() to compare with the program’s argument. If generated hash matches program’s argument, print the word.

  5. Brute force. If dictionary attack doesn’t work, iterate through all possible alphanumeric character combinations up to 5 characters, again using crypt() and strcmp() to generate a hash and compare with program’s argument.

With the caveat that I still don’t have a fully working program, I’ll talk about what I did for each step now.

1. Validate input

This is easy. Code below:

 if (argc != 2)
    {
        printf("Usage: crack hash\n");
        return 1;
    }
    string hash1 = argv[1];

2. Create dictionary

This was good fun. I found a collection of English words on Github courtesy of Dwyl (Do What You Love, incidentally looks like a good learning resource.)

I then learned how to use fopen() and fgets() to pull the words in to an array. This taught me a lot about strings vs char *s (same thing, who knew?!), pointers and some basic memory management in C (memalloc).

CS50 gives you a good intro to the different sorting algorithms (merge sort, bubble sort etc.). I spent a few hours trying to roll my own merge sort (the most efficient with a worst case sorting time of n log n), but I just couldn’t get string sorting to work. After an evening spent on the problem I settled for using qsort from stdlib.h. Complete cop out, but it got the job done. I know sorting is so important to computer science theory - when you search on Google you’re sorting by relevance - so I will revisit this one. Some concepts I feel comfortable waving a hand at but this is one of those knowledge challenges I really want to be able to fully understand and implement myself.

Here’s the code:

#include <cs50.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int compar_strlen (const void * a, const void * b);

int compar_strlen (const void * a, const void * b)
{
    char *s1 = *((char**)a), *s2 = *((char**)b);
    if (strlen(s1) < strlen(s2)) return -1;
    if (strlen(s1) == strlen(s2)) return 0;
    else return 1;
}

int main(int argc, string argv[])
{
    // Open words_alpha.txt
    FILE *fp;
    fp = fopen("words_alpha.txt", "r");

    // Initialize variables for performing file operations
    char *dictionary[350000];
    char buffer[100];

    // Iterate through words_alpha.txt, storing those that are less than 6 chars in dictionary
    int i = 0;
    while (i < 350000 && fgets(buffer, 100, fp) != NULL)
    {
        if  (strlen(buffer) <= 6)
        {
            dictionary[i] = malloc(6);
            strncpy(dictionary[i], buffer, 6);
            i++;
        }
    }
    fclose(fp);

    // Sort dictionary by strlen() so that shortest words are first
    int dictionary_size = 0;
    while (dictionary[dictionary_size] != NULL)
    {
        dictionary_size++;
    }

    for (i = 0; i < 29; i++)
    {
        printf("Word #%i in dictionary is %s.\n", i, dictionary[i]);

    }
    printf("dictionary size = %i\n", dictionary_size);
    printf("Word #349999 is %s.\n", dictionary[200]);

    qsort(dictionary, dictionary_size, sizeof(dictionary[0]), compar_strlen);

    fp = fopen("words_crack.txt", "w+");

    for (i = 0; i < dictionary_size; i++)
    {
        printf("Word #%i in dictionary is %s", i, dictionary[i]);
        string temp = dictionary[i];
        fputs(temp, fp);
    }

    fclose(fp);
    return(0);
}

This was actually in a separate .c file I called create_dict.c. I didn’t want to create a new dictionary every time a user ran my program, so I created the dictionary words_crack.txt one time.

3. Create salt

I used a loop to do this:

char all_salts[62];
    char this_salt[3]; // String to store salt
    this_salt[2] = '\0';
    string hash2; // String to store comparator hash

    // Fill salts[] with A-Z, a-z and 0-9.
    for (int i = 0; i < 61; i++)
    {
        // a - z
        while (i >= 0 && i < 26)
        {
            all_salts[i] = i + 97;
            i++;
        }

        // A - Z
        while (i >= 26 && i < 52)
        {
            all_salts[i] = i + 39;
            i++;
        }

        // 0 - 9
        while (i <= 61)
        {
            all_salts[i] = i - 4;
            i++;
        }
    }

Looking back, I could have just made the string myself:

string all_salts = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

This would have had the same effect with one line of code! I found this to be one of those trade-offs in writing code: when is it right to do the ‘dumb’ thing and type out a load of alphanumeric characters quickly? When to do the same with a for loop? I guess with experience these judgement calls will become easier, or perhaps the magnitude of the problem will just increase as you consider bigger and bigger questions. Hm.

4. Dictionary attack

Now for the fun bit! This code, I am proud to say, works:

    // Open words_crack.txt
    FILE *fp;
    fp = fopen("words_crack.txt", "r");

    // Calc number of lines in words_crack.txt to ease array initialization
    int dict_length = 0;
    char c;
    for (c = getc(fp); c != EOF; c = getc(fp))
    {
        if (c == '\n') // Increment count if this character is newline
        dict_length++;
    }
    fclose(fp);

    // Put words from words_crack.txt in to array dict[]
    fp = fopen("words_crack.txt", "r");
    char dict[dict_length+1][7];
    char buffer[7];
    int i = 0;
    while ((fgets(buffer, 7, fp) != NULL && i < dict_length+1))
    {
        buffer[strcspn(buffer, "\n")] = 0; // Removes null characters after \n
        strcpy(dict[i], buffer);
        i++;
    }
    fclose(fp);

    // Loop to generate salts ("00" .. "01" .. "02" etc.)
    for (i = 0; i < 61; i++)
    {
        this_salt[0] = all_salts[i];
        for (int j = 0; j < 61; j++)
        {
        this_salt[1] = all_salts[j];

            // Loop to run through dict[] for this salt[]
            for (int k = 0; k < dict_length+1; k++)
            {
                 //printf("Dict_length: %i | Salt: %s | Dict[%i] is: %s | Hash2 is %s\n", dict_length, this_salt, j, dict[j], hash2);
                hash2 = crypt(dict[k], this_salt);
                if (strcmp(dict[k], "greg") == 0)
                {
                    //printf("Dict_length: %i | Salt: %s | Dict[%i] is: %s | Hash2 is %s\n", dict_length, this_salt, k, dict[k], hash2);
                }

                if (strcmp(hash1, hash2) == 0)
                {
                    printf("%s\n", dict[k]);
                    return 0;
                }
            }
        }
    }

I’ve left my printf debugging lines in there but commented out (“greg” was one of my test cases). So often when I see code online it’s pristine, but I have a suspicion that it only gets that way after refactoring. I’ve left mine warts and all, mostly because I feel I’ve learned what I need to from this and would gain less from refactoring this code than moving on to another learning opportunity. But also to challenge the view that “good” developers spurt out flawless code without Googling anything and without having to tidy up afterwards. #nomakeupselfie.

5. Brute force attack

OK, here is where I fail. With bleary eyes and many hours spent, I have admited defeat. I learned a little bit about recursion, a little bit about optimising code and a little bit about stack overflows and segmentation faults. Ultimately, I went for looping rather than recursion due to the performance benefits. Frustratingly, this is no more complex than the caesar or vigenere problems, but I’ve reached the max number of hours that I wanted to spend on this and it’s time to move on.

Here is my broken code:

 // Dictionary search has failed, move on to brute force
    char pass[6];
    int pass_length;

    for (int i = 0; i < 62; i++)
    {
        this_salt[0] = all_salts[i];
        for (int j = 0; j < 62; j++)
        {
            this_salt[1] = all_salts[j];
            for (pass_length = 0; pass_length < 6; pass_length++)
            {
                for (int k = 0; k < 62; k++)
                {
                    pass[pass_length] = all_salts[k];
                    hash2 = crypt(pass, this_salt);
                    printf("pass[%i] = %s | this_salt = %s\n", pass_length, pass, this_salt);
                    if (strcmp(hash1, hash2) == 0)
                    {
                        printf("%s\n", pass);
                        return 0;
                    }
                    //sleep(1);
                }
                pass[pass_length] = all_salts[0];
            }
        }
    }
}

I should add that I satisfied all the requirements to pass this pset and move on to next week’s module. I just wanted to have a go at the ‘harder’ pset to see if I could crack it. It would seem the answer was ‘almost, but no cigar’, although that is a little weak as there is no such thing as a partially cracked password!

The most elegant solution I found online for crack is mareksuscak’s on GitHub.

Flogging a dead horse

I’m in Bavaria for the week mountain biking and catching up on CS50 in the evenings. It’s lush here:

Muddy alpine single track

I submitted a university assignment last Sunday, which has given me some breathing room to come back to CS50.

But not that much breathing room! I learned so much from file operations, different qualities of variable types (strings/chars etc.), memory management, some recursion basics… that I’ve decided I’ve learned enough from this problem. Even though the final program didn’t work 100% (it can still crack passwords providing they’re in the dictionary), I’m satisfied I’ve got what I needed to from attempting crack. It’s been fulfilling, but this is one of those problems I definitely feel I would have benefited from being able to speak to a mentor.

I’m now going to move on to week 3 of CS50. Time is of the essence, and I’ve got some real world coding projects I want to work on when I’ve finished this off, first of which is an e-commerce site for a friend. On to week 3 of CS50 with no regrets!