## Password Strength Is A Hard Problem

We are all familiar with the classic advice on password strength, a mix of lowercase, uppercase, symbols, numbers, of a certain length. It is trivial to construct a validator for password strength in Python

```>>> from string import ascii_uppercase, ascii_lowercase, digits, punctuation
>>> def check_pw1(p, m=8):
...     return len(p) >= m and all(map(lambda x: any(y for y in x if y in p), [ascii_uppercase, ascii_lowercase, digits, punctuation]))
>>>
```

The problem with this is that you can meet all these requirements and still have a bad password, for example:

```>>> check_pw1('Gaiu\$123')
True
>>> check_pw1('P@55w0rd')
True
>>>
```

When substituting symbols or numbers for letters, everyone does it in predictable ways, such as S → \$ or 5, T → +, O → 0 and so on. So these offer only minimal protection against brute-force dictionary attacks, or even intelligent guesses. We can easily generate a few variations (there are obviously combinations of variations too that would need to be considered too†):

```>>> s = {'s': ['5', '\$'], 'a': ['@',], 'o': ['0', '.']}
>>> def subs(a):
...     x, y = a
...     return [(x, z) for z in y] + [(x, x.upper())]
>>> l = ['password'.replace(a, b) for a, b in [x for y in map(subs, s.items()) for x in y]]
>>> l
>>> for a, b in [x for y in map(subs, s.items()) for x in y]:
...     p = p.replace(a, b)
...
>>> p
'p@55w0rd'
>>> l.append(p)
```

Looking at the size of the dictionary, with a few combinations you would end up with well over a million possibilities, possibly several million, but remember that with techniques like rainbow tables these kinds of attacks were feasible decades ago, now you can easy throw 10000× the compute power at it.

```gaius@klossy:~\$ wc -l /usr/share/dict/words
102401 /usr/share/dict/words
```

Maybe we can do something a bit cleverer using NLP techniques:

```>>> from nltk.metrics.distance import edit_distance
>>> def check_pw2(u, p, m=8):
...     return len(p) >= m and edit_distance(u, p) >= m
...
>>> check_pw2('gaius', 'Gaiu\$123')
False
>>> check_pw2('gaius', 'P@55w0rd')
True
```

But it’s still not foolproof.

```>>> def check_pw3(u, p, m=8):
...     return len(p) >= m and edit_distance(u, p) >= m and p not in l
...
>>> check_pw3('gaius', 'P@55w0rd')
True
5
>>>
```

Hah, I’m not as clever as I thought I was, in building my list of substitutions I “forgot” that every letter can also be substituted for its opposite-case counterpart. And you can bet any users of my system, having to change their password regularly enforced by policy, would quickly discover that and use it as a means to minimise variation each time to make it easier to remember. I could then check the edit distance between the password and every word in the dictionary… It’s a neverending game of cat and mouse, and only serves to annoy everyone.

```>>> from timeit import timeit
>>> timeit("""from nltk.metrics.distance import edit_distance
... f = open('/usr/share/dict/words', 'r')
... for w in f:
...     edit_distance(w, 'P@55w0rd')
... """, number=10)
244.13386648699998
```

But we can try it:

```>>> check_pw4('gaius', 'P@55w0rd')
abactor

False
>>> check_pw4('gaius', 'agi395wk%')
aardvark

False
```

Finally! But maybe it’s too good!

```>>> check_pw4('gaius', 'agi395wk%', 7)
agile

False
>>> check_pw4('gaius', 'agi395wk%', 6)
True
```

OK, we can maybe get decent results relaxing it a bit. It still needs tweaking but I have proved my point! Programatically checking a password short enough to be memorable is a hard problem! It would require an almost impossible effort to cross check against names of relatives, names of pets, birthdays, postcodes, car registrations… A password can simultaneously be both secure in theory and trivial to guess in practice.

You have to get into the realms of `pwgen 20` to be secure, eventually you end up with something as long as an SSH key! People are too predictable and the traditional rules are too easy to workaround. I wonder if any formal studies have ever been done into the most complex password a normal user can cope with, or whether “the rules” were just dreamt up by someone one day without really thinking about it. You very quickly reach the tipping point which leads to post-it notes on monitors!

None of this matters anyway really, I’ll wager very few cyberattackers make the initial breach via guessing a password these days. Just waltz in through an unpatched vulnerability and grab the whole user database and crack, sorry, “recover” it offline at your leisure… 😀

† Not my intention here to give away anything actually useful obviously