MySQL – how to check if a string looks randomized, or human generated and pronounceable

algorithm, MySQL, nlp, phonetics, spam

For the purpose of identifying [possible] bot-generated usernames.

Suppose you have a username like "bilbomoothof". It may be nonsense, but it still contains pronounceable sounds and so appears human-generated.

I accept that it could have been randomly generated from a dictionary of syllables, or word parts, but let's assume for a moment that the bot in question is a bit rubbish.

  1. Suppose you have a username like "sdfgbhm342r3f". To a human this is clearly a random string, but can this be identified programmatically?
  2. Are there any algorithms available (similar to Soundex, etc.) that can identify pronounceable sounds within a string like this?

Solutions applicable in PHP/MySQL most appreciated.

Best Answer

I guess you could come up with something like that if you restricted yourself to sounds that are pronounceable in English. For me (I am French), words like szczepan or wawrzyniec are unpronounceable and certainly look random.

But they are actually Polish first names (meaning Steven and Lawrence)...
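
To make that caveat concrete, here is a rough sketch of the kind of English-centric check one might try. It is not an established algorithm: the function name `looks_random`, the choice to treat "y" as a vowel, and both thresholds are assumptions picked purely for illustration. It is written in Python for brevity, but the same logic should port to PHP with `preg_match_all`.

```python
import re

# Assumption: English-style vowels, with "y" counted as a vowel.
VOWELS = set("aeiouy")

def looks_random(name, max_consonant_run=3, min_vowel_ratio=0.2):
    """Crude guess at whether a username is keyboard mash.

    Flags the name if it has a long run of consecutive consonants
    or too few vowels among its letters. Both thresholds are
    illustrative guesses, not tuned values.
    """
    lowered = name.lower()
    letters = [c for c in lowered if c.isalpha()]
    if not letters:
        return True  # digits/punctuation only: nothing pronounceable

    vowel_ratio = sum(c in VOWELS for c in letters) / len(letters)

    # Runs of ASCII consonants; digits and vowels break a run.
    runs = re.findall(r"[b-df-hj-np-tv-xz]+", lowered)
    longest_run = max((len(r) for r in runs), default=0)

    return longest_run > max_consonant_run or vowel_ratio < min_vowel_ratio

for candidate in ("bilbomoothof", "sdfgbhm342r3f", "szczepan"):
    print(candidate, looks_random(candidate))
```

With these thresholds, "bilbomoothof" passes and "sdfgbhm342r3f" is flagged, but "szczepan" is flagged too because of its four-consonant opening. That is exactly the false positive described above, so a check like this is better used as a score feeding a manual review than as an automatic ban.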