Ludovic's weblog

Generating names using Markov chains

Yet Another markov chains article

I've always been interested in procedural generation. It's the kind of computer sciences area that makes you think that everything is possible without investing to much time on building assets.

Random name generation is probably the most intuitive and easiest way to start.

You will probably find countless random name generators using Markov chains or other techniques, anyways, here's another one... :)

Introduction to Markov chains

Citing wikipedia:

A Markov chain, named after Andrey Markov, is a mathematical system that undergoes transitions from one state to another, between a finite or countable number of possible states. It is a random process usually characterized as memoryless: the next state depends only on the current state and not on the sequence of events that preceded it. This specific kind of "memorylessness" is called the Markov property.

The thing about Markov chains is that they are very simple to implement, and you have a nice result very quickly.

Feeding the beast

Markov chains are great at generating names that sound real because they are based on real data. The best way to start is to feed the program with a huge list of real world words, like common first names, god names, or town names for example. The more words, the better.

Internally, the program uses a transition table to generate the names. This is the transition table for the name "John":

{
    None: ['jo'],
    'jo': ['h'],
    'oh': ['n'],
    'hn': [None]
}

The word is sliced in group of 2 letters:

[ "jo", "oh", "hn" ]

and each of this group gets a transition to the letter right next to it. The "None" values indicate the start and the beginning transitions.

If you add "Joseph" as a second word to the transitions table, it would be sliced in:

[ "jo", "os", "se", "ep", "ph" ]

and the updated transition table would be:

{
    None: ['jo', 'jo'],
    'jo': ['h', 's'],
    'oh': ['n'],
    'hn': [None],
    'os': ['e'],
    'se': ['p'],
    'ep': ['h'],
    'ph': [None]
}

There are now two ways of starting the chain, though they are the same, "jo" and "jo", as well as two ways of ending the chain, "hn" and "ph".

Generating names

Given the previous transition table, starting to generate a random name will look like this:

Simple.

You can control the output by providing a maximum lenght to prevent generated names to be too long. The more you will encounter a transition when parsing the names, the more it's likely to be picked when generating a random word. The main thing here is to provide a big enough list so that random generated names will sound like the names in the list.

Given a list that contains the names of gods from different origins, running the algorithm 10 times gives: