Algorithm for grouping anagram words

algorithmanagramdata-processing

Given a set of words, we need to find the anagram words and display each category alone using the best algorithm.

input:

man car kile arc none like

output:

man
car arc
kile like
none

The best solution I am developing now is based on an hashtable, but I am thinking about equation to convert anagram word into integer value.

Example: man => 'm'+'a'+'n' but this will not give unique values.

Any suggestion?


See following code in C#:

string line = Console.ReadLine();
string []words=line.Split(' ');
int[] numbers = GetUniqueInts(words);
for (int i = 0; i < words.Length; i++)
{
    if (table.ContainsKey(numbers[i]))
    {
        table[numbers[i]] = table[numbers[i]].Append(words[i]);
    }
    else
    {
        table.Add(numbers[i],new StringBuilder(words[i]));
    }

}

The problem is how to develop GetUniqueInts(string []) method.

Best Solution

Don't bother with a custom hash function at all. Use the normal string hash function on whatever your platform is. The important thing is to make the key for your hash table the idea of a "sorted word" - where the word is sorted by letter, so "car" => "acr". All anagrams will have the same "sorted word".

Just have a hash from "sorted word" to "list of words for that sorted word". In LINQ this is incredibly easy:

using System;
using System.Collections.Generic;
using System.Linq;

class FindAnagrams
{
    static void Main(string[] args)
    {
        var lookup = args.ToLookup(word => SortLetters(word));

        foreach (var entry in lookup)
        {
            foreach (var word in entry)
            {
                Console.Write(word);
                Console.Write(" ");
            }
            Console.WriteLine();
        }
    }

    static string SortLetters(string original)
    {
        char[] letters = original.ToCharArray();
        Array.Sort(letters);
        return new string(letters);
    }
}

Sample use:

c:\Users\Jon\Test>FindAnagrams.exe man car kile arc none like
man
car arc
kile like
none