Algorithm
To generate a random string, concatenate characters drawn randomly from the set of acceptable symbols until the string reaches the desired length.
Implementation
Here's some fairly simple and very flexible code for generating random identifiers. Read the information that follows for important application notes.
public class RandomString {
/**
* Generate a random string.
*/
public String nextString() {
for (int idx = 0; idx < buf.length; ++idx)
buf[idx] = symbols[random.nextInt(symbols.length)];
return new String(buf);
}
public static final String upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
public static final String lower = upper.toLowerCase(Locale.ROOT);
public static final String digits = "0123456789";
public static final String alphanum = upper + lower + digits;
private final Random random;
private final char[] symbols;
private final char[] buf;
public RandomString(int length, Random random, String symbols) {
if (length < 1) throw new IllegalArgumentException();
if (symbols.length() < 2) throw new IllegalArgumentException();
this.random = Objects.requireNonNull(random);
this.symbols = symbols.toCharArray();
this.buf = new char[length];
}
/**
* Create an alphanumeric string generator.
*/
public RandomString(int length, Random random) {
this(length, random, alphanum);
}
/**
* Create an alphanumeric strings from a secure generator.
*/
public RandomString(int length) {
this(length, new SecureRandom());
}
/**
* Create session identifiers.
*/
public RandomString() {
this(21);
}
}
Usage examples
Create an insecure generator for 8-character identifiers:
RandomString gen = new RandomString(8, ThreadLocalRandom.current());
Create a secure generator for session identifiers:
RandomString session = new RandomString();
Create a generator with easy-to-read codes for printing. The strings are longer than full alphanumeric strings to compensate for using fewer symbols:
String easy = RandomString.digits + "ACEFGHJKLMNPQRUVWXYabcdefhijkprstuvwx";
RandomString tickets = new RandomString(23, new SecureRandom(), easy);
Use as session identifiers
Generating session identifiers that are likely to be unique is not good enough, or you could just use a simple counter. Attackers hijack sessions when predictable identifiers are used.
There is tension between length and security. Shorter identifiers are easier to guess, because there are fewer possibilities. But longer identifiers consume more storage and bandwidth. A larger set of symbols helps, but might cause encoding problems if identifiers are included in URLs or re-entered by hand.
The underlying source of randomness, or entropy, for session identifiers should come from a random number generator designed for cryptography. However, initializing these generators can sometimes be computationally expensive or slow, so effort should be made to re-use them when possible.
Use as object identifiers
Not every application requires security. Random assignment can be an efficient way for multiple entities to generate identifiers in a shared space without any coordination or partitioning. Coordination can be slow, especially in a clustered or distributed environment, and splitting up a space causes problems when entities end up with shares that are too small or too big.
Identifiers generated without taking measures to make them unpredictable should be protected by other means if an attacker might be able to view and manipulate them, as happens in most web applications. There should be a separate authorization system that protects objects whose identifier can be guessed by an attacker without access permission.
Care must be also be taken to use identifiers that are long enough to make collisions unlikely given the anticipated total number of identifiers. This is referred to as "the birthday paradox." The probability of a collision, p, is approximately n2/(2qx), where n is the number of identifiers actually generated, q is the number of distinct symbols in the alphabet, and x is the length of the identifiers. This should be a very small number, like 2‑50 or less.
Working this out shows that the chance of collision among 500k 15-character identifiers is about 2‑52, which is probably less likely than undetected errors from cosmic rays, etc.
Comparison with UUIDs
According to their specification, UUIDs are not designed to be unpredictable, and should not be used as session identifiers.
UUIDs in their standard format take a lot of space: 36 characters for only 122 bits of entropy. (Not all bits of a "random" UUID are selected randomly.) A randomly chosen alphanumeric string packs more entropy in just 21 characters.
UUIDs are not flexible; they have a standardized structure and layout. This is their chief virtue as well as their main weakness. When collaborating with an outside party, the standardization offered by UUIDs may be helpful. For purely internal use, they can be inefficient.
In Java 1.7 or later, the standard way to do this is as follows:
import java.util.concurrent.ThreadLocalRandom;
// nextInt is normally exclusive of the top value,
// so add 1 to make it inclusive
int randomNum = ThreadLocalRandom.current().nextInt(min, max + 1);
See the relevant JavaDoc. This approach has the advantage of not needing to explicitly initialize a java.util.Random instance, which can be a source of confusion and error if used inappropriately.
However, conversely there is no way to explicitly set the seed so it can be difficult to reproduce results in situations where that is useful such as testing or saving game states or similar. In those situations, the pre-Java 1.7 technique shown below can be used.
Before Java 1.7, the standard way to do this is as follows:
import java.util.Random;
/**
* Returns a pseudo-random number between min and max, inclusive.
* The difference between min and max can be at most
* <code>Integer.MAX_VALUE - 1</code>.
*
* @param min Minimum value
* @param max Maximum value. Must be greater than min.
* @return Integer between min and max, inclusive.
* @see java.util.Random#nextInt(int)
*/
public static int randInt(int min, int max) {
// NOTE: This will (intentionally) not run as written so that folks
// copy-pasting have to think about how to initialize their
// Random instance. Initialization of the Random instance is outside
// the main scope of the question, but some decent options are to have
// a field that is initialized once and then re-used as needed or to
// use ThreadLocalRandom (if using at least Java 1.7).
//
// In particular, do NOT do 'Random rand = new Random()' here or you
// will get not very good / not very random results.
Random rand;
// nextInt is normally exclusive of the top value,
// so add 1 to make it inclusive
int randomNum = rand.nextInt((max - min) + 1) + min;
return randomNum;
}
See the relevant JavaDoc. In practice, the java.util.Random class is often preferable to java.lang.Math.random().
In particular, there is no need to reinvent the random integer generation wheel when there is a straightforward API within the standard library to accomplish the task.
Best Solution
As other answers have implied, there's two main approaches: 1) track what you've already generated (the proposed solutions in this category suffer from possibly never terminating), or 2) track what permutations have yet to be produced (which implies that the permutations must be pre-generated which was specifically disallowed in the requirements). Here's another solution that is guaranteed to terminate and does not require pre-generation, but may not meet your randomization requirements (which are vague at this point).
General overview: generate a tree to track what's been generated or what's remaining. "select" new permutations by traversing random links in the tree, pruning the tree at the leafs after generation of that permutation to prevent it from being generated again.
Without a whiteboard to diagram this, I hope this description is good enough to describe what I mean: Create a "node" that has links to other nodes for every letter in the alphabet. This could be implemented using a generic map of alphabet letters to nodes or if your alphabet is fixed, you could create specific references. The node represents the available letters in the alphabet that can be "produced" next for generating a permutation. Start generating permutations by visiting the root node, selecting a random letter from the available letters in that node, then traversing that reference to the next node. With each traversal, a letter is produced for the permutation. When a leaf is reached (i.e. a permutation is fully constructed), you'd backtrack up the tree to see if the parent nodes have any available permutations remaining; if not, the parent node can be pruned.
As an implementation detail, the node could store the set of letters that are not available to be produced at that point or the set of letters that are still available to be produced at that point. In order to possibly reduce storage requirements, you could also allow the node to store either with a flag indicating which it's doing so that when the node allows more than half the alphabet it stores the letters produced so far and switch to using the letters remaining when there's less than half the alphabet available.
Using such a tree structure limits what can be produced without having to pre-generate all combinations since you don't need to pre-construct the entire tree (it can be constructed as the permutations are generated) and you're guaranteed to complete because of the purging of the nodes (i.e. you're only traversing links to nodes when that's an allowed combination for an unproduced permutation).
I believe the randomization of the technique is a little odd, however, and I don't think each combination is equally likely to be generated at any given time, though I haven't really thought through this. It's also probably worth noting that even though the full tree isn't necessarily generated up front, the overhead involved will likely be enough such that you may be better off pre-generating all permutations.