My new year’s resolution for this year was to not only start blogging more frequently but also to Open Source some of our core libraries that I’ve worked on over the years. In preparation for that though it’s important I cover some of the basics and explain the reasoning behind some of the code we’ll be putting out there.
I thought we’d start with something pretty easy, but hopefully with enough twist to interest more advanced readers. Today I’m starting with random number generation – or more precisely random stuff generation. This will form part of some future test code, but for now, I’d like to look at this small subset of the overall problem.
The first thing about the random generation in .NET is the System.Random class. Now, I’m assuming you’re mostly familiar with this little gem, but if not, here are some quick examples –
// Create a random number generator Random random = new Random(); // Generate a random number between 0 and Int.MaxValue-1 inclusive. int a = random.Next(); // Generate a random number between 0 and 49 inclusive. int b = random.Next(50); // Generate a random number between 10 and 49 inclusive. int c = random.Next(10, 50); // Generate a random double between 0.0 inclusive and 1.0 exclusive double d = random.NextDouble(); // Fill a buffer with random bytes (each byte having every possible value). byte[] buffer = new byte[10]; random.NextBytes(buffer);
The important (and often missed) point of these examples is that the upper bound of the Next overloads is exclusive, that is it’s impossible to get Int32.MaxValue using the Next overload. Even worse, you can’t get any negative values. Similarly, NextDouble will only give you values greater than or equal to 0.0 or less than 1.0, which is hardly the whole range of possible doubles!
If we really want to be thinking about random number generation, the really useful method is the NextBytes method – and I’m going to flog it to death!
So first of all, what makes a good random ‘thing’ generator. Well priorities change based on application but at the top of my list is always “It must be theoretically possible to get every conceivable value” – this isn’t always achievable, but it’s the main goal for my implementations. After that, I normally place “It mustn’t be slow”, and finally, “Uniform distribution of values would be nice”. However, I don’t personally put uniformity ahead of performance.
I know in this regard I’m not going to get everyone’s agreement, however, in justification bare in mind that to actually get a random generator to give every possible long (by way of example) randomly would take so incredibly long that it’s uniformity is not that useful.
Ahead of everything, though, I want my Random number generators to be thread safe, and System.Random certainly isn’t (yet so many people miss that). The safest way to use a random is to declare a new one just before it’s use – but that creates a load of objects that need garbage collection. Way better is to have one Random per thread of execution – that way you don’t have to worry about locking, as a thread can’t interrupt itself.
Now achieving this scenario used to be a lot harder than it is in .NET 4 – now it’s easy –
/// <summary> /// Creates one random generator per thread. /// </summary> [NotNull] private static readonly ThreadLocal<Random> _randomGenerators = new ThreadLocal<Random>(() => new Random()); /// <summary> /// A random number generator. /// </summary> [NotNull] public static Random RandomGenerator { get { return _randomGenerators.Value; } }
The key here is the wonderful new System.Threading.ThreadLocal generic class. This allows us to create a field that has a different value on each thread (which is created by the lambda expression), so we can get a different Random on each thread – but only one per thread!
Now that I’ve solved thread safety let’s get on with creating our generators. For this, I want to add extension methods to the System.Random class, so I’m going to need a static class. to define them in (which in my project is called Tester as it’s part of my testing NuGet).
So let’s start with some easy ones –
/// <summary> /// Generates a random boolean. /// </summary> /// <param name="random">The random generator.</param> /// <returns></returns> /// <remarks></remarks> public static bool RandomBoolean(this Random random ) { return (random ?? RandomGenerator).Next(2) == 1; } /// <summary> /// Generates a random byte. /// </summary> /// <param name="random">The random generator.</param> /// <returns></returns> /// <remarks></remarks> public static byte RandomByte(this Random random) { return (byte) (random ?? RandomGenerator).Next(0x100); } /// <summary> /// Generates a random char. /// </summary> /// <param name="random">The random generator.</param> /// <returns></returns> /// <remarks></remarks> public static char RandomChar(this Random random) { return (char) (random ?? RandomGenerator).Next(0x10000); }
These classes us our good old Random.Next to create simple types, being careful to remember that the upper bound is exclusive (and that chars are Unicode!). If you call the methods directly (i.e. not as extension methods) and pass in null, they will use the RandomGenerator defined above, otherwise you can specify your own Random object.
Now we can turn our attention to creating a random integer, that can have every possible value –
/// <summary> /// Generates a random Int32. /// </summary> /// <param name="random">The random generator.</param> /// <returns></returns> /// <remarks></remarks> public static int RandomInt32(this Random random) { byte[] bytes = new byte[4]; (random ?? RandomGenerator).NextBytes(bytes); return BitConverter.ToInt32(bytes, 0); }
This method creates a 4-byte array and fills it with random bytes using the built-in random generator. Unlike Random.Next , it gives a uniform distribution of every possible Int32 value (including Int32.MaxValue and negatives).
We can extend the same principle to quite a few other types –
/// <summary> /// Generates a random Int16. /// </summary> /// <param name="random">The random generator.</param> /// <returns></returns> /// <remarks></remarks> public static short RandomInt16(this Random random) { byte[] bytes = new byte[2]; (random ?? RandomGenerator).NextBytes(bytes); return BitConverter.ToInt16(bytes, 0); } /// <summary> /// Generates a random Int64. /// </summary> /// <param name="random">The random generator.</param> /// <returns></returns> /// <remarks></remarks> public static long RandomInt64(this Random random) { byte[] bytes = new byte[8]; (random ?? RandomGenerator).NextBytes(bytes); return BitConverter.ToInt64(bytes, 0); } /// <summary> /// Generates a random float. /// </summary> /// <param name="random">The random generator.</param> /// <returns></returns> /// <remarks></remarks> public static float RandomFloat(this Random random) { byte[] bytes = new byte[4]; (random ?? RandomGenerator).NextBytes(bytes); return BitConverter.ToSingle(bytes, 0); } /// <summary> /// Generates a random double. /// </summary> /// <param name="random">The random generator.</param> /// <returns></returns> /// <remarks></remarks> public static double RandomDouble(this Random random) { byte[] bytes = new byte[8]; (random ?? RandomGenerator).NextBytes(bytes); return BitConverter.ToDouble(bytes, 0); }
As you can see our new RandomDouble covers the whole double range, unlike Random.NextDouble() which is a nice result.
The next few types are going to get a little more tricky and require us to know something about their underlying implementations, for example, the decimal type. This can be constructed with four integers, but the last one is a bit special as it encodes a scale and a sign, and not all values are allowed (or possible) –
/// <summary> /// Generates a random decimal. /// </summary> /// <param name="random">The random generator.</param> /// <returns></returns> /// <remarks></remarks> public static decimal RandomDecimal(this Random random) { random = random ?? RandomGenerator; // Calculate last byte // We need a scale from 0-28 and a sign, so calculate a random number between -28 & 28. // This makes +'ves sligthly more common (as 0 is positive) but is faster. int scale = -28 + random.Next(57); int sign = 0; if (scale < 0) { sign = unchecked((int) 0x80000000); scale = -scale; } // Now we can create msb. int msb = sign + (scale << 16); return new decimal(new[] { random.RandomInt32(), random.RandomInt32(), random.RandomInt32(), msb }); }
The first three integers are generated using our previous RandomInt32 method, but the last integer is calculated separately to ensure it’s always valid. We have to apply the same level of care to the DateTime generators –
/// <summary> /// Generates a random date time, with a specific <see cref="DateTimeKind" />. /// </summary> /// <param name="random">The random generator.</param> /// <param name="kind">The <see cref="DateTimeKind" />.</param> /// <returns></returns> /// <remarks></remarks> public static DateTime RandomDateTime(this Random random, DateTimeKind kind) { // Last two bits are used internally by date time. long ticks = RandomInt64(random) & 0x3FFFFFFFFFFFFFFF; // If ticks is more than max value, just and it to ensure less than max value. if (ticks > 0x2bca2875f4373fff) ticks &= 0x2bca2875f4373fff; return new DateTime(ticks, kind); } /// <summary> /// Generates a random date time. /// </summary> /// <param name="random">The random generator.</param> /// <returns></returns> /// <remarks>Also generates a random <see cref="DateTimeKind" />.</remarks> public static DateTime RandomDateTime(this Random random) { random = random ?? RandomGenerator; // Last two bits are used internally by date time. long ticks = RandomInt64(random) & 0x3FFFFFFFFFFFFFFF; // If ticks is more than max value, just and it to ensure less than max value. if (ticks > 0x2bca2875f4373fff) ticks &= 0x2bca2875f4373fff; DateTimeKind kind; switch (random.Next(3)) { case 0: kind = DateTimeKind.Utc; break; case 1: kind = DateTimeKind.Local; break; default: kind = DateTimeKind.Unspecified; break; } return new DateTime(ticks, kind); } /// <summary> /// Generates a random date time. /// </summary> /// <param name="random">The random generator.</param> /// <returns></returns> /// <remarks>Also generates a random <see cref="DateTimeKind" />.</remarks> public static DateTimeOffset RandomDateTimeOffset(this Random random) { random = random ?? RandomGenerator; // Last two bits are used internally by date time. long ticks = random.RandomInt64() & 0x3FFFFFFFFFFFFFFF; // If ticks is more than max value, just and it to ensure less than max value. if (ticks > 0x2bca2875f4373fff) ticks &= 0x2bca2875f4373fff; // Calculate random offset +/- 14 hours in minutes (offsets must be in minutes) long offsetTicks = (TimeSpan.TicksPerHour*-14) + (TimeSpan.TicksPerMinute*random.Next(1680)); // Ensure offsetTicks don't take us outside of the DateTime range. // We could decrease/increase ticks and round to nearest minute, but it is easier just to set to 0. long ticksOffset = offsetTicks - ticks; if ((ticksOffset < 0) || (ticksOffset > 0x2bca2875f4373fff)) offsetTicks = 0; return new DateTimeOffset(ticks, TimeSpan.FromTicks(offsetTicks)); }
These required a lot more thought but they’ve been thoroughly tested and checked, so feel free to use them even if the comments aren’t immediately transparent to you!
Finally, we probably want to be able to generate random strings –
/// <summary> /// Generates a random string. /// </summary> /// <param name="random">The random generator.</param> /// <param name="maxLength">Maximum length.</param> /// <param name="unicode">if set to <see langword="true" /> string is UTF16; otherwise it uses ASCII.</param> /// <param name="nullProbability">The probability of a null being returned (0.0 for no nulls).</param> /// <returns></returns> /// <remarks></remarks> public static string RandomString(this Random random, int maxLength = -1, bool unicode = true, double nullProbability = 0.0) { random = random ?? RandomGenerator; // Check for random nulls if ((nullProbability > 0.0) && (random.NextDouble() < nullProbability)) return null; // Get string length, if there's no maximum then use 8001 (as 8000 is max specific size in SQL Server). int length = (maxLength < 0 ? random.Next(10000) : maxLength)*(unicode ? 2 : 1); if (length < 1) return string.Empty; byte[] bytes = new byte[length]; random.NextBytes(bytes); return unicode ? new UnicodeEncoding().GetString(bytes) : new ASCIIEncoding().GetString(bytes); }
The nullProbability allows you to optionally (and randomly) return nulls, and Unicode is likewise optional.
Next time, we’ll start looking at some more random generators related to SQL, but before we finish, how about something a bit different? The next two extensions are to the IEnumerable<T> interface, which extends the built-in LINQ queries to include a random item selector –
/// <summary> /// Returns a random element from an enumeration, that matches the predicate; otherwise returns the default value. /// </summary> /// <typeparam name="T">The element type.</typeparam> /// <param name="enumeration">The enumeration.</param> /// <param name="predicate">The optional predicate.</param> /// <returns>A random element or default.</returns> [CanBeNull] public static T RandomOrDefault<T>([NotNull]this IEnumerable<T> enumeration, Func<T, bool> predicate = null) { if (enumeration == null) throw new ArgumentNullException("enumeration", "The enumeration cannot be null."); // We may as well build a list, as we have to count elements anyway. List<T> filtered = predicate == null ? enumeration.ToList() : enumeration.Where(predicate).ToList(); int count = filtered.Count; return count < 1 ? default(T) : filtered[RandomGenerator.Next(count)]; } /// <summary> /// Returns a random element from an enumeration, that matches the predicate; otherwise throws an exception if the predicate is not matched. /// </summary> /// <typeparam name="T">The element type.</typeparam> /// <param name="enumeration">The enumeration.</param> /// <param name="predicate">The optional predicate.</param> /// <returns>A random element.</returns> [CanBeNull] public static T Random<T>([NotNull]this IEnumerable<T> enumeration, Func<T, bool> predicate = null) { if (enumeration == null) throw new ArgumentNullException("enumeration", "The enumeration cannot be null."); // We may as well build a list, as we have to count elements anyway. List<T> filtered = predicate == null ? enumeration.ToList() : enumeration.Where(predicate).ToList(); int count = filtered.Count; if (count < 1) throw new InvalidOperationException("The enumeration did not return any results."); return filtered[RandomGenerator.Next(count)]; }
If you want any clarification, or you have a better implementation be sure to leave a comment! Once I get through these tutorials I’ll be sure to post the completed source.
[…] library could be used to quickly generate random data, using the same techniques first discussed in Generating Randomness. That means we have the tools to create real test data and create lots of random data. […]
[…] my last post, I talked about creating random values for all kinds of interesting types. One of the biggest complaints I get when encouraging developers to implement unit tests (or dare […]