Recently, I found myself in a situation where I needed to generate a random float
between 0 and 1 using a random 32-bit integer in C. This subject has been done to death before, but I felt like feeling a little pain so I decided to figure it out myself before using someone else's better solution.
First things first: we actually need to obtain some random integers. Easy, right? We'll just call rand()
and—
rand() is known to have a number of problems; here's a short list:
- If you use glibc like me, your
rand()
uses a Linear Congruential Generator. LCGs are very simple, but they are prone to subtle correlation issues that tend to manifest when applied to a multidimensional problem, such as Monte Carlo simulations. - The number of distinct sequences that you can get from
rand()
is limited by the number of different seeds that you can provide. A quality PRNG usually has an internal state of at least 128 bits, butsrand()
only accepts anunsigned int
, providing only 32 bits of seed material. rand()
is not thread safe. Calling it will result in reads and writes to a shared location, which will cause both performance problems and randomness problems.rand_r()
allows the user to provide a pointer that will be used to the PRNG's state, allowing thread-safe code to be written, but it is rarely used.
Okay, so rand()
is not as horrible as I've made it out to be. Don't get me wrong—rand()
is pretty crappy—but if you generated 100 random numbers and stared at it, it would probably look pretty random. So rand()
might be okay for use cases where high quality randomness is not required (such as a game), but if you try subjecting it to rigorous statistical tests it crumples like wet toilet paper.
Today, we're going to be using xoshiro128+ as our PRNG. PCG is another viable alternative.
Introducing Floating Point §
Okay, great. So now we have a random uint32_t
... how do we turn this into a random float
?
The naïve solution is to just divide the integer by the maximum value (232), but this is not great for two reasons:
- Division is slow.
- This requires casting the integer to a float, but floats themselves are also 32 bits, so a float could not possibly represent all 32-bit integers. You can avoid this precision loss by casting to a double, but that just makes the first issue worse.
Instead, what we want to do is create a floating point number using bitwise operations. To do that, we need to familiarize ourselves with how floating point values are represented in memory; specifically, we're going to be focusing on IEEE 754, which is sort of required in the C standard.
If you ever learned scientific notation in school, a floating point number is a lot like that. Basically, three fields are packed together: the sign bit, the exponent, and the integer. The function of the sign bit is very simple; 0 is positive and 1 is negative. The real value represented by a float is basically equal to .
There are a few pitfalls that we need to be aware of, however:
- The exponent is stored with an offset of 127, so an exponent of 0 is equal to -127, an exponent of 127 is equal to 0, and an exponent of 255 is equal to 128.
- The fraction section is interpreted as if there was an additional 1 bit in front of it.
In this case, the exponent portion is equal to , and the fraction section is equal to . Thus, the overall value is equal to .
Actually Generating Floats §
Time to actually generate the floats!
We can start by drawing 23 bits from our PRNG for the fraction part. This is easy enough; we just take a 32-bit value and shift it right 9 bits. This gives us a random fraction value between and .
If we set the exponent portion to 0, the fraction portion will be multipled with , giving us a random number in the range . Subtracting 1 yields a number in the range .
Here's the actual code:
float int_to_float(uint32_t random) {
union { uint32_t u32; float f; } u = { .u32 = random >> 9 | 0x3f800000 };
return u.f - 1.0;
}
0x3f800000
is what we get when we take our exponent (0 is represented as 127) and shift it right 23 bits so that it's in the right place.
We could use plain old floating point multiplication instead of type punning, though there is a chance that it will be imperceptibly slower than our hacky solution. Probably not. Whatever.
You can find the full code demonstrating this concept here. That's all, folks.