Skepticblog

by Steven Novella, Jan 02 2012

One concept that is important to being a scientist or critical thinker is that terms need to be defined precisely and unambiguously. Words are ideas and ideas can be sharp or fuzzy – fuzzy ideas lead to fuzzy thinking. An obstacle to using language precisely is that words often have multiple definitions, and many words that have a specific technical definition also have a colloquial use that is different than the technical use, or at least not as precise.

Recently on the SGU we talked about randomness, a concept that can use more exploration than we had time for on the show. The term “random” has a colloquial use – it is often used to mean a non sequitur, or something that is out of context. It is also used colloquially in the mathematical sense, as a sequence or arrangement that does not have any apparent or actual pattern. However, people generally have a poor naive sense of what is random, mathematically speaking.

There are at least two specific technical definitions of the term random I want to discuss. The first is mathematical randomness. Here there is a specific operational definition; a random sequence of numbers is one in which every digit has a statistically equal chance of occurring at any position. That’s pretty straightforward. This operation can be applied to many sequences to see if they conform to a statistically random sequence. Gambling is one such application. The sequence of numbers that come up at the roulette table, for example, should be mathematically random. No one number should come up more often than any other (over a sufficiently large sample size), and there should be no pattern to the sequence. Every number should have an equal chance of appearing at any time. Otherwise players would be able to take advantage of the non-randomness to increase their odds of winning.
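To make the "sufficiently large sample size" point concrete, here is a minimal sketch that simulates an idealized wheel and measures how far the most lopsided pocket strays from its expected share. The 38-pocket wheel, the function names and the seed are assumptions for illustration, and Python's `random` module is itself only pseudo-random.

```python
import random
from collections import Counter

def spin_counts(n_spins, pockets=38, seed=12345):
    """Simulate n_spins of an idealized wheel and count hits per pocket."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    return Counter(rng.randrange(pockets) for _ in range(n_spins))

def max_relative_deviation(counts, n_spins, pockets=38):
    """Largest relative gap between an observed count and the uniform expectation."""
    expected = n_spins / pockets
    return max(abs(counts.get(p, 0) - expected) / expected for p in range(pockets))

# The relative deviation should shrink as the number of spins grows.
for n in (380, 38_000, 380_000):
    print(n, round(max_relative_deviation(spin_counts(n), n), 3))
```

With only a few hundred spins some pockets will look "hot" or "cold" by chance; the imbalance should fade as the sample grows.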

Computer simulations are another area where a truly random sequence of numbers is valuable. Random numbers may provide the input necessary for the simulation to run.

It is very difficult (perhaps impossible) for a person to generate a truly random sequence of numbers from their brain. Here are three sequences of numbers; try to find the one that is mathematically random:

0 4 4 7 2 0 6 0 2 3 8 9 9 3 0 2 0 5 3 3 8 6 8 4 9 3 3 8 9 2 4 2 2 1 3 6 4 7 9 7 4 0 2 4 9 9 3 4 5 0

9 4 8 5 7 6 0 9 4 7 3 6 5 2 9 1 7 3 5 7 8 5 4 8 0 2 9 3 8 7 5 1 0 2 5 2 3 5 5 5 0 2 9 8 9 7 7 2 0 3

8 5 5 7 0 3 0 9 2 9 9 2 8 4 7 5 6 6 2 0 3 9 4 8 7 5 0 3 0 9 4 8 7 5 0 3 0 3 8 4 7 5 9 8 7 7 0 3 9 8

The top sequence was generated by a random number generator; the bottom two I produced by typing chaotically (I won’t say “randomly”) on my number pad. The top sequence is statistically random, while the bottom two are not. It’s hard to tell the difference just by looking. Also, we tend to underestimate the clumpiness of randomness (called the clustering effect). So, for example, in a mathematically random sequence of numbers, the same digit should occur twice in a row with a certain frequency, and even three, four, five or more times in a row. But such clusters make the sequence look naively non-random.
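The "certain frequency" can be worked out: for independent, uniformly distributed decimal digits, each adjacent pair matches with probability 1/10, so a 50-digit sequence should contain about 4.9 back-to-back repeats on average. A rough sketch that counts them in the three sequences above (the function names are mine, not from any library):

```python
def adjacent_repeats(digits):
    """Count positions where a digit is immediately followed by the same digit."""
    return sum(1 for a, b in zip(digits, digits[1:]) if a == b)

def expected_adjacent_repeats(length, base=10):
    """Expected count for independent, uniformly distributed digits."""
    return (length - 1) / base

SEQUENCES = {
    "first": "0 4 4 7 2 0 6 0 2 3 8 9 9 3 0 2 0 5 3 3 8 6 8 4 9 3 3 8 9 2 4 2 2 1 3 6 4 7 9 7 4 0 2 4 9 9 3 4 5 0",
    "second": "9 4 8 5 7 6 0 9 4 7 3 6 5 2 9 1 7 3 5 7 8 5 4 8 0 2 9 3 8 7 5 1 0 2 5 2 3 5 5 5 0 2 9 8 9 7 7 2 0 3",
    "third": "8 5 5 7 0 3 0 9 2 9 9 2 8 4 7 5 6 6 2 0 3 9 4 8 7 5 0 3 0 9 4 8 7 5 0 3 0 3 8 4 7 5 9 8 7 7 0 3 9 8",
}

for name, text in SEQUENCES.items():
    digits = text.split()
    print(name, adjacent_repeats(digits), expected_adjacent_repeats(len(digits)))
```

Fifty digits is far too small a sample for this count to separate the sequences; the point is only that a handful of doubled digits is what chance predicts, not evidence of a pattern.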

The top number is what is called pseudo-random. As I said, random numbers are very useful to computer programmers. There are a number of operations that can generate mathematically random number sequences. But they are not truly random because the operation will generate the same sequence of numbers given the same input or seed. There therefore needs to be some way to create a random seed, which can be based upon some physically noisy process, or the time, or something else that changes regularly.
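A toy example shows the seed dependence. This is a sketch of a linear congruential generator, one of the simplest pseudo-random operations; the class name is made up for illustration, the constants are the widely published "Numerical Recipes" parameters, and this is not necessarily the kind of generator that produced the sequence above.

```python
class TinyLCG:
    """A toy linear congruential generator (illustration only, not for real use)."""

    def __init__(self, seed):
        self.state = seed % 2**32

    def next_digit(self):
        # Advance the internal state with a fixed multiply-and-add rule.
        self.state = (1664525 * self.state + 1013904223) % 2**32
        # Use the high bits; the low bits of an LCG are the least random.
        return (self.state >> 16) % 10

def digits(seed, count):
    """Return the first `count` digits produced from a given seed."""
    gen = TinyLCG(seed)
    return [gen.next_digit() for _ in range(count)]

print(digits(2012, 20))
print(digits(2012, 20))  # same seed, so the same digits again
print(digits(2013, 20))  # a different seed starts a different sequence
```

Everything about the output is fixed by the rule and the seed, which is why the seed has to come from somewhere less predictable.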

Another example of a pseudo-random sequence is pi. The digits of pi (3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214808651…) pass statistical tests of randomness as far as they have been checked, but of course the sequence is not truly random because it is one specific sequence.

This brings us to the second technical definition of random – true physical randomness. I can throw dice to generate a statistically random sequence of numbers, assuming the dice are fair and I am sufficiently “randomizing” each throw. But from a physical point of view, the result of each throw is not random, but determined by the laws of physics. The number that results on the die must occur given all the physical parameters of the throw. Once the die is cast, the number that will result is determined and not random. In this sense “random” also means “unpredictable.”

The only truly random physical system known to science results from quantum effects. Certain quantum properties are undetermined and unpredictable – they are truly random. In fact, researchers last year developed a random number generator based upon quantum properties – the first truly random number generator.

As with many concepts in science and elsewhere, even seemingly basic or simple concepts can become very detailed and complex when explored deeply. That is one lesson I have thoroughly learned from studying and teaching science – it’s always more complicated than it seems. In fact, it’s always more complicated than your current understanding. The above discussion of randomness is a quick overview, but there are layers of complexity and detail I did not get into. There are also limits to our current understanding – the universe is more complicated than we know.

It is very helpful, however, to at least understand that there is likely more depth to an issue than one’s current knowledge. But we can still use terms and concepts that are accurate and precise as far as they go, even if there is always a deeper complexity.

23 Responses to “Randomness”

  1. Rob says:

    “a random sequence of numbers is one in which every digit has a statistically equal chance of occurring at any position”

    This goes directly against your example of roulette. 5, for example doesn’t have the same chance in every position. To meet your definition, you’d have to consider the table (american) a single base 38 number.

    The definition is also, in general, incorrect, since it only applies to one distribution. To maintain the roulette example, are you saying red/black/green isn’t random? Those aren’t going to land equally at all.
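The arithmetic behind this objection can be sketched with exact fractions, assuming an idealized American wheel with 38 equally likely pockets (18 red, 18 black, 2 green). The variable names are illustrative only.

```python
from fractions import Fraction

POCKETS = 38                   # American wheel: 1-36 plus 0 and 00
RED, BLACK, GREEN = 18, 18, 2  # pockets of each colour

# Every individual pocket is equally likely on a fair wheel.
p_number = Fraction(1, POCKETS)

# A colour's probability is the sum over the pockets of that colour.
p_colour = {
    "red": RED * p_number,
    "black": BLACK * p_number,
    "green": GREEN * p_number,
}

print(p_number)
for colour, p in p_colour.items():
    print(colour, p)
```

The pockets are uniformly distributed while the colours are not, yet both are outcomes of the same random spin.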

    • Wrong says:

      The idea is that any individual position on the wheel is equally likely as a resting place, although, as he indicated with the dice example, when the ball rolls, or the die is cast, the answer becomes non-random.

      In each roll (position), 5 is supposedly as likely as every other value. Of course, the physics of the roll change this, and he did indicate this later, but that doesn’t make it a flawed example. Prior to the ball being rolled, with a fair wheel, every number is equally likely to be selected. (Of course, an ideal table is obviously not possible, but they’re close enough for an example, and making analogies about half lives and radioactive decay would fly over most heads, seeing as randomness is an easier way to explain quantum phenomena, rather than vice versa, to the layman.)

      And Red/Black/Green don’t have to be equal. You see, the thing that is random is the number that occurs. These numbers are grouped into sets, Red/Black/Green, Odd/Even, 1-18/19-36 etc. Now, the probability of a 5 occurring is 1/36, but if I choose Red say, or even, or 1-18, then it’s 1/36*18, or 1/2. The outcome Novella is discussing is obviously the number. The probability of an item from a set occurring is equal to the sum of the individual probabilities of each value in the set. So it’s still the same problem, just with sets. The idea of course, being that the casino offers multiple options for probability, for different gambling types. I guess this is what Novella gets for being interesting and not talking about an ideal coin, or an ideal die, general rudeness.

      Your comment is simple pedantic nitpicking showing that you lack the capability to give the benefit of the doubt and understand what was being said. I could point out that American has a capital A, and that definition of Random is, as far as I’m aware, mathematically perfect (the one I usually hear from my maths professors is “Selection occurring without bias or preference.” which is functionally identical), and pedantically nitpick at your comment. That gets us nowhere. Instead, use your brain to understand what it is that someone’s actually saying. That lets us not only speak intelligently, but also allows us to assume you’ve some sense, rather than treating you like a child, which no-one enjoys.

  2. Мах says:

    If you know a random variable’s probability distribution, then you can predict how often it’ll fall within a given range of values. When you talk about a random sequence, you’re really talking about independent and identically distributed (i.i.d.) random variables. It’s unpredictable in that there’s no pattern to the sequence, which is why random sequences usually don’t compress well. A pseudorandom sequence, however, is completely described by a function and a seed.

    You can get a random number generator that uses noise from a webcam. Some of that noise is shot noise, or quantum noise, which is truly random, but it’s dwarfed by thermal noise, which is chaotic but not quantum, more like throwing dice.

  3. tmac57 says:


  4. Nathaniel Brottingham says:

    Random is also often used in the sense of arbitrary, but not without choice. (I walked into a random bar; RAM; opening a file FOR RANDOM) And originally it meant great speed. But at least you have it better than the Dutch. ‘Willekeurig’ means arbitrary in the sense of by choice of will, but (possibly because of a long-forgotten translation mishap) it has come to be used for English ‘random’ in some scientific circles, even though they generally should be using ‘toevallig’, ‘aselect’, or some suchlike.
    >a random sequence of numbers is one in which every digit has a statistically equal chance of occurring at any position. That’s pretty straightforward.
    It is also wrong. Not only is it possible and often useful to generate a sequence for which this is not the case (for example, you might need a sequence with a Gaussian distribution) but it also leaves out the most important feature of a random sequence: unpredictability. Given the sequence of numbers already generated, you shouldn’t be able to tell what you’re going to get next.
    A good pseudo-random sequence is one where the generating function isn’t going to affect your use case and where the next number can only be predicted by resorting to that function. (This is one of the reasons why pseudo-random number generators often don’t use their entire internal state to yield a number. Those are actually two functions: one to generate the successor state, which you don’t look at, and one to generate a result from part of this state.) As an afterthought, there are some people who like to make a distinction between unpredictable and random sequences, but their writings are very confused from an information theoretical perspective.
    Testing a sequence for randomness is really hard and standard tests require thousands of samples. For example, although I was able to determine with certainty that your sequence 3 wasn’t random (it has a tendency not to follow up low numbers with low numbers as often as it should) sequence 2 looks quite nice to me. Given that the sequence is probably too short to really tell, suppose for a moment that you did generate it with a computer but you forgot about it and forgot to document it… how would you know?
    Also, I remember from your program that you had issues with phrases like ‘more random’ and ‘less random’. I cannot remember the context exactly, but such phrases are not by definition meaningless. You (or was it one of the other rogues?) complained that ‘random’ being an absolute shouldn’t be used in conjunction with ‘more’, but ‘red’ is an absolute too and still it makes sense to say redder or less red even if you and your audience both agree on what ‘red’ means (sRGB FF0000 say). In physical modelling (but also graphical design and I’m sure there are more examples) it is not uncommon to mix randomness with non-random variables or previous results. Brown noise for example is significantly less random than white noise. And some sequences are so random that they don’t even have an average; these outdo even white noise.

    • Max says:

      I threw the three sequences into a zip file. They packed down to 58, 56, and 51 bytes, which suggests that the first sequence is the most random and the third one is the least random.
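The experiment is easy to repeat with a general-purpose compressor. This sketch uses Python's `zlib` (the DEFLATE method that zip files commonly use) rather than an actual zip archive, and strips the spaces from the sequences, so the byte counts it prints should not be expected to match the figures above.

```python
import random
import zlib

def compressed_size(text):
    """Size in bytes of the zlib (DEFLATE) output at the highest compression level."""
    return len(zlib.compress(text.encode("ascii"), 9))

# The three sequences from the post; removing the spaces is an assumption.
SEQUENCES = [
    "0 4 4 7 2 0 6 0 2 3 8 9 9 3 0 2 0 5 3 3 8 6 8 4 9 3 3 8 9 2 4 2 2 1 3 6 4 7 9 7 4 0 2 4 9 9 3 4 5 0",
    "9 4 8 5 7 6 0 9 4 7 3 6 5 2 9 1 7 3 5 7 8 5 4 8 0 2 9 3 8 7 5 1 0 2 5 2 3 5 5 5 0 2 9 8 9 7 7 2 0 3",
    "8 5 5 7 0 3 0 9 2 9 9 2 8 4 7 5 6 6 2 0 3 9 4 8 7 5 0 3 0 9 4 8 7 5 0 3 0 3 8 4 7 5 9 8 7 7 0 3 9 8",
]
for text in SEQUENCES:
    print(compressed_size(text.replace(" ", "")))

# A larger sample makes the contrast much clearer than 50 digits can.
rng = random.Random(0)
noisy = "".join(str(rng.randrange(10)) for _ in range(10_000))
patterned = "0123456789" * 1_000
print(compressed_size(noisy), compressed_size(patterned))
```

On inputs this short, fixed format overhead is a large share of the output, so small differences in size are weak evidence at best.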

    • Wrong says:

      Your comments on the meaning of random are exactly the point of his article. He describes the technical, the most useful and least (In fact, not at all) fuzzy definition. The others, or previous uses aren’t relevant. “Lololololollol so random” is fuzzy and wrong, while, “The outcome of an ideal dice roll is random” is precise. Precision of language is important.

      The definition he used of random is the standard mathematical definition. If you can’t deal with that, then you don’t understand what randomness is, in the mathematical sense. A random sequence means that it’s unpredictable, ie, as he said, the next number has an equal chance of being any value of the set being selected from. You may be able to generate a sequence where each value is predictable to some extent (the opposite of his definition). That wouldn’t be random. It might be useful, but it isn’t random. Unpredictability? That’s what he described: STATISTICALLY EQUAL PROBABILITY. That’s randomness. If all outcomes are equally likely, then they’re unpredictable. The bit about precedent just reinforces what he said. You seem to think your discussion, which simply concurs with his, opposes it, and you would appear to be mistaken.

      Yes, there’s a difference between unpredictable and random. It’s not all that tricky. Random is unpredictable, but unpredictable is not random. A RNG is unpredictable, a Chaotic sequence is unpredictable, but both are governed by a definite bias and preference, making them non-random, though a truly Chaotic sequence is impossible to determine as non-random without access to the generating function.

      Your ability to determine whether the sequence is random is confirmation bias. Sequence 3 does follow what you describe, but it’s equally possible, in fact, a statistical certainty, that a truly random sequence could provide the same thing. The standard probability logical fallacies apply here.

      Randomness testing isn’t what he was talking about. Non sequitur, but also, I’m not sure of the point here. A chaotic sequence is impossible to distinguish from a random one, no matter the sample size. Most RNGs without access to the seed, though not entirely chaotic, are almost impossible to determine as non-random as well. The importance still lies in correct use of language.

      False analogy with the red thing, Random is an absolute. Something is random, or the selection has bias. That’s it. Seriously, you can’t understand the concept of random values, and think that more and less apply to random. That would be like saying more dead, or less dead. While there may be varying degrees of alive, or varying degrees of bias in selection, there is only one degree of not alive, or not biased. (See how you use an analogy?)

      Max makes an interesting point about compression and randomness, although that doesn’t give a measure of random, just the variance of the strings. A high variance would usually indicate a greater probability of the sequence being random, and a low one is less likely to be random. But that’s not a measure of randomness, that’s a measure of how easy it is to find patterns and groupings, ie, a measure of how likely the series is to be random, which is in fact, a very clever idea. Much better than my way, looking at the numbers and looking for a pattern (Not a good way at all). It is possible for a random sequence to include a string of 12 zeroes, and later, a string of all the numbers 1-9 at unpredictable positions, with 3 extra places also unpredictable. Now, if I cut said string, and took the 12 zeroes:

      182639523807-Not actually random, just an example

      those would appear less likely to be random, and would compress smaller, but the varying string would compress higher, in fact, a string of the mod ten 2x table would likely be higher than the zero string.

      182639523807-Not actually random, just an example

      Using text files and compressed folders: File one compresses to 5 bytes, the second to 12 (ie, not at all), and the last to 9 bytes. Despite this, the last sequence is very predictable and determined, and the second one is not very, and the first is entirely possible under truly random conditions (though very unlikely, I believe it’s a 1/Trillion chance, 1/10^12 of that occurring in a random sequence). It’s very useful, but a bit like Occam’s Razor- a good test to get an idea of something, but not a certainty.

      • Max says:

        Zip file compression basically does Huffman coding, replacing common long strings with short codes. It’s not smart enough to tell the difference between a pseudorandom sequence and a truly random one. But a theoretical perfect compressor would figure out the pattern of the pseudorandom sequence and compress it down to the smallest computer program that can generate it. The length of that program is the sequence’s Kolmogorov complexity.

        Now, you could have a true random number generator that generates zeros 99% of the time. Its output sequence will have many long runs of zeros and will compress well, because its preference for zeros is itself a pattern that can be exploited.

        And after all, probability theory is useful because it CAN make predictions about random things; hence the law of large numbers.

      • Wrong says:

        That’s a good point, an ideal compressor could perfectly determine the sequence, and return it, enabling greater compression. I still like the idea of using compression to test the sequence, but I wonder how the system would fare against a highly chaotic generator.

        A minor point, I’m not sure that you could have a random number generator that generated zeros 99% of the time. Unless the set was 99% zeros that it was selecting from, a truly random process would have equal numbers of the other values over an infinite period.

        My example is a very simplified one, and not very good, I’m sure someone else could come up with a better one. The point I was trying to get at is that compression is about determining patterns, and patterns can exist in small samples of random strings, which can impact compression, and in fact, some non-random strings may be more complex than a random section. Of course, as you point out, the Law of Large Numbers fixes this: If you can examine a large enough sample size, then for most functions, the system tends towards ideal.

        I was more itchy about it in the wake of the previous poster’s assertion that he could tell that one of the sequences was non-random, because while it may appear that a pattern has emerged in a small sample, it doesn’t mean a pattern in the large. Of course, this cuts both ways, the converse being that in a large sample, a lack of a pattern indicates a greater chance of “randomness”.

        I’m also concerned at the use of varying degrees of random, since random, by definition, is an absolute. Something can be more like random, or less like random, or more likely random, or less likely random, but saying that something is partly random means that it’s predictable and therefore, not random. This wasn’t really something you did, but the previous guy did, with an insipid analogy about the colour red (What do Random and Red have in common? The letter R, and not a similar scaling of degrees), and that got me cranky, and I apologise that my post was overly rude.

      • Max says:

        To make a random number generator that generates zeros 99% of the time, you could take one that generates numbers between 1 and 1000, and replace all numbers above 10 with zeros.
        Also, a random number generator based on thermal noise or even quantum/shot noise can generate a sequence with a Gaussian distribution, which prefers zeros over other values.
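The first construction can be sketched in a few lines. Python's `random` module is pseudo-random, so it only stands in for whatever true random source is being imagined; the function name and the seed are assumptions for illustration.

```python
import random
from collections import Counter

def mostly_zero(rng):
    """Draw uniformly from 1..1000 and map every value above 10 to zero."""
    n = rng.randint(1, 1000)
    return n if n <= 10 else 0

rng = random.Random(99)  # seeded only so the sketch is repeatable
draws = [mostly_zero(rng) for _ in range(100_000)]
counts = Counter(draws)

print("share of zeros:", counts[0] / len(draws))
print("other values seen:", sorted(v for v in counts if v != 0))
```

Each draw stays independent of the ones before it, so the output is no more predictable than its source, even though the distribution is heavily lopsided.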

  5. Chris Howard says:

    “The Drunkard’s Walk” helped me understand randomness theory. It’s very well written, and conveys the concepts of randomness in such a way that is easily understood. Of course this is coming from a middle-aged male with a fourth grade math level, so…

  6. MadScientist says:

    You don’t need quantum mechanical gizmos to generate true random numbers. True random numbers have been generated using the ‘thermal noise’ in electronic circuits for quite a few decades now. It is not clear to me that the quantum entanglement gizmo is superior in any way to the thermal noise random number generator. I wonder if any modern consumer computers have a hardware random number generator.

    • Max says:

      Thermal noise isn’t truly random, unlike quantum noise.

      • MadScientist says:

        Thermal noise is affected by various parameters but you can use it to generate truly random numbers. No one has developed a scheme for predicting the outcome of a thermal noise random number generator; we can only predict the aggregate outcome, but that is true of any true random generator.

      • Max says:

        Any chaotic process is all but impossible to predict in the long run, but it may have some patterns in a short time interval that true quantum noise wouldn’t have.

      • Wrong says:

        Truly Random numbers is a bit of a stretch. Almost random, highly chaotic, pseudorandom, would be better definers. It’s close to random, but it’s still being defined by two non-random processes. Non-random Operator Non-random can’t equal random. Whilst Random Operator Anything will equal random. Now, a thermal noise RNG system is probably useful for anything you might need to do, but it’s not mathematically perfect, and that leaves space for people to strive for more.

  7. John K. says:

    I have always considered the main characteristic of randomness to be unpredictability. In this sense randomness is in the eye of the beholder. If I pick up a text written in a language I cannot read, the marks are indeed random to me. Likewise they are not random at all to someone who reads that language. The same holds true for different entities that can count cards or calculate the trajectory of lottery balls, the randomness is relative.

    Computer simulations of randomness actually do very well in that they only have to be unpredictable to the humans using them. A uniform distribution is often useful for certain types of brute force analysis, but provided a sequence cannot be predicted it can be said to be random even if the distribution is not uniform.

    Probability is mostly a method of managing unknowns. It seems like a mistake to try and enforce an absolute randomness standard, since it mostly depends on the ability of the beholder.

    • Wrong says:

      It’s not about enforcing a truly random standard. It’s about making sure everyone’s clear on what is random, and what isn’t. For example, I might be walking along and see a $50 note on the ground. I might say “Wow, that’s random.” -I wouldn’t, but it’s an example. It’s fuzzy language, and it’s fine, people know what I mean, but in the context of critical thinking, and skepticism, the word needs to be clearly defined.

      When Dawkins described Evolution as: “the outcome of non-random survival of randomly varying replicators”, the understanding of what is meant by random and non-random is vital to discourse. Imagine trying to explain a concept like that to a creationist with no concept of random. Not a fun prospect.

  8. Nathaniel Brottingham says:

    >Your comments … mathematical definition.
    You obviously didn’t read, or didn’t understand, what I wrote. Steve addressed statistical randomness and non-sequitur; I merely added another common meaning that Steve apparently forgot to mention, and one that isn’t at all ‘fuzzy’.
    >If you can’t deal with that, then you don’t understand what a randomness is, in the mathematical sense.
    It should be abundantly clear from my post that I have a better understanding of the topic than both you and Steve. Furthermore, it’s better not to accuse people of not understanding something, when you yourself lack text comprehension skills.
    >A random sequence, means that it’s unpredictable, ie, as he said,
    No, as I said. Steve forgot to mention this, that’s the point.
    >the next … selected from.
    That isn’t what he wrote; there are non-random sequences that would still fit within Steve’s loose definition.
    >You may … isn’t random.
    *sigh* Dealt with that already; there’s no sense in playing Echo.
    >Unpredictability? That’s what he described: statistically equal probability.
    Think before you type. These two properties are orthogonal.
    >That’s randomness … apply here.
    You are repeating yourself, which is unhelpful and boring.
    >Randomness testing … of language.
    1) I think the topic is perfectly apropos, and if you disagree you haven’t understood the article. 2) Come back here after taking a statistics class or two.
    >False analogy with the red thing, Random is an absolute.
    As is ‘red’ when both you and your audience agree on the definition.
    >Something … not biased.
    I know what statistical randomness is; you don’t have to explain it to me and especially not as ineptly as you do.
    >(See how you use an analogy?)
    ‘You don’t know how to make an omelette? No matter, I’ll show you!’ As you lay the resulting crumbling black misery on the plate, you say: ‘See, that’s how you do it!’
    >Max makes … a certainty.
    Since Max already replied, I will refrain from comment, except to gently ask you to shut up about things you obviously know nothing about, and to request that you keep future posts shorter, since you’re repeating yourself a lot and you’re generally being a bore.
    P.S. Your username is very appropriate.

    • Wrong says:

      Might want to put that as a reply to the original comment, it’s almost as if you don’t want to be seen down here.

      Max replied. It was an intelligent and reasoned response, and actually informed me of an idea which I hadn’t considered. Yours on the other hand, was an insult to anyone with eyes.

      The definition you mentioned was basically indistinguishable from the Mathematical, or Physical definition, so yours was worthless. Deal with it.

      Abundantly clear from your post that you’ve a greater comprehension of the subject? Far from it, and I’ve never taken a claim to higher knowledge as proof of higher knowledge, which in your case, seems to display something I didn’t know existed, an ego even bigger than my own. And mocking my text-comprehension skills? I don’t see how I’ve been mistaken, as you’re the one who quite competently stated what had been stated previously in the blog post etc with a different wording, seemingly ignorant of the lack of difference. More to the point, an ad-hominem doesn’t make your point nearly so well as you think it does, although it does show a failure in your own logic.

      “The number that results on the die must occur given all the physical parameters of the throw. Once the die is cast, the number that will result is determined and not random. In this sense “random” also means “unpredictable.””-Oh look, Steve mentions unpredictable. Of course, since it’s implied in his statement on equal probability, and more expressly stated here, I’ll say this. Your text comprehension skills seem to be… lacking (Yeah, I’m a jerk. Sue me). This of course doesn’t make me right. It makes you an ass.

      Of course Red can be an absolute. But you didn’t define a definition where only one Red existed. In fact, you spoke of redness. Which is a non-sequitur (llolololo so random pfft), and a false analogy. And that would make your false analogy both poorly worded, poorly selected AND wrong.

      Oh? I explain it ineptly? I’ll tell the lecturers at UTAS that. “Sorry chaps, some fool with an overly long name (I’m a jerk) on the internet thinks your definition is inept. You’ll have to give up your jobs.” If my statement’s inept, I’d like you to explain why. I can say the same thing about yours, and it doesn’t change anything.

      Oh? Another false analogy and a non-sequitur with no discussion? Goodee! Using a false analogy to criticise my analogy has to be the most stupid thing I’ve seen in critical thinking. Good work! Random is an absolute. If something is like random, but not quite random, it’s predictable, and hence, non-random, and hence, can’t be at all random. Hence terms like more or less random, are devoid of meaning, there is only one Random, and there are only varying degrees of appearing Random, but not being random.

      “except to gently ask you to shut up about things you obviously know nothing about”< How not to be gentle. Now, if you're going to be immature and say I clearly know nothing about it, then you'd be best actually proving that rather than being a rude and immature child about it. Now, if my understanding of Random is wrong, I'd gladly take note. From a published source who is not an inept internet troll. Since: A) The definition I used for Mathematically Random is correct (However "inept" you consider it), and B) Your analogy is poor logic, and doesn't prove your point, moreover, it contributes to both the fuzziness of language by defending a usage which is inept, and generally tends to misinform people.

      Yeah, sure my post is long. I really don't care. If people want to read it, that's up to them. Being a bore hardly counts when it's the internet, and the scroll button and choice are available, so I'll be as boring as need be. Personally, I found your post not only boring, but inept, and stupid, but then, telling you that would be insulting. But then, since you're so keen to insult, then I'll dish it back. And I'm aware of the amusing nature of the length of this post in response. It amuses me to think that you'd probably be the only one bothered to read through it all, as you're the only one it really concerns. Finally, my name is very appropriate. When I responded to you, it appeared how I intended it to when I first used it long ago: As a notice that you're Wrong.

      PS- Your username is also appropriate. It's a stupid and boring name, rather like yourself. Goodnight, and I hope while you sleep, an epiphany of common sense comes over you.

  9. Max says:

    Steve’s third sequence has no 1s. The chance of getting no 1s in 50 tries is (9/10)^50=0.005. Looks anomalous. But we can find something improbable in any sequence. In fact, the probability that a 10-digit RNG would generate a given sequence of 50 digits is 1/10^50.
    BUT, if it’s missing 1s, maybe it was generated with a 9-digit RNG that doesn’t generate 1s. Then, its probability is 1/9^50. The Bayes factor is (1/9^50)/(1/10^50)=(10/9)^50=194.
    So, if the prior odds of a 9-digit RNG vs. a 10-digit RNG were better than 1:194, this result would make the 9-digit RNG more probable.
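The numbers above can be reproduced with exact fractions. This is a sketch of the same arithmetic, taking the two candidate generators (uniform over ten digits, or uniform over the nine digits other than 1) as the assumptions; `posterior_odds` is a helper name of my own.

```python
from fractions import Fraction

N = 50  # digits in the sequence

# Chance that a fair 10-digit generator produces no 1s in N tries.
p_no_ones = Fraction(9, 10) ** N

# Likelihood of this exact sequence under each hypothesis.
likelihood_10 = Fraction(1, 10) ** N
likelihood_9 = Fraction(1, 9) ** N

# Bayes factor in favour of the 9-digit generator.
bayes_factor = likelihood_9 / likelihood_10

def posterior_odds(prior_odds):
    """Posterior odds (9-digit : 10-digit) from the prior odds."""
    return prior_odds * bayes_factor

print(float(p_no_ones))     # roughly 0.005
print(float(bayes_factor))  # roughly 194
print(float(posterior_odds(Fraction(1, 100))))
```

With prior odds of 1:100 against the 9-digit generator, the posterior odds come out a little under 2:1 in its favour.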

    But what’s the probability that a human haphazardly typed the sequence on the keyboard? To determine that, you’d have to model human typing by studying a lot of sequences typed by humans. Maybe humans prefer certain digits, like 5 and 0, or combinations of digits. For example, the third sequence had six 03s and two 94875030s, which is probably why it compressed the most.
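These counts are easy to check mechanically. A small sketch, using the third sequence exactly as printed in the post:

```python
from collections import Counter

THIRD = (
    "8 5 5 7 0 3 0 9 2 9 9 2 8 4 7 5 6 6 2 0 3 9 4 8 7 5 0 3 0 "
    "9 4 8 7 5 0 3 0 3 8 4 7 5 9 8 7 7 0 3 9 8"
).replace(" ", "")

# Which digits never appear, and how often each of the others does.
digit_counts = Counter(THIRD)
missing = [d for d in "0123456789" if d not in digit_counts]

print("missing digits:", missing)
print("digit counts:", sorted(digit_counts.items()))
print("occurrences of 03:", THIRD.count("03"))
print("occurrences of 94875030:", THIRD.count("94875030"))
```

Note that `str.count` counts non-overlapping matches, which makes no difference for these two patterns in this sequence.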