
Last updated on Tuesday 13th June 2023

The Seventh Heresy: Entropy isn’t What It Used To Be

Entropy is what makes the universe change and what gives time a direction. The universe is a very lumpy place – the technical term is ‘heterogeneous’ – with stars, huge balls of hot matter, separated by vast distances of cold, empty nothing. Change in the universe is powered by this lumpy state moving to a more evenly spread state – a more ‘homogeneous’ picture. This is encapsulated by the Second Law of Thermodynamics: ‘In any thermodynamic change (in a closed system), entropy will always increase’, or similar words[1].

If we scroll forwards countless billions of years, the picture envisaged is of this spreading out being completed. At that point all directional change ceases; only random fluctuations back and forth continue. This is given the evocative name of ‘The Heat Death of the Universe’. (This picture ignores complications, such as black holes.)

If we scroll backwards, we move to a point where things were more heterogeneous, going to a maximum at whatever the ‘Big Bang’ was at the start of the universe.

This is the big version of the observation that entropy, or, to use a better expression, the drive to equilibrium, is what gives time a direction. Everything sub-atomic is ‘time-symmetrical’[2]; that is, all sub-atomic interactions can occur the other way round equally well, and time appears to have no direction at this level. However, while any individual interaction can go from one state to another or back again equally easily, groups of particles, on average, interact with each other more in one direction than the other. When groups of particles with a different average energy interact with each other, they all end up with similar average energy. Mix hot and cold and you get warm; two places that start off being different end up being more alike; energy in lumps gets smoothed out.

Temperature is the average speed of many small things like molecules, lumped together into the size of something we can see, like a cup of coffee. In hot coffee, the molecules are moving around and bumping into each other faster than in cold coffee. If you mix the two, hot and cold, the water molecules bump into each other. Each collision redistributes their speeds and, assuming that on average they all have the same mass, after many collisions they end up moving, on average, at the same speed, close to the average of the speeds they started with. This tendency for the difference in energy between two objects to get less when they interact is what we call the second law of thermodynamics. This is sometimes expressed as: ‘In any thermodynamic change in an isolated system, entropy will always increase.’ After the change to becoming more alike has happened, it can only be unmade by taking energy from elsewhere. For example, if you introduced some much hotter water into your cup of coffee, it could return to its original temperature. So, after any energy interaction, the energy gets more evenly distributed between the things that interacted. This gives time its direction: things move from being different to being more similar.
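The mixing described above can be sketched in a few lines of arithmetic. This is a minimal illustration, not anything from the text: the function name and the temperatures are made up, and it assumes equal specific heats and no heat lost to the surroundings.

```python
def mix(temp_hot_c, temp_cold_c, mass_hot=1.0, mass_cold=1.0):
    """Final temperature of an ideal mix: a mass-weighted average,
    assuming equal specific heat and no heat loss to the surroundings."""
    total_mass = mass_hot + mass_cold
    return (mass_hot * temp_hot_c + mass_cold * temp_cold_c) / total_mass

# Equal masses of 80 °C and 20 °C water settle halfway, at 50 °C:
# the temperature difference is gone, though the energy is all still there.
print(mix(80.0, 20.0))
```

With unequal masses the average is weighted accordingly; either way, the two parts end up alike, which is the directional change the paragraph describes.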

You can look at this another way: there is no special place in the universe that we know of or expect to find, where the rules are different. When there is an interaction between two things that have speed or energy differences, the tendency will be for them to become less different. This change, i.e., things becoming less different, is what makes the past quantitatively different from the present, and we expect that in the future, everything on average will continue to become more similar.

Given the importance of this, that moving closer to thermodynamic equilibrium is the main driver of change in the universe, it is regrettable that the meaning of the term ‘entropy’ is so muddled. (Thermodynamic means ‘heat moving’, and equilibrium means ‘equal balance’; the other driver of change is gravity.) ‘Entropy’ has at least five different definitions, some of which are significantly different from each other, and some of which vaguely involve concepts such as probability, mess, information and order, which are not physical properties but are related to knowledge and perception. Let us try to sort out the muddle.

The word ‘entropy’ was coined by Rudolf Clausius in the 1850s from the Greek for ‘in transformation’. In his original meaning, which we will refer to as E1, entropy was the measure of the amount of change in energy lumpiness. E1 is, like acceleration, a measure of the change between two states. E1 measures the amount of energy lumpiness lost in a change; the more the starting disequilibrium moves towards equilibrium, the greater E1 is. However, neither the starting situation nor the end situation has any quantity of entropy E1 ‘in it’, just as a car that accelerates from 10km/h to 20km/h has no acceleration ‘in it’ before the start or after the end of the process. This definition of entropy works fine in the field in which it was developed, the math of measuring the efficiency of steam engines, but it has confused even great scientists. All that an increase in E1 actually means is that energy has become more evenly spread, more homogeneous, less lumpy, closer to equilibrium, the entropy number being the amount of spreading. Energy differences between two defined states have decreased by a defined amount, that is all. E1 has nothing to do with order or probability. At the time, the work on entropy was important in improving the efficiency of steam engines. Clausius’ work showed that no energetic change could keep all the energy in a useful state; there cannot be a 100% efficient thermodynamic change. Clausius was also the first person to state a form of the second law of thermodynamics.
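E1 as a measure of change, not a stored quantity, can be put into a small worked example. The figures are illustrative, not from the text: when heat Q leaks from a hot body at T_hot to a cold body at T_cold, the entropy produced is Q/T_cold - Q/T_hot, positive whenever T_hot > T_cold, and it describes the transfer itself rather than anything ‘in’ either body.

```python
def entropy_change(q_joules, t_hot_k, t_cold_k):
    """Clausius-style entropy produced when heat q flows from a body at
    t_hot to one at t_cold (temperatures in kelvin): q/t_cold - q/t_hot."""
    return q_joules / t_cold_k - q_joules / t_hot_k

# 1000 J leaking from a 400 K body to a 300 K body (hypothetical numbers):
delta_s = entropy_change(1000.0, 400.0, 300.0)
print(round(delta_s, 3))  # positive: the energy difference has shrunk
```

The number belongs to the change between the two states, just as an acceleration belongs to a change of speed, which is the distinction the paragraph draws.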

The second definition of entropy, E2, came from Ludwig Boltzmann. He applied a mathematical definition, based on probability, to describe ‘complete equilibrium’ which he saw as a state of maximum entropy, E2. This is the state that, according to Boltzmann, has the largest possible potential number of states for the material involved. A pack of 52 playing cards that are completely shuffled can be in any one of 52! (fifty-two factorial or fifty-two shriek) different orders or states. If, however, we happen to know that more of the red cards are in the top 26, there are fewer states possible and E2 is smaller.
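The card-counting picture above can be made concrete with Boltzmann-style state counting. This is a sketch under stated assumptions: we take the extreme case where all 26 red cards are known to be in the top half, which cuts the count from 52! to 26! × 26!, and we compare the logarithms of the counts (Boltzmann’s S = k ln W, with the constant k dropped).

```python
import math

# A fully shuffled pack can be in any of 52! orders.
w_shuffled = math.factorial(52)

# Knowing all 26 red cards are in the top half leaves only the orderings
# of the reds among themselves times the orderings of the blacks: 26! * 26!.
w_reds_on_top = math.factorial(26) * math.factorial(26)

# The information shrinks the count, and so the Boltzmann-style E2,
# without any physical change to the cards.
print(math.log(w_shuffled), math.log(w_reds_on_top))
```

The partial case in the text, merely knowing that *more* reds are near the top, excludes fewer orderings, but the direction is the same: information alone reduces E2.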

It is widely believed that Boltzmann’s definition, E2, is simply the representation of Clausius’s in the (more prestigious) language of particles, rather than in the language of steam and engineering, thermodynamics. But E2 is not the same as, or equivalent to, the original E1. It is a static definition, a property something has, not a measure of change, like E1. In Boltzmann’s definition a thing can have more entropy or less entropy in it. To be fair, this slippage of meaning is easily done, and Clausius himself did it, but the two are undoubtedly mathematically different: speed is not acceleration, albeit that they are closely connected.

More importantly, E2 is an information definition, not a physical definition. Thermodynamic equilibrium (less pompously, everything at the same temperature) is defined as all the particles involved having the same average kinetic energy (if they are all the same size, that means the same average speed), a definition that rests on the state of physical objects. By contrast, Boltzmann defined maximum entropy, E2, as something whose particles could be in the maximum number of possible states. By this definition, entropy can change with a change in information alone. In the shuffled pack of cards mentioned above, the number of possible states that the cards could be in is reduced without any physical change, just by the information that more red cards are at the top. That information reduces the level of Boltzmann’s entropy. This was the start of a long-running muddle between the meaning of entropy and order, probability, information and messiness. As we will discuss later, none of these last four terms, ‘order’, ‘probability’, ‘information’ and ‘messiness’, has a physical meaning, any more than the term ‘justice’ does, although their use in the math of physics is essential.

Boltzmann claimed that his state of maximum entropy can carry no information. This is true by definition, not by any physical fact; if you have any information about something, there are possible states of that thing that are excluded. By Boltzmann’s definition – and only by his definition – any information, such as knowing that there are more red cards at the top of the pack, puts the pack in a state of non-maximum entropy. While Boltzmann defined his state of maximum entropy, he had no measure of any other state of E2, so it cannot be used in thermodynamics.

The entropy/messiness link (also suggested by Boltzmann) has superficial plausibility. We think that change, unguided by our intelligence, always makes things messier. We add to this that the second law states that entropy always increases (in a thermodynamic change), and it leads us to the idea that more entropy equals more mess. Initially laughed at by the scientific establishment, Boltzmann’s link between order and entropy is now taken for granted[3] and is often assumed to be a scientific finding. It is a metaphor that has confused people ever since; there is no trend to messiness. Order can arise spontaneously and often does, as seen in crystallisation or simply in the gravitational sorting and separating of heavy and light things, sand and water, for example. The robust early criticism of Boltzmann’s approach was muted by his suicide. Under these sad circumstances, the concept was accepted with insufficient thought.

Boltzmann’s work was followed by entropy E3, devised by Gibbs, and entropy E4, devised by John von Neumann who applied Gibbs-like entropy to quantum mechanics. All this work was mathematical. Gibbs founded the field of statistical mechanics which works for thermodynamic calculations because thermodynamics involves countless trillions of particles, whose individual behaviour is completely subsumed into the sum of the behaviour of the mass. Toss a coin a million times and the result will be so close to 50:50 heads and tails that the fact that 50:50 is only a probability can be ignored completely. The minuscule amount of probability or uncertainty in calculations of, say, the energy behaviour of boiling water in a saucepan, is beyond any conceivable relevance. However, in forecasting the outcome of a small number of events, e.g., which of three cups has a coin under it, probability is about the state of our knowledge, not about the world. If we are told that cup A does not have the coin under it, the probabilities of the coin being under cups B and C, change with no change in physical state at all. To understand the physical situation, rather than the state of knowledge, we must leave behind Boltzmann’s, Gibbs’, and von Neumann’s ideas of subjective entropy. (For more on the use of probability in physics, see the Heresy devoted to the subject).
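The two statistical regimes contrasted above can both be sketched numerically. The numbers are illustrative, not from the text: with many events, the spread around 50:50 for a fair coin shrinks as 1/(2√n); with few events, probability tracks knowledge, as in the three-cups case, where being told one cup is empty changes the odds with no physical change at all.

```python
import math

# 1) Many events: the relative spread around 50:50 for n fair coin tosses.
#    At a million tosses it is already negligible.
for n in (10, 1_000_000):
    print(n, 0.5 / math.sqrt(n))

# 2) Few events: a coin sits under one of three cups, equal priors.
#    Learning that cup A is empty renormalises the rest; nothing physical moved.
priors = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
posterior = {cup: p for cup, p in priors.items() if cup != "A"}
total = sum(posterior.values())
posterior = {cup: p / total for cup, p in posterior.items()}
print(posterior)  # B and C each rise to one half
```

The first calculation is why statistical mechanics works for saucepans of boiling water; the second is why probability in small-number cases is about our knowledge rather than the world.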

Boltzmann’s link between the word ‘entropy’ and messiness was accepted to such an extent that the word entropy was adopted (by Shannon in 1948) for use in a branch of mathematics called information theory (although signal theory would be a better name). Shannon’s entropy uses a lot of the math originally provided by Boltzmann to quantify the signal-to-noise ratio needed in data transmission and storage, specifically to determine the minimum number of non-random data points required to separate a signal from any given level of background noise. Shannon called this entropy, which we can refer to as E5. There is a story about this that we cannot confirm but that seems plausible. Shannon asked John von Neumann what to call the quantity and von Neumann, a playful mathematical genius, replied that Shannon should call it entropy for two reasons: ‘In the first place, a mathematical development very much like yours already exists in Boltzmann’s statistical mechanics, and in the second place, no one understands entropy very well, so in any discussion you will be in a position of advantage.’
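Shannon’s E5 has a standard textbook formula, H = -Σ p·log₂(p), measured in bits per symbol. The sketch below uses made-up probabilities simply to show the behaviour: a fair coin gives 1 bit per toss, while a biased, more predictable source gives less.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution,
    H = -sum(p * log2 p), skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))            # fair coin: 1.0 bit per toss
print(round(shannon_entropy([0.9, 0.1]), 3))  # biased coin: under half a bit
```

Despite sharing Boltzmann-style mathematics, this is a statement about signals and predictability, not about the physical energy distribution that E1 measures, which is the distinction the following paragraphs press.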

There is a strong mathematical link between the information theory definition of E5 and Boltzmann’s definition of E2 (via Gibbs’ E3 and von Neumann’s E4 statistics), although each has a different mathematical definition of statistical entropy.

This has led to conceptual confusion. Sometimes, systems not in complete thermodynamic equilibrium have been referred to as ‘having information’, rather than as having a structure that can potentially store or convey information. This has led to some mathematical approaches to fundamental physics (notably entanglement) being referred to as quantum information theory. It has also led to some rather metaphysical speculations on the fate of material that crosses the event horizon of a black hole.

Just because two worlds share the same statistics does not mean they are related. The distribution of height in a population is subject to the same mathematics as the distribution of the velocities of molecules (along any one direction) in a gas at a fixed temperature. Both are connected to the statistics of a normal distribution, but they are not otherwise related.

It is probably true that Gibbs was thinking of Clausius’ original entropy E1 when he first formulated the generally accepted statement of the second law of thermodynamics as ‘entropy always increases (in a thermodynamic change in an isolated system)’. However, even this is not certain. Under the circumstances, it seems best not to use the term entropy at all and to refer instead to movement closer to equilibrium, becoming more homogeneous, and other similar terms, especially as we are not using it mathematically. The Second Law can be restated as: ‘In any thermodynamic change, differences in energy density always diminish (in an isolated system)[4]’.

By bringing lumps together, gravity appears to work against the ‘No Special Place’ rule or, in the older language, it decreases entropy over time. The way this is handled is with the concept of gravitational potential energy, a kind of opposite to energy, especially kinetic energy. This is best shown in the formation of the first stars. After the ball-waves of protons and electrons were formed, the universe was very uncomplicated. It was still warm from the original energy and the only matter in it was in the form of a thin gas made up of hydrogen atoms and molecules, a few helium atoms and traces of slightly heavier elements. Given that the atoms and molecules are completely stable in that environment, the universe was close to complete practical thermodynamic equilibrium. However, over several hundred million years or so, a period known as the ‘Dark Age’ of the universe, small differences in the density of the gas, and hence in its gravitational effect, pulled material together into concentrated balls of matter. The gas heated up as it fell in, until the balls became so hot that the atoms at their centre started to fuse together, releasing huge quantities of nuclear energy, and stars were born[5]. As a result of gravity, the universe went from being dark and homogeneous to being made up of either hot, bright, heavy stars or cold, dark, empty space. This change appears to be the exact opposite of what you might predict from the Second Law of Thermodynamics and the rule of No Special Place. But we deal with this by saying that the free atoms of the original dark age had gravitational potential energy, available because they were both spread apart and also not quite evenly distributed. This gravitational potential energy was lost as they fell together and was converted into kinetic energy and heat.
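The bookkeeping in the paragraph above, potential energy given up as things fall together reappearing as kinetic energy and heat, is standard physics and can be sketched with a simple fall near Earth’s surface. The mass and height are hypothetical and the scale is obviously not cosmological; the point is only that the energy converted, it did not appear from nowhere.

```python
import math

g = 9.81        # m/s^2, gravitational acceleration at Earth's surface
mass = 2.0      # kg, hypothetical falling lump
height = 10.0   # m, hypothetical drop

gpe_lost = mass * g * height         # potential energy given up, in joules
speed = math.sqrt(2 * g * height)    # speed at the bottom of the fall
ke_gained = 0.5 * mass * speed ** 2  # kinetic energy gained, in joules

# The two match (up to float rounding): energy merely changed form.
print(gpe_lost, ke_gained)
```

For a collapsing gas cloud the same accounting holds, with the kinetic energy of infall then thermalised as heat, which is how the Dark Age gas became hot enough to ignite fusion.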

This gravitational potential energy, the separation of the atoms and molecules of the early universe, came about because, as the original Big Bang F-waves collapsed into the ball-waves, some of the spare energy of the F-waves kicked them outwards, away from the direction the F-waves came from (we know this happens from the Compton effect). The newly formed ball-waves that became the dark matter, electrons, protons, and neutrons of the universe were accelerated outwards, providing the expansion of the universe. The kinetic energy provided by this acceleration, separating all the particles from each other, gave the particles their gravitational potential energy. This energy was later released as local differences in the density of matter, and so the effect of gravity, made the matter particles ‘condense’ together into stars and galaxies.

The expansion of the universe continues to convert the original kinetic energy into gravitational potential energy. The mysterious increase in the rate of expansion after 5 billion years, discovered in 1999, is made more mysterious by the fact that it appears to increase the amount of energy in the universe from no known source.


[1] The First Law is that the energy content of a closed system is constant.

[2] For experts, the non-conservation of parity in beta decay and its notional change in a time-reversed scenario is not a counter-example to this statement. A long explanation will be available in time.

[3] The heretics are told that there is another, specialist use of the term ‘order’, related to commutativity and this does link to Boltzmann’s E2 entropy. The heretics have not understood this at all and here use ‘order’ in its normal sense which is connected to patterns and opposed to randomness.

[4] Although we may have to include the extra space made available by the Fourth and higher spatial dimensions that may not be directly observable.

[5] In fact only the heavy form of hydrogen, known as deuterium, starts to fuse and release energy. It is one of the startling facts that stars persist only because it needs this relatively rare form of hydrogen to work. If plain hydrogen could fuse on its own, stars would have quickly exploded.


Comments are welcome on the website

Although you will have to sign up as a Friend of the Heretics to post them and the group reserves the right to delete stuff arbitrarily. Direct contact is via the site to the Arch-Heretic, Jamie Cawley, on jamie.cawley3@gmail.com.