Who stole my disk space?!

I’ve seen this question asked a fair bit around the internet. Whilst there are many explanations of why this happens, I thought I’d offer my own flavour in the hope that it will help a few more people understand the issue.

The scenario is this: you’ve just bought some external or internal storage for your computer, whether it be a hard drive, flash storage or an SSD. You check the hard drive on your computer and shock horror – it appears there’s less space on your drive than was quoted on the box! I recall many years ago thinking that this was because of some invisible partition using up some space on all drives to help with ‘administration’ of that drive. Some sort of native system files, if you will. This theory probably made sense to me at the time for smaller drives, but take something larger and you’ll realise that the amount of space you’re seemingly ‘losing’ is rather ridiculous if it was for a few system files.

Let’s say you’ve just bought a 500 GB external hard drive. You plug it in, and you see this (the bottom drive):

Total size 465 GB! Cripes! Where have the other 35 GB gone? (There isn’t any horrible built-in software in this drive either).

Binary vs Decimal

The answer is really just a case of confusing and conflicting measurements. If you don’t know already, electronics and computing depend on the binary number system. This is a fancy way of saying there are only 2 possible digits (0 and 1) in each position, and each digit to the left in a binary number represents the next power of 2 (I went into number bases a bit more in an older article).

Our decimal number system has standard prefixes that represent powers of 10 for each increment of 3 i.e. for each thousand. In other words, we have ‘kilo’ to mean 1,000s (or 10³), ‘mega’ to mean 1,000,000s (10⁶), ‘giga’ to mean 10⁹s and so on. However, since binary works in powers of 2, the closest we can get to 1000 is 1024, which is 2¹⁰. The next prefix in binary would represent increments of 2²⁰, or 1,048,576. Finally, the analogue to ‘giga’ would be 2³⁰, or 1,073,741,824.
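To make the gap between the two systems concrete, here is a small Python sketch that prints each decimal prefix next to its binary counterpart. The function name `prefix_gap` and the list of prefixes are my own choices for illustration, not anything standard.

```python
# Decimal (SI) prefixes versus their power-of-2 counterparts.
PREFIXES = ["kilo", "mega", "giga", "tera"]

def prefix_gap(step):
    """Return (decimal value, binary value, percentage gap) for the step-th prefix.

    step=1 is kilo (10^3 vs 2^10), step=2 is mega (10^6 vs 2^20), and so on.
    """
    decimal = 10 ** (3 * step)
    binary = 2 ** (10 * step)
    gap_percent = (binary - decimal) / decimal * 100
    return decimal, binary, gap_percent

for step, name in enumerate(PREFIXES, start=1):
    decimal, binary, gap = prefix_gap(step)
    print(f"{name:>4}: 10^{3 * step} = {decimal:,}  vs  2^{10 * step} = {binary:,}  ({gap:.1f}% larger)")
```

The gap is only about 2.4% at ‘kilo’, but it compounds with every step and reaches roughly 7.4% at ‘giga’, which is why the difference is far more noticeable on larger drives.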

The problem lies with the distinction between these two measurements. Technically, the decimal system is implied with the terms kilo, mega, giga, tera etc. Hence, 1 kB is 1000 bytes, 1 MB is 1,000,000 bytes and so on. When you buy some storage, this is how the capacity is typically measured. So our 500 GB hard drive can store 500,000,000,000 bytes.

However, a computer can only work out numbers in binary, because it relies upon on/off electrical signals to perform calculations. So a computer will use the 2¹⁰-based prefixes I described earlier. Because 2¹⁰ ≠ 10³, we can’t really use the same prefix names. Hence, the binary prefixes are named kibi, mebi, gibi and so on. So a computer sees things as KiB, MiB and GiB rather than kB, MB and GB.

For what I can only presume was a legacy decision to simplify the notation, the distinction between the two measures was omitted and operating systems decided to just label MiB as MB etc. Unfortunately, this makes it a little confusing for people who don’t know this, and it is what causes the discrepancy between disk capacities.

Back to our 500 GB drive

As I said, our 500 GB drive actually holds 500,000,000,000 bytes. If a computer is using the binary prefix system, it breaks this up into chunks of 1024. We divide 500,000,000,000 by 2³⁰, giving us 465.66 (to 2 decimal places). This explains why our Windows XP screenshot earlier shows 465 GB as the capacity of the drive. What it really means is that the capacity is 465 GiB.
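The same arithmetic as a short Python sketch, in case you want to try it with your own drive. The helper name `advertised_gb_to_gib` is made up for this example, and it assumes the figure on the box is in decimal gigabytes.

```python
GIB = 2 ** 30  # bytes in one gibibyte (1,073,741,824)

def advertised_gb_to_gib(gigabytes):
    """Convert a capacity quoted in decimal GB into binary GiB."""
    total_bytes = gigabytes * 10 ** 9  # 1 GB = 1,000,000,000 bytes
    return total_bytes / GIB

print(f"{advertised_gb_to_gib(500):.2f} GiB")  # prints "465.66 GiB"
```

An operating system that labels GiB as GB and drops the fractional part would show this drive as 465 GB.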

More modern operating systems have chosen to convert to displaying in decimal units exclusively to avoid this confusion. Here is the same drive in OS X:

As you can see, the capacity is shown as 499.76 GB, which means it is using decimal notation (I assume the reason it is not exactly 500 is because of rounding errors in the conversion process from binary to decimal, but I’m not certain of this).

I believe the latest Linux distributions have also transitioned to decimal notation, but I’m not entirely sure whether Windows has done so. If not, I can only imagine it is a matter of time before they follow suit. Whilst binary notation is useful for programmers and computer scientists, it is largely a source of bemusement to the average end-user of a computer in today’s era.
