Data Sure is Neat! Part 1


It isn’t a secret that technology has come a long, long way in just a short time. Take data, for instance. Let’s dive into why something so incredibly small is so incredibly important (and while we’re at it, how much space today’s data would take up in other formats).

What is Data?

To explain data, let’s go back to how data was stored before we had fancy-schmancy computers and smartphones and the like: in books! The typical novel contains somewhere from 60,000 to 110,000 words, with lengthier epics containing more (as you would expect). To get a sense of exactly how much data this translates to, let’s turn our attention to the typical text message, with its maximum of 160 characters, which usually works out to one to three sentences. English-language characters in a text message are each 7 bits, with each bit represented by a 1 or a 0. 1000001 stands in for “A,” while 1000010 stands in for “B.” (Computers usually pad these out to a full 8 bits, which is where the more familiar 01000001 and 01000010 come from.)
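If you want to see those ones and zeroes for yourself, here is a minimal Python sketch. It assumes plain ASCII code points (which match the text-message alphabet for ordinary English letters), and the helper name to_bits is made up for this example.

```python
def to_bits(ch, width=7):
    """Return the binary pattern for a single character, padded to `width` bits."""
    return format(ord(ch), f"0{width}b")

for letter in "AB":
    # Print the 7-bit form next to the 8-bit (zero-padded) form.
    print(letter, to_bits(letter), to_bits(letter, 8))
```

The 8-bit column is just the 7-bit pattern with a leading zero tacked on.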

So, why does this matter? I promise, we’re getting there.

At 7 bits for each of its 160 characters, a text message can contain a total of 1,120 bits of data: 1,120 ones and zeroes. With 8 bits in every byte, this translates to 140 bytes per text message. 1,000,000 bytes make one Megabyte.
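Here is that text message math as a quick Python sketch, assuming the 160-character limit and 7 bits per character from above. The constant and function names are just illustrative.

```python
BITS_PER_CHAR = 7   # bits per character in a text message
BITS_PER_BYTE = 8
MAX_CHARS = 160     # maximum characters in one text message

def message_size(chars=MAX_CHARS):
    """Return (bits, bytes) for a message, rounding bytes up to a whole number."""
    bits = chars * BITS_PER_CHAR
    whole_bytes = -(-bits // BITS_PER_BYTE)  # ceiling division
    return bits, whole_bytes

bits, size_in_bytes = message_size()
print(f"{bits} bits = {size_in_bytes} bytes")
print(f"Messages per Megabyte: {1_000_000 // size_in_bytes}")
```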

The average word to be found in our novel is made up of about five characters, meaning it could contain anywhere between 300,000 and 550,000 characters. Taking the larger figure, multiplying by seven for the bits that make up each character, and dividing by 8 for the bits in each byte, we have 481,250 bytes, or 0.48125 Megabytes, in our book.
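The same back-of-the-envelope math fits in a few lines of Python. This is only a sketch under the assumptions already stated (five characters per word, 7 bits per character, no spaces or punctuation counted), and novel_bytes is a hypothetical helper name.

```python
def novel_bytes(words, chars_per_word=5, bits_per_char=7):
    """Estimate the size of a novel's text in bytes."""
    total_bits = words * chars_per_word * bits_per_char
    return total_bits / 8  # 8 bits in every byte

# The low and high ends of the typical novel's word count.
for words in (60_000, 110_000):
    size = novel_bytes(words)
    print(f"{words:,} words -> {size:,.0f} bytes ({size / 1_000_000:.5f} Megabytes)")
```

The 110,000-word novel lands on the 481,250 bytes worked out above, while a shorter 60,000-word one comes in at a little over a quarter of a Megabyte.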

Following this logic, a large book could contain about half a Megabyte of information… then you also have to consider metadata, the cover, and the other assorted information an ebook would contain, which means it’ll be about one Megabyte in size. Any images or illustrations add to this further.

How Much Data Can Be Found in a Library?

Okay, so because libraries often hold books that are far larger than the average novel—textbooks, reference books, encyclopedias, dictionaries, and the like—some files will be much, much larger than a Megabyte, while others could very well be much smaller. For simplicity, let’s assume that the average book in our hypothetical library equals one Megabyte.

The typical library generally holds between 5,000 and 500,000 books, although some hold millions. The United States Library of Congress, for instance, has over 51 million books, 25 million manuscripts, and millions of other items in its massive collection. Again, to keep things simple, let’s omit everything but the text in each of the 51 million books and calculate the data stored within.

51 million Megabytes equals 51 thousand Gigabytes, which then converts to 51 Terabytes. Many PCs contain 1 to 2 Terabyte drives, so the entire book collection of the Library of Congress could be contained on about 26 home computers with 2 Terabyte drives (or 51 with 1 Terabyte drives). Crazy, when you think about it.
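To check the conversion, here is a rough Python sketch. It assumes one Megabyte per book and decimal units (1,000 Megabytes per Gigabyte, 1,000 Gigabytes per Terabyte), and it rounds the drive count up, since nobody owns half a drive. The function names are invented for this example.

```python
import math

BOOKS = 51_000_000   # books in the Library of Congress, per the figure above
MB_PER_BOOK = 1      # simplifying assumption: one Megabyte per book

def library_terabytes(books, mb_per_book):
    """Convert a book count into Terabytes (1 Terabyte = 1,000,000 Megabytes)."""
    return books * mb_per_book / 1_000_000

def drives_needed(terabytes, drive_size_tb):
    """Number of whole drives needed to hold the collection."""
    return math.ceil(terabytes / drive_size_tb)

total_tb = library_terabytes(BOOKS, MB_PER_BOOK)
print(f"Total: {total_tb:.0f} Terabytes")
for drive_size_tb in (1, 2):
    print(f"{drive_size_tb} Terabyte drives needed: {drives_needed(total_tb, drive_size_tb)}")
```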

Of course, we don’t currently have mobile devices with this kind of capacity, but who knows what the future will hold.

It’s also important to acknowledge that we only calculated based on the text alone. If each book were scanned in as images, you could expect the total per book to be closer to 8 Megabytes, so 51 million books would need 408 Terabytes to hold the Library of Congress. That would take far more than a room of workstations to contain.
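For the scanned version, a similar sketch works, assuming the 8 Megabytes per book estimate and 2 Terabyte drives. As before, the names are hypothetical.

```python
import math

def collection_terabytes(books, mb_per_book):
    """Total size in Terabytes (1 Terabyte = 1,000,000 Megabytes)."""
    return books * mb_per_book / 1_000_000

scanned_tb = collection_terabytes(51_000_000, 8)  # 8 Megabytes per scanned book
drives = math.ceil(scanned_tb / 2)                # assuming 2 Terabyte drives

print(f"Scanned collection: {scanned_tb:.0f} Terabytes")
print(f"2 Terabyte drives needed: {drives}")
```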

Stay Tuned for the Data Your Organization Handles

Next time around, we’ll discuss how much data the average human being generates, in addition to what is stored in your business each day. Make sure you check back so you don’t miss it!