How The Internet Works

Let’s recall Agnes in the elevator that shook as if seized by Saint Vitus’ dance. Even though she was a cybernetics expert, she didn’t have any idea what was going on in the head of that machine which was as strange and impenetrable to her as the mechanism of the various objects with which she daily came into contact, from the small computer next to her phone to the dishwasher. In contrast, Goethe lived during that brief span of history when the level of technology already gave life a certain measure of comfort but when an educated person could still understand all the devices he used. Goethe knew how and with what materials his house had been constructed, he knew why his oil lamp gave off light, he knew the principle of the telescope with which he and Bettina looked at Jupiter; and while he himself could not perform surgery, he was present at several operations, and when he was sick he could converse with the doctor in the vocabulary of an expert. The world of technical objects was completely open and intelligible to him. This was Goethe’s great moment at the centre of European history, a moment that brings on a pang of nostalgic regret in the heart of someone trapped in a jerking, dancing elevator.

– Milan Kundera, Immortality

How much does the internet weigh?

Well, according to Moss, Richard Ayoade’s character in The IT Crowd, ‘The internet doesn’t weigh anything. It goes on top of Big Ben, that’s where you get the best reception.’

Actually, Moss is nearly right. All of the data on the internet weighs about the same as a single small grain of fine sand. That’s everything — YouTube videos, holiday snaps, emails, tax returns, ebooks, music, pornography and so on. It’s expressed in code made up of zeroes and ones, or ‘bits’.

Those bits are stored on computer hard drives. Old hard drives used spinning discs to store data, similar to records (and CDs/DVDs). Modern hard drives use billions of tiny electronic switches to store the ones and zeroes. When a switch is in use, storing a bit, it weighs a tiny bit more than an unused switch. The combined difference in weight of all of the information on the internet adds up to no more than a grain of sand. Or, to put it differently: a full ereader with hundreds of books weighs roughly an attogram (0.000000000000000001g) more than an empty one.

Obviously the data itself is pretty much worthless if it can’t be accessed. This data is made available to you by a special type of computer: the server. The term (probably) originated from the hospitality industry. These servers are owned by companies and organisations like Google, Apple, Facebook, Spotify, Netflix, and the BBC. Recent estimates put the number of servers at 50-100 million, weighing an average of 25kg each. Their combined weight is in the same ballpark as that of one to two million cars.

That’s what stores the data – but then you have all the user devices, the bit most people actually hold in their hands and use. It is very hard to come up with a good estimate of the weight of all these devices — laptops, mobile phones, televisions, smart fridges, thermostats, connected vacuum cleaners, smart watches, Boris bikes, children’s toys, baby monitors and even cars are connected to the internet nowadays. If you include all these connected consumer devices, the weight of the internet becomes absolutely enormous.

Whew. What does that all cost, then?

To really understand what the internet costs we need to break it down into smaller components.

Let’s start with the core infrastructure of the internet — comparable to a large network of roads. People use these roads to transport all sorts of things. On tarmac, Royal Mail carries letters and newspapers, DHL delivers parcels, parents help their children cycle to school, and large trucks deliver stock to supermarkets.

The tarmac is very similar to the cables that transport internet data. They act as conduits for anything ranging from emails, articles on the Daily Mail website, family photos, and YouTube videos, to the latest software update for your MacBook. At every large intersection of cables stand routers that make sure each packet of data takes the right turn in order to get to its destination quickly and safely. If there’s too much traffic on a particular stretch of the route, data packets will be redirected — much like your satnav telling you when to avoid the M25.
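That redirection can be sketched in a few lines of Python. This is only a toy, not a real routing protocol (real routers run protocols such as OSPF and BGP), and the little network map with its ‘congestion’ costs is invented for illustration: a shortest-path search picks a route, and when one link gets expensive the traffic takes the detour.

```python
import heapq

def shortest_path(links, start, goal):
    """Toy shortest-path search, in the spirit of what routers do."""
    queue = [(0, start, [start])]   # (total cost, node, route so far)
    seen = set()
    while queue:
        cost, node, route = heapq.heappop(queue)
        if node == goal:
            return cost, route
        if node in seen:
            continue
        seen.add(node)
        for neighbour, weight in links.get(node, {}).items():
            if neighbour not in seen:
                heapq.heappush(queue, (cost + weight, neighbour, route + [neighbour]))
    return None

# A tiny, made-up network: the numbers are arbitrary 'congestion' costs.
links = {
    'home':       {'exchange': 1},
    'exchange':   {'london-hub': 2, 'backup-hub': 5},
    'london-hub': {'server': 2},
    'backup-hub': {'server': 3},
}
print(shortest_path(links, 'home', 'server'))   # cheapest route goes via london-hub

# Traffic jam at the London hub: its link cost shoots up...
links['exchange']['london-hub'] = 50
print(shortest_path(links, 'home', 'server'))   # ...and the packets reroute via backup-hub
```

The same idea, scaled up to hundreds of thousands of routers constantly exchanging cost information, is what keeps your data moving around blockages.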

There are no reliable estimates of the total length of all cables that power the internet, but what we do know is that the total length of all undersea cables exceeded 600,000 miles a couple of years ago. Special cable-laying ships spool the cables out behind them as they go. The thickness of the cable depends on the depth at which it lies: shallow cables are about the size of a can of Coke, deeper cables have a thinner protective layer and are therefore much smaller. New cables are being added all the time to support the growing internet traffic; for example, Google announced earlier this year that they’ve invested in a new cable between Japan and Australia.

There’s one other key part of the infrastructure of the internet: a large distributed address book called the ‘domain name system’, or simply ‘DNS’. It’s basically the digital version of the Yellow Pages, on steroids. Your computer will contact these Yellow Pages whenever you send an email or want to visit a website in your browser. The domain name system will tell your computer where it can request information, or where it needs to send the data to.

So what are the running costs of all this? Unfortunately that’s not easy to answer, because a lot of the infrastructure (cables, routers, DNS) is owned by private companies: the internet is a global system of interconnected networks. A very significant part of it is the responsibility of a large number of different organisations — ranging from enterprises like Google to governments and publicly funded bodies like universities.

Although we can’t reliably estimate the total running costs, analysts have calculated what we collectively pay for our internet connections. It’s about $450-$550bn — comparable to the GDP of Belgium (but only half the value of Apple).

But that’s just the internet’s core infrastructure — the tarmac, the satnavs, the address book. What about the services that are offered on top of that infrastructure — the machinery that powers Netflix, Amazon, and BBC iPlayer? It’s estimated that in 2018 alone, the revenue of all server vendors combined added up to about $75bn. That’s just from selling the bare steel and silicon – before you’ve even begun to think about the salaries of the hundreds of thousands of people who are responsible for keeping all that up and running.

And all of that hardware has to be powered by electricity. Back in 2016, scientists at the Lawrence Berkeley National Laboratory estimated that the servers in the US alone use approximately 70bn kWh per year. Although that’s 1.8% of the overall electricity consumption in the US, it’s ‘only’ about $3.5bn.

All of that totted up on the back of an envelope? $600bn.

Where is the internet?

Everywhere apart from the London Underground.

Actually, it really is everywhere. We used to rely on cables to facilitate communication. The first transatlantic cable was laid between Ireland and Newfoundland (then still a British colony). This telegraph cable reduced the communication time between North America and the UK from days to minutes. By 1896, Kipling had already intuited the global village in his poem ‘The Deep Sea Cables’: ‘Hush! Men talk to-day o’er the waste of ultimate slime, / And a new Word runs between: whispering, “Let us be one!”’. Now, 122 years later, we can even use the internet on transatlantic flights, using a swarm of satellites that circle the earth. In fact, the internet connection at the International Space Station is about 20 times faster than the average UK household connection.

That said, satellites are rarely used for normal internet communication: they’re too slow and expensive. For example: if we were to have a Skype conversation via a satellite connection between Oxford and Utrecht, the bandwidth of that connection (the number of bytes it can transfer per second) would easily be good enough to send both video and voice data. However, every byte sent from Utrecht would take about 600 milliseconds (that’s more than half a second) to arrive in Oxford. So while in theory we’d have perfect sound and picture quality, our conversation would suffer from a permanent delay of 600 milliseconds. This is caused simply by the time it takes to transmit things from the surface of the earth, to a satellite (sometimes more than one), and back down to earth again. Such satellites are typically positioned in a geosynchronous orbit, at about 35,000km above the surface of the earth.
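That delay is easy to check on the back of an envelope. Assuming a geostationary altitude of roughly 35,786km and radio waves travelling at the speed of light:

```python
# Back-of-the-envelope satellite latency, assuming a geostationary
# satellite at ~35,786 km and radio signals travelling at light speed.
ALTITUDE_KM = 35_786
LIGHT_SPEED_KM_S = 299_792

# One hop: up from the ground to the satellite, and back down again.
one_hop_s = 2 * ALTITUDE_KM / LIGHT_SPEED_KM_S
print(f'one hop: {one_hop_s * 1000:.0f} ms')       # roughly 240 ms

# Route the byte via a second satellite (or count the processing and
# ground-network time at each end) and the ~600 ms figure above
# stops looking surprising.
two_hops_s = 2 * one_hop_s
print(f'two hops: {two_hops_s * 1000:.0f} ms')
```

Physics, not bandwidth, is the bottleneck: no amount of extra capacity makes the signal travel the 70,000-odd kilometres any faster.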

WiFi is a very local signal: it’s local to a building, or a group of buildings. The signal is transmitted from your laptop to a small WiFi access point that is never more than a few rooms away. (That access point is the end of a smaller cable which ultimately joins up to all the bigger ones under the sea.) Mobile phone signal carries further: the distance between your mobile phone and the masts installed by mobile phone providers (EE, T Mobile, …) can easily be 10 miles.

As for those 50-100 million servers? They are typically stored in warehouses (also called ‘farms’), and strategically positioned all over the world. Their locations are sometimes kept secret by their owners for security reasons. If disaster (for example fire, or an earthquake) strikes in one location, the data and services provided by those servers can be picked up by servers in another location. To make this possible, every piece of data is typically mirrored in multiple locations.
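The mirroring idea can be sketched in miniature. In this toy model (the warehouse names are invented, and real systems use far cleverer replication schemes), every file is written to several locations, so losing one entire warehouse loses nothing:

```python
# A toy sketch of mirroring: each file is copied to several
# (hypothetical) warehouses, so one disaster destroys no data.
warehouses = {'sweden': {}, 'singapore': {}, 'oregon': {}}

def store(filename, data, copies=2):
    """Write the file to `copies` different warehouses."""
    for site in list(warehouses)[:copies]:
        warehouses[site][filename] = data

def fetch(filename):
    """Read the file from the first warehouse that still has it."""
    for site, disk in warehouses.items():
        if filename in disk:
            return disk[filename]
    raise FileNotFoundError(filename)

store('holiday.jpg', b'...pixels...')
warehouses['sweden'].clear()      # disaster strikes the Swedish farm
print(fetch('holiday.jpg'))       # still served from Singapore
```

Scale the dictionaries up to millions of hard drives and you have the basic shape of how the big providers survive fires and earthquakes.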

Some of these farms are located in cold areas to help cool servers down (and save on air conditioning), others are positioned in areas where electricity is cheap, or close to potential users. For example, Facebook has a warehouse in the north of Sweden, on the edge of the Arctic Circle, and Google has a warehouse in Singapore.

So what actually happens when we press ‘send’?

Email works very much like old-school post. You write a message, press ‘send’, and your computer hands the message over to your ‘local’ postal sorting office. Let’s assume you’re sending a message to craig.raine@new.ox.ac.uk using your Gmail account; Gmail is your sorting office.

Once the email has been processed by Gmail, it contacts the internet’s ‘Yellow Pages’ — the Domain Name System (DNS) — to figure out where to find Craig’s mailbox. DNS is decentralised: there’s not a single large phone book, but multiple levels of phone books.

Craig’s email address contains some hints about these levels of phone books – rather like a postcode. His address ends in uk. The UK’s DNS ‘Yellow Pages’ will contain an entry for ac.uk: it will direct Gmail to the phone book containing the details of addresses ending in ac.uk. Gmail consults the DNS ‘Yellow Pages’ for ac.uk, which contains a reference to a phone book for addresses ending in ox.ac.uk. And so on.

This quest ends with Gmail obtaining the globally unique address of the mail sorting centre for new.ox.ac.uk: the IP address of the mail server that is responsible for new.ox.ac.uk. (IP here means ‘internet protocol’, not ‘intellectual property’.) Gmail can then use that IP address to connect to the server, and deliver your message.
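The chain of phone books can be modelled in a few lines of Python. Everything here is invented for illustration (the tables, and the final IP address, which is from a documentation-only range); real DNS involves network queries to root, TLD, and authoritative name servers:

```python
# A toy model of the DNS chain of 'phone books'. Each book only
# knows how to reach the next one down; the IP address is invented.
phone_books = {
    'uk':       {'ac.uk': 'ask the ac.uk book'},
    'ac.uk':    {'ox.ac.uk': 'ask the ox.ac.uk book'},
    'ox.ac.uk': {'new.ox.ac.uk': '203.0.113.25'},  # mail server's IP
}

def resolve(domain):
    """Walk the hierarchy right-to-left: uk -> ac.uk -> ox.ac.uk -> ..."""
    labels = domain.split('.')
    zone = labels[-1]                        # start at the 'uk' book
    answer = None
    for i in range(len(labels) - 2, -1, -1):
        child = '.'.join(labels[i:])
        answer = phone_books[zone][child]    # each book names the next stop
        zone = child
    return answer

print(resolve('new.ox.ac.uk'))   # '203.0.113.25'
```

Each level hands you a pointer to the next, and the last book hands you the IP address itself, which is exactly the shape of the quest Gmail performs.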

The mail server responsible for new.ox.ac.uk might not necessarily be in New College itself, though, or even in one single place.

As we’ve seen, warehouses full of servers are positioned all over the world, and every piece of data is typically mirrored in multiple locations.

When computers talk to each other, they speak one of many different languages, or protocols — one for every type of application. To deliver email, computers speak SMTP: Simple Mail Transfer Protocol. It’s a sequence of mangled English words and numeric status codes. Think of it as an extremely absurdist one-act play by Ionesco. Here it is, with modern English translation:

new.ox.ac.uk server: 220 mail.new.ox.ac.uk

Hi! I’m the email server for New College, Oxford, UK.

Gmail: EHLO gmail.com

Pleased to meet you. I’m gmail.com!

New: 250 OK

OK… And?

Gmail: MAIL FROM:&lt;geeks@gmail.com&gt;

I’ve got mail from geeks@gmail.com! You interested?

New: 250 OK

OK… And?

Gmail: RCPT TO:&lt;craig.raine@new.ox.ac.uk&gt;

It’s for Craig — you know him?

New: 250 OK

OK… And?

Gmail: DATA

I’m talking to a broken record. Ready for the email text?

New: 354 End data with <CR><LF>.<CR><LF>

Sure thing. Just end with a single dot on an empty line.

Gmail: Hi Craig. Here is our piece for the next Areté. Best wishes, Steven and Bas.

.

New: 250 OK: queued as DECEBA0060

Got it. Your reference is DECEBA0060

Gmail: QUIT

Thanks. Over and out.
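Those numbers aren’t arbitrary, by the way. In SMTP the first digit of each reply tells you how the conversation is going: 2 means success, 3 means ‘fine, now send me more’, 4 a temporary failure, and 5 a permanent one. A short Python sketch that performs the same translation as our one-act play:

```python
# The first digit of an SMTP reply code tells you how things stand.
MEANINGS = {
    '2': 'success, carry on',
    '3': 'fine, now send me the rest',
    '4': 'temporary failure, try again later',
    '5': 'permanent failure, give up',
}

def interpret(reply):
    """Split a raw SMTP reply like '250 OK' into code, meaning and text."""
    code, _, text = reply.partition(' ')
    return code, MEANINGS[code[0]], text

for line in ['220 mail.new.ox.ac.uk', '250 OK',
             '354 End data with <CR><LF>.<CR><LF>',
             '550 No such user here']:
    print(interpret(line))
```

The last line shows the one reply our play never needed: a 5xx rejection, the server’s way of saying there is no Craig here.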

Who invented the internet?

Many Brits believe that Sir Tim Berners-Lee invented the internet. He did not.

Sir Tim did give us the World Wide Web – an incredibly important invention. You may think the internet and Sir Tim’s Web are the same, but they’re not, and that’s an important detail to understand.

The Web is a collection of pages – rather like a library – that we access using a web browser. These pages are made available on the global system of interconnected networks – the internet. In other words: the Web is an application which uses the internet, just like email, WhatsApp, and Tinder.

One of the first people to come up with the concept of the internet was Leonard Kleinrock, who wrote a paper about it in the early 1960s. Inspired by the idea, Elmer Shapiro published a report in 1968 describing a technique for sending messages from one computer to another. On October 29, 1969 the first ‘internet message’ was sent from Kleinrock’s laboratory at UCLA to the Stanford Research Institute (SRI). At the time, the ‘internet’ consisted of only four machines, one in each of the following locations: UCLA, SRI, the University of California Santa Barbara, and the University of Utah. This network was called ARPANET, for Advanced Research Projects Agency NETwork. It was funded by the American Department of Defense.

From then, things moved very quickly. The first email is sent in 1971 by Ray Tomlinson; in the mid-1970s Vinton Cerf and Robert Kahn design TCP, one of the internet’s most fundamental and widely used protocols; and in 1974 the first commercial version of ARPANET is introduced. In the years after that, more of the internet’s foundations are put in place, including the internet protocol (IP), from which IP addresses get their name.

It’s now the early 1990s; enter Sir Tim. He’s at CERN in Switzerland, where he develops the standard format for web pages: HTML, the Hyper Text Markup Language. He also designs the first web browser to view his HTML web pages. Those two inventions are commonly regarded as the birth of the world wide web.

English must be the only language in which ‘world wide web’ contains fewer syllables (three) than its own acronym, ‘www’ (nine).

In the hit musical Avenue Q, there’s a song about the origin of the internet. ‘The Internet is really really great… FOR PORN. It’s like I’m surfing at the speed of light… FOR PORN. The internet is for porn! The internet is for porn! Why do you think the net was born? Porn! Porn! Porn!’

Is there any truth in these rumours?

According to Avenue Q’s porn-obsessed, foul-mouthed Trekkie Monster, the internet is for porn. And many conservative and religious news sources agree with him: an oft-cited statistic is that 30-50% of the internet is porn. That’s also the figure used to market products like ‘Net Nanny’. Another popular claim: porn sites attract more visitors than Amazon, Netflix, and Twitter combined.

Fortunately (or unfortunately?), those stats are far from accurate. Researchers without a political, moral, or commercial agenda have found that of the one million most visited websites, ‘only’ 42,371 (or 4%) are sex related. The same researchers also tracked web searches: 13% involved porn. That matches the estimates from officials at some of the world’s largest search engines: 10-15%.

So, as with many things, depending on how and what you measure, the outcome is very different: 4-15% of the internet is for porn. Far from half, or even a third.

Having said that, it is highly likely that porn used to play a bigger role on the net. The adult industry has an impressive track record of being a front-runner when it comes to adapting to new technology, and influencing its development. For example, it (allegedly) gave a huge boost to 8mm film projectors and cameras due to the widespread demand for ‘glamour home movies’ in the 1950s and 60s. Not long after that, pornography played a role in the battle between VHS and Sony’s Betamax standard. Sony supposedly refused to allow smutty content on their tapes; you can probably guess who won that battle.

Talking about sex and the internet – why does everybody keep sending me emails about penis enlargement?

45% of all email messages are spam of some sort. Roughly half of those are related to enlarging body parts or centred around some other adult theme (not counting low-cost mortgages). Researchers from UC Berkeley have calculated that on average, spammers need to send 12.5 million messages to trigger a single purchase. That sounds like a very ineffective venture, until you realise that sending these emails is almost completely free, and some of the larger spam organisations can easily send about a billion messages per day. A billion messages a day leads to about 80 purchases of roughly $100 each. In other words: these spammers are making $5.50 per minute!
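The arithmetic behind that figure checks out on the back of an envelope:

```python
# Checking the spammers' back-of-the-envelope economics.
messages_per_day = 1_000_000_000      # a large spam operation's daily output
messages_per_sale = 12_500_000        # UC Berkeley's messages-per-purchase figure
revenue_per_sale = 100                # dollars

sales_per_day = messages_per_day / messages_per_sale
dollars_per_minute = sales_per_day * revenue_per_sale / (24 * 60)

print(sales_per_day)                   # 80.0 purchases a day
print(round(dollars_per_minute, 2))    # 5.56 dollars a minute
```

Eighty sales a day at $100 each is $8,000, and $8,000 spread over 1,440 minutes is a shade over $5.50 a minute, around the clock, for sending email that costs almost nothing.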

We tried to find out why penis enlargement emails specifically are so popular, with very little result — apart from a thoroughly polluted internet search history. A best guess: the messages are aimed at particularly insecure and vulnerable people. Can any of Areté’s readers enlighten us? Answers on a, er, postcard.

Who polices the internet?

There is no single internet police force, and there are no laws that govern the internet as a whole. Many countries have tried to control the internet in one way or another. The infamous Great Firewall of China is a series of laws and technologies that are designed to ‘protect’ the people. Similar technologies are in use in countries like Cuba, North Korea, and Burma.

Different countries also assign vastly different powers to their own domestic law enforcement. For example, Reporters Sans Frontières — an organisation that advocates freedom of press — has recently put both the US and the UK on their list of ‘enemies of the internet’ for passing laws that permit mass surveillance by agencies like GCHQ. Should a government be permitted to eavesdrop on its citizens like that? Some argue that these powers are necessary to keep people safe, others cry 1984.

This debate came to a head during the 2011 London riots, when David Cameron and his home secretary (one Mrs May) called for Facebook and Twitter to be shut down to protect public order. People were quick to compare the suggestion to the actions of governments in the Middle East during the Arab Spring.

It’s not just government spooks listening in on our conversations, though — large-scale data analysis is also performed by firms like Google, Facebook, and Cambridge Analytica. The majority of Areté readers carry a microphone with them almost everywhere they go — in their phone. We do it completely voluntarily and are in many ways addicted to our phones. Does that sound Orwellian, or are we just entering a brave new world? Consider this: would you ever choose to stay in a hotel room that was bugged with microphones? Of course not. Yet Marriott hotels have started installing an Amazon Alexa personal assistant in their rooms in the US.

Orwell and Huxley aside — even with supreme surveillance powers, governing the internet domestically is not an easy task. What is illegal in one country is perfectly legal in another. For example: some American websites will happily sell you drugs that are completely illegal in the UK. And in 2014, the government moved to ban particular sexual acts from pornography produced in the UK — including female ejaculation, spanking, bondage, and face sitting. But the internet knows no borders, so a domestic ban really has very little effect.

What is the dark web?

The dark web is like a large network of connected speakeasy bars: if you know your way around, a whole new world opens up to you. It was designed to provide users with absolute anonymity. The best-known example of a (former) service on the dark web is the Silk Road marketplace — an eBay for illegal goods and services. It was eventually taken down by the FBI; its operator (nicknamed Dread Pirate Roberts) was arrested.

Some people refer to the dark web by its technical name: Tor (The Onion Router). Surprisingly enough, it was developed in the 1990s by the US Naval Research Laboratory to protect online US intelligence communications. Soon, journalists and human rights activists started using it, especially in countries where freedom of speech is suppressed.

The Onion Router works by concealing the location and IP address of its users by adding multiple layers of indirection and encryption to all internet traffic. For example, a Tor user located in London may be accessing a website which is hosted in Amsterdam, but all the traffic between the user and the website deliberately takes a large number of detours to hide the Londoner’s real location and identity. A mazy route to throw unwanted attention off the scent.
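The layering can be illustrated with a toy. To be clear about the assumptions: the XOR operation below is emphatically not real encryption (Tor uses proper cryptography), and the relays and keys are invented. What the sketch does show is the onion structure: the sender wraps the message once per relay, and each relay peels off exactly one layer, learning nothing about the rest:

```python
# A toy onion. XOR is NOT real encryption; it stands in for the
# genuine cipher each Tor relay shares with the sender.
def xor(data, key):
    return bytes(b ^ key for b in data)

relay_keys = {'relay-1': 17, 'relay-2': 94, 'relay-3': 203}  # made up

message = b'hello from London'

# The sender adds one layer per relay, innermost layer last.
wrapped = message
for key in reversed(list(relay_keys.values())):
    wrapped = xor(wrapped, key)

# Each relay in turn removes only its own layer...
for name, key in relay_keys.items():
    wrapped = xor(wrapped, key)
    print(f'{name} peeled its layer')

# ...and the plain message pops out at the far end.
print(wrapped)
```

No single relay ever sees both who sent the message and what it says, which is the whole point of the onion.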

These days, the dark web is no longer known as a platform for freedom of speech, but mostly as a collection of marketplaces that sell merchandise that is highly illegal in most countries — ranging from all sorts of drugs to child pornography. Most people who get caught forget that although their online identity is indeed protected, the address to which they have their purchase delivered is easier to find: one day it’s the FBI or police on their doorstep, rather than a bag of Black Bart.

What is the Cloud?

These days, everything happens in this magical ‘Cloud’. Google and Apple store your files in The Cloud, the BBC offer iPlayer from the Cloud, and universities use ‘Cloud computing’ for their research.

‘The Cloud’ is a term people use for data and services that are stored on someone else’s servers. Apple iCloud stores your files – just as we used to store our files on floppy discs or CD-ROMs. The only difference is that in the Cloud, everything is bigger. Much bigger. It’s powered by many extremely large data warehouses, all connected to the internet. Every file is stored on multiple hard drives in multiple locations, to make sure the data is always available even if there’s a power outage or disaster strikes. 

So how many hard drives does Apple have to operate the iCloud? Funnily enough, far fewer than you might think: Apple are paying Google (and Amazon) to run a large chunk of it for them.

What is the Internet of Things?

When people think about the internet, they mostly think about using Gmail and Wikipedia on their laptop or smartphone. But increasingly often, other types of devices are connected to the internet too: smart thermostats, cars, fridges, industrial applications (including things like power reactors), bikes, smart locks, toys, televisions, baby monitors, smart watches, fitness trackers, speakers, radios. There are even smart rubbish bins that automatically notify the city council when they’re full!

Most people don’t consider these things to be computers — they’re fridges and televisions. Hence the Internet of Things.

In order to send and receive data, every device that’s connected to the internet needs to have an IP address. The internet was designed to have about 4 billion such addresses. While it is possible for multiple devices to share the same address, the world is running out of IP addresses — like many UK cities ran out of phone numbers in the 1990s and 2000s.

But surely 4 billion unique IP addresses should be enough for everyone and their uncle, especially if we share addresses efficiently? One of the problems is that these addresses have been (very inefficiently) assigned in blocks to organisations, countries, and subcontinents. Early adopters of the internet were given whole blocks of IP addresses, each consisting of millions of addresses — at the time, there were plenty anyway! For example: American behemoth Hewlett-Packard (HP) was given a large block of 16 million addresses in the late 1980s. It subsequently acquired Compaq, which had been assigned a block of the same size. HP’s 32 million IP addresses exceed all the IP addresses that have been made available to South Africa: just 20 million.

In the early 2000s, the first renumbering efforts started to switch to a new address system, IPv6, with over 340 undecillion addresses — a 39-digit number. That’s enough to give every grain of sand on earth its own address, billions of billions of times over. The renumbering is progressing slowly — very slowly. The main problem: old computer systems don’t support it. For example, Windows Vista (2007) was the first version of Windows to fully support the new address system. Imagine how many devices there are still connected to the internet that are older than that.
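Python’s standard ipaddress module can count both address spaces for us. The old system squeezes every address into 32 bits; the new one uses 128:

```python
import ipaddress

# The network '0.0.0.0/0' is the entire old (IPv4) address space;
# '::/0' is the entire new (IPv6) one.
old = ipaddress.ip_network('0.0.0.0/0').num_addresses
new = ipaddress.ip_network('::/0').num_addresses

print(f'{old:,}')    # 4,294,967,296: about 4 billion
print(f'{new:.3e}')  # 3.403e+38: a 39-digit number
print(new // old)    # every old address could become 2**96 new ones
```

Doubling the number of bits doesn’t double the number of addresses; each extra bit doubles it again, which is why 128 bits is so astronomically roomier than 32.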

Who’s paying for all this?

There are multiple ways to look at this. Individual users will be paying an Internet Service Provider (ISP) like BT for the privilege of having their home connected to the internet. This pays for the connection between their property and BT’s network. From there, BT is your gateway to the rest of the internet.

The costs of the thick cables that connect BT to the wider internet are often shared between the Internet Service Provider and the large companies who need the infrastructure to provide customers with their services. The internet cables on the floor of the ocean, for example, are partly paid for by companies like Google, Facebook, and Netflix. All of that is what we previously referred to as the ‘tarmac’ of the internet.

Companies like Google have large numbers of servers to allow them to offer their services to their customers — and these servers are paid for by Google themselves. But how does Google make their money if most of their services are free? Here’s a rule of thumb: if a product or service is free, then you yourself are the product.

Google sells their product — you — to their advertisers. By doing so, they’re making about $75 billion a year. Google’s approach is to let companies buy advertisements for select keywords: when a user searches for a particular keyword, or something related to it, the relevant advertisement shows up. Google charges the advertiser extra when a user actually clicks on the ad. Over the years, some of the most expensive keywords have been: insurance ($55 per click), mortgage ($47.12), and lawyer ($54.86).

Facebook is free too. Guess what? The users are the product: Facebook makes a lot of money by selling (aggregated) user data and advertisements. Using that data, companies like Cambridge Analytica execute highly targeted political campaigns.

There’s no such thing as a free lunch, even on the internet.

Who owns our emails, and what happens to them when we die?

You own the copyright to your own emails — regardless of where these are stored. It doesn’t matter whether you use Gmail, Yahoo, or an @btinternet.com address. Every now and then a hoax surfaces to warn us that Google has claimed ownership of our correspondence. They have not. The same applies to content that you post on Facebook, Instagram, and Twitter: the copyright remains with you, the author.

(Can you feel the caveat coming?)

BUT. What all of the terms, conditions, and other small print do say is that by using these services, you give the service provider (Google, Facebook, …) a license to use that data. That’s not surprising — without a license, Twitter wouldn’t be able to redistribute your tweets to a larger audience.

What exactly happens to all this data when the author dies depends on the individual service. Most companies have procedures in place for relatives to gain access to a deceased person’s data. For example, Gmail allows you to set up a trusted contact who will be given access to your account if it becomes ‘inactive’.

So, the posthumous fate of our emails and other data depends on where it is stored, and on our own preferences. Or does it? If there’s one thing we’ve learned in the past few decades, it’s that technology moves so rapidly that nothing can be considered permanent anymore. In the 1980s and 1990s, typewriters were replaced by the new standard of word processing: WordPerfect. It didn’t take long before Microsoft Word took over (unless your name was Jonathan Franzen). In the late 1990s and early 2000s, the vast majority of mobile phones were manufactured by Nokia — a company that seemed too big to fail. Yet Apple and Google took control within 10 years. Similarly, Microsoft and Windows were ubiquitous in the early 2000s, while today, large numbers of people use Apple MacBooks, and many others don’t have an old-fashioned computer or laptop at all. Will we still be using Google in 10 years’ time?

Change is still accelerating; nothing is here to stay. Who knows whether our emails will even still exist when we die.

