What We Got Wrong About the Internet
In order to effectively protect our customers, Expanse cultivates a deep understanding of the Internet. We use our global perspective to help our customers understand both what they own that is connected to the public Internet and how to keep their assets secure. Since joining Expanse as a software engineer, I’ve learned about many pervasive, incorrect assumptions related to the history and structure of the Internet – many of which I myself used to hold. These bad assumptions have cropped up throughout the history of the Internet and continue to influence how many think about cybersecurity today.
The following are three big things the world got wrong about the Internet:
1. The Internet is Inherently Global
The Internet was initially developed as a closed network of trusted participants, primarily university and government researchers. In its early days, the Internet was a local system, not a globally accessible network.
The invention of the Internet is fairly well-documented in Request for Comments (RFC) documents, the common design documents of the Internet. These RFCs tell a detailed story of how the technologies that form the Internet came to be. If you go back and read the RFCs from the 1980s for a wide variety of protocols, you’ll hardly encounter the word “security” at all.
When I looked through old RFCs, I was initially surprised to see a section called “Precedence and Security” in the RFC for Transmission Control Protocol (TCP). TCP is one of the fundamental protocols that allows two endpoints on the Internet to connect with one another. It’s a pretty simple idea: the endpoints initiate and confirm their connection in a “handshake” step, and then those endpoints can send each other chunks of data in “packets.” The two sides acknowledge every packet, so each side can be certain that the packets it sent were received.
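The handshake-then-acknowledged-packets idea described above can be sketched with Python’s standard socket library. This is an illustrative example on localhost, not part of the original article: the OS performs the three-way handshake (SYN, SYN-ACK, ACK) when the client calls `connect()` and the server calls `accept()`, and the TCP stacks acknowledge each packet behind the scenes.

```python
import socket
import threading

# Minimal sketch of a TCP conversation on localhost. The three-way
# handshake happens inside the operating system when the client calls
# connect() and the server calls accept(); every chunk of data sent
# afterward is acknowledged by the peer's TCP stack automatically.

def serve_once(server_sock):
    conn, addr = server_sock.accept()   # handshake completes here
    data = conn.recv(1024)              # receive a packet of data
    conn.sendall(b"ack: " + data)       # reply; TCP handles acknowledgment
    conn.close()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))           # port 0: the OS picks a free port
server.listen(1)
port = server.getsockname()[1]

t = threading.Thread(target=serve_once, args=(server,))
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))     # initiates the handshake
client.sendall(b"hello")
reply = client.recv(1024)
client.close()
t.join()
server.close()

print(reply)  # b'ack: hello'
```

Note that nothing in this exchange authenticates either endpoint or hides the data in transit, which is exactly the gap discussed below.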
The RFC documenting TCP was published by the US Defense Advanced Research Projects Agency (DARPA) in 1981 – a time when almost no RFCs made mention of security at all. So why does TCP’s design document have that “Precedence and Security” section?
It’s because this section was not talking about the kind of security we think about in cybersecurity today. Instead, it refers to levels of classified information. TCP was a government project, and DARPA wanted a way to specify the classification level of each packet. But there was nothing built into the protocol that actually secured it against hijacking, eavesdropping, or any other kind of attack. The Internet was a closed and trusted network, and no one was worried about malicious actors hacking into their systems.
Starting around 1990, the Internet got a lot less local. More and more of the RFCs and papers published during this time discuss standardization, scaling, globalization, and, yes, security. But security measures were added on as an afterthought to many of the backbone protocols we still use today. We knew TCP was insecure, and some argued that it would be best to overhaul the whole thing, but change was really hard once the system was already scaled up. The TCP we use today is almost unchanged from its original 1981 specification. So how did we move forward? How did we build secure systems on such insecure foundations?
We use what is called the “end-to-end” principle: it is the job of the applications running at the endpoints to handle the complex task of security, and those endpoint applications cannot rely on security from the lower-level networking steps. Even if TCP is insecure, we have protocols like TLS (Transport Layer Security) that work on top of TCP to ensure a secure end-to-end connection. This is why it is critically important that we all keep the applications we use secure and up-to-date — because we cannot rely on the backbone of the Internet for security. After all, it was built for a local network of trusted participants, not to connect a world of strangers.
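As a concrete sketch of the end-to-end principle, here is how an application can layer TLS on top of an ordinary TCP connection using Python’s standard library. This is an illustrative example, not from the original article: the plain TCP socket stays unauthenticated and unencrypted, and the TLS wrapper at the endpoint supplies certificate verification and encryption.

```python
import socket
import ssl

# The end-to-end principle in practice: the application wraps a plain
# TCP socket in TLS. A default context enforces certificate validation
# against trusted CAs and checks the certificate against the hostname.
context = ssl.create_default_context()

def open_secure_connection(host, port=443):
    """Open a TCP connection, then upgrade it to TLS."""
    raw = socket.create_connection((host, port))            # TCP handshake
    return context.wrap_socket(raw, server_hostname=host)   # TLS handshake

# The secure defaults the endpoint relies on, since TCP provides none:
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
```

The insecure transport is unchanged underneath; all of the security lives in the endpoint application’s TLS layer.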
2. The Internet is Large
Now that we have a global Internet, we generally think of it as absolutely massive — a vast sea of connected people and devices. It is unsurprising, then, that people often think they can put something up on the public Internet, say an insecure server or a database with customers’ personal information, and no one will find it because the Internet is so big. Why would anyone be looking at your IP? Who even knows to go to your domain name? But the Internet is small enough — and technology is good enough — that attackers can look at every single endpoint on the public Internet to find vulnerabilities in less than an hour.
We believe that this kind of malicious scanning, generally using large sets of stolen machines (a botnet), started around 2000. We think this because, in 2007, researchers at Berkeley and the University of North Carolina looked at the history of stray packets flying around the Internet, not part of any particular conversation. This kind of activity is so constant and pervasive that there is a fancy name for it: Internet background radiation. Internet background radiation is often caused by malicious actors probing and scanning for vulnerabilities on the global Internet.
You can listen to Internet background radiation for yourself by putting up a machine on a public IP with no services or advertised domains — no reason for anyone to try to connect. If you log all incoming connections, you’ll see that a lot of machines are still trying to talk to you. Expanse regularly does this kind of monitoring, because it tells us what attackers are looking for based on how they’re trying to connect and the ports they’re targeting.
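The listening setup described above amounts to a connection logger. Here is a minimal sketch, not from the original article: it binds to 127.0.0.1 and simulates a single stray probe so it runs self-contained, but on a real sensor you would bind to a public IP, where every logged connection attempt is background radiation.

```python
import socket
from datetime import datetime, timezone

# Minimal connection logger: bind a socket, accept whatever arrives,
# and record who knocked and when. On a public IP with no services or
# advertised domains, every entry in this log is Internet background
# radiation. Here we use 127.0.0.1 and one simulated probe.

def log_connections(listener, max_events):
    events = []
    for _ in range(max_events):
        conn, (ip, port) = listener.accept()
        events.append((datetime.now(timezone.utc).isoformat(), ip, port))
        conn.close()
    return events

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))    # on a real sensor: a public IP and port
listener.listen(5)
port = listener.getsockname()[1]

probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
probe.connect(("127.0.0.1", port))          # one simulated stray probe
events = log_connections(listener, max_events=1)
probe.close()
listener.close()

print(events[0][1])  # 127.0.0.1
```

In practice the interesting data is in which ports get probed and how, which is why the source port and timestamp are worth keeping.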
When Internet-scale scanning began around 2000, you needed thousands of machines to be able to scan the Internet quickly. This meant that Internet-wide scanning was largely restricted to malicious actors who could co-opt large numbers of computers. However, in 2013, independent security researchers and academic groups created and published multiple software packages that allowed fast Internet-wide scanning from a single machine. It was finally possible to scan without stealing time on a bunch of machines, which meant that individuals and organizations with only minimal resources and moderate technical sophistication — including both researchers and malicious actors — were able to find vulnerabilities across the entire Internet. Shortly after these research projects were released, Expanse realized that by fusing Internet-wide scanning data with other Internet-wide data we gather, we could facilitate finding and, much more importantly, operationally fixing vulnerable systems belonging to the world’s largest and most important organizations.
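The core of such scanning can be sketched as a TCP connect scan. This illustrative example is not from the original article, and dedicated tools like ZMap send raw packets and cover the IPv4 space far faster, but the idea is the same: attempt a handshake on each target and record which ports answer. It is demonstrated here against a single listener on localhost.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

# Sketch of a TCP connect scan: a port is "open" if a handshake to it
# completes. Real Internet-wide scanners craft raw packets instead of
# full connections, but the principle is identical.

def port_is_open(host, port, timeout=0.25):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def scan(host, ports):
    with ThreadPoolExecutor(max_workers=32) as pool:
        flags = list(pool.map(lambda p: port_is_open(host, p), ports))
    return [p for p, is_open in zip(ports, flags) if is_open]

# Demo: open one listener on localhost, then scan the range around it.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))             # port 0: OS picks a free port
listener.listen(5)
open_port = listener.getsockname()[1]

found = scan("127.0.0.1", range(open_port - 2, open_port + 3))
listener.close()

print(open_port in found)  # True
```

Scaled up with raw packets and a fast uplink, this same loop over all four billion IPv4 addresses is what makes whole-Internet scans take under an hour.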
This kind of scanning technology makes the Internet small. Relying on the size of the Internet to hide insecure servers — security by obscurity — is not a viable strategy. Fortunately, it’s not just the attackers who are able to scan the global Internet. Expanse can quickly tell our customers what assets are on their networks and when a risky protocol comes online, so they can remediate the vulnerable service.
3. Control of the Internet is Decentralized
Before I started working at Expanse, I thought that the Internet was decentralized. In my mental model, the Internet was globally shared and communally operated. There is definitely some truth to that — but there is also a lot of centralized organization.
How does your computer know where to get information about `expanse.co`? Well, it needs to know where to look for the content associated with that domain name, `expanse.co`. There is a lot of complexity in how this lookup actually takes place, but fundamentally, your computer needs to ask the people who organize all the `.co` domain names. These domains are organized by Neustar, which also has authority over all `.biz`, `.us`, and `.nyc` domains.
Neustar and other organizations like it are tied together in what’s called the “Domain Name System,” which organizes who owns which domain name and allows your computer to look up `expanse.co` quickly and efficiently. Once your computer successfully does a domain name lookup, it gets back an IP address where it can connect to the website.
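The last step of that lookup can be sketched in a few lines. This illustrative example is not from the original article: `getaddrinfo()` drives the resolution, and we resolve `localhost` so the sketch runs without Internet access; in practice you would substitute a real domain such as `expanse.co`.

```python
import socket

# After the DNS hierarchy is consulted, the resolver hands back an IP
# address the computer can connect to. getaddrinfo() performs that
# lookup via the system resolver.

def resolve(hostname):
    """Return the IPv4 addresses a hostname resolves to."""
    results = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, (ip, port)).
    return sorted({sockaddr[0] for *_, sockaddr in results})

print(resolve("localhost"))  # typically ['127.0.0.1']
```

For a real domain, the answer ultimately traces back through the registry for that top-level domain, which is where the centralization discussed next comes in.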
But why does Expanse Inc. get to decide what content is hosted on that particular IP address? Expanse works with a hosting provider. That hosting provider, in turn, leases the IP from ARIN, the American Registry for Internet Numbers. ARIN is one of five major Regional Internet Registries, which have the job of handing out IP addresses based on geographic region — ARIN serves all of North America, and there’s one for Latin America, one for Europe, one for Asia, and one for Africa. And these five Regional Internet Registries, in turn, get their power from IANA, the Internet Assigned Numbers Authority. IANA is one of the biggest centralized bodies governing the Internet, and it’s an arm of the Internet Corporation for Assigned Names and Numbers (ICANN). There is a theme here: the Internet is composed of many distributed systems that rely on consolidated roots of trust. This allows us to have a fast and well-organized Internet, but it also means there are potential points of weakness that can impact the entire system.
Take, for example, Dyn, a company that provides DNS hosting for many major websites. In October 2016, a distributed denial-of-service attack against Dyn took down many of its servers — and effectively pulled down Twitter, Reddit, GitHub, Amazon.com, Netflix, and many more major websites for US-based Internet users. Centralized control can be a weakness if it allows for a single point of failure.
Cybersecurity is About Scale and Visibility
We have a picture in our heads of cybersecurity as revolving around sophisticated defenders working to thwart increasingly sophisticated attackers. We like the thrilling idea of brilliant hackers for good and evil, battling against each other by coming up with complicated algorithms and detailed plans. And this is a piece of cybersecurity, sure. But if you have the most top-of-the-line, state-of-the-art, sophisticated firewall set up and forget about just one IP that’s part of your network — well, if an attacker finds that one IP, that one small door into your system, you still have a big problem.
Cybersecurity today is a game of scale and visibility. The attacker’s play is to know every door into your system — and into everyone else’s systems too — just in case one is left unlocked. And our play is to know about every door, too, so we can help our customers make sure that every one of them is secure.
- RFC on cutting over to TCP/IP, no mention of security: https://tools.ietf.org/rfc/rfc801.txt
- RFC specifying TCP, security only in the “Precedence and Security” section: https://tools.ietf.org/rfc/rfc793.txt
- Announcement about ICANN gaining independence from US gov: https://www.icann.org/news/announcement-2016-10-01-en
- A brief history of Internet scanning (via looking at internet background radiation): http://conferences.sigcomm.org/imc/2007/papers/imc76.pdf