Thoughts on IPv6 Measurement

About five years ago two projects, Zmap and Masscan, helped to shift the way that many researchers thought about the Internet. The tools both provide a relatively optimized code path for sending packets and collecting replies, and allow a researcher with moderate resources to attempt connections to every computer on the IPv4 Internet in about an hour.

These techniques are widely applied to monitor the Internet-scale security of services, with prominent examples of censys.io, scans.io, and shodan.io. For the security community, they have become a first-step for reconnaissance, allowing hackers to find origin IPs masked by CDNs, unadvertised points of presence, and vulnerable hosts within an organization.

While the core of the Internet and the services we actively choose to connect with remain staunchly IPv4, the networks that many end hosts are connected to are more rapidly adopting IPv6, responding to the exhaustion and density of the IPv4 address space.

This fall, a new round of research has focused on what is possible for the enumeration and exploration of the IPv6 address space. ‘You can -J reject but you can’t hide’ was presented at CCC, focusing on spidering DNS records to learn of active IPv6 addresses which are registered within the DNS system. Earlier in the fall, there were several sessions at IMC thinking about IPv6. Most notably, “Entropy/IP – uncovering entropy in IPv6″, which looks at how addresses are allocated in practice as seen by Akamai at the core of the network. In addition, IPv6 was the focus of a couple WIP sessions, expressing thoughts on discovering hosts through progressive ICMP probing, as well the continued exploration of what’s actually happening in the core as seen by Akamai.

Where does this growing understanding of wide-scale IPv6 usage take us?

Enumeration of candidate addresses is a new first step that will be needed for anything beyond a single prefix. Even then, scanning within a single organizational prefix can be considered an active brute-force attack, rather than the relatively ‘harmless’ reconnaissance of IPv4 scanning.
There are many potential sources to interact with for enumeration, including DNS records, observed network traffic, and default ::1 addresses. The Entropy/IP paper points out that shodan.io has already been observed adding itself as a member of the NTP pool to harvest candidate IPv6 addresses for scanning.
Address generation for many hosts is not fully random, embedding a mac address, IPv4 address, or other non-random information. This can be used to discover a subset of hosts more efficiently, though still not at Internet scale. (for example, 2^32 attempts to look for hosts of a specific brand within a 2^64 network address space.) This would still sends several gigabytes of traffic to an individual network in the process of scanning. Non-random addresses tend to be more often associated with servers and routers than with end-clients.
Discovery of network topology is possible by enumerating where error responses to guessed addresses come back from. This doesn’t allow for discovery of individual machines either.

What do we do about it?

There will probably not be a shodan.io for ipv6 in the same way there is for ipv4. Instead, much of the wide-scale scanning on the IPv6 network will be performed through reflection from hosts discovered through their participation in other active services, for instance bit torrent, NTP, or DNS.

Conversely, the number of vulnerable IPv6 hosts will keep growing, because they can exist for much longer before anyone will find them. This will likewise increase the value that can be obtained through scanning – both to hackers, and to academics looking at Internet dynamics. We can expect to see a marketplace for addresses observed passively by ISPs, the network core, and passive services.

It’s worth also watching the watchers here: which providers are “selling me out” so to speak? It would be worth building the honey-pots to observe which services and servers leak client information and lead top probing and the potential for compromise of end hosts.