Excited to see this work show up at IMC in November.
I’m excited that the first project I helped on at Michigan will be presented at FOCI next month: An ISP-Scale Deployment of TapDance
Last week I talked briefly about the state of open internet measurement for network anomalies at IETF 98. This was my first time attending an IETF in-person meeting, and it was very useful in getting a better understanding of how to navigate the standards process, how it’s used by others, and what value can be gained from it.
A couple highlights that I took away from the event:
There’s a concern throughout the IETF about solving the privacy leaks in existing protocols for general web access. There are three major points in the protocol that need to be addressed and are under discussion as part of this: The first is coming up with a successor to DNS that provides confidentiality. This, I think, is going to be the most challenging point. The second is coming up with a SNI equivalent that doesn’t send the requested domain in plain-text. The third is adapting the current public certificate transparency process to provide confidentiality of the specific domains issued certificates, while maintaining the accountability provided by the system.
There are two proposals with traction for encrypting DNS that I’m aware of. Neither fully solve the problem, but both provide reasonable ways forward. The first is dnscrypt, a protocol with support from entities like yandex and cloudflare. It maintains a stateless UDP protocol, and encrypts requests and responses against server and client keys. There are working client proxies for most platforms, although installation on mobile is hacky, and a set of running providers. The other alternative, which was represented at IETF and seems to be preferred by the standards community is DNS over TLS. The benefit here that there’s no new protocol, meaning less code that needs to be audited to gain confidence of the security properties for the system. There are some working servers and client proxies available for this, but the community seems more fragmented, unfortunately.
The eventual problem that isn’t yet addressed is that you still need to trust some remote party with your dns query and neither protocol changes the underlying protocol where the work of dns resolution is performed by someone chosen by the local network. Current proxies allow the client to choose who this is instead, but that doesn’t remove the trust issue, and doesn’t work well with captive portals or scale to widespread deployment. It also doesn’t prevent that third party from tracking the chain of dns requests made by the client and getting a pretty good idea about what the client is doing.
SNI, or server name identification, is a process that occurs at the beginning of an HTTPS request where the client tells the server which domain it wants to talk to. This is a critical part of the protocol, because it allows a single IP address to host HTTPS servers for multiple domains. Unfortunately, it also allows the network to detect and potentially block requests at a domain, rather than IP granularity.
Proposals for encrypting the SNI have been around for a couple years. Unfortunately, they did not get included in TLS1.3, which means that it will be a while before the next iteration of the standard and the potential to include this update.
The good news was that there seems to be continued interest in figuring out ways to protect the SNI of client requests, though no current proposal I’m aware of.
Certificate Transparency Privacy
Certificate Transparency is an addition to the HTTPS system to enforce additional accountability in to the certificate authority system. It requires authorities (CA)’s to publish a log of all certificates they issue publicly, so that third parties can audit their list and make sure they haven’t secretly mis-issued certificates. While a great feature for accountability and web security, it also opens an additional channel where the list of domains with SSL certificates can be enumerated. This includes internal or private domains that the owner would like to remain obscure.
As google and others have moved to require the CT log from all authorities through requirements on browser certificate validity, this issue is again at the fore. There’s been work on addressing this problem, including a cryptographic proposal and the IETF proposal for domain label redaction which seems to be advancing through the standards process.
There remains a ways to go to migrate to protocols which provide some protection against a malicious network, but there’s willingness and work to get there, which is at least a start.
In 2014, Domain Fronting became the newest obfuscation technique for covert, difficult to censor communication. Even today, the Meek Pluggable transport serves ~400GB of Tor traffic each day, at a cost of ~$3000/month.
The basic technique is to make an HTTPS connection to the CDN directly, and then once the encryption has begun, make the HTTP request to the actual backing site instead. Since many CDNs use the same “front-end cache” servers for incoming requests to all of the different sites they host, there is a disconnect between the software handling SSL, and the routing web server proxying requests to where they need to go.
Even as the technique became widely adopted in 2014-2015, its demise was already predicted, with practitioners in the censorship circumvention community focused on how long it could be made to last until the next mechanism was found. This prediction rested on two points:
- The CDN companies will find themselves in a difficult position politically, since they are now in the position of supporting circumvention while also maintaining a relationship with the censoring countries.
- The technique has security and cost implications that make it not great for either the CDNs, or the practitioners.
We’ve seen both of these predictions mature.
Cloudflare, explicitly doesn’t support this mechanism of circumvention, and coincidentally has major Chinese partnerships and worked to deploy into China. Google also has limited the technique over periods as they have struggled with abuse (although mute in China, since the Google cloud doesn’t work there as a CDN.)
In terms of cost, the most notable incident is the “Great Cannon”, which targeted not only Github as widely reported, but also caused a significant amount of traffic to go to Amazon-hosted pages run by GreatFire, a dissident news organization, and costing them significant amounts of money. GreatFire had been providing a free browser that operated by proxying all traffic through domain-fronting. Due to a separate and less reported Chinese “DDOS” they ended up with a monthly bill for several tens of thousands of dollars and had to turn down the service.
The latest strike against domain fronting is seen in posts by Cobalt Strike and FireEye that the technique is also gaining adoption for Malware C&C. This abuse case will further incentivize CDNs from allowing the practice to continue, since there will now be many legitimate western voices actively calling on them to stop. Enterprises attempting to track threats on their networks, and CDN customers wanting to not be blamed for attacks will both begin putting more pressure on the CDNs to remove the ability for different domains to be intermixed, and we should expect to see a continued drop in the willingness of providers to offer such a service.
Video from my CCC talk last week is here.
Third party analytics services are suffering from the growing prevalence of ad blocking, tracking protection, and the trend of minimizing connections and requests. However, from a site owner perspective, receiving usage information remains important for measuring site growth.
My expectation is that we are already on the curve where ads and tracking software will be more tightly integrated into websites and make it significantly more difficult for clients to disambiguate
“good” and “bad” scripts, which are mostly done today from the URL.
Google already provides the tools needed to relay analytics communication through a third party server, and it took under an hour to put together a proof of concept that removes the final third-party requests that are required when viewing this page. In essence, my server proxies all the requests that would normally go to Google, and adds on a couple extra parameters to track who the real client is.
The modified loading script for google analytics, and the corresponding nginx configuration to make my server a relay are here.
On Monday, China ratified an updated cybersecurity legislation that will enter effect next June. The policy regulates a number of aspects of the Chinese Internet: What data companies need to keep on domestic servers, the interaction between companies and the government, and the interaction between companies and Chinese users.
Notably, when considering the impact on the Internet, the law include:
- Network operators are expected to record network security incidents and store logs for at least 6 months (Article 21)
Note that the punishment for refusing to keep logs is a fine up to 10,000usd to the operator, and of up to 5,000usd to the responsible person.
Services must require real-identity information for network access, telecom service, domain registration, blogging, or IM (Article 24)
The punishment for failing to require identity is up to 100,000usd and suspension of operations.
- Network operators must provide support to the government for national security and crime investigations (Article 28)
- If a service discovers prohibited user generated content they must remove it, save logs, and report to the government (Article 47)
The punishment for this is up to 100,000usd and closing down the website
The concerns from foreign companies seem to center around a couple things: The first is that there’s a fairly vague classification of ‘critical infrastructure’, which includes power, water and other infrastructure elements explicitly, but also refers to services needed for public welfare and national security. Any such service gets additional monitoring requirements, and needs to keep all data on the mainland. Companies are worried they could be classified as a critical service, and that there aren’t clear guidelines about how to avoid or limit their risk of becoming subject to those additional regulations.
The other main concern seems to be around the fairly ambiguous regulation of supporting national security investigations by the government. There’s a concern that there aren’t really any limits in place for how much the government can request from services, which could include requiring them to include back doors, or perform significant technical analysis without compensation.
My impression is that these regulations aren’t much of a surprise within China, and they are unlikely to cause much in the way of change from how smaller companies and individuals experience Internet management already.
In the last week at Blackhat / Defcon two groups looked deeply at one of the lesser known implementations of network policy called Proxy Autoconfig. (In particular, badWPAD by Maxim and Crippling HTTPS with unholy PAC by Safebreach.)
These two talks both focus on what a malicious individual could do to attack the standard, and each find an interesting line of attack. The first attack is that the PAC file is allowed to make DNS requests in determining how to proxy connections, and in many browsers sees the full URL being accessed rather than only the domain. This means that even when the user is communicating with a remote server over HTTPS, the local network can learn the full URL that is being visited. The second attack has to do with where computers look for PAC files on their local network – for a file called `wpad.dat`.
While there is certainly the potential for an attacker to target a victim through these technologies, they are more accessible and arguably more valuable to a ISP or state level actor interested in passive surveillance. This explicit policy for connectivity is not inherently more invasive than policies employed by many ISPs already, and could likely be deployed on many networks without consumer push-back as a performance enhancement for better caching. It is also appropriate for targeted surveillance, since vulnerability can be determined passively.
The viability of surveillance through WPAD and PACs is a bit of a mixed bag. Most ISPs use DHCP already and set a “search domain”, which will result in a recognizable request for proxy information from vulnerable clients. While organizations often require all clients to enable discovery, this is not true of many consumer machines. Unfortunately, some versions of windows have proxy discovery enabled by default.
The NMAP tool used for network exploration, and pitched towards use as a tool facilitating network attackers, already has support for WPAD. In contrast, the network status and monitoring tools, like Netalyzr and OONI do not yet monitor local proxy status and won’t provide indication of malicious behavior.
I’ve started to dive once again into the mess of connection establishment. Network address translation (NAT) is a reality today for most Internet users, and poses a significant hurdle in creating the user-user (or peer-peer) connections. NAT is the process used by your router to provide multiple internal (192.168.x.x) addresses that are all only visible as a single external address on the Internet. The challenge caused by this device is that if someone outside wants to connect to your computer, they have to figure out how to get the router to send their traffic back to you, and not just drop it or send it to another computer on your network.
Without configuring your router to add a ‘port forwarding’ rule, it isn’t supposed to do this, so many of the connection establishment procedures are really ways to trick your NAT into forwarding traffic without realizing what’s happening.
There are two main protocols on the Internet today: UDP and TCP. UDP is stateless, each “packet” of data is its own message, and is self contained. In contrast, TCP is a representation of a longer “stream” of data – many messages are sent with an explicit ordering . TCP is much harder to trick routers into establishing, and there has been little work there.
The current generation of p2p systems are led by high-bandwidth applications that want to offload traffic from central servers in order to save on bandwidth costs. Good examples of these are Google’s hangouts and other VOIP (video over IP) traffic.
These systems establish a channel to send UDP traffic between two computers both behind NAT routers using a system called ICE (interactive connectivity establishment). This is a complex dance with multiple sub-protocols used to try several different ways of establishing connectivity and tricking the routers.
One of the key systems used by ICE is a publicly visible server that speaks a protocol called STUN. STUN servers provide a way for a client to open a UDP connection through their router to a server that is known to be able to receive messages, and then learn what that connection looks like outside of its router. It can then provide that external view of how it’s connected to another peer which may be able to send messages to the same external address and port and have them forwarded back to the client.
One of the unfortunate aspects of this situation is that the complexity of these systems has led to very few implementations. This is unfortunate, since the existence of libraries making it easy to reuse these techniques can allow more p2p systems to continue working in the modern Internet without forcing users to manually configure their routers.
I’ve started work on a standalone go implementation of the ICE connectivity stack. Over the weekend I reached the first milestone – The library can create a STUN connection, and learn the external appearance of the connection as reported by the STUN server.
Satellite takes a look at understanding shared CDN behaviors and automatically detecting censorship by regularly querying open DNS resolvers around the world.
For example, we can watch the trends in censorship in Iran using only a single, external machine.