Category: Security

Computer Security Related topics

On Trust
There has been a fair amount of effort on UCAN (User Controlled Authorization Networks), and other types of ‘decentralized credentials’ over the last couple years. These efforts perpetuate the same control structures that exist today, with delegated trees of hierarchical control. This is in contrast to a personal or ‘decentralized’ trust we might hope for in peer to peer networks. It is difficult to use DIDs, UCANs, or other proposed mechanisms for reputation and network formation without finding ourselves back trusting an authority – they are both easily captured and naturally lend themselves to centralization of control. We need a fundamentally different trust infrastructure in order to build resilient, peer to peer networks.

On non-hierarchical models for trust

The main barrier is not a technical one – we have seen technical implementations (e.g. the GPG web of trust) for decades. There is an intuitive design for how a flat trust model can be implemented. The problem lies in dissatisfaction with the emergent properties of that naive network structure. This tension has been framed in a couple of different ways. One perspective is that the user experience of bootstrapping trust is overly cumbersome, and this friction leads to an insufficiently dense trust network. A different perspective on the same tension is that a user-driven trust system is at odds with transitive / automatic trust relations, and that actions to ‘ease’ the user experience fundamentally reduce user control.

We can find a space for exploration by calling out this tension as a false dichotomy. The choice is not between a single authority and user-directed trust links, but about how trust structures are distributed. There is room for an organic / automatic way to generate trust, and to let it be reflected on and evolve, that is neither user-directed nor rooted in a single authority. The bit-torrent tit-for-tat mechanism is one form of this, where protocol-compliant behavior leads to an increasing buffer for data transfer within the protocol.
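
As a rough illustration of the tit-for-tat idea, the sketch below ranks peers by how much they recently uploaded to us and keeps serving the best of them, plus one randomly chosen peer so that newcomers get a chance to prove themselves. The names (`choose_unchoked`, `slots`) are made up for this example, and the logic is a simplification rather than the actual bit-torrent choking algorithm.

```python
import random

def choose_unchoked(bytes_received, slots=3, rng=random):
    """Pick the peers we keep uploading to, based on what they sent us.

    bytes_received maps a peer id to the bytes that peer uploaded to us
    in the last round. The peers that reciprocated most fill the regular
    slots; one extra peer is picked at random from the rest.
    """
    ranked = sorted(bytes_received, key=bytes_received.get, reverse=True)
    unchoked = ranked[:slots]
    rest = ranked[slots:]
    if rest:
        unchoked.append(rng.choice(rest))  # "optimistic" slot for an unproven peer
    return unchoked
```

The point of the sketch is that no user ever declares trust in a peer: the allocation follows directly from observed, protocol-compliant behavior.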

Trust or Reputation

There is a related notion that is more regularly referred to in protocols as a concept of ‘reputation’. Reputation can be viewed as a property of a node in a system rather than one of an edge. (e.g. reputation is often constructed as a metric that is transitive, or where a node has a single consensus value. This is different from how we normally think of our personal trust in another user.)

What then exactly are we trying to capture in a measure for ‘Trust’? In the hierarchical systems of web 2, it’s meant to provide some assurance that “someone is who they say they are”. It isn’t an indication that there are ‘aligned beliefs’, but rather that the expected entity is behind a given identifier. The properties that come from systems like TLS / CAs look very similar to reputation in this sense. While each individual can override and manually configure which authorities to trust, that definition of trust means confidence in adherence to protocol and in coherence between expectation and reality.

Scoping trust

A challenge we sometimes run into when talking about trust as it relates to technical networks is that our expectation of scope is typically much more limited in digital or transactional contexts than it is in real life. When you refer to a person as a “trusted individual”, the implication is not only that this is not an ‘imposter’, but also that the person has some level of altruism or aligned / positive motivations. While some formulations use reputation as a stand-in for this additional notion of trust, I would argue that it is perhaps better thought of as an understanding of motivations. The trust is that it is understandable what game someone is playing, what their motivations are, and thus what their rational behavior will be.

Narrow interactions, like those scoped in technical protocols, are intentionally limited to exclude externalities, but this also makes it difficult to understand if other nodes have ulterior motives in participating in the protocol. What a participant can learn, and what other uses can be derived from participation, is not always easy to analyze, and the lack of completeness is unsatisfying. At the same time, designing protocols so that they do not leak information is difficult-to-impossible, and difficult to justify. Even the determination and understanding of risk present in a system is an expensive proposition.

Categorizing mechanisms

How do we build distributed notions that reflect this notion of confidence that another participant is also playing the same game as us?

If we take the narrower view of actions within the protocol, we can get to a somewhat useful taxonomy of work in this space.
- The bit-torrent tit-for-tat algorithm uses the demonstration from the other participant that they’re following the protocol as a signal to continue the conversation.
- A set of protocols use a proof of work, or computational puzzle as a way for participants to demonstrate that it is worth something to them to participate.
- Protocols like TLS have added revocation lists, and things shaped like “proofs of bad behavior” as ways to share knowledge of identities that have misbehaved. If the cost of creating an identity is high, and your misbehavior causes “reputational damage”, your rational behavior becomes more incentivized to follow the protocol.
- Finally, there is emerging growth of validation-based protocols. Cryptographic proofs are increasingly able to provide an assertion that computation has been performed per the expected protocol, and reduces the space of valid-but-not-compliant actions that can be taken.
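
To make the proof of work item in the list above concrete, here is a minimal hashcash-style sketch. The challenge format and function names are assumptions for illustration and are not taken from any particular protocol. The asymmetry is what matters: finding a nonce takes about 2^difficulty hashes on average, while checking one takes a single hash.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest):
    """Count the zero bits at the front of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def solve(challenge, difficulty):
    """Search for a nonce whose hash has `difficulty` leading zero bits."""
    for nonce in count():
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce

def verify(challenge, nonce, difficulty):
    """Check a claimed solution with a single hash."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty
```

A participant would call something like `solve(b"join:alice", 12)` once, and every other node can confirm the result with `verify`.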

The complement to this category is the set of protocols that make use of external costs. In many cases the cost is difficult to quantify, which leaves the strength of the protocol’s trust levels equally difficult to pin down. At the same time, it means that costs can be higher than what could be built into a protocol in isolation.
- Protocols which involve a validation of ‘real name’ (linking an ID, bank account, cell phone, etc) are able to retaliate for misbehavior using the legal system.
- Protocols involving social graphs use the potential of negative impact to your standing with your friends.
- Protocols requiring registration with a phone number, or who distribute their app only for mobile devices are leveraging the cost of those assets as part of the account cost.

Increasing trust

From the previous categories we can see that these mechanisms lean on two ways of increasing this notion of trust.

The first is increasing the cost of defection. Raising the costs tied to creating or re-creating an account does this. Damaging a reputation or decreasing utility are likewise ways to increase the cost of not following a protocol.

The second way that trust is increased is by increasing a user’s confidence that they will be able to succeed in getting resolution when another user defects. In most of the ‘in protocol’ cost models, resolution occurs as part of the protocol itself. Bit-torrent won’t continue rewarding peers that aren’t honoring the tit-for-tat agreement. A computation submitted without a valid proof transcript will be ignored. It is the out of protocol actions where this subjective confidence is most at issue. Actions like Facebook suspending Cambridge Analytica (and publicized moderation actions more generally) demonstrate to users that enforcement is taking place.

Full circle

How do we provide decentralized notions of trust that can be dense and mesh with protocol needs for automatic establishment?

By ensuring that the risk associated with a trust link is less than what can be mitigated when trust is broken. This can be done in one of three ways:

1. The benefit of breaking trust can be reduced
2. The cost associated with punishment can be increased
3. Regularity (or user perception) of breaking trust leading to punishment can be increased
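
One way to read the three options above is as the three terms of a simple expected-value comparison. The sketch below assumes a rational participant who compares the benefit of defecting against the penalty weighted by the chance of being punished. That model, and the names in it, are my simplification rather than a formal result.

```python
def defection_pays(benefit, penalty, p_punished):
    """True when breaking trust has positive expected value for the defector."""
    return benefit > p_punished * penalty

# A baseline where defecting is worthwhile, then each lever applied alone.
baseline = defection_pays(benefit=10, penalty=50, p_punished=0.1)
lower_benefit = defection_pays(benefit=4, penalty=50, p_punished=0.1)
higher_penalty = defection_pays(benefit=10, penalty=200, p_punished=0.1)
more_enforcement = defection_pays(benefit=10, penalty=50, p_punished=0.5)
```

Each lever maps onto one argument, and in practice `p_punished` is the participant’s perceived probability, which is why visible enforcement matters.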

Concretely, the hesitancy to form a mesh network comes most often from the lack of a concretely defined threat model. When a protocol comes with a well scoped definition of misbehavior, it is typically much easier to enforce compliance and to frame the protocol in a way that provides comfort to participants.

It’s worth noting that we are often concerned with one of the hardest forms of this scenario – which is balancing the ease of participation in a system with the indirect and difficult to identify surveillance risks. Concrete examples of this tension are nation-state identification of Tor users, RIAA identification of bit-torrent users, or IRS identification of crypto currency users. In all of these cases, a user joining the protocol may behave as normal, but may also record network identifiers of other participants they encounter. An unaccountable out-of-protocol leaking of these known identifiers then leads to repercussions to other participants. I don’t know if the preceding discussion is the best framing in this specific case. I think it can be used as a lens still, but the interesting question here is mostly around the first point of reducing the benefits around breaking trust, and in reducing the signal that such an attack gets in the initial level of participation in the protocol.
November 30, 2024
What's Left for private Messaging

I had the privilege to address the annual Chaos Communication Congress (36C3) in Leipzig last week about the state and remaining issues in private communications.

The recording of the video has been made available by the CCC, and I have also posted the slides.

The TL;DR for me is that many of the trade-offs are balancing the stability of user experience with privacy mechanisms – and finding more ergonomic user experience interactions will be as important as new systems schemes are to improving the ecosystem.

I am particularly excited by the number of ongoing efforts to reduce trust in central servers. Many of the mechanistic trade-offs we face are due to the topology of our systems. With systems designed for fully anonymous interaction, like mixnets, PIR, and oblivious messaging, we can model and mitigate threats from much more realistic adversaries than we do with popular channels today. (For instance, consider an office which has received a whistle blowing message. If the receiving investigators want to identify the source, they likely control the local network and also have the ability to send messages to the account that initiated the conversation. Our current designs will find it quite difficult to protect a user from this scenario.)

January 1, 2020
Messaging Threat models

I talked yesterday at Bornhack about the current state of secure messaging and the different primitives and threats that groups are working to address.

The talk is on youtube.

The slides are on this site, as are the directions for dogfooding the talek system.

August 25, 2017
Linux Fest

I’ll be talking at Linux Fest Northwest in a couple weeks.

April 23, 2017
IETF 98

Last week I talked briefly about the state of open internet measurement for network anomalies at IETF 98. This was my first time attending an IETF in-person meeting, and it was very useful in getting a better understanding of how to navigate the standards process, how it’s used by others, and what value can be gained from it.

A couple highlights that I took away from the event:

There’s a concern throughout the IETF about solving the privacy leaks in existing protocols for general web access. There are three major points in the protocol that need to be addressed and are under discussion as part of this: The first is coming up with a successor to DNS that provides confidentiality. This, I think, is going to be the most challenging point. The second is coming up with an SNI equivalent that doesn’t send the requested domain in plain-text. The third is adapting the current public certificate transparency process to provide confidentiality for the specific domains that are issued certificates, while maintaining the accountability provided by the system.

Confidential DNS

There are two proposals with traction for encrypting DNS that I’m aware of. Neither fully solves the problem, but both provide reasonable ways forward. The first is dnscrypt, a protocol with support from entities like Yandex and Cloudflare. It maintains a stateless UDP protocol, and encrypts requests and responses against server and client keys. There are working client proxies for most platforms, although installation on mobile is hacky, and a set of running providers. The other alternative, which was represented at IETF and seems to be preferred by the standards community, is DNS over TLS. The benefit here is that there’s no new protocol, meaning less code that needs to be audited to gain confidence in the security properties of the system. There are some working servers and client proxies available for this, but the community seems more fragmented, unfortunately.
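
To show how little is new in DNS over TLS, here is a minimal sketch of a client: an ordinary DNS query, framed with the two-byte length prefix used for DNS over TCP, sent inside a TLS session on port 853. The function names are my own, the resolver address and certificate name are left as parameters you would have to supply, and error handling is mostly omitted.

```python
import socket
import ssl
import struct

def build_query(name, query_id=0x1234):
    """Encode a minimal DNS query for the A record of `name`."""
    # Header: id, flags (recursion desired), one question, no other records.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question

def frame(message):
    """Prefix a DNS message with its 2-byte length, as used over TCP and TLS."""
    return struct.pack(">H", len(message)) + message

def _read_exact(sock, n):
    """Read exactly n bytes from the socket or raise."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("resolver closed the connection early")
        data += chunk
    return data

def query_over_tls(name, resolver_ip, resolver_name, port=853):
    """Send one query over TLS and return the raw DNS response bytes."""
    context = ssl.create_default_context()
    with socket.create_connection((resolver_ip, port), timeout=5) as raw:
        with context.wrap_socket(raw, server_hostname=resolver_name) as tls:
            tls.sendall(frame(build_query(name)))
            (length,) = struct.unpack(">H", _read_exact(tls, 2))
            return _read_exact(tls, length)
```

Note that `resolver_ip` and `resolver_name` still have to come from somewhere: the sketch hides the query from the local network, but the resolver on the other end sees everything.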

The eventual problem that isn’t yet addressed is that you still need to trust some remote party with your DNS query, and neither protocol changes the underlying model where the work of DNS resolution is performed by someone chosen by the local network. Current proxies allow the client to choose who this is instead, but that doesn’t remove the trust issue, and doesn’t work well with captive portals or scale to widespread deployment. It also doesn’t prevent that third party from tracking the chain of DNS requests made by the client and getting a pretty good idea about what the client is doing.

Hidden SNI

SNI, or Server Name Indication, is a process that occurs at the beginning of an HTTPS connection, during the TLS handshake, where the client tells the server which domain it wants to talk to. This is a critical part of the protocol, because it allows a single IP address to host HTTPS servers for multiple domains. Unfortunately, it also allows the network to detect and potentially block requests at a domain, rather than IP, granularity.
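
The leak is easy to see without touching the network. The sketch below uses Python’s `ssl.MemoryBIO` to generate the first handshake message a client would send and returns the raw bytes. The function name and the example hostname are placeholders of mine.

```python
import ssl

def client_hello(hostname):
    """Return the raw bytes of the first TLS message sent for `hostname`."""
    context = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = context.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: there is no server, we only want the first flight
    return outgoing.read()

hello = client_hello("example.org")
name_is_visible = b"example.org" in hello
```

Here `name_is_visible` comes out true: the requested name sits in the handshake in plain text, which is exactly what an on-path observer can match against.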

Proposals for encrypting the SNI have been around for a couple years. Unfortunately, they did not get included in TLS 1.3, which means that it will be a while before the next iteration of the standard and the potential to include this update.

The good news was that there seems to be continued interest in figuring out ways to protect the SNI of client requests, though there is no current proposal that I’m aware of.

Certificate Transparency Privacy

Certificate Transparency is an addition to the HTTPS system to enforce additional accountability in the certificate authority system. It requires certificate authorities (CAs) to publicly log all certificates they issue, so that third parties can audit the list and make sure they haven’t secretly mis-issued certificates. While a great feature for accountability and web security, it also opens an additional channel where the list of domains with SSL certificates can be enumerated. This includes internal or private domains that the owner would like to remain obscure.

As Google and others have moved to require CT logging from all authorities through requirements on browser certificate validity, this issue is again at the fore. There’s been work on addressing this problem, including a cryptographic proposal and the IETF proposal for domain label redaction which seems to be advancing through the standards process.

There remains a ways to go to migrate to protocols which provide some protection against a malicious network, but there’s willingness and work to get there, which is at least a start.

April 2, 2017
Another Strike against Domain Fronting
In 2014, Domain Fronting became the newest obfuscation technique for covert, difficult to censor communication. Even today, the Meek Pluggable transport serves ~400GB of Tor traffic each day, at a cost of ~$3000/month.

The basic technique is to make an HTTPS connection to the CDN directly, and then once the encryption has begun, make the HTTP request to the actual backing site instead. Since many CDNs use the same “front-end cache” servers for incoming requests to all of the different sites they host, there is a disconnect between the software handling SSL, and the routing web server proxying requests to where they need to go.
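
A small sketch makes the two layers explicit. It only builds a description of the request and sends nothing. The function name, dictionary keys, and the `.example` domains are placeholders of mine, and CDNs that check for a mismatch between the two names will reject such a request.

```python
def fronted_request(front_domain, hidden_domain, path="/"):
    """Describe the two layers of a domain-fronted request.

    The front domain is what the network can observe (DNS lookup and the
    TLS handshake). The hidden domain only appears in the Host header,
    which travels inside the encrypted connection.
    """
    http_request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hidden_domain}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return {
        "connect_to": front_domain,         # visible to the censor
        "sni": front_domain,                # visible to the censor
        "encrypted_payload": http_request,  # visible only to the CDN
    }

request = fronted_request("allowed.example", "blocked.example")
```

The censor’s choice is then between blocking `allowed.example` entirely, with the collateral damage that implies, or letting the hidden traffic through.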

Even as the technique became widely adopted in 2014-2015, its demise was already predicted, with practitioners in the censorship circumvention community focused on how long it could be made to last until the next mechanism was found. This prediction rested on two points:
1. The CDN companies will find themselves in a difficult position politically, since they are now in the position of supporting circumvention while also maintaining a relationship with the censoring countries.
2. The technique has security and cost implications that make it not great for either the CDNs, or the practitioners.

We’ve seen both of these predictions mature.

Cloudflare explicitly doesn’t support this mechanism of circumvention, and coincidentally has major Chinese partnerships and has worked to deploy into China. Google has also limited the technique at times as they have struggled with abuse (although this is moot in China, since the Google cloud doesn’t work there as a CDN).

In terms of cost, the most notable incident is the “Great Cannon”, which not only targeted Github as widely reported, but also caused a significant amount of traffic to go to Amazon-hosted pages run by GreatFire, a dissident news organization, costing them significant amounts of money. GreatFire had been providing a free browser that operated by proxying all traffic through domain-fronting. Due to a separate and less reported Chinese “DDOS” they ended up with a monthly bill of several tens of thousands of dollars and had to turn down the service.

The latest strike against domain fronting is seen in posts by Cobalt Strike and FireEye that the technique is also gaining adoption for Malware C&C. This abuse case will further discourage CDNs from allowing the practice to continue, since there will now be many legitimate western voices actively calling on them to stop. Enterprises attempting to track threats on their networks, and CDN customers wanting to not be blamed for attacks will both begin putting more pressure on the CDNs to remove the ability for different domains to be intermixed, and we should expect to see a continued drop in the willingness of providers to offer such a service.
February 22, 2017
Thoughts on IPv6 Measurement
About five years ago two projects, Zmap and Masscan, helped to shift the way that many researchers thought about the Internet. The tools both provide a relatively optimized code path for sending packets and collecting replies, and allow a researcher with moderate resources to attempt connections to every computer on the IPv4 Internet in about an hour.
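
The “about an hour” figure is worth working through, since it is what breaks down for IPv6. The sketch below is back-of-the-envelope arithmetic only. The 84 bytes per probe is my assumption (a minimum-size Ethernet frame plus preamble and inter-frame gap), and it ignores retries, blocklists, and reply traffic.

```python
IPV4_ADDRESSES = 2 ** 32
WIRE_BYTES_PER_PROBE = 84  # assumed: 64-byte frame + 8 preamble + 12 inter-frame gap

def probes_per_second(addresses, seconds):
    """Send rate needed to touch every address once in the given time."""
    return addresses / seconds

def line_rate_mbps(pps, wire_bytes=WIRE_BYTES_PER_PROBE):
    """Link bandwidth consumed by that probe rate, in megabits per second."""
    return pps * wire_bytes * 8 / 1e6

pps = probes_per_second(IPV4_ADDRESSES, 3600)
print(f"{pps:,.0f} probes/s, about {line_rate_mbps(pps):,.0f} Mbit/s")

# The same send rate applied to a single IPv6 /64 subnet.
years = 2 ** 64 / pps / (365 * 24 * 3600)
print(f"one /64 at that rate: about {years:,.0f} years")
```

Under these assumptions an hour-long IPv4 sweep needs roughly 1.2 million probes per second, which fits within a gigabit link, while a single IPv6 /64 at the same rate would take hundreds of thousands of years.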

These techniques are widely applied to monitor the Internet-scale security of services, with prominent examples of censys.io, scans.io, and shodan.io. For the security community, they have become a first-step for reconnaissance, allowing hackers to find origin IPs masked by CDNs, unadvertised points of presence, and vulnerable hosts within an organization.

While the core of the Internet and the services we actively choose to connect with remain staunchly IPv4, the networks that many end hosts are connected to are more rapidly adopting IPv6, responding to the exhaustion and density of the IPv4 address space.

This fall, a new round of research has focused on what is possible for the enumeration and exploration of the IPv6 address space. ‘You can -J reject but you can’t hide’ was presented at CCC, focusing on spidering DNS records to learn of active IPv6 addresses which are registered within the DNS system. Earlier in the fall, there were several sessions at IMC thinking about IPv6. Most notably, “Entropy/IP – uncovering entropy in IPv6”, which looks at how addresses are allocated in practice as seen by Akamai at the core of the network. In addition, IPv6 was the focus of a couple WIP sessions, expressing thoughts on discovering hosts through progressive ICMP probing, as well as the continued exploration of what’s actually happening in the core as seen by Akamai.

Where does this growing understanding of wide-scale IPv6 usage take us?
- Enumeration of candidate addresses is a new first step that will be needed for anything beyond a single prefix. Even then, scanning within a single organizational prefix can be considered an active brute-force attack, rather than the relatively ‘harmless’ reconnaissance of IPv4 scanning.
- There are many potential sources to interact with for enumeration, including DNS records, observed network traffic, and default ::1 addresses. The Entropy/IP paper points out that shodan.io has already been observed adding itself as a member of the NTP pool to harvest candidate IPv6 addresses for scanning.
- Address generation for many hosts is not fully random, embedding a MAC address, IPv4 address, or other non-random information. This can be used to discover a subset of hosts more efficiently, though still not at Internet scale. (For example, 2^32 attempts to look for hosts of a specific brand within a 2^64 network address space.) This would still send several gigabytes of traffic to an individual network in the process of scanning. Non-random addresses tend to be more often associated with servers and routers than with end-clients.
- Discovery of network topology is possible by enumerating where error responses to guessed addresses come back from. This doesn’t allow for discovery of individual machines either.
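
As an illustration of the non-random address generation mentioned in the list above, here is a sketch of the modified EUI-64 scheme, where the host part of an address is derived from the MAC address. The function name is mine, and the prefix and MAC in the usage line are documentation examples.

```python
import ipaddress

def eui64_address(prefix, mac):
    """Build the address a host derives from its MAC using modified EUI-64."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC like 00:11:22:33:44:55")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the vendor half and the device half of the MAC.
    interface_id = bytes(octets[:3] + [0xFF, 0xFE] + octets[3:])
    network = ipaddress.IPv6Network(prefix)
    return network[int.from_bytes(interface_id, "big")]

address = eui64_address("2001:db8::/64", "00:11:22:33:44:55")
```

Because the first three bytes of a MAC identify the vendor, a scanner looking for one vendor’s devices only has to guess the remaining three bytes per vendor prefix, rather than all 64 bits of the host part.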

What do we do about it?

There will probably not be a shodan.io for IPv6 in the same way there is for IPv4. Instead, much of the wide-scale scanning on the IPv6 network will be performed through reflection from hosts discovered through their participation in other active services, for instance bit torrent, NTP, or DNS.

Conversely, the number of vulnerable IPv6 hosts will keep growing, because they can exist for much longer before anyone will find them. This will likewise increase the value that can be obtained through scanning – both to hackers, and to academics looking at Internet dynamics. We can expect to see a marketplace for addresses observed passively by ISPs, the network core, and passive services.

It’s worth also watching the watchers here: which providers are “selling me out” so to speak? It would be worth building the honey-pots to observe which services and servers leak client information and lead to probing and the potential for compromise of end hosts.
January 10, 2017
First-party Google Analytics

Third party analytics services are suffering from the growing prevalence of ad blocking, tracking protection, and the trend of minimizing connections and requests. However, from a site owner perspective, receiving usage information remains important for measuring site growth.

My expectation is that we are already on the curve where ads and tracking software will be more tightly integrated into websites and make it significantly more difficult for clients to disambiguate “good” and “bad” scripts, which is mostly done today from the URL.

Google already provides the tools needed to relay analytics communication through an intermediary server, and it took under an hour to put together a proof of concept that removes the final third-party requests that are required when viewing this page. In essence, my server proxies all the requests that would normally go to Google, and adds on a couple extra parameters to track who the real client is.

The modified loading script for google analytics, and the corresponding nginx configuration to make my server a relay are here.

November 12, 2016
Watch your PAC

In the last week at Blackhat / Defcon two groups looked deeply at one of the lesser known implementations of network policy called Proxy Autoconfig. (In particular, badWPAD by Maxim and Crippling HTTPS with unholy PAC by Safebreach.)

Proxy AutoConfig (PAC) is a mechanism used by many organizations to configure an advanced policy for connecting to the Internet. A PAC file is written in JavaScript to provide a dynamic determination of how different connections should be made, and which proxy they should use. In particular, international companies with satellite offices often find the PAC system useful in routing some traffic through a corporate proxy for compliance or geographical reasons while other traffic is routed directly to the Internet.

These two talks both focus on what a malicious individual could do to attack the standard, and each finds an interesting line of attack. The first attack is that the PAC file is allowed to make DNS requests in determining how to proxy connections, and in many browsers sees the full URL being accessed rather than only the domain. This means that even when the user is communicating with a remote server over HTTPS, the local network can learn the full URL that is being visited. The second attack has to do with where computers look for PAC files on their local network – for a file called `wpad.dat`.

While there is certainly the potential for an attacker to target a victim through these technologies, they are more accessible and arguably more valuable to an ISP or state level actor interested in passive surveillance. This explicit policy for connectivity is not inherently more invasive than policies employed by many ISPs already, and could likely be deployed on many networks without consumer push-back as a performance enhancement for better caching. It is also appropriate for targeted surveillance, since vulnerability can be determined passively.

The viability of surveillance through WPAD and PACs is a bit of a mixed bag. Most ISPs use DHCP already and set a “search domain”, which will result in a recognizable request for proxy information from vulnerable clients. While organizations often require all clients to enable discovery, this is not true of many consumer machines. Unfortunately, some versions of Windows have proxy discovery enabled by default.
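
To give a sense of what that recognizable request looks like, the sketch below lists the URLs a discovering client might try for a given search domain, walking up through the parent domains. The exact behavior differs between operating systems and browsers, so treat the function, its `min_labels` cutoff, and the example domain as illustrative assumptions.

```python
def wpad_candidates(search_domain, min_labels=2):
    """List the wpad.dat URLs a client may try, most specific first."""
    labels = search_domain.strip(".").split(".")
    urls = []
    while len(labels) >= min_labels:
        urls.append("http://wpad." + ".".join(labels) + "/wpad.dat")
        labels = labels[1:]  # fall back to the parent domain
    return urls

candidates = wpad_candidates("office.corp.example.com")
```

Whoever answers for any of those `wpad.` names, whether through DNS or by sitting on the network path, gets to hand the client a PAC file.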

The Nmap tool used for network exploration, and pitched towards use as a tool facilitating network attackers, already has support for WPAD. In contrast, the network status and monitoring tools, like Netalyzr and OONI, do not yet monitor local proxy status and won’t provide indication of malicious behavior.

August 9, 2016
Satellite at ATC

Excited to see Satellite chosen as best student paper this year at USENIX ATC. Slides and audio from the talk should be online shortly.

The CS department, as always, is on top of its news releases.

June 22, 2016