Tag: Online

  • On Trust

    There has been a fair amount of work on UCAN (User Controlled Authorization Networks) and other types of ‘decentralized credentials’ over the last couple of years. These efforts perpetuate the same control structures that exist today, with delegated trees of hierarchical control. This is in contrast to the personal or ‘decentralized’ trust we might hope for in peer-to-peer networks. It is difficult to use DIDs, UCANs, or other proposed mechanisms for reputation and network formation without finding ourselves back to trusting an authority: these mechanisms are easily captured and naturally lend themselves to centralization of control. We need a fundamentally different trust infrastructure in order to build resilient peer-to-peer networks.

    On non-hierarchical models for trust

    The main barrier is not a technical one. We have had technical implementations (e.g. the GPG web of trust) for decades, and there is an intuitive design for how a flat trust model can be implemented. The problem lies in dissatisfaction with the emergent properties of that naive network structure. This tension has been framed in a couple of different ways. One perspective is that the user experience of bootstrapping trust is overly cumbersome, and this friction leads to an insufficiently dense trust network. A different perspective on the same tension is that a user-driven trust system is at odds with transitive or automatic trust relations, and that actions to ‘ease’ the user experience fundamentally reduce user control.

    We can find room for exploration by calling out this tension as a false dichotomy. The choice is not between a single authority and user-directed trust links; it is about how trust structures are distributed. There is space for organic, automatic ways to generate trust, and to let it be reflected and evolve, that are neither user-directed nor rooted in a single authority. The BitTorrent tit-for-tat mechanism is one form of this: protocol-compliant behavior leads to an increasing buffer for data transfer within the protocol.
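    As a rough illustration of that mechanism, here is a minimal sketch of a BitTorrent-style choking decision. A peer keeps uploading to the few peers that have recently uploaded to it fastest. It also adds one random ‘optimistic unchoke’ so that newcomers can demonstrate compliance. Several things are simplifying assumptions, not a description of any particular client: the function name, the default of four reciprocation slots, and the absence of timing (real clients re-evaluate and rotate the optimistic slot on a timer).

```python
import random
from typing import Dict, List, Optional


def choose_unchoked(download_rates: Dict[str, float], slots: int = 4,
                    rng: Optional[random.Random] = None) -> List[str]:
    """Pick which peers we upload to in the next round (hypothetical helper).

    Reciprocate with the `slots` peers that have recently sent us data the
    fastest, then add one randomly chosen "optimistic unchoke" so new or
    slower peers get a chance to show they follow the protocol.
    """
    rng = rng or random.Random()
    # Rank peers by how fast they have been uploading to us.
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = ranked[:slots]
    # Optimistic unchoke: one peer from outside the reciprocating set.
    candidates = ranked[slots:]
    if candidates:
        unchoked.append(rng.choice(candidates))
    return unchoked
```

    A peer that stops uploading falls out of the top ranks at the next evaluation. That is the protocol-internal enforcement the tit-for-tat description relies on.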

    Trust or Reputation

    There is a related notion that protocols more commonly refer to as ‘reputation’. Reputation can be viewed as a property of a node in a system rather than of an edge. For example, reputation is often constructed as a metric that is transitive, or one where a node has a single consensus value. This is different from how we normally think of our personal trust in another user.
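    To make the edge/node distinction concrete, here is a minimal sketch. Personal trust is stored per (truster, trustee) edge, while reputation collapses all incoming edges into one value per node. The names and the naive mean aggregation are hypothetical choices for illustration only.

```python
from statistics import mean
from typing import Dict, Tuple

# Trust as an edge property: each (truster, trustee) pair has its own weight.
TrustEdges = Dict[Tuple[str, str], float]


def personal_trust(edges: TrustEdges, me: str, other: str) -> float:
    """My own view of `other`; a missing edge is treated as 0.0 (no opinion)."""
    return edges.get((me, other), 0.0)


def reputation(edges: TrustEdges, node: str) -> float:
    """Reputation as a node property: one shared number for `node`.

    Here it is simply the mean of all incoming edge weights, a deliberately
    naive aggregation chosen for illustration.
    """
    incoming = [w for (_src, dst), w in edges.items() if dst == node]
    return mean(incoming) if incoming else 0.0
```

    Suppose alice trusts carol at 0.75 and bob trusts carol at 0.25. Each keeps their own view, while carol’s reputation is a single 0.5 that neither of them actually holds.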

    What exactly, then, are we trying to capture in a measure of ‘trust’? In the hierarchical systems of Web 2.0, it is meant to provide some assurance that “someone is who they say they are”. It isn’t an indication of ‘aligned beliefs’, but rather that the expected entity is behind a given identifier. The properties that come from systems like TLS and its certificate authorities (CAs) look very similar to reputation in this sense. Each individual can override and manually configure which authorities to trust. Even so, trust in this definition means confidence in adherence to the protocol, and in coherence between expectation and reality.

    Scoping trust

    A challenge we sometimes run into when talking about trust in technical networks is that our expectation of its scope is typically much narrower in digital or transactional contexts than in real life. When you refer to a person as a “trusted individual”, the implication is not only that they are not an ‘imposter’, but also that they have some level of altruism or aligned, positive motivations. Some formulations use reputation as a stand-in for this additional notion of trust. I would argue that it is better thought of as an understanding of motivations. The trust is that it is understandable what game someone is playing, what their motivations are, and thus what their rational behavior will be.

    Narrow interactions, like those scoped in technical protocols, intentionally exclude externalities, but this also makes it difficult to tell whether other nodes have ulterior motives for participating in the protocol. What a participant can learn, and what other uses can be derived from participation, is not always easy to analyze, and an incomplete analysis is unsatisfying. On the other hand, designing protocols that leak no information is difficult to impossible, and hard to justify. Even determining and understanding the risk present in a system is an expensive proposition.

    Categorizing mechanisms

    How do we build distributed mechanisms that reflect this confidence that another participant is playing the same game as us?

    If we take the narrower view of actions within the protocol, we can get to a somewhat useful taxonomy of work in this space.

    • The BitTorrent tit-for-tat algorithm treats the other participant’s demonstrated adherence to the protocol as the signal to continue the exchange.
    • Some protocols use a proof of work, or computational puzzle, as a way for participants to demonstrate that participation is worth something to them.
    • Protocols like TLS have added revocation lists, and things shaped like “proofs of bad behavior”, as ways to share knowledge of identities that have misbehaved. If the cost of creating an identity is high, and misbehavior causes “reputational damage”, following the protocol becomes the more rational choice.
    • Finally, there is a growing set of validation-based protocols. Cryptographic proofs are increasingly able to assert that a computation has been performed per the expected protocol, which reduces the space of valid-but-non-compliant actions that can be taken.
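    The proof-of-work item can be sketched in a few lines, assuming a Hashcash-style puzzle over SHA-256. The participant searches for a nonce that, hashed together with a challenge chosen by the other side, gives a digest starting with a given number of zero bits. The function names and the challenge/nonce encoding are hypothetical. Solving costs about 2^difficulty hashes on average, while verification costs one. That asymmetry is what lets the cost fall on the participant rather than the verifier.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def _digest(challenge: bytes, nonce: int) -> bytes:
    # Hash the challenge followed by the nonce as a fixed 8-byte integer.
    return hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()


def solve(challenge: bytes, difficulty: int) -> int:
    """Brute-force a nonce; expected work is about 2**difficulty hashes."""
    for nonce in count():
        if leading_zero_bits(_digest(challenge, nonce)) >= difficulty:
            return nonce
    raise RuntimeError("unreachable")  # count() never stops


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Checking a solution costs a single hash."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty
```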

    The complement to this category is protocols that make use of external costs. In many cases the cost is difficult to quantify, which makes the strength of the protocol’s trust equally difficult to model. At the same time, it allows costs to be higher than what could be built into a protocol in isolation.

    • Protocols which involve a validation of ‘real name’ (linking an ID, bank account, cell phone, etc) are able to retaliate for misbehavior using the legal system.
    • Protocols involving social graphs use the potential for negative impact on your standing with your friends.
    • Protocols requiring registration with a phone number, or that distribute their app only for mobile devices, leverage the cost of those assets as part of the account cost.

    Increasing trust

    From the previous categories we can see two levers these mechanisms end up leaning on to increase this notion of trust.

    The first is increasing the cost of defection. Raising the cost of creating or re-creating an account does this directly. Damaging a reputation or reducing utility are likewise ways to increase the cost of not following a protocol.

    The second is increasing a user’s confidence that they will be able to get resolution when another user defects. In most of the ‘in protocol’ cost models, resolution occurs as part of the protocol itself. BitTorrent won’t continue rewarding peers that aren’t honoring the tit-for-tat agreement, and a computation submitted without a valid proof transcript will be ignored. It is for out-of-protocol actions that this subjective confidence is most at issue. Actions like Facebook suspending Cambridge Analytica (and publicized moderation actions more generally) demonstrate to users that enforcement is taking place.

    Full circle

    How do we provide decentralized notions of trust that can be dense, and that mesh with protocol needs for automatic establishment?

    By ensuring that the risk associated with a trust link is less than what can be mitigated when trust is broken. This can be done in one of three ways:

    1. The benefit of breaking trust can be reduced
    2. The cost associated with punishment can be increased
    3. The regularity (or user perception) with which breaking trust leads to punishment can be increased
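    One way to formalize the three levers is a simple expected-value model; this model is an assumption of the sketch, not something the argument depends on. A rational participant breaks trust when the benefit exceeds the expected punishment, that is, the probability of enforcement times the cost of punishment. Lever 1 lowers `benefit`, lever 2 raises `punishment_cost`, and lever 3 raises `p_enforced`. The function name is hypothetical.

```python
def defection_is_rational(benefit: float, punishment_cost: float,
                          p_enforced: float) -> bool:
    """Simple expected-value model of a single trust link.

    A rational participant breaks trust when what they gain exceeds the
    punishment they expect: p_enforced * punishment_cost.
    """
    if not 0.0 <= p_enforced <= 1.0:
        raise ValueError("p_enforced must be a probability in [0, 1]")
    return benefit > p_enforced * punishment_cost
```

    Each lever on its own can flip the outcome. With a benefit of 30, a punishment cost of 100 and a 25% enforcement rate, defection pays. Doubling either the punishment cost or the enforcement rate makes it irrational.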

    In practice, the hesitancy to form a mesh network most often comes from the lack of a concretely defined threat model. When a protocol comes with a well-scoped definition of misbehavior, it is typically much easier to enforce compliance and to frame the protocol in a way that provides comfort to participants.

    It’s worth noting that we are often concerned with one of the hardest forms of this scenario: balancing the ease of participation in a system against indirect and difficult-to-identify surveillance risks. Concrete examples of this tension are nation-state identification of Tor users, RIAA identification of BitTorrent users, or IRS identification of cryptocurrency users. In all of these cases, a user joining the protocol may behave as normal, but may also record the network identifiers of other participants they encounter. An unaccountable, out-of-protocol leak of these identifiers then leads to repercussions for other participants. I don’t know if the preceding discussion is the best framing for this specific case. I think it can still be used as a lens. The interesting question here is mostly around the first lever, reducing the benefit of breaking trust, and in particular reducing the signal such an attack gains from an initial level of participation in the protocol.

  • Retrieval Constraints

    A couple of months ago I wrote up some of the edges I’ve encountered in thinking about how to structure decentralized data transfer systems. These extend the limitations first encountered in BitTorrent-style tit-for-tat exchanges. Those limitations have since grown into a much more extensive field looking at incentives and other mechanisms that can be leveraged to create robust systems.

    See the long-form essay on Mirror.

    My top takeaway from this line of thought is that, within our initial framing of how data transfer might happen, we still end up relying on reputation. We use it to estimate how transferable experience is, and to estimate whether past behavior will carry over to subsequent performance.

  • Private Retrieval

    It’s very exciting to have a public face to the thoughts around how to enable effective private access to data.

    Research Announcement

    EthCC Announcement

    The basic hypothesis here is that there’s a high-leverage opportunity to attract thought around scaling the range of anonymous database or data transfer techniques, to reach something with better properties than the systems we have today.

    I’ve already learned a lot about what goes into running a grant fund through my minor involvement helping to set up this program. I’m excited to see the next stage of its lifecycle as we begin to engage with proposals and grantees.

  • Building Decentralization

    I talked earlier this week about some of the current problems in decentralization at the rC3 event. It’s easy to be pessimistic about the current siloed technological landscape, but decentralized platforms are continuing to make progress and there’s reason to be hopeful. At the same time, there’s a green field of many more decentralized protocols to discover and define beyond the current notions of DHTs and consensus protocols.

    The rC3 event was a great commemoration of the traditional Chaos Communication Congress. The extent of culture and community brought into the 2D virtual world managed to capture some of the essence of the in-person event. Like the real events, it was a great opportunity for mixing whimsy and technical learning. In that spirit, I rehashed some measurement work to generate the following statistics about the event:

    • The most common character accessory was a mask, worn by 30% of participants.
    • The badge shown on the most user profiles was ‘On Webcam’, a badge I awarded to a scraped list of usernames on the second day of the event. It was about 3x more popular than the second most popular badge, received for visiting the CERT, which only functioned near the end of the event.
    • A total of 385 badges were awarded and publicly displayed on user profile pages.
    • A total of 334 distinct pronouns were used by users. Only 5 of them were attempts at cross-site scripting attacks.
    • The user population was approximately that of recent in-person events. Of those users, my measurement estimated that about one third participated in the 2D virtual world portion of the event.
    • There were only 2 users who used the same description of themselves on their profiles: ‘Moin!’

    These statistics come from a fairly simple script that measured user pages near the end of the event. User IDs were largely sequential and could be enumerated without issue. This was needed as a step in awarding badges, which could only be done with the non-enumerable “usernames”, rather than these User IDs. One of the things that makes the CCC events unique is their transience, which allows for a safer form of expression than our more usual permanently logged and recorded online experience. In that spirit I have subsequently deleted my collected list of usernames and saved only these summary statistics.
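    Here is a sketch of what such a script might look like, with the network access left out. `fetch` stands in for a hypothetical function that returns a profile dict for a user ID, or None if the ID is unused. The field names `pronouns` and `description` are assumptions about the page structure, not the event’s actual schema. The `<script` check is only a crude heuristic for flagging cross-site scripting attempts. Only aggregate counts leave `summarize`, in keeping with discarding the raw list afterwards.

```python
from collections import Counter
from typing import Callable, Dict, Iterator, List, Optional

Profile = Dict[str, str]


def crawl(fetch: Callable[[int], Optional[Profile]], start: int = 1,
          max_misses: int = 50) -> Iterator[Profile]:
    """Walk sequential user IDs, stopping after a run of unused IDs."""
    misses, user_id = 0, start
    while misses < max_misses:
        profile = fetch(user_id)
        if profile is None:
            misses += 1
        else:
            misses = 0
            yield profile
        user_id += 1


def summarize(profiles: List[Profile]) -> Dict[str, object]:
    """Reduce scraped profiles to aggregate counts only."""
    pronouns = Counter(p["pronouns"] for p in profiles if p.get("pronouns"))
    descriptions = Counter(p["description"] for p in profiles
                           if p.get("description"))
    return {
        "users": len(profiles),
        "distinct_pronouns": len(pronouns),
        # Crude heuristic for cross-site scripting attempts.
        "pronouns_with_script_tags": sum(
            1 for s in pronouns if "<script" in s.lower()),
        # Descriptions used by more than one user.
        "shared_descriptions": {d: n for d, n in descriptions.items() if n > 1},
    }
```

    Stopping only after a run of missing IDs handles the ‘largely sequential’ case, where a few unused or deleted IDs leave small gaps.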

  • Corporate Censorship

    One of the most interesting lines of inquiry within the Censored Planet project at the University of Michigan is pulling apart the different actors involved in Internet censorship. One quirk is that content is often unavailable to users simply because the web publishers themselves have limited who they’ll respond to.

    This relates to existing phenomena like the increasing balkanization of the web, where regions and nations promote domestic services and networks. But it is as much a function of where lucrative markets are, and a reaction to the background of fraud and malicious online traffic.

    One outcome of this research is a set of measurements looking at how and where CDNs limit access, that will be presented tomorrow at IMC.

    As with many parts of the Internet, one takeaway here is that attribution is hard.

  • Scalable Remote Measurement of Application-Layer Censorship

    It’s quite exciting to see another step in remote measurement systems at USENIX Security in August. This particular piece is on how to recover deep packet inspection (DPI) policies at scale.

  • Messaging Threat models

    I talked yesterday at BornHack about the current state of secure messaging, and the different primitives and threats that groups are working to address.

    The talk is on YouTube.

    The slides are on this site, as are the directions for dogfooding the Talek system.

  • Initial Measurements of the Cuban Street Network

    Internet access in Cuba is severely constrained, due to limited availability, slow speeds, and high cost. Within this isolated environment, technology enthusiasts have constructed a disconnected but vibrant IP network that has grown organically to reach tens of thousands of households across Havana. We present the first detailed characterization of this deployment, which is known as the SNET, or Street Network. Working in collaboration with SNET operators, we describe the network’s infrastructure and map its topology, and we measure bandwidth, available services, usage patterns, and user demographics. Qualitatively, we attempt to answer why the SNET exists and what benefits it has afforded its users. We go on to discuss technical challenges the network faces, including scalability, security, and organizational issues. To our knowledge, the SNET is the largest isolated community-driven network in existence, and its structure, successes, and obstacles show fascinating contrasts and similarities to those of the Internet at large.

    Talks

    The Internet in Cuba: A Story of Community Resilience. Chaos Communication Congress. 2017

    Publication

    Pujol, Eduardo E. P., Will Scott, Eric Wustrow, and J. Alex Halderman. “Initial Measurements of the Cuban Street Network.” In Proceedings of the 2017 Internet Measurement Conference, pp. 318-324. ACM, 2017. Slides

  • TapDance at Scale

    I’m excited that the first project I helped with at Michigan will be presented at FOCI next month: An ISP-Scale Deployment of TapDance

  • IETF 98

    Last week I talked briefly about the state of open internet measurement for network anomalies at IETF 98. This was my first time attending an IETF in-person meeting, and it was very useful in getting a better understanding of how to navigate the standards process, how it’s used by others, and what value can be gained from it.

    A couple highlights that I took away from the event:

    There’s a concern throughout the IETF about fixing the privacy leaks in existing protocols for general web access. Three major points in the protocol stack need to be addressed and are under discussion as part of this. The first is coming up with a successor to DNS that provides confidentiality; this, I think, is going to be the most challenging point. The second is coming up with an SNI equivalent that doesn’t send the requested domain in plain text. The third is adapting the current public Certificate Transparency process to keep confidential the specific domains that are issued certificates, while maintaining the accountability the system provides.

    Confidential DNS

    There are two proposals with traction for encrypting DNS that I’m aware of. Neither fully solves the problem, but both provide reasonable ways forward. The first is DNSCrypt, a protocol with support from entities like Yandex and Cloudflare. It maintains a stateless UDP protocol, and encrypts requests and responses against server and client keys. There are working client proxies for most platforms, although installation on mobile is hacky, and a set of running providers. The other alternative is DNS over TLS, which was represented at IETF and seems to be preferred by the standards community. The benefit here is that there’s no new protocol, meaning less code that needs to be audited to gain confidence in the security properties of the system. There are some working servers and client proxies available for it, but the community unfortunately seems more fragmented.
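    The ‘no new protocol’ point is easy to see in code. DNS over TLS (RFC 7858) sends an ordinary DNS wire-format message inside a TLS connection to port 853. The message carries the same two-byte length prefix DNS already uses over TCP. The minimal sketch below builds an A-record query by hand and sends it with the Python standard library’s `ssl` module. The function names are hypothetical, the query ID should be random in real use, and response parsing is omitted.

```python
import socket
import ssl
import struct

DOT_PORT = 853  # port assigned to DNS over TLS by RFC 7858


def build_query(qname: str, qid: int = 0, qtype: int = 1) -> bytes:
    """Build a plain DNS query message (qtype 1 = A record, class IN)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/AR=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def frame(message: bytes) -> bytes:
    """DNS over a stream transport prefixes each message with a 2-byte length."""
    return struct.pack("!H", len(message)) + message


def _recv_exact(sock: ssl.SSLSocket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def query_dot(server_ip: str, server_name: str, qname: str,
              timeout: float = 5.0) -> bytes:
    """Send one query over TLS and return the raw DNS response bytes.

    The link is encrypted, but the resolver still sees every name asked.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((server_ip, DOT_PORT), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=server_name) as tls:
            tls.sendall(frame(build_query(qname)))
            (length,) = struct.unpack("!H", _recv_exact(tls, 2))
            return _recv_exact(tls, length)
```

    `query_dot` only protects the query on the wire. The resolver at the other end still decrypts and sees every name the client asks about.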

    The problem that isn’t yet addressed is that you still need to trust some remote party with your DNS queries. Neither protocol changes the underlying model, in which DNS resolution is performed by a resolver chosen by the local network. Current proxies let the client choose that resolver instead. That doesn’t remove the trust issue, doesn’t work well with captive portals, and doesn’t scale to widespread deployment. It also doesn’t prevent that third party from tracking the chain of DNS requests made by the client and getting a pretty good idea of what the client is doing.

    Hidden SNI

    SNI, or Server Name Indication, is a field the client sends at the beginning of an HTTPS connection, in the TLS handshake, to tell the server which domain it wants to talk to. This is a critical part of the protocol, because it allows a single IP address to host HTTPS servers for multiple domains. Unfortunately, because it is sent unencrypted, it also allows the network to detect and potentially block requests at domain rather than IP granularity.
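    To show why the plaintext SNI is so convenient for a network filter, here is a minimal sketch of pulling the server name out of a captured TLS ClientHello. It assumes a single, well-formed record and that the first server_name entry is a host name. Real middleboxes must also handle fragmentation and malformed input. Both function names are hypothetical. `toy_client_hello` builds a stripped-down ClientHello for testing and is not a usable handshake.

```python
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the host name from a TLS ClientHello record, if present."""
    if len(record) < 6 or record[0] != 0x16:  # 0x16 = handshake record
        return None
    body = record[5:]  # skip record header: type, version, length
    if body[0] != 0x01:  # 0x01 = ClientHello
        return None
    pos = 4          # skip handshake type and 3-byte length
    pos += 2 + 32    # client_version and random
    pos += 1 + body[pos]                                 # session_id
    pos += 2 + int.from_bytes(body[pos:pos + 2], "big")  # cipher_suites
    pos += 1 + body[pos]                                 # compression_methods
    ext_end = pos + 2 + int.from_bytes(body[pos:pos + 2], "big")
    pos += 2
    while pos + 4 <= ext_end:
        ext_type = int.from_bytes(body[pos:pos + 2], "big")
        ext_len = int.from_bytes(body[pos + 2:pos + 4], "big")
        pos += 4
        if ext_type == 0:  # server_name extension, sent in the clear
            # list length (2 bytes), name_type (1 byte), name length (2 bytes)
            name_len = int.from_bytes(body[pos + 3:pos + 5], "big")
            return body[pos + 5:pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None


def toy_client_hello(hostname: str) -> bytes:
    """Build a minimal ClientHello carrying only an SNI extension (for tests)."""
    name = hostname.encode("ascii")
    entry = b"\x00" + len(name).to_bytes(2, "big") + name  # host_name entry
    sni = len(entry).to_bytes(2, "big") + entry
    ext = (0).to_bytes(2, "big") + len(sni).to_bytes(2, "big") + sni
    hello = (b"\x03\x03" + bytes(32)   # client_version, all-zero random
             + b"\x00"                 # empty session_id
             + b"\x00\x02\x00\x2f"     # one cipher suite (arbitrary)
             + b"\x01\x00"             # one compression method: null
             + len(ext).to_bytes(2, "big") + ext)
    handshake = b"\x01" + len(hello).to_bytes(3, "big") + hello
    return b"\x16\x03\x01" + len(handshake).to_bytes(2, "big") + handshake
```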

    Proposals for encrypting the SNI have been around for a couple of years. Unfortunately, they were not included in TLS 1.3, which means it will be a while before the next iteration of the standard offers a chance to include this update.

    The good news is that there seems to be continued interest in finding ways to protect the SNI of client requests, though there is no current proposal that I’m aware of.

    Certificate Transparency Privacy

    Certificate Transparency (CT) is an addition to the HTTPS system that enforces additional accountability on the certificate authority (CA) system. It requires CAs to publicly log all the certificates they issue, so that third parties can audit the logs and make sure no certificates have been secretly mis-issued. While a great feature for accountability and web security, it also opens an additional channel through which the list of domains with SSL certificates can be enumerated. This includes internal or private domains that the owner would like to remain obscure.
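    The enumeration risk needs no special access. Anyone can page through a log’s public entries (RFC 6962 defines a `get-entries` endpoint) and extract the DNS names from each certificate. The sketch below assumes that certificate-parsing step has already produced a list of names, and just filters for everything under a target domain. The function name is hypothetical.

```python
from typing import Iterable, Set


def subdomains_from_ct(logged_names: Iterable[str], domain: str) -> Set[str]:
    """Collect every logged DNS name that sits under `domain`."""
    suffix = "." + domain.lower().rstrip(".")
    found: Set[str] = set()
    for name in logged_names:
        name = name.lower().rstrip(".")
        if name.endswith(suffix):
            found.add(name)
    return found
```

    A name like an internal VPN host shows up this way as soon as it gets a publicly trusted certificate, even if it never appears in public DNS.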

    Google and others have moved to require CT logging from all authorities, through browser requirements for certificate validity. As a result, this issue is again at the fore. There’s been work on addressing this problem, including a cryptographic proposal and the IETF proposal for domain label redaction, which seems to be advancing through the standards process.
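    The idea behind label redaction can be sketched as hiding every label to the left of the registered domain. The log then still shows which organization a certificate was issued for, without revealing its internal hostnames. Three things here are illustrative assumptions: the `?` placeholder, the function name, and the assumption that the caller already knows the registered domain. The actual proposal defines its own encoding, and how the redacted and full certificates are tied back together, which this sketch ignores.

```python
def redact_labels(fqdn: str, registered_domain: str, placeholder: str = "?") -> str:
    """Replace every label left of `registered_domain` with a placeholder."""
    fqdn = fqdn.lower().rstrip(".")
    base = registered_domain.lower().rstrip(".")
    if fqdn == base:
        return fqdn  # nothing private to hide
    if not fqdn.endswith("." + base):
        raise ValueError(f"{fqdn} is not under {base}")
    private_part = fqdn[: -len(base) - 1]  # e.g. "vpn.internal"
    hidden = [placeholder] * len(private_part.split("."))
    return ".".join(hidden) + "." + base
```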

    There’s still a way to go in migrating to protocols that provide some protection against a malicious network, but there’s willingness and work to get there, which is at least a start.