Kademlia is a description of the network interactions and rpc calls that can form a distributed hash table. Today, these kad based DHTs are one of the most common forms of distributed storage. However, since kademlia does not specify the application level protocol to make calls, it is instead implemented on top of existing application protocols, making many of the rings incompatible with each other.
I was interested specifically in the Kad ring that is used by the Mainline Bitorrent client. This storage is used in practice as a distributed source of information about torrents, so that new peers can join a swarm even when there is no tracker present. The ring is used by utorrent, the official bittorrent.com client, and the Vuze client with an optional plugin. (Vuze has a native DHT which is incompatible with other systems.)
Unsurprisingly, the protocol for this DHT is well advertised. The reason is two-fold: first, the amount of space is limited, and it is hard for clients to tell between valid and invalid data, so you don’t want to allow arbitrary data into the DHT. Secondly, the questionable legality of many torrents means that it is detrimental for a third party to have a means to easily monitor the swarm. Continue reading The “Mainline Kademlia” protocol→
The moniker distributed computing has become a broad catchall term that has been applied to everything from parallel computation to the auto configuration of ad-hoc networks. In order explore the technical problems in this field it is necessary to clearly define what exactly the field is, and to this end I want to elaborate on what I believe is and is not distributed computing.
The first use of distributed computing was to describe the problem faced by projects like SETI at home and the genome project. These organizations needed to process huge quantities of data and did not have the budget to do so internally, and instead turned to the greater population. They did not diverge from client-server architecture, but allowed any computer running their client software to request and process chunks of data. The problem that was created was not one of communication, but of trust, since none of the clients could be expected to be either reliable or trustworthy. This problem is typically solved by checking for suspicious responses, and by sending each chunk of data to multiple clients to ensure they agree on an answer.
Five years ago the term grid computing was a new big thing, though it has since been outmoded by the term cloud computing. Both of these models are related to distributed computing, in that they focus on the problem of scalability and reliability of a large number of machines. Both terms imply that the machines are entirely under ones control, with grid computing referring specifically to the use of such clusters internal to an organization and cloud computing referring to the outsourcing of that resource. The problem faced is not as different from the previous as we might initially think. Given the large number of machines and components, failure will occur regularly, and it is important to make sure data and computation are redundant and can hide these individual problems. The problem is lessened only in that nodes are typically not purposefully malicious in this scenario. Cloud computing and grid computing before it both are also systems which have been built and have essentially overcome the issues faced. Many large tech companies run enormous data centers, and consumer services such as Google’s AppEngine and Amazon’s EC2 allow consumers to outsource their computation to the cloud.
Mesh networks are another seemingly disjoint field that has traditionally been linked to distributed computing. Here, a number of computers need to work together to access a common resource, like the Internet, but don’t necessarily trust each other. The typical visualization is that only one computer will have an Internet connection, and that resource needs to be shared to computers that are not directly connected to it. The problem here is now primarily one of trust, since not only is there a likelihood of reliability issues, but each node benefits from ignoring requests from others since that leaves more bandwidth for it. There have been several implementations of mesh computing, although none that can boast tremendous success. The most known is certainly the OLPC, which attempted to use mesh computing as a solution to the intermittent access to resources found in developing countries. Beyond these bespoke solutions, there has been a 802.11s standard for mesh computing developed by the 802.11 working group, which defines a global standard and frequency for wireless communication.
To me distributed computing is both all and none of these problems. Instead I see the fundamental problem to be answered as one of cooperation. Users should be viewed as selfish, in that they want to get the most from a service while giving as little as possible. For proof of this, look at the number of successful free web services versus those that cost money. Free and ad supported continues to rule as a business model because most users are unwilling to pay more than they need to for a service. The challenge then is to get a set of strangers to cooperate so that they all get something more than they started with.
In addition to the previously mentioned fields, this problem is faced by file sharing networks. Here as much as anywhere we see the need for cooperation among selfish individuals. The goal for each user is to get data from others as quickly as possible, while sending as little as possible – both for legal and purely selfish reasons. One attempt to solve this problem have been to develop communities surrounding the technology so that past performance is reflected and good behavior encouraged, but even this is not foolproof.
For me, the goal of distributed computing is to provide a structure where users can gain the ability to burst beyond resources they control, and not worry about peers who are malicious or self-serving. I plan to delve into this problem by looking and experimenting with existing systems, looking specifically at the issue of fault tolerance – which I see as fundamental to any solution, and then building a structure of my own.
I set myself the goal of finishing the project in one day, I’ve managed to get done enough in that time period, and I’m pretty happy with how it turned out. I came up with several ideas for how to improve it in the process, namely letting you create a bookmark that immediately saved a page for a set duration without further interaction, and integration with twitter.
It’s very minimal in a lot of ways, and that’s sort of the point. It’s actually fairly easy to interface with: you give it data in one end, and when the timeout expires it spits them out as rss. I’m considering spending another day at some point to allow it to push data when the timeout expires, or to provide alternative interfaces to the resulting pages.
I am almost done rewriting my launchpad code so that it can be run without a kernel module. Instead, I’ll be using libusb, which is a reasonably common and cross-platform library for interacting with USB devices.
I’m having a couple issues with callbacks and polling, but I’m making pretty steady progress, and should have a stable version to post pretty soon.
The header for interactions with the launchpad is going to look like this:
typedef void(*launchpad_callback)(unsigned char* data, size_t len, void* user_data);
struct launchpad_handle* launchpad_register(launchpad_callback e, void* user_data);
int launchpad_write(struct launchpad_handle *dp, unsigned char* data, size_t len);
int launchpad_poll(struct pollfd* descriptors, size_t num);
You start by writing your callback function, which will be called whenever new data is available from the launchpad, or a write to the launchpad completes. Then register that to start receiving notifications. Call write to send data to the device, and use the launchpad_poll in your main loop, which externally will act as a standard system poll() call, but also handles device events.
It’s worth noting that you can play with the kernel module already by downloading the code from the project page
On the technical side, I’ve worked through a couple issues that took a bit more debugging than I really wanted, so I figured I’d post them here:
The correct formulation for libusb_lock_events appears to be put immediately before your call to poll(), and you should unlock immediately afterwards. If you lock events for the entire access time, you will find that although you’re polling, the reads and writes you initiate never get processed.
If libusb_submit_transfer() is failing with code -1, it’s possibly a IO error, meaning that you don’t have your endpoint correctly defined. For me the issue was that although my output endpoint was an interrupt type, it actually was registered as a bulk-data type. (that is, it’s address was 0x02 rather than 0x01. Checking your endpoints with lsusb -v will let you check what the actual endpoints should be.)
The new toy I got for christmas was a novation Launchpad. It’s a Midi controller developed specifically for ableton live, and is able to both generate midi data or trigger program actions.
Since my current computer is running ubuntu, I was planning to use it in that context, but the device uses its own protocol to communicate at the USB level. Luckily, the protocol is not a difficult one, in that it is stateless, and consists entirely of sending specific 3-byte sequences to the device. With that in mind, I started reading through documentation to figure out what it would take to develop a linux driver that was compatible with the protocol.
Two days later I have a reasonably stable kernel module, and a basic midi driver for the launchpad running under linux. I’ve placed the work as a google code project, and it is available for perusal: http://code.google.com/p/launchpadd/
The coding was really interesting, since it was my first experience writing a kernel module. The kernel environment has always scared me and perhaps rightly so, since I did cause several hard crashes while I was developing this code, but overall it was not unpleasant. The device did almost everything I wanted using the skeleton USB driver provided in the kernel source, with the one exception that the included driver did not support polling for events. Since I wanted to allow for non-blocking reads to the device I switched it’s queues so that it would properly hook up with the character device polling interface.
My eventual goal is to use the launchpad as a control device for my computer, showing and allowing for control of things like virtual desktop, volume, email, battery, and clipboards. I’ve already written a couple of these functions, and hope to have it essentially finished by the end of winter break when I go back to school. This has been one of the most fun programming projects I’ve undertaken in a while, since I get the excuse to write in several different contexts, and bring in several of the concepts like sockets and selecting that I got excited by in the fall.