Tag: programming

  • Pretty excited about the data visualization course at UW this winter.

  • Distributed Computing so far

    I gave a half-hour presentation on my research at the admitted students weekend earlier.  I think I was able to convey some sense of the opportunities that Mudd offers for research, and presented a reasonable face of what it could look like.

    The talk was an explanation of the concepts behind distributed hash tables, and an introduction to the work I’ve been doing with bitTorrent.

    These are the slides I used: Admitted Students presentation

  • Experiments in Data Moshing

    Some work I’ve been doing for my experimental animation class.

  • Downloading Torrents

    I extended the work from last week in order to actually get data from a swarm. The main change is that new sockets are now allocated for each connection, state is remembered, and the client can actually get so far as to download data from the other peers.

    What needs to happen still is that 1. I need a better way of stopping, right now it stops upon a lull in network activity, but that doesn’t really mean anything. 2. I need to actually save downloaded data somewhere, or alternatively keep track of the rate that it’s coming in (or the time delay between packets.) 3. I need to factor out the different message types into separate modules to keep the code readable.

    I’m going to set up a code page for this project soon, so that I don’t have to post stuff just in this blog. I should get to that in the coming week, so that there’s a nicer way of interacting with this code base. This model isn’t super memory efficient, but it is a very simple system to work with, and I think it’s become a good jumping off point for trying to interact with swarms. In particular it’s really easy to add in hooks to get methods to run when messages are received, and to store arbitrary state about connections.

    [python]
    # Standalone Torrent Auditor
    #
    import socket
    import time
    import sys
    import getopt
    import random
    import benc
    import binascii
    import select
    import hashlib
    import urllib
    import server

    #a bit of global state about who we are
    client = "AZ"+chr(0x05)+"31"
    myID = "".join(chr(random.randrange(0, 256)) for i in xrange(20))
    peerCache=[]

    #load in a .torrent file
    def readFile(filePath):
    f = open(filePath, ‘r’)
    data = ”.join(f.readlines())
    f.close()
    return benc.bdecode(data)

    #register with the tracker to generate potential peers
    def register(torrent,myPort):
    url = torrent[‘announce’];
    ihash = hashlib.sha1(benc.bencode(torrent[‘info’])).digest();
    query = urllib.urlencode({‘info_hash’:ihash,
    ‘peer_id’:myID,
    ‘port’:myPort,
    ‘uploaded’:0,
    ‘downloaded’:0,
    ‘left’:0,
    ‘compact’:1,
    ‘event’:’started’});
    url += "?"+query;
    trackerhandle = urllib.urlopen(url);
    trackerdata = ”.join(trackerhandle.readlines());
    trackerhandle.close();
    parseddata = benc.bdecode(trackerdata);
    initialnodes = parseddata[‘peers’];
    peers = [];
    while len(initialnodes) > 5:
    ip = initialnodes[0:4];
    port = initialnodes[4:6];
    initialnodes = initialnodes[6:];
    peers.append({‘state’:0,
    ‘ip’:socket.inet_ntoa(ip),
    ‘ihash’:ihash,
    ‘port’:ord(port[0])*256+ord(port[1]),
    ‘buffer’:”,
    ‘callback’:handleData});
    return peers;

    #register a new incoming connection
    def onNew(socket,(ip,port)):
    obj = {‘socket’:socket,
    ‘ip’:ip,
    ‘port’:port,
    ‘state’:1,
    ‘buffer’:”,
    ‘callback’:handleData};
    handleData(obj)
    server.track_socket(obj);

    #reply to a socket with a torrent message
    def reply(socket,mtype,payload):
    #first flatten the payload
    s = ”;
    while len(payload):
    p = payload.pop();
    if isinstance(p,int):
    s += struct.pack(‘!i’,p)
    else:
    s += p
    s = chr(mtype) + s;
    l = len(s);
    pl = struct.pack(‘!i’,l) + s;
    socket.sendall(ppl);
    return pl;

    #parse torrent msg
    def handleMsg(obj,msg):
    if msg[0] == 0: #choke
    obj[‘state’] &= ~4
    elif msg[0] == 1: #unchoke
    obj[‘state’] |= 4
    #and we’d like to request a piece from them.
    reply(obj[‘socket’],6,[0,0,2<<15])
    elif msg[0] == 2: #interested
    obj[‘state’] |= 8
    elif msg[0] == 3: #uninterested
    obj[‘state’] &= ~8
    elif msg[0] == 4: #have
    idx = struct.unpack(‘!i’,msg[1:5])
    obj[‘have’][idx/8] |= (1 << (8 – (idx%8)))
    elif msg[0] == 5: #bitfield
    obj[‘have’] = msg[1:]
    elif msg[0] == 6: #request
    #ignored
    return;
    elif msg[0] == 7: #piece
    #got our data
    print "succeeded in downloading!"
    elif msg[0] == 8: #cancel
    #we aren’t in that business
    return;
    #parse incoming data
    def handleData(obj):
    try: nbuf = obj[‘socket’].recv(4096)
    except socket.error, err:
    print "disconnected %s" %obj[‘ip’]
    server.closed_socket(obj[‘socket’])
    return;
    if not nbuf:
    print "disconnected %s" % obj[‘ip’]
    server.closed_socket(obj[‘socket’])
    return;
    data = obj[‘buffer’] + nbuf;
    #Handshake
    if obj[‘state’] &2 == 0:
    if data[0] == chr(19) and len(data) >= 68:
    obj[‘ihash’] = data[28:48]
    obj[‘peerid’] = data[48:68]
    obj[‘buffer’] = data[68:]
    obj[‘state’] += 2
    if obj[‘state’] & 1== 0:
    #we need to respond to the handshake
    obj[‘socket’].sendall(handshake(obj[‘ihash’]))
    obj[‘state’] += 1
    print "shook hands with %s" % obj[‘ip’];

    #all other messages are prefixed by their length
    else:
    mlen = struct.unpack(‘!i’,data[0:4])[0]
    while len(data) > mlen:
    msg = data[1:mlen+1]
    data = data[mlen+1:]
    if len(msg):
    #Actual message received
    handleMsg(obj,msg);
    if len(data):
    mlen = ord(data[0]);
    else:
    break;
    #save the rest of the data
    obj[‘buffer’] = data
    print "unknown message %s"%data

    #define the handhsake
    def handshake(ihash):
    announce = chr(19) + ‘BitTorrent protocol’
    announce += chr(0x0)*8
    announce += ihash
    announce += myID
    return announce

    #talk with cached peers
    def onTimeout():
    global peerCache;
    inited = 0;
    if len(peerCache) == 0:
    return True;
    for i in range(5):
    if len(peerCache) == 0:
    break;
    obj = peerCache.pop()
    obj = server.track_socket(obj)
    if not obj:
    continue;
    obj[‘socket’].sendall(handshake(obj[‘ihash’]))
    obj[‘state’] &= 1
    return False;

    def usage():
    global peerCache;
    print "Usage:";
    print "client –file=loc.torrent";
    print "Will report on statistics for the desired torrent";

    def main():
    filePath = "default.torrent";
    try:
    opts, args = getopt.getopt(sys.argv[1:], "hf:", ["help", "file="])
    except getopt.GetoptError, err:
    # print help information and exit:
    print str(err) # will print something like "option -a not recognized";
    usage();
    sys.exit(2);
    for o, a in opts:
    if o in ("-h", "–help"):
    usage();
    sys.exit();
    elif o in ("-f", "–file"):
    filePath = a;
    else:
    assert False, "unhandled option";
    print "Loading Info… ",
    info = readFile(filePath);
    print "okay";
    port = 6886;
    print "Detecting Swarm… ",
    seeds = register(info,port);
    print len(seeds), " peers returned";
    peerCache.extend(seeds);
    print "Entering Main Loop";
    onTimeout();
    server.main_loop(onTimeout,onNew,port);
    print "Finished Snapshot";

    if __name__ == "__main__":
    main()
    [/python]

  • The “Mainline Kademlia” protocol

    Kademlia is a description of the network interactions and rpc calls that can form a distributed hash table. Today, these kad based DHTs are one of the most common forms of distributed storage. However, since kademlia does not specify the application level protocol to make calls, it is instead implemented on top of existing application protocols, making many of the rings incompatible with each other.

    I was interested specifically in the Kad ring that is used by the Mainline Bitorrent client. This storage is used in practice as a distributed source of information about torrents, so that new peers can join a swarm even when there is no tracker present. The ring is used by utorrent, the official bittorrent.com client, and the Vuze client with an optional plugin. (Vuze has a native DHT which is incompatible with other systems.)

    Unsurprisingly, the protocol for this DHT is well advertised. The reason is two-fold: first, the amount of space is limited, and it is hard for clients to tell between valid and invalid data, so you don’t want to allow arbitrary data into the DHT. Secondly, the questionable legality of many torrents means that it is detrimental for a third party to have a means to easily monitor the swarm.
    (more…)

  • The One-Day Website

    SetTimeout Logo
    I spent today building http://setTimeout.net, a website that I was inspired to create yesterday evening.

    I set myself the goal of finishing the project in one day, I’ve managed to get done enough in that time period, and I’m pretty happy with how it turned out. I came up with several ideas for how to improve it in the process, namely letting you create a bookmark that immediately saved a page for a set duration without further interaction, and integration with twitter.

    The goal of the site is to work like javascript’s setTimeout(); function. You pass it a URL, and a time (in days), and the url will pop up in your news reader after the timeout expires. It’s useful if you want to check on the status of a project, but it isn’t interesting enough to monitor constantly, or if you find an interesting website that isn’t loading.

    It’s very minimal in a lot of ways, and that’s sort of the point. It’s actually fairly easy to interface with: you give it data in one end, and when the timeout expires it spits them out as rss. I’m considering spending another day at some point to allow it to push data when the timeout expires, or to provide alternative interfaces to the resulting pages.

  • Launchpad Update

    I am almost done rewriting my launchpad code so that it can be run without a kernel module. Instead, I’ll be using libusb, which is a reasonably common and cross-platform library for interacting with USB devices.

    I’m having a couple issues with callbacks and polling, but I’m making pretty steady progress, and should have a stable version to post pretty soon.

    The header for interactions with the launchpad is going to look like this:

    typedef void(*launchpad_callback)(unsigned char* data, size_t len, void* user_data);

    struct launchpad_handle* launchpad_register(launchpad_callback e, void* user_data);

    int launchpad_write(struct launchpad_handle *dp, unsigned char* data, size_t len);
    int launchpad_poll(struct pollfd* descriptors, size_t num);

    void launchpad_deregister(struct launchpad_handle* dp);

    You start by writing your callback function, which will be called whenever new data is available from the launchpad, or a write to the launchpad completes. Then register that to start receiving notifications. Call write to send data to the device, and use the launchpad_poll in your main loop, which externally will act as a standard system poll() call, but also handles device events.

    It’s worth noting that you can play with the kernel module already by downloading the code from the project page

    On the technical side, I’ve worked through a couple issues that took a bit more debugging than I really wanted, so I figured I’d post them here:

    The correct formulation for libusb_lock_events appears to be put immediately before your call to poll(), and you should unlock immediately afterwards. If you lock events for the entire access time, you will find that although you’re polling, the reads and writes you initiate never get processed.

    If libusb_submit_transfer() is failing with code -1, it’s possibly a IO error, meaning that you don’t have your endpoint correctly defined. For me the issue was that although my output endpoint was an interrupt type, it actually was registered as a bulk-data type. (that is, it’s address was 0x02 rather than 0x01. Checking your endpoints with lsusb -v will let you check what the actual endpoints should be.)

  • Novation Launchpad

    The Novation Launchpad
    Launchpad

    Information here is outdated. Look instead at the Latest Progress.

    The new toy I got for christmas was a novation Launchpad.  It’s a Midi controller developed specifically for ableton live, and is able to both generate midi data or trigger program actions.

    Since my current computer is running ubuntu, I was planning to use it in that context, but the device uses its own protocol to communicate at the USB level.  Luckily, the protocol is not a difficult one, in that it is stateless, and consists entirely of sending specific 3-byte sequences to the device.  With that in mind, I started reading through documentation to figure out what it would take to develop a linux driver that was compatible with the protocol.

    Two days later I have a reasonably stable kernel module, and a basic midi driver for the launchpad running under linux.  I’ve placed the work as a google code project, and it is available for perusal: http://code.google.com/p/launchpadd/

    The coding was really interesting, since it was my first experience writing a kernel module.   The kernel environment has always scared me and perhaps rightly so, since I did cause several hard crashes while I was developing this code, but overall it was not unpleasant.  The device did almost everything I wanted using the skeleton USB driver provided in the kernel source, with the one exception that the included driver did not support polling for events.  Since I wanted to allow for non-blocking reads to the device I switched it’s queues so that it would properly hook up with the character device polling interface.

    My eventual goal is to use the launchpad as a control device for my computer, showing and allowing for control of things like virtual desktop, volume, email, battery, and clipboards.  I’ve already written a couple of these functions, and hope to have it essentially finished by the end of winter break when I go back to school.  This has been one of the most fun programming projects I’ve undertaken in a while, since I get the excuse to write in several different contexts, and bring in several of the concepts like sockets and selecting that I got excited by in the fall.