Tag Archives: tech

Experimenting with 802.11s

This document explains what the IEEE 802.11s standard defines, and the state of current implementations based on it. 802.11s is the standardization of mesh networking. That is to say, it defines a protocol by which mobile devices can communicate amongst themselves even when not directly connected, and can route data beyond their immediate environment.
This work has been ongoing for close to a decade, and has been progressing as a single proposal for the last four years. The standard covers a complete system, specifying at the MAC level how data will be transferred. This includes how clients can securely create connections, and how and when they should forward packets they receive which are meant for other devices in the mesh. As a result, the standard has become effectively all-encompassing, defining everything from the hardware signal levels all the way up to complex software routing and security.
Even today, the standard is not finalized, meaning that nothing is fixed, and implementers are very cautious about implementing mesh networking, because the standard may change and they would then have to rewrite or redesign their implementations. In particular, wireless cards are not being built with mesh networking in mind, since chip design is more expensive and less changeable than software drivers. The lowest-level changes are made to the wireless frames in order to allow hops within the mesh, and would benefit from hardware support to make them fast; however, they still work when processed entirely in software, which is the choice current implementations are making until the standard is formalized.
Routing can be specified either explicitly, as a tree known to the source that extends to the root (or final destination in the mesh), or ad hoc, where intermediate nodes make their own decisions on how to route the packet. In either case, routing uses a QoS-like process: the route is first established, and subsequent packets use the same path until it needs to be recalculated due to topological changes. In addition, there is a periodic election of which node in the mesh should act as the root, and of the ordering of nodes, based on which node has wired access to other networks and tie-broken by MAC addresses. Beyond this, these networks can take on different levels of structure, similar to the differences we see today between ad-hoc networks and networks with access points. These are called meshes with and without registration. Without registration, any node can come and go, and the rest of the mesh is unaware of any topology beyond that directly adjacent to it. In registration mode, the root is aware of the entire topology of the network, and can make informed choices about how routes should be created.
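As a rough sketch of that election logic (the node representation, field names, and tie-break direction here are my assumptions for illustration, not frames or procedures from the standard):

```python
def elect_root(nodes):
    # Prefer nodes with a wired "portal" to other networks; among
    # equals, break the tie with the lowest MAC address. This mirrors
    # the behavior described above, not the literal 802.11s election.
    return min(nodes, key=lambda n: (not n["portal"], n["mac"]))

mesh = [
    {"mac": "66:44:33:22:11:00", "portal": True},
    {"mac": "00:11:22:33:44:55", "portal": False},
    {"mac": "0a:bb:cc:dd:ee:ff", "portal": True},
]
root = elect_root(mesh)  # the wired node with the lowest MAC wins
```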
While the major protocol specification does not seem to be changing much at present, there is still enough flux in the details that implementations are not compatible with each other, and are still very rudimentary. I looked at three different implementations, none of which are inter-compatible, though all claim to be based on the standard.
The first is the implementation of mesh networks used in the OLPC. This was one of the first implementations of mesh networking, and is not compatible with any widely used implementation outside of those devices. It works in OLPC clusters, and is a technical feat, although since the central points that were intended to ground the mesh networks have not yet materialized, it is quite underutilized as I understand it. Instead, in the situations where OLPCs have been deployed, a standard 802.11b network typically works well since all of the machines are close enough to the central access point.
The second is the open 802.11s project, which is designing a software driver for the Linux kernel. This version currently works for a limited set of wireless cards, whose Linux drivers are sufficiently open to allow the needed modifications. It is not included as part of the standard kernel distribution, which is a fairly powerful comment on the state of its development, since the kernel is typically thought of as including everything and the kitchen sink. It is simply the reference implementation being created as part of the development of the standard.
The third implementation is available in the latest version of FreeBSD. This version is included in the distribution, and its documentation notes that it does not interoperate with the Linux version, given how primitive both still are. I was unable to test this implementation as I had hoped, because the emulated lab I was using didn't allow the access to the wireless drivers that the code required. However, this code promises to be one of the most practical current versions of mesh networking. Setup is performed through the standard ifconfig tool: where one specifies the access point in other protocols, here one specifies the mesh to join and the parameters associated with it. Once the mesh is specified, routes are formed automatically, and users can manually specify nodes they wish to block or favor using the included tools.
I think the important takeaway from this exploration is the state of mesh networking itself. The IEEE process here has been a hindrance to development, and has trouble keeping up with the technological developments going on in this area. Enough new research keeps coming in that the standard needs constant updating to keep from being outdated, which in turn means vendors are hesitant to ship devices that may end up incompatible, and keeps the whole field from the popularity it needs to really succeed. I think we have had the opportunity to settle on an imperfect draft of this standard and actually have widespread mesh networking, but because it becomes useful only if everyone does the same thing, we have been very hesitant to do so.
Hopefully, as smart phones continue to evolve, we will begin to see mesh networking standardized in those devices, as the potential it holds becomes increasingly important to our communication. Given the troubles cell companies have had in recent years keeping their access-point bandwidth in line with consumer demand, the current model feels unsustainable, and will hopefully give way to an increasing desire for truly distributed communication, for both better reliability and bandwidth.

on p2p in the browser

The final project I want to accomplish this semester in my distributed computing independent study is to come up with a way to access p2p capabilities from within the browser. I was originally hoping to do this with a JavaScript API that relied behind the scenes on the new Flash p2p protocol, RTMFP. The protocol has been around since last spring, but developing for it is not exactly pleasant. You need to be using Flash to make the movie, and you need a proprietary and expensive piece of server software to coordinate data transfers. The protocol itself is still proprietary, meaning that Adobe won't actually tell you how to make a server that can work with it, or how data is sent across the wire.

They have finally started advertising it at least, so it will gain some prominence, and eventually the details will come to light, but for now it is unrealistic to attempt to use Flash for a general, server-agnostic service. (http://www.ietf.org/proceedings/10mar/slides/tsvarea-1.pdf)

Instead, my plan is to modify the open source Privoxy software and add p2p capabilities at that layer. My eventual goal is to make a communal Greasemonkey-like system, where pages can be modified by the user through pieces of JavaScript, and those scripts can then be shared with friends. With the right level of abstraction, I think this can produce a very powerful system. Starting next week, I'll begin reading and hacking on Privoxy to find out how to integrate new code into that project.

Cornell Visit Days

This is just a recounting of my visit days at cornell, so that I can keep it fresh in my head when I want to reflect on it in the future.

Flight sucked – sat in pain with a migraine between Cleveland and Syracuse – it let up about when we landed, and I'm blaming it on mild food poisoning from the turkey sandwich I ate on the way to Cleveland.

The driver from the airport was a really nice guy, (me and another systems guy (wants to research planetary systems especially filesystems) got driven in at the same time) and pointed out all the interesting sites between Syracuse and Ithaca. He regularly drives for cornell events, so knew a lot about the place. Some random points of interest:
* There is some skiing, but it’s not going to be comparable to stuff on the west coast
* It’s a very definite bubble, which really doesn’t interact with the area around it
* a fair amount of interaction with people from new york

Hotel was nice, as much as you would want from a hotel. No problems getting checked in.
Went down to the hotel bar where people were hanging out. talked to a couple interesting people:
* A guy from princeton doing theory. didn’t have a ton of good things to say about princeton.
* One of the sponsors (1st year) from Istanbul who liked the small town and was used to it, but said she went into New York a lot at the beginning to adjust. She is doing systems – works with Weatherspoon and is doing a project trying to optimize the back end of archive.org
* Another girl sponsor who is doing theory and seemed

Breakfast wasn’t bad, they did it at their special hotel school. It was as good as you would need for breakfast.

Campus tour was nice. Lots of fun stuff going on. really pretty place, better certainly than anywhere else I’ve visited. The gorges are pretty sweet, and the old buildings are super neat.

Housing wasn't that exciting – cheap, but unexciting. Especially the university stuff, but the guy's house that we saw was pretty sparse, and not super nice either.

Went back. the systems people that talked seemed quite well adjusted, and were all doing cool stuff. The other people interested in systems my year seemed a bit less so. The good news is that you get extra time to actually build systems, and it sounds like Gun is really into stuff and would be a good guy for me to get as an advisor in terms of productivity.

the food wasn’t overpriced anywhere which is good.

The party was pretty geeky, but at least people were drinking. Still, it devolved to rock band and taboo pretty quickly. I don’t know how you would really do that better.

Overall it seems like a somewhat better mix of geeks and reasonable people than Mudd, but that isn't saying too much. I think I could do interesting research, and got really excited talking to van Renesse over dinner, but hakim and gun will need to be really cool and uw will need to be a bit disappointing for me to choose cornell. (it's operating at a bit of a disadvantage)

Sunday night at 11pm I realized that the schedule in my packet had my meeting with professors and started looking up their research areas

Monday was really quite good as well.
I got up at 7 something in order to take the bus over to the CS department lounge. There was fruit, coffee, bagels, and that sort of breakfast ready for us, which took about half an hour for people to get through, and then we got an introduction to the department in the main conference room by Eva Tardos (a cool lady). She works in theory, and just went over basic strengths of the department. It seems to be really theory heavy, but systems comes in second. Everyone TAs for 2 semesters, and then you'll either keep doing it or move onto an RAship once you find a project. Funding guaranteed, pretty high graduation rate. 5-6 years for systems folks, theory typically can get through in 5. You do quite a bit of course work in the first year or two and then move into research for the remainder. If you quit after 2-3 you leave with a masters. They were saying only 2-3 per class didn't make it. 1-2 because they never found an advisor they actually connected with and 1-2 because they left for industry without ever writing up their thesis. (they don't have much sympathy for the latter type)

The second talk was by the systems group. We went down to the systems lab (they're going to replace it next year, but it doesn't look that bad as it is. Certainly very spacious, and the cube setup is pretty productive – I like it from last summer). Myers focuses on PLs a bit more than the rest of them, and he was talking for most of it. He has been writing Java-based compilers – extending things in a Java-like syntax – and actually has a full new language out of it, called Fabric. Part of its claim to fame is that it has distributed objects that can be executed on remote hosts really easily. (he directs the DB toaster project I heard about Sunday – which compiles databases down to the needed queries.) He also does stuff with trust control.

Then there were a set of three meetings.
first was with Gun. He’s doing some really cool stuff. Mostly interested in flixQ which is a hybrid p2p content delivery service – you download movies like youtube, but the server coordinates to maximize global bandwidth. He’s also got his hands in a bunch of different places. One was a trusted operating system in order to make use of the trusted computing chips in modern computers. Another is looking at network measurements – i think? He has taught the advanced p2p class, so that’s his stuff as much as anything.

Then Robert van Renesse. He talked about DHTs and other designs that sacrificed different things in order to work better in different situations. The Kad system is good for fault tolerance, but you certainly have trouble always getting the most up-to-date copy, so it potentially loses information.
Anyway, he had a system where you could have master-slave nodes for different parts of your ring and re-divide partitions on your ring as data or hosts increased. It’s mostly aimed at data center situations, where you still want all the distributed stuff kad gives you, but you need redundancy in a different way.

Then was Hakim, the guy I've talked to before. We talked a bit about his work in filesystems, and the differences between UW and Cornell.

I talked to both Gun and Hakim about UW, since they both have done stuff there (hakim did undergrad, gun did his PhD). They both said that the two departments were the most comparable I would find. The real difference that I could get out of them was that they thought cornell went a bit further in formalizing their systems. That is – they said UW had a lot more industrial connections, and so could do lots of work improving and finding problems in existing systems. However, they cautioned that just fixing the thing is problematic when you can't theoretically back up your proof. (So, cornell they claim has more theoretical backing to their systems group, and does a better job proving optimality of their solutions) I'm waiting until I visit UW to trust them on that point.

Lunch was at a chinese place halfway down the hill. oh yeah, cornell is at the top of a big hill, and we were staying in downtown at the bottom. “college Town” is where undergrads primarily hang out about halfway down the hill and really close to the undergrad dorms – that’s where this restaurant was. It wasn’t bad, and they seemed perhaps a bit more chinese than most americanized restaurants – probably good given that tons of grad students are going to be chinese.

Talked more to one of the grad students about her work optimizing energy efficiency at Archive.org over lunch, and then talked with their physics post-doc after lunch. This is a project Ken, one of the older systems guys, has been leading, where they've been working with Cisco on core routing algorithms. The part right now is monitoring: they've built a laser that can send valid packets through modulation at very precise timing. Then they can receive the packets and determine how the network is delaying and working with these packets. They found a lot more clumping than anyone was expecting, especially because most of that clumping is re-randomized by the end host in other simulations that have been done.

Of interest here is that Dan is Mike's younger brother – mike being the guy I was interested in working with at princeton.

Then I sat through the theory presentation. One half of theory was people doing game-theory optimization stuff and solving big matrix problems (sci-comp). The other half was vision and graphics, which meant lots of pretty pictures. They've got lots of physics simulations to quickly and accurately render things like hair and yarn. They also have a new vision guy who's apparently super famous in the field named Noah (at least all the vision/graphics students were really excited about working with him). He's the guy who's been doing 3d reconstruction from photos on flickr and such. Microsoft has been working with him on photosynth (I think). He had some pretty visualizations of 3d reconstructed cities.

After that there were three more meetings. I talked first with Fred, who's a security guy – and has advised and shaped government policy quite a bit. He's also a very good figure skater. We talked about side-channel attacks, and how you could leak information from secure systems through things like timing and power monitoring. It was interesting stuff, but he had a bit less to say about distributed systems than the other guys.

Next was Myers – who talked more about his Java compiler and about systems to keep track of data flow through your programs, so that you can figure out how many bits of information end up doing what you want (like optimizing randomness so you aren't wasting entropy in converting distributions.)

Finally was Ken Bergman? He talked more about the physics simulation and also about how Cisco was planning to do more with them, and was willing to let them play with the next generation of internet routers at pretty much any level – he's personally working on some fault tolerance applications. One of the things he also said, though, was that there was a huge potential here to build a new protocol to fix problems with IP, so that a second set of global traffic could get sent around the country. His thoughts were to build a parallel system that would have either QOS guarantees or security guarantees (e.g. only trusted parties are guaranteed to see/understand the data). The latter is a really cool problem, and is one where you really do need a new protocol at that low level, because if your messages are entirely encrypted while on the wire then you need the routers to be able to somehow deal with that encryption.

There was a poster session after that, where lots of grad students hung out and talked about what they were working on. I talked more about Gun's work on hybrid p2p systems with the grad student who had built it with him, and talked with some of his other students.

A group of us went out to dinner after that, I talked mostly with a guy from UT Austin who spent a gap year working at intel and is now wanting to do more theoretical stuff and stop doing the formal verification work that intel was having him do.

In the evening we went out ice skating. I can skate well enough that I don't fall down and don't look like a complete klutz, but I'm certainly not great. The CS department has its own hockey team where grad students play the faculty. They were pushing that event as one of their big social activities.

I got up this morning and caught a shuttle to Syracuse. My first leg to new york was with a guy from cal tech who’s interested in AI and graphics. He’s choosing between princeton and Cornell, and had a bunch of interesting things to say about cal tech.

And now, I’m almost back to reality. It was a positive trip, and the ball is now in UW’s court to convince me that they’re better. There was a very laid back atmosphere in the department which I think would be fun to involve myself in, and the profs all were very approachable. All terms were on first-names, and I didn’t notice the east-coast formalism that I had been half-expecting.

The two things that are the biggest cons are
1. location – it’s hard to work with people outside of cornell, since that’s all that’s there.
2. Housing – the housing looked pretty run-down – I think a city like seattle could get me a much newer / nicer apartment for not that much more money.

Torrent Auditor

I have now migrated the python torrent client that I’ve been working on to a google code project.

It lives at torrentauditor and now has basic support for actually downloading torrent files.

I researched the BitTorrent extension protocols this week, but was somewhat frustrated by what I found. Most of the interesting ones are implemented on a per-client basis, and aren't well documented outside of that client. The Vuze client, it turns out, switches to an entirely different application-specific protocol when it meets another client of the same type. The libtorrent-based clients do much the same thing, although they send their additional messages over the existing connection.
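For reference, the one extension mechanism that is documented client-independently is the extension protocol's bencoded handshake (sent as message id 20 with extended id 0). Here's a minimal sketch of building one; the advertised extension names, client string, and function names are placeholders of mine, and the bencoder only covers what this dictionary needs:

```python
import struct

def bencode(obj):
    # Minimal bencoder, just enough for the handshake dictionary below.
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, str):
        return bencode(obj.encode("utf-8"))
    if isinstance(obj, dict):
        out = b"d"
        for k in sorted(obj):  # bencoded dict keys must be sorted
            out += bencode(k) + bencode(obj[k])
        return out + b"e"
    raise TypeError(obj)

def extended_handshake(extensions):
    # Wire format: <4-byte length><msg id 20><extended id 0><bencoded dict>
    payload = bencode({"m": extensions, "v": "auditor 0.1"})
    body = bytes([20, 0]) + payload
    return struct.pack("!I", len(body)) + body

msg = extended_handshake({"ut_pex": 1})
```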

However, the good news is that the basic protocol is friendly enough that it can be implemented without major trouble. I chose to focus on in-order reading for now simply for simplicity's sake, although it is highly inefficient.
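The in-order strategy amounts to a trivial request generator; a sketch (the 16 KiB block size is the conventional request size, and the names are mine):

```python
def in_order_requests(num_pieces, piece_length, block_size=16384):
    # Yield (piece index, offset within piece, length) strictly in file
    # order -- simple, but it ignores piece rarity entirely.
    for piece in range(num_pieces):
        offset = 0
        while offset < piece_length:
            yield piece, offset, min(block_size, piece_length - offset)
            offset += block_size

reqs = list(in_order_requests(2, 32768))
# two pieces of 32 KiB each -> four 16 KiB requests, in order
```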

One goal that I'm going to try to focus on a bit in the next weeks, as I have time, is being able to extract frames of video from downloaded data. For my digital animation class I would like to make a program that stitches together frames / short clips of videos entirely automatically – a visual representation of the swarm.

Downloading Torrents

I extended the work from last week in order to actually get data from a swarm. The main change is that new sockets are now allocated for each connection, state is remembered, and the client can actually get so far as to download data from the other peers.

What still needs to happen:
1. I need a better way of stopping – right now it stops upon a lull in network activity, but that doesn't really mean anything.
2. I need to actually save downloaded data somewhere, or alternatively keep track of the rate it's coming in (or the time delay between packets).
3. I need to factor out the different message types into separate modules to keep the code readable.

I’m going to set up a code page for this project soon, so that I don’t have to post stuff just in this blog. I should get to that in the coming week, so that there’s a nicer way of interacting with this code base. This model isn’t super memory efficient, but it is a very simple system to work with, and I think it’s become a good jumping off point for trying to interact with swarms. In particular it’s really easy to add in hooks to get methods to run when messages are received, and to store arbitrary state about connections.
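That hook model could look something like the following sketch (the names are illustrative, not lifted from the code below):

```python
class MessageHooks:
    def __init__(self):
        self.handlers = {}  # message id -> list of callbacks

    def on(self, msg_id, fn):
        # Register fn to run whenever a message with this id arrives.
        self.handlers.setdefault(msg_id, []).append(fn)

    def dispatch(self, msg_id, payload, conn):
        # Run every registered callback for this message type.
        for fn in self.handlers.get(msg_id, []):
            fn(payload, conn)

hooks = MessageHooks()
seen = []
hooks.on(7, lambda payload, conn: seen.append(len(payload)))  # 7 = piece
hooks.dispatch(7, b"x" * 16384, conn=None)
```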

[python]
# Standalone Torrent Auditor
#
import socket
import time
import sys
import getopt
import random
import struct
import benc
import binascii
import select
import hashlib
import urllib
import server

#a bit of global state about who we are
client = "AZ"+chr(0x05)+"31"
myID = "".join(chr(random.randrange(0, 256)) for i in xrange(20))
peerCache=[]

#load in a .torrent file
def readFile(filePath):
    f = open(filePath, 'r')
    data = ''.join(f.readlines())
    f.close()
    return benc.bdecode(data)

#register with the tracker to generate potential peers
def register(torrent, myPort):
    url = torrent['announce']
    ihash = hashlib.sha1(benc.bencode(torrent['info'])).digest()
    query = urllib.urlencode({'info_hash': ihash,
                              'peer_id': myID,
                              'port': myPort,
                              'uploaded': 0,
                              'downloaded': 0,
                              'left': 0,
                              'compact': 1,
                              'event': 'started'})
    url += "?" + query
    trackerhandle = urllib.urlopen(url)
    trackerdata = ''.join(trackerhandle.readlines())
    trackerhandle.close()
    parseddata = benc.bdecode(trackerdata)
    initialnodes = parseddata['peers']
    peers = []
    while len(initialnodes) > 5:
        ip = initialnodes[0:4]
        port = initialnodes[4:6]
        initialnodes = initialnodes[6:]
        peers.append({'state': 0,
                      'ip': socket.inet_ntoa(ip),
                      'ihash': ihash,
                      'port': ord(port[0])*256 + ord(port[1]),
                      'buffer': '',
                      'callback': handleData})
    return peers

#register a new incoming connection
def onNew(socket, (ip, port)):
    obj = {'socket': socket,
           'ip': ip,
           'port': port,
           'state': 1,
           'buffer': '',
           'callback': handleData}
    handleData(obj)
    server.track_socket(obj)

#reply to a socket with a torrent message
def reply(socket, mtype, payload):
    #first flatten the payload, preserving its order
    s = ''
    while len(payload):
        p = payload.pop(0)
        if isinstance(p, int):
            s += struct.pack('!i', p)
        else:
            s += p
    s = chr(mtype) + s
    pl = struct.pack('!i', len(s)) + s
    socket.sendall(pl)
    return pl

#parse torrent msg
def handleMsg(obj, msg):
    mtype = ord(msg[0]) #message ids arrive as a single raw byte
    if mtype == 0: #choke
        obj['state'] &= ~4
    elif mtype == 1: #unchoke
        obj['state'] |= 4
        #and we'd like to request a piece from them.
        reply(obj['socket'], 6, [0, 0, 2 << 15])
    elif mtype == 2: #interested
        obj['state'] |= 8
    elif mtype == 3: #uninterested
        obj['state'] &= ~8
    elif mtype == 4: #have
        (idx,) = struct.unpack('!i', msg[1:5])
        obj['have'][idx / 8] |= (1 << (7 - (idx % 8)))
    elif mtype == 5: #bitfield
        obj['have'] = bytearray(msg[1:])
    elif mtype == 6: #request
        #ignored
        return
    elif mtype == 7: #piece
        #got our data
        print "succeeded in downloading!"
    elif mtype == 8: #cancel
        #we aren't in that business
        return
#parse incoming data
def handleData(obj):
    try:
        nbuf = obj['socket'].recv(4096)
    except socket.error:
        print "disconnected %s" % obj['ip']
        server.closed_socket(obj['socket'])
        return
    if not nbuf:
        print "disconnected %s" % obj['ip']
        server.closed_socket(obj['socket'])
        return
    data = obj['buffer'] + nbuf
    #Handshake
    if obj['state'] & 2 == 0:
        if data[0] == chr(19) and len(data) >= 68:
            obj['ihash'] = data[28:48]
            obj['peerid'] = data[48:68]
            obj['buffer'] = data[68:]
            obj['state'] += 2
            if obj['state'] & 1 == 0:
                #we need to respond to the handshake
                obj['socket'].sendall(handshake(obj['ihash']))
                obj['state'] += 1
            print "shook hands with %s" % obj['ip']
        else:
            obj['buffer'] = data #wait for the rest of the handshake
    #all other messages are prefixed by their 4-byte length
    else:
        while len(data) >= 4:
            mlen = struct.unpack('!i', data[0:4])[0]
            if len(data) < 4 + mlen:
                break
            msg = data[4:4+mlen]
            data = data[4+mlen:]
            if len(msg):
                #Actual message received
                handleMsg(obj, msg)
        #save the rest of the data
        obj['buffer'] = data

#define the handshake
def handshake(ihash):
    announce = chr(19) + 'BitTorrent protocol'
    announce += chr(0x0) * 8
    announce += ihash
    announce += myID
    return announce

#talk with cached peers
def onTimeout():
    global peerCache
    if len(peerCache) == 0:
        return True
    for i in range(5):
        if len(peerCache) == 0:
            break
        obj = peerCache.pop()
        obj = server.track_socket(obj)
        if not obj:
            continue
        obj['socket'].sendall(handshake(obj['ihash']))
        obj['state'] |= 1 #mark that we've sent our half of the handshake
    return False

def usage():
    print "Usage:"
    print "client --file=loc.torrent"
    print "Will report on statistics for the desired torrent"

def main():
    filePath = "default.torrent"
    try:
        opts, args = getopt.getopt(sys.argv[1:], "hf:", ["help", "file="])
    except getopt.GetoptError, err:
        # print help information and exit:
        print str(err) # will print something like "option -a not recognized"
        usage()
        sys.exit(2)
    for o, a in opts:
        if o in ("-h", "--help"):
            usage()
            sys.exit()
        elif o in ("-f", "--file"):
            filePath = a
        else:
            assert False, "unhandled option"
    print "Loading Info... ",
    info = readFile(filePath)
    print "okay"
    port = 6886
    print "Detecting Swarm... ",
    seeds = register(info, port)
    print len(seeds), " peers returned"
    peerCache.extend(seeds)
    print "Entering Main Loop"
    onTimeout()
    server.main_loop(onTimeout, onNew, port)
    print "Finished Snapshot"

if __name__ == "__main__":
    main()
[/python]

Auditing BitTorrent

One of the strengths of BitTorrent is that the primary data transfer protocol is entirely separate from the advertisement protocol. This separation has also created strain, both in discovering other users who have data and in keeping accurate reports of data that was transferred.

The first issue has seen extensive development, culminating in many extensions to the protocol which purport to make it easier to find other users. These include distributed trackers, PEX, and DHT, among many others.
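Even without those extensions, the baseline tracker response in "compact" form is straightforward: each peer is packed as four IP bytes plus a two-byte big-endian port. A standalone sketch of the parsing (in modern Python with byte strings; the function name is mine):

```python
import socket
import struct

def parse_compact_peers(blob):
    # Each peer occupies 6 bytes: 4-byte IPv4 address + 2-byte port.
    peers = []
    for i in range(0, len(blob) - len(blob) % 6, 6):
        ip = socket.inet_ntoa(blob[i:i + 4])
        (port,) = struct.unpack("!H", blob[i + 4:i + 6])
        peers.append((ip, port))
    return peers

peers = parse_compact_peers(bytes([127, 0, 0, 1, 0x1A, 0xE1]))
```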

The second issue has been covered less thoroughly, since it is a problem that fundamentally cannot be solved, due to the distributed nature of the system. There is no real way to verify the legitimacy of statistics a client reports, since neither it nor any of the peers it has interacted with can be trusted.

One attempt to get a better sense of what is really going on is to create a client that actually interacts with the data transfer protocol, to verify that reported statistics are not entirely inaccurate. This client does not interact in the traditional way, but will infrequently connect to peers and ask them to send it data – which it can then use to estimate the bandwidth of that client. This knowledge, combined with knowledge of which clients have what portions of the data, will allow the client to estimate the interactions taking place within the swarm.

These estimates can then be checked against reported statistics to discover when a client is misreporting its statistics.
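The cross-check itself can be quite simple once the measurements exist; something like the following, where the slack factor and all names are arbitrary choices of mine:

```python
def measured_rate(samples):
    # samples: (bytes received, seconds elapsed) pairs from timed probes.
    total_bytes = sum(b for b, _ in samples)
    total_time = sum(t for _, t in samples)
    return total_bytes / total_time if total_time else 0.0

def looks_misreported(reported_bps, samples, slack=2.0):
    # Flag a client whose reported rate exceeds what we actually
    # measured by more than a slack factor.
    return reported_bps > measured_rate(samples) * slack

# A client claiming ~1 MB/s while our probes only ever saw ~16 KB/s:
suspicious = looks_misreported(1_000_000, [(16384, 1.0), (16384, 1.0)])
```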

The code below is not finished. It completes the initial functions of peer discovery and connection, but is not able to successfully download or monitor peers. The primary focus of work will be to implement the encryption protocol which is now standard for torrent traffic, so that the client is able to interact successfully with most users.

[python]
# Standalone Torrent Auditor
#
import socket
import time
import sys
import getopt
import random
import benc
import binascii
import select
import hashlib
import urllib

#Initialize a UDP Socket,
#and the other global info about who this client is
client = "AZ" + chr(0x05) + "31"
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("msn.com", 80))
myIP = s.getsockname()[0]
s.close()
myPort = 6886
UDPSocket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
UDPSocket.bind((myIP, myPort))
myID = "".join(chr(random.randrange(0, 256)) for i in xrange(20))
knownPeers = []

#handle sending a raw UDP datagram
def sendData(data, host, port):
    global UDPSocket
    #print 'messaged %s:%d' % (host, port)
    UDPSocket.sendto(data, 0, (host, port))

#load in a .torrent file
def readFile(filePath):
    f = open(filePath, 'r')
    data = ''.join(f.readlines())
    f.close()
    return benc.bdecode(data)

#register with the tracker to get peers
def register(torrent):
    url = torrent['announce']
    ihash = hashlib.sha1(benc.bencode(torrent['info'])).digest()
    query = urllib.urlencode({'info_hash': ihash,
                              'peer_id': myID,
                              'port': myPort,
                              'uploaded': 0,
                              'downloaded': 0,
                              'left': 0,
                              'compact': 1,
                              'event': 'started'})
    url += "?" + query
    trackerhandle = urllib.urlopen(url)
    trackerdata = ''.join(trackerhandle.readlines())
    trackerhandle.close()
    parseddata = benc.bdecode(trackerdata)
    initialnodes = parseddata['peers']
    peers = []
    while len(initialnodes) > 5:
        ip = initialnodes[0:4]
        port = initialnodes[4:6]
        initialnodes = initialnodes[6:]
        peers.append({'state': 0, 'ip': socket.inet_ntoa(ip), 'ihash': ihash,
                      'port': ord(port[0])*256 + ord(port[1])})
    return peers

def AnnouncePeer(myID, key, token, lp, host, port):
    data = {'q': 'announce_peer', 'a': {'id': myID, 'info_hash': key,
            'token': token, 'port': lp}, 'v': client, 'y': 'q',
            't': chr(0x05) + chr(0x05)}
    sendData(benc.bencode(data), host, port)

def parseQuery():
    global UDPSocket, knownPeers
    (msg, (hn, hp)) = UDPSocket.recvfrom(4096) #should be more than enough
    found = 0
    for p in knownPeers:
        if p['ip'] == hn and p['port'] == hp:
            found = 1
            p['state'] |= 2
            print msg
    if not found:
        print msg
        knownPeers.append({'state': 2, 'ip': hn, 'port': hp, 'ihash': 0})
    #data = benc.bdecode(msg)

    #check the type of message here, maybe
    #hisid = data['r']['id']
    #nodes = data['r']['nodes']
    #l = len(nodes)/26
    #for i in range(0,l):
    #    nid = nodes[(26*i):(26*i+20)]
    #    nhost = nodes[(26*i+20):(26*i+24)]
    #    nport = nodes[(26*i+24):(26*i+26)]
    #    knownHosts[nid] = socket.inet_ntoa(nhost)
    #    knownPorts[nid] = ord(nport[0])*256+ord(nport[1])
    #    if bitdif(nid,targetID) < bitdif(hisid,targetID):
    #        FindNodeReq(myID,targetID,knownHosts[nid],knownPorts[nid])
    #knownHosts[hisid] = hn
    #knownPorts[hisid] = int(hp)
    #return hisid

#open BitTorrent handshakes with up to five uncontacted peers
def initiateConns():
    global knownPeers
    inited = 0
    for p in knownPeers:
        if p['state'] == 0 and inited < 5:  #uncontacted
            announce = chr(19) + 'BitTorrent protocol'  #pstrlen is the byte 19
            announce += chr(0) * 8  #eight reserved zero bytes
            announce += p['ihash']
            announce += myID
            p['state'] = 1  #contacted
            inited += 1
            sendData(announce, p['ip'], p['port'])
    return inited == 0

def MainLoop():
    global UDPSocket
    print "Communicating",
    while 1:
        (x, y, z) = select.select([UDPSocket], [], [], 1)  #wait to receive something
        if len(x):
            parseQuery()
        else:
            if initiateConns():
                return
        #we don't care much about errors, since it's all datagrams

def usage():
    print "Usage:"
    print "client --file=loc.torrent"
    print "Will report on statistics for the desired torrent"

def main():
    global myID, knownPeers
    filePath = "default.torrent"
    try:
        opts, args = getopt.getopt(sys.argv[1:], "hf:", ["help", "file="])
    except getopt.GetoptError, err:
        #print help information and exit:
        print str(err)  #will print something like "option -a not recognized"
        usage()
        sys.exit(2)
    for o, a in opts:
        if o in ("-h", "--help"):
            usage()
            sys.exit()
        elif o in ("-f", "--file"):
            filePath = a
        else:
            assert False, "unhandled option"
    print "Loading Info... ",
    info = readFile(filePath)
    print "okay"
    print "Detecting Swarm... ",
    seeds = register(info)
    print len(seeds), " peers returned"
    knownPeers.extend(seeds)
    print "Entering Main Loop"
    MainLoop()
    print "Finished Snapshot"
    print "Discovered Swarm State:"
    for p in knownPeers:
        print p['ip'], ": ",
        if 'has' in p:
            print p['has'],
        if 'speed' in p:
            print p['speed']
        else:
            print "unconnectable"

if __name__ == "__main__":
    main()
    UDPSocket.close()
[/python]

Talking to Kad

[python]
# Standalone Mainline Kad Client
#
import socket
import time
import sys
import getopt
import random
import benc
import binascii
import select

client = "AZ"+str(0x05)+"31";
UDPSocket = socket.socket(socket.AF_INET,socket.SOCK_DGRAM);
targetID = "".join(chr(random.randrange(0, 256)) for i in xrange(20));
myID = "".join(chr(random.randrange(0, 256)) for i in xrange(20));
reqs = 0;
knownHosts={};
knownPorts={};

def sendData(data, host, port):
    global UDPSocket, reqs
    reqs += 1
    #print 'messaged %s:%d' % (host, port)
    UDPSocket.sendto(data, 0, (host, port))

def sendPing(myID, host, port):
    data = {'q': 'ping', 'a': {'id': myID}, 'v': client, 'y': 'q',
            't': str(0x05) + str(0x05)}
    sendData(benc.bencode(data), host, port)

def GetPeersReq(myID, ih, host, port):
    data = {'q': 'get_peers', 'a': {'id': myID, 'info_hash': ih},
            'v': client, 'y': 'q', 't': str(0x05) + str(0x05)}
    sendData(benc.bencode(data), host, port)

def FindNodeReq(myID, target, host, port):
    data = {'q': 'find_node', 'a': {'id': myID, 'target': target, 'want': ['n4']},
            'v': client, 'y': 'q', 't': str(0x05) + str(0x05)}
    sendData(benc.bencode(data), host, port)

def AnnouncePeer(myID, key, token, lp, host, port):
    data = {'q': 'announce_peer', 'a': {'id': myID, 'info_hash': key,
            'token': token, 'port': lp}, 'v': client, 'y': 'q',
            't': str(0x05) + str(0x05)}
    sendData(benc.bencode(data), host, port)

def parseFindNodeResponse():
    global UDPSocket, myID, targetID, knownHosts, knownPorts
    (msg, (hn, hp)) = UDPSocket.recvfrom(4096)  #should be more than enough
    data = benc.bdecode(msg)
    #check the type of message here, maybe
    hisid = data['r']['id']
    nodes = data['r']['nodes']  #compact node info: 26 bytes per node
    l = len(nodes) / 26
    for i in range(0, l):
        nid = nodes[(26*i):(26*i+20)]       #20-byte node ID
        nhost = nodes[(26*i+20):(26*i+24)]  #4-byte IPv4 address
        nport = nodes[(26*i+24):(26*i+26)]  #2-byte port
        knownHosts[nid] = socket.inet_ntoa(nhost)
        knownPorts[nid] = ord(nport[0]) * 256 + ord(nport[1])
        if bitdif(nid, targetID) < bitdif(hisid, targetID):
            FindNodeReq(myID, targetID, knownHosts[nid], knownPorts[nid])
    knownHosts[hisid] = hn
    knownPorts[hisid] = int(hp)
    return hisid

def parseGetDataResponse():
    global UDPSocket, myID, targetID
    (msg, host) = UDPSocket.recvfrom(4096)  #should be more than enough
    data = benc.bdecode(msg)
    token = data['r'].get('token')
    if token is None:
        print data
    nodes = data['r']['nodes']
    print nodes
    return token

def readGetDataResponse():
    global UDPSocket, myID, targetID
    (msg, host) = UDPSocket.recvfrom(4096)  #should be more than enough
    data = benc.bdecode(msg)
    token = data['r']['token']
    nodes = data['r']['nodes']
    print nodes
    return nodes

#count the differing bits between two IDs (a Hamming distance, used here
#in place of Kademlia's usual XOR-compared-as-integer metric)
def bitdif(ia, ib):
    totalDifferences = 0
    for i in range(0, len(ia)):
        for j in range(0, 8):
            if ord(ib[i]) & (0x01 << j) != ord(ia[i]) & (0x01 << j):
                totalDifferences += 1
    return totalDifferences

def findClosestPeerMainLoop():
    global UDPSocket, knownHosts, knownPorts, reqs
    print "Searching",
    foundID = 0
    while 1:
        (x, y, z) = select.select([UDPSocket], [], [], 1)  #wait to receive something
        if len(x):
            print ".",
            foundID = parseFindNodeResponse()
        else:
            print "!",
            reqs = 0
            return (knownHosts[foundID], knownPorts[foundID])
        sys.stdout.flush()
        reqs -= 1
        if reqs == 0:
            return (knownHosts[foundID], knownPorts[foundID])

def getDataMainLoop():
    global UDPSocket, knownHosts, knownPorts, reqs
    print "Waiting",
    data = ""
    while 1:
        (x, y, z) = select.select([UDPSocket], [], [], 1)  #wait to receive something
        if len(x):
            print ".",
            data = parseGetDataResponse()
        else:
            print "!",
        sys.stdout.flush()
        reqs -= 1
        if reqs == 0:
            return data

def getDataReadLoop():
    global UDPSocket, knownHosts, knownPorts, reqs
    print "Waiting",
    data = ""
    while 1:
        (x, y, z) = select.select([UDPSocket], [], [], 1)  #wait to receive something
        if len(x):
            print ".",
            data = readGetDataResponse()
        else:
            print "!",
        sys.stdout.flush()
        reqs -= 1
        if reqs == 0:
            return data

def announcePeerMainLoop():
    global UDPSocket, knownHosts, knownPorts, reqs
    data = False
    while 1:
        (x, y, z) = select.select([UDPSocket], [], [], 1)  #wait to receive something
        if len(x):
            data = True
        else:
            print "!",
        sys.stdout.flush()
        reqs -= 1
        if reqs == 0:
            return data

def usage():
    print "Usage:"
    print "client <-i|-o> [--key=key]"
    print "where -i will store data from stdin, and -o will retrieve data to stdout"

def main():
    global targetID, myID
    try:
        opts, args = getopt.getopt(sys.argv[1:], "hk:s:p:io",
                                   ["help", "key=", "server=", "port="])
    except getopt.GetoptError, err:
        #print help information and exit:
        print str(err)  #will print something like "option -a not recognized"
        usage()
        sys.exit(2)
    rootHost = "router.bittorrent.com"
    rootPort = 6881
    save = True
    for o, a in opts:
        if o in ("-h", "--help"):
            usage()
            sys.exit()
        elif o == "-o":
            save = False
        elif o == "-i":
            save = True
        elif o == "--key":
            targetID = a
            if len(targetID) != 20:
                targetID = targetID[0:20] + " " * (20 - len(targetID))
        elif o == "--server":
            rootHost = a
        elif o == "--port":
            rootPort = int(a)
        else:
            assert False, "unhandled option"
    #initiation
    if save:
        #to store data, we're going to
        # 0. read the data in
        # 1. find the node closest to the key
        # 2. put in a bunch of phoney add_peers to save data there
        print "Enter Message:"
        data = ''
        for line in sys.stdin:
            data += line
        print "Finding Host in charge of key %s" % binascii.b2a_base64(targetID)
        FindNodeReq(myID, targetID, rootHost, rootPort)
        (targetHost, targetPort) = findClosestPeerMainLoop()
        print
        packets = 1 + len(data) / 20
        print "Adding Data (%d packets)" % packets,
        for i in range(0, packets):
            buf = data[(i*20):((i+1)*20)]
            buf += " " * (20 - len(buf))
            GetPeersReq(buf, targetID, targetHost, targetPort)
            token = getDataMainLoop()
            AnnouncePeer(buf, targetID, token, 10000 + i, targetHost, targetPort)
            confirm = announcePeerMainLoop()
            print ".",
        print "Done"
        #check to see if it held
        GetPeersReq(myID, targetID, targetHost, targetPort)
        token = getDataReadLoop()
    else:
        #to retrieve data, we're going to
        # 1. find the node closest to the key
        # 2. run get_peers and parse the returned data
        FindNodeReq(myID, targetID, rootHost, rootPort)
        (targetHost, targetPort) = findClosestPeerMainLoop()

if __name__ == "__main__":
    main()
    UDPSocket.close()
[/python]

Distributed Computing and Me

The moniker distributed computing has become a broad catchall term, applied to everything from parallel computation to the auto-configuration of ad-hoc networks. To explore the technical problems in this field it is necessary to clearly define what exactly the field is, and to this end I want to elaborate on what I believe is and is not distributed computing.
The first use of distributed computing was to describe the problem faced by projects like SETI@home and the genome project. These organizations needed to process huge quantities of data and did not have the budget to do so internally, so they turned to the greater population. They did not diverge from client-server architecture, but allowed any computer running their client software to request and process chunks of data. The problem this created was not one of communication, but of trust, since none of the clients could be expected to be either reliable or trustworthy. This problem is typically solved by checking for suspicious responses, and by sending each chunk of data to multiple clients to ensure they agree on an answer.
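The redundancy check described above can be sketched in a few lines. This is a toy illustration, not any real project's verification code; the function name and quorum parameter are my own invention:

```python
# A sketch of redundant verification: the same work unit is handed to
# several untrusted clients, and a result is accepted only when enough
# of them agree on the answer.
from collections import Counter

def verify_result(responses, quorum=2):
    """responses: the answers returned by different clients for the
    same chunk. Returns the agreed-upon answer, or None if there is
    no quorum yet and the chunk should be reissued."""
    if not responses:
        return None
    answer, votes = Counter(responses).most_common(1)[0]
    return answer if votes >= quorum else None

# two honest clients outvote one faulty or malicious one
assert verify_result(["3.14", "3.14", "2.71"]) == "3.14"
# no agreement yet: keep the chunk in the queue
assert verify_result(["3.14", "2.71"]) is None
```

In practice the comparison is done on hashes of large results, and chunks from clients with a history of bad answers are weighted accordingly.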
Five years ago the term grid computing was the big new thing, though it has since been outmoded by the term cloud computing. Both models are related to distributed computing in that they focus on the scalability and reliability of a large number of machines. Both terms imply that the machines are entirely under one's control, with grid computing referring specifically to clusters internal to an organization and cloud computing referring to the outsourcing of that resource. The problem faced is not as different from the previous one as we might initially think. Given the large number of machines and components, failure will occur regularly, and it is important to make sure data and computation are redundant enough to hide these individual problems. The problem is lessened only in that nodes are typically not purposefully malicious in this scenario. Both cloud computing and grid computing before it are also systems that have actually been built and have essentially overcome the issues they faced. Many large tech companies run enormous data centers, and services such as Google's App Engine and Amazon's EC2 allow customers to outsource their computation to the cloud.
Mesh networks are another seemingly disjoint field that has traditionally been linked to distributed computing. Here, a number of computers need to work together to access a common resource, like the Internet, but don't necessarily trust each other. The typical visualization is that only one computer has an Internet connection, and that resource needs to be shared with computers not directly connected to it. The problem is now primarily one of trust, since not only is there a likelihood of reliability issues, but each node benefits from ignoring requests from others, since that leaves more bandwidth for itself. There have been several implementations of mesh networking, although none that can boast tremendous success. The best known is certainly the OLPC, which attempted to use mesh networking as a solution to the intermittent access to resources found in developing countries. Beyond these bespoke solutions, the 802.11 working group has been developing the 802.11s standard, which defines how devices discover each other, form a mesh, and route traffic across it.
To me distributed computing is both all and none of these problems. I see the fundamental problem as one of cooperation. Users should be viewed as selfish, in that they want to get the most from a service while giving as little as possible. For proof of this, look at the number of successful free web services versus those that cost money. Free and ad-supported continues to rule as a business model because most users are unwilling to pay more than they need to for a service. The challenge then is to get a set of strangers to cooperate so that they all end up with more than they started with.
In addition to the previously mentioned fields, this problem is faced by file sharing networks. Here as much as anywhere we see the need for cooperation among selfish individuals. The goal for each user is to get data from others as quickly as possible while sending as little as possible, for both legal and purely selfish reasons. One attempt to solve this problem has been to develop communities surrounding the technology, so that past performance is tracked and good behavior encouraged, but even this is not foolproof.
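The community approach usually boils down to tracking each member's upload/download ratio. A toy sketch of that rule, with an invented function name and threshold rather than any real tracker's policy:

```python
# A sketch of ratio-based standing: a community records how much each
# member has uploaded and downloaded, and only members whose ratio
# stays above a threshold may keep downloading.
def allowed_to_download(uploaded, downloaded, min_ratio=0.5):
    """Return True if this member is in good standing."""
    if downloaded == 0:
        return True  # new members get the benefit of the doubt
    return float(uploaded) / downloaded >= min_ratio

assert allowed_to_download(0, 0)           # brand-new user
assert allowed_to_download(600, 1000)      # ratio 0.6: in good standing
assert not allowed_to_download(100, 1000)  # ratio 0.1: cut off
```

The weakness is exactly what the paragraph above notes: the numbers are self-reported or tracker-observed, and a determined freeloader can game either.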
For me, the goal of distributed computing is to provide a structure where users can burst beyond the resources they control, without worrying about peers who are malicious or self-serving. I plan to delve into this problem by studying and experimenting with existing systems, looking specifically at the issue of fault tolerance, which I see as fundamental to any solution, and then building a structure of my own.

The One-Day Website

I spent today building http://setTimeout.net, a website that I was inspired to create yesterday evening.

I set myself the goal of finishing the project in one day, and I managed to get enough done in that time that I'm pretty happy with how it turned out. Along the way I came up with several ideas for improving it, namely letting you create a bookmark that immediately saves a page for a set duration without further interaction, and integration with Twitter.

The goal of the site is to work like JavaScript's setTimeout() function. You pass it a URL and a time (in days), and the URL pops up in your news reader after the timeout expires. It's useful if you want to check on the status of a project that isn't interesting enough to monitor constantly, or if you find an interesting website that isn't loading right now.

It's very minimal in a lot of ways, and that's sort of the point. It's actually fairly easy to interface with: you put data in one end, and when the timeout expires it comes out the other end as RSS. I'm considering spending another day on it at some point to let it push data when the timeout expires, or to provide alternative interfaces to the resulting pages.
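The core of the idea fits in a few lines. This is a hypothetical sketch, not the site's actual code; the function names, the in-memory store, and the feed shape are all my own stand-ins:

```python
import time

# in-memory store of (url, due_time) pairs; a real site would persist these
pending = []

def set_timeout(url, days, now=None):
    """Schedule a URL to surface after the given number of days."""
    now = time.time() if now is None else now
    pending.append((url, now + days * 86400))

def due_items(now=None):
    """Return the URLs whose timeout has expired."""
    now = time.time() if now is None else now
    return [url for (url, due) in pending if due <= now]

def to_rss(urls):
    """Render the due URLs as a minimal RSS 2.0 feed."""
    items = "".join("<item><link>%s</link></item>" % u for u in urls)
    return ('<rss version="2.0"><channel><title>setTimeout</title>'
            + items + "</channel></rss>")
```

A feed reader polling the RSS endpoint would then call `to_rss(due_items())`, which is what makes the service work with existing readers rather than needing its own client.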