Tag Archives: Online

Torrent Auditor

I have now migrated the Python torrent client that I’ve been working on to a Google Code project.

It lives at torrentauditor and now has basic support for actually downloading torrent files.

I researched the BitTorrent extension protocols this week, but was somewhat frustrated by what I found. Most of the interesting ones are implemented on a per-client basis, and aren’t well documented outside of that client. The Vuze client, it turns out, switches to an entirely different application-specific protocol when it meets another client of the same type. The libtorrent-based clients do much the same thing, although they send their additional messages over the existing connection.

However, the good news is that the basic protocol is friendly enough that it can be implemented without major trouble. I chose to focus on in-order reading for now for simplicity’s sake, although it is highly inefficient.

One goal that I’m going to focus on over the next few weeks, as I have time, is extracting frames of video from downloaded data. For my digital animation class I would like to make a program that automatically stitches together frames and short clips of videos: a visual representation of the swarm.

Auditing BitTorrent

One of the strengths of BitTorrent is that the primary data transfer protocol is entirely separate from the advertisement protocol. This separation has also created a strain, both in discovering other users who have data and in keeping accurate reports of data that was transferred.

The first issue has received extensive development work, culminating in many extensions to the protocol which purport to make it easier to find other users. These include distributed trackers, PEX, and DHT, among many others.

The second issue has been covered less thoroughly, since it is a problem that cannot fundamentally be solved due to the distributed nature of the system. There is no real way to verify the legitimacy of statistics a client reports, since neither it nor any of the peers it has interacted with can be trusted.

One attempt to get a better sense of what is really going on is to create a client that actually interacts with the data transfer protocol, to verify that reported statistics are not entirely inaccurate. This client does not interact in the traditional way, but will infrequently connect to peers and ask them to send it data, which it can then use to estimate the bandwidth of that client. This knowledge, combined with knowledge of which clients have what portions of the data, will allow the client to estimate the interactions that are taking place within the swarm.

These estimates can then be checked against reported statistics to discover when a client is misreporting its statistics.
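As a rough illustration of that check, here is a minimal Python 3 sketch. The function names, the probe numbers, and the tolerance factor are all assumptions made for the example; none of them come from the auditor code itself.

```python
def estimate_rate(bytes_received, seconds):
    """Bytes per second observed during one probe download from a peer."""
    if seconds <= 0:
        return 0.0
    return bytes_received / seconds


def is_suspicious(reported_bytes, estimated_rate, interval_seconds, tolerance=2.0):
    """True if a peer reports more upload than its probed rate could plausibly deliver.

    `tolerance` is a hypothetical slack factor to absorb measurement noise.
    """
    plausible_max = estimated_rate * interval_seconds * tolerance
    return reported_bytes > plausible_max


# Hypothetical probe: 256 KiB arrived in two seconds.
rate = estimate_rate(bytes_received=262144, seconds=2.0)
# The peer then claims to have uploaded 1 GB during a 60 second interval.
flagged = is_suspicious(reported_bytes=10**9, estimated_rate=rate, interval_seconds=60)
```

A single probe is a noisy measurement, so a real auditor would presumably average several probes per peer before flagging anyone.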

The code below is not finished. It completes the initial functions of peer discovery and connection, but is not able to successfully download or monitor peers. The primary focus of work will be to implement the encryption protocol which is now standard for torrent traffic, so that the client is able to interact successfully with most users.

[python]
# Standalone Torrent Auditor (Python 2)
#
import socket
import time
import sys
import getopt
import random
import benc
import binascii
import select
import hashlib
import urllib

# Initialize a UDP socket,
# and the other global info about who this client is
client = "AZ" + str(0x05) + "31"
# Find our outward-facing IP by opening a throwaway TCP connection
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("msn.com", 80))
myIP = s.getsockname()[0]
s.close()
myPort = 6886
UDPSocket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
UDPSocket.bind((myIP, myPort))
myID = "".join(chr(random.randrange(0, 256)) for i in xrange(20))
knownPeers = []


# handle sending a raw UDP datagram
def sendData(data, host, port):
    global UDPSocket
    #print 'messaged %s:%d' % (host, port)
    UDPSocket.sendto(data, 0, (host, port))


# load in a .torrent file (binary bencoded data)
def readFile(filePath):
    f = open(filePath, 'rb')
    data = f.read()
    f.close()
    structure = benc.bdecode(data)
    return structure


# register with the tracker to get peers
def register(torrent):
    url = torrent['announce']
    ihash = hashlib.sha1(benc.bencode(torrent['info'])).digest()
    query = urllib.urlencode({'info_hash': ihash,
                              'peer_id': myID,
                              'port': myPort,
                              'uploaded': 0,
                              'downloaded': 0,
                              'left': 0,
                              'compact': 1,
                              'event': 'started'})
    url += "?" + query
    trackerhandle = urllib.urlopen(url)
    trackerdata = trackerhandle.read()
    trackerhandle.close()
    parseddata = benc.bdecode(trackerdata)
    # compact format: 6 bytes per peer (4-byte IPv4 address, 2-byte port)
    initialnodes = parseddata['peers']
    peers = []
    while len(initialnodes) > 5:
        ip = initialnodes[0:4]
        port = initialnodes[4:6]
        initialnodes = initialnodes[6:]
        peers.append({'state': 0, 'ip': socket.inet_ntoa(ip), 'ihash': ihash,
                      'port': ord(port[0]) * 256 + ord(port[1])})
    return peers


def AnnouncePeer(myID, key, token, lp, host, port):
    data = {'q': 'announce_peer', 'a': {'id': myID, 'info_hash': key,
            'token': token, 'port': lp}, 'v': client, 'y': 'q',
            't': str(0x05) + str(0x05)}
    sendData(benc.bencode(data), host, port)


def parseQuery():
    global UDPSocket, knownPeers
    (msg, (hn, hp)) = UDPSocket.recvfrom(4096)  # should be more than enough
    found = 0
    for p in knownPeers:
        if p['ip'] == hn and p['port'] == hp:
            found = 1
            p['state'] |= 2  # set the "responded" bit
            print msg
    if not found:
        print msg
        knownPeers.append({'state': 2, 'ip': hn, 'port': hp, 'ihash': 0})
    #data = benc.bdecode(msg)

    #check the type of message here, maybe
    #hisid = data['r']['id']
    #nodes = data['r']['nodes']
    #l = len(nodes)/26
    #for i in range(0,l):
    #    nid = nodes[(26*i):(26*i+20)]
    #    nhost = nodes[(26*i+20):(26*i+24)]
    #    nport = nodes[(26*i+24):(26*i+26)]
    #    knownHosts[nid] = socket.inet_ntoa(nhost)
    #    knownPorts[nid] = ord(nport[0])*256+ord(nport[1])
    #    if bitdif(nid,targetID) < bitdif(hisid,targetID):
    #        FindNodeReq(myID,targetID,knownHosts[nid],knownPorts[nid])
    #knownHosts[hisid] = hn
    #knownPorts[hisid] = int(hp)
    #return hisid


def initiateConns():
    global knownPeers
    inited = 0
    for p in knownPeers:
        if p['state'] == 0 and inited < 5:  # uncontacted
            # handshake: length byte (19), protocol name, 8 reserved zero
            # bytes, then the info hash and our peer id
            announce = chr(19) + 'BitTorrent protocol'
            announce += chr(0) * 8
            announce += p['ihash']
            announce += myID
            p['state'] = 1  # contacted
            inited += 1
            # NOTE: BitTorrent peers normally expect this handshake over a
            # TCP connection, not a UDP datagram.
            sendData(announce, p['ip'], p['port'])
    return inited == 0


def MainLoop():
    global UDPSocket
    print "Communicating",
    rate = 0
    while 1:
        (x, y, z) = select.select([UDPSocket], [], [], 1)  # wait to receive something
        if len(x):
            parseQuery()
        else:
            if initiateConns():
                return
            continue  # we don't care much about errors, since it's all datagrams


def usage():
    print "Usage:"
    print "client --file=loc.torrent"
    print "Will report on statistics for the desired torrent"


def main():
    global myID, knownPeers
    filePath = "default.torrent"
    try:
        opts, args = getopt.getopt(sys.argv[1:], "hf:", ["help", "file="])
    except getopt.GetoptError as err:
        # print help information and exit:
        print str(err)  # will print something like "option -a not recognized"
        usage()
        sys.exit(2)
    for o, a in opts:
        if o in ("-h", "--help"):
            usage()
            sys.exit()
        elif o in ("-f", "--file"):
            filePath = a
        else:
            assert False, "unhandled option"
    print "Loading Info... ",
    info = readFile(filePath)
    print "okay"
    print "Detecting Swarm... ",
    seeds = register(info)
    print len(seeds), " peers returned"
    knownPeers.extend(seeds)
    print "Entering Main Loop"
    MainLoop()
    print "Finished Snapshot"
    print "Discovered Swarm State:"
    for p in knownPeers:
        print p['ip'], ": ",
        if 'has' in p:
            print p['has'],
        if 'speed' in p:
            print p['speed']
        else:
            print "unconnectable"


if __name__ == "__main__":
    main()

UDPSocket.close()
[/python]
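For anyone following along in Python 3, the two wire formats this script touches can be sketched compactly: the compact peers string that register() splits by hand, and the 68-byte handshake that initiateConns() assembles. This is only a sketch under the standard layouts (6 bytes per peer, and a one-byte length prefix of 19 in front of the protocol name). The helper names parse_compact_peers and build_handshake are made up for the example.

```python
import socket
import struct


def parse_compact_peers(blob):
    """Split a compact tracker 'peers' value into (ip, port) tuples."""
    peers = []
    usable = len(blob) - len(blob) % 6  # ignore a trailing partial entry
    for offset in range(0, usable, 6):
        # 4 bytes of IPv4 address, then a 2-byte big-endian port
        ip_bytes, port = struct.unpack_from("!4sH", blob, offset)
        peers.append((socket.inet_ntoa(ip_bytes), port))
    return peers


def build_handshake(info_hash, peer_id):
    """Build the peer handshake: length byte, name, 8 reserved bytes, hash, id."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must each be 20 bytes")
    pstr = b"BitTorrent protocol"
    return bytes([len(pstr)]) + pstr + b"\x00" * 8 + info_hash + peer_id


# Example with made-up values: one peer at 127.0.0.1, port 6881.
example_peers = parse_compact_peers(b"\x7f\x00\x00\x01\x1a\xe1")
example_handshake = build_handshake(b"a" * 20, b"b" * 20)
```

Using struct for the port avoids the manual ord(...)*256 arithmetic, and the length checks catch a malformed hash before anything is sent.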

Talking to Kad

[python]
# Standalone Mainline Kad Client (Python 2)
#
import socket
import time
import sys
import getopt
import random
import benc
import binascii
import select

client = "AZ" + str(0x05) + "31"
UDPSocket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
targetID = "".join(chr(random.randrange(0, 256)) for i in xrange(20))
myID = "".join(chr(random.randrange(0, 256)) for i in xrange(20))
reqs = 0  # number of requests still awaiting a reply
knownHosts = {}
knownPorts = {}


def sendData(data, host, port):
    global UDPSocket, reqs
    reqs += 1
    #print 'messaged %s:%d' % (host, port)
    UDPSocket.sendto(data, 0, (host, port))


def sendPing(myID, host, port):
    # the transaction id 't' is sent as a string, as in the other queries
    data = {'q': 'ping', 'a': {'id': myID}, 'v': client, 'y': 'q',
            't': str(0x05) + str(0x05)}
    sendData(benc.bencode(data), host, port)


def GetPeersReq(myID, ih, host, port):
    data = {'q': 'get_peers', 'a': {'id': myID, 'info_hash': ih},
            'v': client, 'y': 'q', 't': str(0x05) + str(0x05)}
    sendData(benc.bencode(data), host, port)


def FindNodeReq(myID, target, host, port):
    data = {'q': 'find_node', 'a': {'id': myID, 'target': target, 'want': ['n4']},
            'v': client, 'y': 'q', 't': str(0x05) + str(0x05)}
    sendData(benc.bencode(data), host, port)


def AnnouncePeer(myID, key, token, lp, host, port):
    data = {'q': 'announce_peer',
            'a': {'id': myID, 'info_hash': key, 'token': token, 'port': lp},
            'v': client, 'y': 'q', 't': str(0x05) + str(0x05)}
    sendData(benc.bencode(data), host, port)


def parseFindNodeResponse():
    global UDPSocket, myID, targetID, knownHosts, knownPorts
    (msg, (hn, hp)) = UDPSocket.recvfrom(4096)  # should be more than enough
    data = benc.bdecode(msg)
    #check the type of message here, maybe
    hisid = data['r']['id']
    # compact node info: 26 bytes per node (20-byte id, 4-byte IP, 2-byte port)
    nodes = data['r']['nodes']
    l = len(nodes) / 26
    for i in range(0, l):
        nid = nodes[(26 * i):(26 * i + 20)]
        nhost = nodes[(26 * i + 20):(26 * i + 24)]
        nport = nodes[(26 * i + 24):(26 * i + 26)]
        knownHosts[nid] = socket.inet_ntoa(nhost)
        knownPorts[nid] = ord(nport[0]) * 256 + ord(nport[1])
        # only chase nodes that are closer to the target than the responder
        if bitdif(nid, targetID) < bitdif(hisid, targetID):
            FindNodeReq(myID, targetID, knownHosts[nid], knownPorts[nid])
    knownHosts[hisid] = hn
    knownPorts[hisid] = int(hp)
    return hisid


def parseGetDataResponse():
    global UDPSocket, myID, targetID
    (msg, host) = UDPSocket.recvfrom(4096)  # should be more than enough
    data = benc.bdecode(msg)
    token = data['r'].get('token')
    if not token:
        print data  # no token came back; dump the reply for debugging
    nodes = data['r']['nodes']
    print nodes
    return token


def readGetDataResponse():
    global UDPSocket, myID, targetID
    (msg, host) = UDPSocket.recvfrom(4096)  # should be more than enough
    data = benc.bdecode(msg)
    token = data['r']['token']
    nodes = data['r']['nodes']
    print nodes
    return nodes


# count the number of bits in which two equal-length IDs differ
def bitdif(ia, ib):
    totalDifferences = 0
    for i in range(0, len(ia)):
        for j in range(0, 8):
            if ord(ib[i]) & (0x01 << j) != ord(ia[i]) & (0x01 << j):
                totalDifferences += 1
    return totalDifferences


def findClosestPeerMainLoop():
    global UDPSocket, knownHosts, knownPorts, reqs
    print "Searching",
    foundID = 0
    while 1:
        (x, y, z) = select.select([UDPSocket], [], [], 1)  # wait to receive something
        if len(x):
            print ".",
            foundID = parseFindNodeResponse()
        else:
            # timed out: give up and return the last node that answered
            print "!",
            reqs = 0
            return (knownHosts[foundID], knownPorts[foundID])
        sys.stdout.flush()
        reqs -= 1
        if reqs == 0:
            return (knownHosts[foundID], knownPorts[foundID])


def getDataMainLoop():
    global UDPSocket, knownHosts, knownPorts, reqs
    print "Waiting",
    data = ""
    while 1:
        (x, y, z) = select.select([UDPSocket], [], [], 1)  # wait to receive something
        if len(x):
            print ".",
            data = parseGetDataResponse()
        else:
            print "!",
        sys.stdout.flush()
        reqs -= 1
        if reqs == 0:
            return data


def getDataReadLoop():
    global UDPSocket, knownHosts, knownPorts, reqs
    print "Waiting",
    data = ""
    while 1:
        (x, y, z) = select.select([UDPSocket], [], [], 1)  # wait to receive something
        if len(x):
            print ".",
            data = readGetDataResponse()
        else:
            print "!",
        sys.stdout.flush()
        reqs -= 1
        if reqs == 0:
            return data


def announcePeerMainLoop():
    global UDPSocket, knownHosts, knownPorts, reqs
    data = False
    while 1:
        (x, y, z) = select.select([UDPSocket], [], [], 1)  # wait to receive something
        if len(x):
            UDPSocket.recvfrom(4096)  # drain the reply so later reads see fresh data
            data = True
        else:
            print "!",
        sys.stdout.flush()
        reqs -= 1
        if reqs == 0:
            return data


def usage():
    print "Usage:"
    print "client <-i|-o> [--key=key]"
    print "where -i will store data on stdin, and -o will retrieve data to stdout"


def main():
    global targetID, myID
    try:
        opts, args = getopt.getopt(sys.argv[1:], "hk:s:p:io",
                                   ["help", "key=", "server=", "port="])
    except getopt.GetoptError as err:
        # print help information and exit:
        print str(err)  # will print something like "option -a not recognized"
        usage()
        sys.exit(2)
    rootHost = "router.bittorrent.com"
    rootPort = 6881
    client = 'Az' + str(0x05) + '31'
    save = True
    for o, a in opts:
        if o in ("-h", "--help"):
            usage()
            sys.exit()
        elif o == "-o":
            save = False
        elif o == "-i":
            save = True
        elif o in ("-k", "--key"):
            targetID = a
            if len(targetID) != 20:
                # pad or truncate the key to exactly 20 bytes
                targetID = targetID[0:20] + " " * (20 - len(targetID))
        elif o in ("-s", "--server"):
            rootHost = a
        elif o in ("-p", "--port"):
            rootPort = int(a)
        else:
            assert False, "unhandled option"
    # initiation
    if save:
        #to store data, we're going to
        # 0. read the data in
        # 1. find the node closest to the key
        # 2. put in a bunch of phoney add_peers to save data there
        print "Enter Message:"
        data = ''
        for line in sys.stdin:
            data += line
        print "Finding Host in charge of key %s" % binascii.b2a_base64(targetID)
        FindNodeReq(myID, targetID, rootHost, rootPort)
        (targetHost, targetPort) = findClosestPeerMainLoop()
        print
        packets = 1 + len(data) / 20
        print "Adding Data (%d packets)" % packets,
        for i in range(0, packets):
            # each 20-byte chunk of the message poses as a peer id
            buf = data[(i * 20):((i + 1) * 20)]
            buf += " " * (20 - len(buf))
            GetPeersReq(buf, targetID, targetHost, targetPort)
            token = getDataMainLoop()
            AnnouncePeer(buf, targetID, token, 10000 + i, targetHost, targetPort)
            confirm = announcePeerMainLoop()
            print ".",
        print "Done"
        # check to see if it held
        GetPeersReq(myID, targetID, targetHost, targetPort)
        token = getDataReadLoop()
    else:
        #to retrieve data, we're going to
        # 1. find the node closest to the key
        # 2. run the get_node, and parse the returned data
        FindNodeReq(myID, targetID, rootHost, rootPort)
        (targetHost, targetPort) = findClosestPeerMainLoop()


if __name__ == "__main__":
    main()

UDPSocket.close()
[/python]
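One thing worth flagging about the script above: bitdif() counts how many bits differ between two IDs, whereas the mainline DHT specification (BEP 5) orders nodes by the XOR of the two IDs read as an unsigned integer. The two can rank nodes differently. Here is a small Python 3 sketch of both, plus a parser for the 26-byte compact node entries. The function names are invented for the example and are not part of the client.

```python
import socket
import struct


def xor_distance(a, b):
    """Kademlia distance: XOR of the two IDs, read as a big-endian integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def hamming_distance(a, b):
    """Number of differing bits, which is what a bit-counting helper computes."""
    return bin(xor_distance(a, b)).count("1")


def parse_compact_nodes(blob):
    """Split compact node info into (node_id, ip, port) tuples."""
    nodes = []
    usable = len(blob) - len(blob) % 26  # ignore a trailing partial entry
    for offset in range(0, usable, 26):
        # 20-byte node id, 4-byte IPv4 address, 2-byte big-endian port
        node_id, ip_bytes, port = struct.unpack_from("!20s4sH", blob, offset)
        nodes.append((node_id, socket.inet_ntoa(ip_bytes), port))
    return nodes


# Two made-up IDs that each differ from the all-zero target in exactly one bit.
target = b"\x00" * 20
low_bit = b"\x00" * 19 + b"\x01"
high_bit = b"\x80" + b"\x00" * 19
```

Both example IDs are one bit away from the target by bit count, but under XOR distance low_bit is at distance 1 while high_bit is at distance 2**159, so a lookup using the XOR metric would strongly prefer the first.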

The One-Day Website

I spent today building http://setTimeout.net, a website that I was inspired to create yesterday evening.

I set myself the goal of finishing the project in one day. I managed to get enough done in that time, and I’m pretty happy with how it turned out. I came up with several ideas for improving it in the process, namely letting you create a bookmark that immediately saves a page for a set duration without further interaction, and integration with Twitter.

The goal of the site is to work like JavaScript’s setTimeout() function. You pass it a URL and a time (in days), and the URL will pop up in your news reader after the timeout expires. It’s useful if you want to check on the status of a project that isn’t interesting enough to monitor constantly, or if you find an interesting website that isn’t loading.
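The core rule is small enough to sketch in a few lines of Python. This is not the site’s actual code. It assumes a hypothetical in-memory list of (url, created, days) tuples, and due_items is a made-up name.

```python
from datetime import datetime, timedelta


def due_items(items, now):
    """Return the URLs whose timeout has expired as of `now`.

    `items` is a list of (url, created, days) tuples, an assumed storage shape.
    """
    return [url for url, created, days in items
            if created + timedelta(days=days) <= now]


# Made-up data: one entry whose three-day timeout has passed, one still pending.
saved = [
    ("http://a.example", datetime(2010, 1, 1), 3),
    ("http://b.example", datetime(2010, 1, 9), 7),
]
ready = due_items(saved, now=datetime(2010, 1, 10))
```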

It’s very minimal in a lot of ways, and that’s sort of the point. It’s actually fairly easy to interface with: you give it data at one end, and when the timeout expires it spits the data out as RSS. I’m considering spending another day at some point to allow it to push data when the timeout expires, or to provide alternative interfaces to the resulting pages.
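To make the output end concrete, here is a minimal sketch of turning a list of expired URLs into an RSS 2.0 document with Python’s standard library. The render_rss name, the feed title, and the example URLs are all assumptions for illustration. The real site may build its feed quite differently.

```python
import xml.etree.ElementTree as ET


def render_rss(title, feed_link, urls):
    """Render expired URLs as a bare-bones RSS 2.0 feed, one <item> per URL."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Pages whose timeout has expired"
    for url in urls:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = url
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")


# Hypothetical feed containing a single expired page.
feed_xml = render_rss("Expired timeouts", "http://feed.example/", ["http://a.example"])
```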

Launchpad Update

I am almost done rewriting my launchpad code so that it can be run without a kernel module. Instead, I’ll be using libusb, which is a reasonably common and cross-platform library for interacting with USB devices.

I’m having a couple issues with callbacks and polling, but I’m making pretty steady progress, and should have a stable version to post pretty soon.

The header for interactions with the launchpad is going to look like this:

typedef void(*launchpad_callback)(unsigned char* data, size_t len, void* user_data);

struct launchpad_handle* launchpad_register(launchpad_callback e, void* user_data);

int launchpad_write(struct launchpad_handle *dp, unsigned char* data, size_t len);
int launchpad_poll(struct pollfd* descriptors, size_t num);

void launchpad_deregister(struct launchpad_handle* dp);

You start by writing your callback function, which will be called whenever new data is available from the launchpad, or a write to the launchpad completes. Then register that to start receiving notifications. Call write to send data to the device, and use the launchpad_poll in your main loop, which externally will act as a standard system poll() call, but also handles device events.

It’s worth noting that you can play with the kernel module already by downloading the code from the project page.

On the technical side, I’ve worked through a couple issues that took a bit more debugging than I really wanted, so I figured I’d post them here:

The correct place for libusb_lock_events() appears to be immediately before your call to poll(), with the matching unlock immediately afterwards. If you hold the event lock for the entire access time, you will find that although you’re polling, the reads and writes you initiate never get processed.

If libusb_submit_transfer() is failing with code -1, it’s possibly an I/O error (LIBUSB_ERROR_IO), meaning that your endpoint isn’t correctly defined. For me the issue was that although I had assumed my output endpoint was an interrupt type, it was actually registered as a bulk-data type (that is, its address was 0x02 rather than 0x01). Checking your endpoints with lsusb -v will show what the actual endpoints should be.
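When reading lsusb -v output it helps to remember how the two relevant descriptor fields are packed. In bEndpointAddress the low four bits are the endpoint number and the top bit is the direction. The transfer type lives in the low two bits of bmAttributes. The Python sketch below decodes both. The describe_endpoint name and the sample values are made up for the example, not taken from the launchpad.

```python
# Transfer types indexed by the low two bits of bmAttributes.
TRANSFER_TYPES = ("control", "isochronous", "bulk", "interrupt")


def describe_endpoint(b_endpoint_address, bm_attributes):
    """Decode the endpoint number, direction, and transfer type."""
    return {
        "number": b_endpoint_address & 0x0F,
        "direction": "in" if b_endpoint_address & 0x80 else "out",
        "type": TRANSFER_TYPES[bm_attributes & 0x03],
    }


# Hypothetical descriptors: an interrupt IN endpoint and a bulk OUT endpoint.
interrupt_in = describe_endpoint(0x81, 0x03)
bulk_out = describe_endpoint(0x02, 0x02)
```

Comparing the decoded type against the kind of transfer you are submitting is a quick way to spot this sort of mismatch.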