Category Archives: Online

Measuring page speed

There is a large amount of effort spent in the networking community in an attempt to minimize latency.  It would be really great if the data I transfer could take a geographically efficient route, and if the things I ask for get sent back.  It turns out that reality lags a bit behind this ideal.

Geographic Efficiency.

There is an ultimatum that it will take at least 67ms for my traffic to get to the other side of the world – namely the speed of light.  If that was true, we would be in great shape.  In reality, it generally takes 200 to 250ms to go around the world, and if you want to get a response that means waiting 400-500ms.  There are two reasons for inefficiency.  First, data is not going to travel on the shortest path to the destination – there isn’t a wire on that path.  Instead it will travel through the various autonomous systems which form the Internet.  These systems are connected in an economically driven way, which compete over choosing the true shortest path.  Second, there isn’t a single wire connection you and the destination.  Instead, there are many (expect 10-15 for continental US, more for other continents) routers that the data will pass through.  Each router need to figure out which of the many wires connected to it my data needs to be sent out on.  To do this, they actually have to spend the time to read the data, check that it has a valid checksum and  is ‘intact’, and then put it in queue after any previously routed data to the outgoing wire.  A substantial fraction of transfer time is spent on routers, rather than on the wire.

Technical Efficiency.

Applications are built on top of this network, and have to cope with delays that are noticeable by users each time data needs to be transferred.  How do they do it?  The first realization is that there are a couple common types of ways that data will need to be transferred.  One is in ‘near realtime’, with examples like phone calls, video chats, and the like.  For this, there is a UDP protocol, which offers best-effort delivery – you bundle your data into small bundles and send them out.  There is no guarantee of if or when they’ll get to the other side – okay, because if there is a problem it isn’t worth waiting for that data to show up much later, the damage has already been done.  On the flip side, the other common protocol is TCP, which guarantees the reconstruction of a data stream.  The potential downside in this conversation is that a tcp connection has an initiation phase.  This takes 1 round trip (you send a request, the destination acknowledges it, and then you can begin sending data).   Tick off 400ms if you were trying to get somewhere in europe or asia.

Once a connection is made, you have to actually request the web page you want.  This is done using the HTTP protocol.  The main cost that is worth noting with HTTP is that it allows fetching of individual resources.  If I want the page, I ask for it, and the server responds.  In the response, there may be a reference to an image.  I will need to the request that image, and get it in a response – the protocol doesn’t let the server send me additional resources until I ask for them.  (Why, you might ask, would you make pages composed of multiple resource if you care about speed?  The answer seems to be that I might already have some of those resources saved in a local cache, and won’t necessarily need to request them from you.)  The upshot is that at best there are two additional rounds of communication required to get all the resources together to actually show a web page.

Reality.

This chart shows averages for how long it took to load each of 100 different web sites.  The requests were made from planet lab, a geographically-diverse set of university computers (which are not terribly reliable.).  Ignore the tails, which are products of requests being banned.  This is indicating that for an average (around the world, not average in the US) user, it takes 3-5 seconds to load a web page.  That’s horrible!

Page Load Time 

Toggling a pulse audio client’s sink

I often find myself pulling up the pulse audio GUI to flip the output of a specific application between speakers and headphones. I wanted to set up a hot key to toggle where audio from the active application was sent, but as far as I could tell, there wasn’t a way to do that from the command line.

A quick C program was required for communicating with the pulse audio server, and I’ve uploaded that to github.

To toggle the sink for the active application, I’m using this shellscript:

#!/bin/sh
root=`xprop -root _NET_ACTIVE_WINDOW`
activewindow=`echo $root | cut -d" " -f 5`
pidstr=`xprop -id $activewindow _NET_WM_PID`
pid=`echo $pidstr | cut -d" " -f 3`
patogglepid $pid

proxy2p


A quick screenshot of what I’m doing for my distributed computing final project. I’m calling it friend.s for now, although I’m not totally satisfied with that name. It’s a cross-platform piece of software that brings elements of peer-to-peer communication into your web browser. The goal is to be able to eventually offer a completely decentralized social network that can offer the same services as Facebook while guaranteeing privacy.

Distributed Computing so far

I gave a half-hour presentation on my research at the admitted students weekend earlier.  I think I was able to convey some sense of the opportunities that Mudd offers for research, and presented a reasonable face of what it could look like.

The talk was an explanation of the concepts behind distributed hash tables, and an introduction to the work I’ve been doing with bitTorrent.

These are the slides I used: Admitted Students presentation

Experimenting with 802.11s

The purpose of this document is an explanation of what is defined in the IEEE 802.11s standard, and the states of current implementations based on this standard. 802.11s is the standardization of mesh networking. That is to say that it defines a protocol for mobile devices where they can communicate amongst themselves even when not directly connected, and can route data beyond their immediate environment.
This work has been ongoing for close to a decade, and has been progressing as a single proposal for the last four. The standard is for a complete system, explaining the MAC level standard that data will be transferred. This includes how clients can securely create connections, and how and when they should resend packets they receive which are meant for other devices in the mesh. As a result, this standard has become effectively all encompassing, as it defines standards from the hardware signal levels all the way up to complex software routing and security.
Even today, the standard is not formalized, meaning that nothing is focalized, and implementers are really very cautious to begin implementing mesh networking, because it may change and they would then have to re-write or design their implementation (it means specifically that wireless cards are not being built with mesh networking in mind, since chip design is more expensive and less changeable then software drivers.) The lowest level changes are made to the wireless frames in order to allow hops within the mesh, and would benefit from hardware changes to make those changes quickly implemented, however they will still work even when those changes are processed entirely in software, which is the current choice implementations are making until the standard is formalized.
Routing can be specified either explicitly, as the tree that the source is aware of that will extend to the root (or final destination in the mesh) or ad-hoc, where intermediate nodes make their own decisions on how to route the packet. However, routing is done using a QOS-esque process, where the route is first established, and subsequent packets can use the same path until it needs to be recalculated due to topological changes. In addition there is a periodic discussion about which node in the mesh should act as a root, and the ordering of nodes, which is based upon which node has wired access to other networks, and tie-broken by the MAC addresses. Beyond this there are different levels of structure that these networks can take on, similar to the differences we see today in ad-hoc networks and networks with access-points. These are called meshes with and without registration. Without registration, any node can come and go, and the rest of the mesh is unaware of any topology beyond that directly adjacent to it. In registration mode, the root is aware of the entire tomography of the network, and can make informed choices about how routes should be created.
While the major protocol specification seems to not be changing much at the present, there is still enough flux in details that implementations are not compatible with each other, and are still very rudimentary. I looked at three different implementations, none of which are inter-compatible, and all claim to be based off of the standard.
The first is the implementation of mesh networks used in the OLPC. This was one of the first implementations of mesh networking, and is not compatible with any widely used implementation outside of those devices. It works in OLPC clusters, and is a technical feat, although since the central points that were intended to ground the mesh networks have not yet materialized, it is quite underutilized as I understand it. Instead, in the situations where OLPCs have been deployed, a standard 802.11b network typically works well since all of the machines are close enough to the central access point.
The second is the open 802.11s project, which is designing a software driver for the Linux kernel. This version works currently for a limited set of wireless cards, whose Linux drivers are sufficiently open to allow the needed modifications. It is not included as part of the standard kernel distribution, which is a fairly powerful comment on the state of it’s development, since the kernel is typically thought of as including everything and the kitchen sink. It is simply the reference implementation being created as part of the development of the standard.
The third implementation is available in the latest version of FreeBSD. This version is included in the distribution, and notes that it does not work with the Linux version due to how primitive they are. I was unable to test this implementation as I was hoping to, because the emulated lab I was using didn’t allow the permissions that the code wanted to make to the wireless drivers. However, this code promises to be one of the most practical current versions of mesh networking. Setup is performed through the standard ifconfig linux tool. Where one specifies the access point in other protocols, here they specify the mesh they wish to use, and the parameters associated with it. Once the mesh is specified, routes are automatically formed, and users can manually specify nodes they wish to block or favor using included implementations.
I think the important takeaway from this exploration is into the state of mesh networking. I think the IEEE process here has been a hindrance to the development, and that it has trouble keeping up with the technological developments that are going on in this area. Enough new research keeps coming in that the standard needs to be constantly updated to keep from being outdated, and in so doing means that vendors are hesitant to implement inoperable devices, and keeps the whole field from the popularity it needs to really succeed. I think that we have had the opportunity to settle on an imperfect draft of this standard and actually have widespread mesh networking, but due to the fact that it can become useful only if everyone does the same thing, we have been very hesitant to do so.
Hopefully as smart phones continue to evolve we will begin to see mesh networking standardized in those devices, as the potential it envelops becomes increasingly important to our communication. Given the issues we’ve seen in recent years with cell companies growing troubles keeping their access point bandwidth in line with the consumer demand feels unsustainable, and will hopefully result in an increasing desire for truly distributed communication for both better reliability and bandwidth.

on p2p in the browser

The final project I want to accomplish this semester in my distributed computing independent study, is to come up with a way to access p2p capabilities from within the browser. I was originally hoping to do this with JavaScript API that relied behind the scenes on the new flash p2p protocol, rtmfp. The protocol has been around since last spring, but development for it is not exactly optimal. You need to be using flash to make the movie, and you need a proprietary and expensive piece of server software to coordinate data transfers. The protocol itself is still proprietary, meaning that Adobe won’t actually tell you how to make a server that can work with it, or how data is sent across the wire.

They finally have started advertising it at least, and so it will gain some prominence, and eventually the details will come to light, but for now it is unrealistic to attempt to use flash for a general server agnostic service. (http://www.ietf.org/proceedings/10mar/slides/tsvarea-1.pdf)

Instead, my plan is to modify the open source privoxy software and add p2p capabilities at that layer. My eventual goal is to make a communal grease-monkey system, where pages can be modified by the user through pieces of JavaScript, and those scripts can then be shared to friends. With the right level of abstraction, I think that this can produce a very powerful system. Starting next week, I’ll begin reading and hacking privoxy to find out how to integrate new code into that project.

Torrent Auditor

I have now migrated the python torrent client that I’ve been working on to a google code project.

It lives at torrentauditor and now has basic support for actually downloading torrent files.

I researched the bittorrent extension protocols this week, but was somewhat frustrated by what I found. Most of the interesting ones are implemented on a per-client basis, and aren’t well documented outside of that client. The Vuze client it turns out switches to an entirely different application specific protocol when it meets another client of the same time. The libtorrent based clients do much the same thing, although they send their additional messages over the existing connection.

However, the good news is that the basic protocol is friendly enough that it can be implemented without major trouble. I chose to focus on in-order reading for now simply for simplicity sake, although it is highly inefficient.

One goal that I’m going to try to focus on a bit in the next weeks as I have time, is to be able to extract frames of videos from downloaded data. For my digital animation class I would like to make an automated program that stitches together frames / short clips of videos entirely automatically – a visual representation of the swarm.

Downloading Torrents

I extended the work from last week in order to actually get data from a swarm. The main change is that new sockets are now allocated for each connection, state is remembered, and the client can actually get so far as to download data from the other peers.

What needs to happen still is that 1. I need a better way of stopping, right now it stops upon a lull in network activity, but that doesn’t really mean anything. 2. I need to actually save downloaded data somewhere, or alternatively keep track of the rate that it’s coming in (or the time delay between packets.) 3. I need to factor out the different message types into separate modules to keep the code readable.

I’m going to set up a code page for this project soon, so that I don’t have to post stuff just in this blog. I should get to that in the coming week, so that there’s a nicer way of interacting with this code base. This model isn’t super memory efficient, but it is a very simple system to work with, and I think it’s become a good jumping off point for trying to interact with swarms. In particular it’s really easy to add in hooks to get methods to run when messages are received, and to store arbitrary state about connections.

[python]
# Standalone Torrent Auditor
#
import socket
import time
import sys
import getopt
import random
import benc
import binascii
import select
import hashlib
import urllib
import server

#a bit of global state about who we are
client = "AZ"+chr(0x05)+"31"
myID = "".join(chr(random.randrange(0, 256)) for i in xrange(20))
peerCache=[]

#load in a .torrent file
def readFile(filePath):
f = open(filePath, ‘r’)
data = ”.join(f.readlines())
f.close()
return benc.bdecode(data)

#register with the tracker to generate potential peers
def register(torrent,myPort):
url = torrent[‘announce’];
ihash = hashlib.sha1(benc.bencode(torrent[‘info’])).digest();
query = urllib.urlencode({‘info_hash’:ihash,
‘peer_id’:myID,
‘port’:myPort,
‘uploaded’:0,
‘downloaded’:0,
‘left’:0,
‘compact’:1,
‘event’:’started’});
url += "?"+query;
trackerhandle = urllib.urlopen(url);
trackerdata = ”.join(trackerhandle.readlines());
trackerhandle.close();
parseddata = benc.bdecode(trackerdata);
initialnodes = parseddata[‘peers’];
peers = [];
while len(initialnodes) > 5:
ip = initialnodes[0:4];
port = initialnodes[4:6];
initialnodes = initialnodes[6:];
peers.append({‘state’:0,
‘ip’:socket.inet_ntoa(ip),
‘ihash’:ihash,
‘port’:ord(port[0])*256+ord(port[1]),
‘buffer’:”,
‘callback’:handleData});
return peers;

#register a new incoming connection
def onNew(socket,(ip,port)):
obj = {‘socket’:socket,
‘ip’:ip,
‘port’:port,
‘state’:1,
‘buffer’:”,
‘callback’:handleData};
handleData(obj)
server.track_socket(obj);

#reply to a socket with a torrent message
def reply(socket,mtype,payload):
#first flatten the payload
s = ”;
while len(payload):
p = payload.pop();
if isinstance(p,int):
s += struct.pack(‘!i’,p)
else:
s += p
s = chr(mtype) + s;
l = len(s);
pl = struct.pack(‘!i’,l) + s;
socket.sendall(ppl);
return pl;

#parse torrent msg
def handleMsg(obj,msg):
if msg[0] == 0: #choke
obj[‘state’] &= ~4
elif msg[0] == 1: #unchoke
obj[‘state’] |= 4
#and we’d like to request a piece from them.
reply(obj[‘socket’],6,[0,0,2<<15])
elif msg[0] == 2: #interested
obj[‘state’] |= 8
elif msg[0] == 3: #uninterested
obj[‘state’] &= ~8
elif msg[0] == 4: #have
idx = struct.unpack(‘!i’,msg[1:5])
obj[‘have’][idx/8] |= (1 << (8 – (idx%8)))
elif msg[0] == 5: #bitfield
obj[‘have’] = msg[1:]
elif msg[0] == 6: #request
#ignored
return;
elif msg[0] == 7: #piece
#got our data
print "succeeded in downloading!"
elif msg[0] == 8: #cancel
#we aren’t in that business
return;
#parse incoming data
def handleData(obj):
try: nbuf = obj[‘socket’].recv(4096)
except socket.error, err:
print "disconnected %s" %obj[‘ip’]
server.closed_socket(obj[‘socket’])
return;
if not nbuf:
print "disconnected %s" % obj[‘ip’]
server.closed_socket(obj[‘socket’])
return;
data = obj[‘buffer’] + nbuf;
#Handshake
if obj[‘state’] &2 == 0:
if data[0] == chr(19) and len(data) >= 68:
obj[‘ihash’] = data[28:48]
obj[‘peerid’] = data[48:68]
obj[‘buffer’] = data[68:]
obj[‘state’] += 2
if obj[‘state’] & 1== 0:
#we need to respond to the handshake
obj[‘socket’].sendall(handshake(obj[‘ihash’]))
obj[‘state’] += 1
print "shook hands with %s" % obj[‘ip’];

#all other messages are prefixed by their length
else:
mlen = struct.unpack(‘!i’,data[0:4])[0]
while len(data) > mlen:
msg = data[1:mlen+1]
data = data[mlen+1:]
if len(msg):
#Actual message received
handleMsg(obj,msg);
if len(data):
mlen = ord(data[0]);
else:
break;
#save the rest of the data
obj[‘buffer’] = data
print "unknown message %s"%data

#define the handhsake
def handshake(ihash):
announce = chr(19) + ‘BitTorrent protocol’
announce += chr(0x0)*8
announce += ihash
announce += myID
return announce

#talk with cached peers
def onTimeout():
global peerCache;
inited = 0;
if len(peerCache) == 0:
return True;
for i in range(5):
if len(peerCache) == 0:
break;
obj = peerCache.pop()
obj = server.track_socket(obj)
if not obj:
continue;
obj[‘socket’].sendall(handshake(obj[‘ihash’]))
obj[‘state’] &= 1
return False;

def usage():
global peerCache;
print "Usage:";
print "client –file=loc.torrent";
print "Will report on statistics for the desired torrent";

def main():
filePath = "default.torrent";
try:
opts, args = getopt.getopt(sys.argv[1:], "hf:", ["help", "file="])
except getopt.GetoptError, err:
# print help information and exit:
print str(err) # will print something like "option -a not recognized";
usage();
sys.exit(2);
for o, a in opts:
if o in ("-h", "–help"):
usage();
sys.exit();
elif o in ("-f", "–file"):
filePath = a;
else:
assert False, "unhandled option";
print "Loading Info… ",
info = readFile(filePath);
print "okay";
port = 6886;
print "Detecting Swarm… ",
seeds = register(info,port);
print len(seeds), " peers returned";
peerCache.extend(seeds);
print "Entering Main Loop";
onTimeout();
server.main_loop(onTimeout,onNew,port);
print "Finished Snapshot";

if __name__ == "__main__":
main()
[/python]