One of the key algorithms in bittorrent is the rarest-first piece picker. It is vital to bittorrent’s performance that the piece picker fulfills both of these requirements:
The reason to pick a random rarest piece is to always strive towards evening out the piece distribution in the swarm. Having an even piece distribution improves peers’ ability to trade pieces and improves the swarm’s tolerance to peers leaving.
The operations of the piece picker are:
Overall, it is fairly safe to assume that operation 1 and 2 are going to be, by far, the most common ones.
First of all, technically, the piece picker doesn’t actually pick pieces, it picks blocks (16 kiB parts of pieces). This means it needs to keep state on partially picked pieces. It also means it should, among pieces of equal rarity, prefer to pick from a piece that already has outstanding requests (i.e. is partially picked). Keep in mind that partially downloaded pieces are wasteful, the sooner they can be completed, the sooner they can be uploaded to other peers.
There’s a special piece picker mode called end-game mode. If all piece have been picked or are completed, we enter the end-game mode. In this mode, each peer that we are ready to request more blocks from, requests a single block that has already been requested from someone else. This ensures that the download doesn’t get stuck on a single slow peer. Picking only one block at a time (i.e. disabling pipelining of block requests) ensures that the redundant download is kept low.
What data structure should one use to make picking pieces really fast?
Now, the typical data structure would be a list of buckets. Each bucket represents a piece rarity and the bucket is full of piece indices to pieces with that rarity. The buckets with piece indices are shuffled.
With a data structure like this you simply iterate the list from beginning to end, down into each bucket, whenever you find a piece the peer has, you dive down into that piece to pick blocks. Let’s call this the pieces list.
Here are a few observations of this data structure.
The piece picker’s data structure can be considered a multi-index container. That is, a data set which has multiple indices, optimized to do lookups based on different properties of the objects.
In this case, the object is a piece and we want to look it up either based on priority (when scanning in priority order, starting with the lowest values going up) or based on their actual piece index when modifying their rarity.
Each piece object needs to have at least the following state:
This data structure could be illustrated with the following code snippet:
struct piece_pos { int peer_count; // availability int state; // partial or not (i.e. is there an entry in downloading) int index; // index into piece list, -1 means we have it }; std::vector<piece_pos> piece_map; // piece_index -> index into pieces std::vector<int> pieces; // pieces sorted by priority std::vector<int> priority_boundaries; // indices into pieces struct downloading_piece { int index; // piece // for each block; open, requested, writing or finished state std::vector<int> block_state; }; std::unordered_set<int, downloading_piece> downloading;
Illustration of the relationships between the lists in the piece picker
Finding a piece rare piece is then just a matter of iterating pieces until we find a piece that the peer we’re picking from has. Pieces that cannot be picked are removed from pieces, to avoid iterating over them. This includes pieces we already have and pieces that have been fully requested. To pick a piece (not block) can be illustrated by this pseudo code:
int pick_piece(bitmask& have) { for (int p : pieces) { if (!have[p]) continue; return p; } return -1; // in this case we may have to enter end-game mode }
Once we have the piece, we either look-up the downloading_piece object, or create a new one (and update the state in piece_map). In the downloading_piece we mark the blocks we pick as requested, to avoid picking them again.
Incrementing the availability of a piece, can be illustrated by this pseudo code:
void inc_refcount(int piece) { int avail = piece_map[piece].peer_count; priority_boundaries[avail] -= 1; piece_map[piece].peer_count += 1; int index = piece_map[piece].index; int other_index = priority_boundaries[avail]; int other_piece = pieces[other_index]; std::swap(pieces[other_index], pieces[index]); std::swap(piece_map[other_piece].index, piece_map[piece].index); }
This function swaps this piece with the border piece of this piece priority bucked, and then decrements the boundary index to make it belong to the priority bucket one level up. The dec_refcount() function is similar but moves it down, towards lower availability buckets instead.
This operation is not extremely cheap, especially not when having to do it for every piece, every time a peer leaves and joins the swarm. When faced with having to update the availability of many pieces there are two optimizations:
This yields a generally fast piece picker, and is essentially what’s in libtorrent (+a few extra features). However, there are still at least two cases that are not very optimized:
Leave a Reply
You must be logged in to post a comment.