VirtualAlloc pitfall

When allocating blocks in the disk cache, libtorrent uses valloc() to allocate page-aligned 16 kiB blocks. On Windows, the natural counterpart to valloc() is VirtualAlloc(). Having these blocks page-aligned may improve performance when reading and writing files that are aligned to the block boundaries.

The 16 kiB allocation size is derived from the BitTorrent protocol's data transfer unit. Even if a piece is 1 MiB, you can only ask to download 16 kiB per request (and you need multiple pipelined requests to saturate the bandwidth-delay product).
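To make the arithmetic concrete, here is a minimal sketch of how many 16 kiB requests a piece takes, and roughly how many requests need to be outstanding to cover a given bandwidth-delay product. The function names and the example link figures are made up for illustration; this is not libtorrent's actual request logic.

```cpp
#include <cstdint>

// The BitTorrent request size discussed above.
constexpr std::uint64_t block_size = 16 * 1024;

// Number of 16 kiB requests needed to download one piece (rounded up).
constexpr std::uint64_t requests_per_piece(std::uint64_t piece_size)
{
    return (piece_size + block_size - 1) / block_size;
}

// Rough number of outstanding requests needed to keep a link busy:
// the bandwidth-delay product divided by the request size (rounded up).
// bytes_per_second and rtt_ms are hypothetical inputs.
constexpr std::uint64_t pipeline_depth(std::uint64_t bytes_per_second,
                                       std::uint64_t rtt_ms)
{
    return (bytes_per_second * rtt_ms / 1000 + block_size - 1) / block_size;
}
```

For a 1 MiB piece, requests_per_piece() gives 64. For a hypothetical 10 MiB/s link with a 100 ms round-trip time, pipeline_depth() also comes out at 64 outstanding requests.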

There used to be a bug in libtorrent where, on 32-bit systems, memory allocations would start failing even though only a quarter of the physical RAM was in use. The disk cache would grow to about 580 MiB and then stop; see ticket 508.

It turned out to be caused by a little-known (and poorly documented) “feature” of VirtualAlloc(). Every allocation it makes consumes a multiple of 64 kiB of virtual address space. In libtorrent’s case, every call allocated 16 kiB of memory but used up 64 kiB of virtual address space. This caused the process to run out of virtual address space long before the physical memory was exhausted, because each call to VirtualAlloc() would waste 3/4 of the virtual address space it claimed.
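The effect is easy to estimate. The sketch below assumes every block is a separate allocation rounded up to 64 kiB, and that the entire address space is available to the cache, which is never quite true in practice. The helper names are made up for this example, and the result is an upper bound, not an attempt to reproduce the exact figure from the ticket.

```cpp
#include <cstdint>

constexpr std::uint64_t kiB = 1024;
constexpr std::uint64_t MiB = 1024 * kiB;
constexpr std::uint64_t GiB = 1024 * MiB;

// Round size up to the next multiple of granularity.
constexpr std::uint64_t round_up(std::uint64_t size, std::uint64_t granularity)
{
    return (size + granularity - 1) / granularity * granularity;
}

// Upper bound on useful bytes when every block-sized allocation
// consumes round_up(block, granularity) bytes of address space.
constexpr std::uint64_t max_usable_bytes(std::uint64_t address_space,
                                         std::uint64_t block,
                                         std::uint64_t granularity)
{
    return address_space / round_up(block, granularity) * block;
}
```

With 2 GiB of user address space (the default for a 32-bit Windows process), max_usable_bytes(2 * GiB, 16 * kiB, 64 * kiB) is 512 MiB. With 4 GiB it is 1 GiB. Either way, three quarters of the address space is gone before any of it holds data.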

This feature is called the allocation granularity (dwAllocationGranularity). It is actually documented on MSDN, but not with VirtualAlloc(), where one might expect. It’s in the documentation for the SYSTEM_INFO structure.
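For reference, here is a small sketch of how the value can be queried at run time with GetSystemInfo(). The wrapper functions are hypothetical names for this example. The non-Windows branch is only there so the snippet builds elsewhere; it falls back to the page size, which is an assumption and not an equivalent concept.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

// Returns the granularity at which address space is handed out.
std::size_t allocation_granularity()
{
#ifdef _WIN32
    SYSTEM_INFO info;
    GetSystemInfo(&info);
    return info.dwAllocationGranularity; // typically 64 kiB
#else
    // Assumption for non-Windows builds: use the page size instead.
    return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
#endif
}

// Address space effectively consumed by one allocation of `requested`
// bytes when allocations start on `granularity` boundaries.
std::size_t consumed_address_space(std::size_t requested, std::size_t granularity)
{
    // The bit trick below only works for powers of two.
    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    return (requested + granularity - 1) & ~(granularity - 1);
}
```

Calling consumed_address_space(16 * 1024, allocation_granularity()) on a Windows machine shows how much address space a single 16 kiB block would have cost.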

Here’s another post diving into some more details of this topic.

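Now libtorrent uses _aligned_malloc() instead, which allocates from the heap rather than claiming a separate region of address space for every block.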
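A minimal sketch of what such an allocation might look like follows. This is not libtorrent's actual code: the function names are made up, the 4 kiB page size is hard-coded as an assumption, and the posix_memalign() branch exists only so the example builds on non-Windows systems.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h>
#endif

constexpr std::size_t block_size = 16 * 1024;
constexpr std::size_t page_size = 4096; // assumption: 4 kiB pages

// Allocate one page-aligned 16 kiB block from the heap.
// Returns nullptr on failure.
void* allocate_block()
{
#ifdef _WIN32
    return _aligned_malloc(block_size, page_size);
#else
    void* p = nullptr;
    if (posix_memalign(&p, page_size, block_size) != 0) return nullptr;
    return p;
#endif
}

// Free a block returned by allocate_block().
void free_block(void* p)
{
#ifdef _WIN32
    _aligned_free(p); // memory from _aligned_malloc() must not go to free()
#else
    free(p);
#endif
}

// True if p starts on a page boundary.
bool is_page_aligned(void const* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % page_size == 0;
}
```

The blocks are still page-aligned, but consecutive allocations are no longer forced 64 kiB apart in the address space.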
