Reducing metadata leakage from software updates

Update: now you can do this with Tor Onion Services

leakageMany software update systems use code signing to ensure that only the correct software is downloaded and installed, and to prevent the code from being altered. This is an effective way to prevent the code from being modified, and because of that, software update systems often use plain, unencrypted HTTP connections for downloading code updates. That means that the metadata of what packages a machine has installed is available in plain text for any network observer, from someone sitting on the same public WiFi as you, to state actors with full network observation capabilities.

That means that potentially private information is leaking. That private information could be which packages you have installed and which versions. That information can help an attacker figure out the best way to break into the target machine. Also, a unique fingerprint can be generated based on which packages a machine has installed, and that could help de-anonymize traffic that goes over Tor or other anonymity tool.

For people who use apt-get in Debian, Ubuntu or any related GNU/Linux distro, there is a lot of metadata leaked to the internet when apt-get contacts Debian repositories using a standard configuration. Mostly, that is because by default, the connections are unencrypted (http, ftp, rsync). The integrity of the package itself is not reason enough to use HTTPS since the GPG signing is much more reliable for that task. Here is how I break it down:

  1. package authenticity
    (software can be modified while being downloaded)
  2. repo availability
    ( whole sites or specific URL paths can be selectively blocked by governments and companies)
  3. package availability
    (software security updates can be individually blocked)
  4. who’s downloading what package (currently visible to anyone who can see the
    network traffic, including open wifi, etc.

The current apt model covers #1 well, but only covers #2 and #3 with a two week window (the expiration date on the repo metadata). And it does not cover #4 at all. Using HTTPS for apt repos is a simple way to improve the security of all 4. It adds a weak backup security layer for #1, it makes it much more difficult for a portion of a large internet mirror to be seletively blocked (e.g. #2 and #3). For example, if you use HTTPS to, everything has to be blocked to block Debian repos or packages. And pipelining downloads through a reused HTTPS connection makes it very difficult for the network observer to track metadata about packages, #4).leakage control

Luckily, there are some relatively easy steps that greatly reduce the amount of metadata that is leaked: using HTTPS connections to the mirrors and running those connections through Tor. Setting apt-get to pipeline as many transactions into a given HTTPS session is also useful, but currently only supported for HTTP and not HTTPS. Even though HTTPS/TLS has security weaknesses, it is a lot better than nothing, and can help provide real world protection. The downside is that it is not common for Debian machines to connect to apt mirrors using HTTPS, so that potentially marks the install as a machine worth targeting. There are more and more HTTPS mirrors, and more interest in using them, so I think in time, that will only lessen as a concern. Here are the HTTPS mirrors that I have had good luck with:


On that note, here is the config that I have been using on a number of Debian-deriv machines, and it has been working well. It requires apt-transport-https, and privoxy setup as an HTTP proxy for Tor.

$ cat /etc/apt/apt.conf.d/99force-tor
# force everything through privoxy HTTP proxy to tor
Acquire::ftp::Proxy "";
Acquire::http::Proxy "";
Acquire::https::Proxy "";

# don't use SSL, its insecure, only use TLS
Acquire::https::SslForceVersion "TLSv1";

I have found about 10 official Debian mirrors that have reliable HTTPS. Then I have a script that finds all of them, but many have self-signed certs and other issues. A number of the HTTPS mirrors also mirror the “security” archive, but I recommend that the http URL to the official repo is still included to make sure that security updates are promptly available.

I also have a test security repo running that is only available via an .onion address. I hope to encourage people to run official mirrors on a Tor Hidden Service, then HTTPS is not needed. Note that apt-transport-tor is not required if a tor proxy is setup. To try mine, add it to your sources.list (and make sure apt-get is somehow using Tor). The order is important, that determines the priority of where apt-get will get the package from is all other variables are the same.

deb http://dju2peblv7upfz3q.onion/debian-security/ wheezy/updates main
deb wheezy/updates main

Update: Use the official Debian Tor Onion Services now, dju2peblv7upfz3q.onion is deprecated and will be shut down!

A specific example: TAILS

TailsTAILS is an operating system that aims to be as private and anonymous as possible to enable, and has allowed journalists like Laura Poitras to work without leaking information despite being targeted by some very skilled and highly resourced organizations. TAILS mostly works as a “live CD”, meaning the whole operating system is downloaded as a single “image” file, then either burned to a CD/DVD, or to a USB thumb drive. Updates work the same way. But TAILS has an optional feature to use the Debian package system to install and persist packages that are not included by default. TAILS does not use the default set of mirrors that a standard Debian install uses, it is set up by default with a range of possible Debian package sources, including the current stable version (called wheezy), the versions in testing, and packages backported to the stable version. That means that when this feature is used, TAILS fetches the metadata for all of those sections of Debian (stable/wheezy, testing, wheezy-backports, unstable).

Given all of the proven fingerprinting approaches, like using the font list from the browser, I think its a safe assumption that the apt-get metadata will also provide similar fingerprinting opportunities. For basic TAILS use, this is all avoided since updates are done via ISO images. But once a user installs packages via apt-get, that changes since TAILS then goes out onto the internet to fetch all of the repo metadata. That goes over Tor since TAILS forces all network traffic over Tor, so that helps break the link between the machine downloading the updates and those that can see that machines internet traffic.

It seems quite likely that the set of mirrors and the order in which they are run will provide a way to identify the system as TAILS. As for identifying individual machines, apt-get sends a lot of metadata, like language that the system is using, which packages need updates, etc. On top of the set of mirrors used, there is potentially enough metadata there to fingerprint the individual machine.

One open question is how the apt-get downloads map to different Tor circuits. If all of the traffic from a given apt-get session goes over a single Tor circuit, then the exit node, the mirror server, and any network observer that can see the traffic between those two can use that as the fingerprint.

To expand on this, if TAILS fetched all of its apt sources (wheezy, backports, testing, etc) via HTTPS from the same mirror (e.g., then the exit node and network observer could not really distinguish the distro the machine making the connection was running since hosts many distro mirrors. There are two key parts here: using HTTPS to encrypt the data, and using HTTP pipelining so that network connections are reused for multiple downloads, rather than the default behavior of making a new HTTPS for each individual download. This setup would also prevent the custom pattern of apt sources from being distinguished since it would just show as downloading some series of files, and those files could be packages, package metadata, perl modules, source tarballs, etc.