B3RN3D

Let your plans be dark and impenetrable as night, and when you move, fall like a thunderbolt.

XKeyScore Source Code Review

The revelations published today by Jacob Appelbaum et al. make several points that should not be understated:

  • Tor users are directly targeted
  • TAILS users are directly targeted
  • People searching for privacy tools are targeted

While some may conclude “well, of course the NSA is doing that,” the leak provides direct, specific evidence that people worried about their privacy are being attacked. Moreover, it provides further evidence that the NSA’s goal is to collect it all.

The Revelations

To summarize, source code from the NSA program XKeyscore (known since the early days of the Snowden disclosures) has been leaked. It sounds like the program’s entire source code is in the hands of another party, and it shows what the code does, how it works, and whom it targets. We now know that privacy-conscious bystanders are targeted: Tor users, people who visit the Tor website, people who use TAILS, and people who try to view hidden services. Each of these is considered a suspicious activity, and the IPs involved are flagged as those of suspects by the NSA’s network monitoring machine. This is a further reminder that if you look privacy conscious, you are going to be targeted as an “extremist” in the eyes of the United States and its allies.

Tracking Bridge Users

XKeyscore is tracking the IP addresses that send emails to the Tor bridge automated account. When users are in a country that blocks Tor, they have the option of using unlisted Tor entry nodes called bridges. One way to get an unlisted bridge IP is to email a Tor Project address, which auto-replies with the address of a bridge. The Five Eyes have been documenting each IP that makes a request to that email address.

Tracking Tor Directory Authorities

Another facet disclosed was that the NSA is targeting a specific Tor directory server run by Sebastian Hahn. I believe this particular server appears only because the source of the XKeyscore leak was a node in Germany. Looking at nodes in other countries would likely point to a corresponding directory authority in that region.

Until relatively recently, the Tor network relied on only 9 directory servers, and every client first connected to one of them before joining the network. These 9 directory servers are still in place, but an additional feature lets Tor nodes act as directory caches. With this feature, you are no longer required to connect to one of the directory authorities on every startup, which helps mitigate this risk.

Tracking Tor Entry Nodes

Even if your connection to the directory authorities was not caught by the program, your connections to the Tor entry nodes were. While directory authorities are only contacted at startup, connections to Tor entry nodes are made repeatedly as your client builds circuits.

There’s not much you can do to defend against this one. Using a bridge would ensure that XKeyscore won’t know which IPs to track, but the requests for bridges are caught as well. You could run your own unlisted Tor entry node, which is possible, but it severely degrades your anonymity. Users concerned about this may consider connecting to a VPN service first and then connecting to Tor through it. This would not fix the problem, but it would make it more difficult to identify where the request to connect to Tor originated.

Tracking Torproject.org Visits

One of the more useless network interactions being logged is users visiting www.torproject.org. The document shows what they are calling “microplugins” that highlight specific pieces of information caught in transit. Your visit to the Tor Project’s website is logged and you are now flagged as suspicious.
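To make concrete how little it takes to flag such a visit, here is a minimal C++ sketch of a host-based match against a captured plaintext HTTP request. It is not the leaked rule syntax and assumes nothing about XKeyscore's internals; the function names `host_header` and `is_tor_site_visit` are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Lower-case a copy of the string (header names and hosts are case-insensitive).
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Return the lower-cased value of the Host header in a raw HTTP/1.x request,
// or an empty string if the headers end without one.
std::string host_header(const std::string& request) {
    std::size_t pos = 0;
    while (pos < request.size()) {
        std::size_t end = request.find("\r\n", pos);
        if (end == std::string::npos) end = request.size();
        std::string line = to_lower(request.substr(pos, end - pos));
        if (line.empty()) break;  // blank line marks the end of the headers
        if (line.compare(0, 5, "host:") == 0) {
            std::size_t start = line.find_first_not_of(" \t", 5);
            if (start == std::string::npos) return "";
            std::size_t stop = line.find_last_not_of(" \t");
            return line.substr(start, stop - start + 1);
        }
        pos = end + 2;  // skip past the CRLF
    }
    return "";
}

// Hypothetical check: a single string comparison is enough to flag the visit.
bool is_tor_site_visit(const std::string& request) {
    return host_header(request) == "www.torproject.org";
}
```

A check like this only works on unencrypted traffic, where the Host header is visible to anyone on the path.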

XKeyScore Code

The most interesting part is the code released showing how XKeyscore works. Many have already highlighted that the NSA programs are merely malicious implementations of existing technology (as opposed to custom software built from the ground up). We can see that XKeyscore’s database uses a MapReduce model, one very common in big-data frameworks like Hadoop. This is hinted at by the “mapper” and “reducer” functions that search for onion addresses:

    mapper<onion_t>:
      static const std::string prefix = "anonymizer/tor/hiddenservice/address/";

      onion_t onion;
      size_t matches = cur_args()->matches.size();
      for (size_t pos=0; pos < matches; ++pos) {
        const std::string &value = match(pos);
        if (value.size() == 16)
          onion.set_address(value);
        else if(!onion.has_scheme())
          onion.set_scheme(value);
        else
          onion.set_port(value);
      }

      if (!onion.has_address())
        return false;

      MAPPER.map(onion.address(), onion);
      xks::fire_fingerprint(prefix + onion.address());
      return true;
  
    reducer<onion_t>:
      for (values_t::const_iterator iter = VALUES.begin();
          iter != VALUES.end();
          ++iter) {
        DB["tor_onion_survey"]["onion_address"] = iter->address() + ".onion";
        if (iter->has_scheme())
          DB["tor_onion_survey"]["onion_scheme"] = iter->scheme();
        if (iter->has_port())
          DB["tor_onion_survey"]["onion_port"] = iter->port();
        DB["tor_onion_survey"]["onion_count"] = boost::lexical_cast<std::string>(TOTAL_VALUE_COUNT);
        DB.apply();
        DB.clear();
      }
      return true;
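
The mapper/reducer pair above follows the usual MapReduce shape: the map step keys each sighting by its onion address, and the reduce step then sees every sighting for one key together with a total count. Here is a minimal standalone sketch of that flow in plain C++, assuming nothing about XKeyscore's real runtime. The `Sighting` struct and the `map_matches` and `reduce_counts` functions are hypothetical names, and an in-memory `std::map` stands in for the framework's grouping of values by key.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the leaked onion_t record.
struct Sighting {
    std::string address;  // 16-character onion label, without ".onion"
    std::string scheme;   // empty if not captured
    std::string port;     // empty if not captured
};

// Map step, mirroring the leaked loop: a 16-character match is taken as the
// address, the first other match as the scheme, and any later one as the port.
// Returns false when no address was seen, like the leaked "return false".
bool map_matches(const std::vector<std::string>& matches, Sighting& out) {
    Sighting s;
    bool has_address = false;
    bool has_scheme = false;
    for (const std::string& value : matches) {
        if (value.size() == 16) {
            s.address = value;
            has_address = true;
        } else if (!has_scheme) {
            s.scheme = value;
            has_scheme = true;
        } else {
            s.port = value;
        }
    }
    if (!has_address) return false;
    out = s;
    return true;
}

// Grouping plus reduce step: count how often each onion address was seen.
// Simplification: this keeps one count per address, whereas the leaked
// reducer writes a database row for every value in the group.
std::map<std::string, std::size_t> reduce_counts(
        const std::vector<std::vector<std::string>>& all_matches) {
    std::map<std::string, std::size_t> counts;
    for (const auto& matches : all_matches) {
        Sighting s;
        if (map_matches(matches, s)) {
            assert(s.address.size() == 16);  // invariant of the map step
            ++counts[s.address + ".onion"];
        }
    }
    return counts;
}
```

Feeding `reduce_counts` a list of regex match groups yields a table of onion addresses and how many times each was observed, which is roughly what the `tor_onion_survey` table in the leaked reducer appears to record.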

Full Code

Below is the full code released: xkeyscorerules100.txt