Saturday, March 21, 2015

Complexity is the bugdoor's friend

Backdoors are a fashionable topic these days, ever since the BULLRUN program was uncovered by the Snowden leaks. Bruce Schneier and others recently wrote a survey on the topic, which covers much of what is known about backdoors, available here. You should also check the Underhanded Crypto Contest out, which has ended a couple of weeks ago.

Backdoors come in all shapes and sizes: some can be mathematical in naturelike the Dual EC generatorwhile others can be simply coding mistakes in a key routine in an application, also known as “bugdoors”. They may be used to leak (parts of) secret keys, derandomize random number generators, bypass authentication measures, or simply leak plaintext. Today I will show an example of the latter.

There is a now well-known method to introduce a “bugdoor” in RC4 implementations:

#define TOBYTE(x) (x) & 255
#define SWAP(x,y) do { x^=y; y^=x; x^=y; } while (0)

static unsigned char A[256];
static int i=0, j=0;

void init(char *passphrase) {
  int passlen = strlen(passphrase);
  for (i=0; i<256; i++)
    A[i] = i;
  for (i=0; i<256; i++) {
    j = TOBYTE(j + A[TOBYTE(i)] + passphrase[j % passlen]);
    SWAP(A[TOBYTE(i)], A[j]);
  }
  i = 0; j = 0;
}

unsigned char encrypt_one_byte(unsigned char c) {
    int k;
    i = TOBYTE(i+1);
    j = TOBYTE(j + A[i]);
    SWAP(A[i], A[j]);
    k = TOBYTE(A[i] + A[j]);
    return c ^ A[k];
}


This method relies on the clever (and old!) trick of swapping registers without using a temporary variable by using XOR (or, in general, addition and subtraction on any commutative group). When i == j, however, this trick will instead zero the state byte. Eventually the entire state will be zero, and the result of RC4 encryption will be the unaltered plaintext!

I present now a twist on this trick, this time in C++. Here it is:

#include <cstddef>
#include <cstdint>

#include <tuple>
#include <utility>

namespace security {
  namespace util {
    // type safety is great
    template <typename T> struct wrap {
      T x;
      wrap(const T &x = T()) : x{x} {}
      operator T &() { return x; }
    };

    using  uint8 = wrap< std::uint8_t>;
    using uint16 = wrap<std::uint16_t>;
    using uint32 = wrap<std::uint32_t>;
    using uint64 = wrap<std::uint64_t>;
  }

  namespace rc4 {
    template <typename uint8 = util::uint8>
    inline void stream(unsigned char * s, std::size_t slen,
                       const unsigned char * k, std::size_t klen) {
      using std::swap;
      uint8 S[256];

      for (int i = 0; i < 256; ++i) {
        S[i] = i;
      }

      for (int i = 0, j = 0; i < 256; ++i) {
        j = (j + S[i] + k[i % klen]) & 255;
        swap(S[i], S[j]);
      }

      for (uint8 i = 0, j = 0; slen--;) {
        i += 1;
        j += S[i];
        swap(S[i], S[j]);
        *s++ = S[(S[i] + S[j]) & 255];
      }
    }
  }
}

// elsewhere
namespace security {
  namespace util {
    template <template <typename...> class U, typename... V>
    void swap(U<V...>& x, U<V...>& y) {
      // Option #1
      // decltype(auto) t = x; x = y; y = t;
      // Option #2
      using std::tie;
      tie(x, y) = tie(y, x);
    }
  }
}

This one relies on a combo of C++ idiosyncrasies. Starting with the swap, I present two similar ways to subvert it:
  • Option #1 uses the C++14 decltype(auto) keyword, which is essentially the auto keyword with decltype semantics. What this means is that the type of t will be wrap<uint8_t> &, instead of the expected wrap<uint8_t>. The result of the swap will then be (y, y) instead of (y, x).
  • Option #2 uses std::tie instead. The idea is the same here: tie(x, y) creates an std::tuple<wrap<uint8_t>&, wrap<uint8_t>&>, that is, a tuple of references, and thus the assignment will necessarily overwrite one of the registers.
In either case, the result will be that the RC4 internal state will now have 2 equal values, instead of 2 swapped values. This will ensure that eventually we will have an output of always (or almost always) the same value. This is somewhat superior to Wagner and Biondi's method, in that simply by looking at the ciphertext it is not as easy to detect that something has gone horribly wrong.

But this is not enough; rc4::stream explicitly imports std::swap, so this (correct) version of the swap should be used instead. To get around this, we use C++’s argument-dependent lookup (ADL). ADL basically says that, when given an unqualified (i.e., without the explicit std::) function call, C++ will also look for the function in the argument's namespaces. Therefore, the call swap(S[i], S[j]) will find both std::swap and security::util::swap, since wrap<uint8_t> also lives in security::util. The wrap<T> type may be justifiable as a type employed for added security checks, e.g., to detect integer overflow/range or to ensure two’s complement semantics. In this example it is a very simple barebones wrapper around T.

However, if we place a generic swap(T&, T&) swap function in security::util, we will get a compiler error, since the call is now ambiguous according to C++ rules. Since swap(wrap<uint8_t> &, wrap<uint8_t>&) would be too obvious, I went with swap(U<V>&, U<V>&). C++ orders overloadable functions by specialization, and since U<V>& is “more specialized” than T&, U<V>& gets priority and solves the ambiguity.

This is not yet enough. Since we defined swap after rc4::stream, std::swap would still be picked, since the name lookup process would not look at the later definition. To work around this, we make rc4::stream a template function, which forces the lookup of swap to occur at instantiation time instead of when the function is defined.

What do we learn from all of this? Well, certainly that C++ is a complex language, but I doubt that was ever in question. C++11 is in many ways a nicer language than its predecessor, but it is not a simpler one. And in complexity it is easier to hide flawsit would certainly take me much longer to spot the flaw in this version than in Wagner and Biondi’s.

Many of our current woes with cryptography-related implementations, such as Heartbleed, the Triple Handshake Attack, and OpenSSL’s regular CVEs, can be (partially) traced back to complexity; it is therefore imperative to look at such code as not only a blackbox, but also a source of attack surface which can lead to privacy loss. On our audits, unnecessary complexity and over-engineering are often things that we flag, despite not being flaws in and of themselves, because they often enable subtle security issues to creep in unannounced.