It’s too late to say that we need to prevent the surveillance machine from being created. The machine is alive and deployed, and we should instead work on throwing wrenches into its gears. It’s not the machines the novels warned us about; it’s machine learning. Many of the indefensible surveillance capabilities that plague us exist thanks to machine learning and classification models.
Example: Tor Website Traffic Fingerprinting
Tor’s anonymity promises fall over when an attacker can intercept the traffic between a client and its guard and run the traffic’s features through a machine learning classifier. This lets an attacker predict which websites you’re visiting with roughly 95% accuracy. Machine learning is what makes the attack practical. Many of the pluggable transports, including Meek, have succumbed to fingerprinting via deep packet inspection backed by machine learning.
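At its core, a fingerprinting attack is just supervised classification over traffic features. Here is a toy sketch of that shape — the traces, the feature set, and the site labels are all invented, and real attacks use hundreds of features and much stronger classifiers than the 1-nearest-neighbor used here:

```python
import numpy as np

# Hypothetical traffic traces: each is a sequence of signed packet sizes
# (+ for client->server, - for server->client). Entirely made up.
def features(trace):
    """Reduce a packet trace to a small feature vector."""
    t = np.asarray(trace, dtype=float)
    return np.array([
        len(t),           # total packet count
        np.sum(t > 0),    # outgoing packets
        np.sum(t < 0),    # incoming packets
        np.abs(t).sum(),  # total bytes transferred
    ])

# Toy "training" traces, labeled by which site produced them.
training = {
    "site-a": [[+500, -1500, -1500, +60], [+520, -1500, -1400, +60]],
    "site-b": [[+300, -800, +300, -800, -800], [+310, -790, +300, -810, -790]],
}

def classify(trace):
    """1-nearest-neighbor over the feature vectors."""
    f = features(trace)
    best_site, best_dist = None, float("inf")
    for site, traces in training.items():
        for t in traces:
            d = np.linalg.norm(features(t) - f)
            if d < best_dist:
                best_site, best_dist = site, d
    return best_site

print(classify([+505, -1500, -1450, +60]))  # matches site-a's pattern
```

The pipeline is the same in the real attacks: featurize an observed trace, then compare it against traces you recorded yourself while visiting candidate sites.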
Example: Computer Vision with Surveillance Cameras
A recent article talked about how China’s video surveillance system was able to take a picture of a journalist and find him in seven minutes flat. That’s not sheer human effort; it’s their computer vision systems, which have become this advanced thanks to machine learning algorithms.
We know that stylometry is a real threat to anonymity. Whether you’re writing blog posts or writing code, the way you write leaks information about who you are. Tools like Anonymouth aim to defeat stylometric analysis, but it’s an arms race.
We need to understand the threats of machine learning so that we can figure out how to break them.
The Wrench: Adversarial Examples
Machine learning algorithms are built on models created by training the system. In the case of classification models, you supply an input and the system predicts how to classify it. You provide a picture of a panda, and the classifier should report a 95% chance that it’s a picture of a panda. But what does it do when you show it a picture of a panda wearing a hat?
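To make that confidence score concrete, here is a minimal linear classifier with a softmax output. Everything in it — the weights, the class labels, the input features — is invented for the sketch; a real image classifier is a deep network, but it ends the same way: a score per class, squashed into probabilities.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical trained weights for a 3-class linear classifier
# over a 4-feature input. All numbers are made up.
labels = ["panda", "gibbon", "hat"]
W = np.array([[ 2.0, -1.0,  0.5,  0.0],
              [-1.0,  1.5, -0.5,  0.2],
              [ 0.1,  0.1,  1.0, -0.3]])
b = np.array([0.1, -0.2, 0.0])

x = np.array([1.2, 0.3, 0.4, 0.9])  # feature vector for some input image
probs = softmax(W @ x + b)          # one probability per class
for label, p in zip(labels, probs):
    print(f"{label}: {p:.1%}")
```

The “95% panda” in the prose is exactly this kind of output: the largest entry of the probability vector.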
This is where the idea of adversarial examples comes into play. The idea is simple: modify the input to a machine learning system so that it comes to a different conclusion. A good example is making slight modifications to an existing image so that when a machine learning algorithm analyzes it, it either doesn’t know what the image is or, even better, thinks it’s something else.
There are two goals for adversarial examples in the real world:
- Modify the input to trick the machines
- Modify the input to trick humans
Doing one and not the other makes it useless in real-world applications. And that is where I’m going with this.
To exploit the system, we must first know where it’s vulnerable.
Classification Model Exploits: Have you ever noticed that video surveillance companies aren’t big proponents of open-source software or open-access programs? They know that machine learning has a major vulnerability: if you know the models, you can bypass them. In other words, if you could get access to a system’s models, you could use machine learning to reverse engineer them and identify their flaws.
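To see why model access is so dangerous, suppose the leaked model turned out to be something as simple as a linear scorer (all weights below are invented). With the exact weights in hand, you don’t even need to search for a bypass: the smallest change that crosses the decision boundary has a closed form, a step along the weight vector itself.

```python
import numpy as np

# Suppose the leaked model is linear: score = w.x + b,
# and an input is "flagged" when the score is positive.
w = np.array([0.9, -0.4, 1.2, 0.3])  # invented weights
b = 0.2

x = np.array([1.0, 0.5, 0.8, 0.2])
score = w @ x + b
print("original score:", score)  # positive: flagged

# Knowing w exactly, the smallest L2 perturbation that reaches the
# boundary is -(score / ||w||^2) * w; scale by 1.01 to land just past it.
delta = -(score / np.dot(w, w)) * w * 1.01
x_adv = x + delta
print("adversarial score:", w @ x_adv + b)  # negative: not flagged
print("perturbation size:", np.linalg.norm(delta))
```

Real models aren’t linear, but the principle carries over: white-box access turns “find a bypass” into an optimization problem you can solve directly.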
Training Poisoning: What if you could get upstream of the system and affect the training data? Training poisoning attacks let us influence the data the system trains on. Maybe I could poison the system in such a way that it is never able to classify my face or the faces of my friends.
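Here is what that could look like at toy scale — the “face embeddings,” labels, and model are all invented for the sketch. Flip the labels of the training points nearest a target face, and the retrained model stops matching it:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, lr=0.5, steps=3000):
    """Plain gradient-descent logistic regression (toy scale, no libraries)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # add a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, x):
    """True if the model classifies x as a match."""
    return bool(1 / (1 + np.exp(-(np.append(x, 1.0) @ w))) > 0.5)

# Toy 2-D "face embeddings": class 1 is the set of faces the system matches.
X = np.vstack([rng.normal(+1.0, 0.3, (50, 2)),   # faces it should recognize
               rng.normal(-1.0, 0.3, (50, 2))])  # everything else
y = np.concatenate([np.ones(50), np.zeros(50)])

target = np.array([1.0, 1.0])  # the face I want the system to never match

w_clean = train_logreg(X, y)
print("clean model matches target:   ", predict(w_clean, target))

# Poisoning: relabel the training points nearest the target as non-matches,
# so the model learns that this region of face-space is "not a match".
dists = np.linalg.norm(X - target, axis=1)
y_poisoned = y.copy()
y_poisoned[np.argsort(dists)[:35]] = 0
w_poisoned = train_logreg(X, y_poisoned)
print("poisoned model matches target:", predict(w_poisoned, target))
```

The catch, of course, is getting upstream in the first place: the sketch assumes the attacker can rewrite a chunk of the training labels, which is exactly the access the attack requires.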
Missing Research for Real World Applications
We can’t fight what we don’t understand, and there are open research questions that need answering.
How can we fingerprint surveillance systems?
You can nmap a host to see what software it’s running, but how do you know which surveillance system a business or government is using? At this point, I don’t know of any direct way. Do you? Cameras are dumb machines; the processing is done on the backend.
How can we inject into video cameras?
Lots of research is coming out on input manipulation for static images, but none that works while you’re being recorded. There are things that can blind a camera, but that isn’t spoofing anything.
What attacks are possible in the real world?
Can you really come up with something better than simply wearing a mask? Could your defense become a new way of performing targeted surveillance? If a computer can pick out tiny differences between human faces, imagine how easy it would be to find someone wearing a Guy Fawkes mask.
You can read books like Homeland that talk about bypassing gait-tracking systems by putting a tack in your shoe. Or read about the artist who is building masks that make it difficult to determine who you are.
But the point is this: stop finding ways to hack into websites like the other million script kiddies. Start developing the skills to study machine learning from an adversarial perspective. We are still in the early phases of machine learning affecting our lives, and now is a good time to teach the world about its weaknesses. Get started today.