B3RN3D

Let your plans be dark and impenetrable as night, and when you move, fall like a thunderbolt.

Evading Stylometry Attacks Using VIM Spellcheck

We’ve discussed the risks of stylometric attacks and how good adversaries have become at them. Only a few tools attempt to anonymize your writing style, and none so far can do it inline. But there’s help.

If you are concerned about these types of attacks, you can start customizing your favorite writing application (VIM, Kate, etc.) to leverage syntax highlighting as a way of pointing out when you’re leaking stylometric information.

For example, when I write, I often say “for example”. While this is a common phrase, it can help fingerprint your writing style across multiple sources of text. In VIM, you can use custom spell check rules to highlight certain syntax, or simply use the spell checker to flag the fact that you’re using buzz phrases like “the fact that”.

VIM

Here’s a way to add your favorite words to VIM’s spell check list as “bad words”. Whenever you use one of those words, VIM will highlight it in red as if it were misspelled. Everyone’s common words are different, so you’ll need to review some of your recent writing to find which words to add.

  • Enable spell check in your .vimrc file by adding the following line:

set spell spelllang=en_gb

  • Add “bad words” in VIM by putting the cursor over a word and pressing “zW”. This will add the word to VIM’s internal bad word list.

Now whenever you type that word, VIM will automatically red-flag it to remind you that you may not want to use it.
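Putting the steps above together, a minimal .vimrc sketch might look like the following. The spellfile path, highlight group name, and phrase list are my own illustrative choices, not VIM defaults:

```vim
" Enable spell checking with the British English dictionary
set spell spelllang=en_gb

" Persist bad words across sessions: 'zw' writes to this file,
" whereas 'zW' only updates the internal, session-only word list
set spellfile=~/.vim/spell/en.utf-8.add

" Highlight overused multi-word phrases, which single-word spell
" checking cannot catch (phrase list here is just an example)
highlight OverusedPhrase ctermbg=red guibg=red
match OverusedPhrase /\c\(for example\|the fact that\)/
```

Note that `:match` applies only to the current window, so for phrase highlighting in every buffer you would want to wrap it in an autocmd.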

Grammar

This is a poor man’s way of defending against stylometric attacks, and I would recommend trying tools like Anonymouth to help anonymize your writing, but sometimes you just need a reminder that you use a phrase too often.

UPDATE

If you want to at least do a frequency analysis of the words in your text, it’s pretty easy. The following is an example in Python. If you run it with your text file as an argument, it will give you a breakdown of your most common words, hopefully helping you determine which words could be attributed to you.

stemmer.py
#!/usr/bin/env python3
# Author: b3rn3d
# Based on: http://www.nltk.org/api/nltk.stem.html
# Usage: ./stemmer.py filename.txt

import re
import sys

try:
    from Stemmer import Stemmer as SS
except ImportError:
    print("Stemmer/Snowball not installed. Please install it via: pip install PyStemmer")
    sys.exit(1)

if len(sys.argv) != 2:
    print("FAIL: Supply a text file to analyze.")
    sys.exit(1)

with open(sys.argv[1]) as f:
    text = f.read()

# extract words, lowercased
words = [word.lower() for word in re.findall(r'\w+', text)]

stemmer = SS('english')
counts = {}

# count stems, keeping the shortest surface form seen for each stem
for word in words:
    stem = stemmer.stemWord(word)
    if stem in counts:
        shortest, count = counts[stem]
        if len(word) < len(shortest):
            shortest = word
        counts[stem] = (shortest, count + 1)
    else:
        counts[stem] = (word, 1)

# convert {stem: (word, count)} to [(word, count, stem)] for convenient sort and print
output = [wordcount + (root,) for root, wordcount in counts.items()]
# sort by count (descending), then word (alphabetically)
output.sort(key=lambda x: (-x[1], x[0]))
for word, count, root in output:
    print('%s:%d (Root: %s)' % (word, count, root))
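The script above only counts single words, so it will never surface phrases like “the fact that” or “for example”. A sketch of an n-gram extension, using only the standard library (the `phrase_counts` name is illustrative, not part of the original script):

```python
#!/usr/bin/env python3
# Count repeated two- and three-word phrases in a text file,
# complementing the single-word stem counts above.
import re
import sys
from collections import Counter

def phrase_counts(text, n):
    """Count n-word phrases (n-grams) in text, lowercased."""
    words = [w.lower() for w in re.findall(r'\w+', text)]
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

if len(sys.argv) == 2:
    with open(sys.argv[1]) as f:
        text = f.read()
    for n in (2, 3):
        # only show phrases that actually repeat
        for phrase, count in phrase_counts(text, n).most_common(10):
            if count > 1:
                print('%s:%d' % (' '.join(phrase), count))
```

Run against this very post, it would immediately flag “for example” as a candidate for your bad word list.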