Neural Network Prediction of Censorable Language
Internet censorship imposes restrictions on what information can be publicized or viewed on the Internet. According to Freedom House's annual Freedom on the Net report, more than half of the world's Internet users now live in places where the Internet is censored or restricted. China has built the world's most extensive and sophisticated online censorship system. In this paper, we describe a new corpus of censored and uncensored social media posts from Sina Weibo, a Chinese microblogging website, collected by tracking posts that mention 'sensitive' topics or are authored by 'sensitive' users. We use this corpus to build a neural network classifier that predicts censorship. Our model achieves 88.50% accuracy using only linguistic features. We discuss these features in detail and hypothesize that they could be used for censorship circumvention.
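To make the classification setup concrete, here is a minimal sketch of a feed-forward neural network that maps a vector of linguistic features to a censored/uncensored label. It is not the authors' model: the architecture (one sigmoid hidden layer trained with plain stochastic gradient descent on binary cross-entropy), the two feature names, the helper names `TinyMLP` and `accuracy`, and the toy data are all hypothetical, chosen only to illustrate the technique.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """Hypothetical one-hidden-layer network; output is P(censored)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)  # fixed seed for reproducible weights
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Hidden activations, then a single sigmoid output unit.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def predict(self, x):
        return 1 if self.forward(x)[1] >= 0.5 else 0

    def train(self, data, epochs=3000, lr=0.5):
        for _ in range(epochs):
            for x, t in data:
                h, y = self.forward(x)
                # Gradient of binary cross-entropy w.r.t. output pre-activation.
                d_out = y - t
                # Backpropagate to hidden pre-activations (weights before update).
                d_hid = [d_out * w * hi * (1 - hi) for w, hi in zip(self.w2, h)]
                for j in range(len(h)):
                    self.w2[j] -= lr * d_out * h[j]
                self.b2 -= lr * d_out
                for j, row in enumerate(self.w1):
                    for i in range(len(row)):
                        row[i] -= lr * d_hid[j] * x[i]
                    self.b1[j] -= lr * d_hid[j]


def accuracy(model, data):
    return sum(model.predict(x) == t for x, t in data) / len(data)


# Toy, invented examples: [sensitive_keyword_rate, negative_sentiment] -> label
# (1 = censored, 0 = not censored). These are NOT features from the corpus.
data = [
    ([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.7, 0.7], 1),
    ([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.3, 0.2], 0),
]

model = TinyMLP(n_in=2, n_hidden=2)
model.train(data)
print("training accuracy:", accuracy(model, data))
print("P(censored) for [0.85, 0.9]:", model.forward([0.85, 0.9])[1])
```

On real data, accuracy would of course be measured on held-out posts rather than on the training set, and the feature vector would hold whatever linguistic features the study extracts.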