Distinctly Donald

GitHub repo for a partial reproduction of the analysis in this post

Almost 63 million Americans voted for Donald Trump in November 2016, puzzling many who couldn’t see his appeal. After Trump’s upset win, reporters, academics and liberals in general scrambled to understand the Trump voter, embarking on journeys of cultural tourism through the Rust Belt and poring over demographic data. I seek to shed a different kind of light on the values and preoccupations of Trump supporters. I analyze more than 430,000 comments on the social media platform Reddit from Trump supporters and “mainstream Reddit,” identifying the words and phrases that distinguish Trump’s most ardent online supporters. I want to be clear up front that I’m not testing any particular hypotheses. At this point I want questions, not answers. But before I begin, let’s turn to previous attempts to make sense of Trump’s popularity.

Even before Trump won the Republican nomination, two dominant narratives were already emerging within the liberal zeitgeist. The first is the economic anxiety theory. This theory begins with the assumption that Trump voters are largely blue-collar workers who have been hit hard by free trade, outsourcing, and automation. In Trump’s plans to reinvigorate US infrastructure, to encourage private domestic investment, to get tough on China, they heard the promise of economic security, and they rallied around him.

I call the other popular narrative on the left the bigotry theory: Trump voters are just racist and sexist. Trump voters, the theory goes, were still fuming that their country would put a black man in the White House. To see that black man replaced by a woman would be too much to bear. Enter Trump, a tough-talking white man who kicked off his campaign by calling Mexican immigrants rapists, who issued a non-apology about “locker room talk,” and who has a decades-long history of off-color comments about women and minorities. Trump supporters, on this view, saw these characteristics as features, not bugs.

Exit polling confirms that Trump voters tended to be white, male and less educated, but not that they are especially disadvantaged economically. And while white supremacists might form a subset of Trump’s base, I’m not aware of any work establishing that the typical Trump supporter is motivated primarily by racial animus. To the contrary, Ashley Jardina argues in White Identity Politics that we should expect white racial solidarity to have significant political consequences more through in-group favoritism than through out-group animus. High white racial identification, Jardina finds, predicts support for culturally “white-coded” policies (e.g. Medicare and Social Security) but does not predict opposition to policies perceived as disproportionately benefiting other groups (e.g. affirmative action and Medicaid). If racial attitudes comprise a significant part of Trump supporters’ story, then “they just hate minorities” may be no more than a subplot.

Though demographic data and survey responses are revealing, I propose a different way to see what makes Trump supporters tick: listen to them—a lot of them, all at once. To this end I analyze 434,717 social media comments to find the repeated words and phrases that distinguish Trump supporters from other social media users. I use Reddit to conduct my analysis for a number of reasons. Reddit accounts are pseudonymous, granting users freedom to express themselves more honestly, but still allowing me to collate the comment histories of users across many different threads and forums. Reddit is also organized into communities with shared interests, allowing me to identify the most enthusiastic Trump supporters. One community (or “subreddit”) in particular is the obvious choice to locate die-hard Trump supporters: r/the_donald. r/the_donald has something of a reputation even outside of Reddit. Before the 2016 election, Donald Trump himself did an AMA (“ask me anything”) on r/the_donald, lending legitimacy to the subreddit. But it has also been described as “Reddit’s hate-mongering shadow”, a “home to copious Islamophobia”, and “a melting-pot of frustration and hate.”

My aim here is not to pass judgment. I want to know what motivates and interests r/the_donald users as a group and by extension maybe shed some light on Trump’s base writ large. I also don’t want to provide a mere summary of what goes on inside r/the_donald. Many users on Reddit are active within a number of subreddits, and r/the_donald users are no different, so my analysis relies only on comments r/the_donald users have posted in subreddits other than r/the_donald. I hope to find patterns in their interests that, perhaps, have nothing to do with Donald Trump or politics.

What I did

I scrape (up to) the most recent 400 comments of 1,091 r/the_donald users, selected because they were active in r/the_donald on 8/24/2019 and 8/29/2019. I eliminate users who have posted fewer than 200 comments in public subreddits other than r/the_donald, leaving me with 548 users.
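The eligibility filter above can be sketched in a few lines of Python. This is a minimal illustration, not the code in the post’s repo: it assumes the scraped comments arrive as hypothetical `(username, subreddit)` pairs, and it compares subreddit names case-insensitively so that “The_Donald” and “the_donald” match.

```python
from collections import Counter

MIN_OUTSIDE_COMMENTS = 200       # threshold from the text
EXCLUDED_SUBREDDIT = "the_donald"

def eligible_users(comments, min_comments=MIN_OUTSIDE_COMMENTS,
                   excluded=EXCLUDED_SUBREDDIT):
    """Return users with at least `min_comments` comments outside `excluded`.

    `comments` is an iterable of (username, subreddit) pairs, one per
    scraped comment (an assumed layout, for illustration only).
    """
    outside_counts = Counter(
        user for user, subreddit in comments
        if subreddit.lower() != excluded.lower()
    )
    return {user for user, n in outside_counts.items() if n >= min_comments}
```

Comments posted inside r/the_donald are simply never counted, so a user with 300 comments there and 150 elsewhere falls below the 200-comment bar.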

After removing a number of common words (articles, prepositions, etc., called stop words), I find that the most commonly used word among r/the_donald users is “people.”
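Stop-word removal followed by a frequency count looks roughly like this. The post does not list its stop words (tidytext’s `stop_words` data set is a common choice in R), so the tiny `STOP_WORDS` set here is a placeholder and the tokenizer is a simplifying assumption.

```python
import re
from collections import Counter

# Placeholder stop-word list for illustration; not the list used in the post.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "or",
              "is", "are", "i", "you", "it", "that", "this"}

def top_words(comments, stop_words=STOP_WORDS, n=10):
    """Count lowercase word tokens across all comments, skipping stop words."""
    counts = Counter()
    for text in comments:
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts.most_common(n)
```

For example, `top_words(["The people want it", "people of the town"], n=1)` returns `[("people", 2)]`.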

Not very enlightening.

To see how Trump supporters differ from others, I need to find words and phrases that they use at much higher rates relative to other Reddit users. So, I also scrape the comment history of 2,366 Reddit users who posted in threads at the top of r/all (the closest I can find to “mainstream Reddit”) on 8/24/2019 and 8/29/2019. I again eliminate users with fewer than 200 public comments, leaving me with 1,853 users. I then draw a random sample of 599 users from the 1,853 who remain because my laptop is ancient and terrible.
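Drawing the comparison subsample is a simple random sample without replacement. A minimal sketch, assuming the eligible user names are held in a list or set; the fixed seed is a hypothetical choice added for reproducibility, not something the post reports.

```python
import random

def sample_users(users, k=599, seed=2019):
    """Draw k users uniformly at random, without replacement.

    Sorting first makes the result reproducible for a given seed even
    when `users` is a set. The seed value is hypothetical.
    """
    rng = random.Random(seed)
    return rng.sample(sorted(users), k)
```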

Now I can find words and phrases that are most distinctive of r/the_donald users. Using the ratio of word frequencies, however, is somewhat misleading. By this measure, “popbob” (I had to look it up; don’t bother) lands squarely in r/the_donald’s top 10 most distinctive words. On closer examination, this is because two r/the_donald users have written comments containing “popbob” 199 times, compared with almost no uses among the comparison group. Because I’m interested in widespread patterns across the r/the_donald user base, not the obscure idiosyncrasies of a tiny subset, I need a usage measure that accounts not only for how many times a word is used, but also for how many people use that word. So I calculate, separately for r/the_donald and the comparison group, a “usage score” for each word:

Crude? Yes. But this metric has some desirable properties. The usage score weights a word more heavily if it is used many times by many users than if it is used the same number of times by a small subset of users. Adding one to each numerator generates non-zero (but still very small) usage scores even when a word was never used by the comparison group (and vice versa), allowing me to take ratios of usage scores. To further ensure my results aren’t driven by a minority of narrowly focused users, I also drop words and phrases used by fewer than 4% of users in the r/the_donald sample.
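A minimal Python sketch of this scoring, assuming the score multiplies a smoothed usage rate by a smoothed share of users: (times used + 1) / (total tokens) × (users who used it + 1) / (total users). The add-one numerators come from the description above; the denominators and the product form are my assumptions, so treat this as one plausible reading rather than the exact metric. It also applies the minimum-share filter and ranks terms by usage score ratio.

```python
from collections import Counter

def usage_scores(user_tokens):
    """Build a scoring function from {username: [tokens...]}.

    ASSUMED form: (uses + 1) / total_tokens * (users + 1) / n_users.
    Counter returns 0 for unseen words, so unseen words still get a
    small non-zero score.
    """
    uses, users = Counter(), Counter()
    for tokens in user_tokens.values():
        uses.update(tokens)
        users.update(set(tokens))  # each user counts at most once per word
    total_tokens = sum(uses.values())
    n_users = len(user_tokens)

    def score(word):
        return (uses[word] + 1) / total_tokens * (users[word] + 1) / n_users

    return score, users, n_users

def distinctive_terms(target, comparison, min_user_share=0.04):
    """Rank target-group words by usage score ratio (target / comparison)."""
    t_score, t_users, t_n = usage_scores(target)
    c_score, _, _ = usage_scores(comparison)
    ratios = {
        w: t_score(w) / c_score(w)
        for w in t_users
        if t_users[w] / t_n >= min_user_share  # drop rarely shared words
    }
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)
```

In a toy case where 48 of 50 users each say “podesta” once and two users say “popbob” 199 times between them, counts alone would favor “popbob”, but this score ranks “podesta” first because the user-count factor rewards breadth. Tokens can equally be bigram or trigram strings.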

I calculate usage scores for individual words and two- and three-word phrases (known as bigrams and trigrams). Those words and phrases with the highest usage score ratios, taken as r/the_donald’s usage score divided by the comparison group’s usage score, give us a picture of r/the_donald’s most distinctive language. To maximize information, I require that bigrams contain no stop words and that trigrams contain at most one stop word. I also remove links and Unicode characters.
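The n-gram rules above (no stop words in bigrams, at most one in trigrams, links and Unicode removed) can be sketched as follows. Two assumptions: “Unicode characters” is read as non-ASCII characters, and the `STOP_WORDS` set is an illustrative placeholder rather than the list the post used.

```python
import re

STOP_WORDS = {"the", "of", "for", "with", "a", "an", "and", "to", "in", "is"}  # placeholder
URL_RE = re.compile(r"https?://\S+")

def clean(text):
    """Lowercase, strip links, and drop non-ASCII characters (e.g. emoji)."""
    text = URL_RE.sub(" ", text.lower())
    return text.encode("ascii", "ignore").decode("ascii")

def ngrams(text, n, max_stop):
    """Return n-grams containing at most `max_stop` stop words."""
    tokens = re.findall(r"[a-z0-9']+", clean(text))
    grams = []
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        if sum(t in STOP_WORDS for t in gram) <= max_stop:
            grams.append(" ".join(gram))
    return grams

def bigrams(text):
    return ngrams(text, 2, max_stop=0)  # bigrams: no stop words allowed

def trigrams(text):
    return ngrams(text, 3, max_stop=1)  # trigrams: at most one stop word
```

Allowing one stop word in trigrams keeps phrases like “evidence of collusion” while discarding filler such as “evidence of the”.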

Results

The top ten most distinctive words, bigrams and trigrams are below. You can find the top 200 for each in this post’s GitHub repo.

| Word | Bigram | Trigram |
|---|---|---|
| podesta | illegal aliens | trump derangement syndrome |
| derangement | clown world | the clinton foundation |
| jussie | russian collusion | red flag laws |
| smollett | flag laws | the clinton campaign |
| fisa | democrat party | evidence of collusion |
| globalist | clinton foundation | colluded with russia |
| tranny | trump 2020 | for free speech |
| tds | seth rich | alexandria ocasio cortez |
| dossier | trump russia | trump colluded with |
| colluded | tim pool | the russian collusion |

To gain additional insight, I also find the average sentiment polarity of each comment in which a word or phrase appears, using the Jockers-Rinker sentiment lexicon. If antifa, for example, is mainly mentioned in comments that also include more negatively coded words like terrorists or bad, then it will receive a more negative sentiment score. These sentiment scores need to be interpreted cautiously. A phrase’s negative sentiment score doesn’t imply that r/the_donald users dislike the subject of that phrase. All we can say is that the phrase tends to appear alongside negative language.
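The averaging step can be sketched in Python. This is a simplified stand-in: `TOY_LEXICON` holds made-up polarity values (not the Jockers-Rinker scores), and comment polarity is taken as the summed lexicon score divided by the token count, which is an assumption and not the weighting a package such as R’s sentimentr applies.

```python
import re
from statistics import mean

# Made-up polarity values for illustration; NOT the Jockers-Rinker lexicon.
TOY_LEXICON = {"terrorists": -1.0, "bad": -0.75, "great": 0.75, "love": 0.75}

def comment_polarity(text, lexicon=TOY_LEXICON):
    """Summed lexicon scores divided by token count (a simplified stand-in)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return sum(lexicon.get(t, 0.0) for t in tokens) / len(tokens)

def term_sentiment(term, comments, lexicon=TOY_LEXICON):
    """Average polarity of the comments containing `term` as a whole word or phrase."""
    pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
    scores = [comment_polarity(c, lexicon) for c in comments if pattern.search(c.lower())]
    return mean(scores) if scores else None
```

With the comments “antifa are terrorists” and “antifa is bad”, antifa gets a negative average. That number describes the surrounding language, not the commenters’ stance toward the term.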

Below is a chatterplot of 100 of r/the_donald’s most distinctive words, bigrams and trigrams. Phrases farther to the right (and lighter in color) are more likely to appear alongside positive language in comments. Vertically higher phrases have higher usage scores among r/the_donald users. Phrases in larger font are the most distinctive of r/the_donald; that is, they have the highest usage score ratios (r/the_donald’s usage score divided by the comparison group’s usage score). Every term plotted has a usage score ratio of at least 50.

What does this plot actually tell us?

We can immediately see broad patterns emerge. r/the_donald users are overwhelmingly concerned with immigration, the perceived sins and excesses of the left, accusations of collusion with Russia, and media bias. Honorable mentions include Trump (of course), Islam, guns, and a number of politicians or politically opinionated public figures. Interestingly, r/the_donald users don’t use the word Trump much more often than other Reddit users (about 3x as often), but they are far more likely to include his title, President Trump, when referring to him (more than 10x).

Aside: What the hell is “clown world”? Taken at face value, clown world and its cousin honk honk are part of an absurdist alt-right meme denoting that the world has gone crazy with liberalism. On another reading, however, honk honk is a stand-in for “Heil Hitler” that grants its users plausible deniability for their white supremacist views, and clown world posits the anti-Semitic trope that the world is controlled by a cabal of Jews. Whatever the case, both terms appear to function as in-group shibboleths within right-wing internet spaces.

While some terms are clearly related to the economy (record low unemployment and hundreds of billions), economic concerns don’t seem to dominate. Notably absent is any mention of trade, the national debt, or the deficit. Race-related terms also comprise only a small part of r/the_donald’s distinctive lexicon, and one stands out in particular. The phrase hate white people may reveal that a significant subset of r/the_donald users feel racially aggrieved, not necessarily because of out-group animus but because they feel white people are treated unfairly, consistent with Jardina’s thesis in White Identity Politics. On the other hand, we do see some evidence that r/the_donald users choose offensive terms when referring to the LGBT community, with slurs for both trans people and gay people making the top 100 most distinctive terms. Also note the phrase 2 genders toward the bottom of the plot.

In other words, I don’t find overwhelming evidence for either of the common explanations for Trump support (the economic anxiety and bigotry theories). Rather, this community seems bound together by a deep opposition to gun control and illegal immigration along the southern border, and an abiding joy in dunking on the politicians and culture of the left. Note, for instance, that social justice warriors and Clinton foundation tend to be mentioned in positive contexts despite referring to people and things Trump supporters are not likely to feel positively about. We also see some phrases that indicate a degree of giddiness in owning the libs: left can’t meme, thanks for proving, and trump wins. Quite high on the graph is also the phrase orange man bad, a succinct accusation that liberals’ political views are reducible to an unexamined hatred for Donald Trump.

I hesitate to draw firm conclusions from this analysis. For one thing, r/the_donald users are probably not a representative subset of all Trump supporters. Reddit users are younger, whiter and more educated than the US as a whole, and this probably holds true for r/the_donald users as well. I’ve found some evidence for r/the_donald’s fixation on the left, but this is unsurprising given the heightened polarization in US politics. However, my results do suggest some possibilities for future research. Much has been made of racial attitudes as a driver of support for Trump, but maybe we’ve overlooked rigid attitudes toward sexual orientation and gender expression as an important motivator for the belief that (to borrow a phrase) orange man good.

Sources and code I adapted

Scraping Reddit: dmarx’s GitHub

Text analysis: Text Mining with R by Julia Silge and David Robinson

Chatterplots: Daniel McNichols’ Towards Data Science blog