Do you feel like partisanship is running amok? It’s not your imagination. As an example, the modern State of the Union has become hyperpartisan, and topic modeling quantifies that effect.
Topic modeling finds broad topics that occur in a body of text. Those topics are characterized by key terms that have some relationship to each other. Here are the four dominant topic groups found in State of the Union addresses since 1945. How would you characterize the topic of each group? Their fluctuation over time rings true in light of the events that dominated each presidency.
It turns out that their fluctuation isn't just a product of current events. It illustrates the influence of television on the SOTU and the rise of modern partisanship. Can you guess how?
The topics illustrate three distinctive eras of the State of the Union address. The blue topic reigns prior to 1960 and is filled with legislative and fiscal terms, indicative of the primary audience before television: Congress. The green topic shows a change in tone from legislative to cultural. Before television, the speech was addressed primarily to Congress, and the people got a review the next day in the newspaper. As TV became more prevalent, the SOTU went directly to the people. In line with this hypothesis, the red and purple topics - indicative of modern popular political culture - start together and turn hyper-partisan from the Clinton years to today: red and purple flip-flop drastically as the president's party changes.
The blue topic consists of legislative terms like “expenditure”, “fiscal year”, and “recommend.” This topic dominates the late 40s through most of the 50s and declines as television ownership seems to hit a critical mass. The first televised State of the Union was in 1965, as TV reached ~80% of US households. It's probably not coincidental that this is the year the event moved from daytime to the evening. What had been a daytime presidential speech delivered primarily to members of Congress became, in 1965, a prime-time television event available to millions of people. The tone visibly shifts to issues of the culture of the day - the green line - highlighted by terms like “space”, “soviet union”, “Vietnam”, and “missile”. (It's fascinating how humans change technology and then technology changes us.)
President Clinton’s first State of the Union is in stark contrast with the gradual changes in the decades leading up to 1993. Throughout the Carter, Reagan, and Bush Sr. years, modern cultural topics - the red and purple lines - trend upward together. That trend changes sharply in 1993 with Bill Clinton’s first address and stays much the same throughout his presidency, with terms such as “college”, “parent”, “laughter” and, ironically, “bipartisan” dominating the topic. The topics flip immediately in 2001 with Bush Jr.’s first speech. The tone shifts heavily to terms like “terrorist/terror”, “Iraq”, “oil”, “violence”, and “fighting” (the purple line). The trend is at its starkest for his speech in 2001, which was delivered in January of that year, 8 months before the 9/11 attacks. Then the topics immediately revert to a Clinton-esque pattern in 2009 for Obama’s first State of the Union, where they have remained largely unchanged.
The amplified nature of modern partisanship is jarring compared to the decades prior. Without annotations for the year and presidency, it would be difficult to locate exactly when Johnson left and Nixon started, despite their being on opposite ends of the political spectrum during the height of the Vietnam War. It is equally difficult to see the difference between Carter and Reagan; the transition is smooth and the topics are largely unchanged. Yet the eye can immediately locate the hyper-distinctions of Clinton to Bush to Obama. Why?
The graphics above represent a Latent Dirichlet Allocation (LDA) topic model with four topics, built on all Presidential State of the Union addresses since 1945. The model allows 1- to 3-word phrases to be considered as terms and considers only terms that appear in more than 5% and fewer than 60% of the speeches, as the idea is to find the terms that distinguish one speech from another.
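To make the term filter concrete, here is a minimal sketch of that vocabulary step in plain Python. It is not the code behind the charts above: the whitespace tokenizer, the function names (`ngrams`, `filter_vocabulary`), and the four toy "speeches" are all assumptions for illustration. It only shows how 1- to 3-word phrases can be counted once per speech and kept when they fall between a lower and an upper share of the speeches.

```python
from collections import Counter


def ngrams(tokens, n_min=1, n_max=3):
    """Yield every phrase of n_min..n_max consecutive tokens."""
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def filter_vocabulary(documents, min_share=0.05, max_share=0.60):
    """Keep terms found in more than min_share and less than max_share of documents."""
    doc_freq = Counter()
    for text in documents:
        tokens = text.lower().split()  # naive tokenizer (assumption, not the author's)
        doc_freq.update(set(ngrams(tokens)))  # count each term once per document
    n_docs = len(documents)
    return {
        term
        for term, count in doc_freq.items()
        if min_share < count / n_docs < max_share
    }


# Toy stand-ins for speeches (hypothetical data, not the real corpus).
speeches = [
    "the fiscal year budget",
    "the fiscal year expenditure",
    "the space program budget",
    "the space program missile",
]

# With only four documents, a 25% lower bound is used so the filter has an effect.
vocab = filter_vocabulary(speeches, min_share=0.25, max_share=0.60)
print(sorted(vocab))
```

Here "the" is dropped because it appears in every speech, and one-off terms like "expenditure" are dropped because they appear in too few. The surviving phrases are the ones that help tell speeches apart, and a vocabulary like this would then be fed to an LDA implementation. In scikit-learn, for instance, a roughly equivalent filter can be expressed with `CountVectorizer(ngram_range=(1, 3), min_df=0.05, max_df=0.60)`, though its bounds are inclusive rather than strict.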
For more on text clustering, check out my prior post using R to do some different text analysis of the same data set. The data are available here and here.