Congressional Quarterly, the news service covering Congress, asked Garth Sundem, author of Geek Logic: 50 Foolproof Equations for Everyday Life, to write an algorithm for predicting the outcome of individual House and Senate races this Tuesday.
The resulting algorithm (click here for a detailed explanation), seen to the right, brings together a variety of factors, many of which influence researchers will recognize. Voting patterns in the district in the 2004 election, the number of Google hits for the candidates' names, the number of Republican candidates whose names appear in articles with "Bush," the number of Google hits with the candidate's name and the word "apologize," the candidate's incumbency, the amount of money in millions of dollars the candidate accepted from criminal lobbyist Jack Abramoff, and the expected weather on election day all factor into the calculation.
If V is greater than 1, the Democrat is expected to win; a V of less than 1 suggests the Republican will win.
The algorithm is subject to the pitfalls of language analysis: it ignores the tone of comments about President Bush or of the word "apologize" in texts, and it emphasizes word occurrence by volume over the sources of those statements. Consider blog postings by supporters who write that a candidate "should not have to apologize" for something. Each one is counted as a negative despite the writer's favorable stance toward the candidate.
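To make the pitfall concrete, here is a minimal Python sketch of volume-based keyword counting. It is not Sundem's method or anything Google does; the function name naive_apology_hits and the sample posts are invented for illustration, and the matching is a plain substring test.

```python
def naive_apology_hits(posts, candidate):
    """Count posts that mention both the candidate and the word 'apologize'.

    Hypothetical helper for illustration only: it looks at word
    occurrence and ignores tone, negation, and who is writing.
    """
    name = candidate.lower()
    return sum(
        1
        for post in posts
        if name in post.lower() and "apologize" in post.lower()
    )


# Invented sample posts; "Smith" and "Jones" are placeholder candidates.
posts = [
    "Smith should apologize for the remark.",
    "Smith should not have to apologize for telling the truth.",
    "Jones visited the county fair.",
]

hits = naive_apology_hits(posts, "Smith")
print(hits)
```

Both posts about Smith are counted, even though the second comes from a supporter, which is exactly the distortion described above.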
It also treats recent elections as the main determinants of voting this year. But in a close election in a country seeing such distinct political divisions as the United States, it would also be important to look back at the district's historic tendency to change sides in non-presidential years. Longitudinal analysis is almost completely missing here.
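One way a longitudinal term might look, sketched in Python under stated assumptions: average a district's Democratic vote share over several past elections, with weights that decay geometrically so recent contests count more. The function name recency_weighted_share, the decay parameter, and the sample shares are all hypothetical and are not part of the CQ equation.

```python
def recency_weighted_share(shares, decay=0.5):
    """Average past vote shares, weighting recent elections more heavily.

    shares: Democratic vote shares (0.0 to 1.0), ordered oldest first.
    decay:  assumed weight multiplier per election going back in time;
            1.0 gives a plain average, smaller values favor recent years.
    """
    if not shares:
        raise ValueError("need at least one past election")
    n = len(shares)
    # The most recent election gets weight 1, the one before it `decay`, etc.
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * s for w, s in zip(weights, shares)) / sum(weights)


# Invented example: a district drifting toward the Democrats.
past_shares = [0.40, 0.60]
print(recency_weighted_share(past_shares, decay=0.5))
```

A fuller treatment would also separate presidential from non-presidential years, since the concern here is how a district behaves in midterms specifically.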
Nevertheless, this is a nifty demonstration of how Web chatter and other data can be combined to perform novel and useful projections of current events.
But even a thing of simple elegance like this algorithm can produce unexpected results. In at least one case, the Nebraska Senate race, the algorithm suggests a candidate with a wide lead will lose. We'll see how things have turned out by Wednesday.


Comments (2)
Thanks for the intelligent commentary on my recent CQ election predictor equation! You're absolutely right that I looked for ways to simplify the semantics of search engines. What I came up with is to compare candidates as a ratio, with the assumption that extraneous Google hits for one candidate will generally balance those of the other (the same is true of hits for "candidate apologize" that are actually positive; statistically these generally balance). The longitudinal analysis comment is valid; I should have averaged Dem vs. Rep voting records over the last X number of years, with emphasis given to recent trends (the desirability of simplicity beat out precision in this case). And frickin' Nebraska! I couldn't figure out why, in math, a Democrat should win in such a red state! I looked for scandals, etc. and got stumped. Maybe I should have included something that recognized Ben Nelson as a Republican despite his Democratic tag... Other than Nebraska, though, the equation did quite well, including correctly predicting tight races in Montana, New Jersey, and Virginia.
Thanks again, and I'm happy to see people are keeping an intelligent eye on mathematicians who shoot their (our) mouths off!
Posted by Garth Sundem | November 29, 2006 5:45 PM
Thanks, Garth. Maybe we can do something together to solve the challenges you wrestled with in the last election. Let's talk about making a workspace available for you. Give me a ring at 253.468.2125 or drop me a line.
Mitch
Posted by Mitch Ratcliffe | November 29, 2006 7:38 PM