Update [June 24, 2009]: I have compiled a more detailed and thorough analysis on the following subject. I highly recommend reading the new article.
The Iranian people held their presidential election this Friday, and despite the mainstream media's abysmal coverage of the event, the internet community seems to be thoroughly invested in the outcome. The consensus seems to be that the election was rigged for the incumbent, Ahmadinejad, a conclusion I have essentially accepted. Juan Cole, a prominent Middle East expert and History Professor at the University of Michigan, has compiled a fairly persuasive list of anomalies. Despite Dr. Cole's logical analysis, the internet seems to be enthralled with the following graph, for all the wrong reasons:
For the original context, we must first reference Andrew Sullivan's original post at The Atlantic:
[Above image in English]
Yes, this obviously was a "divine assessment". They didn't even attempt to disguise the fraud. Which, to me, tells me they panicked. This graph is a red flag to Iran and the world.
Source: Andrew Sullivan at TheAtlantic.com
Unfortunately, Andrew provided absolutely no evidence to support the proposed link between election fraud and the aforementioned image, but he did provide a link to the apparent authors, TehranBureau, which is good journalism. I really know nothing about TehranBureau, other than their complete inability to distinguish correlation from causation:
The vertical axis (y) shows Mr. Mousavi's votes, and the horizontal (x) the President's [Ahmadinejad]. R^2 shows the correlation coefficient: the closer it is to 1.0, the more perfect is the fit, and it is 0.9995, as close to 1.0 as possible for any type of data.
Statistically and mathematically, it is impossible to maintain such perfect linear relations between the votes of any two candidates in any election -- and at all stages of vote counting. This is particularly true about Iran, a large country with a variety of ethnic groups who usually vote for a candidate who is ethnically one of their own.
Source: Faulty Election Data via TehranBureau.com
While TehranBureau has correctly identified a correlation and the math surrounding it, they have entirely failed to identify the cause; it seems self-evident, but it's not that simple. Let's first address their claim of impossibility by looking at data from the 2008 US elections.
Some background: On November 4th, I collected each state's reported vote total about 400 times an hour from MSNBC.com. MSNBC was used as the source because it was the only website that presented the election results as pure HTML; CNN, CBS, et al. used an asynchronous reporting scheme that prevented the automated retrieval of their reported election results. Using some of this data, I will show that a linear trend is the expected outcome.
Let's begin by analyzing Kentucky, one of the first reporting states. The graph below shows the number of votes for each presidential candidate against the time at which they were recorded. You'll notice that the graph looks decidedly non-linear:
I think this is the graph most people expected to see. However, the graph presented by TehranBureau, and later by Andrew Sullivan, uses decidedly different axes: one candidate's running total plotted against the other's, with time left out entirely. I'll now present the Kentucky version of the TehranBureau graph:
The Kentucky graph directly above is virtually identical to the Iran graph of internet lore. I would also argue that Kentucky represents an acceptable microcosm of "ethnic groups," and an identical analysis of Virginia's data seems to confirm this assertion. First the votes vs. time graph from Virginia:
Now the silver bullet:
The empirical evidence is hard to argue with; I could go through and create similar graphs for every state and the linear relationship with a very high R^2 would hold. For the record, I did analyze data from California, Minnesota, Vermont and West Virginia to verify this result. I also used a great many more data points for the linear regression than the six points used on the original graph; this simply serves to illustrate that essentially any six points could be selected along the entire time frame and the linear relationship would remain valid. The greater number of points has a negligible effect on R^2.
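For readers who want to try this without my scraped data, here is a minimal sketch using purely synthetic numbers. It is not the MSNBC data set and not the code I used; the batch sizes, the 30% to 70% range of per-batch vote shares, and the function names are all assumptions made for illustration. Each simulated batch of ballots splits between two candidates at a noticeably different rate, and the running totals are then compared against each other, once with every snapshot and once with only six.

```python
import random


def r_squared(xs, ys):
    """R^2 of the least-squares line through (xs, ys).

    For a straight-line fit this equals the squared Pearson correlation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


def simulate_running_totals(num_batches=500, seed=2009):
    """Return two lists of cumulative vote totals, one per candidate.

    Synthetic data only: every batch has a random size and a random
    split between the candidates (assumed ranges, not real results).
    """
    rng = random.Random(seed)
    total_a = 0
    total_b = 0
    totals_a, totals_b = [], []
    for _ in range(num_batches):
        batch_size = rng.randint(500, 20000)
        share_a = rng.uniform(0.30, 0.70)  # candidate A's share of this batch
        votes_a = round(batch_size * share_a)
        total_a += votes_a
        total_b += batch_size - votes_a
        totals_a.append(total_a)
        totals_b.append(total_b)
    return totals_a, totals_b


totals_a, totals_b = simulate_running_totals()

# Fit using every snapshot of the running totals.
r2_all = r_squared(totals_a, totals_b)

# Fit using only six snapshots picked at random along the count.
picker = random.Random(1)
picks = sorted(picker.sample(range(len(totals_a)), 6))
r2_six = r_squared([totals_a[i] for i in picks],
                   [totals_b[i] for i in picks])

print(f"R^2 using all snapshots: {r2_all:.4f}")
print(f"R^2 using six snapshots: {r2_six:.4f}")
```

The reason the fit comes out so tight is that both axes are running totals: each can only grow, and both grow with the number of ballots counted, so batch-to-batch differences in vote share are swamped by the shared upward trend. Changing the seeds or widening the share range is an easy way to check how robust the effect is.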
The bottom line is this: a linear relationship between two candidates' running vote totals is the expected correlation.
The direct result of this research seems to support the idea that the election was clean, but that in and of itself is peculiar; the election outcome was almost too consistent for an 85% turnout. Either the election was rigged very carefully, or the riggers got lucky; my money's on the former. If a conscious decision had been made to alter the result of this election, it would be illogical to ignore statistics. The people in power in Iran definitely had the means to ensure that the election appeared clean from a statistical point of view. Going forward, I fully expect other anomalies to appear, but I highly doubt the smoking gun will come in the form of mathematical or statistical analysis.
Unlike Andrew Sullivan, I do not believe "they panicked"; I believe the outcome was coldly and methodically calculated.
Voting Regularities in Iran