# Voting Regularities in Iran

Update [June 24, 2009]: I have compiled a more detailed and thorough analysis on the following subject. I highly recommend reading the new article.

The Iranian people held their presidential election this Friday, and despite the mainstream media's abysmal coverage of the event, the internet community seems thoroughly invested in the outcome. The consensus seems to be that the election was rigged for the incumbent, Ahmadinejad, a view I have essentially accepted. Juan Cole, a prominent Middle East expert and History Professor at the University of Michigan, has compiled a fairly persuasive list of anomalies. Despite Dr. Cole's logical analysis, the internet seems to be enthralled with the following graph, for all the wrong reasons:

For the original context, we must first reference Andrew Sullivan's original post at The Atlantic:

[Above image in English]

Yes, this obviously was a "divine assessment". They didn't even attempt to disguise the fraud. Which, to me, tells me they panicked. This graph is a red flag to Iran and the world.

Unfortunately, Andrew provided no evidence to support the proposed connection between election fraud and the aforementioned image, but he did link to the apparent authors, TehranBureau, which is good journalism. I know nothing about TehranBureau other than their complete inability to distinguish correlation from causation:

The vertical axis (y) shows Mr. Mousavi's votes, and the horizontal (x) the President's [Ahmadinejad]. R^2 shows the correlation coefficient: the closer it is to 1.0, the more perfect is the fit, and it is 0.9995, as close to 1.0 as possible for any type of data.

Statistically and mathematically, it is impossible to maintain such perfect linear relations between the votes of any two candidates in any election -- and at all stages of vote counting. This is particularly true about Iran, a large country with a variety of ethnic groups who usually vote for a candidate who is ethnically one of their own.

While TehranBureau has correctly identified a correlation and the math surrounding it, they have entirely failed to identify the cause; it seems self-evident, but it's not that simple. Let's first address their claim of impossibility by looking at data from the 2008 US elections.

As a comparison: on November 4th, I collected each state's reported vote totals about 400 times an hour from MSNBC.com. MSNBC was used as the source because it was the only website that presented the election results as pure HTML; CNN, CBS, et al. used an asynchronous reporting scheme that prevented automated retrieval of their reported results. Using some of this data, I will show that a linear trend is the expected outcome.

Let's begin by analyzing Kentucky, one of the first states to report. The graph below shows the number of votes for each presidential candidate against the time at which they were recorded. You'll notice that the graph looks decidedly non-linear:

I think this is the graph most people expected to see. However, the graph presented by TehranBureau, and later by Andrew Sullivan, features decidedly different axes: one candidate's vote total plotted against the other's. I'll now present the Kentucky version of the TehranBureau graph:

The Kentucky graph directly above is virtually identical to the Iran graph of internet lore. I would also argue that Kentucky represents an acceptable microcosm of "ethnic groups," and an identical analysis of Virginia's data seems to confirm this assertion. First, the votes vs. time graph from Virginia:

Now the silver bullet:

The empirical evidence is hard to dispute; I could create similar graphs for every state and the linear relationship with a very high R^2 would hold. For the record, I did analyze data from California, Minnesota, Vermont and West Virginia to verify this result. I also used a great many more data points for the linear regression than the six points used on the original graph; this simply serves to illustrate that essentially any six points could be selected along the entire time frame and the linear relationship would remain valid. The greater number of points has a negligible effect on the R^2 coefficient.
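The regression claim can be checked on made-up numbers. The following is only a minimal sketch on synthetic data, not the MSNBC snapshots described above: the batch sizes, the 57% share, the noise level, and the helper names `simulate_counts` and `r_squared` are all assumptions chosen for illustration. It fits a least-squares line to one candidate's cumulative total against the other's, first with every snapshot and then with only six.

```python
import numpy as np


def simulate_counts(n_batches=200, share_a=0.57, noise=0.05, seed=0):
    """Return cumulative totals for two hypothetical candidates.

    Every reporting batch has a random size, so the totals grow unevenly
    in time, and splits between the candidates at roughly share_a with
    some per-batch noise. All numbers here are invented.
    """
    rng = np.random.default_rng(seed)
    batch_sizes = rng.integers(500, 50_000, size=n_batches)
    shares = np.clip(rng.normal(share_a, noise, size=n_batches), 0.0, 1.0)
    votes_a = np.round(batch_sizes * shares)
    votes_b = batch_sizes - votes_a
    return np.cumsum(votes_a), np.cumsum(votes_b)


def r_squared(x, y):
    """Coefficient of determination for the least-squares line y = m*x + b."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


cum_a, cum_b = simulate_counts()

# Fit using every snapshot.
r2_all = r_squared(cum_a, cum_b)

# Fit using six evenly spaced snapshots, mimicking the six-point Iran graph.
idx = np.linspace(0, len(cum_a) - 1, 6).astype(int)
r2_six = r_squared(cum_a[idx], cum_b[idx])

print(f"R^2 using all snapshots: {r2_all:.4f}")
print(f"R^2 using six snapshots: {r2_six:.4f}")
```

Changing `noise` or `seed` is a quick way to see how much batch-to-batch variation it takes before the fit visibly degrades.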

The bottom line is this: a linear relationship between two candidates' cumulative vote totals is the expected correlation.
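One way to see why, under a simplifying assumption of mine rather than anything measured here: suppose each newly reported batch of votes splits between the two candidates in roughly the same proportion. Let $N(t)$ be the combined number of votes reported by time $t$, and let $p$ be the share going to candidate $A$ (both symbols are introduced only for this sketch). Then:

$$
A(t) = p\,N(t), \qquad B(t) = (1 - p)\,N(t) \quad\Longrightarrow\quad B(t) = \frac{1 - p}{p}\,A(t)
$$

However erratically $N(t)$ grows over the course of the night, it cancels out when one total is plotted against the other, leaving a straight line through the origin whose slope depends only on the split. Batches that deviate from the overall split show up as scatter around that line, and because the totals are cumulative, each new batch can only nudge the running ratio slightly.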

The direct result of this research seems to support the idea that the election was clean, but that in and of itself is peculiar; the election outcome was almost too consistent for an 85% turnout. Either the election was rigged very carefully, or the riggers got lucky; my money's on the former. If a conscious decision had been made to alter the result of this election, it would seem illogical to ignore statistics. The people in power in Iran definitely had the means to ensure that the election appeared clean from a statistical point of view. Going forward, I fully expect other anomalies to appear, but I highly doubt the smoking gun will come in the form of mathematical or statistical analysis.

Unlike Andrew Sullivan, I do not believe "they panicked"; I believe the outcome was coldly and methodically calculated.

## 4 Response(s) to Voting Regularities in Iran

1
George
6/14/2009 8:51:14 AM CT

There is a huge difference between .998 and .9946 or .9954. Given the fractured nature of Iran's electorate, you'd expect to see the kinds of trends visible in your VA and KY examples. That you don't raises questions about the vote.

2
southpaw
6/14/2009 5:05:51 PM CT

The commenter above is correct; the results from Virginia and Kentucky show 3.3 and 3.9 times more variability, respectively, than the results from Iran.

In any case, please do this analysis for the entire United States, which more closely resembles the geographically diverse electorate of Iran than does any single state.

3
Sam
6/15/2009 6:27:50 AM CT

I think the variation in the slope is critical: The KY plot has a slope of 0.7026 and the VA plot has a slope of 1.1346. That is a tremendous difference and is what would be expected to occur in a regionally and ethnically diverse electoral map.

Lastly, why does everyone plot y = a*x + b with b not equal to zero? The curves should be forced through the origin in this case because there is likely to never be a moment when one candidate has zero reported votes and the other has a nonzero value. This is even more the case in VA where, when McCain has zero votes, Obama has a negative value. If you do this, the R^2 value will not justify the use of a linear relationship to the same degree as it did before.

4
lookCloser
6/15/2009 4:17:49 PM CT

This is a very dishonest writeup. You have all the data time stamped from CNN. So, if you were not dishonest, you would take a few snapshots in time and plot those FOR THE ENTIRE US and compare the linear relationship.

So 9pm 10pm 11pm etc and compare the trend when you look across a broad geographic and political landscape not just one state. The variation with only Virginia is usually mostly from which precincts can report first and many states immediately start out their results with absentee data, so you are comparing technical nuances of vote tabulation within states where certain counties can report early, bigger urban counties later, all counties may or may not immediate release the absentee data which might make up a large portion of total votes.

Compare the entire US versus time and see how linear it is.

