Realtime Results From Iran and USA

On June 12th, 85 percent of eligible Iranian voters cast a presidential ballot; on June 13th, many of these same citizens took to the streets to protest the apparent reelection of Ahmadinejad. The final vote tally, as reported by Juan Cole, a prominent Middle East expert and history professor at the University of Michigan, is below:

So here is what Interior Minister Sadeq Mahsouli said Saturday about the outcome of the Iranian presidential elections:

"Of 39,165,191 votes counted (85 percent), Mahmoud Ahmadinejad won the election with 24,527,516 (62.63 percent)."

He announced that Mir-Hossein Mousavi came in second with 13,216,411 votes (33.75 percent).

Mohsen Rezaei got 678,240 votes (1.73 percent)

Mehdi Karroubi with 333,635 votes (0.85 percent).

He put the void ballots at 409,389 (1.04 percent).

Source: Stealing the Iranian Election via

Despite the veil of electoral authenticity, rather large anomalies have been identified. Juan Cole quickly provided circumstantial evidence, while academics took a little more time to complete their peer-reviewed papers. A consensus has emerged; even Iranian State TV has acknowledged discrepancies in the election.

The purpose of this article is to invalidate one of the preliminary claims of election fraud in Iran. That claim, one of the first to appear, came in the form of a graph popularized by The Atlantic columnist Andrew Sullivan. A composite of the original graphs is presented below; the two colors depict different selections of points from the same data set:

Iran Linear 2009 Relationship

Andrew Sullivan posted multiple versions of this graph to his blog on June 13th. The various versions made it clear that the data source was consistent but its application varied. Iran's Entekhab News and the web-based TehranBureau both used the election results provided by to create their graphs, which Sullivan later referenced.

The percentage of the vote reported at each coordinate is calculated with respect to the final two-way vote total (Ahmadinejad plus Mousavi), and each reporting percentage is overlaid near its associated coordinate. Note also that there are two data sets: a blue one with six dots and a red one with seven dots. Six of the red dots are hidden exactly behind the six blue points, so only one red dot is visible.

I will now state four additional facts that have not been stated explicitly; each is crucial to either the creation or the interpretation of the original graphs:

1. Ahmadinejad's vote total is represented by the X-axis and Mousavi's vote total by the Y-axis.

2. Entekhab News plotted [PNG] all seven data points, while TehranBureau's graph [PNG] excluded the first data point and used the remaining six.

3. The regression is a linear least-squares fit that is not forced through the origin. Ideally, the fitted lines would pass through the origin, because before any votes have been counted both candidates have zero votes.

4. The original source is written in Farsi, which I cannot read, so the coordinates of the data points were not explicitly available. TehranBureau provided [PNG] coordinates for the six data points it used, but the first point used by Entekhab News is still unavailable. It was, however, possible to determine a very reliable estimate[*] for that first point by reversing the regression, using the least-squares equation shown on the Entekhab News graph together with the other six points. The coordinates used on the above graph are presented below:

    Report %    Ahmadinejad       Mousavi       Two-Way
      12.98*      3,469,534     1,429,332     4,898,867
      26.45       7,027,919     2,955,131     9,983,050
      39.37      10,230,478     4,628,912    14,859,390
      54.55      14,011,664     6,575,844    20,587,508
      62.10      15,913,256     7,526,117    23,439,373
      66.50      16,974,382     8,124,690    25,099,072
      72.15      18,302,924     8,929,232    27,232,156

      Final      24,527,516    13,216,411    37,743,927

    * Estimated by reversing the Entekhab News regression (see fact #4).
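As a rough check on facts #3 and #4, here is a minimal Python sketch (standard library only) that fits the six TehranBureau coordinates from the table above in two ways: an ordinary least-squares line with a free intercept, which is what the original graphs used, and a line forced through the origin. The function names are my own, and R^2 here is the coefficient of determination of the free-intercept line. Because the seven-point Entekhab News equation is not reproduced in this article, only the six-point fit is shown.

```python
"""Least-squares fits of Mousavi (y) against Ahmadinejad (x) votes."""

# The six TehranBureau points from the table above (the estimated
# 12.98% point is excluded, matching the six-point graph).
AHMADINEJAD = [7_027_919, 10_230_478, 14_011_664, 15_913_256, 16_974_382, 18_302_924]
MOUSAVI = [2_955_131, 4_628_912, 6_575_844, 7_526_117, 8_124_690, 8_929_232]


def fit_with_intercept(xs, ys):
    """Ordinary least squares y = m*x + b; returns (m, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    # For a simple linear fit, R^2 equals the squared Pearson correlation.
    r_squared = sxy * sxy / (sxx * syy)
    return m, b, r_squared


def fit_through_origin(xs, ys):
    """Least squares constrained to y = m*x (no intercept); returns m."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


if __name__ == "__main__":
    m, b, r2 = fit_with_intercept(AHMADINEJAD, MOUSAVI)
    print(f"free intercept: y = {m:.4f}x + {b:,.0f}  (R^2 = {r2:.4f})")
    print(f"through origin: y = {fit_through_origin(AHMADINEJAD, MOUSAVI):.4f}x")
```

The free-intercept R^2 should land very close to TehranBureau's .9995, and its intercept is clearly negative, which is exactly why the fitted line misses the origin.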

Applying the data from fact #4 makes it clear that the graphs cover only a limited window of the count: the six-point graph spans about 45% of the final two-way vote (26.45% to 72.15% reported) and the seven-point graph about 60% (12.98% to 72.15%). The entire analysis takes place within these windows, and each linear correlation is valid only within its range.
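The reporting percentages and window sizes can be reproduced from the table above with a few lines of Python. This is only a sketch of the arithmetic: each two-way total is divided by the final two-way total, and a window is the difference between the last and first reporting percentage of a graph.

```python
"""Reporting percentages and the share of the vote each Iran graph spans."""

FINAL_TWO_WAY = 24_527_516 + 13_216_411  # 37,743,927 from the final tally

# Two-way totals at each reporting point (first row is the estimated point).
TWO_WAY = [4_898_867, 9_983_050, 14_859_390, 20_587_508,
           23_439_373, 25_099_072, 27_232_156]


def report_percent(two_way, final_two_way=FINAL_TWO_WAY):
    """Percent of the final two-way vote counted at a given point."""
    return 100.0 * two_way / final_two_way


percents = [report_percent(t) for t in TWO_WAY]
span_seven = percents[-1] - percents[0]  # Entekhab News: all seven points
span_six = percents[-1] - percents[1]    # TehranBureau: drops the first point

print([f"{p:.2f}" for p in percents])
print(f"six-point span: {span_six:.1f}%, seven-point span: {span_seven:.1f}%")
```

The spans come out at roughly 45.7% and 59.2%, which is where the "about 45% or 60%" figures come from.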

Sullivan initially referenced the Entekhab News version, but it was not (and still is not) useful to English readers because of the language barrier; he later referenced the English analysis by TehranBureau. Judging from its about page, TehranBureau has strong ties to the Columbia Journalism School and features a slew of qualified contributors. Muhammad Sahimi, a chemical engineer and TehranBureau contributor, provided the following analysis of the six-point graph:

The vertical axis (y) shows Mr. Mousavi's votes, and the horizontal (x) the President's [Ahmadinejad]. R^2 shows the correlation coefficient: the closer it is to 1.0, the more perfect is the fit, and it is 0.9995, as close to 1.0 as possible for any type of data.

Statistically and mathematically, it is impossible to maintain such perfect linear relations between the votes of any two candidates in any election - and at all stages of vote counting. This is particularly true about Iran, a large country with a variety of ethnic groups who usually vote for a candidate who is ethnically one of their own.

Source: Faulty Election Data via

[The referenced article has since been removed from]

Muhammad Sahimi's assertion is unconvincing and lacks proper justification, especially given the 45% window of relevance. In fact, his "impossible" claim appears baseless when compared with relevant data from the 2008 presidential election in the USA. I intend to reproduce the high R^2 value using real-time data I collected on November 4th, 2008.

The reported vote totals for each state were queued for download from MSNBC about 400 times an hour, and the data was collected in a circular queue as fast as possible. This does not mean that I have a complete data set; networking and storage issues created significant discontinuities in the data, especially as the night progressed. MSNBC was used as the source because it was the only website that presented the election results as plain HTML; CNN, CBS, et al. used an asynchronous reporting scheme that prevented automated retrieval of their results. Using some of this data, primarily from the East Coast, I will show that a linear trend with a very high R^2 is the expected outcome for such a graph.
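The collection code itself is not published here, so the following is only a hypothetical sketch of the circular-queue idea, assuming each poll yields a (time, McCain, Obama) snapshot for one state. A `collections.deque` created with `maxlen` behaves as a circular queue: once it is full, appending a new snapshot discards the oldest one. At about 400 polls an hour, consecutive requests are roughly nine seconds apart. The class name, the buffer capacity, and the choice to skip unchanged snapshots are all assumptions, and fetching and parsing the MSNBC pages is deliberately left out.

```python
from collections import deque
from typing import NamedTuple

POLLS_PER_HOUR = 400
POLL_INTERVAL_S = 3600 / POLLS_PER_HOUR  # about 9 seconds between requests


class Snapshot(NamedTuple):
    """One hypothetical poll result for a single state."""
    time_ct: str
    mccain: int
    obama: int


class SnapshotQueue:
    """Circular queue of snapshots that skips polls with unchanged totals."""

    def __init__(self, capacity: int = 2000):  # capacity is an assumed value
        self._buffer = deque(maxlen=capacity)

    def push(self, snap: Snapshot) -> bool:
        """Store snap unless its vote totals match the last stored snapshot."""
        if self._buffer and self._buffer[-1][1:] == snap[1:]:
            return False  # nothing new was reported since the previous poll
        self._buffer.append(snap)  # drops the oldest entry when full
        return True

    def snapshots(self) -> list:
        """Stored snapshots, oldest first."""
        return list(self._buffer)


if __name__ == "__main__":
    queue = SnapshotQueue(capacity=3)
    queue.push(Snapshot("18:40", 173_406, 126_564))
    queue.push(Snapshot("18:41", 173_406, 126_564))  # unchanged, skipped
    queue.push(Snapshot("19:12", 268_616, 205_480))
    print(queue.snapshots())
```

Skipping unchanged polls keeps the buffer from filling with duplicates between updates; it does nothing about the network and storage gaps described above.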

Let's begin by analyzing Kentucky, one of the first states to report results. Kentucky straddles the Eastern and Central time zones; about half of the state's polls closed at 6 ET and the other half at 7 ET. I began collecting data from all states at around 6:40 CT. The graph below depicts each candidate's vote total as a function of the time at which it was recorded and, as many would expect, looks decidedly non-linear:

Kentucky 2008 Election Votes

This graph is meant only to illustrate the discontinuities and imperfections of the data set in a more intuitive format; it is not supposed to resemble the Iran graph. The version intended for comparison, using Kentucky data, is presented below:

Kentucky 2008 Election Linear

On simple inspection, the Kentucky graph appears reasonably linear and bears a strong similarity to the Iran graph of internet lore. Although its R^2 is slightly lower than its TehranBureau counterpart, an R^2 value of .9995 remains plausible. I would argue that Kentucky represents an acceptable microcosm of "ethnic groups," but other factors may be at play. Kentucky may be the norm or it may be the exception; the only way to find out is to analyze more data. I conducted the same analysis using Virginia's data, starting with its votes vs. time graph for a glimpse at the data set:

Virginia 2008 Election Votes

The Virginia data is clearly smoother than its Kentucky equivalent, but the curves share the same general form. I looked at a number of other states, and the same general shape held across geographic and demographic borders. We'll now explore the relationship between the two candidates' vote totals in Virginia:

Virginia 2008 Election Linear

The Virginia graph seems to support the linear trend we saw in the Kentucky graph, but again the R^2 value is slightly lower than our target. This discrepancy can likely be attributed to the large number of points plotted, around 1,500, in the preceding graphs. If we were to strictly adhere to our four previously stated facts, specifically by using just six or seven points, we could probably achieve higher R^2 values. Let's go ahead and do that now.

The facts I stated earlier will now play an important role in definitively disproving the "impossibility." Let's begin by establishing the threshold reporting levels for Kentucky and Virginia that correspond to those of the original:


  Kentucky
  Report %   Time CT       McCain        Obama      Two-Way
     16.76     18:40      173,406      126,564      299,970
     26.49     19:12      268,616      205,480      474,096
     36.00     19:32      371,124      273,229      644,353
     53.03     20:02      532,940      416,231      949,171
     62.09     20:17      621,411      489,848    1,111,259
     64.64     20:24      642,008      514,837    1,156,845
     80.57     20:27      818,572      623,366    1,441,938

     Final              1,043,264      746,510    1,789,774

  Virginia
  Report %   Time CT       McCain        Obama      Two-Way
     13.38     19:02      278,094      214,706      492,800
     26.53     19:32      547,199      430,212      977,411
     40.32     19:52      780,552      705,180    1,485,732
     56.49     20:17    1,049,451    1,031,789    2,081,240
     63.84     20:42    1,179,737    1,172,437    2,352,174
     67.88     20:52    1,251,123    1,250,040    2,501,163
     72.36     21:07    1,328,103    1,338,087    2,666,190

     Final              1,726,053    1,958,370    3,684,423

Some rough approximations are needed to match these thresholds. There are several ways to do this; the two-way vote total was chosen as the measuring stick, and for each threshold the recorded sample closest to the target was used. When two samples were equally distant from the intended threshold, the one with the larger percentage was used. This is not a perfect method, but it should still produce an unbiased result. If you don't like my methodology, you can download the data in CSV format at the end of this article and make your own rules.
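The selection rule described above can be sketched as follows, assuming the raw series is a list of (two_way_total, payload) tuples, where the payload might hold the timestamp and candidate totals from one CSV row. For each target reporting level, the sample whose share of the final two-way total is closest to the target is kept, and on an exact tie the larger percentage wins. The function name and data layout are hypothetical; this is not the code used to build the tables.

```python
def pick_threshold_samples(samples, final_two_way, targets):
    """Return one sample per target reporting percentage.

    samples: list of (two_way_total, payload) tuples, in any order.
    final_two_way: final two-way vote total for the state.
    targets: reporting percentages to match (e.g. the Iran thresholds).
    """

    def percent(sample):
        return 100.0 * sample[0] / final_two_way

    picked = []
    for target in targets:
        # Closest to the target first; on a tie, the larger percentage wins.
        best = min(samples, key=lambda s: (abs(percent(s) - target), -percent(s)))
        picked.append(best)
    return picked


if __name__ == "__main__":
    toy = [(100, "a"), (190, "b"), (210, "c"), (400, "d")]
    # 19% and 21% are equally far from 20%, so the 21% sample is chosen.
    print(pick_threshold_samples(toy, 1000, [20.0]))
```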

The composite six- and seven-point graphs for Kentucky, Virginia, Michigan and Pennsylvania are presented below, in strict adherence to the original's methodology:

Kentucky 2008 Election Linear

The Kentucky data set is by far the most variable of the four states depicted, and its threshold percentages also have the largest error relative to their targets. Unfortunately, Kentucky's R^2 values are not high enough to refute the "impossible" claim outright. Virginia is our next stop:

Virginia 2008 Election Linear

The R^2 value of the six-point regression, .9996, is higher than the .9995 associated with the TehranBureau graph. The seven-point R^2 value, however, is lower than the Entekhab News value of .9986. This unarguably debunks Muhammad Sahimi's assertion of statistical and mathematical impossibility. Such an outcome is very possible, perhaps even probable.
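The Virginia figures can be checked against the threshold table earlier in this article. This sketch recomputes R^2 (the squared correlation of a least-squares line with a free intercept) for the seven tabulated Virginia points and for the six-point subset that drops the first row, mirroring the TehranBureau convention. It uses only the tabulated samples, so small differences from the plotted values are possible. R^2 for a simple linear fit does not depend on which candidate goes on which axis.

```python
"""R^2 for Virginia's threshold points (McCain on x, Obama on y)."""

# (McCain, Obama) at each threshold from the Virginia table.
VIRGINIA = [
    (278_094, 214_706),
    (547_199, 430_212),
    (780_552, 705_180),
    (1_049_451, 1_031_789),
    (1_179_737, 1_172_437),
    (1_251_123, 1_250_040),
    (1_328_103, 1_338_087),
]


def r_squared(points):
    """Coefficient of determination of an OLS line with a free intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy * sxy / (sxx * syy)


seven_point = r_squared(VIRGINIA)    # Entekhab News style: all seven points
six_point = r_squared(VIRGINIA[1:])  # TehranBureau style: drop the first point
print(f"six-point R^2 = {six_point:.4f}, seven-point R^2 = {seven_point:.4f}")
```

The six-point value comes out at roughly .9996 and the seven-point value at about .996, consistent with the paragraph above: the early 13.38% point pulls the seven-point fit down.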

Virginia's urban/rural split is also similar to Iran's: Virginia has an urban population of 72.9% according to the 2000 Census, while Iran's urban population is 68% according to its 2006 Census. Differences remain, including the margin of victory and the total number of votes cast, and while this may not be an ideal comparison, the claim of impossibility has been erased. On to Michigan:

Michigan 2008 Election Linear

The Michigan graph beats our seven-point target with an R^2 of .9991, but it fails to match the six-point result put forth by TehranBureau. Michigan's urban population, at 75.5%, is also fairly close to Iran's. This is yet another example showing that such a correlation is possible. Pennsylvania continues the trend:

Pennsylvania 2008 Election Linear

Pennsylvania's six-point R^2 matches the TehranBureau mark, and its seven-point R^2 falls just short of the .9986 value presented by Entekhab News. Pennsylvania is, however, more urban than Iran by about 10 percentage points. Still, this is now the third state with an R^2 equal to or greater than a value claimed by one of the original graphs, so the presence of a linear correlation is irrelevant to the question of election fraud.

Although I matched or exceeded the six-point and seven-point R^2 values individually, I never found a state that met or exceeded both at once. This lack of repeatability may be significant, but based on the preceding work, it's likely just a case of random coincidence and inconsistent data.

The bottom line is this: a linear relationship between two candidates' vote totals is the expected correlation. This research does not, however, prove or disprove election fraud; it simply invalidates the linear correlation metric as a means of identifying fraud.
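Why should linearity be expected? A toy model, offered as an illustration rather than a description of any real count: if ballots are tallied in many batches and each batch splits between two candidates at a share that varies only modestly, the cumulative totals grow almost in proportion, and sampling them at a handful of reporting levels yields an R^2 near 1. The batch count, batch sizes, share range, and seed below are arbitrary assumptions, and the output is synthetic, not election data.

```python
import random


def simulate_cumulative_totals(n_batches=200, seed=2009):
    """Cumulative (A, B) vote totals after each simulated batch."""
    rng = random.Random(seed)
    a_total = b_total = 0
    totals = []
    for _ in range(n_batches):
        size = rng.randint(5_000, 50_000)  # ballots in this batch (assumed)
        share_a = rng.uniform(0.60, 0.66)  # candidate A's share (assumed)
        a_votes = round(size * share_a)
        a_total += a_votes
        b_total += size - a_votes
        totals.append((a_total, b_total))
    return totals


def r_squared(points):
    """Coefficient of determination of an OLS line with a free intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy * sxy / (sxx * syy)


totals = simulate_cumulative_totals()
# Keep seven points, one every 25 batches, like a partial reporting window.
sampled = totals[24::25][:7]
print(f"R^2 over {len(sampled)} sampled points: {r_squared(sampled):.5f}")
```

Widening the share range or making early batches systematically different (for example, one region reporting first) lowers R^2, which is one reason the Kentucky values trail the others.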

And finally, the real-time data, as promised:

Kentucky: [CSV, 49KB]
Virginia: [CSV, 60KB]
Michigan: [CSV, 56KB]
Pennsylvania: [CSV, 62KB]

If you do anything useful or interesting with this data, please let me know.


3 Response(s) to Realtime Results From Iran and USA

6/24/2009 11:18:28 AM CT

Unfortunately, this analysis is a little silly as none of those states you mention has 70 million people in it. Iran is a large country, so if you show that the US election had .9995 R^2 after 40 million votes, this would be a little bit more convincing.

6/24/2009 3:18:03 PM CT

In a two candidate election the votes of either will depend almost entirely on the votes of the other, thus the linearity and high R^2. For all 4 states mentioned in the article combined R^2 is 0.9996 - higher than the quoted number for Iranian election. In an election with more than 2 candidates the fit should be less perfect, but for the particular case of Iran, which has only one time zone and where the elections had two clear leaders, high R^2 is to be expected.

6/28/2009 12:22:25 PM CT

This analysis is silly; people who want to believe in electoral fraud in Iran will completely dismiss this analysis, even though the evidence for fraud is vanishingly small.
