Georgian Election | ODIHR Preliminary Report and its Percentages

ODIHR has just released its preliminary report on yesterday’s Parliamentary Elections, and it again notes that the count had problems.

While this, as discussed yesterday, is not a good overall indicator of how the counts went throughout the country, it raises the question of whether we can at least compare this report with the one for the Presidential Election in January. Presumably, if 23% of observers found a bad count in January, and 22% identify problems now, the number has remained relatively stable. So: in terms of the count, the election is roughly the same.

Right? Actually, no. First, different observers have different standards for what they characterize as “bad”. As the ODIHR statistician (a figure fighting for more attention internally, and fortunately making some progress) will tell you, Russian observers, for example, fill out their forms somewhat differently. Since there is no training, there’s no calibration of what “bad” means, or of how to distinguish it from “reasonable” or “very bad”. Change the composition of the Election Observation Mission, and you may change the results. Although this is the biggest problem when comparing two very different missions (Georgia’s numbers, with 22% of counts assessed as bad or very bad, and those for Armenia’s Presidential Election in February, with 16% in that category, just can’t be meaningfully compared), it can also affect a comparison of two elections in the same country.

A bigger challenge comes from better targeting of observers: since this is a repeat election within a relatively short time frame, ODIHR can target so-called problem districts and precincts much more accurately. More observers in these problem districts means more problems found. It is perfectly possible that a relatively stable number actually hides a marked improvement. Again, that’s a sort of non-obvious selection bias.
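To make the targeting effect concrete, here is a minimal sketch of the arithmetic. Every number in it is invented for illustration and none comes from an ODIHR report: it assumes that 20% of precincts are "problem" precincts, that bad-count rates fall in both groups between the two elections, and that the share of observations made in problem precincts doubles the second time around. The function name `observed_bad_share` is likewise just a label chosen for this example.

```python
# All figures below are hypothetical; they only illustrate the selection effect.

def observed_bad_share(strata):
    """Weighted share of counts rated bad.

    strata: list of (share_of_observations, bad_rate) pairs.
    """
    total_weight = sum(weight for weight, _ in strata)
    return sum(weight * rate for weight, rate in strata) / total_weight

# Assumption: 20% of all precincts are "problem" precincts.
PROBLEM_SHARE = 0.20

# First election (hypothetical): observers are spread in proportion to
# precincts, so the observed share equals the underlying share.
first_observed = observed_bad_share([(PROBLEM_SHARE, 0.55), (1 - PROBLEM_SHARE, 0.15)])

# Repeat election (hypothetical): bad rates drop in both groups, but 40% of
# observations now come from problem precincts because of better targeting.
repeat_observed = observed_bad_share([(0.40, 0.40), (0.60, 0.10)])

# What a proportional spread of observers would have reported instead.
repeat_underlying = observed_bad_share([(PROBLEM_SHARE, 0.40), (1 - PROBLEM_SHARE, 0.10)])

print(f"first election, observed:    {first_observed:.0%}")
print(f"repeat election, observed:   {repeat_observed:.0%}")
print(f"repeat election, underlying: {repeat_underlying:.0%}")
```

Under these made-up assumptions the observed share barely moves (23% to 22%), while the underlying share of bad counts drops from 23% to 16%. The point is not the specific figures but that the headline percentage depends on where observers are sent as much as on how the counts went.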

Add another curious component: in the January election, at least some teams were ordered to abandon the observation at some point in the night because of the harsh cold and snowfall (“drive before the driver gets too tired”), and to return to their hotels. This time, with better weather, observers probably stuck around longer, and more teams stayed until the very end, when some of the problems become really apparent. Again, this could have some impact when comparing the numbers.

Noting these counterintuitive impacts (some small, some big) on absolute numbers shouldn’t serve to dismiss the observation effort or the attempt to quantify it. Yes, no count should be bad, and training and everything else should remain as ambitious as possible. We’re noting this primarily to contribute to a sophisticated use of the data, and again to underline the need for a revised observation methodology, one that ideally emphasizes more sophisticated sampling.