A bigger challenge comes from better targeting of observers: since this is a repeat election within a relatively short time frame, ODIHR can target so-called problem districts and precincts much more accurately. More observers in these problem districts mean more problems found. It is perfectly possible that a relatively stable number of reported problems actually hides a marked improvement. Again, that's a non-obvious form of selection bias.
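To make the selection-bias point concrete, here is a toy simulation in Python. All the numbers are invented for illustration: 1,000 precincts, 100 observer teams, and problem rates that halve between the two elections. Even though every precinct genuinely improves, the targeted repeat-election sample can report as many problems as the random first-election sample, or more:

```python
import random

random.seed(0)

N_PRECINCTS = 1000
N_TEAMS = 100  # observer teams, one precinct each

# Hypothetical setup: 20% of precincts were flagged as "problem"
# precincts in the first election.
problem_precincts = set(range(200))

def problem_rate(precinct, election):
    # Invented rates: problem precincts misbehave more often,
    # and ALL rates halve in the repeat election (a real improvement).
    base = 0.40 if precinct in problem_precincts else 0.10
    return base * (0.5 if election == 2 else 1.0)

def observe(election, targeted):
    if targeted:
        # Repeat election: teams are sent to known trouble spots.
        sample = random.sample(sorted(problem_precincts), N_TEAMS)
    else:
        # First election: teams are spread more or less at random.
        sample = random.sample(range(N_PRECINCTS), N_TEAMS)
    return sum(random.random() < problem_rate(p, election) for p in sample)

first = observe(election=1, targeted=False)
second = observe(election=2, targeted=True)
print(f"problems found, first election (random sample):   {first}")
print(f"problems found, repeat election (targeted sample): {second}")
```

In this sketch the count stays flat (or even rises) purely because the sample shifted toward known trouble spots, not because quality stagnated.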
Add another curious component: in the January election, at least some teams were ordered to abandon the observation at some point in the night because of rough cold conditions and snowfall ("drive before the driver gets too tired") and return to their hotels. This time, with better weather, observation coverage probably held up better, and more teams stayed until the very end, when some of the problems become really apparent. Again, this could have some impact when comparing the numbers.
Noting these counterintuitive impacts (some small, some big) on absolute numbers shouldn't serve to dismiss the observation effort, nor the attempt to quantify it. Yes, ideally no count would be assessed as bad, and training and everything else should remain as ambitious as possible. We're noting this primarily to contribute to a sophisticated use of the data, and again to underline the need for a revised observation methodology, one that ideally emphasizes more sophisticated sampling.
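As a rough sketch of what more sophisticated sampling could buy (again, all figures invented): if observations are drawn within known strata and reported as per-stratum rates, a weighted estimate stays comparable across elections even when the allocation of teams changes, unlike the raw count:

```python
# Toy illustration: per-stratum rates, weighted by stratum size,
# are comparable across elections; raw counts are not.
strata = {
    #             (precincts, observed, problems)
    "problem":    (200,       80,       16),
    "ordinary":   (800,       20,        1),
}

total_precincts = sum(n for n, _, _ in strata.values())

# Naive raw count, heavily driven by where teams happened to go.
raw = sum(problems for _, _, problems in strata.values())

# Stratum-weighted estimate of the nationwide problem rate.
weighted_rate = sum(
    (n / total_precincts) * (problems / observed)
    for n, observed, problems in strata.values()
)
print(f"raw problems observed: {raw}")
print(f"estimated nationwide rate: {weighted_rate:.1%}")
```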