Date: 2025-10-10

This is the third, hopefully final, post in a short series on the difficulties associated with identifying cloned records in large databases that are missing critical information. You can find Part One here. Part Two is here.

In Part One, I introduced Paquette’s Law—the empirical finding that in large voter registration databases, common names are remarkably rare. Across 126 million records in 15 states, names appearing more than 25 times represent less than 4% of all records in Texas and less than 1% in most other states. This distribution has critical implications for the “common names” hypothesis—the claim that duplicate registrations can be explained away as different people who happen to share names.

In Part Two, I examined the methodology for identifying clones—records with different voter IDs that appear to belong to the same person. I showed how different matching criteria produce vastly different results, each with trade-offs. The most restrictive criteria (like FullName + House + DOB) minimize false positives but miss clones created when voters move. Less restrictive criteria (like FullName alone) maximize coverage but introduce false positives. The sweet spot—criteria like ShortName + DOB or FullName + Email—balances confidence with coverage, capturing real-world clone patterns while maintaining high accuracy.

In this post, I’ll bring these ideas together. Using the frequency band analysis I introduced in Part One, I’ll demonstrate why the “common names” hypothesis doesn’t just fail to explain the data—it’s mathematically impossible for it to do so. Finally, I’ll show how these findings lead to a defensible minimum estimate of approximately 700,000 clones in Wisconsin, despite the absence of date-of-birth data that other states provide.

Table 1 illustrates how rare truly common names (appearing over 1,000 times) are. They almost never appear, and then only in the largest states by population. The one apparent exception is Florida, with 0.51% of its records in this frequency band. That may not sound like much, but compared to 0.00% everywhere else, it’s huge. However, almost all of those Florida records are protected records whose names have been redacted, so they look like matches only because all three name fields are empty.

Table 1: Paquette’s Law, Common names are rare, rare names are common

Unique names (those appearing only once) represent the largest single category of records in every state, typically comprising 50-60% of all records. Names appearing 2-5 times are the next most common category. Together, these rare names account for over 90% of all voter records, while names with 26+ occurrences represent less than 1% in most states.
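To make the band shares concrete, here is a minimal sketch of how the fraction of records in each frequency band could be computed from a list of names. The band edges between 26 and 1,000 and the toy names are assumptions for illustration only; they are not taken from the tables in this series.

```python
from collections import Counter

# Band edges are an assumption for illustration. The post names bands such as
# 1, 2-5, 11-25, 26-50 and 1,000+, but does not list the complete set.
BANDS = [(1, 1), (2, 5), (6, 10), (11, 25), (26, 50), (51, 1000), (1001, None)]


def band_label(count):
    """Return the label of the frequency band that a name count falls in."""
    for low, high in BANDS:
        if high is None:
            if count >= low:
                return f"{low}+"
        elif low <= count <= high:
            return str(low) if low == high else f"{low}-{high}"
    raise ValueError(f"count must be a positive integer, got {count}")


def band_shares(names):
    """Map each band label to the fraction of records whose name is in that band."""
    names = list(names)
    records_per_band = Counter()
    for _name, count in Counter(names).items():
        # A name seen `count` times contributes `count` records to its band.
        records_per_band[band_label(count)] += count
    total = len(names)
    return {label: n / total for label, n in records_per_band.items()}


# Hypothetical toy data: one name seen three times, two names seen once each.
toy = ["ANN A LEE", "ANN A LEE", "ANN A LEE", "BO C DAY", "CY E FOX"]
shares = band_shares(toy)
```

The shares are counted per record, not per distinct name, which is why a handful of very frequent names could in principle dominate a band. The tables in this series show that, in practice, they do not.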

“Full name” matches on first name, last name, and middle initial. “Age” is sometimes a pre-calculated field provided by the state, but in states that have birthyear or DOB, I calculated it to compare with other states. Table 2 shows that when these two criteria are used, only one of the 13 states that could be checked on these criteria (TX) had any matches in the 26-50 range. In TX, there were only 56 records. Because a name in this band must appear at least 26 times, 56 records can cover at most two distinct names, which is essentially zero common names.

Table 2: Frequencies, matched on Full name and Age
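The two-name ceiling for Texas is plain integer division. A short sketch, in which the function name is hypothetical:

```python
def max_distinct_names(records_in_band, band_minimum):
    """Upper bound on the number of distinct names in a frequency band.

    Every name in the band appears at least `band_minimum` times, so the
    number of names cannot exceed records_in_band // band_minimum.
    """
    return records_in_band // band_minimum


# 56 Texas records in the 26-50 band, where each name appears 26+ times.
texas_ceiling = max_distinct_names(56, 26)
print(texas_ceiling)  # prints 2
```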

Changing the match criteria to Full name & DOB increases the tightness of fit considerably (Table 3). When this is done, there are no matches for any names that appear more than 25 times, and the only state that has any matches above the rarest band of 2-5 name occurrences is New York, with 511 records in the 11-25 frequency band.

Table 3: Frequencies, matched on Full name and DOB

Full name & DOB is not the most restrictive match type available, and yet it has already eliminated every common name from the pool of possible contributors.

The “Short name” match criterion uses the first and last name only; the middle initial is ignored. I prefer short name over full name in most situations because I have found that the middle name or initial is frequently missing in voter roll databases. This makes it unclear whether a person has a middle name that was omitted or literally has no middle name. Thanks to canvassing performed by New York Citizen’s Audit (NYCA), I know that many short name clones are not false positives. In those cases, the voter’s middle initial was either omitted or entered incorrectly.
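The difference between the two name keys can be sketched as follows. This is an illustration under assumptions, not the code used for the tables: the field names (`voter_id`, `first`, `middle`, `last`, `dob`, `house`) and the sample records are hypothetical.

```python
from collections import defaultdict


def short_name_key(rec):
    """First and last name only; the middle initial is ignored."""
    return (rec["first"].strip().upper(), rec["last"].strip().upper())


def full_name_key(rec):
    """First name, middle initial, and last name."""
    first, last = short_name_key(rec)
    middle = rec.get("middle", "").strip().upper()[:1]
    return (first, middle, last)


def find_clones(records, name_key, extra_fields=()):
    """Group records by match key; keep keys shared by 2+ distinct voter IDs."""
    groups = defaultdict(set)
    for rec in records:
        name = name_key(rec)
        extras = tuple(rec.get(field, "") for field in extra_fields)
        # Skip records that cannot be checked: all name fields empty
        # (e.g. redacted) or one of the extra fields missing.
        if not any(name) or not all(extras):
            continue
        groups[name + extras].add(rec["voter_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical records: one voter entered twice (middle initial dropped and
# a new house number), plus two redacted records with empty name fields.
records = [
    {"voter_id": "A1", "first": "Maria", "middle": "L", "last": "Ortiz",
     "dob": "1970-03-02", "house": "12"},
    {"voter_id": "B7", "first": "Maria", "middle": "", "last": "Ortiz",
     "dob": "1970-03-02", "house": "48"},
    {"voter_id": "C3", "first": "", "middle": "", "last": "",
     "dob": "1980-01-01", "house": "5"},
    {"voter_id": "C4", "first": "", "middle": "", "last": "",
     "dob": "1980-01-01", "house": "5"},
]

short_dob = find_clones(records, short_name_key, ["dob"])
full_dob = find_clones(records, full_name_key, ["dob"])
short_house = find_clones(records, short_name_key, ["house"])
```

In this toy data only the short name + DOB key pairs the two Ortiz records. The full name key misses them because one record lacks the middle initial, and a house number key misses them because the voter moved. The redacted records are skipped, so they do not match each other on empty names.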

By adding the house number criteria, matching is limited to records with the same address, which reduces the number of matches by a factor of about 8 (Table 4). It also creates many false negatives by omitting people who have moved to a different county within the state. Despite this extremely restrictive match criteria, we still find clones in every state that has these fields. The numbers are significant because DOB and other age criteria are omitted. This remains a problem, but the reason will be explored more fully in Part IV.

Table 4: Frequencies, matched on short name and house number

Wisconsin has no age fields in its database. My original goal with this research was to get around that by coming up with an estimate based on states that had that information, from which I could calculate a ratio of full name matches to full name & DOB matches. However, the states are so different from each other that I didn’t trust the results. Instead, I used the email and phone number fields in Wisconsin’s rolls.

Email and phone number are each more restrictive than DOB, so matching on them is more likely to produce false negatives. This is even more true because fewer than 50% of records list a phone number, and only about half of those list an email. This missing information immediately creates a large pool of false negatives, because those records can’t even be checked.

Using FullName + Email and FullName + Phone criteria, Wisconsin shows 246,416 email-matched clones and 551,683 phone-matched clones. However, these aren’t entirely separate groups: 102,885 clones (42% of email matches) appear in both categories, representing the highest-confidence subset, where someone registered multiple times while keeping the same email and phone number. Accounting for this overlap, Wisconsin has approximately 289,000 distinct individuals with 705,000 voter ID numbers, about 9.1% of the state’s 7.7 million voter registrations (Table 5).

Table 5: Frequencies, matched on Full name, Email, and Phone number (email and phone listed separately)
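As a rough cross-check of the overlap arithmetic, the three Wisconsin counts quoted above can be combined by inclusion-exclusion. This is only a sketch on those three numbers; it is not how Table 5 was built, and the variable names are illustrative.

```python
email_matched = 246_416   # FullName + Email clones
phone_matched = 551_683   # FullName + Phone clones
in_both = 102_885         # clones that appear in both groups

# Share of email matches that are also phone matches.
overlap_share = in_both / email_matched

# Inclusion-exclusion: count the overlap once instead of twice.
either_match = email_matched + phone_matched - in_both

print(f"{overlap_share:.0%}")  # prints 42%
print(f"{either_match:,}")     # prints 695,214
```

The result lands near, though not exactly on, the roughly 705,000 voter ID numbers quoted above. Table 5 presumably counts records in a way that these three figures alone cannot reproduce.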

New York has more clones than any other state, whichever criteria are used. Table 6 shows how many were found using each search type, and in each frequency band. This clearly shows that clone counts increase as names become rarer, not the other way around. Even in higher frequency bands (6-25), more than 99% of matches are clones with different voter IDs, not duplicates with the same ID.

Table 6: NY results, broken down by all matching criteria and all applicable frequency bands

Table 7 reveals why the common names argument isn’t just wrong—it’s mathematically impossible.

Consider New York’s FullName + Age matches: 3,250,932 clone records. If every single one of these were a false positive caused by common names, they would all have to come from names appearing 26+ times. But New York has exactly zero records in this category. The theoretical maximum is zero, so all 3,250,932 clone records exceed it and no finite ratio can even be computed.

Even for the loosest criterion (ShortName with no middle initial), New York has 12,781,661 clones but only 3,017,632 common name records. The clones exceed common names by a factor of 4.24—and that’s assuming every single common name record is a false positive, which we know isn’t true.

Wisconsin shows the same pattern: 551,683 phone-matched clones, but only 160 common name records in that frequency band. The clones exceed the common name records by a factor of 3,448. For email matches, it is 246,416 clones vs. 79 common name records, a factor of 3,119.

This isn’t a matter of statistical uncertainty or modeling assumptions. The common names hypothesis requires more raw records than exist in the database. It’s not improbable—it’s impossible.

Table 7: Clone frequencies by match type compared to number of common name records in NY and WI databases
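The factors quoted above can be reproduced directly from the clone and common name counts given in the text. In this short sketch, the function name and dictionary labels are illustrative.

```python
def excess_factor(clones, common_name_records):
    """Ratio of clone records to common name records.

    Returns None when there are no common name records at all, because
    no finite ratio exists in that case.
    """
    if common_name_records == 0:
        return None
    return clones / common_name_records


# (clone records, common name records) as quoted in the text.
cases = {
    "NY ShortName": (12_781_661, 3_017_632),
    "NY FullName + Age": (3_250_932, 0),
    "WI FullName + Phone": (551_683, 160),
    "WI FullName + Email": (246_416, 79),
}

factors = {label: excess_factor(c, n) for label, (c, n) in cases.items()}

for label, factor in factors.items():
    shown = "no finite ratio" if factor is None else f"{factor:,.2f}"
    print(f"{label}: {shown}")
```

Any factor above 1 means that, even if every common name record were a false positive, there would not be enough of them to account for the clones.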

The data admits only one explanation: these are not different people who happen to share names. They are the same people with multiple voter registrations. The common name hypothesis, far from explaining the clones, is actually the opposite of the truth—having a common name is a strong indicator that a record is NOT a clone.

But there’s one more test. If these were truly different people sharing names, their ages would be randomly distributed across all possible values. If they’re clones—the same person registered multiple times—their ages should cluster. In Part 4, I’ll show the results of that test across 84 million voter records in 9 states. The pattern is so strong it can only be described as a smoking gun.

Speaking of which, there was a fire near the grocery store I shopped at yesterday and I think I inhaled some smoke. Due to some chest pain I’m still feeling, I may not post part IV right away.