Wisconsin’s voter rolls have been bothering me for months. They lacked Date of Birth (DOB) or age fields, which made it difficult to confidently assert that any given pair of records with different ID numbers refers to the same person.

HAVA requires each state to maintain a statewide voter registration list that assigns a unique identifier to each legally registered voter. When the same person appears in the database multiple times with different voter IDs—what I call ‘clones’—this creates multiple unique identifiers for a single voter, which appears to violate HAVA’s requirement for a single, uniform registration system. The law’s language suggests each voter should have one registration with one unique identifier.

Even if the records are later purged, thus potentially nullifying any nefarious utility they may have had, it is illegal to create the records in the first place. For this reason, the law establishes certain guidelines to prevent their creation. The fact that they are created regardless tells us that the law was never implemented, was ignored, or was deliberately violated.

There are many ways to identify clones. Combining a voter’s full name and DOB is an excellent way to establish a clone relationship with high confidence. In Wisconsin, this was impossible, so a different method had to be used.

In my previous post, I described “Paquette’s Law,” my name for something I might have discovered (and might not have; we’ll see after I do a literature review). Paquette’s Law establishes, with evidence, that common names are rare in large databases: they account for at most about 4% of all records in a database (Texas) and, in most cases, less than 1%. For this, I am defining “common” as names that occur more than 25 times, even though some names appear almost 2,000 times.
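To make the “common name” definition concrete, here is a minimal sketch of how the share of records carrying a common name could be computed. It is not the code used for the analysis; the function name `common_name_share`, the `threshold` parameter, and the synthetic name list are all assumptions made for illustration.

```python
from collections import Counter

def common_name_share(names, threshold=25):
    """Fraction of records whose name occurs more than `threshold` times.

    `names` is a list with one name string per record (hypothetical layout).
    """
    if not names:
        return 0.0
    counts = Counter(names)
    # Sum the record counts of every name that exceeds the threshold.
    common_records = sum(n for n in counts.values() if n > threshold)
    return common_records / len(names)

# Synthetic data, NOT real voter records:
# one name appears 30 times, 970 other names appear once each.
names = ["JOHN SMITH"] * 30 + [f"UNIQUE NAME {i}" for i in range(970)]
print(f"{common_name_share(names):.1%}")  # 3.0%
```

A name that occurs exactly 25 times is not counted as common here, matching the “more than 25 times” wording.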

My idea was to try every reasonable match criterion I could think of, then use states with DOB data to establish ratios for estimating clones in states without it. For instance, if X% of New York’s full-name matches also match on DOB, that ratio could help estimate Wisconsin’s true clone count, even though individual matches couldn’t be confirmed without DOB data.
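The ratio idea can be sketched as simple arithmetic. This is only an illustration under stated assumptions: the function name is hypothetical, and the numbers passed in are placeholders, not counts from any state.

```python
def estimate_confirmed_clones(name_matches_target, name_matches_ref, name_dob_matches_ref):
    """Scale a target state's name-only matches by a reference state's
    confirmation ratio (name + DOB matches divided by name-only matches).

    Returns (ratio, estimated_clones_in_target).
    """
    if name_matches_ref <= 0:
        raise ValueError("reference state needs at least one name-only match")
    ratio = name_dob_matches_ref / name_matches_ref
    return ratio, name_matches_target * ratio

# Placeholder numbers, NOT real counts from New York or Wisconsin.
ratio, estimate = estimate_confirmed_clones(
    name_matches_target=200_000,   # name-only matches in the state without DOB
    name_matches_ref=1_000_000,    # name-only matches in the reference state
    name_dob_matches_ref=50_000,   # of those, matches that also agree on DOB
)
print(f"ratio={ratio:.2%}, estimated clones={estimate:,.0f}")
```

The estimate is only as good as the assumption that the two states have similar name distributions and similar clone-generating processes, which is why it yields a count rather than a list of confirmed individual matches.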

The first step was to understand how different match criteria perform—which ones are too permissive (allowing false positives), which ones are too restrictive (missing real clones), and which ones strike the right balance between confidence and coverage. That’s what this post examines. In the next post, I’ll show you what happened when I applied these criteria to the data—and why the ‘common names’ explanation doesn’t just fail; it becomes mathematically impossible.

Each successive matching criterion reduces the number of matches. The least restrictive criterion has the most false positives, and the most restrictive has the most false negatives. New York’s data shows how this works (Table 1). For this, “Short Name” means first name + last name, and “Full Name” adds the middle initial. As you can see, adding the middle initial doesn’t change results by much, but house number dramatically reduces the number of matches.

Table 1: Match criteria performance in New York (23.2M total records)
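The way progressively stricter criteria shrink the match count can be illustrated with a small sketch. The record layout, field names, and the six toy records are hypothetical, and this is not the code that produced the New York figures; it simply groups records by a composite key and counts records that share a key under more than one voter ID.

```python
from collections import defaultdict

# Toy records, NOT real voter data. Field names are assumptions.
records = [
    {"voter_id": "A1", "first": "MARY", "middle": "J", "last": "DOE", "house": "12", "dob": "1970-01-02"},
    {"voter_id": "A2", "first": "MARY", "middle": "",  "last": "DOE", "house": "12", "dob": "1970-01-02"},  # blank middle
    {"voter_id": "A3", "first": "MARY", "middle": "J", "last": "DOE", "house": "98", "dob": "1970-01-02"},  # moved
    {"voter_id": "A4", "first": "MARY", "middle": "J", "last": "DOE", "house": "40", "dob": "1985-06-30"},  # other person
    {"voter_id": "A5", "first": "JOHN", "middle": "Q", "last": "ROE", "house": "7",  "dob": "1960-03-15"},
    {"voter_id": "A6", "first": "JOHN", "middle": "Q", "last": "ROE", "house": "7",  "dob": "1960-03-15"},  # exact duplicate
]

# Criteria ordered from least to most restrictive.
CRITERIA = {
    "ShortName": ("first", "last"),
    "FullName": ("first", "middle", "last"),
    "ShortName + DOB": ("first", "last", "dob"),
    "FullName + DOB": ("first", "middle", "last", "dob"),
    "FullName + House + DOB": ("first", "middle", "last", "house", "dob"),
}

def clone_groups(records, fields):
    """Group voter IDs by the normalized values of `fields`;
    keep only groups containing more than one distinct voter ID."""
    groups = defaultdict(set)
    for rec in records:
        key = tuple(rec[f].strip().upper() for f in fields)
        groups[key].add(rec["voter_id"])
    return [ids for ids in groups.values() if len(ids) > 1]

def matched_records(records, fields):
    """Number of records that fall inside any clone group."""
    return sum(len(ids) for ids in clone_groups(records, fields))

for label, fields in CRITERIA.items():
    print(f"{label:24} {matched_records(records, fields)}")
```

On this toy data, the name-only keys sweep in the unrelated Mary Doe (a false positive), while the house-number key keeps only the exact duplicate and drops both the moved record and the blank-middle record (false negatives).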

I prefer short names to full names because canvassing shows that middle names are often inaccurate, and clones often have a blank middle name field, leaving nothing to compare with the original record. Leaving out the middle name does introduce false positives, but not many. Tables 2 and 3 show ShortName and FullName performance with DOB across nine states.

Table 2: ShortName & DOB frequency distribution by state (record counts)

Table 3: FullName & DOB frequencies by state

House number has the opposite problem. Many clones are generated when a voter moves to a different county. Using the house number criterion excludes all such clones, leading to extremely high false negative rates. For this reason, I don’t like to use house number as a criterion, though it is useful for establishing an absolute floor for high-confidence clone identification.

When I work with this data, I have two standards in mind: mine (what looks right to me), and what I think can be easily defended in a court of law were I to testify on this subject. The standard I like to use is either Short name + DOB or Full name + Age, depending on whether DOB is available. If neither is available, but phone or email is, then I use Full name + either phone or email. I sometimes pay attention to suffixes, though they are so rare that they have minimal statistical impact. Also, they can be represented as Jr/Sr or I/II/III/IV, etc., where “Jr” could be a mark II or higher. Due to this ambiguity, I tend to ignore suffixes.
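The fallback order described above can be written down as a small decision function. This is a sketch only: the function name and the lowercase field names (`dob`, `age`, `phone`, `email`, and the name parts) are hypothetical labels, not column names from any state’s file.

```python
def choose_criteria(available_fields):
    """Pick match keys based on which columns a state's roll provides.

    Order of preference, following the text:
      1. Short name + DOB
      2. Full name + Age
      3. Full name + phone and/or Full name + email
    Returns a list of field tuples; empty if none of these columns exist.
    """
    available = set(available_fields)
    if "dob" in available:
        return [("first", "last", "dob")]
    if "age" in available:
        return [("first", "middle", "last", "age")]
    # Neither DOB nor age: fall back to whichever contact fields exist.
    contact = [f for f in ("phone", "email") if f in available]
    return [("first", "middle", "last", f) for f in contact]

print(choose_criteria({"first", "middle", "last", "dob", "phone"}))
print(choose_criteria({"first", "middle", "last", "phone", "email"}))
```

Suffixes are left out of every key here, reflecting the choice to ignore them because of the Jr/II ambiguity.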

For a legal standard, I’ll use Full name + suffix (represented as an “*” in the initial search, after which the original records are compared). DOB or email/phone is also preferred. Again, it depends on what data is available. Speaking of which, let’s take a look at the states whose rolls contain email or phone fields: California, North Carolina, Rhode Island, and Wisconsin (Table 4).

Table 4: Clone detection using Email and Phone in states with available data

The exciting news here is that in Wisconsin, Full name and Email establishes that there are 115,045 high-confidence clone groups (groups that share the same name) affecting 246,416 records. This is a very significant proportion of Wisconsin’s database. Phone matches are about double that. It is possible that a Jr/Sr pair live at the same residence and share a phone number, but in Wisconsin, there are only 31,797 records with a “Sr” suffix and 75,736 with a “Jr”, so even if all of them were in clone groups matched by phone (improbable), they still aren’t numerous enough to bridge the gap between email and phone matches.
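The Jr/Sr argument is a simple comparison, worked through below. One assumption is labeled here and in the code: the 551,683 phone-match figure quoted in this post is treated as a record count comparable to the 246,416 email-matched records. If it counts something else (groups, for example), the gap would need to be recomputed.

```python
# Figures quoted in this post for Wisconsin.
sr_records = 31_797
jr_records = 75_736
email_matched_records = 246_416
# ASSUMPTION: this figure counts records, like the email figure above.
phone_matched_records = 551_683

suffix_total = sr_records + jr_records                 # every Jr or Sr record
gap = phone_matched_records - email_matched_records    # phone minus email

print(f"Jr + Sr records:       {suffix_total:,}")   # 107,533
print(f"Phone minus email gap: {gap:,}")            # 305,267
print(f"Suffix records cover at most {suffix_total / gap:.0%} of the gap")
```

Even in the improbable case where every suffixed record were a father/son pair sharing a phone number, those records would account for only a minority of the difference.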

California’s low match rate (only 1,736 phone matches despite 86.69% phone coverage) suggests exceptionally clean voter rolls, while Wisconsin’s 551,683 phone matches from lower coverage (48.76%) indicate a much higher duplicate rate.

Different match criteria serve different purposes. FullName + House + DOB provides the highest confidence but misses clones created by moves—making it useful for establishing an absolute floor. FullName + DOB (or ShortName + DOB) captures clones regardless of address changes while maintaining high confidence through the specificity of birthdates. FullName + Email or Phone provides alternatives when DOB isn’t available, though with slightly different reliability characteristics.

The sweet spot—ShortName + DOB or FullName + Age—balances precision with coverage, minimizing both false positives and false negatives. These criteria are restrictive enough to give high confidence that matches represent the same person, while comprehensive enough to capture real-world clone patterns.

With this framework established, we can now turn to the central question: Can common names explain the clones we find? In the next post, I’ll present the results—and demonstrate why the answer is definitively no. The numbers aren’t just larger than expected; they’re larger than mathematically possible under the “common names” hypothesis.