I was recently reminded of an analysis I’d started a long time ago and never finished. I sat down this morning at about 10am to write a few words about it. However, a few questions had to be answered first, and that meant spending all day working on those answers.
New York’s voter rolls contain 4 algorithms used to assign ID numbers. The one I write about most often is the Spiral. However, the Shingle algorithm is in some ways much more interesting. The Spiral gets more attention because it is solved, but the Shingle isn’t.
ID numbers assigned by the Shingle, all 710,000 or so of them, are more than 99.34% purged. All by itself, that is interesting because it means that status for those records can be almost perfectly predicted based on the ID number alone. That shouldn’t be possible. They also have a much higher clone count, favor counties that use the Metronome algorithm in the lower “in-range” portion of the numbers used for state IDs, and tend to have out of range state ID numbers with registration dates before 2007, unlike numbers belonging to all other algorithms.
The Shingles are assigned in blocks of numbers that form a distinctive pattern in a scatterplot. The illustration at the top of this article shows how the pattern is formed, and the next one shows a small group of shingles with their distinctive shape. Each “shingle” in the group represents about 5,000 records. The blue dots are purged records; the orange dots are active.
Here is a close-up of one of them:
Notice the number of purged versus active or inactive records in the group. Most shingles are like this, but a few have one or two records buried in thousands of purged records, usually only one per block.
A check of the state ID numbers associated with these records shows that they were assigned in large blocks of numbers, but the registration dates are off by decades. For reasons I won’t go into here, I do not believe the registration dates are the dates the numbers were assigned. It looks like the true date all out of range numbers were assigned is in the middle of 2007. And yet, almost all Shingle numbers, which are all out of range, were assigned before mid-2007. This implies that the registration dates are either false or grandfathered from a previous version of the database.
Today, I looked at the longest continuous sequence of numbers I know of, 98,880 SBOEID numbers in a row, all assigned to Nassau County registrants. Of the group, 99.98% are purged. In many years, 100% of the numbers are purged. This does not match the pattern of numbers that don’t use the Shingle, even for the same years. Overall, the percentage of purged records in Nassau starts at 89.80% in 1955, and quickly goes down to 51.82% by 2000, and 38.97% in 2007.
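A search for the longest continuous sequence of numbers, like the one described above, can be sketched in a few lines. This is a minimal sketch, not the method actually used for the analysis: it assumes the numeric part of each SBOEID has already been extracted as an integer, and the function name and sample values are hypothetical.

```python
def longest_consecutive_run(ids):
    """Return (start, length) of the longest run of consecutive integers.

    Assumption: `ids` holds the numeric portion of each ID as an int.
    """
    unique_sorted = sorted(set(ids))
    if not unique_sorted:
        return None, 0

    best_start, best_len = unique_sorted[0], 1
    run_start, run_len = unique_sorted[0], 1

    # Walk adjacent pairs; a gap larger than 1 starts a new run.
    for prev, cur in zip(unique_sorted, unique_sorted[1:]):
        if cur == prev + 1:
            run_len += 1
        else:
            run_start, run_len = cur, 1
        if run_len > best_len:
            best_start, best_len = run_start, run_len

    return best_start, best_len


# Made-up IDs for illustration only.
sample_ids = [1001, 1002, 1003, 2000, 2001, 3000, 3001, 3002, 3003, 3004]
print(longest_consecutive_run(sample_ids))  # (3000, 5)
```

Running the same function on one county's IDs at a time would show whether a run stays within a single county, as the Nassau block does.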
In the block of Shingle numbers, the purged percentage starts at 100% in 1988 and stays at or above 99.93% through 2002. This is important because the grandfather theory would suggest that all purged registrations in the same county before 2007 would use the Shingle. Instead, only about 25% of Nassau’s 790,795 purged records have Shingle ID numbers. Therefore, although it looks like it was known that the records were purged at the time they were given numbers, it isn’t clear if they were created that way, possibly with false registration dates, or if they had been purged before the algorithms were introduced, and then this group of numbers was selected for the Shingle treatment.
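The year-by-year purged percentages quoted above come down to a simple grouped count. The sketch below shows one way to compute them, assuming each record is a dictionary with a hypothetical `reg_year` field and a `status` field whose purged value is the string `"purged"`; the real voter roll export may use different field names and codes, and the sample rows are invented.

```python
from collections import defaultdict


def purged_pct_by_year(records):
    """Return {registration year: percent of records purged}, rounded to 2 places.

    Assumption: field names "reg_year" and "status" are hypothetical.
    """
    totals = defaultdict(int)
    purged = defaultdict(int)
    for rec in records:
        year = rec["reg_year"]
        totals[year] += 1
        if rec["status"] == "purged":
            purged[year] += 1
    return {
        year: round(100.0 * purged[year] / totals[year], 2)
        for year in sorted(totals)
    }


# Invented rows for illustration only; not real voter data.
sample_records = [
    {"reg_year": 1988, "status": "purged"},
    {"reg_year": 1988, "status": "purged"},
    {"reg_year": 2002, "status": "purged"},
    {"reg_year": 2002, "status": "active"},
    {"reg_year": 2002, "status": "purged"},
    {"reg_year": 2002, "status": "purged"},
]
print(purged_pct_by_year(sample_records))  # {1988: 100.0, 2002: 75.0}
```

Computing the table once for the Shingle block and once for the rest of the county's records gives the side-by-side comparison described in the text.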
If they were selected, I don’t know why, but I can say there are a lot of clones in the shingles. On average, in any given year there are twice as many clones among Shingle ID numbers as among non-Shingle ID numbers. Within the Shingle, the percentage of clones starts at a low of 0.13% in 1988, and goes up to 15.20% in 2002. That follows the overall pattern found throughout the state, though at a higher amplitude.
Unfortunately, I still can’t say much more than this: the Shingle numbers behave differently from other numbers and are associated with a higher concentration of apparently illegal ID numbers/registrations than any of the other algorithms. All I found today are a few more features that distinguish them, but not why they are different. What is important is that a predictably high concentration of illegal records can be found by recognizing the Shingle pattern. That should be impossible.
I struggle to fully understand the intricacies of your fabulous work, so what I add here may already be known or be redundant.
“It looks like the true date all out of range numbers were assigned is in the middle of 2007. And yet, almost all Shingle numbers, which are all out of range, were assigned before mid-2007.
(This implies that the registration dates are either false or grandfathered from a previous version of the database.)”
What was going on inside the voter registration database in New York in the middle of 2007 has to be considered alongside what was going on outside it at the time: the HAVA lawsuit by the DOJ against NY and the changeover from lever machines to the modern computer system, et al.
https://elections.ny.gov/system/files/documents/2023/10/2009-amended-state-plan.pdf
On pages 9 and 10:
The statewide voter registration list, NYSVoter, was fully implemented in the summer of 2007. The system was developed on a Microsoft platform.
In 2005, the Commissioners of the State Board of Elections decided to use the Washington State voter registration system code and documentation and contracted with Saber Corporation (??) to redesign the Washington model to meet NY requirements for a larger number of voters.
“Saber Corporation 26+ Years of Senior Care Software Development
Founded in 1994, our company was started by a team consisting of a Multi-Purpose Senior Center Director and two programmers with a combined 28 years of experience. Our reputation as the most flexible and affordable software package for the management of needs of Senior Centers, Area Agencies on Aging, and State Units on Aging is a reputation we hold dear.”
https://www.sabersite.com/our-company
https://legistar.council.nyc.gov/View.ashx?M=F&ID=1035883&GUID=A9E45AD0-EA0B-46CC-B05C-15FE183583D7
Under minutes from the September/October 2007 NYSBOE meeting:
“ITU
George Stanton reported that the audits for all the counties information for NYSVoter have been completed. Everything seems to be working properly and everything is complete and the database was finished before the deadline and at a lower cost estimate.”
Did they go cheap at the expense of security?
“Preliminary discussions are underway with the NYSVoter Steering Committee to plan for monitoring the county boards of elections maintenance activities on an on-going basis.”
https://elections.ny.gov/system/files/documents/2023/10/approved11072007minutes.pdf
https://www.nytimes.com/2006/11/16/nyregion/mayor-says-states-delays-may-muddle-2007-elections.html
Ownership of the voting systems in New York was transferred from individual cities, towns and villages to each County Board of Elections. Prior to this change, only the City of New York, Monroe, Nassau and Suffolk owned their own voting machines.
One of the goals was to create a new uniform statewide voter registration list.
The 62 county boards are each responsible for registering voters in their county. They must keep the lists accurate and current, removing individuals from the list who are no longer eligible to vote in their jurisdiction. Under HAVA, the SBOE created a statewide list by integrating with the current county voter registration systems (2007).
What does “purging” amount to? Is it just a flag like active/inactive? Are all the other pieces of information about the record retained? If so, all these purged records might be used as a bank of registrations that can be “unpurged” when required, and then voted. If you look at snapshots of the data over time, a red flag would be seeing any purged record become active again.
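The snapshot check suggested above could look something like the following. It is only a sketch under stated assumptions: each snapshot is modeled as a dictionary mapping an ID to a status string, and the status values `"purged"` and `"active"` and the sample IDs are hypothetical stand-ins for whatever the actual exports contain.

```python
def find_unpurged(earlier, later):
    """Return sorted IDs that were purged in `earlier` and active in `later`.

    Assumption: both arguments map ID -> status string, with hypothetical
    status values "purged" and "active".
    """
    return sorted(
        voter_id
        for voter_id, old_status in earlier.items()
        if old_status == "purged" and later.get(voter_id) == "active"
    )


# Invented snapshots for illustration only.
snapshot_2021 = {"A1": "purged", "A2": "purged", "A3": "active", "A4": "purged"}
snapshot_2022 = {"A1": "purged", "A2": "active", "A3": "active"}

print(find_unpurged(snapshot_2021, snapshot_2022))  # ['A2']
```

An ID that is missing from the later snapshot, like `A4` here, is not reported, so records that disappear entirely would need a separate check.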