Duplicate filter

Posted: Sat Oct 24, 2020 2:41 pm
by jdsh
Hi Tom,

The duplicates filter may be a little too fierce. I was trying to work out why the total number of records shown by "my county" for 2020 differs from both MapMate and a simple DDb query. In part it may be because two sites with similar names in the same monad are being considered as duplicates, for example "Castle Camps RSV 89/10" is a different site to "PRV S6 Castle Camps". For mapping common species at a monad scale this doesn't matter, but it might be more significant for scarce species that are locally common.

It will all be solved once we get a new system that can cope with site recording better than MapMate!

I'm not sure whether this fully explains the difference between 38,787 in "my county", 41,953 in the DDb and 43,003 in MapMate, but a combination of a recent import and duplicates (as seen both by the DDb and in MapMate) may be enough to account for it.

Jon

Re: Duplicate filter

Posted: Mon Oct 26, 2020 11:04 am
by admin
Hi Jon,

The duplicate filter was mainly intended to give people a simple overview of distribution and survey coverage without search results being cluttered by many identical or near-equivalent records (a major source of irritation in the early days of the database). As the filter works on live search results, it needed to be quick to apply and completely automated. The process considers only grid square, taxon and date; everything else, including site and recorder names, is ignored. Within those limitations the process works reasonably well, provided one accepts that the definition of 'duplicate' is very loose. The filter isn't intended to assist with validation: it could lead to some very distorted results, hide records that include important details, or select the inferior copy of a duplicate pair.
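To illustrate why two differently named sites in the same monad collapse into one record, here is a minimal sketch of a filter keyed only on grid square, taxon and date. It is not the database's actual code: the `Record` class, the `filter_duplicates` function, the "keep the first record seen" rule, and the grid square, taxon and recorder values are all hypothetical. Only the two site names come from the thread.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    # Hypothetical record shape; the real database stores many more fields.
    grid_square: str
    taxon: str
    date: date
    site: str
    recorder: str


def filter_duplicates(records):
    """Keep the first record seen for each (grid_square, taxon, date) key.

    Site and recorder are deliberately left out of the key, so records from
    two different sites in the same square on the same day count as duplicates.
    Keeping the first occurrence is an assumption made for illustration.
    """
    seen = set()
    kept = []
    for rec in records:
        key = (rec.grid_square, rec.taxon, rec.date)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


# Illustrative data: "TL0000" and the taxon/recorder names are placeholders.
records = [
    Record("TL0000", "Taxon A", date(2020, 6, 1), "Castle Camps RSV 89/10", "Recorder 1"),
    Record("TL0000", "Taxon A", date(2020, 6, 1), "PRV S6 Castle Camps", "Recorder 2"),
    Record("TL0000", "Taxon B", date(2020, 6, 1), "PRV S6 Castle Camps", "Recorder 2"),
]

filtered = filter_duplicates(records)
print(len(records), len(filtered))  # 3 2
```

The second record is dropped even though it comes from a different site and recorder, which is the behaviour Jon describes for scarce but locally common species.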

In terms of counting total records, the duplicate filter can be useful and, overall, provides more credible results than raw record counts (across the whole database, 20–25% of records are thought to be duplicates).
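The relationship between raw counts, filtered counts and the share of duplicates can be sketched as below. This is a toy illustration, not real data: the hypothetical `count_summary` function treats each record as a (grid square, taxon, date) tuple, mirroring a filter that ignores every other field, and the four sample records are invented.

```python
def count_summary(records):
    """Return (raw_count, filtered_count, duplicate_share).

    Each record is a (grid_square, taxon, date) tuple; all other fields are
    assumed to have been dropped already, as the filter ignores them.
    """
    raw = len(records)
    filtered = len(set(records))  # one record per distinct key
    share = (raw - filtered) / raw if raw else 0.0
    return raw, filtered, share


# Toy records (placeholders, not taken from the database).
records = [
    ("TL0000", "Taxon A", "2020-06-01"),
    ("TL0000", "Taxon A", "2020-06-01"),  # same square, taxon and date
    ("TL0000", "Taxon B", "2020-06-01"),
    ("TL0001", "Taxon A", "2020-06-01"),
]

raw, filtered, share = count_summary(records)
print(raw, filtered, f"{share:.0%}")  # 4 3 25%
```

A raw count would be the figure to compare against MapMate's total, whereas the filtered count is the more conservative one.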

If you are trying to compare directly with MapMate to check that the totals tally, it would be best not to filter; the totals should then match. If there is still a discrepancy, it would be worth investigating further and perhaps resetting your MapMate synch and resynching, in case any data have been skipped at some point.