Duplicate filter

Please report any problems with the website. Suggestions for changes are also very welcome.

by jdsh » Sat Oct 24, 2020 2:41 pm

Hi Tom,

The duplicates filter may be a little too fierce. I was trying to work out why the total number of records shown by "my county" for 2020 differs from both MapMate and a simple DDb query. In part it may be because two sites with similar names in the same monad are being considered as duplicates, for example "Castle Camps RSV 89/10" is a different site to "PRV S6 Castle Camps". For mapping common species at a monad scale this doesn't matter, but it might be more significant for scarce species that are locally common.

It will all be solved once we get a new system that can cope with site recording better than MapMate!

I'm not sure whether this entirely explains the difference between 38787 for "my county", 41953 in the DDb and 43003 in MapMate, but a combination of a recent import and duplicates (both as seen by the DDb and in MapMate) may be sufficient.

Posts: 42
Joined: Sun Nov 25, 2012 11:41 pm
name: Jon Shanklin

Re: Duplicate filter

by admin » Mon Oct 26, 2020 11:04 am

Hi Jon,

The duplicate filter was mainly intended to give people a simple overview of distribution and survey coverage without the search results being cluttered with many identical or near-equivalent records (a major source of irritation in the early days of the database). As the filter works on live search results, it needed to be quick to apply and completely automated. The process only considers grid square, taxon and date; everything else, including site and recorder names, is ignored. Within those limitations the process works reasonably well, provided one accepts that the definition of 'duplicate' is very loose. The filter isn't intended to assist with validation, as it could lead to some very distorted results: it may hide records that include important details or select an inferior copy of a duplicate pair of records.
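
To illustrate the idea, here is a minimal sketch in Python. It is not the DDb's actual code: the field names, the sample records (including the grid reference) and the choice to keep whichever record is seen first are all assumptions made for the example. It only shows why two differently named sites in the same monad collapse into one record when the key is grid square, taxon and date.

```python
from collections import namedtuple

# Hypothetical record layout; the real DDb schema is not shown in this thread.
Record = namedtuple("Record", ["grid_square", "taxon", "date", "site", "recorder"])


def filter_duplicates(records):
    """Keep the first record seen for each (grid_square, taxon, date) key.

    Site and recorder are deliberately ignored, so the definition of
    'duplicate' is very loose.
    """
    seen = set()
    kept = []
    for rec in records:
        key = (rec.grid_square, rec.taxon, rec.date)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


# Made-up sample data: same monad, taxon and date, but different site names.
records = [
    Record("TL6343", "Bellis perennis", "2020-06-01", "Castle Camps RSV 89/10", "A"),
    Record("TL6343", "Bellis perennis", "2020-06-01", "PRV S6 Castle Camps", "A"),
    Record("TL6343", "Bellis perennis", "2020-07-15", "PRV S6 Castle Camps", "A"),
]

kept = filter_duplicates(records)
print(len(records), len(kept))  # raw count vs filtered count
```

In this sketch the second record is dropped even though its site name differs, while the third survives because its date differs. Which copy of a duplicate pair is kept depends only on ordering, which is one reason the filter isn't suitable for validation.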

In terms of counting total records the duplicate filter can be useful and, overall, provides more credible results than raw counts of records (across the whole database 20-25% of records are thought to be duplicates).

If trying to directly compare with MapMate to check that totals tally then it would be best not to filter - the totals should then match. If there's still a discrepancy then it would be worth investigating further and perhaps resetting your MapMate synch and resynching, in case any data has been skipped at some point.
Tom Humphrey
Database Officer, Botanical Society of Britain and Ireland (BSBI)
c/o Centre for Ecology and Hydrology, Maclean Building, Crowmarsh Gifford, Wallingford, Oxon, OX10 8BB, UK.
Posts: 438
Joined: Tue Nov 20, 2012 4:16 pm
name: Tom Humphrey
