by AndyAmphlett » Tue Nov 05, 2019 9:54 am
Verification & validation of Indicia records on the DDb
I hope the following, based on my experience of validating Indicia records for two Scottish vcs, will be of interest / help.
There are records from Indicia for all vcs apart from six in Ireland: H11, H13, H22, H24, H26 and H32. The number of records per vc varies from 1 to 37,983 (median = 2,751).
I have checked the records for vc94 and vc96 (2906 records in total). I undertook all the verification and validation within the Indicia workspace in the DDb, before the records were moved into the main DDb workspace. I split the records by dataset, and further subdivided them by taxon / recorder / grid reference etc. as appropriate. Some vcs have a lot of records to check, e.g. 48 vcs have >5000 Indicia records. To make the process of validation efficient, make sure you use the DDb's validation tools, e.g. to find hectad singletons. Familiarity with using the DDb, and some experience of validation, are both required before tackling the Indicia records.
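For anyone unsure what a hectad singleton check is doing, here is a minimal sketch in Python. It is not how the DDb does it, and it is no substitute for the DDb's own tools. It assumes a 'hectad singleton' means a taxon with only one record in a given 10 km square, and that grid references are standard British ones (two letters plus an even number of digits). The function names, field names and example records are all invented for illustration.

```python
from collections import defaultdict

def hectad(grid_ref):
    """Reduce a British grid reference (e.g. 'NH123456') to its hectad ('NH14').

    Assumes two letters followed by an even number of digits; tetrad
    suffixes and Irish one-letter references are not handled.
    """
    ref = grid_ref.replace(" ", "").upper()
    letters, digits = ref[:2], ref[2:]
    if not (letters.isalpha() and digits.isdigit() and len(digits) % 2 == 0):
        raise ValueError(f"unsupported grid reference: {grid_ref!r}")
    half = len(digits) // 2
    # First digit of the easting plus first digit of the northing.
    return letters + digits[0] + digits[half]

def hectad_singletons(records):
    """Return records whose taxon occurs only once in its hectad."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["taxon"], hectad(rec["grid_ref"]))].append(rec)
    return [recs[0] for recs in groups.values() if len(recs) == 1]

# Invented example records, not taken from any real dataset.
example_records = [
    {"taxon": "Bellis perennis", "grid_ref": "NH123456", "dataset": "A"},
    {"taxon": "Bellis perennis", "grid_ref": "NH1845", "dataset": "B"},
    {"taxon": "Linnaea borealis", "grid_ref": "NJ0562", "dataset": "A"},
]

for rec in hectad_singletons(example_records):
    print(rec["taxon"], hectad(rec["grid_ref"]))
```

In practice the check needs to be run against what is already in the DDb for the hectad, not just against the batch being validated, which is why the built-in tools are the ones to use.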
I ignored any records from Garden Bioblitz datasets (as being potentially outwith BSBI recording guidelines), as well as any records where there was no proper recorder name (mainly from the iSpot dataset). I also ignored records where the site name was invalid (typically an address that did not match the site grid reference), but did accept records with no location name. A number of records in the BSS Urban Flora project dataset had obviously corrupted sequences of grid references (perhaps errors created in a spreadsheet prior to import).
Some records could have been accepted if the taxon had been recorded as an aggregate, rather than one of the segregates that were unlikely to be correct for the vcs in question. For example, records of Gymnadenia conopsea s.s. would be acceptable as the sens. lat. taxon, but not as the s.s. taxon (which does not occur in these vcs). Guidance is not to edit Indicia records within the DDb, so these were marked as 'needs checking' with a reason given.
Many potential new hectad records were also marked as 'needs checking', especially when they were a marked extension of range and the recorder was unknown to me. Few records have supporting details, but some do have photos which are often very useful, supporting or contradicting a record's ID. NB some of the record identifications are, from looking at accompanying photos, spectacular errors.
In total I accepted (confirmed and moved to the main DDb workspace) 91.5% of the vc94 and vc96 records. In this sample of records, the 'error' rate (records not acceptable for Atlas 2020) was approaching 10%, which is, in my experience, very high. Typically, e.g. for records from record cards submitted by experienced recorders, the error rate will be 0.5% or less. Therefore, I suggest that the Indicia records for other vcs will require very careful checking before inclusion in the main DDb workspace. It must be stressed that the Indicia records appear to have a much higher error rate, and a much higher proportion of records unsuitable for inclusion in the main DDb workspace, than VCRs may be used to dealing with.
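To put rough numbers on that, the arithmetic can be sketched as below. The record counts are approximations derived from the rounded 91.5% figure and the 2906 total, not exact counts, and the variable names are just for illustration.

```python
total_records = 2906       # vc94 + vc96 Indicia records checked
accepted_pct = 91.5        # percentage accepted into the main DDb workspace
typical_error_pct = 0.5    # upper end of the typical rate for record-card data

not_accepted_pct = 100 - accepted_pct
accepted = round(total_records * accepted_pct / 100)   # approximate count
not_accepted = total_records - accepted                # approximate count
ratio = not_accepted_pct / typical_error_pct

print(f"accepted: about {accepted} records")
print(f"not accepted: about {not_accepted} records ({not_accepted_pct}%)")
print(f"roughly {ratio:.0f} times the typical rate of {typical_error_pct}%")
```

Since 0.5% is the top of the typical range, the real multiple will often be higher than this.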
On the positive side, the vc94 and vc96 records did contain a small number of potentially really interesting new records, but they will mainly require additional supporting information, or field visits, to confirm. Hence they will be unlikely to make it into the final Atlas 2020 dataset. This is the primary frustration with the Indicia records: potentially good new records by recorders unknown to the VCR, with no means of directly contacting them to raise any queries, and no supporting evidence.
I noticed that some records had been accepted by automatic checks within Indicia. I was happy that most of those records were OK, so moved them to the main DDb workspace. But in the DDb workspace they appear as unchecked, so still require confirmation. NB out of about 400 records accepted by the automatic checks within Indicia, I found 3 or 4 that I was not happy to accept.
Andy.