BSBI Distribution Database > message board

My county - not yet found



by jdsh » Mon Aug 14, 2017 2:20 pm

Hi Tom,

When looking at the not-yet-refound species lists for tetrads or hectads in My County, they often include an aggregate or species even when the species or sub-species has been found. If you want to improve the percent-refound statistics, this almost forces one to record, for example, Chenopodium album agg., even though Chenopodium album has been found. Would it be possible to remove the "parent" from the counting if a "child" has been found? I'll confess that my own record-card-generating software does list lots of parents even when the child has been recorded, so I appreciate this is easier said than done.

Jon Shanklin

Re: My county - not yet found

by admin » Mon Aug 14, 2017 2:51 pm

Hi Jon,

As you noted, it's a more difficult problem than it seems superficially. I agree with you that the current behaviour (counting aggregates separately) is unhelpful, but I've not yet worked out an efficient way to treat aggregates more correctly.

A related issue is that many recorders would like microspecies to be excluded from the counts. In practice the best solution to that would be to aggregate them to section or genus level.
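As a rough illustration of what aggregating microspecies to section or genus level could mean for the counts, here is a minimal sketch. It assumes a hypothetical `Taxon` record with a single parent link and a hypothetical `microspecies` flag; the real ddb schema is not described here. Each microspecies occurrence is counted under its nearest ancestor at section or genus rank, so several microspecies collapse into one counting unit.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Taxon:
    name: str
    rank: str                         # e.g. "genus", "section", "species"
    parent: Optional["Taxon"] = None  # single parent link, for simplicity
    microspecies: bool = False        # hypothetical flag marking a microspecies


def counting_unit(taxon: Taxon,
                  roll_up_to: Tuple[str, ...] = ("section", "genus")) -> Taxon:
    """Return the taxon an occurrence should be counted under.

    Microspecies are replaced by their nearest ancestor whose rank is in
    roll_up_to; all other taxa are counted as themselves.
    """
    if not taxon.microspecies:
        return taxon
    node = taxon.parent
    while node is not None:
        if node.rank in roll_up_to:
            return node
        node = node.parent
    return taxon  # no ancestor at a suitable rank: leave unchanged


# Illustrative example: two placeholder microspecies in one section.
genus = Taxon("Taraxacum", "genus")
section = Taxon("Taraxacum sect. Ruderalia", "section", parent=genus)
ms_a = Taxon("Taraxacum microspecies A", "species", parent=section, microspecies=True)
ms_b = Taxon("Taraxacum microspecies B", "species", parent=section, microspecies=True)

units = {counting_unit(t).name for t in (ms_a, ms_b)}
print(units)  # {'Taraxacum sect. Ruderalia'}
```

Because the nearest matching ancestor wins, microspecies with a section are counted at section level and only fall back to genus where no section exists.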

I think it's likely that to treat higher-level groupings correctly, 'refound'-type queries would need to do a second pass through the data to test that any aggregates are not already represented by a child clade. As it stands this would entail horribly inefficient walking of the taxon tree for all aggregates returned in the first pass.
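The naive second pass described above can be sketched as follows, assuming a hypothetical `children` mapping from each taxon name to its child clades and a set of taxa already found in the square. For every taxon returned by the first pass, it walks down the tree looking for any found descendant, and drops the taxon if one exists. This is only an illustration of the approach, not the ddb's query.

```python
from typing import Dict, Iterable, List, Set


def has_found_descendant(taxon: str, found: Set[str],
                         children: Dict[str, List[str]]) -> bool:
    """Depth-first walk down the taxon tree looking for any found child clade."""
    stack = list(children.get(taxon, []))
    seen: Set[str] = set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in found:
            return True
        stack.extend(children.get(node, []))
    return False


def second_pass(not_refound: Iterable[str], found: Set[str],
                children: Dict[str, List[str]]) -> List[str]:
    """Drop any first-pass result that is already represented by a child clade."""
    return [t for t in not_refound if not has_found_descendant(t, found, children)]


children = {"Chenopodium album agg.": ["Chenopodium album"]}
found = {"Chenopodium album"}
result = second_pass(["Chenopodium album agg.", "Rosa canina"], found, children)
print(result)  # ['Rosa canina']
```

The cost is one subtree walk per first-pass result, which is the inefficiency the post refers to when many aggregates are returned.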

At present the ddb stores taxa fully hierarchically, but for each occurrence also caches its identity at species-or-above level. This caching allows for efficient comparison of occurrences at species level (disregarding infra-specific clades), but doesn't help with aggregate resolution. To deal with aggregates efficiently, an additional cache of 'aggregate-level' concepts might be needed (which is more difficult, because a taxon can be part of multiple aggregates and aggregates can be nested). Even then I'd still need to do a second pass to test whether aggregates were already represented at species level, but that would be more efficient if a full tree walk could be avoided.
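One way such an aggregate-level cache might look is sketched below. It assumes a hypothetical `parents` mapping in which each taxon can list several parents (covering membership of multiple aggregates), a set naming which taxa are aggregates, and an acyclic graph. Each taxon is mapped once to every aggregate that contains it, directly or through nesting; the second pass then becomes set lookups instead of a tree walk per aggregate.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def build_aggregate_cache(parents: Dict[str, List[str]],
                          aggregates: Set[str]) -> Dict[str, FrozenSet[str]]:
    """Map every taxon to all aggregates that contain it, directly or via nesting.

    A taxon may have several parents (it can belong to more than one
    aggregate). The parent graph is assumed to be acyclic.
    """
    cache: Dict[str, FrozenSet[str]] = {}

    def containing(taxon: str) -> FrozenSet[str]:
        if taxon not in cache:
            result: Set[str] = set()
            for p in parents.get(taxon, []):
                if p in aggregates:
                    result.add(p)
                result |= containing(p)  # follow nested aggregates upwards
            cache[taxon] = frozenset(result)
        return cache[taxon]

    for taxon in list(parents):
        containing(taxon)
    return cache


def drop_covered_aggregates(not_refound: Iterable[str], found: Iterable[str],
                            cache: Dict[str, FrozenSet[str]]) -> List[str]:
    """Second pass using the cache: no tree walk, just set unions and lookups."""
    covered: Set[str] = set()
    for t in found:
        covered |= cache.get(t, frozenset())
    return [t for t in not_refound if t not in covered]


# Placeholder names: a nested aggregate and a taxon in two aggregates.
parents = {
    "Inner agg.": ["Outer agg."],
    "Species x": ["Inner agg."],
    "Species y": ["Inner agg.", "Other agg."],
}
aggregates = {"Outer agg.", "Inner agg.", "Other agg."}
cache = build_aggregate_cache(parents, aggregates)
remaining = drop_covered_aggregates(["Outer agg.", "Inner agg.", "Other agg."],
                                    {"Species x"}, cache)
print(remaining)  # ['Other agg.']
```

The cache could be rebuilt whenever the taxon dictionary changes, so the per-query work is proportional to the number of found taxa rather than the size of the aggregates' subtrees.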

As well as explicit aggregates, there are also some cases where species-level clades are children of other species-level clades, e.g.:

The Rosa canina x multiflora concept (a species-level hybrid with parents of indeterminate sex) includes the species-level child clades:
Rosa canina x multiflora (f x m)
Rosa multiflora x canina (f x m)

Implicitly it's an aggregate concept, but it isn't marked as such.
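A possible way to pick up such unmarked cases is to treat any species-level taxon that has species-level children as an aggregate, alongside the explicitly marked ones. The sketch below assumes hypothetical `rank` and `parents` mappings, with "species" standing for any species-level concept (including hybrids); infra-specific children such as subspecies do not turn their parent into an aggregate.

```python
from typing import Dict, List, Set


def effective_aggregates(explicit: Set[str], rank: Dict[str, str],
                         parents: Dict[str, List[str]]) -> Set[str]:
    """Explicit aggregates plus species-level taxa with species-level children."""
    implicit: Set[str] = set()
    for child, child_parents in parents.items():
        if rank.get(child) != "species":
            continue  # e.g. a subspecies does not make its species an aggregate
        for p in child_parents:
            if rank.get(p) == "species":
                implicit.add(p)
    return set(explicit) | implicit


rank = {
    "Rosa canina x multiflora": "species",
    "Rosa canina x multiflora (f x m)": "species",
    "Rosa multiflora x canina (f x m)": "species",
}
parents = {
    "Rosa canina x multiflora (f x m)": ["Rosa canina x multiflora"],
    "Rosa multiflora x canina (f x m)": ["Rosa canina x multiflora"],
}
aggs = effective_aggregates({"Chenopodium album agg."}, rank, parents)
print(sorted(aggs))  # ['Chenopodium album agg.', 'Rosa canina x multiflora']
```

The resulting set could then stand in wherever explicitly marked aggregates are used.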

I definitely hope to resolve this eventually, but at the moment I can't afford to make the queries used to compile county reports substantially slower than they are already.
Tom Humphrey
Database Officer, Botanical Society of Britain and Ireland (BSBI)
c/o Centre for Ecology and Hydrology, Maclean Building, Crowmarsh Gifford, Wallingford, Oxon, OX10 8BB, UK.
