Promoting Data Quality
The basic aim of the NBN is to share wildlife data and information. As the mechanisms for communicating data become more effective, the importance of being able to judge the quality of data increases.
Since its inception, the NBN has aimed to enable data of “known quality” to be shared, rather than data of a particular quality, because users of data will have varying needs. Its approach has been to require that details of how a particular dataset was compiled and checked are included in the metadata that accompanies the dataset, and to issue guidance on this (see Section: Compiling metadata).
However, the capacity in the NBN to mix data of different qualities from different sources highlights the need to promote good data quality wherever possible, and to clarify what this implies, both for suppliers of data and for users.
What makes good data?
All biological data are snapshots of the occurrence of one or more organisms at a place at a particular time. Recognising the nature of what the record represents is therefore vital. The NBN Trust has sought first to promote awareness of the issues behind data quality and to focus on best practice, as well as developing specific tools to help. The principal issues are:
Accuracy of taxonomic identification
Precision over the location and associated information in the record
Clarity of the recording approach and methodology
Accuracy of producing and documenting the record
Quality of data transmission
The NBN Trust has examined these issues, in collaboration with colleagues, and aims to produce more detailed best-practice guidance, acceptable to those involved, for at least some of these aspects. The outcome of initial discussions can be seen in the NBN Networking Naturalists Seminar report “Data validation and verification”, and in the report of the NBN Conference for National Societies & Recording Schemes in 2004 “How real are your records? – exploring data quality”.
Accurate identification of species and habitat features is the basis of all good biological data. From discussions so far, it is clear that there are many issues which underpin the ability to put accurate names to observations. These include:
Knowledge and competence of the field observer
Ease of identification
Steps taken to verify the record
Availability of expertise
Availability of reference collections or literature
Training in identification or other techniques
Availability of organisational support
In addition, each of these is underpinned by constraints of time, manpower and resources, which vary from subject area to subject area. Some might be readily addressed by clarifying existing activity, while others might need further work or support.
Recognising the issues, tackling some of the organisational improvements needed to augment existing recording, and documenting the way any particular dataset might be affected by any of these are first steps in improving data quality.
Clarity over the approach needed to get meaningful data for a particular group is particularly important. Casual data on some groups may be all that can be expected, but systematic approaches reflecting the occurrence and habits of the group yield the most useful data. Clarifying these needs with each taxonomic group will be one way to improve data quality. Other factors which can readily improve data quality include promoting precision in record collection or sampling.
Data quality also depends on the processes that data pass through when an electronic dataset is compiled and transmitted. These include initial computerisation (and the systems used for it), together with subsequent processes and routines, such as automated validation. See Section: “Managing wildlife data – principles and best practice” for more detailed information on the way the NBN Trust is promoting these aspects.
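To illustrate what an automated validation routine might look like, the following is a minimal sketch in Python. It is not an NBN tool: the field names (taxon_name, grid_ref, date), the function validate_record and the simplified grid reference pattern are all assumptions made for the example, and a real scheme would apply its own rules.

```python
import re
from datetime import date

# Assumed, simplified pattern: two letters followed by an even number
# of digits (2 to 10). Real grid reference rules are stricter.
GRID_REF = re.compile(r"^[A-Z]{2}(?:\d{2}|\d{4}|\d{6}|\d{8}|\d{10})$")


def validate_record(record, today=None):
    """Return a list of problems found in one record (empty if none).

    The record is a dict with hypothetical keys 'taxon_name',
    'grid_ref' and 'date' (ISO format, e.g. 2004-03-15).
    """
    today = today or date.today()
    problems = []

    # A record needs a name before anyone can verify the identification.
    if not str(record.get("taxon_name", "")).strip():
        problems.append("missing taxon name")

    # Normalise spacing and case, then check the format only.
    grid_ref = str(record.get("grid_ref", "")).replace(" ", "").upper()
    if not GRID_REF.match(grid_ref):
        problems.append("grid reference not in expected format")

    # The date must parse and cannot be later than today.
    try:
        recorded = date.fromisoformat(str(record.get("date", "")))
    except ValueError:
        problems.append("date missing or not a valid ISO date")
    else:
        if recorded > today:
            problems.append("date is in the future")

    return problems
```

A routine of this kind only checks that a record is well formed. Whether the identification itself is correct is a matter for verification by people with the relevant expertise.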
The NBN Trust has produced best practice advice on Data Verification and Validation in its Improving Wildlife Data Quality guide. This has been developed by Trevor James in consultation with different interest groups, so as to take account of existing activity, and also the different needs and capacities which exist in different areas.
It is evident, however, that “one size does not fit all”, and that the best way forward is to make use of existing networks of expertise. For more difficult taxonomic groups, “peer review” will remain the most important mechanism for checking data quality. Transparency about the way that identifications are made and authenticated will be vital. Improved communication is also important.