Appendix 1 - Data Integrity
Data collection and data processing.
Data are collected on standardised forms. Books of forms are
provided by the Pelagic Research Group of NIWA (i.e., the aerial
sightings database manager), filled out by the pilots, and returned
to Greta Point. For a full discussion of this topic see Taylor (in
preparation).
Data entry, processing, and definitions.
This section outlines the flow of the paper-recorded aerial
sightings data from field collection through to its availability to
researchers for stock assessment analyses, and defines the separate
tasks required at each stage.
In this example, pilots flying light aircraft collect handwritten
data. These data are recorded on paper forms.
At the completion of each flight the recorder ensures that all
pages are in order and that all required data fields have been
correctly filled in. The data are then forwarded to a project team
member (the client, e.g., NIWA) responsible for checking the data
prior to keypunching.
There are five clear steps in the data flow following collection of
the data. These are listed below and then discussed individually in
detail.
1. Pre-keypunching, checking and batching.
2. Keypunching data entry.
3. Electronic transfer of raw data flat files on disk and paper
copy to client (i.e., project team).
4. Data error checking (manual and computer), validation, and
grooming.
5. "Groomed", validated data loaded to database; now available for
analysis.
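The five steps can be sketched end to end as a simple pipeline. This is a hypothetical Python illustration only; the function names, the batch name, and the record format are all assumptions, and each stage stands in for the fuller process described below.

```python
def pre_keypunch_check(forms):
    # Step 1: visual check for gross errors; batch the forms under a
    # unique file name (the name here is purely illustrative).
    return {"batch": "batch_001", "forms": forms}

def keypunch(batch):
    # Step 2: digitise the batch (double entry and verification
    # happen at this stage).
    return list(batch["forms"])

def transfer(records):
    # Step 3: hand the flat file, plus paper copy, to the project team.
    return records

def error_check(records):
    # Step 4: validate and groom; here we simply drop empty records.
    return [r for r in records if r]

def load_to_database(records):
    # Step 5: insert the groomed records into the database
    # (a plain list stands in for the database here).
    return list(records)

db = load_to_database(error_check(transfer(keypunch(
    pre_keypunch_check(["school 120 t", "", "school 45 t"])))))
```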
Pre-keypunching, visual checking and batching:
The paper forms from each flight are checked
for obvious gross errors or omissions and corrected if necessary.
Forms are placed into batches and allocated a unique file name. The
batches of raw data are sent for keypunching.
Key punching data entry:
At keypunching, the batches of raw data are
digitised and verified by trained data entry operators. Verification
simply means that the data are digitised twice and the two resulting
files are crosschecked for mismatches. Operator errors are corrected
at this point, and the completed digitised file is ready for transfer
to the error checking process. At no point in this process are
changes or interpretations made to the raw data. NIWA uses the KEYS
Data Emulator for data entry.
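The verification step, in which the two keyings are cross-checked for mismatches, can be sketched as follows. This is a hypothetical illustration only; the KEYS Data Emulator performs this comparison internally, and the function below is an assumption.

```python
def cross_check(pass1, pass2):
    """Compare two independent keyings of the same batch, line by line.

    Returns a list of (line_number, first_keying, second_keying)
    mismatches; an empty list means the batch verified clean.
    """
    mismatches = [(i, a, b)
                  for i, (a, b) in enumerate(zip(pass1, pass2), start=1)
                  if a != b]
    if len(pass1) != len(pass2):
        # A length difference is itself a mismatch
        # (a skipped or doubled line in one keying).
        mismatches.append((min(len(pass1), len(pass2)) + 1, None, None))
    return mismatches
```

For example, `cross_check(["12 kahawai", "3 kingfish"], ["12 kahawai", "8 kingfish"])` returns `[(2, "3 kingfish", "8 kingfish")]`, flagging line 2 for the operator to correct against the raw form.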
Electronic transfer of raw data flat files on disk
and paper copy to client:
The digitised data file is transferred
for error checking, along with the original raw data file.
At this point the data are now in a format that is compatible with
the data processing routines.
Data error checking, validation, and grooming:
Data files are put through a number of
computer error checking (validation) routines that look for
inaccuracies and inconsistencies in the data. Any errors detected are
corrected. Data are then passed through these error-checking routines
until the data reach a satisfactory standard that will allow them to
be inserted in the appropriate database.
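A pass of these validation routines can be sketched like this. The code is hypothetical Python; the two check functions and the species codes shown are simplified stand-ins for the real routines, which are described below.

```python
def run_checks(records, checks):
    """Apply each validation routine to each record; collect errors.

    Data pass through repeatedly, with corrections applied between
    passes, until this returns an empty list.
    """
    errors = []
    for line_no, rec in enumerate(records, start=1):
        for check in checks:
            message = check(rec)
            if message:
                errors.append((line_no, message))
    return errors

# Two illustrative checks: tonnage must be non-negative, and the
# species code must be a known code (these codes are assumptions).
KNOWN_SPECIES = {"KAH", "KIN", "TRE"}

def check_tonnage(rec):
    if rec["tonnage"] is None or rec["tonnage"] < 0:
        return f"tonnage out of range: {rec['tonnage']}"

def check_species(rec):
    if rec["species"] not in KNOWN_SPECIES:
        return f"unknown species code: {rec['species']}"
```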
In some instances, data may be inserted into "working tables"
in a database. This is often done to check the integrity of the data
by taking advantage of relational databases' ability to manipulate,
match, and compare related sets of data. Details for this aerial
sightings data are given below.
- Reformatting raw data files (file_1) using the programme
"modify-raw" in the directory
- this programme calculates some estimates of search effort and
sightings tonnages, adds some extra attributes which increase the
efficiency of links between tables, and writes to an output file
(file_2) in the appropriate format for running through the error
checking programme;
- Error checking using the checking programme "as_check"
in the same directory as above; this programme checks file_2 for
values out of range, for null values, etc., and does some minor
formatting, including insertion of EMPRESS field delimiters, before
producing five table output files, one for each of the main database
tables (flight_group, flight, school_sight, set, flightpath), and one
error file (as_errors) which contains error messages for the input
file;
- Using as_errors, file_1 is then edited and
steps 1 and 2 repeated until satisfied that the errors have been
corrected.
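The kind of record-level checking performed at this stage can be sketched as below. This is a simplified, hypothetical illustration: the field positions and range limits are assumptions, and the real programme also handles EMPRESS field delimiters and the split into per-table files.

```python
def check_fields(fields, limits):
    """Check one record's fields for null values and out-of-range values.

    'fields' is a list of string values as keypunched; 'limits' maps a
    field index to an inclusive (low, high) range for numeric fields.
    Returns a list of error messages (empty if the record is clean).
    """
    errors = []
    for i, value in enumerate(fields):
        if value in ("", None):
            errors.append(f"field {i}: null value")
        elif i in limits:
            low, high = limits[i]
            try:
                number = float(value)
            except ValueError:
                errors.append(f"field {i}: non-numeric value '{value}'")
                continue
            if not (low <= number <= high):
                errors.append(
                    f"field {i}: {number} out of range [{low}, {high}]")
    return errors
```

For example, `check_fields(["KAH", "-5", ""], {1: (0, 10000)})` flags both the negative tonnage in field 1 and the null value in field 2, and those messages would be written to the error file for manual correction.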
"Groomed", validated data loaded
to database. Available for analysis:
The files of clean, groomed, and
validated data are inserted into the appropriate database and now
become available for analysis.
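The "working tables" integrity check described earlier, which matches related sets of data against each other before the final load, can be sketched with an in-memory relational database. This is hypothetical Python using sqlite3 in place of EMPRESS; the table names follow those listed above, but the column names are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE flight (flight_id INTEGER PRIMARY KEY, pilot TEXT);
    CREATE TABLE school_sight (sight_id INTEGER PRIMARY KEY,
                               flight_id INTEGER, tonnage REAL);
""")
con.executemany("INSERT INTO flight VALUES (?, ?)", [(1, "A"), (2, "B")])
con.executemany("INSERT INTO school_sight VALUES (?, ?, ?)",
                [(10, 1, 50.0), (11, 2, 20.0), (12, 3, 5.0)])

# Match sightings against flights: any sighting whose flight_id has no
# counterpart in the flight table is flagged before the final load.
orphans = con.execute("""
    SELECT s.sight_id FROM school_sight AS s
    LEFT JOIN flight AS f ON f.flight_id = s.flight_id
    WHERE f.flight_id IS NULL
""").fetchall()
# orphans is [(12,)]: sighting 12 refers to a flight never recorded.
```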
The clean digitised data files and raw paper data are then
archived for safekeeping.