Appendix 2 - Data entry, error checking, and loading
The data in nonfish_bycatch have come from the fishing industry.
This section outlines the flow of paper-recorded data, from
collection through to its availability to researchers for analysis,
and defines the separate tasks that are required to do this.
In summary, the nonfish_bycatch data are recorded on handwritten
paper forms. Each trip is identified by a unique trip_key, each tow
or set by a unique station_key, and each capture by a unique
catch_key.
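The key hierarchy above can be sketched as a small record type with a uniqueness check. This is only an illustration: the Capture class, the duplicated_keys helper, the integer key types, and the sample values are assumptions made for the example and are not taken from the nonfish_bycatch schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    """One capture record (hypothetical layout; keys assumed to be integers)."""
    trip_key: int      # identifies the trip
    station_key: int   # identifies the tow or set
    catch_key: int     # identifies the individual capture


def duplicated_keys(captures):
    """Return the catch_key values that occur more than once, sorted."""
    counts = Counter(c.catch_key for c in captures)
    return sorted(key for key, n in counts.items() if n > 1)


# Invented sample records; the third reuses a catch_key and should be flagged.
captures = [
    Capture(trip_key=1, station_key=10, catch_key=100),
    Capture(trip_key=1, station_key=10, catch_key=101),
    Capture(trip_key=1, station_key=11, catch_key=101),
]
print(duplicated_keys(captures))
```

The same pattern would apply to trip_key and station_key at their own levels.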
1. Pre-key entry, visual checking and batching:
The data are forwarded via Mfish to a project team member, who
checks the forms and forwards them to key entry.
2. Key entry of data:
At this point, trained data entry operators key the data from the
collated forms into an electronic, fixed-format ASCII file. NIWA uses
the KEYS Data Emulator for data entry.
All data entry is verified: each page of data is keyed in twice and
the two results are cross-checked for mismatches. Any data entry
operator errors are corrected at this point.
The electronic data files are then transferred, along with the
original raw data file, for error checking. The data are now ready
for the error checking and formatting routines.
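The double-keying verification described in step 2 can be sketched as a line-by-line comparison of two keyings of the same page. This is a minimal illustration and not the behaviour of the KEYS Data Emulator: the find_mismatches function, the record layout, and the sample lines are all hypothetical.

```python
from itertools import zip_longest


def find_mismatches(first_pass, second_pass):
    """Compare two keyings of the same page, line by line.

    Returns a list of (line_number, first_text, second_text) for every line
    that differs. A line present in only one keying is paired with None.
    """
    mismatches = []
    pairs = zip_longest(first_pass, second_pass, fillvalue=None)
    for number, (a, b) in enumerate(pairs, start=1):
        if a != b:
            mismatches.append((number, a, b))
    return mismatches


# Invented fixed-format lines; "AAA" and "BBB" are placeholder codes.
first = ["0001 0010 AAA 2", "0001 0011 BBB 1"]
second = ["0001 0010 AAA 2", "0001 0011 BBB 7"]

for number, a, b in find_mismatches(first, second):
    print(f"line {number}: first keying {a!r}, second keying {b!r}")
```

Each reported line would then be checked against the paper form to decide which keying is correct.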
3. Data error checking, validation, and grooming:
Data files are put through a number of computer error checking
(validation) routines that look for inaccuracies and inconsistencies
within trips. Any errors detected are corrected. The data are then
passed through these error-checking routines again, repeatedly if
necessary, until they reach a satisfactory standard that will allow
them to be inserted in the appropriate database tables.
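The check, correct, and recheck cycle in step 3 can be sketched as follows. The validation rules shown (date ordering, and station dates falling within the trip) are assumptions chosen for illustration; the source does not list the actual checks, and the field names other than trip_key and station_key are hypothetical.

```python
from datetime import date


def check_trip(trip):
    """Return a list of error messages for one trip (hypothetical rules)."""
    errors = []
    if trip["end_date"] < trip["start_date"]:
        errors.append(f"trip {trip['trip_key']}: end_date before start_date")
    for station in trip["stations"]:
        # Each tow or set is assumed to take place during its trip.
        if not trip["start_date"] <= station["date"] <= trip["end_date"]:
            errors.append(f"station {station['station_key']}: date outside trip")
    return errors


# Invented sample trip with one inconsistent station date.
trip = {
    "trip_key": 1,
    "start_date": date(2000, 1, 1),
    "end_date": date(2000, 1, 10),
    "stations": [
        {"station_key": 10, "date": date(2000, 1, 3)},
        {"station_key": 11, "date": date(2000, 2, 3)},
    ],
}

errors = check_trip(trip)
print(errors)

# Stand-in for a manual correction made after consulting the paper form.
trip["stations"][1]["date"] = date(2000, 1, 3)

# The routines are rerun; an empty list means the trip passes these checks.
errors_after = check_trip(trip)
print(errors_after)
```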
The data are inserted into "working tables". This allows further
checks of the integrity of the data, by taking advantage of a
relational database's ability to manipulate, match, and compare
related sets of data.
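One kind of integrity check that working tables make possible is a search for records whose parent record is missing. The sketch below uses an in-memory SQLite database purely for illustration; the table names w_trip, w_station, and w_catch, their columns, and the sample rows are hypothetical and do not describe the actual working tables or the database software used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical working tables holding only the key columns.
    CREATE TABLE w_trip    (trip_key    INTEGER PRIMARY KEY);
    CREATE TABLE w_station (station_key INTEGER PRIMARY KEY, trip_key    INTEGER);
    CREATE TABLE w_catch   (catch_key   INTEGER PRIMARY KEY, station_key INTEGER);

    -- Invented rows: station 11 and catch 101 point at missing parents.
    INSERT INTO w_trip    VALUES (1);
    INSERT INTO w_station VALUES (10, 1), (11, 2);
    INSERT INTO w_catch   VALUES (100, 10), (101, 12);
""")

# Stations whose trip_key has no matching trip.
orphan_stations = conn.execute("""
    SELECT s.station_key
    FROM w_station AS s
    LEFT JOIN w_trip AS t ON t.trip_key = s.trip_key
    WHERE t.trip_key IS NULL
    ORDER BY s.station_key
""").fetchall()

# Captures whose station_key has no matching station.
orphan_catches = conn.execute("""
    SELECT c.catch_key
    FROM w_catch AS c
    LEFT JOIN w_station AS s ON s.station_key = c.station_key
    WHERE s.station_key IS NULL
    ORDER BY c.catch_key
""").fetchall()

print(orphan_stations, orphan_catches)
```

Rows returned by either query would be corrected in the working tables before the data are loaded into the final database.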
4. "Groomed", validated data loaded to database and available for analysis:
The clean, groomed, and validated data are
inserted into the appropriate database (in this case nonfish_bycatch)
and now become available for extraction and analysis.
The clean electronic data files and raw paper data are then
archived for safekeeping.