Appendix 1 - Data entry, error checking, and loading
The data in regional_ce have come from companies with New
Zealand vessels fishing on the high seas and furnishing non-NZ Catch
Effort forms. Data from CCAMLR trips are supplied in electronic form.
These data are not checked by NIWA to the level that would apply if
NIWA were supplied with the raw data and were responsible for keying
them. That is, the data from CCAMLR trips do not pass through the data
entry stages described here.
This section outlines the flow of paper-recorded data, namely those
from the South Pacific Regional Purse-seine fishery, and defines the
separate tasks required to process them.
The South Pacific Regional Purse-seine data are supplied on paper
forms, either handwritten or typed. Each trip is assigned a unique
trip number and each set a sequential station number.
The date and time are also recorded as part of the station data.
1. Pre-key entry, visual checking and batching:
The data are then forwarded, via the Ministry of Fisheries, to a
project team member, who checks the above and forwards the data to
key entry.
2. Key entry of data:
At this point, trained data entry operators key the data from the
collated forms into an electronic, fixed-format ASCII file. NIWA uses
the KEYS Data Emulator for data entry.
All data entry is verified; that is, each page of data is keyed in
twice and the two results are cross-checked for mismatches. Any data
entry operator errors are corrected at this point.
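The double-keying cross-check can be illustrated with a minimal Python
sketch. It is not the KEYS Data Emulator's own procedure, which is not
described here; the function name find_keying_mismatches and the sample
records are hypothetical, and the sketch simply assumes that each
keying pass yields one line of text per record.

```python
from itertools import zip_longest


def find_keying_mismatches(first_pass, second_pass):
    """Compare two independently keyed versions of a page, line by line.

    Returns (line_number, first_text, second_text) for every line where
    the two passes differ. Line numbers start at 1. A line missing from
    one pass is reported with None in its place.
    """
    mismatches = []
    pairs = zip_longest(first_pass, second_pass, fillvalue=None)
    for line_number, (first, second) in enumerate(pairs, start=1):
        if first != second:
            mismatches.append((line_number, first, second))
    return mismatches


# Illustrative records only; the real form layout is not given here.
first_pass = ["0001 001 1998-03-14 0630", "0001 002 1998-03-14 1145"]
second_pass = ["0001 001 1998-03-14 0630", "0001 002 1998-03-14 1154"]
mismatches = find_keying_mismatches(first_pass, second_pass)
```

Each reported mismatch would then be resolved by an operator against
the paper form, since the comparison alone cannot tell which of the two
passes is correct.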
The electronic data files are transferred for error checking along
with the original raw data file. The data are now ready for the
error-checking and formatting routines.
3. Data error checking, validation, and grooming:
Data files are put through a number of computer error-checking
(validation) routines that look for inaccuracies and inconsistencies
within trips. Any errors detected are corrected. Data are passed
through these routines repeatedly until they reach a satisfactory
standard that allows them to be inserted into the appropriate database
tables.
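The validation routines themselves are not listed in this appendix. As
an illustration only, the Python sketch below shows two within-trip
checks of the kind described, under the assumptions that station
numbers start at 1 and increase by one for each set, and that
successive stations should not go backwards in time. The function name
check_trip and the sample trip are hypothetical.

```python
from datetime import datetime


def check_trip(stations):
    """Return a list of error messages for one trip.

    `stations` is a list of (station_no, timestamp) tuples in keyed
    order. Assumed rules (not taken from the actual routines): station
    numbers run 1, 2, 3, ... and timestamps never decrease.
    """
    errors = []
    for index, (station_no, when) in enumerate(stations):
        expected = index + 1
        if station_no != expected:
            errors.append(
                f"station {station_no}: expected station number {expected}"
            )
        if index > 0 and when < stations[index - 1][1]:
            errors.append(
                f"station {station_no}: date/time earlier than previous station"
            )
    return errors


# Illustrative trip with a skipped station number and an out-of-order time.
trip = [
    (1, datetime(1998, 3, 14, 6, 30)),
    (2, datetime(1998, 3, 14, 11, 45)),
    (4, datetime(1998, 3, 14, 9, 0)),
]
errors = check_trip(trip)
```

An empty list means the trip passes these two checks; a non-empty list
would send the file back for correction and a further pass.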
The data are inserted into "working tables". This allows further
checks of the integrity of the data, taking advantage of a relational
database's ability to manipulate, match, and compare related sets of
data.
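A relational integrity check on working tables might look like the
following sketch, which uses Python's built-in sqlite3 module with an
in-memory database. The table names w_trip and w_station, their
columns, and the sample rows are hypothetical and are not the
regional_ce schema; the sketch only shows how a join can expose
station records with no parent trip, and trips with no stations.

```python
import sqlite3

# In-memory database standing in for the working tables (assumed layout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE w_trip (trip_no INTEGER PRIMARY KEY);
    CREATE TABLE w_station (trip_no INTEGER, station_no INTEGER);
    INSERT INTO w_trip VALUES (1), (2);
    INSERT INTO w_station VALUES (1, 1), (1, 2), (3, 1);
""")

# Station records whose trip number has no matching trip record.
orphans = conn.execute("""
    SELECT s.trip_no, s.station_no
    FROM w_station AS s
    LEFT JOIN w_trip AS t ON t.trip_no = s.trip_no
    WHERE t.trip_no IS NULL
    ORDER BY s.trip_no, s.station_no
""").fetchall()

# Trip records that have no station records at all.
empty_trips = conn.execute("""
    SELECT t.trip_no
    FROM w_trip AS t
    LEFT JOIN w_station AS s ON s.trip_no = t.trip_no
    WHERE s.trip_no IS NULL
    ORDER BY t.trip_no
""").fetchall()

conn.close()
```

Both result lists would need to be empty before the working-table data
could be treated as consistent under these assumed rules.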
4. "Groomed", validated data loaded to database and available for analysis:
The clean, groomed, and validated data are inserted into the
appropriate database (in this case regional_ce) and become available
for extraction and analysis.
The clean electronic data files and raw paper data are then
archived for safekeeping.