
Appendix 1 - Data entry, error checking, and loading

The data in regional_ce come from companies with New Zealand vessels fishing on the high seas that furnish non-NZ Catch Effort forms. Data from CCAMLR trips are supplied in electronic form and are not subject to the same level of checking by NIWA as they would be if NIWA were supplied with the raw data and were responsible for entering it. That is, data from CCAMLR trips do not pass through the data entry stages described here.

This section outlines the flow of paper-recorded data, namely for the South Pacific Regional Purse-seine fishery, and defines the separate tasks involved.

The South Pacific Regional Purse-seine data are supplied on forms, either handwritten or typed. Each trip is assigned a unique trip number, and each set is assigned a sequential station number.

The date and time are also recorded as part of the station data.

1. Pre-key entry, visual checking and batching:

The data are then forwarded, via the Ministry of Fisheries, to a project team member, who visually checks these details, batches the forms, and forwards them for key entry.

2. Key entry of data:

At this point, trained data entry operators key the data from the collated forms into a fixed-format ASCII file. NIWA uses the KEYS Data Emulator for data entry.
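
In a fixed-format file each field occupies the same character columns on every line. The Python sketch below parses one such line; the column layout (trip number, station number, date as YYYYMMDD, time as HHMM) is entirely hypothetical, since the actual record layout produced for regional_ce is not described here.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical column layout: (field name, start, end) as 0-based,
# end-exclusive slice positions. The real file layout is not documented here.
LAYOUT = [
    ("trip", 0, 6),
    ("station", 6, 9),
    ("date", 9, 17),   # YYYYMMDD
    ("time", 17, 21),  # HHMM, 24-hour clock
]


@dataclass
class StationRecord:
    trip: int
    station: int
    when: datetime


def parse_line(line: str) -> StationRecord:
    """Slice a fixed-format line into its fields and convert their types."""
    fields = {name: line[start:end].strip() for name, start, end in LAYOUT}
    return StationRecord(
        trip=int(fields["trip"]),
        station=int(fields["station"]),
        when=datetime.strptime(fields["date"] + fields["time"], "%Y%m%d%H%M"),
    )


record = parse_line("000123001200611150630")
print(record)
```

Because positions rather than delimiters define the fields, a single misplaced character shifts a value into the wrong field, which is one reason the keyed data are verified and then error-checked.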

All data entry is verified: each page of data is keyed in twice, and the two versions are cross-checked for mismatches. Any data entry operator errors are corrected at this point.
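
The idea behind double keying can be shown with a minimal Python sketch, assuming each keying pass yields a list of text lines for the same page. This is only an illustration of comparing two independent keyings line by line; it is not the KEYS Data Emulator's own verification mechanism, which is not described here.

```python
from itertools import zip_longest
from typing import List, Optional, Tuple

# (line number, first keying, second keying); None marks a missing line.
Mismatch = Tuple[int, Optional[str], Optional[str]]


def compare_keyings(first: List[str], second: List[str]) -> List[Mismatch]:
    """Report every line where two independent keyings of a page disagree."""
    mismatches = []
    # zip_longest pads the shorter pass with None, so a line keyed in only
    # one of the two passes is reported as well.
    for lineno, (a, b) in enumerate(zip_longest(first, second), start=1):
        if a != b:
            mismatches.append((lineno, a, b))
    return mismatches


pass_one = ["000123001200611150630", "000123002200611151045"]
pass_two = ["000123001200611150630", "000123002200611151054"]

for lineno, a, b in compare_keyings(pass_one, pass_two):
    print(f"line {lineno}: {a!r} vs {b!r}")
```

Here the transposed digits in the second line's time are flagged, and the operator would check the original form to decide which keying is correct.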

The electronic data files, along with the original raw data, are then passed on for error checking. At this point the data are ready for the error-checking and formatting routines.

3. Data error checking, validation, and grooming:

Data files are put through a number of computer error-checking (validation) routines that look for inaccuracies and inconsistencies within trips. Any errors detected are corrected, and the data are passed through the routines again; this cycle is repeated until the data reach a standard that allows them to be inserted into the appropriate database tables.
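
The individual validation routines are not listed here. As an illustration only, the Python sketch below applies two plausible within-trip rules, both assumptions: station numbers within a trip should run 1, 2, 3, ... with no gaps or duplicates, and set dates and times should not go backwards as the station number increases. An empty result means the trips pass; otherwise the listed problems would be corrected and the checks rerun.

```python
from collections import defaultdict
from datetime import datetime


def check_trips(records):
    """Return error messages for within-trip inconsistencies.

    Each record is a (trip, station, when) tuple. The two rules checked
    here are hypothetical examples of within-trip validation.
    """
    by_trip = defaultdict(list)
    for trip, station, when in records:
        by_trip[trip].append((station, when))

    errors = []
    for trip, sets in sorted(by_trip.items()):
        sets.sort()  # order each trip's sets by station number
        stations = [station for station, _ in sets]
        expected = list(range(1, len(sets) + 1))
        if stations != expected:
            errors.append(f"trip {trip}: stations {stations}, expected {expected}")
        # Consecutive sets must not go back in time.
        for (s1, t1), (s2, t2) in zip(sets, sets[1:]):
            if t2 < t1:
                errors.append(f"trip {trip}: station {s2} at {t2} precedes station {s1} at {t1}")
    return errors


records = [
    (123, 1, datetime(2006, 11, 15, 6, 30)),
    (123, 2, datetime(2006, 11, 16, 6, 10)),
    (123, 4, datetime(2006, 11, 15, 18, 0)),
]
for message in check_trips(records):
    print(message)
```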

The data are then inserted into "working tables", which allows further checks of data integrity that take advantage of a relational database's ability to manipulate, match, and compare related sets of data.
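
A minimal sketch of one such relational check, using Python's built-in sqlite3 module as a stand-in for whatever database system is actually used. The working-table names (work_trip, work_station) and their columns are hypothetical. The query finds station records whose trip number has no matching trip record, a mismatch between related sets of data that a join exposes directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical working tables; the real regional_ce schema may differ.
    CREATE TABLE work_trip (trip_no INTEGER PRIMARY KEY);
    CREATE TABLE work_station (
        trip_no INTEGER,
        station_no INTEGER,
        set_datetime TEXT
    );
    INSERT INTO work_trip VALUES (123);
    INSERT INTO work_station VALUES
        (123, 1, '2006-11-15 06:30'),
        (123, 2, '2006-11-15 18:00'),
        (124, 1, '2006-11-20 07:15');
    """
)

# Stations with no matching trip record: the LEFT JOIN leaves the trip
# columns NULL wherever no trip row matches.
orphans = conn.execute(
    """
    SELECT s.trip_no, s.station_no
    FROM work_station AS s
    LEFT JOIN work_trip AS t ON t.trip_no = s.trip_no
    WHERE t.trip_no IS NULL
    ORDER BY s.trip_no, s.station_no
    """
).fetchall()
print(orphans)
```

Checks of this kind span more than one set of records at a time, which is what the working tables add beyond the file-based routines.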

4. "Groomed", validated data loaded to the database and made available for analysis:

The clean, groomed, and validated data are then inserted into the appropriate database (in this case regional_ce), where they become available for extraction and analysis.
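
The final load can be sketched as a single transaction that copies rows from a working table into the table analysts query, so that a batch is loaded either completely or not at all. This is an assumption about how such a load might be done, not a description of NIWA's procedure; the table names and columns are hypothetical, and SQLite again stands in for the production database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables: a working table holding groomed data and the
    -- final table that is queried for analysis.
    CREATE TABLE work_station (trip_no INTEGER, station_no INTEGER, set_datetime TEXT);
    CREATE TABLE station (
        trip_no INTEGER,
        station_no INTEGER,
        set_datetime TEXT,
        PRIMARY KEY (trip_no, station_no)
    );
    INSERT INTO work_station VALUES
        (123, 1, '2006-11-15 06:30'),
        (123, 2, '2006-11-15 18:00');
    """
)


def promote(conn: sqlite3.Connection) -> int:
    """Copy validated rows into the final table and clear the working table.

    Both statements run in one transaction: `with conn` commits if they
    succeed and rolls back if either raises, so a half-loaded batch is
    never left in the final table.
    """
    with conn:
        cur = conn.execute(
            "INSERT INTO station (trip_no, station_no, set_datetime) "
            "SELECT trip_no, station_no, set_datetime FROM work_station"
        )
        conn.execute("DELETE FROM work_station")
    return cur.rowcount


loaded = promote(conn)
print(loaded, "rows loaded")
```

The primary key on the final table also means that loading the same station twice raises an error and rolls the whole batch back.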

The clean electronic data files and raw paper data are then archived for safekeeping.


Updated: 16 November 2007