Appendix 1 - Data entry, error checking, and loading
This section outlines the flow of paper-recorded rock lobster data from field collection through to their availability to researchers for stock assessment analyses, and defines the separate tasks required along the way.
In this example, samplers working on a vessel or in processing facilities on shore collect data. These data are recorded on waterproof forms. Each sample is unique and is given a sample number that can be linked to every pot lifted and every rock lobster measured.
At the completion of each sample, the recorder ensures that all pages have a sample number, are numbered sequentially, are in order and have all the required data fields completed. The data are then forwarded to a project team member who checks the above, checks all data are legible, registers receipt of the data and forwards them to key punching.
There are five steps in the data flow from collection to availability for analysis:
1. Data collection
- Data are collected (see note 11) on the following forms (Appendix 3):
- Cover Sheet: RTAG22/RLCS22
- Length Frequency Form: Typically RLCS35 for catch sampling.
- Pot Catch Form: RLCS21
- Tagging and Release Form: RTAG42 - can be used if tagging is done in conjunction with catch sampling, although typically RTAG42 and RLCS35 will be used separately in a sample.
- Examples of other forms occasionally used are also included in Appendix 3.
2. Pre-key punching, checking and collating
Paper forms from each sample are visually checked for obvious errors or omissions and corrected. They are then collated with paper forms from other samples from the same fishing area and allocated a file name; e.g., cs***97, where cs = catch sampling, *** = unique sample number, and 97 = year.
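The file naming convention above can be sketched in a few lines of Python. This is an illustration only: the function name sample_file_name is hypothetical, and the zero-padding of sample numbers to three digits is an assumption based on the three asterisks in cs***97.

```python
def sample_file_name(sample_no: int, year: int, prefix: str = "cs") -> str:
    """Build a file name such as cs12397 (prefix + sample number + 2-digit year).

    Assumptions (not stated in the source): sample numbers fit in three
    digits and are zero-padded, e.g. sample 7 in 1997 becomes cs00797.
    """
    if not 0 <= sample_no <= 999:
        raise ValueError("sample number must fit in three digits")
    return f"{prefix}{sample_no:03d}{year % 100:02d}"


if __name__ == "__main__":
    print(sample_file_name(123, 1997))
```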
3. Key punching data entry
At this point trained data entry operators key punch the collated forms into a fixed-field ASCII file format. NIWA uses the KEYS data emulator package.
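To illustrate what a fixed-field ASCII record looks like, the following Python sketch writes and reads one line in which every field occupies a set number of columns. The field names and widths in LAYOUT are invented for the example; the actual layouts of the RLCS and RTAG forms are not given in this appendix.

```python
# Hypothetical layout: (field name, width in characters).
# The real form layouts are not described in this appendix.
LAYOUT = [("sample_no", 3), ("pot_no", 3), ("sex", 1), ("length_mm", 3)]


def format_record(values: dict) -> str:
    """Right-justify each value in its fixed-width field and join the fields."""
    parts = []
    for name, width in LAYOUT:
        text = str(values[name])
        if len(text) > width:
            raise ValueError(f"{name}={text!r} does not fit in {width} columns")
        parts.append(text.rjust(width))
    return "".join(parts)


def parse_record(line: str) -> dict:
    """Slice a fixed-width line back into named fields (as stripped strings)."""
    fields, pos = {}, 0
    for name, width in LAYOUT:
        fields[name] = line[pos:pos + width].strip()
        pos += width
    return fields


if __name__ == "__main__":
    line = format_record({"sample_no": 123, "pot_no": 7, "sex": "F", "length_mm": 55})
    print(repr(line))
    print(parse_record(line))
```

Because each field always starts at the same column, a record can be read without delimiters, which is what makes the later range and consistency checks straightforward to automate.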
All data entry is verified; that is, each page of data is key punched twice and the two results are crosschecked electronically for mismatches. Any data entry operator errors are corrected at this point. This is an important step, as data entry errors can be a major source of data errors. The digitised data files are transferred back to the client, along with the original raw data files. Data are now ready for error checking and validation routines.
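The double-entry verification described above amounts to comparing two independently keyed copies of the same page. The sketch below shows one way this comparison could be done in Python; it is not the KEYS package's method, and the function name find_mismatches is hypothetical.

```python
from itertools import zip_longest


def find_mismatches(first_pass, second_pass):
    """Compare two keyed copies line by line.

    Returns a list of (line_number, [column_numbers]) for every line where
    the copies differ. Line and column numbers start at 1. Trailing spaces
    are treated as padding, so lines differing only in trailing spaces match.
    A line missing from one copy is compared against an empty line.
    """
    mismatches = []
    pairs = zip_longest(first_pass, second_pass, fillvalue="")
    for line_no, (a, b) in enumerate(pairs, start=1):
        width = max(len(a), len(b))
        a, b = a.ljust(width), b.ljust(width)
        columns = [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]
        if columns:
            mismatches.append((line_no, columns))
    return mismatches


if __name__ == "__main__":
    first = ["123 045 M", "124 051 F"]
    second = ["123 045 M", "124 057 F"]
    print(find_mismatches(first, second))
```

Each reported mismatch would then be resolved by referring back to the paper form, since the comparison alone cannot say which of the two keyed copies is correct.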
If the client requires unvalidated data, a disk copy of the digitised data is returned to the client, together with a hard-copy printout and the original raw data.
If validation is required then the data go through the next step.
4. Data error checking (manual and computer) and validation (“grooming”)
Here we define “grooming” as:
The process by which digitised data files are checked for validation errors (is value A within valid range?) and data integrity (given that value A is valid, and value B is valid, does B make sense given A?), and by which the file structure is manipulated in preparation for insertion into the database.
The individual data files are now put through a number of computer error checking (validation) routines that look for inconsistencies within the sample and check ranges of data within set limits. Errors are corrected. This part of the process also accommodates real changes in data; e.g., a new bait or pot type, and a split in fishing area for the one sample. Changes can be made to the validation routines if required, and to the definitions in the database. Data are then run through these checking routines until all detected errors have been eliminated and changes updated. These “groomed” data files are then deemed to be of a sufficient standard to load into the rlcs database. The groomed data file is given a .dat suffix; e.g., cs12397.dat.
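The two kinds of check in the definition of grooming (range validation and data integrity) can be sketched as follows. This is a minimal illustration, not the actual validation routines: the field names, the limits in VALID_RANGES, and the rule that an egg-bearing flag only makes sense for a female are all assumptions made for the example.

```python
# Hypothetical limits; the real limits are held in the validation routines
# and can be changed, e.g. when a new pot type is introduced.
VALID_RANGES = {"length_mm": (20, 120), "pot_no": (1, 200)}


def check_record(record: dict) -> list:
    """Return a list of error messages for one record (empty if it passes)."""
    errors = []
    # Validation: is value A within its valid range?
    for field, (low, high) in VALID_RANGES.items():
        value = record.get(field)
        if value is None or not low <= value <= high:
            errors.append(f"{field}={value!r} outside {low}-{high}")
    # Integrity: given valid A and valid B, does B make sense given A?
    # Assumed rule for illustration: only females can be egg-bearing.
    if record.get("egg_bearing") and record.get("sex") != "F":
        errors.append("egg_bearing set for a non-female record")
    return errors


def groom(records: list) -> dict:
    """Map the index of each failing record to its list of errors."""
    failures = {}
    for index, record in enumerate(records):
        errors = check_record(record)
        if errors:
            failures[index] = errors
    return failures


if __name__ == "__main__":
    sample = [
        {"length_mm": 55, "pot_no": 3, "sex": "F", "egg_bearing": True},
        {"length_mm": 550, "pot_no": 3, "sex": "M", "egg_bearing": True},
    ]
    for index, errors in groom(sample).items():
        print(index, errors)
```

In practice the routine would be rerun after each round of corrections until groom returns no failures, matching the repeated checking described above.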
5. Groomed data loaded to database. Available for analysis
The groomed data are now loaded into the database. At this point the data become available for analyses.
The .dat file, the digitised data file and the paper raw data are then all archived for safekeeping.
11 See the Rock Lobster Catch Sampling Manual for more details.