Appendix 1 - Data entry, error checking, and loading
The data in rec_data have come from various sources. The database was created in 1996 and holds data from earlier surveys, currently back to 1991. These earlier data were supplied in electronic form and are assumed to have been checked by the researchers working with the data at the time.
Other research providers under contract to the Ministry of Fisheries are still supplying data.
These data are not all subject to the same level of checking by NIWA as would be expected if NIWA were supplied with the raw data and were responsible for their entry and checking.
This section outlines the flow of paper-recorded recreational fishing data, from collection through to availability to researchers for analysis, and defines the separate tasks required along the way.
In this example, interviewers at boat ramps record data by hand on paper forms. Each session is identified by its ramp code, session date, and, if there was more than one session that day, its time-of-day code. The session is later assigned a unique number by the checking and formatting software before the data are loaded to the database.
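The session numbering described above can be sketched as follows. This is only an illustration, not the actual checking and formatting software: the function name, the key layout, and the ramp codes, dates, and time-of-day codes are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical session key: (ramp code, session date, time-of-day code).
# The time-of-day code is None when there was only one session that day.
SessionKey = Tuple[str, str, Optional[str]]


def assign_session_numbers(
    keys: Iterable[SessionKey], start: int = 1
) -> Dict[SessionKey, int]:
    """Give each distinct session key a unique number, in order of first appearance."""
    numbers: Dict[SessionKey, int] = {}
    next_number = start
    for key in keys:
        if key not in numbers:
            numbers[key] = next_number
            next_number += 1
    return numbers


# Illustrative keys only; repeated keys belong to the same session.
example_keys = [
    ("RAMP1", "1996-01-06", None),
    ("RAMP1", "1996-01-07", "AM"),
    ("RAMP1", "1996-01-07", "PM"),
    ("RAMP1", "1996-01-06", None),
]
session_numbers = assign_session_numbers(example_keys)
```

Records that share a ramp code, date, and time-of-day code map to the same number, so every form page from one session can be linked once the data are loaded.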
1. Pre-key punching, visual checking and batching:
At the completion of each session the interviewer should ensure that all pages are in order and that all required data fields have been correctly filled out. The data are then forwarded to a project team member, who repeats these checks and forwards the data for key punching.
2. Key punching data entry:
At this point, trained data entry operators key the data from the collated forms into a fixed-format ASCII file. NIWA uses the KEYS Data Emulator for data entry.
All data entry is verified; that is, each page of data is key punched twice and the two results are cross-checked for mismatches. Any data entry operator errors are corrected at this point.
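As a rough illustration of this double-entry verification, the sketch below compares two keyed passes line by line. It does not represent how the KEYS Data Emulator performs verification; the function name and the sample record layout are assumptions made for the example.

```python
from itertools import zip_longest
from typing import List, Optional, Tuple

Mismatch = Tuple[int, Optional[str], Optional[str]]


def find_mismatches(first_pass: List[str], second_pass: List[str]) -> List[Mismatch]:
    """Return (line number, first-pass line, second-pass line) wherever the passes differ.

    A line present in only one pass is reported with None for the other pass.
    """
    mismatches: List[Mismatch] = []
    pairs = zip_longest(first_pass, second_pass)
    for line_no, (first, second) in enumerate(pairs, start=1):
        if first != second:
            mismatches.append((line_no, first, second))
    return mismatches


# Hypothetical fixed-format lines; the second pass has two digits transposed.
first = ["A01 0512 03", "A01 0512 07"]
second = ["A01 0512 03", "A01 0521 07"]
keying_errors = find_mismatches(first, second)
```

Each reported line would then be checked against the paper form to decide which keying, if either, is correct.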
The digitised data files are transferred for error checking along with the original raw data file.
The data are now ready for the error checking and formatting routines.
3. Data error checking, validation, and grooming:
Data files are put through a number of computer error-checking (validation) routines that look for inaccuracies and inconsistencies within sessions. Any errors detected are corrected. The data are passed through these routines repeatedly until they reach a standard that allows them to be inserted into the appropriate database tables.
The data are usually inserted into "working tables" in a database. This checks the integrity of the data by taking advantage of a relational database's ability to manipulate, match, and compare related sets of data.
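A validation pass of this kind might look like the sketch below. The actual routines and the fields they check are not described here, so the field names (ramp_code, start_time, end_time) and both checks are hypothetical stand-ins for a completeness check and a consistency check.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Check = Callable[[Record], List[str]]


def check_ramp_code(record: Record) -> List[str]:
    """Completeness check: the (hypothetical) ramp_code field must be filled in."""
    return [] if record.get("ramp_code") else ["ramp_code is missing"]


def check_times(record: Record) -> List[str]:
    """Consistency check: end_time (24-hour HHMM) must not precede start_time."""
    start, end = record.get("start_time"), record.get("end_time")
    if isinstance(start, int) and isinstance(end, int) and end < start:
        return ["end_time is earlier than start_time"]
    return []


def validate(records: List[Record], checks: List[Check]) -> List[Tuple[int, str]]:
    """Run every check on every record; return (record index, message) pairs."""
    errors: List[Tuple[int, str]] = []
    for index, record in enumerate(records):
        for check in checks:
            errors.extend((index, message) for message in check(record))
    return errors


records = [
    {"ramp_code": "R1", "start_time": 800, "end_time": 1200},
    {"ramp_code": "", "start_time": 1300, "end_time": 900},
]
errors = validate(records, [check_ramp_code, check_times])
```

After the reported records are corrected against the paper forms, the same call is repeated; an empty error list means the batch has passed these checks.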
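One integrity check that working tables make easy is finding records whose parent record is missing. The sketch below uses Python's built-in sqlite3 module purely for illustration; the database system actually used, the table names (w_session, w_interview), and the columns are all assumptions.

```python
import sqlite3
from typing import List, Tuple


def find_orphan_interviews(conn: sqlite3.Connection) -> List[Tuple[int, int]]:
    """Return (interview_no, session_no) for interviews with no matching session."""
    query = """
        SELECT i.interview_no, i.session_no
        FROM w_interview AS i
        LEFT JOIN w_session AS s ON s.session_no = i.session_no
        WHERE s.session_no IS NULL
        ORDER BY i.interview_no
    """
    return conn.execute(query).fetchall()


# Hypothetical working tables, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE w_session (session_no INTEGER PRIMARY KEY, ramp_code TEXT);
    CREATE TABLE w_interview (interview_no INTEGER PRIMARY KEY, session_no INTEGER);
    INSERT INTO w_session VALUES (1, 'R1'), (2, 'R2');
    INSERT INTO w_interview VALUES (10, 1), (11, 2), (12, 3);
    """
)
orphans = find_orphan_interviews(conn)  # interview 12 refers to a session that was never keyed
conn.close()
```

Because the working tables are separate from the final tables, problems like this can be corrected before any data reach the tables used for analysis.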
4. "Groomed", validated data loaded to database. Available for analysis:
The clean, groomed, and validated data are inserted into the appropriate database and now become available for analysis.
The clean digitised data files and raw paper data are then archived for safekeeping.