Appendix 1 - Data Integrity

Data collection and data processing.

Data are collected on the standardised forms shown in Appendix 3.

Books of forms are provided by the Pelagic Research Group of NIWA (i.e., the aerial sightings database manager), filled out by the pilots, and returned to Greta Point. For a full discussion of this topic see Taylor (in preparation).

Data entry, process, and definitions.

This section outlines the flow of paper-recorded aerial sightings data, from field collection through to its availability to researchers for stock assessment analyses, and defines the separate tasks required along the way.

In this example, pilots flying light aircraft record handwritten data on paper forms.

At the completion of each flight the recorder ensures that all pages are in order and that all required data fields have been correctly filled. The data are then forwarded to a project team member (the client, e.g., NIWA) responsible for checking the data prior to keypunching.

There are five clear steps in the data flow following collection. They are listed below and then discussed individually in detail.

  1. Pre-key punching, checking and batching.

  2. Key punching data entry.

  3. Electronic transfer of raw data flat files, on disk and in paper copy, to client (i.e., project team).

  4. Data error checking (manual and computer), validation, and grooming.

  5. "Groomed", validated data loaded to database. Now available for analysis.



  1. Pre-key punching, visual checking and batching:

    The paper forms from each flight are checked for obvious gross errors or omissions and corrected if necessary. Forms are placed into batches and allocated a unique file name. The batches of raw data are sent for keypunching.
  2. Key punching data entry:

    At keypunching, the batches of raw data are digitised and verified by trained data entry operators. Verification means that the data are digitised twice and the two resulting files are cross-checked for mismatches. Operator errors are corrected at this point, and the completed digitised file is then ready for transfer to the error-checking process. At no point in this process are changes or interpretations made to the raw data. NIWA uses the KEYS Data Emulator for data entry.

  3. Electronic transfer of raw data flat files, on disk and in paper copy, to client:

    The digitised data file is transferred for error checking, along with the original raw data file.

    At this point the data are in a format that is compatible with the data processing routines.

  4. Data error checking, validation, and grooming:

    Data files are put through a number of computer error-checking (validation) routines that look for inaccuracies and inconsistencies in the data. Any errors detected are corrected. The data are then passed through these routines again, repeatedly if necessary, until they reach a satisfactory standard that will allow them to be inserted into the appropriate database.

    In some instances, data may be inserted into "working tables" in a database. This is often done to check the integrity of the data by taking advantage of a relational database's ability to manipulate, match, and compare related sets of data. Details for the aerial sightings data are given below.

    1. Reformatting raw data files (file_1) using the programme "modify-raw" in the directory "/neptune/grp1/sightings/aer_sigh_wd/dat_load_wd/chek_progs"
      - this programme calculates some estimates of search effort and sightings tonnages, adds some extra attributes which increase the efficiency of links between tables, and writes to an output file (file_2) in the appropriate format for running through the error checking programmes;
    2. Error checking using the checking programme "as_check" in the same directory as above
      - this programme checks file_2 for values out of range, null values, etc., and does some minor formatting, including insertion of EMPRESS field delimiters, before producing five output files, one for each of the main database tables (flight_group, flight, school_sight, set, flightpath), and one further file (as_errors) which contains error messages for the input file;
    3. Using as_errors, file_1 is then edited and steps 1 and 2 are repeated until the errors have been corrected.
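
The check-edit-repeat cycle in sub-steps 2 and 3 can be illustrated with a minimal Python sketch. This is not the "as_check" programme, whose rules are not given here. The field names, the permitted ranges, and the sample records are all hypothetical, and the returned list of messages only plays a role similar to the as_errors file.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical field limits; the real ranges used by "as_check" are not stated.
RANGES: Dict[str, Tuple[float, float]] = {
    "latitude": (-90.0, 90.0),
    "longitude": (-180.0, 180.0),
    "tonnes": (0.0, 10000.0),
}

Record = Dict[str, Optional[float]]


def check_record(line_no: int, record: Record) -> List[str]:
    """Return one message per null or out-of-range field in a record."""
    errors = []
    for field, (low, high) in RANGES.items():
        value = record.get(field)
        if value is None:
            errors.append(f"line {line_no}: {field} is null")
        elif not low <= value <= high:
            errors.append(f"line {line_no}: {field}={value} outside [{low}, {high}]")
    return errors


def check_file(records: List[Record]) -> List[str]:
    """Check every record and collect all messages, like an error file."""
    errors = []
    for line_no, record in enumerate(records, start=1):
        errors.extend(check_record(line_no, record))
    return errors


# Made-up sample input: the second record has a bad latitude and a null tonnage.
records = [
    {"latitude": -37.5, "longitude": 176.2, "tonnes": 15.0},
    {"latitude": -137.5, "longitude": 176.2, "tonnes": None},
]

errors = check_file(records)  # first pass reports the problems
for message in errors:
    print(message)

# The input is corrected by hand and the check is run again.
records[1] = {"latitude": -37.5, "longitude": 176.2, "tonnes": 20.0}
errors_after_fix = check_file(records)  # second pass should be clean
print("errors remaining:", len(errors_after_fix))
```

The point of the design is that the checker only reports problems and never alters the input; corrections are made to the input file by a person, and the checks are rerun until no messages remain.
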
  5. "Groomed", validated data loaded to database. Available for analysis:

    The files of clean, groomed, and validated data are inserted into the appropriate database and now become available for analysis.

    The clean digitised data files and raw paper data are then archived for safekeeping.
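
The verification described in step 2, where the data are keyed twice and the two files are cross-checked, can be sketched as a line-by-line comparison. This is a minimal Python illustration only, not the KEYS Data Emulator's method. The function name and the record layout in the sample lines are hypothetical.

```python
from typing import List, Tuple


def find_keying_mismatches(
    first_pass: List[str], second_pass: List[str]
) -> List[Tuple[int, str, str]]:
    """Return (line_number, first, second) for every line that differs.

    A line missing from one pass is treated as an empty string, so a
    difference in file length is also reported as a mismatch.
    """
    mismatches = []
    longest = max(len(first_pass), len(second_pass))
    for i in range(longest):
        a = first_pass[i] if i < len(first_pass) else ""
        b = second_pass[i] if i < len(second_pass) else ""
        if a != b:
            mismatches.append((i + 1, a, b))
    return mismatches


# Made-up keyed lines: the second pass differs in the last field of line 2.
first = ["F001 1998-02-14 SKJ 12", "F001 1998-02-14 JMA 30"]
second = ["F001 1998-02-14 SKJ 12", "F001 1998-02-14 JMA 80"]

mismatches = find_keying_mismatches(first, second)
for line_no, a, b in mismatches:
    print(f"line {line_no}: {a!r} != {b!r}")
```

Each reported mismatch would be resolved by an operator against the paper form, so the digitised file ends up matching what was written rather than being interpreted or altered.
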


Updated : 16 November 2007