3 Data Structures
3.1 Table relationships
This database contains several tables in 2 conceptually distinct schema for the boat ramp and diary surveys. The ERD for rec_data (Figure 1) shows the logical structure2 of the database and its entities (each entity is implemented as a database table), and the relationships between these tables and tables in other databases. This schema is valid regardless of the database system chosen, and it can remain correct even if the Database Management System (DBMS) is changed.
Each table represents an object, event, or concept in the real world that has been represented in the database. Each attribute of a table is a defining property or quality of the table. All of the table’s attributes are shown in the ERD. The underlined attributes represent the table’s primary key3.
Note that Figure 1 shows the main tables only. Several of the tables in the rec_data database have foreign keys4 which contain standard NIWA/MFish fisheries codes, such as species. Foreign keys not only define the relationships between the tables in rec_data but also provide links to the rdb (research database) database, which contains the definitive list of these standard codes; e.g., species codes. An ERD for these tables (Figure 2) shows the relationships between rec_data and rdb.
All tables within external databases, such as those in rdb, are shown in the ERDs as being enclosed in dashed-line boxes.
The rec_data database is implemented as a relational database; i.e., each table is a special case of the mathematical construct known as a relation and hence elementary relation theory is used to deal with the data within tables and the relationships between them. There are three types of relationships possible between tables, but only one exists in rec_data: one-to-many5 These relationships can be seen in ERDs by connecting a single line (indicating ‘many’) from the child table; e.g., t_group, to the parent table; e.g., t_session, with an arrowhead (indicating ‘one’) pointing to the parent.
Every relationship has a mandatory or optional aspect to it. That is, if a relationship is mandatory, then it has to occur at least once, while an optional relationship might not occur at all. For example, in Figure 1, consider that relationship between the table t_group and it’s child table t_interview. The symbol ‘o’ by the child t_interview means that a group record can have zero or many interview records, while the bar by the parent t_group means that for every interview there must be a matching group record.
These links are enforced by referential constraints6 Constraints do not allow orphans to exist in any table; i.e., where a child record exists without a related parent record. This may happen when: a parent record is deleted; the parent record is altered so the relationship is lost; or a child record is entered without a parent record. Constraints are shown in the table listings by the following format:
constraint name (attribute [, attribute] ) | INSERT |
parent table (attribute [, attribute] )
Note that the typographical convention for the above format is that square brackets  may contain more than one item or none at all. Items stacked between vertical lines | | are options of which one must be chosen.
For example, consider the following constraint found in the table t_length:
invalid species (species) INSERT rdb : curr_spp (code)
This means that the value of the attribute species in the current record must already exist in the parent table curr_spp of the rdb database or the record will be rejected and the following message will be displayed:
*** User Error: insert constraint ‘invalid species’ violation
For tables residing in external databases, the parent table name will be prefixed by the name of the database.
Section 5 lists all the rec_data tables as implemented by the Empress RDBMS. As can be seen in the listing of the tables, a table’s primary key has an unique index on it. Primary keys are generally listed using the following format:
UNIQUE index_name ON (attribute[, attribute])
where attribute(s) make up the primary key and the index name is the primary key name. These prevent records with duplicate keys from being inserted into the tables; e.g., a record with a response number (resp) for that survey, in t_phone.
The database listing shows that the tables also have indices on many attributes. That is, attributes that are most likely to be used as a searching key have like values linked together to speed up searches. These indices are listed using the following format:
NORMAL (2, 15) index_name ON (attribute[, attribute])
Note that indices may be simple, pointing to one attribute or composite pointing to more that one attribute. The numbers “…(2, 15)…” in the syntax are Empress DBMS default values relating to the amount of space allocated for the index.
3.2 Database design
The structure of rec_data has 2 parallel branches of tables, which share some common code tables. The two ‘branches’ hold data on the 2 main survey types, boat ramp and diary surveys. The boat ramp tables also contain data from shellfish harvest surveys and observer surveys. In some years both boat ramp and diary surveys were conducted, for example 1996 when a national survey was conducted, or 1994 in the North region. In other years only one type of survey was conducted (in that region); for example, the North region boat ramp survey in 1991, or the South region diary survey in 1992.
3.2.1 Boat ramp and shellfish harvest tables
The ERD for these surveys is shown in Figure 3. For the boat ramp data the top-level table is t_session (Table 1, section 4.1). This holds information for each session at a boat ramp or beach etc. A session number, represented by the attribute sess_no, which is also the table’s primary key, identifies each record. This session number is a unique computer generated number assigned to each session at the time of loading the data to the rec_data database. The attribute survey is a foreign key that provides a link to t_survey_codes, and which can be used to distinguish shellfish harvest surveys from other surveys.
For the shellfish harvest data only, during the session, hourly counts were made of the number of pickers (or fishers) harvesting shellfish within the spatial strata into which the beach was divided. These data are stored in the table t_fisher_count (Table 2), with a primary key of sess_no, hr, and fish_loc. This table is a ‘dead end branch’ of the database structure; i.e., it has no child records. NB: The stratum number (i.e., ‘Strata No.’ as labelled on the data form) is recorded in the attribute fish_loc in the tables t_fisher_count and t_interview for the shellfish harvest data.
For each session, details about the group intercepted are stored in the table t_group, (Table 3) with a primary key of sess_no and group_no. Provided the interviewer is not busy interviewing another group, then the group is asked if they have been fishing, and if so if they consent to be interviewed. The intercept outcome; i.e., if the group were interviewed, not interviewed, or were engaged in other activities (water skiing, picnicking etc), is recorded along with the group type - namely the type of boat, or shore fishing. For most surveys the intercept time is recorded. This is defined as the time the group arrived at the boat ramp, or when the shellfish harvesters come off the beach.
Each interview from the group is stored in t_interview, with the attributes fisher_no and combo added to the primary key of t_group to become the primary key of this table. For the boat ramp surveys the interviews are conducted with each fisher separately, hence the attribute for fisher number. With the shellfish harvest surveys, the interviews are conducted with the group as a whole, and not separately by each fisher, and so a fisher number of -1 is assigned to these records.
The attribute combo, in conjunction with other attributes, is used to generate a primary key for interview records where a fisher used more than 1 combination, of target species, fishing method or location. For example, a fisher may have fished in the morning targeting snapper, and then in the afternoon targeted kahawai. Most fishing trips (i.e., 80-90%) are fairly standard and do not involve more than 1 combination. For the national 1996 boat ramp survey the combo was recoded as part of the checking and formatting process so that combo=A became combo=1, combo=B became combo=2 etc.
For the shellfish harvest survey data, an interviewer usually estimated and weighed the catch of each species harvested and these data are stored in the table t_weight. Species is added to the primary key of t_interview to create the primary key for t_weight. For the Kaikoura survey, catches as recorded at the boatramp are stored in t_weight.
Some recreational surveys included observations of fishing activity as part of the survey design. These observations were typically made from a distance, such as from an aircraft, a hill top or the coast observing fishing activity at sea. These observations have been made for surveys including those with the survey codes NOR94, KAI99, OTG01, KAI03 and STI03. Details of these observations are stored in the table t_observer. The table t_obs_count records the counts of respective fishing active in the attribute tally.
The table t_count_codes describes the attribute count_code in table t_obs_count, that is, the particular fishing activity being counted.
For both boat ramp and shellfish harvest data, individual fish or shellfish may be measured for length, or simply counted for some boat ramp data records, for example, if the fish was headed and gutted (see the attribute observd which is a foreign key to t_observd_codes, for the state of the fish recorded). These length or species count data are stored in the table t_length which does not have a primary key. The conceptual primary key would be sess_no, group_no, fisher_no, combo, species, lgth, and observd. As the attribute for fish length, lgth may legitimately be null if a species is only counted, it is not possible to enforce a primary key. This table holds individual weights for fish from the North region 1994 boat ramp survey.
3.2.2 Diary survey tables
The ERD for the diary surveys is shown in Figure 4. For these surveys, the table t_phone is the top-level table, which has a primary key of survey and resp where resp is the (potential) diarist’s respondent number. This table includes details about the person’s fishing practices over the past year and personal details including their sex, age, and ethnic group.
For the Central region diary survey in 1993 a supplementary telephone questionnaire was used asking details about scallop fishing. These data are stored in the table t_sca.
The table t_response holds data for the response status of each respondent; i.e., if the diarist went fishing in the quarter concerned etc. The primary key for this table is key, with a natural primary key of resp, survey, year_s and month_s.
The main diary survey table is t_effort, which has a primary key of trip_no. The trip number is an unique computer generated number assigned at the data checking and formatting stage. This table has four foreign keys to code tables in this database, all of which are shared with the boat ramp table structure.
The effort comprising a fishing trip may result in a catch, the results of which are stored in t_catch. Species is added to the primary key of t_effort to become the primary key for this table.
For the national 1996 diary survey a subset of diarists were asked to measure their catch of snapper, kahawai, and blue cod. These lengths are stored in t_lgth. Since the diarists were only asked to record the date of the trip that caught these species measured, and as some diarists occasionally make 2 or more ‘trips’ per day; e.g., by using different methods, it was not always possible to assign a trip number to records in t_lgth. Hence this table does not have a primary key, and there are some orphaned records in this table. Most records have a trip number assigned and can be joined to t_catch using the attributes trip_no and species, or trip_no to join to records in t_effort. The diarists in the Kaikoura survey also recorded lengths of their fish which are stored in t_lgth.
There are a number of views particularly of the tables t_effort and t_catch, with more or less 1 view per survey on each of these 2 tables. For example, v_sou92_effort filters the records of t_effort for those records where survey = ‘SOU92’. With v_sou92_catch providing the corresponding filter for the catch records. Similar views exist for the Central, North, and national surveys. These views only show the data available for any particular survey; i.e., they exclude attributes for which there are no data. For example, the attribute SCAarea is only shown in the CEN94 view, as this survey is the only one that collected these data.
Users are recommended to use these views, particularly if they want to extract data from one of these diary surveys, as these views in some cases also filter out records that are generally invalid for analysis; e.g. outside the date range.
3.2.3 Common code tables
Both table structures share 6 common code tables that define the codes used for the respective attributes in the main data tables. The table t_survey_codes describes the codes used for the attribute survey.
The table t_zonef_codes defines the fish_zone attribute in t_effort and t_interview. These fishing zones were defined for each diary survey, as subdivisions of Quota Management Areas (QMAs). These fishing zones were often assigned in each corresponding boat ramp survey where applicable. See Appendix 4 for maps of the fishing zones as used in the various surveys.
The table t_locality_codes holds descriptions and the fish_zone for the attribute fish_loc. This attribute is a 3-character code used to assign fishing locations, mostly by boat ramp interviewers, to small sections of coastline. Some of the trips in the national 1996 diary survey were assigned to a fish_loc based on the fishing locality descriptions given by the diarists.
The various surveys have led to the creation of several fishing method coding series, which are documented in t_fishmeth_codes. The attribute meth_type identifies which coding series is used for each fishing method as stored in the attribute fish_meth. The North region boat ramp surveys have used 2 character codes for fish_meth, while most of the other surveys have used a 2 numeric code. The North region diary survey separated the boat type out from the fishing method and recorded this separately (in t_effort.boat) whereas most method types include boat as part of the fishing method.
The table t_ramp_codes has the description for each ramp code, and the fishing zone from the national 1996 survey in which each ramp is located. This table documents the ramp attribute in the tables t_session and t_effort. For the North region boat ramp surveys a series of 2 character codes were used. These codes represent a coastal location with most of these 2 character codes representing boat ramps. A small number of codes are specifically for marinas. For the shellfish harvest survey ‘ramp’ codes were assigned to beaches surveyed. An additional attribute, iv_type in t_session has codes for: ramp, beach, roving boat, other, or marina as 1 to 5 respectively. For the national 1996 diary survey diarists were asked to specify the ‘boat departure point’. This description was assigned to a ‘ramp’ code, either one of the existing 2 character codes from the North region or a new 3 character code.
The table t_boat_codes documents the codes used with the attributes boat and group_type. Most of these codes have a 1 to 1 relationship between boat and group_type. The attribute boat describes the boat type used in the diary survey, currently only for the NOR94 survey. The boat type has been combined with the fishing method in other surveys. The attribute group_type describes the boat type, or if shore fishing, for the boat ramp surveys.