3 Data Structures

3.1 Table relationships

This database contains a number of tables. The ERD for regional_ce (Figure 1) shows the logical structure [Also known as a database schema.] of the database and it's entities (each entity is implemented as a database table) and the relationships between these tables. This schema is valid regardless of the database system chosen, and it can remain correct even if the Database Management System (DBMS) is changed. Each table represents an object, event, or concept in the real world that is selected to be represented in the database. Each attribute of a table is a defining property or quality of the table. All of the table's attributes are shown in the ERD. The underlined attributes represent the table's primary key [A primary key is an attribute or a combination of attributes that contains an unique value to identify that record.].

Note that Figure 1 shows the main tables only. Note that most tables contain foreign keys [A foreign key is an attribute or a combination of attributes that is a primary key in another table.]. These foreign keys define the relationships between the tables in regional_ce.

The regional_ce database is implemented as a relational database; i.e., each table is a special case of the mathematical construct known as a relation and hence elementary relation theory is used to deal with the data within tables and the relationships between them. There are three types of relationships possible between tables, but only one exists in regional_ce: one-to-many [A one-to-many relationship is where one record (the parent) in a table relates to one or many records (the child) in another table; e.g., one fishing event in t_effort can have many catches in t_catch but one catch can only come from one fishing event. ]. These relationships can be seen in ERDs by connecting a single line (indicating "many") from the child table; e.g., t_catch, to the parent table; e.g., t_effort, with an arrowhead (indicating "one") pointing to the parent.

Every relationship has a mandatory or optional aspect to it. If a relationship is mandatory, then it has to occur at least once, while an optional relationship might not occur at all. For example, in Figure 1, consider that relationship between the table t_effort and it's child table t_catch. The symbol "O" by the child t_catch means that a fishing event can have zero or many catch records, while the bar by the parent t_effort means that for every catch there must be a matching fishing event record.

These links are enforced by referential constraints [Also known as integrity checks.]. Constraints do not allow orphans to exist in any table; i.e., where a child record exists without a related parent record. This may happen when: a parent record is deleted; the parent record is altered so the relationship is lost; or a child record is entered without a parent record.

Constraints are shown in the table listings by the following format:

Referential:

constraint name (attribute[, attribute])

|INSERT|

|DELETE|


parent table (attribute[, attribute])




Note that the typographical convention for the above format is that square brackets "[ ]" may contain more than one item or none at all. Items stacked between vertical lines | | are options of which one must be chosen.

For example, consider the following constraint found in the table t_catch:

Referential:

Invalid trip stn ver (trip, station_no, version_no) INSERT t_effort (trip, station_no, version_no)



This means that the value of the combination of attributes trip, station_no and version_no in the current record must already exist in the parent table t_effort or the record will be rejected and the following message will be displayed:

*** User Error: insert constraint "Invalid trip stn ver" violation



Section 5 lists all the regional_ce tables as implemented by the Empress RDBMS. As can be seen in the listing of the tables, a table's primary key has an unique index on it. Primary keys are generally listed using the following format:

Indices:

UNIQUE index_name ON (attribute[, attribute])



where attribute(s) make up the primary key and the index name is the primary key name. These prevent records with duplicate keys from being inserted into the tables; e.g., a record with an existing event key.

The database listing (Tables 1-8) show that the tables also have indices on many attributes. That is, attributes that are most likely to be used as a searching key have like values linked together to speed up searches. These indices are listed using the following format:

Indices:

NORMAL (2, 15) index_name ON (attribute[, attribute])



Note that indices may be simple, pointing to one attribute or composite pointing to more than one attribute. The numbers "...(2, 15)..." in the syntax are Empress RDBMS default values relating to the amount of space allocated for the index.



3.2 Database design

The design of the regional_ce database incorporates table structures and field names to allow data extracts to be compatible with data extracted from the Ministry of Fisheries Catch Effort warehou database. Each vessel is identified by the same unique vessel_key attribute, as would be used in the warehou database. The same concept of a trip is used, with the start and end dates of trips being provided by fishers, to define a trip. The trip attribute is assigned a "trip key" to link all records in the various tables for a single trip. All the main tables have a version_no field, to allow for meeting any requirements to update existing data supplied by fishers, although at the time of writing, only one version exists for all forms entered into the database.

The data from both the South Pacific Regional Purse-seine fishery and the CCAMLR electronic logbooks are submitted on a trip basis. The table t_trip stores all the constant information for a trip in one record, these details can be linked with various other tables in the regional_ce database for a given trip, using the trip (trip number) attribute. For the CCAMLR data set the trip number assigned by the observer programme has been used. The South Pacific Regional Purse-seine data do not have distinctive trip numbers associated with the individual trips received, therefore a system assigned number (started from 10 001) is issued in sequence of the data being loaded.

Each Purse-seine form contains header information that is constant on a trip basis, (repeated on each page used), except the "fishing permit or licence number" which will change if more than one permit fished within a trip, therefore all the header data is stored in the t_trip table, except for the "fishing permit" that is stored in a separate t_permits table. The permit number is linked by the trip attribute and to the catch, effort records by a system assigned "group number", stored in the grp_no attribute. The grp_no represents a group of stations recorded on a form, covered by the particular "permit number", as held in the permit_no attribute of the t_permits table. In this context, a group can consist of one or more pages, as labelled page _of_ on the paper forms.

The table t_trip_comment is used to store any remarks or comments relating to the overall trip.

The second main table is t_effort. Details stored include: dates, times, and location of fishing events, the fishing method used and various physical parameters about the gear used (e.g., no of hooks used, length of line, etc.). The Purse-seine logsheet includes an activity code that indicates either a fishing set was made, or defines the main activity for the day, if no fishing sets have been made. When the activity (stored in the activity attribute in t_effort), shows a fishing set has not been made by the vessel during that day, the position recorded should be the latitude and longitude at 01:00 UTC on the South Pacific Regional Purse-seine logsheet (form type = PUR) or noon for Kiribati waters (form type = KIR). Because t_effort has to handle several fishing types, the meaning of various t_effort attributes may change depending on the form type and fishing method used. In a number of cases in the t_effort table, attributes will only be populated for one method, e.g. num_hooks is only populated for the Bottom Longline data set. Further to this, some Bottom Longline effort data were collected for restricted periods only, e.g. Set direction for both the start and end of set was collected for a limited numbers of trips only (one season), most trips do not have an end set direction recorded. A method can be reported on several form types, hence the form_type is stored in the database, as is the case for the Purse-seine method.

Details of the catch by species, recorded by the vessel, are stored in the t_catch table. Details include species code, a catch weight and the number caught where this is provided. The status attribute is used to indicate the destination type of the fish recorded on this record for the set. For Purse-seine catch the discard code (stored in the attribute status), incorporates the reason for the discard, this is coded 1, 2, 3, 4 or 5 as defined in the appendix.

The species codes used in the CCAMLR catch data as supplied differ from the standard NZ species codes used by the Ministry of Fisheries and NIWA. For instance the CCAMLR code for Patagonian toothfish (Dissostichus eleginoides) is TOP, while the NZ code is PTO and TOP is the NZ code for Pale Toadfish (Ambophthalmos angustus). The original CCAMLR species code is retained in the t_catch table in the sp_ccamlr attribute.

Landing details are stored in the table t_landings (Table 4). These details come from the South Pacific Regional Purse-seine PUR and KIR form types. Details include species, and landed weight.

Landings for a single species may be split, for example between several canneries or destinations, therefore a system generated number is assigned to uniquely identify each landing record, this key is stored in the attribute land_key. The CCAMLR electronic logbook data for NZ bottom longline vessels does not include any landings information.


Updated : 16 November 2007