3 Data Structures
3.1 Table relationships
This database contains a number of tables. The ERD for regional_ce
(Figure 1) shows the logical structure [Also known as a database
schema.] of the database and it's entities (each entity is
implemented as a database table) and the relationships between these
tables. This schema is valid regardless of the database system
chosen, and it can remain correct even if the Database Management
System (DBMS) is changed. Each table represents an object, event, or
concept in the real world that is selected to be represented in the
database. Each attribute of a table is a defining property or quality
of the table. All of the table's attributes are shown in the ERD. The
underlined attributes represent the table's primary key [A primary
key is an attribute or a combination of attributes that contains an
unique value to identify that record.].
Note that Figure 1 shows the main tables only. Note that most
tables contain foreign keys [A foreign key is an attribute or a
combination of attributes that is a primary key in another table.].
These foreign keys define the relationships between the tables in
regional_ce.
The regional_ce database is implemented as a relational
database; i.e., each table is a special case of the mathematical
construct known as a relation and hence elementary relation theory is
used to deal with the data within tables and the relationships
between them. There are three types of relationships possible between
tables, but only one exists in regional_ce: one-to-many [A
one-to-many relationship is where one record (the parent) in a table
relates to one or many records (the child) in another table; e.g.,
one fishing event in t_effort can have many catches in t_catch
but one catch can only come from one fishing event. ]. These
relationships can be seen in ERDs by connecting a single line
(indicating "many") from the child table; e.g., t_catch,
to the parent table; e.g., t_effort, with an arrowhead
(indicating "one") pointing to the parent.
Every relationship has a mandatory or optional aspect to it. If a
relationship is mandatory, then it has to occur at least once, while
an optional relationship might not occur at all. For example, in
Figure 1, consider that relationship between the table t_effort
and it's child table t_catch. The symbol "O" by the
child t_catch means that a fishing event can have zero or many
catch records, while the bar by the parent t_effort means that
for every catch there must be a matching fishing event record.
These links are enforced by referential constraints [Also known as
integrity checks.]. Constraints do not allow orphans to exist in any
table; i.e., where a child record exists without a related parent
record. This may happen when: a parent record is deleted; the parent
record is altered so the relationship is lost; or a child record is
entered without a parent record.
Constraints are shown in the table listings by the following
format:
|
Referential:
|
constraint name (attribute[, attribute])
|
|INSERT|
|DELETE|
|
|
|
parent table (attribute[, attribute])
|
|
Note that the typographical convention for the above format is
that square brackets "[ ]" may contain more than one item
or none at all. Items stacked between vertical lines | | are options
of which one must be chosen.
For example, consider the following constraint found in the table
t_catch:
|
Referential:
|
Invalid trip stn ver (trip, station_no, version_no)
INSERT t_effort (trip, station_no, version_no)
|
This means that the value of the combination of attributes trip,
station_no and version_no in the current record must
already exist in the parent table t_effort or the record will
be rejected and the following message will be displayed:
*** User Error: insert constraint "Invalid trip stn ver"
violation
Section 5 lists all the regional_ce tables as implemented
by the Empress RDBMS. As can be seen in the listing of the tables, a
table's primary key has an unique index on it. Primary keys are
generally listed using the following format:
|
Indices:
|
UNIQUE index_name ON (attribute[, attribute])
|
where attribute(s) make up the primary key and the index name is
the primary key name. These prevent records with duplicate keys from
being inserted into the tables; e.g., a record with an existing event
key.
The database listing (Tables 1-8) show that the tables also have
indices on many attributes. That is, attributes that are most likely
to be used as a searching key have like values linked together to
speed up searches. These indices are listed using the following
format:
|
Indices:
|
NORMAL (2, 15) index_name ON (attribute[, attribute])
|
Note that indices may be simple, pointing to one attribute or
composite pointing to more than one attribute. The numbers "...(2,
15)..." in the syntax are Empress RDBMS default values relating
to the amount of space allocated for the index.
3.2 Database design
The design of the regional_ce database incorporates table
structures and field names to allow data extracts to be compatible
with data extracted from the Ministry of Fisheries Catch Effort
warehou database. Each vessel is identified by the same unique
vessel_key attribute, as would be used in the warehou
database. The same concept of a trip is used, with the start and end
dates of trips being provided by fishers, to define a trip. The trip
attribute is assigned a "trip key" to link all records in
the various tables for a single trip. All the main tables have a
version_no field, to allow for meeting any requirements to
update existing data supplied by fishers, although at the time of
writing, only one version exists for all forms entered into the
database.
The data from both the South Pacific Regional Purse-seine fishery
and the CCAMLR electronic logbooks are submitted on a trip basis. The
table t_trip stores all the constant information for a trip in
one record, these details can be linked with various other tables in
the regional_ce database for a given trip, using the trip
(trip number) attribute. For the CCAMLR data set the trip number
assigned by the observer programme has been used. The South Pacific
Regional Purse-seine data do not have distinctive trip numbers
associated with the individual trips received, therefore a system
assigned number (started from 10 001) is issued in sequence of the
data being loaded.
Each Purse-seine form contains header information that is constant
on a trip basis, (repeated on each page used), except the "fishing
permit or licence number" which will change if more than one
permit fished within a trip, therefore all the header data is stored
in the t_trip table, except for the "fishing permit"
that is stored in a separate t_permits table. The permit
number is linked by the trip attribute and to the catch, effort
records by a system assigned "group number", stored in the
grp_no attribute. The grp_no represents a group of
stations recorded on a form, covered by the particular "permit
number", as held in the permit_no attribute of the
t_permits table. In this context, a group can consist of one
or more pages, as labelled page _of_ on the paper
forms.
The table t_trip_comment is used to store any
remarks or comments relating to the overall trip.
The second main table is t_effort. Details stored include:
dates, times, and location of fishing events, the fishing method used
and various physical parameters about the gear used (e.g., no of
hooks used, length of line, etc.). The Purse-seine logsheet includes
an activity code that indicates either a fishing set was made, or
defines the main activity for the day, if no fishing sets have been
made. When the activity (stored in the activity attribute in
t_effort), shows a fishing set has not been made by the vessel
during that day, the position recorded should be the latitude and
longitude at 01:00 UTC on the South Pacific Regional Purse-seine
logsheet (form type = PUR) or noon for Kiribati waters (form type =
KIR). Because t_effort has to handle several fishing types,
the meaning of various t_effort attributes may change
depending on the form type and fishing method used. In a number of
cases in the t_effort table, attributes will only be populated
for one method, e.g. num_hooks is only populated for the
Bottom Longline data set. Further to this, some Bottom Longline
effort data were collected for restricted periods only, e.g. Set
direction for both the start and end of set was collected for a
limited numbers of trips only (one season), most trips do not have an
end set direction recorded. A method can be reported on several form
types, hence the form_type is stored in the database, as is
the case for the Purse-seine method.
Details of the catch by species, recorded by the vessel, are
stored in the t_catch table. Details include species code, a
catch weight and the number caught where this is provided. The status
attribute is used to indicate the destination type of the fish
recorded on this record for the set. For Purse-seine catch the
discard code (stored in the attribute status), incorporates the
reason for the discard, this is coded 1, 2, 3, 4 or 5 as defined in
the appendix.
The species codes used in the CCAMLR catch data as supplied differ
from the standard NZ species codes used by the Ministry of Fisheries
and NIWA. For instance the CCAMLR code for Patagonian toothfish
(Dissostichus eleginoides) is TOP, while the NZ code is PTO and TOP
is the NZ code for Pale Toadfish (Ambophthalmos angustus). The
original CCAMLR species code is retained in the t_catch table
in the sp_ccamlr attribute.
Landing details are stored in the table t_landings (Table
4). These details come from the South Pacific Regional Purse-seine
PUR and KIR form types. Details include species, and landed weight.
Landings for a single species may be split, for example between
several canneries or destinations, therefore a system generated
number is assigned to uniquely identify each landing record, this key
is stored in the attribute land_key. The CCAMLR electronic
logbook data for NZ bottom longline vessels does not include any
landings information.