3 Data Structures
3.1 Table Relationships
This database encompasses several tables. The ERD for catch sampling schema within rlcs (Figure 1) shows the physical data model structure5 of the database and it’s entities (each entity is implemented as a database table) and relationships between these tables. Each table represents an object, event, or concept in the real world that has been represented in the database. Each attribute of a table is a defining property or quality of the table.
All of the table’s attributes are shown in the ER D. The underlined attributes represent the table’s primary key6. This schema is valid regardless of the database system chosen, and it can remain correct even if the Database Management System (DBMS) is changed. Most of the tables in the rlcs database also contain special attributes, called foreign keys7.
Figure 1: Entity Relationship Diagram (ERD) for the catch sampling schema within the rlcs database
Section 5 shows a listing of all the rlcs tables as implemented by the Empress DBMS. As can be seen in the listing of the tables, a table’s primar y key has an unique index on it. Primary keys are generally listed using the format:
Indices: UNIQUE index_name ON (attribute [, attributes])
where the attribute(s) make up the primary key (the key attributes) and the index name is the primary key name. Note that the typographical convention for the above format is that square brackets [] may contain more than one item or none at all. A unique index prevents records with duplicate key values from being inserted into the table; e.g., a sample with an existing sample number. Note that the table t_crayfish has no primary key.
The rlcs database is implemented as a relational database. That is, each table is a special case of the mathematical construct known as a relation and hence elementary relation theory is used to deal with the data within tables and the relationships between them. All relationships in rlcs are of the type one-to-many8. This is shown in the ERD by connecting a single line (indicating ‘many’) from the child table; e.g., t_effort, to the parent table; e.g., t_sample, with an arrowhead (indicating ‘one’) pointing to the parent. For example, consider the relationship between the tables’ t_sample (the parent table) and t_effort (the child table). Any one trip in t_sample can have one or more stations in t_effort, but any one station can only be a part of one trip. Note that the word ‘many’ applies to the possible number of records another is associated with. For a given instance, there might be zero, one, two, or more associated records, but if it is ever possible to have more than one, we use the word ‘many’ to describe the association.
Note that the one-to-many relationships can be either mandatory or optional. The optional relationship, denoted in the ERD by the symbol “O” at one or both ends of the relationship line, means that a record does not have to have any associated records. Conversely, the mandatory relationship denoted in the
ERD by a bar symbol across the relationship line, means that a record has to have at least one associated record. For example, if we consider again the one-to-many relationship between the tables t_sample and t_effort, which has a mandatory ‘one’ and an optional ‘many’. This means that one sample record can have zero or more (many) unit efforts; e.g., pot lift, recorded in the effort tables, but one unit effort in the effort table must have one, and only one, associated record in the sample table.
These relationships are enforced in the database by the use of referential constraints9. Constraints do not allow orphans to exist in any table; i.e., where a child record exists without a related parent record. This may happen when:
- a parent record is deleted;
- the parent record is altered so the relationship is lost;
- or a child record is entered without a parent record.
All constraints in rlcs prevent the latter from occurring. Constraints are shown in the table listings by the following format:
| Referential: |
constraint_name (attribute[, attribute]) |
INSERT |
|
|
DELETE |
|
Parent_table (attribute[, attribute]) |
|
where a constraint can either be INSERT (prevents (iii) and (ii) from occurring) or DELETE (prevents (i) or (ii) from occurring). For example, consider the following constraint found in the table
data_sets:
For example, consider the following constraint found in the table t_effort:
Referential: cra pots loaded after cover (sample_no) INSERT t_sample (sample_no)
This means that the value of the attribute sample_no in the current record must already exist in the parent table t_sample or the record will be rejected and the following error message will be displayed:
*** User Error: insert constraint ‘pot catch loaded after cover’ violation
All tables in this database are indexed. That is, attributes that are most likely to be used as a searching key have like values linked together to speed up searches. These indices are listed using the following format:
Indices: NORMAL (2, 15) index_name ON (attribute [, attributes])
Note that indices may be simple, pointing to one attribute, or composite pointing to more than one attribute. The numbers “…(2, 15)…” in th e syntax are Empress DBMS default values relating to the amount of space allocated to index storage.
3.2 Database Design
3.2.1 Rock lobster Sampling Base Tables
Conceptually, any rock lobster sampling program will involve at most 3 tables: a sample (t_sample, Table 1), which may or may not record effort data ( t_effort, Table 2), and generates data on individual rock lobster ( t_crayfish, Table 3). The attribute sample_type is included in each of these three tables to record which sampling program each record. At a physical implementation level, a fourth table, t_bait_codes (Table 4), is added to store all valid bait codes.
The ERD for these four base tables is shown in Figure 1.
Each type of sampling program can be accessed through views that filter the records for specific sample types and show attributes that are only relevant to that program.
Note that t_sample is only in a first normal form because the vessel name attribute vessel is not fully functionally dependent on the primary key sample_no. It is fully functionally dependent on vessel_no and should be in a vessel table with vessel_no as it’s primary key. However, in some instances only the vessel name is available or multiple vessels might be involved in a sample. It has therefore been decided to leave this table in a first normal form, realising that such a form can lead to update, insertion, and deletion anomalies. Regular data audits have been implemented to compensate for this.
3.2.2 Red Rock Lobster Catch Sampling Program
The ERD for this program is shown in Figure 2. The top-level view is v_lob_sample (View 1). This holds records for every catch sample taken, with each record denoted by the primary key sample_no. Other attributes in this table describe the who/what/where of the sample and the prevailing weather. The actual landed weight, if available, and the numbers of red rock lobsters landed to a LFR are also recorded. The number of red rock lobsters landed, as stored in the attribute keepers, can be obtained from the sum of all red rock lobsters with status equal to ‘1’ in view v_cra_lf only when the whole catch has been measured. As this is not always possible, a separate tally is kept to include those keepers from pots not measured.
From the 1993/94 fishing season, closed fishing seasons were introduced in some areas as part of a regional fisheries management plan. While normal fishing operations were banned for certain months of the year, catch sampling was permitted. The attribute season allows samples to be identified as being from an open or closed season. Note that season is an attribute common to v_lob_sample, v_cra_catch and v_cra_lf. In a third normal form database, this attribute should only be in v_lob_sample, as season is functionally dependent on only on sample_no. However, season was added to v_cra_catch and v_cra_lf to allow views of solely closed season data to be constructed on them.
The standard unit of effort is one pot lift and the effort details for CRA program are stored in the view v_cra_catch (View 2). For every catch sample, each pot lifted onboard is allocated a sequential pot (or set) number denoted by the attribute pot_no. When combined with the attribute sample_no this provides the primary key for v_cra_catch. For this catch sampling program, all rock lobsters caught in each pot are recorded in the attribute caught. If both red and packhorse rock lobsters are caught in one pot, caught represents the total of both species, and the attribute species is set to “RLM”. These effort tables all contain the attribute bait, which is a foreign key to link them to the table t_bait_codes. This table contains all bait codes used and their definitions.
An effort is made to measure all rock lobsters caught. These measurements result in records in the view v_cra_lf (View 3). Every rock lobster in any given pot is measured; however, not all pots lifted may be measured. The view v_cra_lf contains, as attributes, the primary keys for v_cra_catch, which enables each rock lobster that has been measured to be linked back to the pot from which it was caught. However, v_cra_lf does not have a unique primary key, as it is possible that any one pot may contain several rock lobsters with identical measurements.
Figure 2: ERD of the views for the red rock lobster catch sampling program.
3.2.3 Red Rock Lobster Market/Morphology Sampling
Since 1994 there has also been samples to exclusively collect morphology data including weight, tail width, and carapace length from individual red rock lobsters. Examples of this type of data are length and weight data collected either onboard a vessel or in a packing shed (as market samples). The ERD for this program is shown in Figure 3. Details for each morphometric/market sample are in the view v_morp_sample (View 4). Each sample should be from one fisher’s catch for one day of fishing. However, during market sampling, this has not always been possible, as catches may have been mixed while in the shed’s holding tanks.
The morphometric data for individual red rock lobsters are recorded in the view v_cra_wt (View 5).
Figure 3: ERD for the views of the market/morphology sampling program.
Figure 4: ERD of the views of the packhorse rock lobster catch sampling program.
3.2.4 Packhorse Rock Lobster Catch Sampling Program
During 1991/92, a special permit was granted to a commercial fisher to run a exploratory survey for packhorse rock lobsters. Observers in conjunction with this survey carried out a catch sampling program. The ERD for this program is shown in Figure 4.
The top-level view is v_phc_sample (View 6). This holds records for every catch sample taken, with each record denoted by the primary key sample_no.
Individual pot lift details were recorded for the first few samples of the packhorse rock lobster exploratory survey, but in subsequent samples the unit of effort was changed to a set of pot lifts, with 5-20 pot lifts per set. These effort data are stored in the view v_phc_catch (View 7) with the attribute set_no assigned sequentially to each set lifted and the attribute num_of_pots detailing the number of pots lifted per set. Where a record concerns a single pot lift the num_of_pots will be set to 1. For every catch sample, each pot (or set) lifted on board is allocated a sequential pot (or set) number denoted by the attribute set_no. When combined with the attribute sample_no this provides the primary key for v_phc_catch. The numbers of packhorse and red rock lobsters caught in each pot/set are stored in the attributes phc_caught and cra_caught respectively. These effort tables all contain the attribute bait, which is a foreign key to link them to the table t_bait_codes. This table contains all bait codes used and their definitions.
An effort is made to measure all packhorse rock lobsters caught. These measurements result in records in the table v_phc_lf (View 8). Every packhorse rock lobster in any given pot is measured; however, not all pots lifted may be measured.
The view v_phc_lf contains, as attributes, the primary keys for v_phc_catch so that each packhorse rock lobster measured can be linked back to the pot/set in which it was caught. However, v_phc_lf does not have a unique primary key, as it is possible that any one pot/set may contain several packhorse rock lobsters with identical measurements.
3.2.5 Minimum Legal Size (MLS) Investigation Sampling
There are two views, v_mls_sample (View 9) and v_mls_data (View 10), pertaining to historical (circa 1986) red rock lobster minimum legal size (MLS) investigation sampling. Details for each MLS sample are in v_mls_sample. The morphometric data for individual red rock lobsters are recorded in v_mls_data.
The ERD for this program is shown in Figure 5.
Figure 5: ERD of the views of the MLS Investigation sampling program.
3.2.6 Red Rock Lobster Tagging Sampling
This database can be linked to the tagging database tag via the table t_releases (attribute trip_code) in tag10 and the view v_tag_sample (View 11) (attribute sample_no) in rlcs. If the pot number has been recorded on the tagging form, then joining can also be done on station_code and pot_no in the respective databases. This would only be useful for getting general information on the day’s fishing, or information relating to a pot.
Figure 6: ERD of the rock lobster tagging program.
4 See local Unix manual page on checkq
5 Also known as database schema
6 A primary key is an attribute or a combination of attributes that contains an unique value to identify that record.
7 A foreign key is any attribute, or a combination of attributes, in a table that is a primary key of another table. Tables are linked together through foreign keys.
8 A one-to-many relationship is where one record (the parent) in a table relates to one or many records (the child) in another table; e.g., one survey in t_sample can have many stations in t_effort but any one station can only come from one survey.
9Also known as integrity checks.
10 See WOOD, B. 1993: Marine Research database documentation. 10. Tag. MAF Fisheries Greta Point Internal Report No. 216. 13p.