3 Data Structures

3.1 Table relationships

This database contains several tables. The ERD for pseine (Figure 1) shows the logical structure [Also known as a database schema.] of the database and it's entities (each entity is implemented as a database table) and relationships between these tables and tables in other databases. This schema is valid regardless of the database system chosen, and it can remain correct even if the Database Management System (DBMS) is changed. Each table represents an object, event, or concept in the real world that is selected to be represented in the database. Each attribute of a table is a defining property or quality of the table.

All of the table's attributes are shown in the ERD. The underlined attributes represent the table's primary key [A primary key is an attribute or a combination of attributes that contains a unique value to identify that record.].

Note that Figure 1 shows the main tables only. Note that most tables contain foreign keys [A foreign key is an attribute or a combination of attributes that is a primary key in another table.]. These foreign keys define the relationships between the tables in pseine.

The pseine database is implemented as a relational database; i.e., each table is a special case of the mathematical construct known as a relation and hence elementary relation theory is used to deal with the data within tables and the relationships between them. There are three types of relationships possible between tables, but only one exists in pseine: one-to-many [A one-to-many relationship is where one record (the parent) in a table relates to one or many records (the child) in another table; e.g., one unit of effort in effort can have many catches in catch but one catch can only come from one effort.]. These relationships can be seen in ERDs by connecting a single line (indicating "many") from the child table; e.g., catch, to the parent table; e.g., effort, with an arrowhead (indicating "one") pointing to the parent. Note that the word "many" applies to the possible number of records in one table that one record in another table is associated with. For a given instance, there might be zero, one, two, or more associated records, but if it is ever possible to have more than one, we use the word "many" to describe the association.

Every relationship has a mandatory or optional aspect to it. If a relationship is mandatory, then it has to occur at least once, while an optional relationship might not occur at all. For example, in Figure 1, consider that relationship between the table effort and it's child table catch. The symbol "O" by the child catch means that a unit effort can have zero or many catch records, while the bar by the parent effort means that for every catch there must be a matching effort record. So it is possible for a unit of effort to produce ;No ;catch.

Note that the word "many" applies to the possible number of records another is associated with. For a given instance, there might be zero, one, two, or more associated records, but if it is ever possible to have more than one, we use the word "many" to describe the association.

These links are enforced by referential constraints [Also known as integrity checks.]. Constraints do not allow orphans to exist in any table; i.e., where a child record exists without a related parent record. This may happen when: a parent record is deleted; the parent record is altered so the relationship is lost; or a child record is entered without a parent record.

Constraints are shown in the table listings by the following format:

Referential:

constraint name (attribute[, attribute])

|INSERT|

|DELETE|


parent table (attribute[, attribute])




Note that the typographical convention for the above format is that square brackets "[ ]" may contain more than one item or none at all. Items stacked between vertical lines | | are options of which one must be chosen.

For example, consider the following constraint found in the table catch:

Referential: ;No ;such effort id (effort_id) INSERT effort (effort_id)



This means that the value of the attribute effort_id in the current record must already exist in the parent table effort or the record will be rejected and the following message will be displayed:



*** User Error: insert constraint ";No ;such effort id" violation



For tables residing in external databases, the parent table name will be prefixed by the name of the database.

Section 5 lists all the pseine tables as implemented by the Empress RDBMS. As can be seen in the listing of the tables, a table's primary key has an unique index on it. Primary keys are generally listed using the following format:

Indices: ;UNIQUE index_name ON (attribute[, attribute])



where attribute(s) make up the primary key and the index name is the primary key name. These prevent records with duplicate keys from being inserted into the tables; e.g., a record with an existing effort identifier.

The database listing (Tables 1-5) show that the tables also have indices on many attributes. That is, attributes that are most likely to be used as a searching key have like values linked together to speed up searches. These indices are listed using the following format:

Indices: ;NORMAL (2, 15) index_name ON (attribute[, attribute])



Note that indices may be simple, pointing to one attribute or composite pointing to more than one attribute. The numbers "...(2, 15)..." in the syntax are Empress RDBMS default values relating to the amount of space allocated for the index.


3.2 Database design

Purse seine fishery data is source from the Ministry of Fisheries Catch and Effort database. Raw data are loaded into 3 main tables where further data grooming takes place. These tables are based on three basic entities which result from the Catch Effort Landing Return (CELR) form that fishers are required to complete at the end of each days fishing; i.e., fishing effort, catch, and landing.

The first of these tables is effort (Table 1). An effort record is the details of a number of purse seine shots taken within a 24-hour period. Each record is identified by a primary key effort_id. Fishing vessels are identified with a numeric code as provided by Ministry of Fisheries. Details stored within effort include: date, position or statistical area code, total search time during the day, the number of purse seine shots per day, total time spent fishing, target species, and total catch weight of all species combined.

Next, the table catch (Table 2), simply records the greenweight of each species caught by each effort record (as identified by effort_id).

At the end of a fishing trip, the catch is landing to a Licensed Fish Receiver. These LFR landing records are stored in the table landing (Table 3). Each landing record is identified by the vessel id and landing date. Each landing record contains the greenweight from a fishing trip for each quota code (a combination of species code and QMA code) and processed state.

It should be reiterated that these data in these tables are raw data loaded directly from the Ministry of Fisheries database. As such many of them contain inconsistencies, invalid fields, and duplicate records. These three tables are useful in that they allow researchers to go back to the raw data for a repeat analysis or a reinterpretation of values.

The final table catch_eff (Table 4) is a de-normalised amalgam of the first three tables (effort, catch, and landing) and contain cleaned and groomed data. As only the table catch_eff stores cleaned data, it is the only table to have relationships to the code tables area_codes and species_master in the rdb database (Figure 2). One catch_eff record represents one unit of effort (as per the effort table). Each record is identified by a combination of vessel identifier and catch date (not an arbitrary effort identifier as in the effort table). Details recorded include: catch date, position or statistical area code, total search time during the day, the number of purse seine shots per day, total time spend fishing during the day, target species, and total catch weight. For each day of purse seine effort, the top five species by catch weight have been calculated from the catch table and are stored into the attributes species1 to species5 and catch1 to catch5. As a result of kahawai being the main focus of this database, the actual kahawai catch landed for each unit of effort is recorded in the attribute kah_landing.



Updated : 16 November 2007