3 Data Structures

Initially, plankton surveys were included as part of the trawl database. However, it soon became apparent that the trawl data structures were not suitable for plankton catch data.

Therefore, a separate database, plankton, was created that, at the conceptual level, is very similar to trawl [Mackay, K. 1998: Marine Research database documentation. 6. trawl. NIWA Greta Point Internal Report No. 6. 35p.], but has attributes specific to plankton catches. This has the advantage of allowing easy access to related trawl survey data.

3.1 Table Relationships

This database contains several tables. The ERD for plankton (Figure 1) shows the physical data model structure [Also known as a database schema.] of the database and its entities (each entity is implemented as a database table) and relationships between these tables. Each table represents an object, event, or concept in the real world that has been represented in the database. Each attribute of a table is a defining property or quality of the table.

All of the table's attributes are shown in the ERD. The underlined attributes represent the table's primary key [A primary key is an attribute or a combination of attributes that contains an unqiue value to identify that record.]. This schema is valid regardless of the database system chosen, and it can remain correct even if the Database Management System (DBMS) is changed.

Note that Figure 1 shows the main tables only. Some of the tables in the plankton database have attributes, called foreign keys [A foreign key is any attribute, or a combination of attributes, in a table that is a primary key of another table. Tables are linked together through foreign keys.], which contain standard NIWA fisheries codes, such as species. These attributes provide links to tables in plankton and the rdb (research database) database, which contains the definitive list of standard codes. An expanded ERD for these tables is shown in Figure 2.

Section 5 shows a listing of all the plankton tables as implemented by the Empress DBMS. As can be seen in the listing of the tables, a table's primary key has an unique index on it. Primary keys are generally listed using the format:

Indices: ;UNIQUE index_name ON ( attribute [ , attributes ])

where the attribute(s) make up the primary key and the index name is the primary key name. Note that the typographical convention for the above (and subsequent) format is the square brackets [ ] may contain an item that is repeated zero or more times.

This unique index prevents records with duplicate key values from being inserted into the table, e.g., a new trip with an existing trip code, and hence ensures that every record can be uniquely identified.

The plankton database is implemented as a relational database. That is, each table is a special case of a mathematical construct known as a relation and hence elementary relation theory is used to deal with the data within tables and their relationships between them. All relationships in plankton are of the type one-to-many [ A one-to-many relationship is where one record in a table (the parent) relates to one or many records in another table (the child).]. This is shown in the ERD by connecting a single line (indicating "many") from the child table (e.g., t_station) to the parent table (e.g., t_trip) with an arrowhead (indicating "one") pointing to the parent.

Every relationship has a mandatory or optional aspect to it. That is, if a relationship is mandatory, then it has to occur and least once, while an optional relationship might not occur at all. For example, in Figure 1, consider that relationship between the table t_trip and it's child table t_trip_comm. The symbol "O" by the child t_trip_comm means that a fish can have zero or many trip comments, while the bar by the parent t_trip means that for every trip comment there must be a matching trip.

Most of the tables in plankton contain foreign keys, which link these tables to each other and to tables in the rdb database (Figure 1 and Figure 2). The majority of these links are enforced by referential constraints [Also known as integrity checks.]. Constraints do not allow orphans to exist in any table, i.e., where a child record exists without a related parent record. This may happen when: a parent record is deleted; the parent record is altered so that the relationship is lost; or a child record is entered without a parent record. Constraints are shown in the table listings by the following format:

Referential: ;error message ( attribute[, attribute]) INSERT parent table ( attribute[, attribute])

For example, consider the following constraint found in the table t_station:

Referential: ;invalid trip code (trip_code) INSERT t_trip (trip_code)

This means that the value of the attribute trip_code of a record upon insert into t_station must already exist in the parent table t_trip or the record will be rejected and an error message will be displayed.

All tables in this database are indexed. That is, attributes that are most likely to be used as a searching key have like values linked together to speed up searches. These indices are listed using the following format:

Indices: ;NORMAL (2, 15) index_name ON ( attribute[, attribute])

Note that indices may be simple, pointing to one attribute or composite pointing to more than one attribute. The numbers "...(2, 15)..." in the syntax are Empress DBMS default values relating to the amount of space allocated for the index.

3.2 Database design

As reflected by the ERD, the highest level of a plankton survey is a research trip. Details for each trip are held in the table t_trip (Table 1). Each trip is uniquely identified by a trip code, stored as the attribute trip_code.

Note that the comments for a trip are held in a separate table t_trip_comm (Table 2). This means that one trip may have zero, one, or more than one comment associated with it. It can be argued that there is a one-to-one relationship between t_trip and t_trip_comm.

After all, all comments to be made about one trip can be made in one comment. However, the trip comments have been separated from the trip details to two reasons:

  1. Comments can be recorded at any time during a trip life-cycle. Rather than adding comments to those already recorded, it is easier just to create a new record.

  2. To optimise query times, attributes with long field sizes, such as comments, are placed in separate tables to avoid being hit during tables scans for a regular expression.

For stratified plankton surveys, stratum details, such as stratum code and area (in square kilometres) are stored in the table t_stratum (Table 3). These strata may not necessarily be the same as those used for trawling during the same trip (as recorded in the trawl database). Notice that there is an optional link from t_trip to t_stratum; this means that not all trips have to have strata, i.e., unstratified plankton surveys.

Any one trip also relates to many stations. This is a mandatory relationship: a trip has to have at least one station before it can be entered into the database. Generally, a station is the location at which the plankton gear was deployed. Details for the station, such as shot start and finish location, time, depth, gear performance and environment parameters are stored in the table t_station (Table 4).

This table differs from the standard station table in the trawl database because it allows for up to 3 positions to be recorded (shot, start, and finish) for a station: the shot time and position is taken where the gear is first deployed; the start time and position is taken where the gear has reached the target depth and has started to be hauled; and the finish time and position is taken when the net reaches the sea surface. This is illustrated in Figure 3.


Figure 3: Origin of the three sets of time and position for a station.

In some instances, the shot and start positions of the gear are one in the same thing.

However, usually a gear does not start to collect samples until is has reached it's target depth and opened by a remote trigger, or when it get forward motion from when then wire has stopped being paid out and the vessel moves forward and/or the wire has started to be hauled in.

Note that a station may or may not occur within a pre-defined stratum (the table t_station contains the attribute stratum) and that one stratum may or may not contain stations.

Therefore, there is a two-way optional many-to-one relationship between t_station and t_stratum.

Like the table t_trip, t_station has its own comments table t_stat_comm (Table 5). The same arguments that have been used for the creation of the t_trip_comm table also apply here.

For plankton stations prior to the introduction of net sondes and depth-meters, the depth and path of the plankton gear was estimated from the angle of the gear wires at certain intervals or wire length. These wire angle data are stored in the table t_wire_angle (Table 6), and are linked to t_station by the foreign keys trip_code and station_no.

Several types of plankton stations involve multiple units of gear, each with their own mesh size and wire winches. In such cases, details of these gears are recorded in the table t_gear_unit (Table 7), and are linked to t_station by the foreign keys trip_code and station_no. The default for a single gear unit is for all data to be recorded in t_station.

Samples of fish eggs are taken from each tow and are staged under a microscope (by the number of cell divisions that have taken place). The table t_eggs (Table 8) records the counts of the number of eggs for each stage as collected for each sample from each catch.

Samples of the total catch from the plankton tow are also sampled to determine the relative abundance of species. These abundance data are recorded in the table t_abund (Table 9). This includes abundance of fish, salps, zooplankton and microzooplankton.



Updated : 16 November 2007