3 Data Structures
Initially, plankton surveys were included as part of the trawl
database. However, it soon became apparent that the trawl data
structures were not suitable for plankton catch data.
Therefore, a separate database, plankton, was created that, at the
conceptual level, is very similar to trawl [Mackay, K. 1998: Marine
Research database documentation. 6. trawl. NIWA Greta Point Internal
Report No. 6. 35p.], but has attributes specific to plankton catches.
This has the advantage of allowing easy access to related trawl
survey data.
3.1 Table Relationships
This database contains several tables. The ERD for plankton
(Figure 1) shows the physical data model structure [Also known as a
database schema.] of the database and its entities (each entity is
implemented as a database table) and relationships between these
tables. Each table represents an object, event, or concept in the
real world that has been represented in the database. Each attribute
of a table is a defining property or quality of the table.
All of the table's attributes are shown in the ERD. The underlined
attributes represent the table's primary key [A primary key is an
attribute or a combination of attributes that contains an unqiue
value to identify that record.]. This schema is valid regardless of
the database system chosen, and it can remain correct even if the
Database Management System (DBMS) is changed.
Note that Figure 1 shows the main tables only. Some of the tables
in the plankton database have attributes, called foreign keys [A
foreign key is any attribute, or a combination of attributes, in a
table that is a primary key of another table. Tables are linked
together through foreign keys.], which contain standard NIWA
fisheries codes, such as species. These attributes provide links to
tables in plankton and the rdb (research database) database, which
contains the definitive list of standard codes. An expanded ERD for
these tables is shown in Figure 2.
Section 5 shows a listing of all the plankton tables as
implemented by the Empress DBMS. As can be seen in the listing of the
tables, a table's primary key has an unique index on it. Primary keys
are generally listed using the format:
Indices: ;UNIQUE index_name ON ( attribute [ , attributes
])
where the attribute(s) make up the primary key and the index name
is the primary key name. Note that the typographical convention for
the above (and subsequent) format is the square brackets [ ] may
contain an item that is repeated zero or more times.
This unique index prevents records with duplicate key values from
being inserted into the table, e.g., a new trip with an existing trip
code, and hence ensures that every record can be uniquely identified.
The plankton database is implemented as a relational database.
That is, each table is a special case of a mathematical construct
known as a relation and hence elementary relation theory is used to
deal with the data within tables and their relationships between
them. All relationships in plankton are of the type one-to-many [ A
one-to-many relationship is where one record in a table (the parent)
relates to one or many records in another table (the child).]. This
is shown in the ERD by connecting a single line (indicating "many")
from the child table (e.g., t_station) to the parent table (e.g.,
t_trip) with an arrowhead (indicating "one") pointing to
the parent.
Every relationship has a mandatory or optional aspect to it. That
is, if a relationship is mandatory, then it has to occur and least
once, while an optional relationship might not occur at all. For
example, in Figure 1, consider that relationship between the table
t_trip and it's child table t_trip_comm. The symbol "O" by
the child t_trip_comm means that a fish can have zero or many trip
comments, while the bar by the parent t_trip means that for every
trip comment there must be a matching trip.
Most of the tables in plankton contain foreign keys, which link
these tables to each other and to tables in the rdb database (Figure
1 and Figure 2). The majority of these links are enforced by
referential constraints [Also known as integrity checks.].
Constraints do not allow orphans to exist in any table, i.e., where a
child record exists without a related parent record. This may happen
when: a parent record is deleted; the parent record is altered so
that the relationship is lost; or a child record is entered without a
parent record. Constraints are shown in the table listings by the
following format:
Referential: ;error message ( attribute[, attribute])
INSERT parent table ( attribute[, attribute])
For example, consider the following constraint found in the table
t_station:
Referential: ;invalid trip code (trip_code) INSERT t_trip
(trip_code)
This means that the value of the attribute trip_code of a record
upon insert into t_station must already exist in the parent table
t_trip or the record will be rejected and an error message will be
displayed.
All tables in this database are indexed. That is, attributes that
are most likely to be used as a searching key have like values linked
together to speed up searches. These indices are listed using the
following format:
Indices: ;NORMAL (2, 15) index_name ON ( attribute[,
attribute])
Note that indices may be simple, pointing to one attribute or
composite pointing to more than one attribute. The numbers "...(2,
15)..." in the syntax are Empress DBMS default values relating
to the amount of space allocated for the index.
3.2 Database design
As reflected by the ERD, the highest level of a plankton survey is
a research trip. Details for each trip are held in the table t_trip
(Table 1). Each trip is uniquely identified by a trip code, stored as
the attribute trip_code.
Note that the comments for a trip are held in a separate table
t_trip_comm (Table 2). This means that one trip may have zero, one,
or more than one comment associated with it. It can be argued that
there is a one-to-one relationship between t_trip and t_trip_comm.
After all, all comments to be made about one trip can be made in
one comment. However, the trip comments have been separated from the
trip details to two reasons:
Comments can be recorded at any time
during a trip life-cycle. Rather than adding comments to those
already recorded, it is easier just to create a new record.
To optimise query times, attributes with
long field sizes, such as comments, are placed in separate tables to
avoid being hit during tables scans for a regular expression.
For stratified plankton surveys, stratum details, such as stratum
code and area (in square kilometres) are stored in the table
t_stratum (Table 3). These strata may not necessarily be the same as
those used for trawling during the same trip (as recorded in the
trawl database). Notice that there is an optional link from t_trip to
t_stratum; this means that not all trips have to have strata, i.e.,
unstratified plankton surveys.
Any one trip also relates to many stations. This is a mandatory
relationship: a trip has to have at least one station before it can
be entered into the database. Generally, a station is the location at
which the plankton gear was deployed. Details for the station, such
as shot start and finish location, time, depth, gear performance and
environment parameters are stored in the table t_station (Table 4).
This table differs from the standard station table in the trawl
database because it allows for up to 3 positions to be recorded
(shot, start, and finish) for a station: the shot time and position
is taken where the gear is first deployed; the start time and
position is taken where the gear has reached the target depth and has
started to be hauled; and the finish time and position is taken when
the net reaches the sea surface. This is illustrated in Figure 3.
Figure 3: Origin of the three sets of time and
position for a station.
In some instances, the shot and start positions of the gear are
one in the same thing.
However, usually a gear does not start to collect samples until is
has reached it's target depth and opened by a remote trigger, or when
it get forward motion from when then wire has stopped being paid out
and the vessel moves forward and/or the wire has started to be hauled
in.
Note that a station may or may not occur within a pre-defined
stratum (the table t_station contains the attribute stratum) and that
one stratum may or may not contain stations.
Therefore, there is a two-way optional many-to-one relationship
between t_station and t_stratum.
Like the table t_trip, t_station has its own comments table
t_stat_comm (Table 5). The same arguments that have been used for the
creation of the t_trip_comm table also apply here.
For plankton stations prior to the
introduction of net sondes and depth-meters, the depth and path of
the plankton gear was estimated from the angle of the gear wires at
certain intervals or wire length. These wire angle data are stored in
the table t_wire_angle (Table 6), and are linked to t_station by the
foreign keys trip_code and station_no.
Several types of plankton stations involve multiple units of gear,
each with their own mesh size and wire winches. In such cases,
details of these gears are recorded in the table t_gear_unit (Table
7), and are linked to t_station by the foreign keys trip_code and
station_no. The default for a single gear unit is for all data to be
recorded in t_station.
Samples of fish eggs are taken from each tow and are staged under
a microscope (by the number of cell divisions that have taken place).
The table t_eggs (Table 8) records the counts of the number of eggs
for each stage as collected for each sample from each catch.
Samples of the total catch from the plankton tow are also sampled
to determine the relative abundance of species. These abundance data
are recorded in the table t_abund (Table 9). This includes abundance
of fish, salps, zooplankton and microzooplankton.