3 Data structures
3.1 Table relationships
This database contains several tables. The ERD for oyster (Figure
1) shows the logical structure [Also known as a database schema.] of
the database and its entities (each entity is implemented as a
database table) and relationships between these tables. Each table
represents an object, event, or concept in the real world that has
been represented in the database. Each attribute of a table is a
defining property or quality of the table.
All of the table's attributes are shown in the ERD. The underlined
attributes represent the table's primary key [A primary key is an
attribute or a combination of attributes that contains an unique
value to identify that record.]. This schema is valid regardless of
the database system chosen, and it can remain correct even if the
Database Management System (DBMS) is changed. Most of the tables in
the oyster database also contain special attributes, called foreign
keys [A foreign key is any attribute, or a combination of attributes,
in a table that is a primary key of another table. Tables are linked
together through foreign keys.].
Section 5 shows a listing of all the oyster tables as implemented
by the EMPRESS DBMS. As can be seen in the listing of the tables, a
table's primary key has an unique index on it. Primary keys are
generally listed using the format:
Indices: UNIQUE index_name ON (attribute [, attributes ])
where the attribute(s) make up the primary key (the key
attributes) and the index name is the primary key name. This prevents
records with duplicate key values from being inserted into the table,
e.g., a new survey being inserted with an existing survey number.
The oyster database is implemented as a relational database. That
is, tables are linked to one another by their relationships. All
relationships in oyster are of the type one-to-many [A one-to-many
relationship is where one record (the parent) in a table relates to
one or many records (the child) in another table; e.g., one survey in
t_survey can have many stations in t_station but any one station can
only come from one survey.]. This is shown in the ERD by connecting a
single line (indicating "many") from the child table (e.g.,
t_station) to the parent table (e.g., t_survey) with an arrow-head
(indicating "one&quo4t;) pointing to the parent. For example,
consider the relationship between the tables t_survey (the parent
table) and t_station (the child table). Any one survey in t_survey
can have one or more stations in t_station, but any one station can
only be a part of one survey.
Every relationship has a mandatory or optional aspect to it. That
is, if a relationship is mandatory, then it has to occur and least
once, while an optional relationship might not occur at all. For
example, in Figure 1, consider that relationship between the table
t_survey and it's child table t_damage. The symbol "O" by
the child t_damage means that a survey record can have zero or many
damage records, while the bar by the parent t_survey means that for
every damage record there must be a matching survey record.
These relationships are enforced in the database by the use of
referential constraints [Also known as integrity checks.].
Constraints do not allow orphans to exist in any table; i.e., where a
child record exists without a related parent record. This may happen
when: a parent record is deleted; the parent record is altered so the
relationship is lost; or a child record is entered without a parent
record. Constraints are shown in the table listings by the following
format:
Referential: error message (attribute) INSERT parent table
(attribute)
For example, consider the following constraint found in the table
t_station:
Referential: No such survey (survey) INSERT t_survey
(survey)
This means that the value of the attribute survey in the current
record must already exist in the parent table t_survey or the record
will be rejected and the error message "No such survey"
will be displayed.
All tables are indexed. That is, attributes that are most likely
to be used for searching, such as survey, have like values linked
together to optimise search times. Such indices are shown in the
table listings (Section 5) by the following syntax:
Indices: NORMAL (2, 15) index_name ON (attribute{,
attributes})
Note that indices may be simple, pointing to just one attribute,
or composite, pointing to more than one attribute. The numbers
"...(2, 15)..." are Empress default values relating to the
amount of space allocated to index storage.
3.2 Database design
The genesis of oyster lies in the need for a central data
repository for data collected by the Foveaux Strait oyster survey
programme. The data were originally residing on several PCs and in a
variety of formats while analyses were carried out.
Once the analyses were completed, the groomed data, usually in the
form of spreadsheets, were made available for loading onto the
Empress RDBMS. These spreadsheets form the basis of the database
table design and on a conceptual level an attempt has been made to
make these tables as robust as possible for any further surveys that
may eventuate.
3.2.1 Oyster dredge surveys
Conceptually, a dredge survey has two main entities: a survey
entity (t_survey, Table 1) and a station entity (t_station, Table 4)
with one survey comprising of many stations. However, during
analysis, each survey's data was stored in it's own single
spreadsheet, with each record in the spreadsheet broadly
corresponding with a station (or dredge) site. Hence, these
spreadsheets collectively represent the station entity and were
loaded verbatim into the table t_station. The attribute survey was
added to t_station, as a number to identify which survey the station
was a part of. The table t_survey contains these codes and the survey
title that they represent.
Different surveys employed a variety of sampling methodologies,
from fixed grid stations to stratified random sampling. All these
methodologies are accommodated in the station table resulting in a
wide range of attributes, from station_no and sample_no, to grid_no.
However, for the stratified surveys any data defining strata was
seldom given for loading into the database. Any stratum data that has
been presented to the Mfish Data Manager has be loaded into the table
t_stratum (Table 3).
Spreadsheet design mentality dictates that all attributes are
represented in columns, with many of the columns being used to hold
results of various calculations. However, standard database design
theory dictates that tables should, where possible, conform to
certain normalization rules. The purpose of these normalization rules
is to produce a databases design that is highly flexible, allowing
the schema to be extended when needed to account for new attributes,
entities, and relationships. They can also reduce redundancy in the
database and make sure the database is free from certain anomalies.
Usually, the aim is to conform to the first three rules. That is, the
database should be in a 3rd Normal Form (3NF).
All oyster survey spreadsheets to date are in 1NF in that every
attribute is single-valued for each record. However, to be 2NF, all
the non-key attributes have to be fully dependent on the primary key;
i.e., each attribute in a record has to be dependant on the
attributes survey and station_no.
Here, the spreadsheets failed because the vessel name is dependent
on the vessel registration number, not the station primary key. To
conform to 2NF, the vessel names are therefore stored in their own
table t_vessel (Table 2) and are linked to t_station by the attribute
vess_id.
To make t_station 3NF requires that every non-key attribute is not
transitively dependent on the primary key. That is, there should be
no attributes that are the result of calculations of other attributes
in the table. The problems arise because any changes in the values of
the other attributes will not automatically change the value of the
transitively dependent attribute, resulting in data anomalies, unlike
spreadsheets, which can do this using formula. Because the original
spreadsheets were used for analyses, they have several attributes
that fall into this category (such as the attribute stand_live1 that
is calculated from the attributes no_live1 and distance). These
transitively dependent attributes should be removed from t_station,
but because oyster is used as a static data repository rather than a
dynamic data analysis tool, these anomalies will not appear.
Therefore, they remain in t_station.
The table t_station contains a number of attributes for absolute,
percentage, and standardised abundance of oysters dredged at each
station; e.g., no_live1, stand_live2, and bonamia3. A number
to denote the shell size range they represent suffixes these
attributes, with: 1 = takeable (legal size >= 58mm); 2 = sublegal
(< 58mm); 3 = pre-recruits (between 50 and 57mm); 4 = juveniles (<
50mm); 5 = combined pre-recruit and takeable (>= 50mm).
Measurements of individual oysters are stored in the table
t_biological (Table5). Measurements include: shell length, width and
height, shell weight, meat weight, and an index for the level of
bonamia infection.
3.2.2 Survey on the impact of dredging on oysters
The exception to the survey/station relationship is the 1997
survey on the impact of dredging on Foveaux Strait oysters, which
looked at differences of levels of damage sustained by oysters at
harvesting by different dredging methods. Conceptually, this survey
is a one-to-many relationship between the survey entity (t_survey)
and the damage entity (t_damage, Table 6). Individual records in
t_damage don"t represent a station, but a cluster of oysters,
where a cluster is defined as either one adult (legal size >=
58mm) oyster or a clump of juvenile spat formed around a single
nucleus.
Much of the data in t_damage are in the form of numeric codes.
Separate tables contain the definitive lists of these codes, and
their descriptions. The codes are for the types of damage to oysters
(t_damage_codes, Table 8), treatment (or method of dredging) of
oysters at harvesting (t_treatment_codes, Table 7), and cluster types
(t_cluster_codes, Table 9).