3 Data Structures
3.1 Table relationships
This database contains several tables. The ERD for kina (Figure 1)
shows the logical structure [ Also known as a database schema ] of
the database and it's entities (each entity is implemented as a
database table) and relationships between these tables and tables in
other databases. This schema is valid regardless of the database
system chosen, and it can remain correct even if the Database
Management System (DBMS) is changed. Each table represents an object,
event, or concept in the real world that is selected to be
represented in the database. Each attribute of a table is a defining
property or quality of the table.
All of the table's attributes are shown in the ERD. The underlined
attributes represent the table's primary key [A primary key is an
attribute or a combination of attributes that contains an unique
value to identify that record.].
Note that Figure 1 shows the main tables only. Note that most
tables contain foreign keys [A foreign key is an attribute or a
combination of attributes that is a primary key in another table.].
These foreign keys define the relationships between the tables in
kina.
The kina database is implemented as a relational database; i.e.,
each table is a special case of the mathematical construct known as a
relation and hence elementary relation theory is used to deal with
the data within tables and the relationships between them. There are
three types of relationships possible between tables, but only one
exists in kina: one-to-many [A one-to-many relationship is where one
record (the parent) in a table relates to one or many records (the
child) in another table; e.g., one landing in t_catch_site can have
many samples in t_sample but one sample can only come from one catch
site.]. These relationships can be seen in ERDs by connecting a
single line (indicating "many") from the child table; e.g.,
t_sample, to the parent table; e.g., t_catch_site, with an arrowhead
(indicating "one") pointing to the parent.
Every relationship has a mandatory or optional aspect to it. If a
relationship is mandatory, then it has to occur at least once, while
an optional relationship might not occur at all. For example, in
Figure 1, consider that relationship between the table t_catch_site
and it's child table t_sample. The symbol "o" by the child
t_sample means that a catch site can have zero or many sample
records, while the bar by the parent t_catch_site means that for
every sample there must be a matching catch site record.
These links are enforced by referential constraints [Also known as
integrity checks.]. Constraints do not allow orphans to exist in any
table; i.e., where a child record exists without a related parent
record. This may happen when: a parent record is deleted; the parent
record is altered so the relationship is lost; or a child record is
entered without a parent record Constraints are shown in the table
listings by the following format:
|
Referential:
|
constraint name (attribute[, attribute])
|
|INSERT|
|DELETE|
|
|
|
parent table (attribute[, attribute])
|
|
Note that the typographical convention for the above format is
that square brackets [] may contain more than one item or none at
all. Items stacked between vertical lines | | are options of which
one must be chosen.
For example, consider the following constraint found in the table
t_biological:
|
Referential:
|
Sample no. not in t_sample (sample_no)
|
INSERT
|
|
|
t_sample (sample_no)
|
|
This means that the value of the attribute sample_no in the
current record must already exist in the parent table t_sample or the
record will be rejected and the following message will be displayed:
*** User Error: insert constraint 'sample no. not in t_sample"
violation
For tables residing in external databases, the parent table name
will be prefixed by the name of the database.
Section 5 lists all the kina tables as implemented by the Empress
RDBMS. As can be seen in the listing of the tables, a table's primary
key has an unique index on it. Primary keys are generally listed
using the following format:
|
Indices:
|
UNIQUE index_name ON (attribute[, attribute])
|
where attribute(s) make up the primary key and the index name is
the primary key name. These prevent records with duplicate keys from
being inserted into the tables; e.g., a record with an existing
sample number.
The database listing (Tables 1-5) show that the tables also have
indices on many attributes. That is, attributes that are most likely
to be used as a searching key have like values linked together to
speed up searches. These indices are listed using the following
format:
|
Indices:
|
NORMAL (2, 15) index_name ON ( attribute[, attribute])
|
Note that indices may be simple, pointing to one attribute or
composite pointing to more than one attribute. The numbers "...(2,
15)..." in the syntax are Empress RDBMS default values relating
to the amount of space allocated for the index.
3.2 Database design
The overall database design is influenced by the duality of its
function: to store commercial catch and effort data (from the SOP and
where necessary, the participants); and to store biological
information (from both the SOP and Marine Research).
3.2.1 Commercial catch and effort data structures
In the first case, the daily catch and effort data from catch log
sheets are represented by the table t_catch_site (Table 1). Each
record in the table contains catch and effort information by a
participant, for one day's fishing at a site by diver. So if a
participant has only one diver and the day's catch was from three
sites, then there would be three records in this table. If a
participant had two divers and each diver had three sites for the
day's catch, then there would be six records in this table. The catch
data (in kilograms) are held in the attribute site_catch_wt, and the
effort data (time diving in decimal hours) are held in the attribute
site_time_diving.
The primary key for this table is
actually quite complex, being a combination of participant,
fishing_date, divers_name, fma, and site_no. The site number
(site_no) is only a logical identifier of a diver's dive site per
day. Any one geographical site may have many logical site numbers
assigned to it over time.
There are two attributes, latitude and longitude, for recording
the geographical location of a dive site. However, this information
is going to be limited to those divers with GPS, therefore, in
addition to the latitude and longitude of a dive site, provided by
the daily catch log sheet, one other attribute site_name has been
included to increase the flexibility of catch data queries. This
attribute is designed to hold the common geographical name of the
dive site, e.g., the name of the bay, island, etc. It is not vital
for this attribute to be used at the time of data entry, but may
inserted at a much later date when necessary.
In order to enhance relationships between t_catch_site and other
tables within kina, two further attributes are added. The first,
grid_ref, records which grid reference a dive site is located. This
is vital in order to link to the catch sampling data, and will be
discussed in more detail later. The second attribute, stratum,
records which Research stratum area a dive site is located. This
allows commercial catches to be used along side Research data.
3.2.2 Biological data structures
Both SOP catch sampling and the Research dive sampling programmes
collect biological data.
Details of each sample collected produce a record in the table
t_sample (Table 2).
All samples have a few attributes in common. Each sample
collected, for both SOP and Research sampling, has its own unique
sample number as defined by sample_no, which becomes the primary key.
The date on which the sample was collected is recorded in the
attribute fishing_date. The attribute divers_names records the name
of the diver(s) who caught the sample. Both the SOP and Research
samples are taken from within various strata as defined by the Kina
Research Group, with the stratum number being recorded in the
attribute stratum.
The remaining attributes are relevant just to the SOP samples. The
attribute observer records the observers name; grid_ref records which
catch sampling grid reference the sample was taken from (to link the
sample back to a catch record); and percent_samp records what
percentage of the total catch from the site was sampled.
Each sample produces a test diameter frequency that is held in the
table t_test_dia (Table 3). Each record in this table contains a
diameter class (stored by the attribute test_dia) and the number of
kina that are in that class (stored by the attribute no_t).
A further 25 kina are taken from the sample for detailed
biological analysis. The details of these biologicals are held in the
table t_biological (Table 4). The attributes of this table are all
reasonably self-explanatory - whole_weight records the whole weight
(in grams) of the kina; test_diameter records the diameter (to the
nearest millimetre) of the kina's test; gonad_volume records the
volume (in millilitres) of the kina gonad; and gonad_colour records a
numeric code (up to two digits) for the colour (and hence the
quality) of the gonad. Note that SOP samples only have whole number
codes for colour, while Research samples have decimals for increased
precision.
3.2.3 Grid references
There is an additional table,
t_grid_ref (Table 5), which has two roles in this database.
The first is to act as a coarse error
check for invalid grid references. Before a catch or a sample record
can be entered into their respective tables, the grid reference
contained in the record must exist in t_grid_ref before it can be
entered. Also, many of the possible grid references are invalid for
such reasons as they occur completely on land or on water that is too
deep for divers to reach.
The attribute comment} is used to distinguish between valid and
invalid grid references. A null in the comment field means a valid
grid reference; otherwise the comment becomes an error message such
as "Grid reference on land!".
The second role that t_grid_ref has to play is to provide a link
between the SOP samples and the Research samples by recording which
Research stratum number each grid reference belongs to.
Note that not all grid references have a stratum number assigned
to them, e.g., where grid references are on land.