3 Data Structures
3.1 Table Relationships
This database contains several tables. The ERD for dive (Figure 1)
shows the logical structure2 of the database and it's entities (each
entity is implemented as a database table) and relationships
between these tables and tables in other databases. This schema is
valid regardless of the database system chosen, and it can remain
correct even if the Database Management System (DBMS) is changed.
Each table represents an object, event, or concept in the real world
that is selected to be represented in the database. Each attribute
of a table is a defining property or quality of the table. All of the
table's attributes are shown in the ERD. The underlined attributes
represent the table's primary key3.
Note that Figure 1 shows the main tables only. Note that most
tables contain foreign keys 4. These foreign keys define the
relationships between the tables in dive.
The dive database is implemented as a relational database;
i.e., each table is a special case of the mathematical construct
known as a relation and hence elementary relation theory is
used to deal with the data within tables and the relationships
between them. There are three types of relationships possible between
tables, but only one exists in dive: one-to-many5. These
relationships can be seen in ERDs by connecting a single line
(indicating "many") from the child table; e.g., catch, to
the parent table; e.g., t_lgth, to the parent table; e.g.,
t_dive, with an arrowhead (indicating "one")
pointing to the parent. For example, consider the relationship
between the tables t_dive (the parent table) and t_lgth
(the child table). Any one dive in t_dive can have one or more
length records in t_lgth, but any one length record can only
come from one dive. Note that the word "many" applies to
the possible number of records another is associated with. For a
given instance, there might be zero, one, two, or more associated
records, but if it is ever possible to have more than one, we use the
word "many" to describe the association.
Note that the one-to-many relationships can be either mandatory or
optional 6. The optional relationship, denoted in the ERD by the
symbol "O" at one or both ends of the relationship line,
means that a record does not have to have any associated records.
Conversely, the mandatory relationship denoted in the ERD by a bar
symbol across the relationship line, means that a record has to have
at least one associated record. For example, if we consider again the
one-to-many relationship between the tables t_dive and t_lgth, which
has a mandatory "one" and an optional "many".
This means that one dive record can have zero or more (many) length
records recorded, but one length record in the effort table must have
one, and only one, associated record in the dive table.
These links are enforced by referential constraints7. Constraints
do not allow orphans to exist in any table; i.e., where a child
record exists without a related parent record. This may happen when:
a parent record is deleted; the parent record is altered so the
relationship is lost; or a child record is entered without a parent
record.
Constraints are shown in the table listings by the following
format:
|
Referential:
|
constraint name (attribute[,
attribute])
|
|INSERT|
|DELETE|
|
|
|
parent table (attribute[, attribute])
|
|
Note that the typographical convention for the above format is
that square brackets "[ ]" may contain more than one item
or none at all. Items stacked between vertical lines | | are options
of which one must be chosen.
For example, consider the following constraint found in the table
t_dive:
|
Referential:
|
Invalid target species code
(target_species)
|
INSERT
|
|
|
rdb : curr_spp (code)
|
|
This means that the value of the attribute target_spp in
the current record must already exist in the parent table curr_spp
in the rdb database or the record will be rejected and the following
message will be displayed:
*** User Error: insert constraint "Invalid species code"
violation
For tables residing in external databases, the parent table name
will be prefixed by the name of the database, as in the above
example.
Section 5 lists all the dive tables as implemented by the
Empress RDBMS. As can be seen in the listing of the tables, a table's
primary key has a unique index on it. Primary keys are generally
listed using the following format:
Indices: UNIQUE index_name ON (attribute[, attribute])
where attribute(s) make up the primary key and the index name is
the primary key name. These prevent records with duplicate keys from
being inserted into the tables; e.g., a record with an existing dive
number. Note that the table t_lgth has no primary key
The database listing (Tables 1-5) show that the tables also have
indices on many attributes. That is, attributes that are most likely
to be used as a searching key have like values linked together to
speed up searches. These indices are listed using the following
format:
Indices: NORMAL (2, 15) index_name ON (attribute[, attribute])
Note that indices may be simple, pointing to one attribute or
composite pointing to more than one attribute. The numbers "...2,
15)..." in the syntax are Empress RDBMS default values relating
to the amount of space allocated for the index.
3.2 Database Design
The core entity of the dive database is a single dive by a
single or buddy-pair of divers. Details of each dive are recorded in
the table t_dive (Table 1). Each dive is allocated a dive_no,
which is a unique number to identify that dive. Each dive record must
have the area code filled in to provide, at the very least, a
geo-spatial reference. This may be anything from a general region
(e.g., the Marlborough Sounds) to a particular geographic point
(e.g., the wharf off Halfmoon Bay). The attribute proj_code is
used to identify different dive survey data sets. Each dive site
within an area and/or stratum is given a number, denoted by the
attribute site. Dive sites may be repeatedly dived on. These repeats
are given a number, denoted by the attribute rep. The date of the
dive is stored in the attribute dive_date. However, for a
large amount of the historical data (pre 1998), the exact date is not
know. Therefore, the attribute d_date is a character field to
stored text describing the dive date, such as "Jun-97". An
additional date attribute, fishing_year, also stores the
fishing year of the survey, in the format yy1-yy2; e.g., the 1997-98
fishing year is stored as "97-98". One dive can also
involve more than one diver. The diver(s) names are recorded in the
attribute divers_names.
Usually, the survey design is that the area is divided into
sub-regions, such as strata or grids.
Thus, each dive may be in a stratum within an area, as recorded by
the attributes area and stratum. These attributes are a foreign key
to the table t_stratum (Table 2) contains the descriptions of these
strata. This is directly comparable to the table t_grid_ref in the
kina database8.
Some dive surveys using metal quadrats that are repeatedly laid
down along a transect line. All occurrences of the target species
within the quadrat are counted and recorded by the divers.
These quadrat counts are stored in the table t_counts (Table 3).
Often divers will use an abundance code to describe the numbers of
species, rather than an absolute number. Each record in t_counts is
therefore the number or abundance for each species counted or
measured within a single quadrat. If no quadrats are used in a dive,
then t_counts represented a summary of the length information
recorded in the t_lgth table. In such instances, the value of the
quadrat attribute is set to a default of 1.
All length data recorded by divers are stored in the table t_lgth
(Table 4). Each record on t_lgth corresponds to the
measurement of an individual animal, rather than a length frequency.
This structure allows t_lgth to be easily expanded to include
other biological attributes including greenweight, gonad state, etc.
For species such as shellfish; e.g., paua and kina, these lengths can
be recorded after the dive by measurement of the catch. For other
species, such as finfish, measurements are usually the result of
estimates of fish length by the divers.
Dive details for each diver that participated in the dive can be
recorded in the table t_dive_record (Table 5). This is
information recorded for the divers personal dive logbook and
includes such details as start, finish and total dive times, start
and finish SCUBA tank air pressures, residual nitrogen time, start,
finish and total surface intervals, and repeat groups as calculated
from the standard PADI dive tables. This table is not normalized
because several attributes are functionally dependant on other
attributes within the table, not just the primary key. Specifically,
the total_dive_time and total_si are simply the result
of summing other attributes, namely dive_down, dive_up and
dive_time, rnt respectively. Similarly, the PADI repeat groups
RG1 and RG2 are dependant on these dive time and any surface
intervals. The decision has been made to retain these attributes
within the table, reflecting what the divers themselves entered in
their dive logs. The calculations can be done later as a means of
crosschecking the divers entries.
2 Also known as a database schema.
3 A primary key is an attribute or a combination of attributes
that contains a unique value to identify that record.
4 A foreign key is an attribute or a combination of attributes
that is a primary key in another table.
5 A one-to-many relationship is where one record (the parent) in a
table relates to one or many records (the child) in another table; e.g., one dive in t_dive can have one or more
length records in t_lgth, but any one length record can only come
from one dive.
6 Known as the cardinality of a relationship " the constraint
on relationships so that the possible extensions of the relation
correspond to real-world associations.
7 Also known as integrity checks.
8 Mackay, K. and Fisher, D. 1993: Marine Research database
documentation. 7. Kina. MAF Fisheries Greta Point Internal Report No.
213. 19p.