3 Data structures
3.1 Table relationships
This database contains several tables. The ERD for recruit (Figure
1) shows the logical structure [ Also known as a database schema.] of
the database and it's entities (each entity is implemented as a
database table) and relationships between these tables and tables in
other databases. This schema is valid regardless of the database
system chosen, and it can remain correct even if the Database
Management System (DBMS) is changed. Each table represents an object,
event, or concept in the real world that is chosen to be represented
in the database. Each attribute of a table is a defining property or
quality of the table.
All of the table's attributes are shown in the ERD. The underlined
attributes represent the table's primary key [A primary key is an
attribute or a combination of attributes that contains an unique
value to identify that record.].
Note that Figure 1 shows the main tables only. Note that most
tables contain foreign keys [A foreign key is an attribute or a
combination of attributes that is a primary key in another table.].
These foreign keys define the relationships between the tables in
recruit.
The recruit database is implemented as a relational database;
i.e., each table is a special case of the mathematical construct
known as a relation and hence elementary relation theory is used to
deal with the data within tables and the relationships between them.
There are three types of relationships possible between tables, but
only one exists in recruit: one-to-many [A one-to-many relationship
is where one record (the parent) in a table relates to one or many
records (the child) in another table; e.g., one sample in t_site can
have many length frequency records in t_lgth but one length frequency
records can only come from one sample site.]. This is shown in the
ERD by connecting a single line (indicating "many") from
the child table (e.g., t_lgth) to the parent table (e.g., t_site)
with an arrow-head (indicating "one") pointing to the
parent. For example, consider the relationship between the tables
t_site (the parent table) and t_lgth (the child table).
Any one sample in t_site can generate one or more length frequency
records in t_lgth, but any one length frequency record can only come
from one sample. Note that the word "many" applies to the
possible number of records in one table that one record in another
table is associated with. For a given instance, there might be zero,
one, two, or more associated records, but if it is ever possible to
have more than one, we use the word "many" to describe the
association.
Every relationship has a mandatory or optional aspect to it. That
is, if a relationship is mandatory, then it has to occur at least
once, while an optional relationship might not occur at all. For
example, in Figure 1, consider again that relationship between the
table t_site and it's child table t_lgth.
The symbol "O" by the child t_lgth means that a sample
record may have zero or many length frequency records, while the bar
by the parent t_site means that for every length frequency record
there must be a matching sample record.
These relationships are enforced in the database by the use of
referential constraints[Also known as integrity checks.]. Constraints
do not allow orphans to exist in any table; i.e., where a child
record exists without a related parent record. This may happen when:
a parent record is deleted; the parent record is altered so the
relationship is lost; or a child record is entered without a parent
record. All constraints in recruit prevent the latter from occurring.
Constraints are shown in the table listings by the following format:
|
Referential:
|
constraint name (attribute[, attribute])
|
|INSERT| |DELETE|
|
|
|
parent table (attribute[, attribute])
|
|
For example, consider the following constraint found in the table
t_lgth:
|
Referential:
|
invalid id (id) INSERT t_site (id)
|
This means that the value of the attribute id in the current
record must already exist in the parent table t_site or the record
will be rejected and the following message will be displayed:
*** User Error: insert constraint "invalid id" violation
For tables residing in external databases, the parent table name
will be prefixed by the name of the database.
Section 5 details a listing of all the recruit tables as
implemented by the EMPRESS DBMS. These table show that a table's
primary key has an unique index on it. Primary keys are generally
listed using the format:
|
Indices:
|
UNIQUE index_name ON ( attribute [ , attributes])
|
where the attribute(s) make up the primary key (the key
attributes), and the index name is the primary key name. Note that
the typographical convention for the above format is that square
brackets [] may contain more than one item or none at all.
The unique index prevents records with duplicate key values from
being inserted into the table; e.g., a new sample being inserted with
an existing id number, and hence ensures that every record can be
uniquely identified.
The recruit database is implemented as a relational database. That
is, each table is a special case of the mathematical construct known
as a relation and hence elementary relation theory is used to deal
with the data within tables and the relationships between them. All
relationships in recruit are of the type one-to-many [A one-to-many
relationship is where one record (the parent) in a table relates to
one or many records (the child) in another table; e.g., one sample in
t_site can have many length frequency records in t_lgth but any one
length frequency record can only come from one sample.].
All tables are indexed. That is, attributes that are most likely
to be used for searching, such as id, have like values linked
together to optimise search times. Such indices are shown in the
table listings (Section 5) by the following syntax:
|
Indices:
|
NORMAL (2, 15) index_name ON ( attribute{, attributes})
|
Note that indices may be simple, pointing to just one attribute,
or composite, pointing to more than one attribute. The numbers
"...(2, 15)..." are EMPRESS default values relating to the
amount of space allocated to index storage.
3.2 Database design
The details for each sample undertaken are recorded in the table
t_site (Table 1). Each sample is allocated an identification number,
id, which is unique for every sample. The site sampled is recorded as
a 3-character code and stored in the attribute site. Codes are used,
rather than full spelling of the site names, to prevent problems
caused by spelling mistakes and case differences.
The attribute site is also a foreign key to the table
t_site_codes. The actual catch of the target species is recorded in
the attribute no_target_sp, with the target species code recorded in
the attribute target_sp. Any bycatch is recorded in the attribute
bycatch in a semi-descriptive text format, usually describing bycatch
species by their 3-character species codes. For example, the value of
bycatch of "22 YEM 1 PUF 2 PIL" means 22 yellow-eyed
mullet, 1 pufferfish, and 2 pilchards. Other information recorded in
this table includes sampling date and time, and the time of the
nearest high tide.
Details for individual sites used during sampling are stored in
the table t_site_codes (Table 2). Each record in this table is
uniquely identified by a 3-character code in the attribute site. Each
site record contains a full description of the site and the general
geographic region, and is accurately located by a NZMS 260 map number
and grid reference.
Length frequencies of fish caught are stored in the table t_lgth
(Table 3). Each record contains: no_a, the number of fish (frequency)
caught at a length class; lgth, the length class (mm); and the
1-character code measure_meth to identify the method used to measure
the fish (e.g., total length, fork length, etc).
Figure 1: Entity Relationship Diagram (ERD) for
the recruit database.