3 Data Structures
3.1 Table Relationships
This database encompasses several tables. The ERD for beach
(Figure 1) shows the physical data model structure [ Also known as
database schema ] of the database and it's entities (each entity is
implemented as a database table) and relationships between these
tables. Each table represents an object, event, or concept in the
real world that has been represented in the database. Each attribute
of a table is a defining property or quality of the table.
All of the table's attributes are shown in the ERD. The underlined
attributes represent the table's primary key [A primary key is an
attribute or a combination of attributes that contains an unique
value to identify that record.]. This schema is valid regardless of
the database system chosen, and it can remain correct even if the
Database Management System (DBMS) is changed. Most of the tables in
the beach database also contain special attributes, called foreign
keys [A foreign key is any attribute, or a combination of attributes,
in a table that is a primary key of another table. Tables are linked
together through foreign keys.].
Section 5 shows a listing of all the beach tables as implemented
by the Empress DBMS. As can be seen in the listing of the tables, a
table's primary key has an unique index on it. Primary keys are
generally listed using the format:
|
Indices:
|
UNIQUE index_name ON
|
(attribute [, attributes ])
|
where the attribute(s) make up the primary key (the key
attributes) and the index name is the primary key name. Note that the
typographical convention for the above format is that square brackets
[] may contain more than one item or none at all. A unique index
prevents records with duplicate key values from being inserted into
the table; e.g., a sample with an existing sample number.
The beach database is implemented as a relational database. That
is, each table is a special case of the mathematical construct known
as a relation and hence elementary relation theory is used to deal
with the data within tables and the relationships between them. All
relationships in beach are of the type one-to-many [A one-to-many
relationship is where one record (the parent) in a table relates to
one or many records (the child) in another table; e.g., one survey in
t_survey can have many strata in t_stratum but any one stratum can
only come from one survey.]. This is shown in the ERD by connecting a
single line (indicating "many") from the child table; e.g.,
t_quadrat, to the parent table; e.g., t_transect, with an arrowhead
(indicating "one") pointing to the parent. For example,
consider the relationship between the tables' t_transect (the parent
table) and t_quadrat (the child table). Any one transect in
t_transect can have one or more quadrats in t_quadrat, but any one
quadrat can only be a part of one transect. Note that the word "many"
applies to the possible number of records another is associated with.
For a given instance, there might be zero, one, two, or more
associated records, but if it is ever possible to have more than one,
we use the word "many" to describe the association.
Note that the one-to-many relationships can be either mandatory or
optional. The optional relationship, denoted in the ERD by the symbol
"0" at one or both ends of the relationship line, means
that a record does not have to have any associated records.
Conversely, the mandatory relationship denoted in the ERD by a bar
symbol across the relationship line, means that a record has to have
at least one associated record. For example, if we consider again the
one-to-many relationship between the tables t_transect and t_quadrat,
which has a mandatory "one" and an optional "many".
This means that one transect record can have zero or more (many)
quadrats within it, but one quadrat must have one, and only one,
associated record in the transect table.
These relationships are enforced in the database by the use of
referential constraints [Also known as integrity checks.].
Constraints do not allow orphans to exist in any table; i.e., where a
child record exists without a related parent record. This may happen
when:
a parent record is deleted;
the parent record is altered so the
relationship is lost;
or a child record is entered without a
parent record.
All constraints in beach prevent these from occurring. Constraints
are shown in the table listings by the following format:
|
Referential:
|
error message (attribute[, attribute])
|
|INSERT|
|DELETE|
|
|
|
parent table (attribute[, attribute])
|
|
Items stacked between vertical lines | | are options of which one
must be chosen.
In the Empress RDBMS a constraint can either be INSERT (prevents
(ii) and (iii) from occurring) or DELETE (prevents (i) or (ii) from
occurring). For example, consider the following constraint found in
the table t_quadrat:
|
Referential:
|
Invalid transect number (survey)
|
INSERT t_transect (survey, stratum, transect)
|
This means that the value of the combination of attributes survey,
stratum, and transect (that is, one transect) in the current quadrat
record must already exist in the parent table t_transect or the
record will be rejected and the following message will be displayed:
|
*** User Error:
|
insert constraint
|
'Invalid transect number' violation
|
All tables in this database are indexed. That is, attributes that
are most likely to be used as a searching key have like values linked
together to speed up searches. These indices are listed using the
following format:
|
Indices:
|
NORMAL (2, 15)
|
index_name ON
|
(attribute [, attributes ])
|
Note that indices may be simple, pointing to one attribute, or
composite pointing to more than one attribute. The numbers ..."(2,
15)"... in the syntax are Empress DBMS default values relating
to the amount of space allocated to index storage.
3.2 Database Design
At the top level of beach is the table
t_survey (Table 1), which contains all details of individual beach
surveys including, dates, areas, and species involved. Individual
surveys are identified by a unique attribute survey, which comprises
of an abbreviated form of the beach name followed by a 2-digit year
and a 2-digit sequential number; e.g., how9501 is the first survey at
Howick Beach in 1993. Both the areas covered by a survey, and the
species involved are recorded in attributes areas 7 Also known as
integrity checks.and mainspp respectively. These attributes are a
comma-separated list of area and species codes.
Descriptions of the codes used can be found in the table
area_codes for areas, and curr_spp for species in the rdb database.
Each survey has at least one stratum record in the table t_stratum
(Table 2). Strata are each given a stratum code, recorded as the
attribute stratum, which is unique within a survey. Non-stratified
surveys effectively become single stratum surveys and are given a
default stratum record.
Within each stratum, transect lines are established for sampling
and these transect details are recorded in the table t_transect
(Table 3). Transect are identified by the attribute transect that is
unique within a stratum. Positions of the start of transects are
recorded in the attributes latitude, longitude, and E_W. The
attribute phase records the phase number for transects in two-phase
surveys.
Along these transect lines quadrats are sampled. Quadrats are
usually a box-like frame that is pressed into the sand until the top
is level with the beach surface. This defines a fixed area and depth
that is then sampled. Quadrat records are recorded in the table
t_quadrat (Table 4). Individual quadrats are identified either as a
sequential number along a transect line, or as a distance along the
transect line. In some surveys, the position of each quadrat is
recorded as x and y coordinates from some fixed point in the stratum.
These coordinates are recorded in the attributes x_coord and y_coord
respectively.
Quadrats may require several replicates, for example when sand is
sieved in two different grades of sieve mesh. Replicate details,
including date sampled and mesh size, are recorded in the table
t_replicate (Table 5).
The catch for each species, as a result of sieving, is recorded in
the table t_catch (Table 6). The catch of a species may be
represented by: a number of specimens caught (as represented by the
attribute no_fish); and/or the total species weight (as represented
by the attribute weight). The attribute lf_taken is a Y/N flag to
indicate whether length frequencies were taken for that particular
species. There are three attributes within t_catch, which are codes
to define various methodologies, and as a result, have referential
constraints to tables in the rdb database (Figure 2):
samp_meth - a code to indicate the method used to select
which shellfish were selected for further sampling.
meas_meth - a code to indicate the method used to measure the
lengths of individual shellfish for length frequencies.
wt_meth - a code to indicate the method by which the species
weight was calculated.
Definitions of these three codes can be found in the tables
t_samp_sel_codes, t_fish_meas_codes, and t_wgt_meth_codes
respectively in the rdb database. Similarly, the attribute species is
also constrained by a referential to the table curr_spp in the rdb
database.
Length frequencies are recorded in the table t_length (Table 7)
with the attribute no_a recording the number of shellfish at lgth,
the millimetre class The attribute percent_samp represents the
percentage of the total catch, as recorded in t_catch, which was
measured.
Lengths and weights of individual
shellfish are recorded in the table t_lgth_wgt (Table 8). These
lengths are included in the length frequencies as recorded in
t_length.
Often, beach surveys target discrete beds of a particular flora or
fauna species; e.g. Ulva (sea lettuce) or toheroa beds. These beds
are usually identified by previous surveys or by earlier phases of
the same survey. The dimension and location details for such beds are
recorded in the table t_bed (Table 9). The densities of these species
within these beds are recorded in the table t_bed_density (Table 10).