3 Data Structures

3.1 Table Relationships

This database encompasses several tables. The ERD for beach (Figure 1) shows the physical data model structure [ Also known as database schema ] of the database and it's entities (each entity is implemented as a database table) and relationships between these tables. Each table represents an object, event, or concept in the real world that has been represented in the database. Each attribute of a table is a defining property or quality of the table.

All of the table's attributes are shown in the ERD. The underlined attributes represent the table's primary key [A primary key is an attribute or a combination of attributes that contains an unique value to identify that record.]. This schema is valid regardless of the database system chosen, and it can remain correct even if the Database Management System (DBMS) is changed. Most of the tables in the beach database also contain special attributes, called foreign keys [A foreign key is any attribute, or a combination of attributes, in a table that is a primary key of another table. Tables are linked together through foreign keys.].

Section 5 shows a listing of all the beach tables as implemented by the Empress DBMS. As can be seen in the listing of the tables, a table's primary key has an unique index on it. Primary keys are generally listed using the format:

Indices:

UNIQUE index_name ON

(attribute [, attributes ])

where the attribute(s) make up the primary key (the key attributes) and the index name is the primary key name. Note that the typographical convention for the above format is that square brackets [] may contain more than one item or none at all. A unique index prevents records with duplicate key values from being inserted into the table; e.g., a sample with an existing sample number.

The beach database is implemented as a relational database. That is, each table is a special case of the mathematical construct known as a relation and hence elementary relation theory is used to deal with the data within tables and the relationships between them. All relationships in beach are of the type one-to-many [A one-to-many relationship is where one record (the parent) in a table relates to one or many records (the child) in another table; e.g., one survey in t_survey can have many strata in t_stratum but any one stratum can only come from one survey.]. This is shown in the ERD by connecting a single line (indicating "many") from the child table; e.g., t_quadrat, to the parent table; e.g., t_transect, with an arrowhead (indicating "one") pointing to the parent. For example, consider the relationship between the tables' t_transect (the parent table) and t_quadrat (the child table). Any one transect in t_transect can have one or more quadrats in t_quadrat, but any one quadrat can only be a part of one transect. Note that the word "many" applies to the possible number of records another is associated with. For a given instance, there might be zero, one, two, or more associated records, but if it is ever possible to have more than one, we use the word "many" to describe the association.

Note that the one-to-many relationships can be either mandatory or optional. The optional relationship, denoted in the ERD by the symbol "0" at one or both ends of the relationship line, means that a record does not have to have any associated records. Conversely, the mandatory relationship denoted in the ERD by a bar symbol across the relationship line, means that a record has to have at least one associated record. For example, if we consider again the one-to-many relationship between the tables t_transect and t_quadrat, which has a mandatory "one" and an optional "many". This means that one transect record can have zero or more (many) quadrats within it, but one quadrat must have one, and only one, associated record in the transect table.

These relationships are enforced in the database by the use of referential constraints [Also known as integrity checks.]. Constraints do not allow orphans to exist in any table; i.e., where a child record exists without a related parent record. This may happen when:

  1. a parent record is deleted;

  2. the parent record is altered so the relationship is lost;

  3. or a child record is entered without a parent record.

All constraints in beach prevent these from occurring. Constraints are shown in the table listings by the following format:

Referential:

error message (attribute[, attribute])

|INSERT|

|DELETE|


parent table (attribute[, attribute])


Items stacked between vertical lines | | are options of which one must be chosen.

In the Empress RDBMS a constraint can either be INSERT (prevents (ii) and (iii) from occurring) or DELETE (prevents (i) or (ii) from occurring). For example, consider the following constraint found in the table t_quadrat:

Referential:

Invalid transect number (survey)

INSERT t_transect (survey, stratum, transect)

This means that the value of the combination of attributes survey, stratum, and transect (that is, one transect) in the current quadrat record must already exist in the parent table t_transect or the record will be rejected and the following message will be displayed:

*** User Error:

insert constraint

'Invalid transect number' violation

All tables in this database are indexed. That is, attributes that are most likely to be used as a searching key have like values linked together to speed up searches. These indices are listed using the following format:

Indices:

NORMAL (2, 15)

index_name ON

(attribute [, attributes ])

Note that indices may be simple, pointing to one attribute, or composite pointing to more than one attribute. The numbers ..."(2, 15)"... in the syntax are Empress DBMS default values relating to the amount of space allocated to index storage.

3.2 Database Design

At the top level of beach is the table t_survey (Table 1), which contains all details of individual beach surveys including, dates, areas, and species involved. Individual surveys are identified by a unique attribute survey, which comprises of an abbreviated form of the beach name followed by a 2-digit year and a 2-digit sequential number; e.g., how9501 is the first survey at Howick Beach in 1993. Both the areas covered by a survey, and the species involved are recorded in attributes areas 7 Also known as integrity checks.and mainspp respectively. These attributes are a comma-separated list of area and species codes.

Descriptions of the codes used can be found in the table area_codes for areas, and curr_spp for species in the rdb database.

Each survey has at least one stratum record in the table t_stratum (Table 2). Strata are each given a stratum code, recorded as the attribute stratum, which is unique within a survey. Non-stratified surveys effectively become single stratum surveys and are given a default stratum record.

Within each stratum, transect lines are established for sampling and these transect details are recorded in the table t_transect (Table 3). Transect are identified by the attribute transect that is unique within a stratum. Positions of the start of transects are recorded in the attributes latitude, longitude, and E_W. The attribute phase records the phase number for transects in two-phase surveys.

Along these transect lines quadrats are sampled. Quadrats are usually a box-like frame that is pressed into the sand until the top is level with the beach surface. This defines a fixed area and depth that is then sampled. Quadrat records are recorded in the table t_quadrat (Table 4). Individual quadrats are identified either as a sequential number along a transect line, or as a distance along the transect line. In some surveys, the position of each quadrat is recorded as x and y coordinates from some fixed point in the stratum. These coordinates are recorded in the attributes x_coord and y_coord respectively.

Quadrats may require several replicates, for example when sand is sieved in two different grades of sieve mesh. Replicate details, including date sampled and mesh size, are recorded in the table t_replicate (Table 5).

The catch for each species, as a result of sieving, is recorded in the table t_catch (Table 6). The catch of a species may be represented by: a number of specimens caught (as represented by the attribute no_fish); and/or the total species weight (as represented by the attribute weight). The attribute lf_taken is a Y/N flag to indicate whether length frequencies were taken for that particular species. There are three attributes within t_catch, which are codes to define various methodologies, and as a result, have referential constraints to tables in the rdb database (Figure 2):

  • samp_meth - a code to indicate the method used to select which shellfish were selected for further sampling.

  • meas_meth - a code to indicate the method used to measure the lengths of individual shellfish for length frequencies.

  • wt_meth - a code to indicate the method by which the species weight was calculated.

Definitions of these three codes can be found in the tables t_samp_sel_codes, t_fish_meas_codes, and t_wgt_meth_codes respectively in the rdb database. Similarly, the attribute species is also constrained by a referential to the table curr_spp in the rdb database.

Length frequencies are recorded in the table t_length (Table 7) with the attribute no_a recording the number of shellfish at lgth, the millimetre class The attribute percent_samp represents the percentage of the total catch, as recorded in t_catch, which was measured.

Lengths and weights of individual shellfish are recorded in the table t_lgth_wgt (Table 8). These lengths are included in the length frequencies as recorded in t_length.

Often, beach surveys target discrete beds of a particular flora or fauna species; e.g. Ulva (sea lettuce) or toheroa beds. These beds are usually identified by previous surveys or by earlier phases of the same survey. The dimension and location details for such beds are recorded in the table t_bed (Table 9). The densities of these species within these beds are recorded in the table t_bed_density (Table 10).

Updated : 16 November 2007