3 Data Structures

3.1 Table Relationships

This database contains several tables. The ERD for dive (Figure 1) shows the logical structure2 of the database and it's entities (each entity is implemented as a database table) and relationships between these tables and tables in other databases. This schema is valid regardless of the database system chosen, and it can remain correct even if the Database Management System (DBMS) is changed. Each table represents an object, event, or concept in the real world that is selected to be represented in the database. Each attribute of a table is a defining property or quality of the table. All of the table's attributes are shown in the ERD. The underlined attributes represent the table's primary key3.

Note that Figure 1 shows the main tables only. Note that most tables contain foreign keys 4. These foreign keys define the relationships between the tables in dive.

The dive database is implemented as a relational database; i.e., each table is a special case of the mathematical construct known as a relation and hence elementary relation theory is used to deal with the data within tables and the relationships between them. There are three types of relationships possible between tables, but only one exists in dive: one-to-many5. These relationships can be seen in ERDs by connecting a single line (indicating "many") from the child table; e.g., catch, to the parent table; e.g., t_lgth, to the parent table; e.g., t_dive, with an arrowhead (indicating "one") pointing to the parent. For example, consider the relationship between the tables t_dive (the parent table) and t_lgth (the child table). Any one dive in t_dive can have one or more length records in t_lgth, but any one length record can only come from one dive. Note that the word "many" applies to the possible number of records another is associated with. For a given instance, there might be zero, one, two, or more associated records, but if it is ever possible to have more than one, we use the word "many" to describe the association.

Note that the one-to-many relationships can be either mandatory or optional 6. The optional relationship, denoted in the ERD by the symbol "O" at one or both ends of the relationship line, means that a record does not have to have any associated records. Conversely, the mandatory relationship denoted in the ERD by a bar symbol across the relationship line, means that a record has to have at least one associated record. For example, if we consider again the one-to-many relationship between the tables t_dive and t_lgth, which has a mandatory "one" and an optional "many". This means that one dive record can have zero or more (many) length records recorded, but one length record in the effort table must have one, and only one, associated record in the dive table.

These links are enforced by referential constraints7. Constraints do not allow orphans to exist in any table; i.e., where a child record exists without a related parent record. This may happen when: a parent record is deleted; the parent record is altered so the relationship is lost; or a child record is entered without a parent record.

Constraints are shown in the table listings by the following format:

Referential:

constraint name (attribute[, attribute])

|INSERT|

|DELETE|


parent table (attribute[, attribute])


Note that the typographical convention for the above format is that square brackets "[ ]" may contain more than one item or none at all. Items stacked between vertical lines | | are options of which one must be chosen.

For example, consider the following constraint found in the table t_dive:

Referential:

Invalid target species code (target_species)

INSERT


rdb : curr_spp (code)




This means that the value of the attribute target_spp in the current record must already exist in the parent table curr_spp in the rdb database or the record will be rejected and the following message will be displayed:

*** User Error: insert constraint "Invalid species code" violation

For tables residing in external databases, the parent table name will be prefixed by the name of the database, as in the above example.

Section 5 lists all the dive tables as implemented by the Empress RDBMS. As can be seen in the listing of the tables, a table's primary key has a unique index on it. Primary keys are generally listed using the following format:



Indices: UNIQUE index_name ON (attribute[, attribute])

where attribute(s) make up the primary key and the index name is the primary key name. These prevent records with duplicate keys from being inserted into the tables; e.g., a record with an existing dive number. Note that the table t_lgth has no primary key

The database listing (Tables 1-5) show that the tables also have indices on many attributes. That is, attributes that are most likely to be used as a searching key have like values linked together to speed up searches. These indices are listed using the following format:

Indices: NORMAL (2, 15) index_name ON (attribute[, attribute])

Note that indices may be simple, pointing to one attribute or composite pointing to more than one attribute. The numbers "...2, 15)..." in the syntax are Empress RDBMS default values relating to the amount of space allocated for the index.

3.2 Database Design

The core entity of the dive database is a single dive by a single or buddy-pair of divers. Details of each dive are recorded in the table t_dive (Table 1). Each dive is allocated a dive_no, which is a unique number to identify that dive. Each dive record must have the area code filled in to provide, at the very least, a geo-spatial reference. This may be anything from a general region (e.g., the Marlborough Sounds) to a particular geographic point (e.g., the wharf off Halfmoon Bay). The attribute proj_code is used to identify different dive survey data sets. Each dive site within an area and/or stratum is given a number, denoted by the attribute site. Dive sites may be repeatedly dived on. These repeats are given a number, denoted by the attribute rep. The date of the dive is stored in the attribute dive_date. However, for a large amount of the historical data (pre 1998), the exact date is not know. Therefore, the attribute d_date is a character field to stored text describing the dive date, such as "Jun-97". An additional date attribute, fishing_year, also stores the fishing year of the survey, in the format yy1-yy2; e.g., the 1997-98 fishing year is stored as "97-98". One dive can also involve more than one diver. The diver(s) names are recorded in the attribute divers_names.

Usually, the survey design is that the area is divided into sub-regions, such as strata or grids.

Thus, each dive may be in a stratum within an area, as recorded by the attributes area and stratum. These attributes are a foreign key to the table t_stratum (Table 2) contains the descriptions of these strata. This is directly comparable to the table t_grid_ref in the kina database8.

Some dive surveys using metal quadrats that are repeatedly laid down along a transect line. All occurrences of the target species within the quadrat are counted and recorded by the divers.

These quadrat counts are stored in the table t_counts (Table 3). Often divers will use an abundance code to describe the numbers of species, rather than an absolute number. Each record in t_counts is therefore the number or abundance for each species counted or measured within a single quadrat. If no quadrats are used in a dive, then t_counts represented a summary of the length information recorded in the t_lgth table. In such instances, the value of the quadrat attribute is set to a default of 1.

All length data recorded by divers are stored in the table t_lgth (Table 4). Each record on t_lgth corresponds to the measurement of an individual animal, rather than a length frequency. This structure allows t_lgth to be easily expanded to include other biological attributes including greenweight, gonad state, etc. For species such as shellfish; e.g., paua and kina, these lengths can be recorded after the dive by measurement of the catch. For other species, such as finfish, measurements are usually the result of estimates of fish length by the divers.

Dive details for each diver that participated in the dive can be recorded in the table t_dive_record (Table 5). This is information recorded for the divers personal dive logbook and includes such details as start, finish and total dive times, start and finish SCUBA tank air pressures, residual nitrogen time, start, finish and total surface intervals, and repeat groups as calculated from the standard PADI dive tables. This table is not normalized because several attributes are functionally dependant on other attributes within the table, not just the primary key. Specifically, the total_dive_time and total_si are simply the result of summing other attributes, namely dive_down, dive_up and dive_time, rnt respectively. Similarly, the PADI repeat groups RG1 and RG2 are dependant on these dive time and any surface intervals. The decision has been made to retain these attributes within the table, reflecting what the divers themselves entered in their dive logs. The calculations can be done later as a means of crosschecking the divers entries.

2 Also known as a database schema.
3 A primary key is an attribute or a combination of attributes that contains a unique value to identify that record.
4 A foreign key is an attribute or a combination of attributes that is a primary key in another table.
5 A one-to-many relationship is where one record (the parent) in a table relates to one or many records (the child) in another table; e.g., one dive in t_dive can have one or more length records in t_lgth, but any one length record can only come from one dive.
6 Known as the cardinality of a relationship " the constraint on relationships so that the possible extensions of the relation correspond to real-world associations.
7 Also known as integrity checks.
8 Mackay, K. and Fisher, D. 1993: Marine Research database documentation. 7. Kina. MAF Fisheries Greta Point Internal Report No. 213. 19p.


Updated : 16 November 2007