3 Data structures

3.1 Table relationships

This database contains several tables. The ERD for oyster (Figure 1) shows the logical structure [Also known as a database schema.] of the database and its entities (each entity is implemented as a database table) and relationships between these tables. Each table represents an object, event, or concept in the real world that has been represented in the database. Each attribute of a table is a defining property or quality of the table.

All of the table's attributes are shown in the ERD. The underlined attributes represent the table's primary key [A primary key is an attribute or a combination of attributes that contains an unique value to identify that record.]. This schema is valid regardless of the database system chosen, and it can remain correct even if the Database Management System (DBMS) is changed. Most of the tables in the oyster database also contain special attributes, called foreign keys [A foreign key is any attribute, or a combination of attributes, in a table that is a primary key of another table. Tables are linked together through foreign keys.].

Section 5 shows a listing of all the oyster tables as implemented by the EMPRESS DBMS. As can be seen in the listing of the tables, a table's primary key has an unique index on it. Primary keys are generally listed using the format:

Indices: UNIQUE index_name ON (attribute [, attributes ])



where the attribute(s) make up the primary key (the key attributes) and the index name is the primary key name. This prevents records with duplicate key values from being inserted into the table, e.g., a new survey being inserted with an existing survey number.

The oyster database is implemented as a relational database. That is, tables are linked to one another by their relationships. All relationships in oyster are of the type one-to-many [A one-to-many relationship is where one record (the parent) in a table relates to one or many records (the child) in another table; e.g., one survey in t_survey can have many stations in t_station but any one station can only come from one survey.]. This is shown in the ERD by connecting a single line (indicating "many") from the child table (e.g., t_station) to the parent table (e.g., t_survey) with an arrow-head (indicating "one&quo4t;) pointing to the parent. For example, consider the relationship between the tables t_survey (the parent table) and t_station (the child table). Any one survey in t_survey can have one or more stations in t_station, but any one station can only be a part of one survey.

Every relationship has a mandatory or optional aspect to it. That is, if a relationship is mandatory, then it has to occur and least once, while an optional relationship might not occur at all. For example, in Figure 1, consider that relationship between the table t_survey and it's child table t_damage. The symbol "O" by the child t_damage means that a survey record can have zero or many damage records, while the bar by the parent t_survey means that for every damage record there must be a matching survey record.

These relationships are enforced in the database by the use of referential constraints [Also known as integrity checks.]. Constraints do not allow orphans to exist in any table; i.e., where a child record exists without a related parent record. This may happen when: a parent record is deleted; the parent record is altered so the relationship is lost; or a child record is entered without a parent record. Constraints are shown in the table listings by the following format:

Referential: error message (attribute) INSERT parent table (attribute)



For example, consider the following constraint found in the table t_station:

Referential: No such survey (survey) INSERT t_survey (survey)



This means that the value of the attribute survey in the current record must already exist in the parent table t_survey or the record will be rejected and the error message "No such survey" will be displayed.

All tables are indexed. That is, attributes that are most likely to be used for searching, such as survey, have like values linked together to optimise search times. Such indices are shown in the table listings (Section 5) by the following syntax:

Indices: NORMAL (2, 15) index_name ON (attribute{, attributes})



Note that indices may be simple, pointing to just one attribute, or composite, pointing to more than one attribute. The numbers "...(2, 15)..." are Empress default values relating to the amount of space allocated to index storage.

3.2 Database design

The genesis of oyster lies in the need for a central data repository for data collected by the Foveaux Strait oyster survey programme. The data were originally residing on several PCs and in a variety of formats while analyses were carried out.

Once the analyses were completed, the groomed data, usually in the form of spreadsheets, were made available for loading onto the Empress RDBMS. These spreadsheets form the basis of the database table design and on a conceptual level an attempt has been made to make these tables as robust as possible for any further surveys that may eventuate.

3.2.1 Oyster dredge surveys

Conceptually, a dredge survey has two main entities: a survey entity (t_survey, Table 1) and a station entity (t_station, Table 4) with one survey comprising of many stations. However, during analysis, each survey's data was stored in it's own single spreadsheet, with each record in the spreadsheet broadly corresponding with a station (or dredge) site. Hence, these spreadsheets collectively represent the station entity and were loaded verbatim into the table t_station. The attribute survey was added to t_station, as a number to identify which survey the station was a part of. The table t_survey contains these codes and the survey title that they represent.

Different surveys employed a variety of sampling methodologies, from fixed grid stations to stratified random sampling. All these methodologies are accommodated in the station table resulting in a wide range of attributes, from station_no and sample_no, to grid_no. However, for the stratified surveys any data defining strata was seldom given for loading into the database. Any stratum data that has been presented to the Mfish Data Manager has be loaded into the table t_stratum (Table 3).

Spreadsheet design mentality dictates that all attributes are represented in columns, with many of the columns being used to hold results of various calculations. However, standard database design theory dictates that tables should, where possible, conform to certain normalization rules. The purpose of these normalization rules is to produce a databases design that is highly flexible, allowing the schema to be extended when needed to account for new attributes, entities, and relationships. They can also reduce redundancy in the database and make sure the database is free from certain anomalies. Usually, the aim is to conform to the first three rules. That is, the database should be in a 3rd Normal Form (3NF).

All oyster survey spreadsheets to date are in 1NF in that every attribute is single-valued for each record. However, to be 2NF, all the non-key attributes have to be fully dependent on the primary key; i.e., each attribute in a record has to be dependant on the attributes survey and station_no.

Here, the spreadsheets failed because the vessel name is dependent on the vessel registration number, not the station primary key. To conform to 2NF, the vessel names are therefore stored in their own table t_vessel (Table 2) and are linked to t_station by the attribute vess_id.

To make t_station 3NF requires that every non-key attribute is not transitively dependent on the primary key. That is, there should be no attributes that are the result of calculations of other attributes in the table. The problems arise because any changes in the values of the other attributes will not automatically change the value of the transitively dependent attribute, resulting in data anomalies, unlike spreadsheets, which can do this using formula. Because the original spreadsheets were used for analyses, they have several attributes that fall into this category (such as the attribute stand_live1 that is calculated from the attributes no_live1 and distance). These transitively dependent attributes should be removed from t_station, but because oyster is used as a static data repository rather than a dynamic data analysis tool, these anomalies will not appear. Therefore, they remain in t_station.

The table t_station contains a number of attributes for absolute, percentage, and standardised abundance of oysters dredged at each station; e.g., no_live1, stand_live2, and bonamia3. A number


to denote the shell size range they represent suffixes these attributes, with: 1 = takeable (legal size >= 58mm); 2 = sublegal (< 58mm); 3 = pre-recruits (between 50 and 57mm); 4 = juveniles (< 50mm); 5 = combined pre-recruit and takeable (>= 50mm).

Measurements of individual oysters are stored in the table t_biological (Table5). Measurements include: shell length, width and height, shell weight, meat weight, and an index for the level of bonamia infection.

3.2.2 Survey on the impact of dredging on oysters

The exception to the survey/station relationship is the 1997 survey on the impact of dredging on Foveaux Strait oysters, which looked at differences of levels of damage sustained by oysters at harvesting by different dredging methods. Conceptually, this survey is a one-to-many relationship between the survey entity (t_survey) and the damage entity (t_damage, Table 6). Individual records in t_damage don"t represent a station, but a cluster of oysters, where a cluster is defined as either one adult (legal size >= 58mm) oyster or a clump of juvenile spat formed around a single nucleus.

Much of the data in t_damage are in the form of numeric codes. Separate tables contain the definitive lists of these codes, and their descriptions. The codes are for the types of damage to oysters (t_damage_codes, Table 8), treatment (or method of dredging) of oysters at harvesting (t_treatment_codes, Table 7), and cluster types (t_cluster_codes, Table 9).



Updated : 16 November 2007