3 Data structures

3.1 Table relationships

This database contains several tables. The ERD for recruit (Figure 1) shows the logical structure [ Also known as a database schema.] of the database and it's entities (each entity is implemented as a database table) and relationships between these tables and tables in other databases. This schema is valid regardless of the database system chosen, and it can remain correct even if the Database Management System (DBMS) is changed. Each table represents an object, event, or concept in the real world that is chosen to be represented in the database. Each attribute of a table is a defining property or quality of the table.

All of the table's attributes are shown in the ERD. The underlined attributes represent the table's primary key [A primary key is an attribute or a combination of attributes that contains an unique value to identify that record.].

Note that Figure 1 shows the main tables only. Note that most tables contain foreign keys [A foreign key is an attribute or a combination of attributes that is a primary key in another table.]. These foreign keys define the relationships between the tables in recruit.

The recruit database is implemented as a relational database; i.e., each table is a special case of the mathematical construct known as a relation and hence elementary relation theory is used to deal with the data within tables and the relationships between them. There are three types of relationships possible between tables, but only one exists in recruit: one-to-many [A one-to-many relationship is where one record (the parent) in a table relates to one or many records (the child) in another table; e.g., one sample in t_site can have many length frequency records in t_lgth but one length frequency records can only come from one sample site.]. This is shown in the ERD by connecting a single line (indicating "many") from the child table (e.g., t_lgth) to the parent table (e.g., t_site) with an arrow-head (indicating "one") pointing to the parent. For example, consider the relationship between the tables t_site (the parent table) and t_lgth (the child table).

Any one sample in t_site can generate one or more length frequency records in t_lgth, but any one length frequency record can only come from one sample. Note that the word "many" applies to the possible number of records in one table that one record in another table is associated with. For a given instance, there might be zero, one, two, or more associated records, but if it is ever possible to have more than one, we use the word "many" to describe the association.

Every relationship has a mandatory or optional aspect to it. That is, if a relationship is mandatory, then it has to occur at least once, while an optional relationship might not occur at all. For example, in Figure 1, consider again that relationship between the table t_site and it's child table t_lgth.

The symbol "O" by the child t_lgth means that a sample record may have zero or many length frequency records, while the bar by the parent t_site means that for every length frequency record there must be a matching sample record.

These relationships are enforced in the database by the use of referential constraints[Also known as integrity checks.]. Constraints do not allow orphans to exist in any table; i.e., where a child record exists without a related parent record. This may happen when: a parent record is deleted; the parent record is altered so the relationship is lost; or a child record is entered without a parent record. All constraints in recruit prevent the latter from occurring. Constraints are shown in the table listings by the following format:

Referential:

constraint name (attribute[, attribute])

|INSERT| |DELETE|


parent table (attribute[, attribute])




For example, consider the following constraint found in the table t_lgth:

Referential:

invalid id (id) INSERT t_site (id)



This means that the value of the attribute id in the current record must already exist in the parent table t_site or the record will be rejected and the following message will be displayed:

*** User Error: insert constraint "invalid id" violation



For tables residing in external databases, the parent table name will be prefixed by the name of the database.

Section 5 details a listing of all the recruit tables as implemented by the EMPRESS DBMS. These table show that a table's primary key has an unique index on it. Primary keys are generally listed using the format:

Indices:

UNIQUE index_name ON ( attribute [ , attributes])



where the attribute(s) make up the primary key (the key attributes), and the index name is the primary key name. Note that the typographical convention for the above format is that square brackets [] may contain more than one item or none at all.

The unique index prevents records with duplicate key values from being inserted into the table; e.g., a new sample being inserted with an existing id number, and hence ensures that every record can be uniquely identified.

The recruit database is implemented as a relational database. That is, each table is a special case of the mathematical construct known as a relation and hence elementary relation theory is used to deal with the data within tables and the relationships between them. All relationships in recruit are of the type one-to-many [A one-to-many relationship is where one record (the parent) in a table relates to one or many records (the child) in another table; e.g., one sample in t_site can have many length frequency records in t_lgth but any one length frequency record can only come from one sample.].

All tables are indexed. That is, attributes that are most likely to be used for searching, such as id, have like values linked together to optimise search times. Such indices are shown in the table listings (Section 5) by the following syntax:

Indices:

NORMAL (2, 15) index_name ON ( attribute{, attributes})

Note that indices may be simple, pointing to just one attribute, or composite, pointing to more than one attribute. The numbers "...(2, 15)..." are EMPRESS default values relating to the amount of space allocated to index storage.



3.2 Database design

The details for each sample undertaken are recorded in the table t_site (Table 1). Each sample is allocated an identification number, id, which is unique for every sample. The site sampled is recorded as a 3-character code and stored in the attribute site. Codes are used, rather than full spelling of the site names, to prevent problems caused by spelling mistakes and case differences.

The attribute site is also a foreign key to the table t_site_codes. The actual catch of the target species is recorded in the attribute no_target_sp, with the target species code recorded in the attribute target_sp. Any bycatch is recorded in the attribute bycatch in a semi-descriptive text format, usually describing bycatch species by their 3-character species codes. For example, the value of bycatch of "22 YEM 1 PUF 2 PIL" means 22 yellow-eyed mullet, 1 pufferfish, and 2 pilchards. Other information recorded in this table includes sampling date and time, and the time of the nearest high tide.

Details for individual sites used during sampling are stored in the table t_site_codes (Table 2). Each record in this table is uniquely identified by a 3-character code in the attribute site. Each site record contains a full description of the site and the general geographic region, and is accurately located by a NZMS 260 map number and grid reference.

Length frequencies of fish caught are stored in the table t_lgth (Table 3). Each record contains: no_a, the number of fish (frequency) caught at a length class; lgth, the length class (mm); and the 1-character code measure_meth to identify the method used to measure the fish (e.g., total length, fork length, etc).


Figure 1: Entity Relationship Diagram (ERD) for the recruit database.



Updated : 16 November 2007