3 Data Structures

3.1 Table relationships

This database contains several tables. The ERD for kina (Figure 1) shows the logical structure [ Also known as a database schema ] of the database and it's entities (each entity is implemented as a database table) and relationships between these tables and tables in other databases. This schema is valid regardless of the database system chosen, and it can remain correct even if the Database Management System (DBMS) is changed. Each table represents an object, event, or concept in the real world that is selected to be represented in the database. Each attribute of a table is a defining property or quality of the table.

All of the table's attributes are shown in the ERD. The underlined attributes represent the table's primary key [A primary key is an attribute or a combination of attributes that contains an unique value to identify that record.].

Note that Figure 1 shows the main tables only. Note that most tables contain foreign keys [A foreign key is an attribute or a combination of attributes that is a primary key in another table.]. These foreign keys define the relationships between the tables in kina.

The kina database is implemented as a relational database; i.e., each table is a special case of the mathematical construct known as a relation and hence elementary relation theory is used to deal with the data within tables and the relationships between them. There are three types of relationships possible between tables, but only one exists in kina: one-to-many [A one-to-many relationship is where one record (the parent) in a table relates to one or many records (the child) in another table; e.g., one landing in t_catch_site can have many samples in t_sample but one sample can only come from one catch site.]. These relationships can be seen in ERDs by connecting a single line (indicating "many") from the child table; e.g., t_sample, to the parent table; e.g., t_catch_site, with an arrowhead (indicating "one") pointing to the parent.

Every relationship has a mandatory or optional aspect to it. If a relationship is mandatory, then it has to occur at least once, while an optional relationship might not occur at all. For example, in Figure 1, consider that relationship between the table t_catch_site and it's child table t_sample. The symbol "o" by the child t_sample means that a catch site can have zero or many sample records, while the bar by the parent t_catch_site means that for every sample there must be a matching catch site record.

These links are enforced by referential constraints [Also known as integrity checks.]. Constraints do not allow orphans to exist in any table; i.e., where a child record exists without a related parent record. This may happen when: a parent record is deleted; the parent record is altered so the relationship is lost; or a child record is entered without a parent record Constraints are shown in the table listings by the following format:



Referential:

constraint name (attribute[, attribute])

|INSERT|

|DELETE|


parent table (attribute[, attribute])




Note that the typographical convention for the above format is that square brackets [] may contain more than one item or none at all. Items stacked between vertical lines | | are options of which one must be chosen.

For example, consider the following constraint found in the table t_biological:



Referential:

Sample no. not in t_sample (sample_no)

INSERT


t_sample (sample_no)




This means that the value of the attribute sample_no in the current record must already exist in the parent table t_sample or the record will be rejected and the following message will be displayed:



*** User Error: insert constraint 'sample no. not in t_sample" violation

For tables residing in external databases, the parent table name will be prefixed by the name of the database.

Section 5 lists all the kina tables as implemented by the Empress RDBMS. As can be seen in the listing of the tables, a table's primary key has an unique index on it. Primary keys are generally listed using the following format:



Indices:

UNIQUE index_name ON (attribute[, attribute])



where attribute(s) make up the primary key and the index name is the primary key name. These prevent records with duplicate keys from being inserted into the tables; e.g., a record with an existing sample number.

The database listing (Tables 1-5) show that the tables also have indices on many attributes. That is, attributes that are most likely to be used as a searching key have like values linked together to speed up searches. These indices are listed using the following format:



Indices:

NORMAL (2, 15) index_name ON ( attribute[, attribute])



Note that indices may be simple, pointing to one attribute or composite pointing to more than one attribute. The numbers "...(2, 15)..." in the syntax are Empress RDBMS default values relating to the amount of space allocated for the index.

3.2 Database design

The overall database design is influenced by the duality of its function: to store commercial catch and effort data (from the SOP and where necessary, the participants); and to store biological information (from both the SOP and Marine Research).

3.2.1 Commercial catch and effort data structures

In the first case, the daily catch and effort data from catch log sheets are represented by the table t_catch_site (Table 1). Each record in the table contains catch and effort information by a participant, for one day's fishing at a site by diver. So if a participant has only one diver and the day's catch was from three sites, then there would be three records in this table. If a participant had two divers and each diver had three sites for the day's catch, then there would be six records in this table. The catch data (in kilograms) are held in the attribute site_catch_wt, and the effort data (time diving in decimal hours) are held in the attribute site_time_diving.

The primary key for this table is actually quite complex, being a combination of participant, fishing_date, divers_name, fma, and site_no. The site number (site_no) is only a logical identifier of a diver's dive site per day. Any one geographical site may have many logical site numbers assigned to it over time.

There are two attributes, latitude and longitude, for recording the geographical location of a dive site. However, this information is going to be limited to those divers with GPS, therefore, in addition to the latitude and longitude of a dive site, provided by the daily catch log sheet, one other attribute site_name has been included to increase the flexibility of catch data queries. This attribute is designed to hold the common geographical name of the dive site, e.g., the name of the bay, island, etc. It is not vital for this attribute to be used at the time of data entry, but may inserted at a much later date when necessary.

In order to enhance relationships between t_catch_site and other tables within kina, two further attributes are added. The first, grid_ref, records which grid reference a dive site is located. This is vital in order to link to the catch sampling data, and will be discussed in more detail later. The second attribute, stratum, records which Research stratum area a dive site is located. This allows commercial catches to be used along side Research data.

3.2.2 Biological data structures

Both SOP catch sampling and the Research dive sampling programmes collect biological data.

Details of each sample collected produce a record in the table t_sample (Table 2).

All samples have a few attributes in common. Each sample collected, for both SOP and Research sampling, has its own unique sample number as defined by sample_no, which becomes the primary key. The date on which the sample was collected is recorded in the attribute fishing_date. The attribute divers_names records the name of the diver(s) who caught the sample. Both the SOP and Research samples are taken from within various strata as defined by the Kina Research Group, with the stratum number being recorded in the attribute stratum.

The remaining attributes are relevant just to the SOP samples. The attribute observer records the observers name; grid_ref records which catch sampling grid reference the sample was taken from (to link the sample back to a catch record); and percent_samp records what percentage of the total catch from the site was sampled.

Each sample produces a test diameter frequency that is held in the table t_test_dia (Table 3). Each record in this table contains a diameter class (stored by the attribute test_dia) and the number of kina that are in that class (stored by the attribute no_t).

A further 25 kina are taken from the sample for detailed biological analysis. The details of these biologicals are held in the table t_biological (Table 4). The attributes of this table are all reasonably self-explanatory - whole_weight records the whole weight (in grams) of the kina; test_diameter records the diameter (to the nearest millimetre) of the kina's test; gonad_volume records the volume (in millilitres) of the kina gonad; and gonad_colour records a numeric code (up to two digits) for the colour (and hence the quality) of the gonad. Note that SOP samples only have whole number codes for colour, while Research samples have decimals for increased precision.

3.2.3 Grid references

There is an additional table, t_grid_ref (Table 5), which has two roles in this database.

The first is to act as a coarse error check for invalid grid references. Before a catch or a sample record can be entered into their respective tables, the grid reference contained in the record must exist in t_grid_ref before it can be entered. Also, many of the possible grid references are invalid for such reasons as they occur completely on land or on water that is too deep for divers to reach.

The attribute comment} is used to distinguish between valid and invalid grid references. A null in the comment field means a valid grid reference; otherwise the comment becomes an error message such as "Grid reference on land!".

The second role that t_grid_ref has to play is to provide a link between the SOP samples and the Research samples by recording which Research stratum number each grid reference belongs to.

Note that not all grid references have a stratum number assigned to them, e.g., where grid references are on land.


Updated : 16 November 2007