3 Data Structures

3.1 Table relationships

This database contains several tables. The ERD for hmetal (Figure 1) shows the logical structure1 of the database and it's entities (each entity is implemented as a database table) and relationships between these tables and tables in other databases. This schema is valid regardless of the database system chosen, and it can remain correct even if the Database Management System (DBMS) is changed. Each table represents an object, event, or concept in the real world that is selected to be represented in the database. Each attribute of a table is a defining property or quality of the table.

All of the table's attributes are shown in the ERD. The underlined attributes represent the table's primary key [A primary key is an attribute or a combination of attributes that contains an unique value to identify that record ].

Note that Figure 1 shows the main tables only. Note that most tables contain foreign keys [A foreign key is an attribute or a combination of attributes that is a primary key in another table.]. These foreign keys define the relationships between the tables in hmetal.

The hmetal database is implemented as a relational database; i.e., each table is a special case of the mathematical construct known as a relation and hence elementary relation theory is used to deal with the data within tables and the relationships between them. There are three types of relationships possible between tables, but only one exists in hmetal: one-to-many [A one-to-many relationship is where one record (the parent) in a table relates to one or many records (the child) in another table; e.g., one site in sites can have many fish samples in fish_samples but one fish sample can only come]. These relationships can be seen in ERDs by connecting a single line (indicating "many") from the child table; e.g., fish_samples, to the parent table; e.g., sites, with an arrowhead (indicating "one") pointing to the parent. Note that the word "many" applies to the possible number of records in one table that one record in another table is associated with. For a given instance, there might be zero, one, two, or more associated records, but if it is ever possible to have more than one, we use the word "many" to describe the association.

Every relationship has a mandatory or optional aspect to it. If a relationship is mandatory, then it has to occur at least once, while an optional relationship might not occur at all. For example, in Figure 1, consider that relationship between the table sites and it's child table fish_samples. The symbol "O" by the child fish_samples means that a catch site can have zero or many sample records, while the bar by the parent sites means that for every sample there must be a matching catch site record.

These links are enforced by referential constraints [Also known as integrity checks.]. Constraints do not allow orphans to exist in any table; i.e., where a child record exists without a related parent record. This may happen when: a parent record is deleted; the parent record is altered so the relationship is lost; or a child record is entered without a parent record 1 Also known as a database schema.

Constraints are shown in the table listings by the following format:

Referential:

constraint name (attribute[, attribute])

|INSERT|

|DELETE|


parent table (attribute[, attribute])




Note that the typographical convention for the above format is that square brackets [] may contain more than one item or none at all. Items stacked between vertical lines | | are options of which one must be chosen.

For example, consider the following constraint found in the table fish_samples:

Referential:

Invalid sample location (samp_location)

INSERT


sites (samp_location)




This means that the value of the attribute samp_location in the current record must already exist in the parent table sites or the record will be rejected and the following message will be displayed:

*** User Error: insert constraint "Invalid sample location" violation

For tables residing in external databases, the parent table name will be prefixed by the name of the database.

Section 5 lists all the hmetal tables as implemented by the Empress RDBMS. As can be seen in the listing of the tables, a table's primary key has an unique index on it. Primary keys are generally listed using the following format:

Indices: UNIQUE index_name ON (attribute[, attribute])

where attribute(s) make up the primary key and the index name is the primary key name. These prevent records with duplicate keys from being inserted into the tables; e.g., a record with an existing area code.

The database listing (Tables 1-5) show that the tables also have indices on many attributes. That is, attributes that are most likely to be used as a searching key have like values linked together to speed up searches. These indices are listed using the following format:

Indices: NORMAL (2, 15) index_name ON ( attribute[, attribute])

Note that indices may be simple, pointing to one attribute or composite pointing to more than one attribute. The numbers "...(2, 15)..." in the syntax are Empress RDBMS default values relating to the amount of space allocated for the index.

3.2 Database design

The hmetal database is constructed around two main entities: a sample site and a fish sample. A third entity has been added to record a spatial component to these data.

The first table sites (Table 1) contains area codes and location details of sites from which fish samples were taken for analysis.

In order to allow spatial analysis of the data, a third table of positions areas (Table 2) was created.

This table assigns each of the sample sites with a latitude and longitude based on the value of the area attribute.

Individual fish samples are recorded in the fish_samples table (Table 3). Details recorded include sample date and location, species code, fish length, weight, sex, and amounts of various organochlorine insecticides and heavy metals such as DDT, zinc, arsenic, lead, and mercury. All amounts were recorded in as total mg per kg-1 muscle tissue. Fish samples are linked to the sites table by the attribute samp_location.

A view v_main (Figure 2) allows users to perform a denormalised multi-table join on all these tables to select complete records, which include details fish samples, their sample sites, and positions.



Updated : 16 November 2007