3 Data Structures

3.1 Table relationships

This database contains several tables. The ERD for CTD (Figure 1) shows the logical structure [Also known as a database schema.] of the database and it's entities (each entity is implemented as a database table) and relationships between these tables and tables in other databases. This schema is valid regardless of the database system chosen, and it can remain correct even if the Database Management System (DBMS) is changed. Each table represents an object, event, or concept in the real world that has been selected to be represented in the database. Each attribute of a table is a defining property or quality of the table. All of the table's attributes are shown in the ERD. The underlined attributes represent the table's primary key[A primary key is an attribute or a combination of attributes that contains an unique value to identify that record.].

Note that Figure 1 shows the main tables only. Note that most tables contain foreign keys [A foreign key is an attribute or a combination of attributes that is a primary key in another table.]. These foreign keys define the relationships between the tables in CTD.

The CTD database is implemented as a relational database; i.e., each table is a special case of the mathematical construct known as a relation and hence elementary relation theory is used to deal with the data within tables and the relationships between them. There are three types of relationships possible between tables, but only one exists in CTD: one-to-many [A one-to-many relationship is where one record (the parent) in a table relates to one or many records (the child) in another table; e.g., one trip in t_trip can have many stations in t_station but one station can only come from one trip.]. These relationships can be seen in ERDs by connecting a single line (indicating "many") from the child table; e.g., t_bottle_casts, to the parent table; e.g., t_station, with an arrowhead (indicating "one") pointing to the parent.

Every relationship has a mandatory or optional aspect to it. If a relationship is mandatory, then it has to occur at least once, while an optional relationship might not occur at all. For example, in Figure 1, consider that relationship between the table t_station and it's child table t_bottle_casts.

The symbol "0" by the child t_bottle_casts means that a station can have zero or many bottle cast records, while the bar by the parent t_station means that for every bottle cast record there must be a matching station record.

These links are enforced by referential constraints [Also known as integrity checks.]. Constraints do not allow orphans to exist in any table; i.e., where a child record exists without a related parent record. This may happen when: a parent record is deleted; the parent record is altered so the relationship is lost; or a child record is entered without a parent record Constraints are shown in the table listings by the following format:

Referential:

constraint name (attribute[, attribute])

|INSERT|

|DELETE|


parent table (attribute[, attribute])




Note that the typographical convention for the above format is that square brackets [] may contain more than one item or none at all. Items stacked between vertical lines | | are options of which one must be chosen.

For example, consider the following constraint found in the table t_station:



Referential:

voycheck (trip_code)

INSERT t_trip (trip_code)



This means that the value of the attribute trip_code in the current record must already exist in the parent table t_trip or the record will be rejected and the following message will be displayed:



*** User Error:

insert constraint "voycheck"

violation



For tables residing in external databases, the parent table name will be prefixed by the name of the database.

Section 5 lists all the CTD tables as implemented by the Empress RDBMS. As can be seen in the listing of the tables, a table's primary key has an unique index on it which means that no duplicate values or null values are allowed for the attribute(s). Primary keys are generally listed using either of the following formats:



Indices:

UNIQUE index_name ON

(attribute[, attribute])

Or:

Indices:

PRIMARY KEY BTREE ON

(attribute[, attribute])



where attribute(s) make up the primary key and the index name is the primary key name. These prevent records with duplicate keys from being inserted into the tables; e.g., a record with an existing sample number.

The database listing (Tables 1-5) show that the tables also have indices on many attributes. That is, attributes that are most likely to be used as a searching key have like values linked together to speed up searches. These indices are listed using the following format:



Indices:

NORMAL (2, 15) index_name ON

(attribute[, attribute])



Note that indices may be simple, pointing to one attribute or composite pointing to more than one attribute. The numbers "...(2, 15)...n the syntax are Empress RDBMS default values relating to the amount of space allocated for the index.

3.2 Database design

Physical oceanography data are collected from casts [A cast is an event where oceanographic instruments are deployed to obtain measurements at depth.] conducted at stations during a trip.

The table t_trip (Table 1) lists trips during which oceanographic data were collected, with a brief description and the processing status for each voyage. The processing status indicates whether all relevant data that can be recovered from a voyage have been loaded into the database.

There is generally only one cast at each station, but occasionally, such as when there are comparisons between instruments or where there are gear problems, there can be more than one.

For each cast data such as station position, date, time, gear methods used, bottom depth, maximum probe depth and mixed layer depth are recorded. These data are contained in table t_station (Table 2).Records exist on t_station for almost every trip in t_trip. However, there are trips from which the data could not be recovered so there are no records on t_station for these.

The type, amount and accuracy of data collected depend on the gear used in a cast. Vertical profiles of temperature can be obtained from each cast. Mechanical bathythermographs (MBTs) and expendable bathythermographs (XBTs) give continuous temperature profiles. From casts where a ctd probe is used or where water samples are collected, vertical profiles of salinity as well a temperature can be obtained.

The gear methods used in a cast are held in gear_meth on t_station. This attribute is defined as a string of up to three two-digit code numbers, each code number being equal to and having the same meaning as a value of code in table meth_codes, the method codes table in the rdb database. This allows for situations where different gear methods were used during one cast. However, gear_meth cannot be a foreign key to meth_codes because it is defined as a string and may contain more than one code. This is a departure from third normal form. It would be possible to create a table with one record for each cast and gear method, but there would be difficulty in the selection of data where gear methods were specified and more than one method was used in a cast.

If a method of position fixing is known for a station the attribute fix_method must be a valid code equal to a value of fix_method_code in the table t_fix_method_codes on the rdb database. Similarly, if an area exists on a record on t_station, then it must be a valid area code as listed in the table area_codes on the rdb database.

CTD probes provide effectively continuous temperature and conductivity data throughout a cast. Summarised temperature, conductivity, salinity, specific gravity anomaly ( t s ) and speed of sound data at 1-metre depth intervals are on table t_metres (Table 3). This table is linked to t_station and has a primary key of trip_code, station_no, cast, and depth. It is a large table, with over 300,000 records. In many cases it is sufficient to obtain data from standard depths (ie. 10, 20, 30, 50, 75, 100, 125, 150, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1750 and 2000 metres). To obtain these data more quickly a view, v_std_depths (View 3.1), has been created on the table t_metres that contains the same data as t_metres but only at the standard depths listed above.

Discrete temperature and salinity data can be obtained by bottle hydrocasts. These casts are listed in t_bottle_casts (Table 4), which is linked to the table t_station and has a primary key of trip_code, station_no, and cast. Temperature data are collected using reversing thermometers. The mercury column of a reversing thermometer is broken by suddenly flipping the thermometer when it has reached the desired depth, thereby recording temperature at that depth. On many casts a ctd probe was also deployed so the more accurate temperature measurements would have come from the ctd. The more accurate method used to measure temperature in each bottle cast is held in t_bottle_casts. The accuracy of the measurement of the salinity (or conductivity) of water samples depends on the salinometer used, so this information is also contained in the table.

Temperature and conductivity values in bottle casts and the method for measuring depth at which each sample was taken are listed in t_bot_values (Table 5). This table has a primary key of trip_code, station_no, cast, depth and sample, and is linked to the table t_bottle_casts. The accuracy of measurement of depth depends on the method by which depth was determined. The method of measuring depth may change during a cast so each record on t_bot_values has a code for depth measurement method. These methods are discussed in Appendix A. If a ctd probe was used with bottles, then ctd temperature and conductivity values corresponding to the depths at which water samples were taken are also listed on this table as are the conductivity values of the water samples, back-calculated to the pressures and temperatures at which the samples were obtained.

For trips where casts used a ctd probe in association with bottle sampling, ctd conductivity can be calibrated against conductivity of the water samples, using values from t_bot_values. These calibrations may change between stations because occasionally the conductivity sensor response may change suddenly. For this reason t_station has a foreign key, cond_cal_no, to table t_cond_cals (Table 6). The table t_cond_cals lists the date and method of each calibration of each conductivity sensor and is linked to t_con_coeffs (Table 7), which shows the conductivity correction coefficients derived from each calibration.

Pressure and temperature sensors are periodically calibrated in a laboratory. The correction coefficients from these calibrations are applied to the pressure and temperature as measured by the CTD probe when the data are being initially processed.

If a CTD probe was used in the cast, pressure and temperature calibration numbers (press_cal_no and temp_cal_no) will be listed. These are foreign keys to t_press_cals (Table 8) and t_temp_cals (Table 9) respectively. These tables provide information on calibration date and method and provide links to the tables t_pre_coeffs (Table 10) and t_tem_coeffs (Table 11), which list the pressure and temperature correction coefficients respectively. These coefficients are used to correct raw CTD pressure and temperature data when they are being summarised.



Updated : 16 November 2007