3 Data Structures
3.1 Table relationships
This database contains several tables. The ERD for CTD
(Figure 1) shows the logical structure [Also known as a database
schema.] of the database and it's entities (each entity is
implemented as a database table) and relationships between these
tables and tables in other databases. This schema is valid regardless
of the database system chosen, and it can remain correct even if the
Database Management System (DBMS) is changed. Each table represents
an object, event, or concept in the real world that has been selected
to be represented in the database. Each attribute of a table is a
defining property or quality of the table. All of the table's
attributes are shown in the ERD. The underlined attributes represent
the table's primary key[A primary key is an attribute or a
combination of attributes that contains an unique value to identify
that record.].
Note that Figure 1 shows the main tables only. Note that most
tables contain foreign keys [A foreign key is an attribute or a
combination of attributes that is a primary key in another table.].
These foreign keys define the relationships between the tables in
CTD.
The CTD database is implemented as a relational database;
i.e., each table is a special case of the mathematical construct
known as a relation and hence elementary relation theory is used to
deal with the data within tables and the relationships between them.
There are three types of relationships possible between tables, but
only one exists in CTD: one-to-many [A one-to-many
relationship is where one record (the parent) in a table relates to
one or many records (the child) in another table; e.g., one trip in
t_trip can have many stations in t_station but one station can only
come from one trip.]. These relationships can be seen in ERDs by
connecting a single line (indicating "many") from the child
table; e.g., t_bottle_casts, to the parent table; e.g., t_station,
with an arrowhead (indicating "one") pointing to the
parent.
Every relationship has a mandatory or optional aspect to it. If a
relationship is mandatory, then it has to occur at least once, while
an optional relationship might not occur at all. For example, in
Figure 1, consider that relationship between the table t_station and
it's child table t_bottle_casts.
The symbol "0" by the child t_bottle_casts means that a
station can have zero or many bottle cast records, while the bar by
the parent t_station means that for every bottle cast record there
must be a matching station record.
These links are enforced by referential constraints [Also known as
integrity checks.]. Constraints do not allow orphans to exist in any
table; i.e., where a child record exists without a related parent
record. This may happen when: a parent record is deleted; the parent
record is altered so the relationship is lost; or a child record is
entered without a parent record Constraints are shown in the table
listings by the following format:
|
Referential:
|
constraint name (attribute[,
attribute])
|
|INSERT|
|DELETE|
|
|
|
parent table (attribute[, attribute])
|
|
Note that the typographical convention for the above format is
that square brackets [] may contain more than one item or none at
all. Items stacked between vertical lines | | are options of which
one must be chosen.
For example, consider the following constraint found in the table
t_station:
|
Referential:
|
voycheck (trip_code)
|
INSERT t_trip (trip_code)
|
This means that the value of the attribute trip_code in the
current record must already exist in the parent table t_trip or the
record will be rejected and the following message will be displayed:
|
*** User Error:
|
insert constraint "voycheck"
|
violation
|
For tables residing in external databases, the parent table name
will be prefixed by the name of the database.
Section 5 lists all the CTD tables as implemented by the
Empress RDBMS. As can be seen in the listing of the tables, a table's
primary key has an unique index on it which means that no duplicate
values or null values are allowed for the attribute(s). Primary keys
are generally listed using either of the following formats:
|
Indices:
|
UNIQUE index_name ON
|
(attribute[, attribute])
|
Or:
|
Indices:
|
PRIMARY KEY BTREE ON
|
(attribute[, attribute])
|
where attribute(s) make up the primary key and the index name is
the primary key name. These prevent records with duplicate keys from
being inserted into the tables; e.g., a record with an existing
sample number.
The database listing (Tables 1-5) show that the tables also have
indices on many attributes. That is, attributes that are most likely
to be used as a searching key have like values linked together to
speed up searches. These indices are listed using the following
format:
|
Indices:
|
NORMAL (2, 15) index_name ON
|
(attribute[, attribute])
|
Note that indices may be simple, pointing to one attribute or
composite pointing to more than one attribute. The numbers "...(2,
15)...n the syntax are Empress RDBMS default values relating to the
amount of space allocated for the index.
3.2 Database design
Physical oceanography data are collected from casts [A cast is an
event where oceanographic instruments are deployed to obtain
measurements at depth.] conducted at stations during a trip.
The table t_trip (Table 1) lists trips during which oceanographic
data were collected, with a brief description and the processing
status for each voyage. The processing status indicates whether all
relevant data that can be recovered from a voyage have been loaded
into the database.
There is generally only one cast at each station, but
occasionally, such as when there are comparisons between instruments
or where there are gear problems, there can be more than one.
For each cast data such as station position, date, time, gear
methods used, bottom depth, maximum probe depth and mixed layer depth
are recorded. These data are contained in table t_station (Table
2).Records exist on t_station for almost every trip in t_trip.
However, there are trips from which the data could not be recovered
so there are no records on t_station for these.
The type, amount and accuracy of data collected depend on the gear
used in a cast. Vertical profiles of temperature can be obtained from
each cast. Mechanical bathythermographs (MBTs) and expendable
bathythermographs (XBTs) give continuous temperature profiles. From
casts where a ctd probe is
used or where water samples are collected, vertical profiles of
salinity as well a temperature can be obtained.
The gear methods used in a cast are held in gear_meth on
t_station. This attribute is defined as a string of up to three
two-digit code numbers, each code number being equal to and having
the same meaning as a value of code in table meth_codes, the method
codes table in the rdb database. This allows for situations where
different gear methods were used during one cast. However, gear_meth
cannot be a foreign key to meth_codes because it is defined as a
string and may contain more than one code. This is a departure from
third normal form. It would be possible to create a table with one
record for each cast and gear method, but there would be difficulty
in the selection of data where gear methods were specified and more
than one method was used in a cast.
If a method of position fixing is known for a station the
attribute fix_method must be a valid code equal to a value of
fix_method_code in the table t_fix_method_codes on the rdb database.
Similarly, if an area exists on a record on t_station, then it must
be a valid area code as listed in the table area_codes on the rdb
database.
CTD probes provide
effectively continuous temperature and conductivity data throughout a
cast. Summarised temperature, conductivity, salinity, specific
gravity anomaly ( t s ) and speed of sound data at 1-metre depth
intervals are on table t_metres (Table 3). This table is linked to
t_station and has a primary key of trip_code, station_no, cast, and
depth. It is a large table, with over 300,000 records. In many cases
it is sufficient to obtain data from standard depths (ie. 10, 20, 30,
50, 75, 100, 125, 150, 200, 250, 300, 400, 500, 600, 700, 800, 900,
1000, 1100, 1200, 1300, 1400, 1500, 1750 and 2000 metres). To obtain
these data more quickly a view, v_std_depths (View 3.1), has been
created on the table t_metres that contains the same data as t_metres
but only at the standard depths listed above.
Discrete temperature and salinity data can be obtained by bottle
hydrocasts. These casts are listed in t_bottle_casts (Table 4), which
is linked to the table t_station and has a primary key of trip_code,
station_no, and cast. Temperature data are collected using reversing
thermometers. The mercury column of a reversing thermometer is broken
by suddenly flipping the thermometer when it has reached the desired
depth, thereby recording temperature at that depth. On many casts a
ctd probe was also deployed
so the more accurate temperature measurements would have come from
the ctd. The more accurate
method used to measure temperature in each bottle cast is held in
t_bottle_casts. The accuracy of the measurement of the salinity (or
conductivity) of water samples depends on the salinometer used, so
this information is also contained in the table.
Temperature and conductivity values in bottle casts and the method
for measuring depth at which each sample was taken are listed in
t_bot_values (Table 5). This table has a primary key of trip_code,
station_no, cast, depth and sample, and is linked to the table
t_bottle_casts. The accuracy of measurement of depth depends on the
method by which depth was determined. The method of measuring depth
may change during a cast so each record on t_bot_values has a code
for depth measurement method. These methods are discussed in Appendix
A. If a ctd probe was used
with bottles, then ctd
temperature and conductivity values corresponding to the depths at
which water samples were taken are also listed on this table as are
the conductivity values of the water samples, back-calculated to the
pressures and temperatures at which the samples were obtained.
For trips where casts used a ctd
probe in association with bottle sampling, ctd
conductivity can be calibrated against conductivity of the water
samples, using values from t_bot_values. These calibrations may
change between stations because occasionally the conductivity sensor
response may change suddenly. For this reason t_station has a foreign
key, cond_cal_no, to table t_cond_cals (Table 6). The table
t_cond_cals lists the date and method of each calibration of each
conductivity sensor and is linked to t_con_coeffs (Table 7), which
shows the conductivity correction coefficients derived from each
calibration.
Pressure and temperature sensors are periodically calibrated in a
laboratory. The correction coefficients from these calibrations are
applied to the pressure and temperature as measured by the CTD
probe when the data are being initially processed.
If a CTD probe was used in the cast, pressure and
temperature calibration numbers (press_cal_no and temp_cal_no) will
be listed. These are foreign keys to t_press_cals (Table 8) and
t_temp_cals (Table 9) respectively. These tables provide information
on calibration date and method and provide links to the tables
t_pre_coeffs (Table 10) and t_tem_coeffs (Table 11), which list the
pressure and temperature correction coefficients respectively. These
coefficients are used to correct raw CTD pressure and
temperature data when they are being summarised.