3 Data Structures
3.1 Table relationships
The acoustic database comprises various related tables (see Figure 1 for an ERD of the entire database). The overall structure is centered on the files table (see Figure 2), with foreign key relationships to the tables associated with experiments (Figure 4), equipment (Figure 3), echo integration (Figure 5). In addition there are some temporary tables used to store information used during data loading, shown in Figure 6. All of the table attributes are shown in the ERDs.
Figure 1: ERD of entire database
(click to enlarge)
Each table represents an object, event, or concept in the real world that has been selected to be represented in the database. Each attribute of a table is a defining property or quality of the table.
The acoustic database is implemented as a relational database, and has one fundamental relationship that is repeated throughout the database — the one-to-many relationship. This is shown in the ERD by connecting a single line (indicating ’one’) from the parent table to the child table with a crows foot (indicating ’many’). For example, consider the relationship between experiments and files. This means that any one record in experiments relates to or more records in files, and one record in files must relate to only one record in experiments. The child end is indicated by a ‘crows-foot’ symbol.
Note that the one-to-many relationships can be either mandatory or optional. The optional relationship, denoted in the ERD by the symbol ’o’ at one or both ends of the relationship line, means that a record does not have to have any associated records. Conversely, the mandatory relationship denoted in the ERD by a bar symbol across the relationship line, means that a record has to have at least one associated record. For example, an experiment in experiments does not have to have a log book, but if it does, it can have many. Conversely a log book, in logbooks must relate to a record in experiments.
A key symbol to the left of the attribute name represents the primary key for the table.
Most of the tables in the acoustic database have some attributes, called foreign keys1, which define the relationships between the tables in the acoustic database.
Figure 2: ERD for the files and related tables
(click to enlarge)
Figure 3: ERD for the tables associated with the echosounder equipment
(click to enlarge)
Figure 4: ERD for the tables associated with an experiment
(click to enlarge)
Figure 5: ERD for the tables associated with echo integration analysis of the acoustic data
(click to enlarge)
Figure 6: ERD for temporary tables used to provide information during data loading, as well as some tables that contain global information.
(click to enlarge)
3.2 Database Design
The acoustic database was originally implemented as an object-oriented database using the ObjectStore product from eXcelon Corporation. However, for a number of reasons, the database has been moved to a relational model on the PostgreSQL database management system. Care has been taken to only use ‘standard’ database features to allow the database to be easily moved to other systems if required. In the process, the database structure was modified to incorporate a more complete set of the ancillary data that is necessary to interpret the raw acoustic data. This includes the results of calibration of the acoustic equipment, details on the acoustic equipment, water property information from survey areas, and details on the acoustic echo partitioning done as part of the production of biomass estimates from the acoustic data. The intention is to have a database that contains all the information required to correctly interpret the acoustic data without recourse to external sources of information.
The database uses several concepts to structure the acoustic data, and to provide intuitive access pathways for the database user. An experiment uses equipment, collects data, is performed during a certain time period and is performed in a certain location. In the past, an experiment has been taken to be synonymous with a voyage of a vessel conducting a survey. However, surveys have extended over several voyages. Equally, several experiments have been carried out during a single voyage. Data collection has always been organised primarily around voyages and stored in the first instance as sets of directories and files in a voyage directory. This is a convenient organisation for voyage oriented processing tasks such as the production of biomass estimates; it is easy to understand and relates in a straightforward manner to other fisheries data (e.g. trawl data) which are also organised around voyages. The database structure has been designed to reflect this useful organisational structure, but to also allow access to the data in other ways that suit other analyses.
The concept of an experiment is represented in the database with an experiment table, and that of a file of acoustic data with a file table. There is a one-to-many relationship between the experiment and file tables. The equipment that was used to collect a file of acoustic data is represented with an equipment table, and a one-tomany relationship is maintained between the experiment table and the file table. All other tables in the database are directly or indirectly connected to these three tables.
Aside from the actual acoustic data collected for each ping of the echosounder, a set of ancillary data is also collected. These data are stored in the ping table, and include such data as the vessel and echosounder latitude and longitude, the sea and air temperature, etc. Periodically, the echosounder operator can enter electronic notes into the data collection system and these are stored in the filenote table. Both the ping and filenote tables have many-to-one relationships with the file table.
The components that make up an echosounder are represented in the components table, which covers both the physical equipment that constituted the echosounder and the software settings that define its behaviour. This information is represented as (parameter, value) pairs, with a classification applied to the parameter values to allow selection of different parts of the echosounder. There is a one-to-many relationship between the equipment and components table. The value for some of the parameters has special meaning and two auxiliary tables exist to provide further data. These are the transducer and padcs table. For example, if component id is ’transducer id’, the corresponding component_value will exist in the transducers table as the transducer id column.
Each acoustic data file that is collected can potentially have a calibration associated with it. These are usually obtained from experiments conducted with acoustic targets of known reflectively and are typically done before, during, and after a voyage of experiment. See [Coombs et al. 2003] for details on the procedures used. The resulting G, Vt, and C values are stored in the slplussrt table, and a one-to-many relationship is maintained between this table and the equipment table.
The raw acoustic data collected by an echosounder is voluminous, and is usually only of interest when viewed with a software programme designed for visualising and analysing such data. For these reasons, the actual acoustic echo data is not stored in the database, but as files external to the database management system. The DBMS stores all other data pertaining to the acoustic data.
3.3 Biomass estimation
Producing biomass estimates from acoustic survey data requires that echoes from fish close to the seabed be separated from echoes from the bottom, and that aggregations of fish be marked by a region, in which echo integration is done. The bottoms and regions produced during an analysis are stored in the database, along with information on how they are to be combined to produce the biomass estimates. The tables associated with this section of the database link to the rest of the database via the files table only.
3.4 Spatial attributes
The core database design does not include any spatial attributes, even though much of the data has a spatial component - latitude/longitude pairs are stored as plain floating point numbers. However, the PostGIS extensions available in the PostgreSQL database have been used to add a spatial element to the acoustic database, and hence to provide the means for spatially aware software packages to treat the acoustic data in a spatial manner. In particular, the files table has an extra column containing a linestring datatype that stores the start and finish position of the data in a file. An extra table (files_transect_detailed) has been created that contains the ping-by-ping position collected by acoustic echosounders in the form of a linestring. Another table (files_transect_simplified) contains a simplified, but still representative, version of this line obtained by using the Douglas-Peuker algorithm. Several views have been defined that combine these detailed and simplified tables and the data in the files table (note that all view names in the database start with v_ ). Details on the extra tables and views are contained in Section 5.
A foreign key is any attribute, or a combination of attributes, in a table that is a primary key of another table. Tables are linked together through foreign keys