4. Database

This chapter explains how to work with a Geopsy database. Creating a database on disk is not mandatory: you can use Geopsy to view and process signals on a single-file basis. However, you would then miss a number of useful features, such as grouping of signals and automatic storage of modified signals.

This chapter contains the following sections:

A bit of theory: the internal database structure

The Geopsy database is not built on an existing database engine such as MySQL. It consists only of a list of signals. A signal is a vector of numbers (double-precision floating-point real numbers, 64 bits) documented by a collection of information fields, typically the information extracted from file headers.

Information fields of a signal
Field name Type Description
Component string Name of the component, it must be one of the following keywords: Vertical, North or East
Count2Volt double The conversion factor between counts and volts: counts are divided by this factor to obtain volts. This value is read from most signal files; by default it is 1.0. When the factor equals unity, the amplitude scale of the signal is assumed to be in counts, otherwise in volts. This parameter is read-only. Conversions to acceleration, velocity or displacement are currently not handled. Future versions of the database will probably include this feature, with a secondary table containing the responses of common sensors.
DeltaT double The sampling period, expressed in seconds. This property, as well as SampFreq, can be modified. However, make sure that it corresponds exactly to the true sampling rate of the recording.
DupAverage{1...10} double This parameter is useful for travel-time tomography analysis. n varies from 1 to 10. For phase picking (see the Pick fields), the travel time from a source to a receiver should equal the travel time from that receiver to that source observed on another signal. Such pairs of picks are identified by a unique DupID. The average and the standard deviation of each pair are stored in DupAverage{1...10} and DupStdErr{1...10}, respectively. This parameter is read-only.
DupID integer This parameter is useful for travel-time tomography analysis. It uniquely identifies a pair of signals with the same ray path, in which source and receiver are swapped. This parameter is read-only.
DupStdErr{1...10} double This parameter is useful for travel-time tomography analysis. n varies from 1 to 10. See DupAverage{1...10} for details.
Duration double Time elapsed between the first and the last sample of the signal (in seconds). This parameter is not saved in the structure but calculated from DeltaT and NSamples. This parameter is read-only.
EndTime double Time of the last sample of the signal relative to the time reference (in seconds). This parameter is not saved in the structure but calculated from T0 and Duration (EndTime = T0 + Duration). This parameter is read-only.
FileName string The complete name of the signal file to which the signal belongs, including its path. This parameter is read-only.
FileNumber integer The number assigned to the signal file to which the signal belongs. This number depends on the order in which files are loaded into the database. This parameter is read-only.
ID integer Unique number to reference the signal, used by groups. This value is read-only.
IsOriginalFile string Contains "Original" if the signal samples have not been modified by any signal processing; otherwise the field is blank. See saving a database for details. This value is read-only.
MaxAmplitude double The maximum amplitude reached by the signal over its whole Duration. Its unit depends on the value of Count2Volt. Unlike the other fields, computing this value requires the signal samples to be loaded into memory, so using this field may slow Geopsy down. We advise using it only when necessary. This value is read-only.
Name string Arbitrary name to identify the signal; it is usually set to the name of the recording station.
NSamples integer The number of samples in the signal. This value is read-only. You can change the length of a signal by cutting it.
NumberInFile integer A signal file may contain several signals. This parameter is the index of this signal in its file. This parameter is read-only.
Pick[ ] double A time value that can be modified by the user, either by editing the field or by picking phases with the mouse on a graphic representation of the signal. This parameter is useful for travel-time tomography analysis, to define the time limits of a taper or of a signal cut, or for any processing that requires phase picking. These values are also frequently used as temporary storage when editing headers.
Receiver{X,Y,Z} double The coordinates of the receiver where the signal was recorded (Cartesian system expressed in metres).
SampFreq double The sampling frequency, expressed in Hz. This parameter is not saved in the structure but calculated from DeltaT. You can modify it; DeltaT is then changed accordingly.
SignalPtr address The memory address of the block containing the signal information. This parameter is read-only and reserved for debugging purposes.
Source{X,Y,Z} double The coordinates of the source for which the signal was recorded (Cartesian system expressed in metres). These fields are relevant to records where the source is clearly identified. They are generally useful for refraction and travel-time tomography analysis.
T0 double The delay (in seconds) between the time reference and the first sample of the signal. It can be either positive or negative. Various formats are accepted: "[...]ss s", "[...]ss", or "". For all formats, a '-' sign can be added as a prefix.
TimeReference string The time reference, with the format "DD/MM/YYYY hh:mm:ss". All signals recorded synchronously must have the same time reference. T0 accounts for the distinct start-up times with arbitrary precision (the time reference itself is limited to seconds). A good practice is to set the time reference to midnight on the day of acquisition (e.g. 19/05/2005 00:00:00). All T0 values are then the number of seconds since the beginning of the day, and the visualisation modules can convert them into "hh:mm:ss", which corresponds to the true time of measurement.
Type char A single character that records the current type of the signal: 'w' for waveforms, 's' for frequency spectra, and 't' for arrival times without a signal. The type is read-only: you cannot modify it directly. It changes from 'w' to 's', and vice versa, when a Fourier transform is applied.

Notes: "double" means double floating point real numbers coded on 64 bits, "integer" means a positive or negative integer, "string" means any string of characters, and "char" is a single character. The fields marked in bold represent the most important parameters that must be correctly defined to allow a visualisation of the signal.
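The relations the table states between stored and derived fields can be summarised with a small Python sketch. The class and attribute names are hypothetical (this is not the geopsycore API), and Duration is assumed to be (NSamples - 1) × DeltaT, following its definition as the time elapsed between the first and the last sample.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SignalHeader:
    """Hypothetical mirror of a few Geopsy signal fields (not the geopsycore API)."""
    name: str
    component: str            # must be "Vertical", "North" or "East"
    delta_t: float            # DeltaT: sampling period in seconds (stored)
    n_samples: int            # NSamples (stored)
    t0: float                 # T0: delay in seconds from the time reference
    time_reference: str       # TimeReference, "DD/MM/YYYY hh:mm:ss"
    count2volt: float = 1.0   # volts = counts / Count2Volt

    def __post_init__(self):
        if self.component not in ("Vertical", "North", "East"):
            raise ValueError(f"invalid component: {self.component!r}")

    @property
    def samp_freq(self) -> float:
        # SampFreq is not stored: it is derived from DeltaT.
        return 1.0 / self.delta_t

    @samp_freq.setter
    def samp_freq(self, hz: float) -> None:
        # Modifying SampFreq changes DeltaT accordingly.
        self.delta_t = 1.0 / hz

    @property
    def duration(self) -> float:
        # Assumption: time between first and last sample = (NSamples - 1) * DeltaT.
        return (self.n_samples - 1) * self.delta_t

    @property
    def end_time(self) -> float:
        # EndTime is derived from T0 and Duration.
        return self.t0 + self.duration

    @property
    def is_volts(self) -> bool:
        # A factor of exactly 1.0 means amplitudes are treated as counts.
        return self.count2volt != 1.0

    def to_volts(self, counts: float) -> float:
        # Counts are divided by Count2Volt to obtain volts.
        return counts / self.count2volt

    def start_datetime(self) -> datetime:
        # Absolute time of the first sample: TimeReference + T0.
        ref = datetime.strptime(self.time_reference, "%d/%m/%Y %H:%M:%S")
        return ref + timedelta(seconds=self.t0)
```

With the time reference set to midnight, a signal starting at T0 = 36000.5 s begins at 10:00:00.5 on the acquisition day, and only DeltaT, NSamples and T0 need to be stored to recover SampFreq, Duration and EndTime.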

When a new signal file is loaded into the database, a new memory structure is allocated for the signal and the fields listed above are filled in from the information contained in the file header. The information extracted from the file header depends upon the file format (see Load signal files).

The signal samples are not read when a file is opened, which greatly speeds up signal handling. Depending on user actions (e.g. visualisation of traces), it may be necessary to load the samples into memory. The Geopsy core engine (library geopsycore) includes a special mechanism to cache signal vectors: signals are kept in memory as long as possible, until no space is left, and are then purged according to the space needed. From the user's point of view, the first time a signal is visualised may be slower than any later access.
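The lazy loading and caching behaviour described above can be sketched as follows. This is a minimal illustration, assuming a fixed memory budget and least-recently-used eviction; the actual purge policy of geopsycore is not specified here, and `SampleCache` and its `loader` callback are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, List


class SampleCache:
    """Hypothetical lazy sample cache with a byte budget.

    Assumption: least-recently-used signals are purged first. This only
    illustrates the idea; it is not geopsycore's implementation.
    """

    BYTES_PER_SAMPLE = 8  # samples are 64-bit doubles

    def __init__(self, budget_bytes: int, loader: Callable[[int], List[float]]):
        self.budget = budget_bytes
        self.loader = loader   # hypothetical: reads samples from the original file
        self._cache: "OrderedDict[int, List[float]]" = OrderedDict()
        self._used = 0
        self.disk_reads = 0

    def samples(self, signal_id: int) -> List[float]:
        if signal_id in self._cache:
            # Fast path: already in memory; mark as most recently used.
            self._cache.move_to_end(signal_id)
            return self._cache[signal_id]
        # Slow path (first access): read the samples from disk.
        data = self.loader(signal_id)
        self.disk_reads += 1
        size = len(data) * self.BYTES_PER_SAMPLE
        # Purge least recently used signals until the new one fits
        # (a signal larger than the whole budget is still kept, alone).
        while self._cache and self._used + size > self.budget:
            _, old = self._cache.popitem(last=False)
            self._used -= len(old) * self.BYTES_PER_SAMPLE
        self._cache[signal_id] = data
        self._used += size
        return data
```

A second call to `samples()` for the same signal does not touch the disk, which matches the observation that later accesses are faster than the first one.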

Any subset of the total set of signals can be created. The information is never duplicated, because subsets are defined by pointers to the original signal structures. Subsets are visualised through tables, graphics and maps, detailed in other sections.
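The idea that subsets hold pointers rather than copies can be illustrated in Python, where object references play the role of the pointers used by geopsycore. `Signal` and `SignalSubset` are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Signal:
    """Hypothetical signal record owned by the database."""
    id: int
    name: str
    picks: List[float] = field(default_factory=list)


class SignalSubset:
    """Hypothetical subset: stores references to signals, never copies."""

    def __init__(self, signals: Iterable[Signal]):
        # Copies the list of references only; Signal objects are shared.
        self._signals = list(signals)

    def __iter__(self):
        return iter(self._signals)

    def __len__(self):
        return len(self._signals)


database = [Signal(1, "STA01"), Signal(2, "STA02"), Signal(3, "STA03")]
subset = SignalSubset(s for s in database if s.name != "STA02")

# Editing a signal through the subset modifies the database entry itself.
next(iter(subset)).picks.append(12.5)
print(database[0].picks)
```

Because the subset shares the original objects, a pick set through any table or graphic showing the subset is immediately visible everywhere else.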

Why create a database on disk?

The various signal file formats used in seismology and geophysical prospecting generally include a header containing heterogeneous information. There was a need to store, in a uniform format, the basic information useful for the data processing implemented in Geopsy (e.g. event picks, source and receiver coordinates, ...).

Some signal file formats can store several signals in a single file, others cannot. Signal processing, such as array computations, may be applied to only part of a file or to signals located in several files. There was therefore a need to group signals independently of the original file organisation. Exporting the signals of interest to a temporary file before processing is not satisfactory, because it duplicates the data on disk and risks altering information during file conversion. Furthermore, true original signals are easily confused with pre-processed signals (e.g. after filtering, DC removal, ...).

Geopsy offers an alternative with the concept of groups. A group is a list of signal IDs (identification numbers), and each group is given a name that describes its content. IDs are automatically assigned to each signal when files are loaded into Geopsy, so the assignment depends on the order in which files are loaded. The database ensures that files are always loaded in the same order each time the signals are accessed and, consequently, that each ID effectively corresponds to a unique, well-defined signal.
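Why a fixed loading order matters can be shown with a short sketch. Here IDs are assumed to start at 1 and to be assigned sequentially, signal by signal, in file order; the file names, group name and the 0-based `index_in_file` are hypothetical and only illustrate the principle.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LoadedSignal:
    id: int
    file_name: str
    index_in_file: int  # 0-based here (an assumption for this sketch)


def load_database(files: List[Tuple[str, int]]) -> Dict[int, LoadedSignal]:
    """Assign IDs sequentially following the given file order.

    `files` is a hypothetical list of (file name, number of signals in file).
    """
    signals: Dict[int, LoadedSignal] = {}
    next_id = 1
    for file_name, count in files:
        for k in range(count):
            signals[next_id] = LoadedSignal(next_id, file_name, k)
            next_id += 1
    return signals


def resolve_group(groups: Dict[str, List[int]], name: str,
                  signals: Dict[int, LoadedSignal]) -> List[LoadedSignal]:
    # A group is only a named list of IDs; resolve them to signals.
    return [signals[i] for i in groups[name]]


file_order = [("a.gse", 3), ("b.sac", 1)]   # hypothetical files
groups = {"array": [2, 4]}                  # hypothetical group

same = resolve_group(groups, "array", load_database(file_order))
swapped = resolve_group(groups, "array", load_database(file_order[::-1]))
print([(s.file_name, s.index_in_file) for s in same])
print([(s.file_name, s.index_in_file) for s in swapped])
```

With the stored order, the group points to the second signal of `a.gse` and the only signal of `b.sac`; loading the same files in reverse order silently maps the same IDs to different signals, which is exactly what the database's fixed loading order prevents.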

External signal processing tools (e.g. command-line software such as Cap) can access a Geopsy database to retrieve the signals of interest. The set of signals is generally referenced by the name of a group previously created in Geopsy's main window. The command-line tools access the signal samples without having to deal with the original file format: the Geopsy core engine handles all file access and memory allocation, which eases the development of signal-processing tools.

Each time a command-line tool is started with access to a Geopsy database, all header information is loaded into memory. This step relies only on the database's internal files: the original files are not accessed, which ensures a very quick start-up of any database, even one containing thousands of signals. The first versions of Geopsy required reading the file headers, which can be slow (e.g. GSE files containing multiple signals).
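The start-up path of a command-line tool can be sketched as reading only a header index and resolving a group name to header records, without opening any original signal file. JSON is used here purely as a stand-in for Geopsy's own internal database files, whose real layout differs; the functions, paths and group name are hypothetical.

```python
import json
from typing import Dict, List


def open_database(index_text: str) -> Dict:
    """Parse a hypothetical JSON header index (stand-in for Geopsy's files).

    Only header fields are read; original signal files are never opened.
    """
    return json.loads(index_text)


def group_headers(db: Dict, group: str) -> List[Dict]:
    # Resolve a group name to the header records of its signal IDs.
    by_id = {h["ID"]: h for h in db["signals"]}
    return [by_id[i] for i in db["groups"][group]]


index = json.dumps({
    "signals": [
        {"ID": 1, "Name": "STA01", "FileName": "/data/a.gse",
         "DeltaT": 0.01, "NSamples": 1001},
        {"ID": 2, "Name": "STA02", "FileName": "/data/a.gse",
         "DeltaT": 0.01, "NSamples": 1001},
    ],
    "groups": {"array": [2]},
})

db = open_database(index)
print([h["Name"] for h in group_headers(db, "array")])
```

Samples would only be read later, on demand, through the `FileName` stored in each header.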

The file structure of a Geopsy database is described in section File structure.