Difference between revisions of "Geopsy: Custom ASCII formats"
Latest revision as of 14:16, 21 November 2022
Introduction
Geopsy allows you to load ASCII signal files with a custom header, automatically assigning header information when loading the signal. To manage the header formats, go to Geopsy Preferences, tab Load, box File Format, and click on the Custom ASCII format button.
A list of custom formats is maintained and stored in the general user settings. These formats are included in the list of accepted formats for automatic recognition and for import. They cannot be used for export.
Format specifications are saved to and restored from files with extension 'ascfmt' (Import and Export buttons). It is recommended to export your format specifications before a software upgrade or when resetting user settings. If you have built a specification for a format that may interest other users, you can submit it in the Examples section (or by e-mail).
Creating a new format
If there are already some formats in the list, select the one closest to the new format you are going to create before clicking on New. All the attributes of this model format are loaded.
Adjusting the attributes of a format specification is easier with an example file. Enter a file name in File example or click on the browse ("...") button on the right side of the Example field. The Load button runs the file loader on the example file with the current Custom ASCII format. The log view then shows a detailed report, more verbose than for a usual file load. Review this report carefully: even if there are no critical errors, there might be some warnings about undefined header data.
Set a descriptive Name and optionally add a list of extensions separated by commas. Extensions are not mandatory. Do not include the dot. Extensions are used for automatic file format recognition: if the file name extension matches, the format is identified without opening the file and checking its content. Add extensions only if they are specific to your format, and avoid common extensions like 'txt'. Custom ASCII file format extensions are checked after all other format extensions but before file contents. Hence there is no risk of superseding standard extensions, but there is a high risk of bypassing the automatic recognition based on file content, even for standard formats.
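The recognition order described above can be pictured with a small Python sketch. It is not Geopsy code: the function `recognize_format`, the dictionaries and the format names are hypothetical and only illustrate why a generic extension such as 'txt' bypasses content-based recognition.

```python
import os


def recognize_format(file_name, standard_extensions, custom_extensions, detect_from_content):
    """Return a format name following the order described in the text (hypothetical helper)."""
    extension = os.path.splitext(file_name)[1].lstrip(".")
    if extension in standard_extensions:       # 1. standard format extensions
        return standard_extensions[extension]
    if extension in custom_extensions:         # 2. custom ASCII format extensions
        return custom_extensions[extension]
    return detect_from_content(file_name)      # 3. file content is inspected only now


# Illustrative entries only, not the real Geopsy tables.
standard = {"sac": "SAC"}
custom = {"txt": "MyLogger"}                   # too generic: catches every .txt file


def by_content(file_name):
    return "recognized from content"


print(recognize_format("record.sac", standard, custom, by_content))
print(recognize_format("record.txt", standard, custom, by_content))
print(recognize_format("record.dat", standard, custom, by_content))
```

In this sketch every 'txt' file is attributed to the custom format, whatever its content, while 'record.dat' still goes through content-based recognition.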
There are basically five things to do in this order:
- define the header size (Header definition)
- map header information to signal properties (Rules)
- define component naming convention (Components)
- set the time format (Time format)
- specify data part properties (Data separators)
Header definition
There are four ways to define the header size:
- No header: signal values start on the first line of the file. A custom file format without header cannot be automatically recognized from its content; only the extension can be used. The Rules can still assign constant values to signal properties without any information from the header. Hence you can associate a particular file extension with signal properties (e.g. sampling frequency).
- Fixed header: the header has a fixed number of lines. The columns of signal values are expected right after the last header line.
- Header pattern: all lines of the header contain a text pattern (e.g. '#' as the first character). The pattern is a regular expression. The signal values are assumed to start at the first line that does not contain this pattern.
- End header pattern: the end of the header is marked by a special text pattern (e.g. '########'). The pattern is a regular expression. The pattern must include all characters until the first signal value.
Examples
^#{40,}$
Matches all lines made only of '#' characters, with at least 40 of them. '^' marks the start of a line and '$' the end of a line. For End header pattern, several consecutive lines may match the pattern; the data are assumed to start after the last one.
^#
Matches all lines that start with '#'.
In the above figure (GeoSIG format), the line starting with "Time" marks the end of the header.
^Time
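The four header modes and the three patterns above can be tried out with the following Python sketch. It is a simplified, line-by-line illustration and not the Geopsy loader: the function `header_line_count` and its mode names are hypothetical, Python's `re` module stands in for the regular expression engine used by Geopsy, and returning `None` when the end pattern is missing is an assumption.

```python
import re


def header_line_count(lines, mode, pattern=None, fixed=0):
    """Return the number of header lines for one of the four modes (hypothetical helper)."""
    if mode == "no_header":
        return 0
    if mode == "fixed_header":
        return min(fixed, len(lines))
    regex = re.compile(pattern)
    if mode == "header_pattern":
        # The header lasts as long as each line contains the pattern.
        count = 0
        for line in lines:
            if regex.search(line) is None:
                break
            count += 1
        return count
    if mode == "end_header_pattern":
        # Find the first matching line, then skip the consecutive matching lines.
        for i, line in enumerate(lines):
            if regex.search(line):
                last = i
                while last + 1 < len(lines) and regex.search(lines[last + 1]):
                    last += 1
                return last + 1
        return None  # assumption: no match means the file is not in this format
    raise ValueError("unknown mode: " + mode)


commented = ["# station A", "# 100 Hz", "0.1 0.2", "0.3 0.4"]
ruled = ["Station A", "#" * 40, "#" * 45, "0.1 0.2"]
geosig_like = ["Station A", "Time Long.,g Tran.,g Vert.,g", "0.00 1 2 3"]

print(header_line_count(commented, "header_pattern", r"^#"))
print(header_line_count(ruled, "end_header_pattern", r"^#{40,}$"))
print(header_line_count(geosig_like, "end_header_pattern", r"^Time"))
```

With these made-up files, the data start at the third line for the commented and GeoSIG-like examples, and at the fourth line for the example with two consecutive rows of '#'.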
Rules
Rules are edited in a table. Each row is a distinct rule. They are executed in order. You can change the order with the buttons Down and Up. To create a new rule based on an existing one, select the rule to be copied and click on Add. To remove a rule click on Remove.
A rule is either Constant or variable. Constant rules are defined only by their Value, Channel, Data and Operation. Value is assigned to Data (or added, subtracted, multiplied or divided, according to Operation) for all channels (Channel=-1) or for a single channel (Channel=index). Channel indexes are counted from 0 for the first column.
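As a rough illustration of constant rules, the Python sketch below applies a Value to a Data field of all channels or of a single channel. It is not the Geopsy implementation: the function `apply_constant_rule`, the operation names and the dictionary-based channels are assumptions made for this example.

```python
import operator

# Hypothetical operation names; the labels in the Geopsy editor may differ.
OPERATIONS = {
    "Assign": lambda old, value: value,
    "Add": operator.add,
    "Subtract": operator.sub,
    "Multiply": operator.mul,
    "Divide": operator.truediv,
}


def apply_constant_rule(channels, data, value, channel=-1, operation="Assign"):
    """Apply a constant rule to all channels (channel == -1) or to one channel."""
    targets = range(len(channels)) if channel == -1 else [channel]
    for index in targets:
        old = channels[index].get(data)
        channels[index][data] = OPERATIONS[operation](old, value)


# Three channels, each represented by a dictionary of signal properties.
channels = [{}, {}, {}]
apply_constant_rule(channels, "SamplingFrequency", 100.0)                  # all channels
apply_constant_rule(channels, "SamplingFrequency", 2.0, channel=1,
                    operation="Multiply")                                  # second column only
print([c["SamplingFrequency"] for c in channels])
```

Because rules are executed in order, the second rule in this sketch modifies the value set by the first one, which is why the position of a rule in the table matters.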
For variable rules, Value is obtained from the pattern and its captured texts. Captured texts are the parts of the pattern enclosed in parentheses. Index (right after Pattern) indicates which captured text is assigned to Value; if it is 0, the whole pattern match is returned in Value. The Mandatory flag serves two purposes: it issues a warning when a loaded file lacks the requested information, and it is used to automatically recognize the file format from the file content. Factor can optionally be used to correct the obtained value (numerical values only). The second Index, right after Data, is for data fields that support indexes (e.g. TimePick).
The above figure shows the rules defined for the GeoSIG format. Values for Component data are handled in a particular way. The value, either constant or captured from the pattern, is translated into a standard component through a look-up table described in the next section. This table can also be used to ignore some columns or to specify a Time column. The latter is for formats that do not provide explicit information about the starting time in the header but provide one time stamp per sample instead.
In this example the pattern for Component identification is
Time +([A-Za-z\.]+),[a-zA-Z]+ +([A-Za-z\.]+)+,[a-zA-Z]+ +([A-Za-z\.]+),[a-zA-Z]+
which matches
Time Long.,g Tran.,g Vert.,g
The same pattern is used N times, N being the number of channels of interest. For each Channel, there is a corresponding pattern Index. The obtained value is then translated into a standard component with the look-up table described below.
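The role of the pattern Index can be checked outside Geopsy with a few lines of Python. This is only a sketch: it uses Python's `re` module rather than Geopsy's own regular expression engine, and the variable names are made up for the example. The pattern and the header line are the ones quoted above.

```python
import re

PATTERN = (r"Time +([A-Za-z\.]+),[a-zA-Z]+ +([A-Za-z\.]+)+,[a-zA-Z]+"
           r" +([A-Za-z\.]+),[a-zA-Z]+")
HEADER_LINE = "Time Long.,g Tran.,g Vert.,g"

match = re.search(PATTERN, HEADER_LINE)

# Index 0 is the whole match; indexes 1 to 3 are the captured texts.
captured = {index: match.group(index) for index in range(4)}
for index, text in captured.items():
    print(index, repr(text))
```

Each of the three rules would use the same pattern with a different Index (1, 2 or 3), so that each rule receives the component name of a different column.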
Component table
The component table is a look-up table that translates values assigned to Component (see above section about Rules) into standard components. Among the available components there are two special types: Ignore and Time. If Ignore is used, the text column is simply skipped. If Time is used, StartTime and SamplingFrequency are deduced from the sample time stamps.
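A minimal Python sketch of such a look-up table follows. Everything in it is an assumption made for illustration: the table content (including the mapping of 'Long.' and 'Tran.' to North and East), the helper names, the choice to ignore unknown column names, and the use of time stamps already converted to seconds. Geopsy itself parses time stamps according to the Time format.

```python
# Hypothetical look-up table: captured text -> standard component.
COMPONENT_TABLE = {
    "Time": "Time",
    "Long.": "North",
    "Tran.": "East",
    "Vert.": "Vertical",
    "Temp.": "Ignore",
}


def split_columns(column_names, rows):
    """Sort columns into time stamps and signals according to the table."""
    stamps, signals = None, {}
    for col, name in enumerate(column_names):
        component = COMPONENT_TABLE.get(name, "Ignore")  # assumption: unknown names are ignored
        if component == "Ignore":
            continue
        values = [row[col] for row in rows]
        if component == "Time":
            stamps = values
        else:
            signals[component] = values
    return stamps, signals


def timing_from_stamps(stamps):
    """Deduce the start time and the sampling frequency from regular time stamps."""
    if len(stamps) < 2:
        raise ValueError("at least two time stamps are required")
    step = (stamps[-1] - stamps[0]) / (len(stamps) - 1)
    return stamps[0], 1.0 / step


names = ["Time", "Long.", "Tran.", "Vert.", "Temp."]
rows = [[0.0, 1, 2, 3, 20], [0.5, 4, 5, 6, 20], [1.0, 7, 8, 9, 21], [1.5, 10, 11, 12, 21]]
stamps, signals = split_columns(names, rows)
start_time, sampling_frequency = timing_from_stamps(stamps)
print(start_time, sampling_frequency, sorted(signals))
```

The sketch assumes evenly spaced samples and averages the step over the whole column, which is one possible choice among others.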
Time format
Time format is used by any rule that deals with StartTime or if you specify a Time column (see above section). The specification of time and date is described in Time and date specification.
Editing an existing format
Editing works the same way as creating a new format. Select the format in the list, click on Edit, edit the attributes and click on OK in the editor dialog box. Changes are committed to the main internal list only after clicking on OK in the list dialog box.
Examples
This section contains several examples. Feel free to submit more formats (by e-mail or on the forum); they might be useful for others.
- Campbel from CR6 CampbelCR6.ascfmt
- GeoSIG GeoSIG.ascfmt
- Tromino ... [does anyone have a file example?]