Format

A few general remarks regarding the format of the info files.

Criteria for the file format

  • Machine readable and human readable
  • Pure text (ASCII)
  • Uniquely identifiable
    • Identifier string in the first or second line of the file
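
The "pure text (ASCII)" criterion can be checked mechanically. The following is a minimal sketch in Python (not the language of the original tooling); the function name `is_seven_bit_ascii` is hypothetical and only serves as illustration.

```python
def is_seven_bit_ascii(data: bytes) -> bool:
    """Return True if every byte lies in the 7-bit ASCII range (0-127)."""
    return all(byte < 128 for byte in data)


# An umlaut encoded as UTF-8 produces bytes >= 128 and fails the check.
plain = "Operator: Jane Doe\n".encode("ascii")
umlaut = "Operator: M\u00fcller\n".encode("utf-8")
```

Here `is_seven_bit_ascii(plain)` holds, while `is_seven_bit_ascii(umlaut)` does not.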

General description of the file format

For those who would rather see something than read a (dull and) lengthy description: an example can be found further below.

A general description of the file format follows.

  • Format
    • The file format is ASCII (7-bit)
    • Restricting files to the 7-bit ASCII character set guarantees interoperability between different operating systems.
    • Therefore: No umlauts or other special characters.
  • File name and extension
    • The file extension is “.info”.
    • The filename is identical to the basename of the corresponding data file.1)
  • The first line of the file consists of an identifier string.
    • Makes it possible to uniquely identify the file format during parsing.
    • Separated by an empty line from the remaining file.
    • Should contain a version number (of the format).
  • Field names
    • Field names may contain spaces, but no special characters and no colons2).
    • Field names must start with a letter (not a digit!).
    • Every field name ends with a colon.
    • Field names shall not contain/repeat the block name.3)
  • Values
    • Values always follow a field name
    • Within a block, all values are indented so that they are left-aligned. The longest field name therefore defines the indentation.
    • Values may contain special characters4) and colons.
    • NEW Values may span several lines. In this case, each new line needs to start with a whitespace character (such as space or tab).
  • Use of colons
    • Colons are used to separate field names and values
    • Colons are not allowed anywhere else (e.g., after a block heading). The only exception is within values.
    • Colons are used internally during parsing to separate field names and values (in Matlab: regexp with option split).
  • The info file is divided into several blocks.
    • Blocks are introduced by block names in capital letters.
  • All field names and descriptions within the file should be in English to guarantee international usability.
  • Blocks and fields may be optional, as long as there are certain fields (acting as “switches”) that can be used while parsing to determine whether these blocks/fields exist in the file.
    • If no value is available for a field, but removing the field does not seem reasonable5), the value is “N/A”.
  • About the blocks:
    • Each block starts with a heading (block name) in capital letters.
    • Each block gets separated from the previous part of the file by an empty line.
    • Blocks contain key-value pairs consisting of a field name, followed by its respective value.
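
The rules above can be sketched as a small parser. This is a minimal illustration in Python, not the Matlab implementation mentioned above. The function name `parse_info`, the sample content (including its identifier string), and the choice to join the lines of a multi-line value with a newline are all assumptions made for this example.

```python
import re


def parse_info(text):
    """Parse info-file text into (identifier, {block: {field: value}})."""
    lines = text.splitlines()
    identifier = lines[0].strip()  # first line: identifier string
    blocks = {}
    block = None  # name of the block currently being read
    field = None  # last field name seen, needed for continuation lines
    for line in lines[1:]:
        if not line.strip():  # empty line: separates blocks
            field = None
        elif line[0] in " \t":  # leading whitespace: value continues
            if block is None or field is None:
                raise ValueError("continuation line without a field: " + line)
            # Assumption: continued lines are joined with a newline.
            blocks[block][field] += "\n" + line.strip()
        elif ":" not in line:  # no colon: block heading
            block = line.strip()
            blocks[block] = {}
            field = None
        else:
            if block is None:
                raise ValueError("field outside of a block: " + line)
            # Split at the first colon only, because values may contain colons.
            field, value = re.split(r":\s*", line, maxsplit=1)
            blocks[block][field] = value.rstrip()
    return identifier, blocks


# Hypothetical file content, made up for this sketch.
SAMPLE = """Example Info file - v. 0.1

GENERAL
Operator:  Jane Doe
Date:      2017-12-09
Time:      20:09

SAMPLE
Name:         N/A
Preparation:  dissolved in toluene,
              degassed
"""

identifier, blocks = parse_info(SAMPLE)
```

Splitting at the first colon only is what allows a value such as `20:09` to survive intact. A line without any colon is taken as a block heading, which is why colons are not allowed there.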
1)
This is the ideal case. In practice, it is meant as a recommendation rather than a strict rule.
2)
Currently, the only exception is (round) brackets. If further special characters are needed, parsing of those could be implemented as well. The reason for the restriction is the (direct) conversion of the field names into MATLAB® struct fields.
3)
Example: “Preparation” rather than “Sample preparation” in block “SAMPLE”. On the one hand, field names are much shorter this way, on the other hand, the file becomes easier to read.
4)
In general, one should avoid special characters whenever possible, as they often do not survive conversion between file encodings. Until very recently, Matlab used different encodings on different operating systems.
5)
E.g., a field that may be relevant depending on the type of experiment and that has no value only for certain experiments. In such cases it is highly recommended to keep the field, as otherwise one forgets about it.
en/software/info/format.txt · Last modified: 2017/12/09 20:09 (external edit)