Dataset as fundamental unit

The whole idea of the toolbox is to process one- and two-dimensional datasets. Therefore, most of the functions operate on such datasets, meaning that they take a dataset as input and return a modified one.
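The "dataset in, modified dataset out" pattern can be sketched as follows. This is only an illustration: the field names `data` and `label` and the helper `scaleDataset` are hypothetical and are not taken from the toolbox.

```matlab
% Hypothetical dataset: a struct holding the numerical data plus one
% piece of metadata. The field names are assumptions for illustration.
dataset = struct();
dataset.data  = [1 2 3; 4 5 6];
dataset.label = 'example measurement';

% Hypothetical processing step: takes a dataset, returns a modified one.
% Only the 'data' field is changed; all other fields pass through untouched.
scaleDataset = @(ds, factor) setfield(ds, 'data', ds.data * factor);

processed = scaleDataset(dataset, 2);

disp(processed.data);
disp(processed.label);
```

Because the function returns the complete structure, the metadata travel with the data from one processing step to the next.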

Constraints

As many operations rely on additional information about the data (“metadata”), the term “dataset” in the context of this toolbox means not only the actual data (i.e., an n×m matrix of numbers), but also the additional information that is stored in a defined way in the toolbox data structure.

Therefore, each dataset that gets loaded into the toolbox eventually has to comply with the toolbox data structure. Please keep this in mind if you write your own importer routines for as yet unsupported file types.
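For a custom importer, this means wrapping the raw numbers in a structure and checking that nothing required is missing. The following is a minimal sketch of that idea; the field names (`data`, `label`, `axes`) and the list of required fields are assumptions made for illustration, so consult the actual toolbox data structure for the real layout.

```matlab
% Raw numbers as a hypothetical file reader might return them (n x m matrix).
rawData = rand(5, 3);

% Wrap the matrix in a structure. All field names here are assumptions.
dataset = struct();
dataset.data  = rawData;
dataset.label = 'imported from custom format';
dataset.axes.x.values = 1:size(rawData, 2);
dataset.axes.y.values = 1:size(rawData, 1);

% Check that every (assumed) required field is present before handing
% the dataset on to other routines.
requiredFields = {'data', 'label', 'axes'};
missing = requiredFields(~isfield(dataset, requiredFields));

if isempty(missing)
    disp('Dataset contains all required fields.');
else
    fprintf('Missing field: %s\n', missing{:});
end
```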

Interfaces

As the dataset is the fundamental unit of the toolbox, there need to be interfaces for importing, saving, and exporting data to and from the toolbox. A set of general routines handles these tasks, namely:

  • trEPRload
  • trEPRsave
  • trEPRexport1D

The last of these, trEPRexport1D, is a rather special case: it is normally used to export the dataset currently visible in a GUI to a simple ASCII file with one to three columns. Please note that by exporting data in this way you lose all the additional information (metadata) that is otherwise part of a dataset.
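To illustrate what such a plain ASCII export contains, here is a sketch that writes a two-column file using only built-in Matlab functions. It does not show how trEPRexport1D works internally; the meaning of the columns (axis values and intensities) is an assumption.

```matlab
% Hypothetical 1D trace: axis values and intensities (assumed column meaning).
x = (0:4)';
y = x.^2;

% Write the two columns as tab-separated ASCII.
outFile = [tempname '.txt'];
fid = fopen(outFile, 'w');
fprintf(fid, '%g\t%g\n', [x y]');   % transpose so each row is written as x, y
fclose(fid);

% Reading the file back yields only a bare numeric matrix: labels, units,
% and parameters are no longer available.
reloaded = load(outFile);
disp(reloaded);
```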

Once you have imported data into the toolbox via trEPRload, you can save the resulting dataset together with all its metadata in a toolbox-specific format. Even though this format is toolbox-specific, it is built from standard components:

  • The numerical data themselves get saved as standard binary.
  • The metadata are stored in an XML file (that preserves the Matlab™ variable types).
  • Both the numerical data and the metadata are stored in a ZIP-compressed archive.

The XML file is a compromise between depending too much on proprietary software (such as Matlab™) and convenience in handling the data. Although barely human-readable (a proper XML editor may help), all your data are stored in a standardised format that is basically independent of the toolbox.
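The combination of binary data, XML metadata, and a ZIP archive can be reproduced with built-in Matlab functions alone. The sketch below is not the toolbox's actual file layout: the file names `data.bin` and `metadata.xml`, the XML element names, and the dataset fields are all hypothetical.

```matlab
% Hypothetical dataset; field names are assumptions for illustration only.
dataset.data  = magic(4);
dataset.label = 'example';

workDir = tempname;
mkdir(workDir);

% 1) Numerical data as plain binary (double precision).
fid = fopen(fullfile(workDir, 'data.bin'), 'w');
fwrite(fid, dataset.data, 'double');
fclose(fid);

% 2) Metadata as XML. The matrix size is recorded so that the flat
%    binary stream can be reshaped on loading.
docNode = com.mathworks.xml.XMLUtils.createDocument('dataset');
root = docNode.getDocumentElement;
labelNode = docNode.createElement('label');
labelNode.appendChild(docNode.createTextNode(dataset.label));
root.appendChild(labelNode);
sizeNode = docNode.createElement('size');
sizeNode.appendChild(docNode.createTextNode(mat2str(size(dataset.data))));
root.appendChild(sizeNode);
xmlwrite(fullfile(workDir, 'metadata.xml'), docNode);

% 3) Both files go into one ZIP-compressed archive.
zipFile = [tempname '.zip'];
zip(zipFile, {'data.bin', 'metadata.xml'}, workDir);

% Round trip: unpack the archive and restore the matrix.
outDir = tempname;
unzip(zipFile, outDir);
fid = fopen(fullfile(outDir, 'data.bin'), 'r');
raw = fread(fid, Inf, 'double');
fclose(fid);
restored = reshape(raw, size(dataset.data));
```

Since each part uses a widely supported format, the archive can in principle be unpacked and read with tools other than Matlab.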

Storing the actual data as (standard) binary rather than ASCII (as early versions of the toolbox did) has two reasons: file size and I/O speed. Loading binary data in Matlab™ is considerably faster than reading an ASCII file (e.g., with the load command), especially if you have to handle large datasets.
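Both points can be checked with a small experiment. The following sketch writes the same matrix once as ASCII and once as binary, then compares file sizes and loading times. The matrix size is arbitrary, and the timings depend on your machine and Matlab version.

```matlab
% Arbitrary test matrix.
A = rand(500, 200);

asciiFile = [tempname '.txt'];
binFile   = [tempname '.bin'];

% Write the same data as full-precision ASCII and as binary.
save(asciiFile, 'A', '-ascii', '-double');
fid = fopen(binFile, 'w');
fwrite(fid, A, 'double');
fclose(fid);

% Time reading the ASCII file with the load command.
tic;
fromAscii = load(asciiFile);
tAscii = toc;

% Time reading the binary file with fread.
tic;
fid = fopen(binFile, 'r');
fromBin = reshape(fread(fid, Inf, 'double'), size(A));
fclose(fid);
tBin = toc;

% Compare file sizes and loading times.
asciiInfo = dir(asciiFile);
binInfo   = dir(binFile);
fprintf('ASCII:  %d bytes, %.4f s\n', asciiInfo.bytes, tAscii);
fprintf('Binary: %d bytes, %.4f s\n', binInfo.bytes, tBin);
```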

Last modified: 2020/09/30 21:35