Import data

How to write an importer for data formats not yet supported by the toolbox?

Introduction

The TA Toolbox is modular by design, which makes it fairly easy to add your own importer for whatever data format you intend to read. It is, of course, your responsibility to decide whether the toolbox is a reasonable fit for the type of data you want to import. Beyond that, writing an importer should be pretty straightforward, provided you follow a few relatively simple requirements (the importer being the one interface between the real world and the self-contained and idyllic world of the toolbox).

Please note: Although far from complete, the following notes might help developers who want to write importers for data formats not yet supported by the toolbox.

A few facts about the toolbox that are relevant for writing an importer:

  • The toolbox uses a clearly defined structure (a Matlab™ struct) to store the data internally.
  • All importers should return such a structure.
  • The importer functions should be placed in the “IO” directory of the toolbox.
  • The “wrapper” for all importer functions is TAload.
  • To add your new importer to the list of supported formats, add a new entry to the file TAload.ini. This also makes it appear in the GUI, as the GUI load panel reads the supported formats automatically from TAload.ini. How to add such an entry is explained in that ini file.

There are a few tools that help you get familiar with the data structure; see the next section.

<note important>Please note: The TA Toolbox data structure was designed with TA data in mind (both with and without an applied magnetic field). Therefore, it might not be ideal for other types of data. There will most certainly be ways to extend the data structure where necessary or useful. In such a case, please let me know in advance, so that we can agree on the changes and keep the structure as consistent as possible.</note>

Tools

There are a few tools that might help you get familiar with the data structure, write importer routines, and even validate their output.

TAdataStructure

The function TAdataStructure helps you get to know the TA Toolbox data structure.

To learn how to use this function, type:

help TAdataStructure

TA Toolbox data structure

To get a Matlab™ struct with all the fields of the TA Toolbox data structure, type:

dataStructure = TAdataStructure('structure')

If you want something similar, i.e. a Matlab™ struct with all the fields, but with each field (unless it is a struct itself) containing a string that tells you the data type of that field, use the following syntax:

dataModel = TAdataStructure('model')

Validate your own data structure

Suppose that you have already written an importer (here named myImporter) that returns a struct with data. To test whether your struct complies with the TA Toolbox data model, use the following syntax:

% Read your data
data = myImporter(...);
 
% Validate your data structure for complying with TA Toolbox data structure
[missingFields,wrongType] = TAdataStructure('check',data)

Now, missingFields should contain the fields missing from your structure, and wrongType should list each field in your structure that appears to have the wrong data type.
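To illustrate what “missing fields” means for nested structures, here is a minimal, self-contained sketch that compares a struct against a template and collects the dotted paths of all template fields absent from the struct. It is only an illustration of the idea, not the implementation used by TAdataStructure, and both the template and the field names are hypothetical.

```matlab
% Hypothetical template and importer output; the field names are made up
template = struct('data',[],'axes',struct('x',[],'y',[]), ...
    'file',struct('name','','format',''));
data = struct('data',magic(3),'file',struct('name','test.xy'));

% Iterative depth-first walk over the template (no recursion needed)
missingFields = {};
stack = {{template, data, ''}};
while ~isempty(stack)
    [tpl, dat, prefix] = stack{end}{:};
    stack(end) = [];
    names = fieldnames(tpl);
    for k = 1:numel(names)
        fieldPath = [prefix names{k}];
        if ~isfield(dat, names{k})
            % Field of the template not present in the data
            missingFields{end+1} = fieldPath; %#ok<AGROW>
        elseif isstruct(tpl.(names{k}))
            % Descend into nested structures
            stack{end+1} = {tpl.(names{k}), dat.(names{k}), [fieldPath '.']}; %#ok<AGROW>
        end
    end
end
```

For the example data, missingFields ends up as {'axes','file.format'}: the top-level field axes is missing entirely, and file exists but lacks format.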

Tips

A few hopefully helpful ideas for writing importers.

Enhancing compatibility with further toolbox development

A tip for maximum compatibility of your importer with the TA Toolbox data structure: inside your importer, obtain the data structure by calling the TAdataStructure function once, and selectively fill this structure with your data.

This keeps your importer as compatible as possible with the toolbox data structure, which may change slightly over time.
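A minimal sketch of this “fill the template” approach follows. In a real importer, the template would come from TAdataStructure('structure'). Here, a hypothetical stand-in struct keeps the example self-contained, and the value names are made up. The isfield guard matters because assigning to a non-existing field in Matlab™ silently adds that field to the structure.

```matlab
% In a real importer, the template would come from the toolbox:
%   data = TAdataStructure('structure');
% Here, a hypothetical stand-in keeps the sketch self-contained.
data = struct('data',[],'label','','file',struct('name','','format',''));

% Hypothetical values obtained while reading the file
values = struct('data',magic(3),'label','Sample A','myExtraField',42);

% Copy only values whose field exists in the template; assigning to a
% non-existing field would silently add it to the structure
names = fieldnames(values);
for k = 1:numel(names)
    if isfield(data,names{k})
        data.(names{k}) = values.(names{k});
    end
end

% Nested fields are set explicitly
data.file.format = 'xy format';
```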

Meta data

If you are interested in more detail about the information stored in the data structure of the toolbox, and where the individual fields may come from, have a look at the info file structure.

The idea behind such “info files” is to mimic a labbook record. Admittedly, the author of the TA Toolbox is rather lazy in this respect. Therefore, to make life easier, the info files provide a scaffold of all the information you should collect during the experiment, especially when working with a lab-built setup that changes quite often over time.1)

Template for a new importer

To make life easier, the following lists important points and design principles for the importer routines of the toolbox, followed by a skeleton of a Matlab™ function for a new importer.

Please note: This is not meant to restrict your way of coding. It is just meant to make getting started easier, and it may lead to more consistent code throughout the toolbox.

Of course, the TIMTOWTDI2) principle still applies.

Design principles and tips

A few rather important things to note:

  • Both input and output parameters are fixed.
    • Input: fileName, varargin
    • Output: data, warnings
  • varargin represents a list of optional parameters, two of which are particularly important:3)
    • combine (logical)
      Whether to combine files.
    • checkFormat (logical)
      Whether to check for the correct format of the file.
  • If something goes wrong while reading the file, the importer should do the following:
    • Return an empty numerical value ([]) in data.
    • Return a string or cell array of strings in warnings explaining what went wrong (e.g., wrong format, file does not exist, no (valid) filename given).
  • If everything went well, data should comply with the TA Toolbox data structure, and warnings should be empty.
  • If more than one dataset has been loaded (if this is supported by the importer), data is a cell array of structures that validate against the TA Toolbox data structure.
  • Generally, all importers should be able to handle cell arrays, structures and strings as file names.
    • A code listing of how to handle this can be found below.
  • It is good practice to use a separate subfunction (e.g. “loadFunction”, called “loadFile” in the template below) for the actual loading. Have a look at the source code of TAOXload.m for an example.
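From the caller's side, these output conventions can be handled uniformly. The following sketch uses three hypothetical importer results: a failed import, a single dataset and multiple datasets. It distinguishes them by checking for an empty value, a structure or a cell array.

```matlab
% Hypothetical importer results {data, warnings}: failed, single, multiple
results = { ...
    {[], {'File does not exist.'}}, ...
    {struct('label','A'), {}}, ...
    {{struct('label','B'), struct('label','C')}, {}}};

nDatasets = zeros(1,numel(results));
for k = 1:numel(results)
    [data,warnings] = results{k}{:};
    if isempty(data)
        % Failure: data is [] and warnings says why
        fprintf('Import %d failed: %s\n',k,strjoin(warnings,'; '));
    elseif isstruct(data)
        % Success, a single dataset
        nDatasets(k) = 1;
    else
        % Success, a cell array of datasets
        nDatasets(k) = numel(data);
    end
end
```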

Coding example

Following is a skeleton of a Matlab™ function for a new importer. To get an idea of what an actual importer routine looks like, have a look at TAOXread.m, for example.

Please don't be scared by what looks like a tremendous overhead. Most parts are documented well enough that it should be obvious what they are good for. And please keep in mind: as the importer is used in combination with the GUI, a certain level of fail-safe behaviour is required.

"TAxyFormatRead.m"
function [data,warnings] = TAxyFormatRead(fileName,varargin)
% TAXYFORMATREAD Read xy format files (binary)
%
% Usage
%   data = TAxyFormatRead(fileName)
%   [data,warnings] = TAxyFormatRead(fileName)
%   data = TAxyFormatRead(fileName,...)
%
% ... add description of parameters here...
%
% See also: TAload, TAdataStructure
 
% (c) 20xx, <Developer's name>
% 20xx-xx-xx
 
% Parse input arguments using the inputParser functionality
p = inputParser;   % Create an instance of the inputParser class.
p.FunctionName = mfilename; % Function name to be included in error messages
p.KeepUnmatched = true; % Store unmatched arguments in p.Unmatched instead of raising an error
p.StructExpand = true; % Enable passing arguments in a structure
 
p.addRequired('fileName', @(x)ischar(x) || iscell(x) || isstruct(x));
% p.addOptional('parameters','',@isstruct);
% Note: in MATLAB R2013b and newer, addParameter is recommended
% instead of addParamValue
p.addParamValue('combine',false,@islogical);
p.addParamValue('sortfiles',true,@islogical);
% Note: checkFormat is accepted for compatibility with TAload,
% but currently has no effect
p.addParamValue('checkFormat',true,@islogical);
p.parse(fileName,varargin{:});
 
% Assign optional arguments from parser
combine = p.Results.combine;
sortfiles = p.Results.sortfiles;
 
warnings = cell(0);
 
% If no filename given
if isempty(fileName)
    data = [];
    warnings{end+1} = 'No filename.';
    return;
end
 
% Handling different data types of fileName parameter
if iscell(fileName)
    if sortfiles
        fileName = sort(fileName);
    end
elseif isstruct(fileName)
    % That might be the case if the user uses "dir" as input for the
    % filenames, as this returns a structure with fields as "name"
    if ~isfield(fileName,'name')
        data = [];
        warnings{end+1} = 'Cannot determine filename(s).';
        return;
    end
    % Convert struct to cell array of file names
    fileName = {fileName.name}';
    % Remove files with leading '.', such as '.' and '..'
    fileName(strncmp('.',fileName,1)) = [];
    if sortfiles
        fileName = sort(fileName);
    end
else
    % If filename is neither cell nor struct
    % Given the input parsing it therefore has to be a string
    if exist(fileName,'dir')
        % Read directory, ignoring subdirectories
        dirName = fileName;
        fileName = dir(dirName);
        fileName = fileName(~[fileName.isdir]);
        % Convert struct to cell array of file names
        fileName = {fileName.name}';
        % Remove files with leading '.', such as hidden files
        fileName(strncmp('.',fileName,1)) = [];
        % dir returns bare file names, hence prepend the directory
        fileName = cellfun(@(x)fullfile(dirName,x),fileName,...
            'UniformOutput',false);
        if sortfiles
            fileName = sort(fileName);
        end
    elseif exist(fileName,'file')
        % For convenience, convert into cell array
        fileName = {fileName};
    else
        % If "filename" is neither a directory nor a file...
        % Check whether it's only a basename
        baseDir = fileparts(fileName);
        fileName = dir([fileName '*']);
        if isempty(fileName)
            data = [];
            warnings{end+1} = 'No valid filename.';
            return;
        end
        % Convert struct to cell array of file names
        fileName = {fileName.name}';
        % Remove files with leading '.', such as '.' and '..'
        fileName(strncmp('.',fileName,1)) = [];
        % dir returns bare file names, hence prepend the path (if any)
        fileName = cellfun(@(x)fullfile(baseDir,x),fileName,...
            'UniformOutput',false);
        if sortfiles
            fileName = sort(fileName);
        end
    end
end
 
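% Exit gracefully if no file is left to read
if isempty(fileName)
    data = [];
    warnings{end+1} = 'No valid filename.';
    return;
end
 
% Add your code here. A minimal sketch that reads each file separately
% (handling of "combine" is left to you):
data = cell(length(fileName),1);
for k = 1:length(fileName)
    [data{k},loadWarnings] = loadFile(fileName{k});
    warnings = [warnings loadWarnings]; %#ok<AGROW>
end
% Drop files that could not be read
data(cellfun(@isempty,data)) = [];
if isempty(data)
    data = [];
elseif length(data) == 1
    % A single dataset is returned as structure, not as cell array
    data = data{1};
end
 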
 
function [data,warnings] = loadFile(fileName)
% LOADFILE Load file and return contents. 
%
% fileName    - string
%               Name of a file (normally including full path)
%
% data        - structure
%               According to the toolbox data structure
%
% warnings    - cell array of strings
%               Contains warnings if there are any, otherwise empty.
 
% Initialise return values, so that the function always exits gracefully
data = [];
warnings = cell(0);
 
% A few important settings
% Name of the format as it appears in the file.format field
formatNameString = '<your format specifier string - may contain spaces>';
 
% Add code for actually importing your data here
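As a starting point for the reading part of loadFile, here is a minimal sketch of fail-safe file access. Purely for illustration, it assumes that the xy format is a plain sequence of little-endian double values; your real format will differ. A hypothetical stand-in struct replaces the structure you would get from TAdataStructure('structure'). The sketch first writes a small test file so that it can run on its own.

```matlab
% Create a small test file in the assumed layout (little-endian doubles)
fileName = [tempname '.xy'];
fid = fopen(fileName,'w','ieee-le');
fwrite(fid,linspace(0,1,10),'double');
fclose(fid);

% Sketch of the reading part of loadFile
data = [];
warnings = cell(0);
fid = fopen(fileName,'r','ieee-le');
if fid == -1
    % Exit gracefully: data stays [] and warnings says why
    warnings{end+1} = sprintf('Cannot open file "%s".',fileName);
else
    try
        values = fread(fid,Inf,'double')';
        % Stand-in for filling TAdataStructure('structure')
        data = struct('data',values,'file',struct('format','xy format'));
    catch err
        warnings{end+1} = sprintf('Error reading "%s": %s',fileName,err.message);
    end
    % Close the file in any case, even if reading failed
    fclose(fid);
end
delete(fileName);
```

Wrapping the actual reading in try/catch keeps a malformed file from raising an error inside the GUI. Instead, the problem is reported through warnings, as the design principles require.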

Testing your new importer

Before you add your new file format to the toolbox (see below), please test your importer thoroughly, e.g. by validating the output it creates.

See above for how to validate your output using TAdataStructure.

Other things you may (and should) test for:

  • fileName being a string, a structure (such as returned by dir), or a cell array.
  • Try to load files in a different format than the one you actually wrote your importer for.
  • Use invalid file names.

In every case, the routine should “exit gracefully”. That means that if something goes wrong, it should still return normally, with an empty value ([]) in data and a string or cell array in warnings that tells the user what may have gone wrong.
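The checks listed above can be automated with a small script. The following sketch feeds a few invalid inputs to an importer and records every case that either raises an error or does not return [] together with a warning. To keep the sketch self-contained, the importer is a hypothetical stand-in (an anonymous function that always fails gracefully). Replace it with a handle to your own importer, e.g. @TAxyFormatRead.

```matlab
% Hypothetical stand-in; replace by a handle to your importer
importer = @(fileName) deal([], {'No valid filename.'});

% Invalid inputs every importer should survive
invalidInputs = {'', 'doesNotExist.xyz', struct('foo','bar'), {}};

failed = {};
for k = 1:numel(invalidInputs)
    try
        [data,warnings] = importer(invalidInputs{k});
        % Graceful exit means: empty data and at least one warning
        if ~isempty(data) || isempty(warnings)
            failed{end+1} = sprintf('Input %d: no graceful exit',k); %#ok<AGROW>
        end
    catch err
        failed{end+1} = sprintf('Input %d: error "%s"',k,err.message); %#ok<AGROW>
    end
end
```

An empty failed list means that all inputs were handled gracefully.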

Given enough time (and the need for it), there might even be a test routine for new importers at some point in the future4). Until then, please help yourself by following the tips laid out above.

Adding your importer to TAload.ini

Once you have written your importer routine and thoroughly tested that its output complies with the TA Toolbox data structure, you may add it to the TAload.ini file to make it accessible from within the GUI.

Following is an excerpt of the TAload.ini file describing an entry for a supported file format:

% Configuration file for the TAload function of the TA toolbox
%
% (c) 2011-12, Till Biskup <till@till-biskup.de>
%
% Each file format that is recognized by the TAload function
% has its own entry in this file. The format of this entry is as follows:
%
% [<file format>]
% name = short name of the format (used to identify it)
% description = more detailed description
% type = <ascii|binary>
% identifierString = <string that can be used to identify the file>
% fileExtension = file extension(s) (if a list, separate by "|")
% function = <function that is used to handle the file>
% multipleFiles = <true|false> whether format consists of multiple files
% parameters = <additional parameters passed to the function>
% combineMultiple = <true|false> whether routine can combine multiple files
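For the hypothetical xy format of the template above, an entry following this scheme might look as follows. All values are illustrative assumptions, including the section name, identifier string and file extension. In particular, check the complete TAload.ini for how existing entries fill fields such as parameters.

```ini
[xyformat]
name = xy format
description = xy format (binary), read by TAxyFormatRead
type = binary
identifierString = XYFORMAT
fileExtension = xy
function = TAxyFormatRead
multipleFiles = false
parameters =
combineMultiple = false
```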

If you are still unsure what some of these fields are used for, have a look at the complete TAload.ini file or, if that doesn't help, ask the toolbox author.

Please note: The “<file format>” identifier has to be unique and a single word.

Every file format defined in the TAload.ini file is automatically recognised by the GUI, i.e. you can select it in the Load panel. The string that appears in the popup menu of that panel is determined by the field “name” in the TAload.ini file.

1)
If you have ever looked at data after a year or two, trying to figure out which exact piece of equipment you used at one particular point of your setup, only to realise that you didn't make a note of it in your labbook, you know what I'm talking about.
2)
“There is more than one way to do it”
3)
There may be more in the future, but as of now (2012-02), these two are the important ones, as they are passed to the importer by the TAload routine, at least when called from within the GUI.
4)
Keyword: test driven development
en/software/matlab/ta/dev/import.txt · Last modified: 2020/09/30 21:35 by 127.0.0.1