Rating: 4.8 (18 ratings) | 115 downloads (last 30 days) | File Size: 29.89 KB | File ID: #18430

txt2mat

by Andres

 

23 Jan 2008 (Updated 17 Nov 2009)

Code covered by the BSD License  

fast and versatile ascii data import capable of handling large text files

File Information
Description

Because txt2mat is basically a wrapper for sscanf, it quickly converts ASCII files containing m-by-n numeric data, allowing for header lines. When it encounters lines with differing numbers of data elements, it falls back to line-by-line processing and slows down somewhat.
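
The idea of converting a whole m-by-n block with one sscanf call can be sketched as follows. This is only an illustration, not txt2mat's actual code: the sample text, the header line count, and the column count are assumptions made up for the example.

```matlab
% Hypothetical sample: two header lines followed by 3-by-2 numeric data.
txt = sprintf('time, value\nunits: s, V\n0 1.5\n1 2.5\n2 3.5\n');

numHeader = 2;                      % assumed: number of header lines is known
numCols   = 2;                      % assumed: number of data columns is known

% Locate the line breaks and cut off the header block.
lf   = find(txt == sprintf('\n'));
body = txt(lf(numHeader)+1:end);

% A single sscanf call converts the whole body in reading order.
vals = sscanf(body, '%f');

% sscanf returns a column vector, so reshape column-wise and transpose.
A = reshape(vals, numCols, []).';
```

The single call only works when every line holds the same number of values, which is why irregular files need the slower line-by-line route.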

You may let txt2mat carry out an automatic data layout analysis on comparatively 'simple' text files (header lines + decimal number data with common delimiters). With this analysis it can directly import most .csv files, for instance.

Because txt2mat can perform string replacements before the numeric conversion is done, it can cope with many irregularities within the data. This also lets it detect and handle commas used as decimal characters (common German notation).
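
A minimal sketch of the replace-then-convert idea, using plain strrep and sscanf rather than txt2mat's own options. The sample string and the semicolon delimiter are assumptions for illustration.

```matlab
% Hypothetical line in German notation: semicolon-delimited, decimal commas.
s = '3,14;2,71;1,41';

s = strrep(s, ',', '.');     % decimal comma -> decimal point
s = strrep(s, ';', ' ');     % delimiter     -> whitespace

vals = sscanf(s, '%f').';    % row vector of the three numbers
```

The order of the replacements matters here: the decimal commas must be converted before any comma could be mistaken for a delimiter.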

You can filter lines by keywords, provide your own format strings (as for sscanf), or split up the import process for huge files if you run into memory problems. For scale, a typical regular 40 MB file should be imported as a whole in less than 10 s on most computers.

txt2mat is intended to work on Matlab R13 and newer versions.

Comments and suggestions welcome.

Andres

Acknowledgements
This submission has inspired the following:
readMM_2D
MATLAB release: MATLAB 7.7 (R2008b)
Zip File Content: license.txt, txt2mat.m
Comments and Ratings (22)
29 Jan 2008 vincenzo ficco  
29 Jan 2008 Florian H

Excellent! Finally a function that handles textfiles without the hassle of the builtin MATLAB ones.

I've combined it with
http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=15294
to get a really useful drag-and-drop import function for various text formats.

24 Apr 2008 Wladimir Alonso

It worked beautifully when I needed to read solar data tables (http://rredc.nrel.gov/solar/old_data/nsrdb/1991-2005/list_by_state.html) that contained headers, text and numbers. In this case I used txt2mat(name_of_the_file,1,43,'ConvString',['%d-%d-%d,%d:%f' repmat(',%f',1,38)]), as kindly suggested by the author of the function.
Thanks Andres!
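
To see what the format string in the call above expands to, it can be built and tried on its own with sscanf. The sample line below is invented for illustration and is not taken from the actual data files. The format yields 5 + 38 = 43 numbers per line, which appears to correspond to the 43 passed as the third argument.

```matlab
% Format string from the comment: date, time, then 38 comma-separated values.
fmt = ['%d-%d-%d,%d:%f' repmat(',%f',1,38)];

% Count the conversion specifiers in the expanded format.
nConv = numel(strfind(fmt, '%'));

% Hypothetical data line matching that layout (all 38 trailing values are 0).
sample = ['1991-01-01,01:00' repmat(',0',1,38)];
vals   = sscanf(sample, fmt);
```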

23 May 2008 K AM

Fantastic script. Worked right out of the box. I used it to bring in oddly shaped data files and it worked like a dream! Thanks Andres!

01 Jun 2008 wu zhiyong

Great work!
Thanks for your suggestion!
It helps me read the data from text files while ignoring the characters. But I fail to ignore the blank lines.
Do you have any suggestion?

02 Jun 2008 Andres T.

@wu zhiyong
Please see my posting in the newsgroup.

27 Aug 2008 rodrigo abarca del rio

Excellent work. It helped me read data in different formats and skip the header lines ... in just a second, after I had tried different methods for more than 4 hours. We are close to Fortran now :-)
Thanks a lot for your work.

17 Oct 2008 djr djr  
03 Nov 2008 Ralf  
11 Nov 2008 Zahra

A powerful code. I am using it (with some help from the author of the code) to read very complicated data files with headerlines throughout the file as well as data lines with different lengths (line folding).

And it is quite fast too.

Thanks Andres for your great work!

Zahra

14 Jan 2009 John McArthur

Wondering if there's a way to get the program to recognize and import data in the header line. For instance, if you have the test conditions as header lines, like this:

nTimeInc, 1001, TotalTime, 400
ForcingFreq, 40, ForcingAmp, 15
T, X, Y, Z, Theta, Zeta
0, 0, 0, 0, 0, 1.0
0.4, 1.2, 0, 0, 0.01, 1.2
....

So, the header files have some useful info that would be nice to have accessible and can be added to plots and analysis.

Any thoughts?
johnnyfisma@hotmail.com

14 Jan 2009 Andres

@ John McArthur
The header itself *is* accessible as a string, e.g.

>> [A,ffn,nh,SR,hl] = txt2mat('myfile.txt');
>> headerNumbers = sscanf(hl,'%*s%f,%*s%f')

headerNumbers =
        1001
         400
          40
          15

(generally it is hard to guess which information in the header is useful to the user)

22 Jan 2009 Fredrik  
10 Feb 2009 Jose Miguel Jauregui

A great job,
It works perfectly when it comes to loading very extensive text files, and the code is very fast. If you work with extensive text files, this is the code to use.

In case you need a modification, Andres is always willing to provide help.

Josemi

24 Mar 2009 Bas

Needed to have a file import script that could handle a comma as the decimal separator. This worked instantly.

22 May 2009 Gabriel Vézina

great code and fast to execute

05 Aug 2009 Val Schmidt

This seems like a terrifically useful tool.

One feature request. In addition to being able to specify characters that, if found in a line, mark the line for omission, it would be nice to be able to instead skip everything by default and specify characters of lines that are to be included.

Suppose, for example, you want to parse a log file. You'll want to extract all the log entries of a particular type. Rather than having to specify every other type of entry for omission, it'd be nice to be able to specify just the ones you want.

Thanks,
Val
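
The "keep only matching lines" idea requested above can be sketched in plain MATLAB as follows. The log lines and the 'DATA' keyword are hypothetical, and this does not show txt2mat's own filter options.

```matlab
% Hypothetical log excerpt; only the 'DATA' entries should be imported.
logLines = {'INFO start'; 'DATA 1 2'; 'WARN low'; 'DATA 3 4'};

% Mark the lines that contain the keyword and keep only those.
keep = ~cellfun('isempty', strfind(logLines, 'DATA'));
good = logLines(keep);

% Join the kept lines, strip the keyword, and convert the rest in one go.
joined = sprintf('%s\n', good{:});
joined = strrep(joined, 'DATA', '');
vals   = sscanf(joined, '%f');

A = reshape(vals, 2, []).';   % two values per kept line
```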

09 Nov 2009 Leonard

txt2mat is excellent by being very straightforward in its implementation. Could be the de facto standard within Matlab. Thanks Andres for this.

If you are considering compiling a standalone application and deploying it using the MCR, you may want to consider the following:
I did not encounter any errors using this .m file as long as MATLAB was installed on my machine (ver 7.5, BTW).
When deploying my executable to another machine and using the MCR (ver 7.7), the command line indicated the following:

Too many objects requested. Most likely cause is missing [ ] around left hand side that has a comma separated list expansion.

Error in ==> txt2mat at 519

515 %% Definitions
516
517 % find out matlab version as a decimal, up to the second dot:
518 v = ver('matlab');
519 vs= v.Version;
520 vsDotPos = [strfind(vs,'.'), Inf, Inf];
521 vn= str2double(vs(1:min(numel(vs),vsDotPos(2)-1)));

The .m file halted execution of my program because it was looking for a version # for MATLAB, which was not installed on the target machine. I patched the code by determining the matlab ver and editing the txt2mat.m as follows:

518 %v = ver('matlab');
519 vs = '7.5'; %vs= v.Version; (must be a string for strfind and str2double)
520 vsDotPos = [strfind(vs,'.'), Inf, Inf];
521 vn= str2double(vs(1:min(numel(vs),vsDotPos(2)-1)));

Maybe it's possible to check for 'Matlab' on the target machine and/or the MCR and then handle this line appropriately.

Regards,
Len
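
One possible way to avoid the hard-coded version is to parse the string returned by the built-in version function instead of calling ver('matlab'). This is only a sketch: it assumes that version is available in the deployed environment and returns a string of the form '7.7.0.471 (R2008b)', which should be verified on the target machine, and it is not necessarily the fix adopted in txt2mat.

```matlab
% Assumption: version returns a string such as '7.7.0.471 (R2008b)'.
vs = version;

% Read the major and minor release numbers as two separate integers.
num   = sscanf(vs, '%d.%d', 2);
major = num(1);
minor = num(2);

% Example check: is this release at least 7.5?
isAtLeast75 = (major > 7) || (major == 7 && minor >= 5);
```

Keeping major and minor as separate integers avoids confusing a release such as 7.10 with 7.1, which happens when the version is squeezed into a single decimal number.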

11 Nov 2009 Andres

@ Leonard
Thanks for pointing to the version number issue with the MCR (and that part of code you mention should have been replaced anyhow...). I've just sent you an email regarding a possible solution; if it got lost, please use the 'Contact Author' link on my author page.
Regards
Andres

16 Nov 2009 Stefanie Peer  
23 Dec 2009 achus Pujante

Great function, you saved me a lot of time, thank you very much
Then I used your function to simplify my own case

04 Jan 2010 Pavan

Superb function. Worked without a problem. Great stuff.
Thanks Andres

Updates
08 Feb 2008

v05.61

03 Mar 2008

v05.62

07 Apr 2008

v05.86

20 May 2008

v05.86.1

21 May 2008

v05.90

27 May 2008

v05.96

26 Jun 2008

v05.97

28 Aug 2008

v06.00 · introduction of read mode 'block' · 'MemPar' buffer value changed to scalar · reduced memory demand · modified help

03 Nov 2008

v06.01 · fixed bug: possible error message in file analysis when only header line number is given

02 Dec 2008

v06.04 · better handling of replacement strings containing line breaks (initiated by DS) · allow '*' wildcard in file name

09 Aug 2009

v06.12 · added line filter as requested by Val Schmidt

17 Nov 2009

v06.17.3 · new read modes 'char' and 'cell' to provide txt2mat's preprocessing without numerical conversion · enable 'good line' filtering during file analysis · better version detection, (hopefully) suitable for MCR (thanks to Len for his remark)

Tag Activity for this File
Tag Applied By Date/Time
data import Andres 22 Oct 2008 09:44:01
read Andres 22 Oct 2008 09:44:01
ascii Andres 22 Oct 2008 09:44:01
file Andres 22 Oct 2008 09:44:02
data Andres 22 Oct 2008 09:44:02
import Andres 22 Oct 2008 09:44:02
csv Andres 22 Oct 2008 09:44:02
conversion Andres 22 Oct 2008 09:44:02
numeric Andres 22 Oct 2008 09:44:02
data export Cristina McIntire 07 Nov 2008 12:17:56
data import Cristina McIntire 10 Nov 2008 11:01:59
import Cristina McIntire 10 Nov 2008 11:02:02
 
