textscan

Read formatted data from text file or string

Syntax

  • C = textscan(fileID,formatSpec)
  • C = textscan(fileID,formatSpec,N)
  • C = textscan(str,formatSpec)
  • C = textscan(str,formatSpec,N)
  • C = textscan(___,Name,Value)
  • [C,position] = textscan(___)

Description

C = textscan(fileID,formatSpec) reads data from an open text file into a cell array, C. The text file is indicated by the file identifier, fileID. Use fopen to open the file and obtain the fileID value. When you finish reading from the file, close it by calling fclose(fileID).

textscan attempts to match the data in the file to formatSpec, which is a string of conversion specifiers.

C = textscan(fileID,formatSpec,N) reads file data, using the formatSpec N times, where N is a positive integer. To read additional data from the file after N cycles, call textscan again using the original fileID. If you resume a text scan of a file by calling textscan with the same file identifier (fileID), then textscan automatically resumes reading at the point where it terminated the last read.
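
As a minimal sketch of reading a file in stages, the following writes a small temporary file (the file name and its contents are assumptions made for illustration), applies formatSpec once, and then calls textscan again with the same fileID to read the rest.

```matlab
% Create a temporary data file so the sketch is self-contained (hypothetical data).
fname = [tempname '.txt'];
fileID = fopen(fname,'w');
fprintf(fileID,'1 2\n3 4\n5 6\n');
fclose(fileID);

fileID = fopen(fname);
first = textscan(fileID,'%f %f',1);  % apply formatSpec once: first row only
rest  = textscan(fileID,'%f %f');    % resumes after the first row
fclose(fileID);
delete(fname);

first{1}   % value read by the first call from column 1
rest{1}    % remaining values from column 1
```

Because the file position is kept with the file identifier, the second call needs no extra arguments to continue.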

C = textscan(str,formatSpec) reads data from a string, str, into cell array C. For strings, repeated calls to textscan restart the scan from the beginning each time. To restart a scan from the last position, request a position output.

textscan attempts to match the data in the string, str, to formatSpec, which is a string of conversion specifiers.

C = textscan(str,formatSpec,N) reads string data, using the formatSpec N times, where N is a positive integer.

C = textscan(___,Name,Value) specifies options using one or more Name,Value pair arguments, in addition to any of the input arguments in the previous syntaxes.

[C,position] = textscan(___) returns the file or string position at the end of the scan as the second output argument, using any of the input arguments in the previous syntaxes. For a file, this is the value that ftell(fileID) would return after calling textscan. For a string, position indicates how many characters textscan read.

Examples

Read a String

Read a string of floating-point numbers.

str = '0.41 8.24 3.57 6.24 9.27';

C = textscan(str,'%f');

The formatSpec string '%f' tells textscan to match each field in str to a double-precision floating-point number.

Display the contents of cell array C.

celldisp(C)
C{1} =
 
    0.4100
    8.2400
    3.5700
    6.2400
    9.2700

Read the same string, truncating each value to one decimal digit.

C = textscan(str,'%3.1f %*1d');

The specifier %3.1f indicates a field width of 3 digits and a precision of 1. textscan reads a total of 3 digits, including the decimal point and the 1 digit after the decimal point. The specifier, %*1d, tells textscan to skip the remaining digit.

Display the contents of cell array C.

celldisp(C)
C{1} =
 
    0.4000
    8.2000
    3.5000
    6.2000
    9.2000

Read Different Types of Data

Using a text editor, create a file scan1.dat that contains data in the following form:

09/12/2005 Level1 12.34 45 1.23e10 inf Nan Yes 5.1+3i
10/12/2005 Level2 23.54 60 9e19 -inf  0.001 No 2.2-.5i
11/12/2005 Level3 34.90 12 2e5   10  100   No 3.1+.1i

Open the file, and read each column with the appropriate conversion specifier.

fileID = fopen('scan1.dat');
C = textscan(fileID,'%s %s %f32 %d8 %u %f %f %s %f');
fclose(fileID);
celldisp(C)
C{1}{1} =
 
09/12/2005
 
 
C{1}{2} =
 
10/12/2005
 
 
C{1}{3} =
 
11/12/2005
 
 
C{2}{1} =
 
Level1
 
 
C{2}{2} =
 
Level2
 
 
C{2}{3} =
 
Level3
 
 
C{3} =
 
   12.3400
   23.5400
   34.9000

 
 
C{4} =
 
   45
   60
   12

 
 
C{5} =
 
  4294967295
  4294967295
      200000

 
 
C{6} =
 
   Inf
  -Inf
    10

 
 
C{7} =
 
       NaN
    0.0010
  100.0000

 
 
C{8}{1} =
 
Yes
 
 
C{8}{2} =
 
No
 
 
C{8}{3} =
 
No
 
 
C{9} =
 
   5.1000 + 3.0000i
   2.2000 - 0.5000i
   3.1000 + 0.1000i

textscan returns a 1-by-9 cell array C.

View the MATLAB® data type of each of the cells in C.

C
C = 

  Columns 1 through 5

    {3x1 cell}    {3x1 cell}    [3x1 single]    [3x1 int8]    [3x1 uint32]

  Columns 6 through 9

    [3x1 double]    [3x1 double]    {3x1 cell}    [3x1 double]

For example, C{1} and C{2} are cell arrays. C{5} is of data type uint32, so the first two elements of C{5} are the maximum values for a 32-bit unsigned integer, or intmax('uint32').

Remove a Literal String

Remove the text 'Level' from each field in the second column of the data from the previous example.

Match the literal string in the formatSpec input.

fileID = fopen('scan1.dat');
C = textscan(fileID,'%s Level%d %f32 %d8 %u %f %f %s %f');
fclose(fileID);
C{2}
ans =

           1
           2
           3

View the MATLAB data type of the second cell in C.

class(C{2})
ans =

int32

The second cell of the 1-by-9 cell array, C, is now of data type int32.

Skip the Remainder of a Line

Read the first column of the file in the previous example into a cell array, skipping the rest of the line.

fileID = fopen('scan1.dat');
dates = textscan(fileID,'%s %*[^\n]');
fclose(fileID);
dates{1}
ans = 

    '09/12/2005'
    '10/12/2005'
    '11/12/2005'

textscan returns a 1-by-1 cell array dates.

Specify Delimiter and Empty Value Conversion

Using a text editor, create a comma-delimited file, data.csv, that contains

1,  2,  3,  4,   ,  6
7,  8,  9,   , 11, 12

Read the file, converting empty cells to -Inf.

fileID = fopen('data.csv');
C = textscan(fileID,'%f %f %f %f %u8 %f',...
'delimiter',',','EmptyValue',-Inf);
fclose(fileID);
column4 = C{4}, column5 = C{5}
column4 =

     4
  -Inf


column5 =

    0
   11

textscan returns a 1-by-6 cell array, C. The textscan function converts the empty value in C{4} to -Inf, where C{4} is associated with a floating-point format. Because MATLAB represents unsigned integer -Inf as 0, textscan converts the empty value in C{5} to 0, and not -Inf.

Read Custom Empty Value Strings and Comments

Using a text editor, create a comma-delimited file, data2.csv, that contains the lines

abc, 2, NA, 3, 4
// Comment Here
def, na, 5, 6, 7

Designate the input that textscan should treat as comments or empty values.

fileID = fopen('data2.csv');
C = textscan(fileID,'%s %n %n %n %n','delimiter',',',...
'treatAsEmpty',{'NA','na'},'commentStyle','//');
fclose(fileID);
celldisp(C)
C{1}{1} =
abc

C{1}{2} =
def

C{2} =
     2
   NaN

C{3} =
   NaN
     5

C{4} =
     3
     6

C{5} =
     4
     7

Treat Repeated Delimiters as One

Using a text editor, create a file, data3.csv, that contains

1,2,3,,4
5,6,7,,8

To treat the repeated commas as a single delimiter, use the MultipleDelimsAsOne parameter, and set the value to 1 (true).

fileID = fopen('data3.csv');
C = textscan(fileID,'%f %f %f %f','delimiter',',',...
'MultipleDelimsAsOne',1);
fclose(fileID);
celldisp(C)
C{1} =
     1
     5

C{2} =
     2
     6

C{3} =
     3
     7

C{4} =
     4
     8

Collect Numeric Data

Using a text editor, create a file, grades.txt, that contains:

Student_ID  | Test1  | Test2  | Test3
   1           91.5     89.2     77.3
   2           88.0     67.8     91.0
   3           76.3     78.1     92.5
   4           96.4     81.2     84.6

Read the column headers using the format '%s' four times.

fileID = fopen('grades.txt');

formatSpec = '%s';
N = 4;
C_text = textscan(fileID,formatSpec,N,'delimiter','|');

Read the numeric data in the file.

C_data0 = textscan(fileID,'%d %f %f %f')
C_data0 = 
  [4x1 int32]    [4x1 double]    [4x1 double]    [4x1 double]

The default value for CollectOutput is 0 (false), so textscan returns each column of the numeric data in a separate array.

Set CollectOutput to 1 (true) to collect the consecutive columns of the same class into a single array.

frewind(fileID);

C_text = textscan(fileID,'%s',N,'delimiter','|');

C_data1 = textscan(fileID,'%d %f %f %f','CollectOutput',1)
C_data1 = 
    [4x1 int32]    [4x3 double]

The test scores, which are all double, are collected into a single 4-by-3 array.

Close the file, grades.txt.

fclose(fileID);

Read Nondefault Control Characters

Use sprintf to convert nondefault escape sequences in your data.

Create a string that includes a form feed character, \f. Then, to read the string using textscan, call sprintf to explicitly convert the form feed.

lyric = sprintf('Blackbird\fsinging\fin\fthe\fdead\fof\fnight');
C = textscan(lyric,'%s','delimiter',sprintf('\f'));
C{1}
ans = 

    'Blackbird'
    'singing'
    'in'
    'the'
    'dead'
    'of'
    'night'

textscan returns a 1-by-1 cell array, C.

Resume a Text Scan of a String

Resume a scan of a string from a position other than the beginning.

If you call textscan again on the same string, it reads from the beginning of the string each time. To resume a scan from any other position in the string, use the two-output argument syntax in your initial call to textscan, and pass only the unread part of the string to the next call.

For example, create a string called lyric. Read the first word of the string, and then resume the scan.

lyric = 'Blackbird singing in the dead of night';
[firstword,pos] = textscan(lyric,'%9c',1);
lastpart = textscan(lyric(pos+1:end),'%s');
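
To inspect what the two calls return, a short follow-up might look like this. It repeats the calls so that it runs on its own, and it uses the same variable names as above.

```matlab
lyric = 'Blackbird singing in the dead of night';
[firstword,pos] = textscan(lyric,'%9c',1);   % read the first 9 characters
lastpart = textscan(lyric(pos+1:end),'%s');  % scan only the unread remainder

firstword{1}   % character array holding the first word
lastpart{1}    % cell array holding the remaining words
```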

Input Arguments

fileID — File identifier
numeric scalar

File identifier of an open text file, specified as a number. Before reading a file with textscan, you must use fopen to open the file and obtain the fileID.

Data Types: double

formatSpec — Format of the data fields
string

Format of the data fields, specified as a string of one or more conversion specifiers. When textscan reads a file or string, it attempts to match the data to the formatSpec string. If textscan fails to match a data field, it stops reading and returns all fields read before the failure.

The number of conversion specifiers determines the number of cells in output array, C.
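
A small sketch of this stopping behavior, using a made-up input string: the third field is not numeric, so with the default settings textscan returns only the fields it read before that point.

```matlab
str = '1 2 abc 4';        % hypothetical input; 'abc' cannot match %f
C = textscan(str,'%f');   % stops at the first field that fails to match
numRead = numel(C{1})     % number of values read before the failure
```

One specifier in formatSpec gives one cell in C, regardless of how many values that cell ends up holding.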

Numeric Fields

This table lists available conversion specifiers for numeric inputs.

Numeric Input Type       Conversion Specifier   Output Class
Integer, signed          %d                     int32
                         %d8                    int8
                         %d16                   int16
                         %d32                   int32
                         %d64                   int64
Integer, unsigned        %u                     uint32
                         %u8                    uint8
                         %u16                   uint16
                         %u32                   uint32
                         %u64                   uint64
Floating-point number    %f                     double
                         %f32                   single
                         %f64                   double
                         %n                     double
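
To check the output class that a specifier produces, one option is to read a short string and query each cell with class. The input values here are arbitrary.

```matlab
C = textscan('10 20 30','%d8 %u16 %f32');   % one specifier per field
classes = cellfun(@class,C,'UniformOutput',false)
```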

Character Fields

This table lists available conversion specifiers for character inputs.

Character Strings          Conversion Specifier   Details
Characters                 %s                     String
                           %q                     String, where double quotation marks indicate text to keep together
                           %c                     Any single character, including a delimiter
Pattern-matching strings   %[...]                 Read only the characters inside the brackets, up to the first nonmatching character. To include ] in the set, specify it first: %[]...].
                                                  Example: %[mus] reads 'summer ' as 'summ'.
                           %[^...]                Exclude characters inside the brackets, reading until the first matching character. To exclude ], specify it first: %[^]...].
                                                  Example: %[^xrg] reads 'summer ' as 'summe'.
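
The character specifiers above can be tried on short strings, as in this sketch. The pattern-matching calls pass N = 1 so that each format is applied only once, and the city names are made-up sample data.

```matlab
% N = 1 applies each format once, so only the leading match is returned.
keep = textscan('summer ','%[mus]',1);    % read while characters are in the set m, u, s
stop = textscan('summer ','%[^xrg]',1);   % read until a character in the set x, r, g

% Double quotation marks keep 'New York' together as one field.
quoted = textscan('"New York" Boston','%q');
```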

Optional Operators

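Conversion specifiers in formatSpec can include optional operators, which appear between the percent character (%) and the conversion character in the following order: the skip operator (*), the field width, and the precision (for example, %*4s or %7.2f).

Optional operators include: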

  • Fields and Characters to Ignore

    textscan reads all characters in your file in sequence, unless you tell it to ignore a particular field or a portion of a field.

    Use the following operators to skip or read portions of fields.

    Operator   Action Taken

    %*         Skip the field. textscan does not create an output cell for any field that it skips.

               Example: '%s %*s %s %s %*s %*s %s' (spaces are optional) converts the string
               'Blackbird singing in the dead of night' to four output cells with the strings
               'Blackbird' 'in' 'the' 'night'

    %*n        Ignore n characters of the field, where n is an integer less than or equal to the number of characters in the field.

               Example: %*4s ignores 4 characters, so '%*4s %s' reads 'summer' as 'er'.

  • Field Width

    textscan reads the number of characters or digits specified by the field width or precision, or up to the first delimiter, whichever comes first. A decimal point is counted as a digit. Specify the field width by inserting a number after the percent character (%) in the conversion specifier.

    Example: %5f reads '123.456' as 123.4.

      Note:   When the field width operator is used with single characters (%c), textscan also reads delimiter characters.
      Example: %7c reads 7 characters, including white space, so 'Day and night' reads as 'Day and'.

  • Precision

    For floating-point numbers (%n, %f, %f32, %f64), you can specify the number of decimal digits to read.

    Example: %7.2f reads '123.456' as 123.45.

  • Literal Text to Ignore

    textscan ignores literal text that you place before or after a conversion specifier in the formatSpec string.

    Example: Level%u8 reads 'Level1' as 1.

    Example: %u8Step reads '2Step' as 2.
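
The optional operators listed above can be sketched on short strings as follows. The width and precision calls pass N = 1 so that the format is applied once and the leftover digits are not read as a second value.

```matlab
lyric = 'Blackbird singing in the dead of night';
C = textscan(lyric,'%s %*s %s %s %*s %*s %s');  % skipped fields produce no output cell
numCells = numel(C)

w = textscan('123.456','%5f',1);     % field width 5, counting the decimal point
p = textscan('123.456','%7.2f',1);   % two digits after the decimal point

% The literal text 'Level' is matched and dropped from each field.
lvl = textscan('Level1 Level2','Level%u8');
```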

N — Number of times to apply formatSpec
integer

Number of times to apply formatSpec, specified as an integer.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

str — Input string
string

Input string to read.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: C = textscan(fileID,formatSpec,'HeaderLines',3,'Delimiter',',') skips the first three lines of the data, and then reads the remaining data, treating commas as a delimiter.

Names are not case sensitive.

'CollectOutput' — Logical indicator determining data concatenation
false (default) | true

Logical indicator determining data concatenation, specified as the comma-separated pair consisting of 'CollectOutput' and either true or false. If true, then textscan concatenates consecutive output cells of the same fundamental MATLAB class into a single array.

'CommentStyle' — Symbols designating text to ignore
string | cell array of strings

Symbols designating text to ignore, specified as the comma-separated pair consisting of 'CommentStyle' and a string or cell array of strings.

For example, specify a string such as '%' to ignore characters following the string on the same line. Specify a cell array of two strings, such as {'/*', '*/'}, to ignore characters between the strings.

textscan checks for comments only at the start of each field, not within a field.

Example: 'CommentStyle',{'/*', '*/'}
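
Both comment forms can be sketched on strings. The data and the comment text below are made up for illustration.

```matlab
% Line comment: everything after '%' on that line is ignored.
data = sprintf('1 2\n%% calibration note\n3 4');
A = textscan(data,'%f %f','CommentStyle','%');

% Block comment: text between the two strings is ignored.
B = textscan('1 /* skip this */ 2','%f','CommentStyle',{'/*','*/'});
```

In both cases the comment begins at the start of a field, which is where textscan looks for it.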

'Delimiter' — Field delimiter characters
{' ','\b','\t'} (default) | string | cell array of strings

Field delimiter characters, specified as the comma-separated pair consisting of 'Delimiter' and a string or a cell array of strings. Specify multiple delimiters in a cell array of strings.

Example: 'delimiter',{';','*'}

textscan interprets repeated delimiter characters as separate delimiters, and returns an empty value to the output cell.

Within each row of data, the default field delimiter is white space. White space can be any combination of space (' '), backspace ('\b'), or tab ('\t') characters. If you do not specify a delimiter, textscan interprets repeated white-space characters as a single delimiter.

When you specify one of the following escape sequences as a delimiter, textscan converts that sequence to the corresponding control character:

\b    Backspace
\n    Newline
\r    Carriage return
\t    Tab
\\    Backslash (\)
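
A brief sketch of the delimiter rules described above, using short made-up strings:

```matlab
% Two different delimiter characters, given as a cell array of strings.
A = textscan('1;2*3','%f','Delimiter',{';','*'});

% Repeated delimiters mark an empty field, which becomes NaN by default.
B = textscan('1,,3','%f','Delimiter',',');

% The escape sequence '\t' is converted to the tab character.
C = textscan(sprintf('4\t5\t6'),'%f','Delimiter','\t');
```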

'EmptyValue' — Returned value for empty numeric fields
NaN (default) | scalar

Returned value for empty numeric fields in delimited text files, specified as the comma-separated pair consisting of 'EmptyValue' and a scalar.

'EndOfLine' — End-of-line characters
string

End-of-line characters, specified as the comma-separated pair consisting of 'EndOfLine' and a string. The default end-of-line sequence depends on the format of your file and can include a newline character ('\n'), a carriage return ('\r'), or a combination of the two ('\r\n').

If there are missing values and an end-of-line sequence at the end of the last line in a file, then textscan returns empty values for those fields. This ensures that individual cells in output cell array, C, are the same size.

'ExpChars' — Exponent characters
'eEdD' (default) | string

Exponent characters, specified as the comma-separated pair consisting of 'ExpChars' and a string. The default exponent characters are e, E, d, and D.

'HeaderLines' — Number of header lines
0 (default) | positive integer

Number of header lines, specified as the comma-separated pair consisting of 'HeaderLines' and a positive integer. textscan skips the header lines, including the remainder of the current line.
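
As a sketch, the following skips one header line before reading two numeric columns. The data is a made-up string built with sprintf so that it contains real newline characters.

```matlab
data = sprintf('x y\n1 2\n3 4');              % one header line, then two rows
C = textscan(data,'%f %f','HeaderLines',1);   % skip the header before reading
```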

'MultipleDelimsAsOne' — Multiple delimiter handling
0 (false) (default) | 1 (true)

Multiple delimiter handling, specified as the comma-separated pair consisting of 'MultipleDelimsAsOne' and either true or false. If true, textscan treats consecutive delimiters as a single delimiter. Repeated delimiters separated by white-space are also treated as a single delimiter. You must also specify the Delimiter option.

Example: 'MultipleDelimsAsOne',1

'ReturnOnError' — Behavior when textscan fails to read or convert
1 (true) (default) | 0 (false)

Behavior when textscan fails to read or convert, specified as the comma-separated pair consisting of 'ReturnOnError' and either true or false. If true, textscan terminates without an error and returns all fields read. If false, textscan terminates with an error and does not return an output cell array.
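
One way to see the difference is to set ReturnOnError to false and wrap the call in try/catch. This is a sketch with a made-up input string; the exact wording of the error message is not assumed here.

```matlab
str = '1 2 abc 4';     % hypothetical input; 'abc' cannot match %f
threw = false;
try
    C = textscan(str,'%f','ReturnOnError',false);
catch err
    threw = true;      % textscan raised an error instead of returning data
    disp(err.message)
end
```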

'TreatAsEmpty' — Strings to treat as empty value
string | cell array of strings

Strings to treat as empty values, specified as the comma-separated pair consisting of 'TreatAsEmpty' and a single string or cell array of strings. This option only applies to numeric fields.

'Whitespace' — White-space characters
' \b\t' (default) | string

White-space characters, specified as the comma-separated pair consisting of 'Whitespace' and a string of one or more characters. textscan adds a space character, char(32), to any specified Whitespace, unless Whitespace is empty ('') and formatSpec includes any string conversion specifier.

When you specify one of the following escape sequences as any white-space character, textscan converts that sequence to the corresponding control character:

\b    Backspace
\n    Newline
\r    Carriage return
\t    Tab
\\    Backslash (\)

Output Arguments

C — File or string data
cell array

File or string data, returned as a cell array.

For each numeric conversion specifier in formatSpec, the textscan function returns a K-by-1 MATLAB numeric vector to the output cell array, C, where K is the number of times that textscan finds a field matching the specifier.

For each string conversion specifier in formatSpec, the textscan function returns a K-by-1 cell vector of strings, where K is the number of times that textscan finds a field matching the specifier. For each character conversion that includes a field width operator, textscan returns a K-by-M character array, where M is the field width.
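
The shapes described above can be checked with size, as in this sketch that uses short made-up strings:

```matlab
C = textscan('ab 1 cd 2','%s %f');   % two matches for each specifier
size(C{1})   % cell vector of strings, K-by-1
size(C{2})   % numeric vector, K-by-1

D = textscan('abcdef','%3c');        % field width M = 3
size(D{1})   % character array, K-by-M
```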

position — File or string position
integer

File or string position at the end of the scan, returned as an integer of class double. For a file, ftell(fileID) would return the same value after calling textscan. For a string, position indicates how many characters textscan read.

More About

Algorithms

textscan converts numeric fields to the specified output type according to MATLAB rules regarding overflow, truncation, and the use of NaN, Inf, and -Inf. For example, MATLAB represents an integer NaN as zero. If textscan finds an empty field associated with an integer format specifier (such as %d or %u), it returns the empty value as zero and not NaN.
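
These conversion rules can be sketched with two short reads. The input strings are made up for illustration.

```matlab
% An empty field read with an integer specifier becomes 0, not NaN.
A = textscan('1,,3','%d','Delimiter',',');

% A value too large for the output class saturates at the class maximum.
B = textscan('300','%u8');
```

Here A{1} is of class int32 and B{1} is of class uint8, so neither can hold NaN or a value above its class maximum.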

textscan does not include leading white-space characters in the processing of any data fields. When processing numeric data, textscan also ignores trailing white space.

textscan imports any complex number as a whole into a complex numeric field, converting the real and imaginary parts to the specified numeric type (such as %d or %f). Valid forms for a complex number are:

±<real>±<imag>i|j

Example: 5.7-3.1i

±<imag>i|j

Example: -7j

Do not include embedded white space in a complex number. textscan interprets embedded white space as a field delimiter.
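
As a sketch, the two example forms above can be read from one string, with white space separating the fields:

```matlab
C = textscan('5.7-3.1i -7j','%f');   % two fields separated by white space
z = C{1}
isreal(z)    % false when the values were read as complex
```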
