Parsing a Large Text File into Sections

Asked by Amanda on 18 Aug 2012
Latest activity Commented on by Amanda on 18 Aug 2012

I have a large text file as below:

Run  Lat  Long  Time
1    32   32    34
1    23   22    21
2    23   12    11
2    11   11    11
2    33   11    12

and so on, for up to 10 runs.

I'm trying to split the file into sections by run (section 1, section 2, etc.) and write each section to its own text file, 10 files in total. File 1 will have the data from Run 1, File 2 will have the data from Run 2, and so on.

Thanks,

Amanda

3 Answers

Answer by Sven on 18 Aug 2012
Edited by Sven on 18 Aug 2012
Accepted answer

Hi Amanda,

This should work for you. It reads the input file one line at a time and writes each data line to an output file. Whenever the number at the start of the line changes (a new "section"), it closes the current output file and opens a new one named after that section. Lines that contain no digits, such as the header line, are skipped.

fidIn = fopen('inputFile.txt','r');
if fidIn == -1, error('Could not open inputFile.txt'); end
oldFirstChars = '';
fidOut = [];
while true
    tline = fgetl(fidIn);
    if ~ischar(tline), break, end % fgetl returns -1 at the end of the input file
    % Get the section (run) number: the first run of digits on the line
    newFirstChars = regexp(tline, '\d+','match','once');
    % Skip lines without a number, such as the header line or blank lines
    if isempty(newFirstChars), continue, end
    % If it's a new "section", make a new file
    if ~strcmp(oldFirstChars, newFirstChars)
        if ~isempty(fidOut)
            % Close the old file first
            fclose(fidOut);
        end
        fidOut = fopen(['outputFile' newFirstChars '.txt'],'w');
        oldFirstChars = newFirstChars;
    end
    % Print the line that we just read to the output file, ending it with CR+LF
    fprintf(fidOut, '%s\r\n',tline);
end
% Clean up any open files
fclose(fidIn);
if ~isempty(fidOut), fclose(fidOut); end
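
For comparison, here is an alternative sketch that is not part of the answer above. If the file is purely numeric apart from a single header line and fits in memory, the rows can be read in one go and grouped by the first column. This also copes with runs that are not contiguous in the input, whereas reopening an existing file with 'w' discards whatever was written to it earlier. The four-column layout, the file names, and the sample values come from the question; the use of textscan and the %g output format are my assumptions.

```matlab
% Write a small sample input so the sketch is self-contained
% (values taken from the question)
sample = [1 32 32 34; 1 23 22 21; 2 23 12 11; 2 11 11 11; 2 33 11 12];
fid = fopen('inputFile.txt','w');
fprintf(fid, 'Run Lat Long Time\r\n');
fprintf(fid, '%g %g %g %g\r\n', sample');   % transpose: fprintf consumes values column-wise
fclose(fid);

% Read the four numeric columns, skipping the header line
fid = fopen('inputFile.txt','r');
C = textscan(fid, '%f %f %f %f', 'HeaderLines', 1);
fclose(fid);
data = [C{:}];                  % N-by-4 matrix: Run, Lat, Long, Time

% Write one output file per distinct run number
runs = unique(data(:,1));
for k = 1:numel(runs)
    rows = data(data(:,1) == runs(k), :);   % all rows that belong to this run
    fidOut = fopen(sprintf('outputFile%d.txt', runs(k)), 'w');
    fprintf(fidOut, '%g %g %g %g\r\n', rows');
    fclose(fidOut);
end
```

Note that this version rewrites the numbers rather than copying the original lines, so the original column spacing is not preserved, and the sprintf('%d') file name assumes integer run numbers.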

Answer by Amanda on 18 Aug 2012
Edited by Amanda on 18 Aug 2012

Having a slight matrix dimensions error.

Thanks Sven.

Everything works great!!!

Thanks a lot.

1 Comment

Sven on 18 Aug 2012

Hi Amanda, that line just extracts the section number from the start of the string (the first 1, 2 or 3 characters, depending on how many digits the section number has). Do you have a space character after your section number, or a tab character?

There will definitely be a more robust way than what I wrote (I just search for the first "space" character). Perhaps a regexp such as:

newFirstChars = regexp(tline, '\d+','match','once')
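
A quick way to see how that pattern behaves is to try it on a few sample lines. This is only a sketch: the variable names sampleLines and keys are made up for illustration, and the tab-separated row is an assumed variant of the data in the question.

```matlab
% Sample lines: the header, a space-separated row, a two-digit run,
% and a tab-separated row
sampleLines = {'Run Lat Long Time', '1    32  32    34', ...
               '10   23  12   11', sprintf('2\t23\t12\t11')};
keys = cell(size(sampleLines));
for k = 1:numel(sampleLines)
    % 'match' returns the matched text; 'once' returns only the first
    % match, as a char array (empty if nothing matches)
    keys{k} = regexp(sampleLines{k}, '\d+', 'match', 'once');
    if isempty(keys{k})
        fprintf('No run number found in: %s\n', sampleLines{k});
    else
        fprintf('Run %s found in: %s\n', keys{k}, sampleLines{k});
    end
end
```

Because the pattern looks for digits rather than for a particular separator, it does not matter whether a space or a tab follows the section number.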
Answer by Amanda on 18 Aug 2012

Solved the matrix dimensions error by using a plain ASCII file. The output files are now being created successfully.

The only problem, which I probably didn't make clear (I've been brainstorming too long):

I'm trying to group the lines of the file by the number in the first column:

Output File 1

1 32 32 34
1 23 22 21

Output File 2

2 23 12 11
2 11 11 11

So the data should be grouped by run (Run 1, Run 2, and so on), with each run in a separate file.

Thanks, Amanda

2 Comments

Sven on 18 Aug 2012

Hi Amanda, I just tested the script, and it does exactly that. I've made a change or two now to fix two little bugs:

1. It now works even if the first line is the headers (and not a section)

2. It now ends each line with a carriage return plus newline (\r\n) rather than just a newline (\n), so that the lines show up on separate rows in Notepad
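
To illustrate the second point, here is a small sketch that writes one line with each ending and then reads the raw bytes back. The file name eolDemo.txt is made up for the example.

```matlab
% Opening with 'w' (no 't') writes the bytes exactly as given
fid = fopen('eolDemo.txt','w');
fprintf(fid, 'a\r\n');   % carriage return + newline
fprintf(fid, 'b\n');     % newline only
fclose(fid);

% Read the file back as raw bytes to see what was written
fid = fopen('eolDemo.txt','r');
bytes = fread(fid, Inf, 'uint8')';
fclose(fid);
disp(bytes)   % 13 is the carriage return, 10 is the newline
```

Another option on Windows is to open the output file in text mode with 'wt', in which case a plain \n is written as a carriage return plus newline.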

I've edited my first answer with these changes. (By the way, you can hit "comment" rather than "answer" if you want to comment on someone's answer.)

Thanks, Sven.

Amanda on 18 Aug 2012

Works Excellent!

Thanks, Amanda
