I'm trying to split a large text file into several text files. I found another thread from a few years ago with a very similar premise, but it doesn't cover my exact situation.

https://unix.stackexchange.com/a/64691/183674

How would I split the following data if the first line didn't start with 00:00:00:00?

00:00:00:00 00:00:05:00 01SC_001.jpg
00:00:14:29 00:00:19:29 01SC_002.jpg
00:01:07:20 00:01:12:20 01SC_003.jpg
00:00:00:00 00:00:03:25 02MI_001.jpg
00:00:03:25 00:00:08:25 02MI_002.jpg
00:00:35:27 00:00:40:27 02MI_003.jpg
00:00:00:00 00:00:05:00 03Bi_001.jpg
00:00:05:19 00:00:10:19 03Bi_002.jpg
00:01:11:17 00:01:16:17 03Bi_003.jpg
00:00:00:00 00:00:05:00 04CG_001.jpg
00:00:11:03 00:00:16:03 04CG_002.jpg
00:01:12:25 00:01:17:25 04CG_003.jpg

Here's the code for reference:

#!/usr/bin/env perl

use strict;
use warnings;

open(my $infh, '<', 'ABC_TabDelim.txt') or die $!;

my $outfh;
my $filecount = 0;
while ( my $line = <$infh> ) {
    if ( $line =~ /^00:00:00:00/ ) {
        close($outfh) if $outfh;
        open($outfh, '>', sprintf('ABC%02d_TabDelim.txt', ++$filecount)) or die $!;        
    }
    print {$outfh} $line or die "Failed to write to file: $!";
}

close($outfh);
close($infh);

I tried adding a print $line; on the line right after the while statement, to make it read line by line as shown in other tutorials, but this did not fix the issue.

I would appreciate any input.

edit: So for an example like

    00:01:16:17 00:00:05:00 01SC_001.jpg
    00:00:14:29 00:00:19:29 01SC_002.jpg
    00:01:07:20 00:01:12:20 01SC_003.jpg
    00:00:00:00 00:00:03:25 02MI_001.jpg
    00:00:03:25 00:00:08:25 02MI_002.jpg
    00:00:35:27 00:00:40:27 02MI_003.jpg
    00:00:00:00 00:00:05:00 03Bi_001.jpg
    00:00:05:19 00:00:10:19 03Bi_002.jpg
    00:01:11:17 00:01:16:17 03Bi_003.jpg
    00:00:00:00 00:00:05:00 04CG_001.jpg
    00:00:11:03 00:00:16:03 04CG_002.jpg
    00:01:12:25 00:01:17:25 04CG_003.jpg

I would like to get three separate files, respectively containing

00:00:00:00 00:00:03:25 02MI_001.jpg
00:00:03:25 00:00:08:25 02MI_002.jpg
00:00:35:27 00:00:40:27 02MI_003.jpg

00:00:00:00 00:00:05:00 03Bi_001.jpg
00:00:05:19 00:00:10:19 03Bi_002.jpg
00:01:11:17 00:01:16:17 03Bi_003.jpg

00:00:00:00 00:00:05:00 04CG_001.jpg
00:00:11:03 00:00:16:03 04CG_002.jpg
00:01:12:25 00:01:17:25 04CG_003.jpg

discarding the first three lines.

4 Comments

  • How do you expect the file to be split? Commented Aug 8, 2016 at 14:07
  • I expect the code to make a file for every occurrence of 00:00:00:00, ending just before the next instance. How would I implement this if all of the lines with 00:00:00:00's were shifted down a few lines? Commented Aug 8, 2016 at 14:14
  • what is your expected output? Commented Aug 8, 2016 at 14:21
  • You should show us the expected output from your sample data, and your sample data should illustrate any corner cases that have to be dealt with (not having 00:00:00:00 in the first column of the first row, for example). Commented Aug 8, 2016 at 14:25

1 Answer

Does modifying the condition in the loop like this not do the job?

if ($line =~ /^00:00:00:00/ || !$outfh)

Suppose the first line does not start with 00:00:00:00 (a 'zero marker'). The regex match fails, but no output file is open yet, so the || !$outfh half of the condition is true. The if body skips the close, opens the new file, and the line is then written to that file. Thereafter the file is open, so the second half of the condition no longer changes the decision (it only slows things down marginally, probably immeasurably).
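
For reference, here is a minimal sketch of the whole loop with that condition in place. It is an illustration rather than a drop-in replacement: the sub name split_keep_leading and its $out_dir parameter are hypothetical, added only so the logic can be called and checked in isolation. The final close is also guarded, on the assumption that the input might be empty.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split $in_path into numbered files under $out_dir.
# Rows before the first zero marker are kept and go into the first file.
sub split_keep_leading {
    my ($in_path, $out_dir) = @_;
    open(my $infh, '<', $in_path) or die "Cannot open $in_path: $!";

    my $outfh;
    my $filecount = 0;
    while ( my $line = <$infh> ) {
        # Start a new file at each zero marker, or when no file is open yet.
        if ( $line =~ /^00:00:00:00/ || !$outfh ) {
            close($outfh) if $outfh;
            my $name = sprintf('%s/ABC%02d_TabDelim.txt', $out_dir, ++$filecount);
            open($outfh, '>', $name) or die "Cannot open $name: $!";
        }
        print {$outfh} $line or die "Failed to write to file: $!";
    }

    close($outfh) if $outfh;    # guard: nothing was opened for empty input
    close($infh);
    return $filecount;          # number of files written
}

# Example call (paths are placeholders):
# split_keep_leading('ABC_TabDelim.txt', '.');
```

With the 12-row example from the question's edit, this would write four files: the three leading rows in the first, then one file per zero marker.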

The question was clarified since I first proffered my solution. If you want to discard the rows before the first zero marker, modify the print to print only if the file handle is open (instead of the modified condition to open the file if the first line does not start with a zero marker).

print $outfh $line or die "Failed to write to file: $!" if $outfh;
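
Put together, the discarding variant might look like the sketch below. Again, the sub name split_discard_leading and the $out_dir parameter are hypothetical wrappers for illustration. The condition is the original one (zero marker only), and the write is skipped while no output file is open.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split $in_path into numbered files under $out_dir.
# Rows before the first zero marker are discarded.
sub split_discard_leading {
    my ($in_path, $out_dir) = @_;
    open(my $infh, '<', $in_path) or die "Cannot open $in_path: $!";

    my $outfh;
    my $filecount = 0;
    while ( my $line = <$infh> ) {
        # Only a zero marker starts a new file.
        if ( $line =~ /^00:00:00:00/ ) {
            close($outfh) if $outfh;
            my $name = sprintf('%s/ABC%02d_TabDelim.txt', $out_dir, ++$filecount);
            open($outfh, '>', $name) or die "Cannot open $name: $!";
        }
        # No handle yet means we are still before the first marker: drop the row.
        if ($outfh) {
            print {$outfh} $line or die "Failed to write to file: $!";
        }
    }

    close($outfh) if $outfh;    # guard: input may contain no marker at all
    close($infh);
    return $filecount;          # number of files written
}

# Example call (paths are placeholders):
# split_discard_leading('ABC_TabDelim.txt', '.');
```

With the 12-row example from the question's edit, this would write three files and drop the first three rows.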

4 Comments

It's working with your proposed change, now I just need to understand the significance of the second condition :)
Suppose the first line starts 01. The regex match fails, but the file isn't open so the or condition is true. The code skips the close and opens the new file and the line is written. Thereafter, the file is open so the second half of the condition doesn't change the decision making (except to slow it down marginally and probably immeasurably).
That clarifies my confusion, I appreciate the help.
The question was clarified since I proffered my solution. If you want to discard the rows before the first zero marker, modify the print to print only if the file handle is open.
