2

I have very large tab-separated files, and I need to delete all rows where the word "TelePacific" appears in a specific column. In this case, that means all the rows where TelePacific occurs in the 4th column. Here is an example input file:

7/18/13 10:06   0:00:09 TelePacific random person DEREK         9256408665  random company
7/18/13 10:07   0:00:21 TelePacific random person DEREK         9256408665  random company
7/18/13 10:10   0:19:21 TelePacific random person DEREK         9256408665  random company
7/18/13 10:39   0:01:07 random person       107  
7/18/13 11:02   0:01:41 random person Gilbert       107 TelePacific
7/18/13 12:17   0:00:42 random person Gilbert       107 TelePacific
7/18/13 13:35   0:00:41 random person Gilbert       107 TelePacific
7/18/13 13:44   0:12:30 TelePacific ADKNOWLEDGE     8169311771  random company
7/18/13 14:46   0:19:48 TelePacific TOLL FREE CALL  8772933939  random company
7/15/13 10:09   0:01:27 random person Esquivel      272 TelePacific
7/15/13 10:16   0:00:55 random person Esquivel      272 TelePacific
7/15/13 10:59   0:00:51 random person Esquivel      272 TelePacific
7/15/13 11:01   0:01:09 random person Esquivel      272 TelePacific

5 Answers

5

Using grep -v:

grep -v "\bTelePacific\b" file > output && mv output file

Or using awk:

awk '$4 != "TelePacific"' file > output && mv output file
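By default awk splits on any run of whitespace, so a field that itself contains spaces (such as "random person") shifts the column count. If the file is strictly tab-delimited, a minimal sketch like the one below pins the separator to a tab. The sample rows, the column layout, and the file names are assumptions made up for illustration, not the asker's real data:

```shell
# Hypothetical sample: six tab-separated fields, carrier name in field 4.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.tsv"
filtered="$tmpdir/filtered.tsv"

{
  printf '7/18/13\t10:06\t0:00:09\tTelePacific\tDEREK\trandom company\n'
  printf '7/18/13\t10:39\t0:01:07\trandom person\t107\tTelePacific\n'
  printf '7/18/13\t11:02\t0:01:41\tTelePacific Foo\t107\trandom company\n'
} > "$sample"

# -F'\t' makes awk split on tabs only, so "random person" stays one field.
# Rows whose 4th field is exactly "TelePacific" are dropped; the rest print.
awk -F'\t' '$4 != "TelePacific"' "$sample" > "$filtered"
```

Only the first row is dropped here. The second row has TelePacific in a different column, and the third has a 4th field that merely starts with the word, so the exact string comparison leaves both in place.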

1 Comment

+1 for \b ("match word boundary"), so you only match the word "TelePacific" instead of "FooTelePacific" or "TelePacificFoo".
1

fgrep -v will do this.

fgrep is equivalent to grep -F and prevents grep from interpreting special characters in your pattern as regex control characters. The -v parameter causes fgrep to output all lines that don't match the pattern, in contrast to outputting the lines that do (which is the default).

fgrep -v TelePacific inputfile.tsv > outputfile.tsv

As anubhava noted above, you may choose grep -v "\bTelePacific\b" instead to ensure that you don't accidentally match "TelePacificFoo" or "FooTelePacific".
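To see what whole-word matching does and does not buy you, here is a small sketch using -w, which GNU and BSD grep both accept (it may be missing from other implementations). The sample rows and file names are invented for illustration:

```shell
# Hypothetical sample rows, tab-separated.
workdir=$(mktemp -d)
input="$workdir/input.tsv"
output="$workdir/output.tsv"

{
  printf 'a\tb\tc\tTelePacific\td\n'
  printf 'a\tb\tc\tFooTelePacific\td\n'
  printf 'a\tb\tc\tother\tTelePacific\n'
} > "$input"

# -F: fixed string, -v: invert the match, -w: match whole words only.
# Any row containing the standalone word is removed, whichever column it is in.
grep -Fvw TelePacific "$input" > "$output"
```

The FooTelePacific row survives, but the third row is removed even though its match is in the 5th column. Word matching protects against partial matches, not against matches in the wrong column.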

2 Comments

Is there any way to do this so that it only searches for instances of TelePacific in the 4th column?
@Fr0ntSight That's the point where grep-related tools stop being very helpful. You could write a really nasty regular expression to parse tabs, or make a clever loop in a shell script, but awk is designed for delimiter-separated fields, and that makes anubhava's awk solution the right tool for the job.
1

This should do the trick:

$ sed '/TelePacific/d' file

If you are happy with the output, use the -i option to write the changes back to the file.

$ sed -i '/TelePacific/d' file

EDIT:

To only return results for TelePacific in the fourth column:

$ awk '$4=="TelePacific"' file

Or the inverse:

$ awk '$4!="TelePacific"' file

4 Comments

Won't this also delete lines with text FooTelePacific?
Sure it would, but the question wasn't that specific.
@ahilsend: The example file and the statement "I have very large tab-separated files" indicate that it is a separate word.
Is there any way to do this so that it only searches for instances of TelePacific in the 4th column?
0

Here is a solution with sed:

#!/bin/bash

sed '/TelePacific/d' your_file.txt > file_without_telepacific.txt

0

Try this:

grep -v TelePacific in-file > out-file

The -v option inverts the search, so grep prints all lines that don't match the search pattern.

This won't work if in-file and out-file are the same, because the shell truncates the output file before grep reads it. To achieve that, you have to use a temp file like this:

grep -v TelePacific in-file > in-file.tmp && mv in-file.tmp in-file
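One caveat with chaining on &&: grep exits with status 1 when it selects no lines, so if every row matches, the mv never runs and the original file is left unchanged. A rough sketch of one way to handle that, treating only a status above 1 as a real error. The file names and contents are placeholders for illustration:

```shell
# Hypothetical input file with one row to keep and one to drop.
infile=$(mktemp)
printf 'keep me\nTelePacific row\n' > "$infile"

# Create the temp file next to the input so mv stays on one filesystem.
tmp=$(mktemp "${infile}.XXXXXX")

# grep exit status: 0 = lines selected, 1 = none selected, >1 = error.
status=0
grep -v TelePacific "$infile" > "$tmp" || status=$?

if [ "$status" -le 1 ]; then
  mv "$tmp" "$infile"   # replace the original, even if the result is empty
else
  rm -f "$tmp"          # leave the original untouched on a real error
fi
```

Whether an empty result should overwrite the original is a judgment call. If an empty file would be a surprise in your case, keeping the plain && form is the more cautious choice.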
