I have a file that looks like this:

1 : Aa|xxx Aa|xxx Bb|xxx Cc|xxx Cc|xxx Cc|xxx 
2 : Cc|xxx Aa|xxx Aa|xxx Aa|xxx Bb|xxx    
3 : Bb|xxx Bb|xxx Aa|xxx Cc|xxx    
4 : Bb|xxx Aa|xxx Cc|xxx    
5 : Aa|xxx Cc|xxx Bb|xxx 

Here xxx stands for an individual code, and Aa, for example, for a name. Each line always contains all three names.

I would like to have three files containing the line number (first column) and only one name. Something like this:

1 : Aa|xxx Aa|xxx
2 : Aa|xxx Aa|xxx Aa|xxx
3 : Aa|xxx
4 : Aa|xxx
5 : Aa|xxx

Could somebody help me with this? I would be super happy. Thank you in advance!

2 Answers

A possible approach is to remove the extra content:

perl -pe 's/ (Bb|Cc)\S*//g' file > A
perl -pe 's/ (Aa|Cc)\S*//g' file > B
perl -pe 's/ (Aa|Bb)\S*//g' file > C

(the same can be done with sed, awk, or ex)
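
For comparison, here is a minimal awk sketch of the same idea, assuming the layout from the question (line number, a colon, then space-separated Name|code fields). Instead of deleting the unwanted names, it keeps only the fields that start with the wanted name. The function name split_one is hypothetical, chosen just for illustration.

```shell
# split_one NAME FILE: print the "N :" prefix plus only the NAME|code fields.
# split_one is a hypothetical helper name, not part of any tool.
split_one() {
  awk -v name="$1" '{
    out = $1 " " $2                       # keep the line number and the colon
    for (i = 3; i <= NF; i++)
      if (index($i, name "|") == 1)       # field begins with "NAME|"
        out = out " " $i
    print out
  }' "$2"
}
```

Usage would be something like: split_one Aa file > A. Because awk splits on runs of whitespace, trailing spaces in the input do not end up in the output.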

As pointed out by @JJao, it's also very easy with sed and extended regex (-r):

$ sed -r 's/\s(Cc|Bb)\|...//g' file > A
$ sed -r 's/\s(Aa|Cc)\|...//g' file > B
$ sed -r 's/\s(Aa|Bb)\|...//g' file > C

On OS X (Apple systems), the -r option does not behave the same as in GNU sed. In particular, \s is not interpreted as whitespace; use [[:space:]] instead.

If the code "xxx" following the pipe is not always exactly three characters, replace ... in the regex with [^[:space:]]+. The match then ends at the first whitespace character encountered.

So the more general answer relying on sed would be for output file A:

$ sed -r 's/[[:space:]](Cc|Bb)\|[^[:space:]]+//g' file > A
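
To avoid typing one command per name, the general form can be wrapped in a loop. This is only a sketch: the function name split_all and the NAME.txt output naming are made up for illustration, and it assumes a sed that accepts -E for extended regexes (as far as I know, both GNU sed and the sed shipped with OS X do).

```shell
# split_all FILE NAME...: write one output file per name, called NAME.txt.
# split_all and the .txt naming are hypothetical choices for this sketch.
split_all() {
  input=$1
  shift
  for keep in "$@"; do
    # Alternation of every name except the one being kept, e.g. "Bb|Cc"
    others=$(printf '%s\n' "$@" | grep -vxF -- "$keep" | paste -sd '|' -)
    # Delete each unwanted "Name|code" token together with the whitespace before it
    sed -E "s/[[:space:]]+($others)\|[^[:space:]]+//g" "$input" > "$keep.txt"
  done
}
```

Calling split_all file Aa Bb Cc would then produce Aa.txt, Bb.txt and Cc.txt in the current directory.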
