Split a text file into multiple files according to column content


Asked 9 years, 3 months ago

Modified 9 years, 3 months ago

Viewed 330 times

0

I have a file that looks like this:

1 : Aa|xxx Aa|xxx Bb|xxx Cc|xxx Cc|xxx Cc|xxx 
2 : Cc|xxx Aa|xxx Aa|xxx Aa|xxx Bb|xxx    
3 : Bb|xxx Bb|xxx Aa|xxx Cc|xxx    
4 : Bb|xxx Aa|xxx Cc|xxx    
5 : Aa|xxx Cc|xxx Bb|xxx

Here xxx stands for an individual code, and Aa, for example, stands for a name. Each line always contains all three names.

I would like to have three files, each containing the line number (first column) and the entries for only one name. For the name Aa, the file would look something like this:

1 : Aa|xxx Aa|xxx
2 : Aa|xxx Aa|xxx Aa|xxx
3 : Aa|xxx
4 : Aa|xxx
5 : Aa|xxx

Could somebody help me with this? I would be super happy. Thank you in advance!

edited Sep 10, 2016 at 9:45

Sundeep


asked Sep 10, 2016 at 9:03

Wiebke



2 Answers

2

A possible approach is to remove the extra content:

perl -pe 's/ (Bb|Cc)\S*//g' file > A
perl -pe 's/ (Aa|Cc)\S*//g' file > B
perl -pe 's/ (Aa|Bb)\S*//g' file > C

(The same can be done with sed, awk, or ex.)
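As a rough sketch of the awk variant mentioned above, the loop below keeps the wanted fields instead of deleting the unwanted ones. It assumes the sample input from the question is stored in a file called file, and the output names Aa.txt, Bb.txt and Cc.txt are only illustrative choices, not something the question asks for.

```shell
# Recreate the sample input from the question (the name "file" follows the answers).
cat > file <<'EOF'
1 : Aa|xxx Aa|xxx Bb|xxx Cc|xxx Cc|xxx Cc|xxx
2 : Cc|xxx Aa|xxx Aa|xxx Aa|xxx Bb|xxx
3 : Bb|xxx Bb|xxx Aa|xxx Cc|xxx
4 : Bb|xxx Aa|xxx Cc|xxx
5 : Aa|xxx Cc|xxx Bb|xxx
EOF

# For each name, keep the first two fields (line number and ":") plus every
# field that starts with "<name>|", and write the result to <name>.txt
# (hypothetical output names).
for name in Aa Bb Cc; do
  awk -v name="$name" '{
    out = $1 " " $2
    for (i = 3; i <= NF; i++)
      if (index($i, name "|") == 1) out = out " " $i
    print out
  }' file > "$name.txt"
done
```

Because awk rebuilds each line from its fields, trailing whitespace in the input is dropped, and a new name only needs to be added to the for list.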

edited Sep 10, 2016 at 10:05

answered Sep 10, 2016 at 9:26

JJoao



0

As pointed out by @JJoao, it's also very easy with sed and extended regex (-r):

$ sed -r 's/\s(Cc|Bb)\|...//g' file > A
$ sed -r 's/\s(Aa|Cc)\|...//g' file > B
$ sed -r 's/\s(Aa|Bb)\|...//g' file > C

On OS X (Apple systems), sed is the BSD version, which does not behave like GNU sed: the option -r may not be available (extended regex is selected with -E there), and \s is not interpreted as whitespace. Use [[:space:]] instead.

If the code xxx following the pipe is not always exactly 3 characters long, replace ... in the regex with [^[:space:]]+. The match then ends at the first whitespace character encountered.

So a more general sed command, shown here for output file A, would be:

$ sed -r 's/[[:space:]](Cc|Bb)\|[^[:space:]]+//g' file > A
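To illustrate the general form for all three output files, here is a small sketch on input whose codes have different lengths. The file name codes.txt and its contents are made up for the example, and it assumes a sed that accepts -E for extended regex (to my knowledge both GNU sed and the BSD sed on OS X do).

```shell
# Hypothetical sample input with codes of varying length after the pipe.
printf '%s\n' \
  '1 : Aa|x1 Bb|yy22 Cc|z' \
  '2 : Cc|abc Aa|de Bb|f' > codes.txt

# For each output file, delete every whitespace-prefixed entry that belongs
# to one of the two other names; the code ends at the next whitespace.
sed -E 's/[[:space:]](Bb|Cc)\|[^[:space:]]+//g' codes.txt > A
sed -E 's/[[:space:]](Aa|Cc)\|[^[:space:]]+//g' codes.txt > B
sed -E 's/[[:space:]](Aa|Bb)\|[^[:space:]]+//g' codes.txt > C
```

Each entry is removed together with the whitespace character in front of it, so the line number and the colon at the start of each line are left untouched.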

edited Sep 11, 2016 at 9:39

answered Sep 11, 2016 at 9:13

Cbhihe

