Sorry to post such a rudimentary question, but I'm getting confused by all the different tutorials and examples (and slashes and hyphens and back-ticks oh my) so I figured I would get someone's experienced input.

I have a .csv file (comma-separated, obviously) with several hundred lines that look like this:

abcd-3096,62#,,100,,,25,,75,3,

and it should be formatted like so:

{name: 'abcd-3096', weight : 62, some-field1: null, class: 100, some-field2: null, some-field3: null, unit-weight : 25, some-field4 : null, capacity : 75,   }

I know you'll either want to use awk or sed in order to replace it, and I'm more than fine with doing the formatting in several commands.

I don't expect anyone to format the whole line for me, but I'm hoping someone can show me how to prepend some text to a column. I can't seem to find a reliable explanation of the command anywhere online.

  • Will any of the fields in your .csv ever contain a comma? Commented Oct 20, 2015 at 23:39
  • No, we can assume that commas only delineate the fields or columns. Commented Oct 20, 2015 at 23:40
  • Why not Perl one-liner? Commented Oct 20, 2015 at 23:46

1 Answer

You can use a negated character class such as [^,] for this:

sed -r 's/^([^,]*),([^,]*),([^,]*)/{ name: "\1", weight: "\2", somefield1: "\3" }/' file.csv

The example uses only 3 groups for simplicity ... but you get the idea.
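To see what that command does to the sample row from the question, here is a minimal sketch. It assumes a sed that accepts -E (the portable spelling of -r, understood by both GNU and BSD sed); the variable names line and first_three are only illustrative. Note that the part of the line the pattern does not match is left untouched after the closing brace.

```shell
# Sample row copied from the question.
line='abcd-3096,62#,,100,,,25,,75,3,'

# Wrap the first three fields; whatever the pattern does not match stays as-is.
first_three=$(printf '%s\n' "$line" |
  sed -E 's/^([^,]*),([^,]*),([^,]*)/{ name: "\1", weight: "\2", somefield1: "\3" }/')

printf '%s\n' "$first_three"
```

To consume the whole line, keep adding ,([^,]*) groups (sed supports back-references \1 through \9) or end the pattern with .* to discard the remainder.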

If your system does not support sed -r (extended regex syntax), you need to use \(group\) instead of (group):

sed 's/^\([^,]*\),\([^,]*\),\([^,]*\)/{ name: "\1", weight: "\2", somefield1: "\3" }/' file.csv

In case you don't need to use sed, you can also use bash directly:

# Field order follows the sample row; "rest" absorbs any trailing columns.
while IFS=',' read -r name weight somefield1 class somefield2 somefield3 unitweight somefield4 capacity rest
do
    echo -e "{ name: \"$name\", weight: \"$weight\", somefield1: \"$somefield1\",";
    echo -e " class: \"$class\", somefield2: \"$somefield2\", somefield3: \"$somefield3\",";
    echo -e " unitweight: \"$unitweight\", somefield4: \"$somefield4\", capacity: \"$capacity\" }";
done < file.csv
IFS=$' \t\n'

(taken from this answer by koola)
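Since the question also mentions awk, here is a rough sketch of how the whole row could be labelled in one pass. It is only one possible approach, under the same assumption that no field contains a comma. The function name csv_to_records and the helper val are made up for illustration; the field labels come from the desired output in the question. Empty fields are printed as null and the trailing # is stripped from the weight.

```shell
# Hypothetical helper: turn each CSV row on stdin into the record format from the question.
csv_to_records() {
  awk -F',' '
    # Print null for an empty field, otherwise the value unchanged.
    function val(s) { return (s == "" ? "null" : s) }
    {
      weight = $2
      sub(/#$/, "", weight)   # drop the trailing "#" from the weight column
      # \047 is the octal escape for a single quote.
      printf "{ name: \047%s\047, weight: %s, some-field1: %s, class: %s, ", $1, val(weight), val($3), val($4)
      printf "some-field2: %s, some-field3: %s, unit-weight: %s, ", val($5), val($6), val($7)
      printf "some-field4: %s, capacity: %s }\n", val($8), val($9)
    }'
}

record=$(printf '%s\n' 'abcd-3096,62#,,100,,,25,,75,3,' | csv_to_records)
printf '%s\n' "$record"
```

For a real file you would run csv_to_records < file.csv instead of piping in a single row. Columns after the ninth are ignored in this sketch.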

5 Comments

(All of these solutions assume that you have no commas in your data, as you stated in your comment on your question.)
This is an excellent answer. For your first example, it looks like you are trying to negate the same thing 3 times. Is that in compensation for the triple comma part of my data?
([^,]*) means "capture 0 or more characters that are not a comma". So for 2 values the pattern is ([^,]*),([^,]*), matching "a value, then a comma, then a value". For each additional group, you'd add ,([^,]*).
So if I wanted to target just a single column using sed, what would the real command look like for pseudocode such as: sed -r /regex field#/'string' field#/?
sed -r 's/([^,]*).*/{ "name": "\1" }/'. The basic sed substitution syntax is sed 's/pattern/replacement/'. For regular expressions in general, I recommend reading regular-expressions.info
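For prepending text to one particular column while leaving the others alone, one possible sketch is to capture the columns before it as a single group and put them back unchanged. This assumes a sed that accepts -E; the label "class: " and the choice of the fourth column are just examples taken from the question's data.

```shell
# Sample row copied from the question.
line='abcd-3096,62#,,100,,,25,,75,3,'

# \1 holds the first three columns with their commas, \3 holds the fourth column.
prefixed=$(printf '%s\n' "$line" |
  sed -E 's/^(([^,]*,){3})([^,]*)/\1class: \3/')

printf '%s\n' "$prefixed"
```

Change the {3} to N-1 to target column N.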
