I want to parse CSV records like the one below with awk or gawk.
The fields are separated by commas, but the last field ($6) is special because it actually consists of subfields. These subfields are separated by ". # " (a period, a space, a hash sign and a space). This in itself is not a problem: I can use awk -F'(,)|(. # )' to set alternative field separators.
However, there are stray commas in this last field as well that need to be ignored.
Is there a way to solve this with awk, perhaps using FPAT?
Sample record:
"http://publications.europa.eu/resource/cellar/3befa3c3-a9af-4dac-baa2-92e95cb6e3ab","http://publications.europa.eu/resource/cellar/3befa3c3-a9af-4dac-baa2-92e95cb6e3ab.0002","EU:C:1985:443","61984CJ0239","Gerlach","Judgment of the Court (Third Chamber) of 24 October 1985. # Gerlach & Co. BV, Internationale Expeditie, v Minister van Economische Zaken. # Reference for a preliminary ruling: College van Beroep voor het Bedrijfsleven - Netherlands. # Article 41 ECSC - Anti-dumping duties. # Case 239/84."
When awk encounters a comma, it treats it as the end of the field. E.g. in the sample record, there will only be two subfields in $6, and then the comma after BV means there's suddenly a $7, etc.
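For illustration, here is a minimal sketch of one possible two-step approach in plain awk. It assumes that every field is wrapped in double quotes, as in the sample record, and that no field contains the sequence "," or an embedded double quote. The function name parse_record and the array names are hypothetical, chosen only for this example. The record is first split on the quote-comma-quote boundary, so commas inside a field are left alone, and only then is the sixth field split on ". # ".

```shell
#!/bin/sh
# parse_record (hypothetical name): reads CSV records on stdin and prints the
# subfields of the 6th field, one per line, prefixed with their index.
# Assumption: every field is double-quoted and none contains "," or a quote.
parse_record() {
    awk '
    {
        line = $0
        sub(/^"/, "", line)    # drop the opening quote of the first field
        sub(/"$/, "", line)    # drop the closing quote of the last field

        # Split on the three characters quote-comma-quote, so that commas
        # inside a quoted field are not treated as separators.
        n = split(line, field, /","/)

        # Split the 6th field into subfields on period-space-hash-space.
        m = split(field[6], part, /\. # /)
        for (i = 1; i <= m; i++)
            printf "%d: %s\n", i, part[i]
    }'
}
```

It could be called as parse_record < records.csv, where records.csv is a placeholder file name. With gawk, the field-splitting step could presumably also be done by setting FPAT = "([^,]+)|(\"[^\"]+\")", the pattern used in the gawk manual's CSV example, followed by the same split() call on $6.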