My bank sends a non-standard CSV file that uses ; as the field separator and a binary byte (hexadecimal a0, octal 240) to enclose fields in which a ; could occur, as shown below:
Input
Extrait;Date;Date valeur;Compte;Description;Montant;Devise
�2020/0001/0002�;29.02.2020;29.02.2020;-;�28/02/20 Some shop in Antwerp A Antwerpen (BE)�;-16,50;EUR
�2020/0001/0001�;01.02.2020;01.02.2020;-;�31/01/20 Some shop in Zaventem Z Zaventem (BE)�;-13,00;EUR
I need to process fields 2, 5 and 6 with AWK.
Desired output
{Date}{Description}{Montant}
{29.02.2020}{28/02/20 Some shop in Antwerp A Antwerpen (BE)}{-16,50}
{01.02.2020}{31/01/20 Some shop in Zaventem Z Zaventem (BE)}{-13,00}
So far, as long as the fields enclosed by � do not contain any ;, the script below, which uses the FPAT variable, works:
#!/usr/bin/awk -f
BEGIN {
    FS = ";"
    FPAT = "[^;]*"    # this works but not in all cases
    #FPAT = "([^;]*)|(\240[^\240]+\240)"    # this doesn't work
}
{
    gsub(/\240/, "", $5)    # I wish I could skip this instruction too
    print "{" $2 "}{" $5 "}{" $6 "}"
}
I found a similar case (see awk FPAT to ignore commas in csv) but changing the , into ; and the \" into \240 didn't do the trick.
I need help implementing an FPAT pattern that parses my CSV file correctly in all cases, including when an enclosed field contains a ;.
\xc2\xa0 instead of \xa0, which I can't use in the FPAT proposed by anubhava. I will have to find a workaround...
[^;\xc2]+(\xc2[^\xa0][^;\xc2]*)*|(\xc2[^\xa0][^;\xc2]*)+ (without the typo)