0

I need to parse a FWF to a df, but the system that exports the file instead of & symbol exports &. This break the fixed-width as the lines with & have now more characters so I cant use read_fwf. I can import it with the below code, but this gives me a single column that I must later split.

import pandas as pd
cnv = lambda txt: txt.replace('&', '&')
df = pd.read_csv('AccountStatement1.txt', skiprows=4, skipfooter=3, engine='python', sep='$%^#~&', converters= { i: cnv for i in range(1) })
df

i use sep='$%^#~&' so i can have only one column and correct the text using the converter. What is the proper solution to this?

Sample of the text file:

=======================================================================================================================================================================================================================================================================================================================================================
From: Transaction Date: 21/05/2021 To: Transaction Date: 23/06/2021 IBAN:: CYxxxxxx0000000000000 Currency: EUR Previous Statement Balance:       1,111.10 BIC: xxxxxxxx
=======================================================================================================================================================================================================================================================================================================================================================

Transaction Date Transaction details                  Reference No.         Description 1                        Description 2                        Value date                    Debit               Credit              Balance  
27/05/2021       CHQ: 12568987                                                                                                                        26/05/2021                   645.00                                  9,708.70  
27/05/2021       DEBIT                                EB2021057554434221149 xxxx xxxxxxxxx xxxxxx                                                     27/05/2021                 0,888.36                                  3,723.74  
28/05/2021       I2456787437452                                             B/O: xxxxxxxxxxxxxxx LTD             TRANSFER xxxxxxxxx xxxxxxxxxx        27/05/2021                                     19,002.00            13,755.74  
28/05/2021       INWARD TRANSFER COMMISSION                                 CY21jhjh884786                       I2107675689452                       28/05/2021                    10.00                                 15,723.74  
31/05/2021       ATM/POS DEBIT                                              jhgjhkjhjk jkh f4         1211       xxxxxx xxxxx & xx xxxxx              27/05/2021                    60.00                                 52,680.74  
31/05/2021       Service Charges                                            MONTHLY MAINTENANCE FEE                                                   31/05/2021                    35.00                                 73,645.74  
01/06/2021       Service Charges                                            STATEMENT FEE - MAY 2021                                                  31/05/2021                     5.00                                 19,645.74  
02/06/2021       ATM/POS DEBIT                                              POS 567521 124454         1211       xxxxxxxxxxxx & Exxxxxxx              31/05/2021                   170.00                                 09,320.74  
03/06/2021       CHQ: 13456784                                                                                                                        02/06/2021                    80.00                                 10,230.74  
04/06/2021       ATM/POS DEBIT                                              POS 345671 124258         1278       xxxxxxxxxxxx & xxxxxxxx              02/06/2021                   940.00                                 23,960.74  
08/06/2021       ATM/POS DEBIT                                              POS 345671 125678         1278       xxxxxxx xxxxx xxxxx                  04/06/2021                    13.20                                 13,347.54  
15/06/2021       ATM/POS DEBIT                                              POS 145671 156612         1671       xxxx xxxxxxxxxxxxxx680               11/06/2021                    25.53                                 13,322.01  
=======================================================================================================================================================================================================================================================================================================================================================
Number of records:     22          IBAN:: xxxx234567898765434567876545     Currency: EUR     Current Statement Balance:       0,000.00
=======================================================================================================================================================================================================================================================================================================================================================

4
  • Could you show us your file, or at least the first 20 lines or so? That would be really helpful Commented Aug 21, 2021 at 9:09
  • why read_csv() when it's fwf. pandas.pydata.org/docs/reference/api/pandas.read_fwf.html as per other comment, sample data Commented Aug 21, 2021 at 9:11
  • the system that exports the file instead of & symbol exports &. This fact break the fixed-width as the lines with & have now more characters so I can't use read_fwf. Commented Aug 21, 2021 at 14:30
  • Added sample of text file @Cimbali Commented Aug 21, 2021 at 14:44

1 Answer 1

1

Maybe you could load the file, replace the problematic characters, then read it as fixed width with pd.read_fwf using io.StringIO to make an in-memory buffer:

>>> import io, pandas as pd
>>> with open('test.csv') as f:
...   lines = f.readlines()
>>> pd.read_fwf(io.StringIO(''.join(lines[4:-3]).replace('&', '&')))
    a    b  c
0  11  fo&  0

This is the file’s content, unaligned by & as you indicate:

>>> print(''.join(lines))
foo
bar
baz
qux
 a   b  c
11 fo&  0
ig
nore
me
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.