python regex: capture different strings line by line from .txt file

Question

I need to extract names/strings from a .txt file line by line. I am trying to use regex to do this.

For example, from the line below I want to extract the names "Victor Lau" and "Siti Zuzan" and the string "TELEGRAPHIC TRANSFER" into three separate lists, then write them to an Excel file. A sample .txt file was linked as well.

TELEGRAPHIC TRANSFER 0008563668 040122 BRH BDVI0093 VICTOR LAU 10,126.75- .00 10,126.75- SITI ZUZAN 16:15:09

I have tried this code:

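import os
import re

import numpy as np
import pandas as pd

directory = '.'  # placeholder: folder that holds the .txt files
List4, List5, List6 = [], [], []  # List5 and List6 are assumed to be filled by similar code (not shown)

for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith(('.txt', '.TXT')) and 'AllBanks' in filename:
        # Open the file relative to the directory being listed
        with open(os.path.join(directory, filename)) as AllBanks:
            for line in AllBanks:
                try:
                    match4 = re.search(r'( [a-zA-Z]+ [a-zA-Z]+ [a-zA-Z]+ )|( [a-zA-Z]+ [a-zA-Z]+)', line)
                    List4.append(match4.group(0).strip())
                except AttributeError:  # re.search returned None: no match on this line
                    List4.append('NA')
df = pd.DataFrame(np.column_stack([List4, List5, List6]), columns=['a', 'b', 'c'])
df.to_excel('AllBanks.xlsx', index=False)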

Please include all relevant information as part of your question. Links to external files are not allowed.
– pho, Commented Mar 11, 2022 at 5:09
Pranav ji - the link is to my own file, a sample file which will help in looking at the data. If links are not allowed, then what is the function of the link icon in posts?
– Shri, Commented Mar 11, 2022 at 8:05

bmiller · Accepted Answer · 2022-03-11 05:08:23Z

1

Your text file appears to use fixed-width columns with no delimiters. You can use re capture groups such as '^(.{20})(.{15})(.{30})'.

Alternatively, you can specify each column's start position and width and use those to slice the data out of each row.
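The capture-group option could look something like the sketch below. The widths 20, 15 and 30 are just the ones from the example pattern, not the real layout of your file, and `split_fixed` and the padded sample line are made up for illustration.

```python
import re

# Assumed layout: 20-char type, 15-char reference, 30-char name.
# These widths come from the example pattern, not from the real file.
FIXED_WIDTH = re.compile(r'^(.{20})(.{15})(.{30})')

def split_fixed(line):
    """Return the stripped column values, or None if the line is too short."""
    match = FIXED_WIDTH.match(line)
    if match is None:
        return None
    return [group.strip() for group in match.groups()]

# Made-up sample line, padded to the assumed widths
sample = ("TELEGRAPHIC TRANSFER".ljust(20)
          + "0008563668".ljust(15)
          + "VICTOR LAU".ljust(30))

print(split_fixed(sample))       # three trimmed column values
print(split_fixed("too short"))  # None, because the pattern needs 65 characters
```

A line shorter than the combined widths does not match at all, so the caller has to decide what to do with `None`. The rest of this answer takes the start-position-and-width route instead.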

The function below uses the start-and-width approach: it parses two columns from each line of your file and returns a list of rows, each of which is a list of column values.

def parse(filename):
    # (start position, width) pairs for the columns you want
    fixed_columns = [(0, 28), (71, 50)]
    rows = []
    with open(filename) as file:
        for line in file:
            # Slice each column out of the line and trim its padding
            cols = [line[start:start + width].strip() for start, width in fixed_columns]
            rows.append(cols)
    return rows

filename = "AllBanks.txt"  # placeholder: path to your fixed-width text file
for row in parse(filename):
    print(", ".join(row))

Output:

TELEGRAPHIC TRANSFER, LIEW WAI KEEN
TELEGRAPHIC TRANSFER, KWAN SANG@KWAN CHEE SANG
TELEGRAPHIC TRANSFER, VICTOR LAU
TELEGRAPHIC TRANSFER, VICTOR LAU

From here you can save the data any way you like.
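As one possible follow-up, the rows can be turned into one list per column and written out. This is only a sketch: `columns_from_rows`, `write_csv`, the header labels and the output path are hypothetical names, the sample rows are copied from the output above, and it writes CSV with the standard library rather than .xlsx. For a real Excel file, the `pd.DataFrame(...).to_excel(...)` call from the question should work on the same rows.

```python
import csv
import os
import tempfile

def columns_from_rows(rows):
    """Transpose a list of rows into one list per column."""
    return [list(col) for col in zip(*rows)]

def write_csv(rows, out_path, header):
    """Write a header line followed by the rows to a CSV file."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(rows)

# Sample rows in the shape returned by parse()
rows = [
    ["TELEGRAPHIC TRANSFER", "LIEW WAI KEEN"],
    ["TELEGRAPHIC TRANSFER", "VICTOR LAU"],
]

# One list per column, as asked for in the question
types, names = columns_from_rows(rows)

# Hypothetical output location; any writable path will do
out_path = os.path.join(tempfile.gettempdir(), "AllBanks.csv")
write_csv(rows, out_path, ["type", "name"])
```

Excel opens CSV files directly, so this may be enough if you only need to look at the data in a spreadsheet.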


1 Comment

Shri Over a year ago

Thanks for your answer! I am not sure what Mr. Pranav's issue was in downvoting my question. This is such usual stuff on Stack Overflow.
