I have a CSV file in which each line is an SQL entry. Each entry contains a message made up of tag=value pairs delimited by the SOH character ("\x01").

8=Fix1.1<SOH>9=70<SOH>35=AE<SOH>10=237
8=Fix1.1<SOH>9=71<SOH>35=AE<SOH>10=238
8=Fix1.1<SOH>9=72<SOH>35=AE<SOH>10=239
8=Fix1.1<SOH>9=73<SOH>35=AE<SOH>10=240

(<SOH> is a placeholder for the actual character because Stack Overflow wouldn't let me include the \x01 character in the text)

Issue:

  • The code snippet below replaces SOH with "," as expected, but I'm having trouble removing the tag part of each pair from the lines.
# Read in the file
with open('file.txt', 'r') as file:
    filedata = file.read()

# Replace the SOH delimiter, then the "=" separators, with commas
filedata = filedata.replace('\x01', ',')
filedata2 = filedata.replace("=", ",")

# Write the file out again
with open('file.txt', 'w') as file:
    file.write(filedata2)

Output:

8,Fix1.1,9,70,35,AE,10,237
8,Fix1.1,9,71,35,AE,10,238
8,Fix1.1,9,72,35,AE,10,239
8,Fix1.1,9,73,35,AE,10,240

I've also tried compiling a pattern with regex = re.compile("[=]") and applying it to each line in a loop over the file, but printing the result just returns all the [=] matches.

Desired output:

Fix1.1,70,AE,237
Fix1.1,71,AE,238
Fix1.1,72,AE,239
Fix1.1,73,AE,240

1 Answer

Use csv.reader with delimiter="\x01" to split by the SOH character. Then, as you read each line, split each element by "=" and keep only the values.

import csv

filedata = []
with open('file.txt', 'r', newline='') as file:
    reader = csv.reader(file, delimiter="\x01")
    for row in reader:
        # Split each item in row
        # Keep only second element of each split
        values = [item.split("=", 1)[1] for item in row]
        filedata.append(values)

print(filedata)

which gives

[['Fix1.1', '70', 'AE', '237'], 
 ['Fix1.1', '71', 'AE', '238'], 
 ['Fix1.1', '72', 'AE', '239'], 
 ['Fix1.1', '73', 'AE', '240']]

You can write this list of lists to a file using csv.writer.writerows().

with open('outfile.txt', 'w', newline='') as f:
    w = csv.writer(f)
    w.writerows(filedata)

and your output file has:

Fix1.1,70,AE,237
Fix1.1,71,AE,238
Fix1.1,72,AE,239
Fix1.1,73,AE,240

If all you care about is to read the original file and write the new file, you can combine these two operations in one loop:

import csv

with open('file.txt', 'r', newline='') as file_in, open('outfile.txt', 'w', newline='') as file_out:
    reader = csv.reader(file_in, delimiter="\x01")
    writer = csv.writer(file_out)
    for row in reader:
        # Split each item on the first "=" and keep only the value
        values = [item.split("=", 1)[1] for item in row]
        # Write each row straight to the output file
        # instead of collecting the rows in a list first
        writer.writerow(values)
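
One edge case worth noting: item.split("=", 1)[1] raises an IndexError for any field that has no "=", for example an empty trailing field when a line ends with SOH. A minimal sketch of a more tolerant version follows. The helper name strip_tags is hypothetical and not part of the answer above, and it splits on SOH with plain str.split rather than csv.reader, which assumes the values never contain quotes or embedded newlines.

```python
SOH = "\x01"


def strip_tags(line):
    """Return the values from one SOH-delimited line of tag=value pairs.

    Hypothetical helper: fields without "=" (such as an empty trailing
    field) are skipped instead of raising IndexError.
    """
    values = []
    for field in line.rstrip("\r\n").split(SOH):
        tag, sep, value = field.partition("=")
        if sep:  # keep only fields that actually contain "="
            values.append(value)
    return values


# Sample line with a trailing SOH, which yields an empty last field
sample = SOH.join(["8=Fix1.1", "9=70", "35=AE", "10=237"]) + SOH
print(",".join(strip_tags(sample)))  # Fix1.1,70,AE,237
```

Using str.partition here means a value that itself contains "=" is kept whole, matching the behaviour of split("=", 1) in the answer.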

8 Comments

Many thanks @Pranav Hosangadi, I can confirm this works. One question, please: how can I keep the data in a CSV file rather than a list? I used the code below but got TypeError: a bytes-like object is required, not 'str'. This CSV was generated by a DB query whose output (tuples/lists) is converted to CSV. I'd like to keep it in CSV format so I can do a line-by-line comparison with another CSV, since the csv module is easier to control.
@test11 I'm not sure what you mean by that. When you use writerows() it's written to an output file. How did you manage to get that error?
with open('test.csv', 'wb') as f: write = csv.writer(f); write.writerow(filedata); write.writerows(row)
writerows() (notice the S at the end of the function) takes multiple rows, like filedata. writerow() takes a single row, so you have to iterate over each row in filedata before using writerow() (Or just use writerows(filedata)). Which line throws the error? @test11
Many thanks for your help @PranavHosangadi - much appreciated!
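
For reference, the writerow() versus writerows() distinction described in the comments can be sketched as below. This is only an illustration: it writes to in-memory io.StringIO buffers rather than real files (an assumption made so the snippet is self-contained), and the sample rows are taken from the answer's output. Separately, in Python 3 the csv module writes str, so a file opened in binary mode ('wb') raises the "a bytes-like object is required, not 'str'" TypeError quoted above. Opening the file in text mode, e.g. open('test.csv', 'w', newline=''), is the likely fix, although the thread itself does not confirm that this was the cause.

```python
import csv
import io

filedata = [["Fix1.1", "70", "AE", "237"],
            ["Fix1.1", "71", "AE", "238"]]

# writerows() takes an iterable of rows and writes all of them
buf_all = io.StringIO()
csv.writer(buf_all).writerows(filedata)

# writerow() takes a single row, so loop over the rows yourself
buf_each = io.StringIO()
writer = csv.writer(buf_each)
for row in filedata:
    writer.writerow(row)

# Both approaches produce identical CSV text
print(buf_all.getvalue() == buf_each.getvalue())  # True
```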