
I am trying to remove non-ASCII characters from a file. More precisely, I am trying to convert a text file that contains these characters (e.g. hello§‚å½¢æˆ äº†å¯¹æ¯”ã€‚ 花å) into a CSV file.

However, I am unable to iterate through these characters, so I want to remove them (i.e. strip them out or replace each one with a space). Here's the code (researched and gathered from various sources).

The problem is that after running the script, the CSV/TXT file has not been updated, which means the characters are still there. I have absolutely no idea how to go about this anymore; I've been researching it for a day :(

I would kindly appreciate your help!

import csv

txt_file = r"xxx.txt"
csv_file = r"xxx.csv"

in_txt = csv.reader(open(txt_file, "rb"), delimiter='\t')
out_csv = csv.writer(open(csv_file, 'wb'))
for row in in_txt:
    for i in row:
        i = "".join([a if ord(a) < 128 else '' for a in i])

out_csv.writerows(in_txt)
  • Because you never update in_txt (the content that you're writing out to the CSV). i is just a name bound to one field of the row; reassigning it does not change the original row in in_txt. Commented May 26, 2016 at 9:53
  • Strings are immutable in Python, and assignment does not mutate a value in place; it rebinds the name to the newly assigned object. So, as @Torxed pointed out, you never actually update anything. Commented May 26, 2016 at 9:53
  • Hey @Torxed and ilja, sorry if this sounds stupid, but I thought that by 'updating' i, I had already updated in_txt? How do I update in_txt? Commented May 26, 2016 at 9:55
  • @Bread Either you call out_csv.writerow(...) once per row inside the for row loop, or you save each cleaned row in an output buffer and write that instead of in_txt, the latter being better performance-wise and disk-I/O-wise. Commented May 26, 2016 at 9:58
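The two points from the comments can be sketched in Python 3: rebinding the loop variable leaves the list untouched, while writing each cleaned row with csv.writer's writerow() (the first option above) gets the change into the output. This is a minimal sketch, assuming a UTF-8 encoded, tab-separated input file; the sample row and the function name clean_tsv_to_csv are hypothetical.

```python
import csv

# Rebinding the loop variable does not touch the list it came from.
row = ["hello\u00a7", "plain"]
for i in row:
    i = "".join(a for a in i if ord(a) < 128)  # rebinds the name i only
print(row)  # the list still holds the original strings


def clean_tsv_to_csv(txt_path, csv_path, encoding="utf-8"):
    """Hypothetical helper: copy a TSV file to CSV, dropping non-ASCII characters.

    Each cleaned row is written immediately with writerow(), so no list
    of rows is built up in memory.
    """
    # Assumption: the input is UTF-8. newline="" is what the csv module
    # docs recommend for files passed to csv.reader and csv.writer.
    with open(txt_path, newline="", encoding=encoding) as f_in, \
         open(csv_path, "w", newline="", encoding="ascii") as f_out:
        reader = csv.reader(f_in, delimiter="\t")
        writer = csv.writer(f_out)
        for fields in reader:
            # Build a new list instead of rebinding a loop variable;
            # only ASCII characters survive, so "ascii" output is safe.
            writer.writerow(["".join(a for a in f if ord(a) < 128) for f in fields])
```

Writing row by row avoids holding the whole file in memory; buffering the rows in a list, as in the answer, works just as well for small files.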

1 Answer


Variable assignment is not magically transferred to the original source; you have to build up a new list of your changed rows:

import csv

txt_file = r"xxx.txt"
csv_file = r"xxx.csv"

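# Python 2: the csv module expects files opened in binary mode ("rb"/"wb")
in_txt = csv.reader(open(txt_file, "rb"), delimiter='\t')
out_csv = csv.writer(open(csv_file, 'wb'))
out_txt = []
for row in in_txt:
    # Build a new row whose fields keep only their ASCII characters,
    # instead of rebinding a loop variable
    out_txt.append([
        "".join(a if ord(a) < 128 else '' for a in i)
        for i in row
    ])

out_csv.writerows(out_txt)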
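The filter in this answer drops each non-ASCII character, while the question also mentions putting a space instead. A small per-field helper can cover both cases; this is a sketch assuming Python 3 str fields, and the name to_ascii is hypothetical. (In the Python 2 code above, files opened with "rb" yield byte strings, so the same ord(a) < 128 test runs per byte, and a multi-byte UTF-8 character would turn into several spaces rather than one.)

```python
def to_ascii(field, replacement=""):
    """Hypothetical helper: replace every non-ASCII character in field.

    replacement="" drops the character; replacement=" " puts a space instead.
    """
    return "".join(ch if ord(ch) < 128 else replacement for ch in field)


row = ["hello\u00a7", "caf\u00e9 bar", "ok"]
print([to_ascii(f) for f in row])       # ['hello', 'caf bar', 'ok']
print([to_ascii(f, " ") for f in row])  # ['hello ', 'caf  bar', 'ok']

# For the "drop" case, the ascii codec with errors="ignore" gives the same result.
assert to_ascii("caf\u00e9") == "caf\u00e9".encode("ascii", "ignore").decode("ascii")
```

Inside the answer's loop, the list comprehension would then read [to_ascii(i) for i in row] or [to_ascii(i, " ") for i in row].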