
I've been trying to read a CSV file into an HTML table with Python for a little while now. Currently my code looks like this:

import csv

table = ''
with open("2016motogp.csv", encoding="utf8") as csvFile:
    reader = csv.DictReader(csvFile, delimiter=',')
    table = '<tr>{}</tr>'.format(''.join(['<td>{}</td>'.format(header) for header in reader.fieldnames]))
    for row in reader:
        table_row = '<tr>'
        for fn in reader.fieldnames:
            table_row += '<td>{}<\td>'.format(row[fn])
        table_row += '<\tr>'
        table += table_row

This is the output of the written table: https://www.w3schools.com/code/tryit.asp?filename=FG5TPW9EY3LT

The rendered output shows literal HTML table tags throughout, along with a few errors in names and stray characters that shouldn't be there. The header line is clean apart from a stray addition in front of the year cell.

Here is a link to the csv: https://uploadfiles.io/6joj6

If anyone could help to 'clean up' the table by adjusting the code it would be much appreciated. Thanks in advance.

EDIT: Thanks for the help. The HTML tags were fixed by changing the backslashes to forward slashes, and the addition to the year cell was fixed by changing the encoding option. I discovered that the \xa0 was an encoding error or something along those lines, and used table = table.replace(u'\xa0', u' ') to replace the additions.
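
Putting the three fixes from the edit together, a corrected version might look like the sketch below. The helper name csv_to_html_table and the sample data are assumptions made for illustration, and html.escape is an addition that is not in the original code; it is included so that a cell containing & or < does not break the markup.

```python
import csv
import html
import io


def csv_to_html_table(csv_file):
    """Build HTML table rows from an open CSV file object (hypothetical helper)."""
    reader = csv.DictReader(csv_file, delimiter=',')
    # Header row, using <td> as in the original code.
    rows = ['<tr>{}</tr>'.format(
        ''.join('<td>{}</td>'.format(html.escape(name)) for name in reader.fieldnames))]
    for row in reader:
        cells = ''.join(
            # Replace non-breaking spaces, trim the ends, then escape for HTML.
            '<td>{}</td>'.format(
                html.escape((row[fn] or '').replace('\xa0', ' ').strip()))
            for fn in reader.fieldnames)
        rows.append('<tr>{}</tr>'.format(cells))
    return ''.join(rows)


# Made-up sample: a BOM at the start and a non-breaking space before the rider name.
sample = '\ufeffYear,Rider\n2016,\xa0Marc Marquez\n'.encode('utf-8')
with io.TextIOWrapper(io.BytesIO(sample), encoding='utf-8-sig', newline='') as f:
    table = csv_to_html_table(f)
print(table)
```

For the real file, the same helper would presumably be called with open("2016motogp.csv", encoding="utf-8-sig", newline="") in place of the in-memory sample.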

2 Answers

1

Do not generate HTML "by hand"; use the dominate module instead. It is much easier and more robust. Also, you have two typos in your code: <\tr> should be </tr>, and <\td> should be </td>.
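
To see why the backslash versions produce such odd output: inside a Python string literal, \t is the escape sequence for a tab character, so '<\td>' is not a closing tag with a wrong slash but a string with a tab embedded in it. A quick check (the variable names are only illustrative):

```python
broken = '<\td>'   # backslash-t is parsed as a single TAB character
fixed = '</td>'    # forward slash: a real closing tag

print(repr(broken))              # the repr makes the embedded tab visible
print(len(broken), len(fixed))   # the broken string is one character shorter
print(broken == '<' + '\t' + 'd>')
```

The same applies to '<\tr>', which is why the browser never sees a closing tag for either the cells or the rows.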


2 Comments

Wow, I can't believe I missed that. The table looks much cleaner now. Thank you. Unfortunately the task requires the HTML to be generated manually. Are you able to give any advice on the random additions within the rider column and year cell? Clean table for reference: w3schools.com/code/tryit.asp?filename=FG5UC7Q6RE55
You actually have those characters in your CSV file. Each field in the "rider" column starts with a whitespace character that is not a space (ASCII 32) but a non-breaking space (Unicode 160, or 0xA0 in hex). What you do about them (remove, convert to space, or something else) is entirely up to you.
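
One way to confirm which character is sitting at the start of a field is to inspect its code point. This is a small sketch using a made-up field value rather than the actual CSV contents:

```python
import unicodedata

field = '\xa0Valentino Rossi'   # made-up value with a leading non-breaking space

first = field[0]
print(ord(first), hex(ord(first)), unicodedata.name(first))
print(first == ' ')      # False: it is not an ordinary space
print(first.isspace())   # True: Python still classifies it as whitespace
```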
1

As @błotosmętek already mentioned, you have <\ instead of </ in some HTML tags.

Regarding the strange additions, it looks like the CSV is not plain UTF-8 but UTF-8 with a BOM. Try open("2016motogp.csv", encoding="utf-8-sig").
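
The difference between the two codecs can be seen without the file itself. The sketch below uses a made-up first line with a BOM in front of it and decodes it both ways:

```python
raw = '\ufeffYear,Rider\n'.encode('utf-8')   # made-up header line, BOM included

as_utf8 = raw.decode('utf-8')           # the BOM survives as U+FEFF
as_utf8_sig = raw.decode('utf-8-sig')   # the BOM is dropped

print(repr(as_utf8.split(',')[0]))      # first header still carries the BOM
print(repr(as_utf8_sig.split(',')[0]))  # first header is clean
```

With plain utf-8 the BOM becomes part of the first field name, which would explain the stray addition in front of the year cell.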

3 Comments

Thanks, this has helped with the additions in the year cell. The rider column is still full of "\xa0" before each name. I was unsure of the encoding and have only a very vague understanding of the concept.
That is a non-breaking space; I would argue that it shouldn't be there. You can use strip: table_row += '<td>{}</td>'.format(row[fn].strip())
Thanks, I did some quick research and came to the same conclusion. I just used table = table.replace(u'\xa0', u' ') to remove it.
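
The two approaches mentioned in these comments are not quite equivalent. A short comparison on a made-up field value, assuming Python 3 (where str.strip() treats the non-breaking space as whitespace):

```python
cell = '\xa0Valentino Rossi'   # made-up field value

stripped = cell.strip()               # drops leading/trailing whitespace, including \xa0
replaced = cell.replace('\xa0', ' ')  # swaps every \xa0 for an ordinary space

print(repr(stripped))
print(repr(replaced))
```

strip() only touches the ends of the string, so a non-breaking space in the middle of a value would survive it, while replace() converts every occurrence but leaves an ordinary leading space behind.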
