I’m updating a Python script from 2 to 3. It reads in a manifest (i.e., [batchdate]xmlList.xml), iterates through each XML file identified in the manifest, collects stats, then outputs a stats file in tab-delimited text format. The formatting and encoding of the tab file is off, and I can’t figure out how to fix it.
for encoding in utf-8:
class UnicodeWriter:
def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
self.queue = StringIO()
self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
self.stream = f
def writerow(self, row):
self.writer.writerow([str(s).encode("utf-8") for s in row])
data = self.queue.getvalue()
self.stream.write(data)
self.queue.truncate(0)
read in xmllist.xml manifest:
xmlListPath = input('Enter the filepath of the xmlList.xml file: ').replace('"', '')
xmlListFile = codecs.open(xmlListPath)
xmlList = etree.parse(xmlListFile)
listRoot = xmlList.getroot()
xmlListFile.close()
create stats file and write header:
batchID = path.split(xmlListPath)[1]
statsFile = 'S:/Metadata/ETD/Documentation/Statistics/' + batchID.replace('xmlList.xml', '.stats.txt')
stats = open(statsFile, 'w')
wtrStats = UnicodeWriter(stats, delimiter='\t')
statsHeader = ['Author', 'Degree', 'Department', 'Embargo Start Date', 'Date Web Available',
'Embargo Code', 'Identifier', 'PURL', 'Title', 'Comments']
wtrStats.writerow(statsHeader)
Here is how the tab file is coming out:
b'Author' b'Degree' b'Department' b'Embargo Start Date' b'Date Web Available' b'Embargo Code' b'Identifier' b'PURL' b'Title' b'Comments'
b'Confer, Matthew Phelan' b'Ph.D.' b'Chemical & Biological Engineering' b'01/01/2021' b'01/01/2026' b'4' b'u0015_0000001_0003682' b'http://purl.lib.ua.edu/177826' b'EXPERIMENTAL AND COMPUTATIONAL STUDIES OF MATERIALS DECOMPOSITION' b''
Thanks for any help.
self.writer.writetrow()? That is what puts theband apostrophes with each column header.