Write list of tuples of one different item length to CSV file using Python

Question

I have the following list having the structure [(int, [], []), (int, [], []), (int, [], []), ..., (int, [], [])]. [] represents tokens of a sentence.

data = [(11.221, ['Maruyama', '(', 'Japan', ')'], ['S-PER', 'O', 'S-LOC', 'O']), 
        (5.56, ['MANAMA', '1996-08-22'], ['S-LOC', 'O']), 
        (5.381, ['BEIJING', '1996-08-22'], ['S-LOC', 'O'])]

I want to write data into a CSV file as follows:

11.221, Maruyama (Japan)  , Maruyama  , S-PER
                            (,        , O 
                            Japan,    , S-LOC
                            ),        , O
[HERE SHOULD BE SPACE]
5.56  , MANAMA 1996-08-22 , MANAMA    , S-LOC
                          , 1996-08-22, O
[HERE SHOULD BE SPACE]
5.381 , BEIJING 1996-08-22, BEIJING   , S-LOC
                          , 1996-08-22, O

CSV file has the format:

int, sentence (concatenated tokens), token_1, tag_1
                                   , token_2, tag_2
                                   , ...

I have tried the following but didn't work for me properly.

import csv
with open('output.csv','w') as f:
    for x in  [tuple(zip(x[0], x[1], x[2])) for x in data]:
        for r in x:
            f.write(' '.join(r) + '\n')
        f.write('\n')

Traceback: TypeError: 'float' object is not iterable

I also aimed to do as follows:

data = [(value, ' '.join(sent), sent, tag) for value, sent, tag in data]

to start from then I tried the following.

with open('output.csv', 'w') as f:
    writer = csv.writer(f , lineterminator='\n')
    for value, sent, tokens, tags in data:
        writer.writerow(value)
        writer.writerow(sent)
        for x in  [tuple(zip(tokens, tags))]:
            for r in x:
                writer.writerow(' '.join(r) + '\n')
            writer.writerow('\n')

Traceback: Error: iterable expected, not float

Does it have to be CSV format? Why not use something like JSON so it can be more easily loaded back into a program? — Kraigolas
– Kraigolas, Commented Jun 18, 2021 at 0:47
@Kraigolas, not necessary to be in CSV format. A JSON format is also good to be easy to load back into a program. — joe
– joe, Commented Jun 18, 2021 at 2:09

enzo · Accepted Answer · 2021-06-18 00:45:48Z

You can do something like that:

# Create a class to store each text information
class Text:
    def __init__(self, code, tokens, tags):
        self.code = code
        self.tokens = tokens
        self.tags = tags
        
        # Concatenate the tokens to create a sentence
        self.sentence = ' '.join(tokens)
        
def write_to_file(data, f):
    # Convert all the data to Text objects
    texts = [Text(code, tokens, tags) for code, tokens, tags in data]
    
    # Find the maximum column width for each row
    widths = {}
    widths["code"] = max(len(str(text.code)) for text in texts)
    widths["sentence"] = max(len(str(text.sentence)) for text in texts)
    widths["token"] = max(len(str(token)) for text in texts for token in text.tokens)
    widths["tag"] = max(len(str(tag)) for text in texts for tag in text.tags)
    
    for text in texts:
        # Print the code with the code column width
        # Note that this print ends with ', ', which have
        # length 2. This will be used later.
        print(f"{text.code}".ljust(widths["code"], ' '), file=f, end=', ')
        
        # Print the sentence with the sentence column width
        # Note that this print also ends with ', ', which have
        # length 2. This will also be used later.
        print(f"{text.sentence}".ljust(widths["sentence"], ' '), file=f, end=', ')
        
        for i, (token, tag) in enumerate(zip(text.tokens, text.tags)):
            # If it's not the first line of the file
            if i != 0:
                # Print, as spaces, the code column width added to the
                # sentence column width, separated by 2 spaces each 
                print(" " * (widths["code"] + 2 + widths["sentence"] + 2), file=f, end='')

            # Print the token with the token column width
            print(f"{token}".ljust(widths["token"]), file=f, end=', ')
            
            # Print the tag with the tag column width
            print(f"{tag}".ljust(widths["tag"]), file=f, end='')
            
            print(file=f)

        print(file=f)

Usage:

data = [(11.221, ['Maruyama', '(', 'Japan', ')'], ['S-PER', 'O', 'S-LOC', 'O']), 
        (5.56, ['MANAMA', '1996-08-22'], ['S-LOC', 'O']), 
        (5.381, ['BEIJING', '1996-08-22'], ['S-LOC', 'O'])]

with open('file.txt', 'w+') as f:
    write_to_file(date, f)

The content of the file will be

11.221, Maruyama ( Japan ), Maruyama  , S-PER
                            (         , O    
                            Japan     , S-LOC
                            )         , O    

5.56  , MANAMA 1996-08-22 , MANAMA    , S-LOC
                            1996-08-22, O    

5.381 , BEIJING 1996-08-22, BEIJING   , S-LOC
                            1996-08-22, O

Collectives™ on Stack Overflow

Write list of tuples of one different item length to CSV file using Python

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related