Working with named tuples to output specific data

Question

I am having trouble structuring my data so that I can look up specific values by their keys.

This is my code so far:

from kafka import KafkaConsumer
import ast
from collections import namedtuple
import json
import csv
import sys
from datetime import datetime
import os

# connect to kafka topic
kaf = KafkaConsumer('kafka.topic',
                   auto_offset_reset='earliest', bootstrap_servers=['consumer-kafka.server'])
outputfile = 'C:\\Users\\Documents\\KafkaConsum\\file.csv'

outfile = open(outputfile, mode='w', newline='')

for row in kaf:
    a = row.value.decode("utf-8")
    if "TAG_NAME" in a:
        print(a)
        outfile.write(a + '\n')

This is how my data is formatted:

2018-12-04 13:27:12,511 [a-1 app=helloname,receiverId=abc-abc-123-123,partner=company] INFO kafka.consumer.topic TAG_NAME Type='Generic App' Class= UpdateCheck description=Version1 appName="TWITTER" appAction="start"

2018-12-04 13:27:12,511 [a-1 app=helloname,receiverId=abc-abc-123-123,partner=company] INFO kafka.consumer.topic TAG_NAME Type='Generic App' Class= UpdateCheck description=Version1 appName="TWITTER" appAction="start"

I would like to parse this data so that my CSV file looks like this:

app | receiverId | partner | Type | Class | description | appName | appAction |

helloname | abc-abc-123-123 | company | Generic App | UpdateCheck | Version1 | TWITTER | start |

helloname | abc-abc-123-123 | company | Generic App | UpdateCheck | Version1 | TWITTER | start |

You can use a regular expression to extract the data from each line (example: stackoverflow.com/questions/30627810/…)
– Mohamed Ali JAMAOUI, commented Dec 7, 2018 at 16:32

Chris Charley · Accepted Answer · 2018-12-08 17:01:27Z

Here is a solution, though it doesn't use the csv module (it probably should).

re.findall(...) grabs every header=value pair on the line. The inner loop then splits each pair at the = sign into its header and its value. The headers are printed once, followed by the values from every line.

import re

def main():
    header = True
    with open('f3.txt', 'r') as fin:  # closed automatically, even on error
        for line in fin:
            # every key=value pair on the line, with an optional opening quote
            data = re.findall(r'\w+=\s*[\'"]?[\w-]+', line)
            headers = []
            array = []
            for pair in data:
                m = re.search(r'(\w+)=\s*[\'"]?([\w-]+)', pair)
                headers.append(m.group(1)) # get header
                array.append(m.group(2))   # get value

            if header:  # print the header row only once
                print('|'.join(headers))
                header = False
            print('|'.join(array))

main()

This produced this output:

app|receiverId|partner|Type|Class|description|appName|appAction
helloname|abc-abc-123-123|company|Generic|UpdateCheck|Version1|TWITTER|start
helloname|abc-abc-123-123|company|Generic|UpdateCheck|Version1|TWITTER|start
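
Note that Type comes out as Generic rather than Generic App, because the value pattern [\w-]+ stops at the first space. The following is a minimal sketch of one way around that, assuming values are either quoted or free of spaces, commas and closing brackets. The names PAIR and parse_fields are made up for illustration and are not part of the answer above.

```python
import re

# A value is single-quoted, double-quoted, or a bare run of characters
# that stops at whitespace, a comma, or a closing bracket.
PAIR = re.compile(r"""(\w+)=\s*(?:'([^']*)'|"([^"]*)"|([^\s,\]]+))""")

def parse_fields(line):
    """Return the key=value pairs of one log line as a dict, in order of appearance."""
    fields = {}
    for key, single, double, bare in PAIR.findall(line):
        # Only one of the three value groups is non-empty for a given match.
        fields[key] = single or double or bare
    return fields

sample = ("2018-12-04 13:27:12,511 [a-1 app=helloname,receiverId=abc-abc-123-123,"
          "partner=company] INFO kafka.consumer.topic TAG_NAME Type='Generic App' "
          "Class= UpdateCheck description=Version1 appName=\"TWITTER\" appAction=\"start\"")

row = parse_fields(sample)
print('|'.join(row))           # header names
print('|'.join(row.values()))  # values, with 'Generic App' kept whole
```

Returning a dict rather than two parallel lists keeps each value attached to its key, which is convenient if the rows are later handed to csv.DictWriter.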

SRT HellKitty · Accepted Answer · 2018-12-07 19:22:53Z

As Medali said, you can use a regular expression to pull out the data you want and separate it properly. Something along the lines of:

import re

pattern = r'app=(.*?),'
app = re.search(pattern, a).group(1)

You could keep a list of the headers you want, loop over it building a pattern for each one, save each match in a dictionary, and then write that dictionary directly to a CSV file.

You'll need a new variable, csv_outfile or similar, wrapping the open file:

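headers = ['app', 'receiverId', .... , 'appAction']  # names spelled exactly as in the log line
outfile = open(outputfile, mode='w', newline='')  # text mode; Python 3's csv module cannot write to a 'wb' file
csv_outfile = csv.DictWriter(outfile, headers, delimiter='|')
csv_outfile.writeheader()

my_dict = {}
for header in headers:
    # this pattern only matches values that are followed by a comma
    pattern = header + r'=(.*?),'
    match = re.search(pattern, a)
    my_dict[header] = match.group(1) if match else ''
csv_outfile.writerow(my_dict)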
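
As a rough sketch of the writing step on its own, the example below feeds already-parsed dictionaries to csv.DictWriter. The header list is taken from the table in the question. The rows are hypothetical sample data, and an in-memory io.StringIO stands in for the real output file so that the result can be inspected.

```python
import csv
import io

headers = ['app', 'receiverId', 'partner', 'Type', 'Class',
           'description', 'appName', 'appAction']

# Hypothetical, already-parsed rows; the second one lacks some keys
# and carries an extra one on purpose.
rows = [
    {'app': 'helloname', 'receiverId': 'abc-abc-123-123', 'partner': 'company',
     'Type': 'Generic App', 'Class': 'UpdateCheck', 'description': 'Version1',
     'appName': 'TWITTER', 'appAction': 'start'},
    {'app': 'helloname', 'partner': 'company', 'level': 'INFO'},
]

buffer = io.StringIO(newline='')  # stands in for open(outputfile, 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=headers, delimiter='|',
                        restval='',             # fill missing keys with an empty string
                        extrasaction='ignore')  # silently drop keys not in headers
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Setting restval and extrasaction means a line with missing or unexpected fields still produces a well-formed row instead of raising an error.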

I think this answers your questions?

2 Comments

j.Doe Over a year ago

I attempted using this but I keep getting the errors AttributeError: 'NoneType' object has no attribute 'group' and TypeError: unhashable type: 'list'. I did make a few modifications such as adding my_dict = {} and my_dict[headers] = re.search(pattern, str(a)).group(1)

SRT HellKitty Over a year ago

AttributeError: 'NoneType' object has no attribute 'group' means that you are not getting any results from the search, make sure the header is correct. Do you know where the TypeError: unhashable type: 'list'. is coming from in the code?
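
Both errors from the comment above can be reproduced in a few lines. This is only a sketch on a shortened sample line, assuming the modification was written exactly as quoted: my_dict[headers] indexes the dictionary with the whole list rather than the loop variable header, and the comma-terminated pattern finds nothing for partner because that value is followed by ] instead of a comma.

```python
import re

# Shortened sample line; only the bracketed part matters here.
a = "[a-1 app=helloname,receiverId=abc-abc-123-123,partner=company] INFO"
headers = ['app', 'receiverId', 'partner']
my_dict = {}

# A list cannot be a dictionary key, so indexing with the whole list fails.
try:
    my_dict[headers] = 'x'
except TypeError as exc:
    list_key_error = str(exc)
    print('TypeError:', list_key_error)

# re.search returns None when nothing matches, so check before calling .group().
for header in headers:
    match = re.search(header + r'=(.*?),', a)
    my_dict[header] = match.group(1) if match else None

print(my_dict)  # 'partner' maps to None: no comma follows its value
```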
