I am having trouble parsing my data so that I can look up specific values by their keys.

This is my code so far:

from kafka import KafkaConsumer
import ast
from collections import namedtuple
import json
import csv
import sys
from datetime import datetime
import os

# connect to kafka topic
kaf = KafkaConsumer('kafka.topic',
                   auto_offset_reset='earliest', bootstrap_servers=['consumer-kafka.server'])
outputfile = 'C:\\Users\\Documents\\KafkaConsum\\file.csv'

outfile = open(outputfile, mode='w', newline='')

for row in kaf:
    a = row.value.decode("utf-8")
    if "TAG_NAME" in a:
        print(a)
        outfile.write(a + '\n')

This is how my data is formatted:

2018-12-04 13:27:12,511 [a-1 app=helloname,receiverId=abc-abc-123-123,partner=company] INFO kafka.consumer.topic TAG_NAME Type='Generic App' Class= UpdateCheck description=Version1 appName="TWITTER" appAction="start"

2018-12-04 13:27:12,511 [a-1 app=helloname,receiverId=abc-abc-123-123,partner=company] INFO kafka.consumer.topic TAG_NAME Type='Generic App' Class= UpdateCheck description=Version1 appName="TWITTER" appAction="start"

I am looking to be able to parse this data to look like this in my csv file:

app | receiverId | partner | Type | Class | description | appName | appAction |

helloname | abc-abc-123-123 | company | Generic App | UpdateCheck | Version1 | TWITTER | start |

helloname | abc-abc-123-123 | company | Generic App | UpdateCheck | Version1 | TWITTER | start |


2 Answers


Here is a solution, although it does not use the csv module (it probably should).

The findall(...) call grabs each header=value pair from the line. The loop below it then splits each pair at the = sign into a header and a value. The header row is written once, followed by one row of values per line.

import re

def main():
    header = True
    fin = open('f3.txt', 'r')
    for line in fin:
        data = re.findall(r'\w+=\s*[\'"]?[\w-]+', line)
        headers = []
        array = []
        for pair in data:
            m = re.search(r'(\w+)=\s*[\'"]?([\w-]+)', pair)
            headers.append(m.group(1)) # get header
            array.append(m.group(2))   # get value

        if header == True:
            print('|'.join(headers))
            header = False
        print('|'.join(array))
    fin.close()

main()

This produced this output:

app|receiverId|partner|Type|Class|description|appName|appAction
helloname|abc-abc-123-123|company|Generic|UpdateCheck|Version1|TWITTER|start
helloname|abc-abc-123-123|company|Generic|UpdateCheck|Version1|TWITTER|start
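Note that Type comes out as Generic rather than Generic App in the output above, because [\w-]+ stops at the first space. A minimal sketch of one way around that, assuming values are either single-quoted, double-quoted, or bare words, and using csv.DictWriter for the output. The names PAIR_RE and parse_line are made up for this example, and the in-memory buffer stands in for a real file:

```python
import csv
import io
import re

# Matches key=value where the value is single-quoted, double-quoted, or bare.
# The optional whitespace after '=' covers "Class= UpdateCheck".
PAIR_RE = re.compile(r"""(\w+)=\s*(?:'([^']*)'|"([^"]*)"|([\w-]+))""")


def parse_line(line):
    """Return a dict of every key=value pair found in one log line."""
    record = {}
    for key, single, double, bare in PAIR_RE.findall(line):
        # Only one of the three alternatives is non-empty for a given match.
        record[key] = single or double or bare
    return record


sample = (
    "2018-12-04 13:27:12,511 [a-1 app=helloname,receiverId=abc-abc-123-123,"
    "partner=company] INFO kafka.consumer.topic TAG_NAME Type='Generic App' "
    'Class= UpdateCheck description=Version1 appName="TWITTER" appAction="start"'
)

record = parse_line(sample)

# An in-memory buffer is used here; a file opened with newline='' works the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record), delimiter="|")
writer.writeheader()
writer.writerow(record)
print(buffer.getvalue())
```

This takes the column names from the first parsed line, so it assumes every line carries the same keys in the same order.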

As Medali has said, you can use a regular expression to get the data you want and separate it properly. Something along these lines:

import re

pattern = r'app=(.*?),'
app = re.search(pattern, a).group(1)

You could keep a list of the headers you want, loop over it to build a pattern for each one, save the matches in a dictionary, and then write that dictionary directly to a CSV file.

You'll need a new variable, csv_outfile or similar, and you'll need to change the arguments to open:

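headers = ['app', 'receiverId', .... , 'appAction']
# in Python 3 the csv module wants text mode with newline='' (not 'wb')
outfile = open(outputfile, mode='w', newline='')
csv_outfile = csv.DictWriter(outfile, headers, delimiter='|')
csv_outfile.writeheader()

# a is the decoded message string from the question's loop
my_dict = {}
for header in headers:
    # note: this pattern only matches values that are followed by a comma
    pattern = header + r'=(.*?),'
    my_dict[header] = re.search(pattern, a).group(1)
csv_outfile.writerow(my_dict)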

I think this answers your question.
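For what it's worth, here is a rough sketch of the same per-header idea with a pattern that does not depend on a trailing comma, since in the sample data only app and receiverId are followed by one. It assumes the column list from the question's desired output and that a missing key should become an empty cell. The names HEADERS, extract and rows_to_csv are just for illustration:

```python
import csv
import io
import re

# Column names as they appear in the question's desired output.
HEADERS = ["app", "receiverId", "partner", "Type", "Class",
           "description", "appName", "appAction"]


def extract(header, line):
    """Return the value for one key, or '' when the key is absent."""
    # The value may be quoted or bare, and need not be followed by a comma.
    pattern = r"\b" + re.escape(header) + r"""=\s*(?:'([^']*)'|"([^"]*)"|([\w-]+))"""
    match = re.search(pattern, line)
    if match is None:
        return ""
    # Exactly one of the three groups participates in a successful match.
    return next((g for g in match.groups() if g is not None), "")


def rows_to_csv(lines, out):
    """Write one pipe-delimited row per input line to the file-like object out."""
    writer = csv.DictWriter(out, fieldnames=HEADERS, delimiter="|")
    writer.writeheader()
    for line in lines:
        writer.writerow({h: extract(h, line) for h in HEADERS})


sample = (
    "2018-12-04 13:27:12,511 [a-1 app=helloname,receiverId=abc-abc-123-123,"
    "partner=company] INFO kafka.consumer.topic TAG_NAME Type='Generic App' "
    'Class= UpdateCheck description=Version1 appName="TWITTER" appAction="start"'
)

out = io.StringIO()
rows_to_csv([sample], out)
print(out.getvalue())
```

In the Kafka loop you would pass the open file instead of the StringIO buffer and call writer.writerow once per decoded message.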

2 Comments

I attempted using this, but I keep getting the errors AttributeError: 'NoneType' object has no attribute 'group' and TypeError: unhashable type: 'list'. I did make a few modifications, such as adding my_dict = {} and my_dict[headers] = re.search(pattern, str(a)).group(1).
AttributeError: 'NoneType' object has no attribute 'group' means that the search returned no match, so make sure the header is correct. Do you know where in the code the TypeError: unhashable type: 'list' is coming from?
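Both errors can be reproduced in a few lines. This is only a sketch with a shortened sample line and a simplified pattern, not the exact code from the answer. It suggests the TypeError most likely comes from my_dict[headers], which uses the whole list as a key instead of the loop variable header:

```python
import re

headers = ["app", "receiverId"]
line = "[a-1 app=helloname,receiverId=abc-abc-123-123,partner=company]"

my_dict = {}
for header in headers:
    match = re.search(header + r"=([\w-]+)", line)
    # re.search returns None when nothing matches, so check before calling .group().
    my_dict[header] = match.group(1) if match else ""

# Matching is case-sensitive: 'receiverid' does not find 'receiverId'.
missing = re.search(r"receiverid=([\w-]+)", line)

# Using the whole list as a dictionary key is what raises the TypeError.
try:
    my_dict[headers] = "value"
except TypeError as exc:
    error_message = str(exc)

print(my_dict)
print(missing)
print(error_message)
```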
