
**This is my Python code; I'm trying to convert NGINX logs.

I'm reading logs from the access.log file and using regular expressions to convert them into JSON format, and I need to upload these logs to Elasticsearch. Please also guide me on that. I'm new to both.**

import json
import re

i = 0
result = {}

with open('access.log') as f:
    lines = f.readlines()


regex = '([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)"'

for line in lines:

    r = re.match(regex,line)

    if len(r) >= 6:
        result[i] = {'IP address': r[0], 'Time Stamp': r[1], 'HTTP status': r[2], 'Return status':
                     r[3], 'Browser Info': r[4]}
        i += 1
print(result)

with open('data.json', 'w') as fp:
    json.dump(result, fp)

I'm facing the following error:

Traceback (most recent call last):
   File "/home/zain/Downloads/stack.py", line 17, in <module>
    if len(r) >= 6:
TypeError: object of type 'NoneType' has no len()

This is the log format:

127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET / HTTP/1.1" 200 3437 "-" "Mozilla/5.0   (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET /icons/openlogo-75.png HTTP/1.1" 404 125 "http://localhost/" "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET /favicon.ico HTTP/1.1" 404 125 "http://localhost/" "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"

The expected output is:

IP Address: 127.0.0.1 Time Stamp: 23/May/2022:22:44:14  HTTP Status: "GET / HTTP/1.1" Return Status: 200 3437  Browser Info: "Mozilla/5.0   (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
  • @barmar kindly guide me related to this Commented May 26, 2022 at 21:22
  • it doesn't look like 'r' has assigned value from the regex. What do you get if you print(r)? Commented May 26, 2022 at 21:28
  • @CaptainCaveman it shows nothing, and I get the same error as mentioned in the question Commented May 26, 2022 at 21:32
  • Yes, that is why if len(r) >= 6 raises an error: you can't take the len() of something that doesn't have a value. So, which part of each line in the log are you trying to extract with the regex? Commented May 26, 2022 at 21:33
  • Welcome to Stack Overflow! Please don't vandalize your posts. By posting on the Stack Exchange network, you've granted a non-revocable right, under the CC BY-SA 4.0 license, for Stack Exchange to distribute that content (i.e. regardless of your future choices). By Stack Exchange policy, the non-vandalized version of the post is the one which is distributed, and thus, any vandalism will be reverted. If you want to know more about deleting a post, please see: How does deleting work? Commented May 28, 2022 at 1:17

1 Answer

I took my cue from this code. I believe the following should do it:

import json
import re

# Named groups for the fields of interest in each access-log line.
# Raw strings keep the backslashes intact for the regex engine.
regex = re.compile(
    r'(?P<ipaddress>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - - '
    r'\[(?P<dateandtime>.*?)\] '
    r'"(?P<httpstatus>(GET|POST) .+ HTTP/1\.1)" '
    r'(?P<returnstatus>\d{3} \d+) '
    r'"[^"]*" '  # referrer: matched but not captured
    r'"(?P<browserinfo>.*)"'
)

i = 0
result = {}

with open('access.log') as f:
    for line in f:
        r = regex.match(line)

        # match() returns None for lines that do not fit the pattern; skip them.
        if r is not None:
            result[i] = {'IP address': r.group('ipaddress'),
                         'Time Stamp': r.group('dateandtime'),
                         'HTTP status': r.group('httpstatus'),
                         'Return status': r.group('returnstatus'),
                         'Browser Info': r.group('browserinfo')}
            i += 1

print(result)

with open('data.json', 'w') as fp:
    json.dump(result, fp)

Result (print(json.dumps(result, sort_keys=False, indent=4))):

{
    "0": {
        "IP address": "127.0.0.1",
        "Time Stamp": "23/May/2022:22:44:14 -0400",
        "HTTP status": "GET / HTTP/1.1",
        "Return status": "200 3437",
        "Browser Info": "Mozilla/5.0   (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
    },
    "1": {
        "IP address": "127.0.0.1",
        "Time Stamp": "23/May/2022:22:44:14 -0400",
        "HTTP status": "GET /icons/openlogo-75.png HTTP/1.1",
        "Return status": "404 125",
        "Browser Info": "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
    },
    "2": {
        "IP address": "127.0.0.1",
        "Time Stamp": "23/May/2022:22:44:14 -0400",
        "HTTP status": "GET /favicon.ico HTTP/1.1",
        "Return status": "404 125",
        "Browser Info": "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
    }
}
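
The result above keeps every field as a string. If typed fields are more convenient downstream, a record can be post-processed along these lines. This is only a sketch: the `normalize` function and its output key names are hypothetical, and `%b` parses English month abbreviations only under an English or C locale.

```python
from datetime import datetime

# One record shaped like the entries in the result above.
record = {
    "IP address": "127.0.0.1",
    "Time Stamp": "23/May/2022:22:44:14 -0400",
    "HTTP status": "GET / HTTP/1.1",
    "Return status": "200 3437",
    "Browser Info": "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0",
}


def normalize(entry):
    """Return a new dict with typed fields (key names are hypothetical)."""
    # "200 3437" -> status code and response size
    status, size = entry["Return status"].split()
    # "GET / HTTP/1.1" -> method, path, protocol
    method, rest = entry["HTTP status"].split(" ", 1)
    path, protocol = rest.rsplit(" ", 1)
    # NGINX time format: day/month/year:hour:minute:second offset
    timestamp = datetime.strptime(entry["Time Stamp"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "ip": entry["IP address"],
        "timestamp": timestamp.isoformat(),
        "method": method,
        "path": path,
        "protocol": protocol,
        "status": int(status),
        "bytes_sent": int(size),
        "user_agent": entry["Browser Info"],
    }


normalized = normalize(record)
print(normalized)
```

An ISO 8601 timestamp and integer status codes tend to be easier to sort and filter than the raw strings.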

2 Comments

Thank you, it worked. Do you know how I can upload it to Elasticsearch?
I'm not familiar with Elasticsearch, but you should be able to find the answer here on SO already, e.g. this post?
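
On the upload question, one possible route is the Elasticsearch `_bulk` REST endpoint, which accepts newline-delimited JSON: an action line followed by a document line for each record. The sketch below uses only the standard library and rests on several assumptions: the index name `nginx-logs`, the URL `http://localhost:9200`, and a node reachable without authentication or TLS are all placeholders, and the helper names `build_bulk_body` and `send_bulk` are hypothetical.

```python
import json
import urllib.request


def build_bulk_body(result, index_name):
    """Build a newline-delimited JSON body for the _bulk endpoint."""
    lines = []
    for doc in result.values():
        # Action line: index the following document into index_name.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def send_bulk(body, url="http://localhost:9200/_bulk"):
    """POST the body to Elasticsearch (assumes an unsecured local node)."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# A one-record stand-in for the `result` dict built by the answer's code.
result = {
    0: {
        "IP address": "127.0.0.1",
        "Time Stamp": "23/May/2022:22:44:14 -0400",
        "HTTP status": "GET / HTTP/1.1",
        "Return status": "200 3437",
        "Browser Info": "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0",
    }
}

body = build_bulk_body(result, "nginx-logs")
print(body)

# Uncomment once an Elasticsearch node is reachable at the URL above:
# print(send_bulk(body))
```

A secured cluster would additionally need credentials and HTTPS, which this sketch does not cover; the official Python client is another option for that case.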
