I am trying to find and plot the hourly error rate in a custom log file that looks like this.

<2019-12-19T16:02:14.776+0000> WARNING <clientToServer.smaap> Response NOT OK --Transport content: <html><head><title>500 Internal Server Error</title></head><body bgcolor="white"><center><h1>500 Internal Server Error</h1></center><hr><center> </center></body></html>
 --ServiceInfo: PostDataService - client: A201ACCDC3DAB47C0B4BF021D11785DFD49F1863 ,tenant: 19b0be25fd5248588f0631a820a43c88 ,payloadType: apm_metric ,messageForClient: false Observations: DeploymentMetric[1] MappingMetric[1] RequestTypeMetric[1] LinkMetric[1]  --Transport info: HTTP method: POST ,URL: https://oc-19b0be25fd5248588f0631a820a43c88.api.smouloud.com/static/data.storage/m_metric ,response status: 500 ,response headers: Connection=keep-alive , Content-Length=182 , Date=Thu, 19 Dec 2019 16:02:14 GMT , Content-Type=text/html   <Ref: WKWLUTVWBDJ2GAGOIRZL5VHMK3ZTJIWX>
<2019-12-19T16:04:12.242+0000> WARNING <ACTION.JCB> <e646de45-4c09-4a6e-a5d1-3552db8fc0dc-0000000b> Failed to get business interface of sdpinternal.messaging.management.em.ServerTargetImpl, class will not be monitored.  <Ref: N2IOE3DZNWKAYSYLJTN5AWRQJP7DKMTS>
<2019-12-19T16:04:14.745+0000> WARNING <clientToServer.smaap> Response NOT OK --Transport content: <html><head><title>500 Internal Server Error</title></head><body bgcolor="white"><center><h1>500 Internal Server Error</h1></center><hr><center> </center></body></html>
 --ServiceInfo: PostDataService - client: A201ACCDC3DAB47C0B4BF021D11785DFD49F1863 ,tenant: 19b0be25fd5248588f0631a820a43c88 ,payloadType: apm_metric ,messageForClient: false Observations: HostMetric[1] DeploymentMetric[1] JVMMetric[1] InfrastructureMetric[1] MappingMetric[1] RequestTypeMetric[1] LinkMetric[1] ThreadPoolMetric[1] AppServerMetric[1] ConnectionPoolMetric[1]  --Transport info: HTTP method: POST ,URL: https://omc-19b0be25fd5248588f0631a820a43c88.api.omc.ocp.oraclecloud.com/static/data.storage/apm_metric ,response status: 500 ,response headers: Connection=keep-alive , Content-Length=182 , Date=Thu, 19 Dec 2019 16:04:14 GMT , Content-Type=text/html   <Ref: PU5HERXNVLSVAG33LIOPJVKYSZJC4R73>
<2019-12-19T16:04:14.753+0000> WARNING <clientToServer.transport> Error connecting to https://oc-19b0be25fd5248588f0631a820a43c88.api.smouloud.com/static/data.storage/m_metric  <Ref: NL6XDJAALZ23BM4PPRRFWRFFBC6KLYSE>

I would like to plot the number of "500 Internal Server Error" in every hour. I tried to parse this log into a pandas dataframe using the following:

import pandas as pd
from io import StringIO  # pandas.compat.StringIO was removed in pandas 0.25


tmp=u"""<2019-12-19T16:02:14.776+0000> WARNING <clientToServer.smaap> Response NOT OK --Transport content: <html><head><title>500 Internal Server Error</title></head><body bgcolor="white"><center><h1>500 Internal Server Error</h1></center><hr><center> </center></body></html>
 --ServiceInfo: PostDataService - client: A201ACCDC3DAB47C0B4BF021D11785DFD49F1863 ,tenant: 19b0be25fd5248588f0631a820a43c88 ,payloadType: apm_metric ,messageForClient: false Observations: DeploymentMetric[1] MappingMetric[1] RequestTypeMetric[1] LinkMetric[1]  --Transport info: HTTP method: POST ,URL: https://oc-19b0be25fd5248588f0631a820a43c88.api.smouloud.com/static/data.storage/m_metric ,response status: 500 ,response headers: Connection=keep-alive , Content-Length=182 , Date=Thu, 19 Dec 2019 16:02:14 GMT , Content-Type=text/html   <Ref: WKWLUTVWBDJ2GAGOIRZL5VHMK3ZTJIWX>
<2019-12-19T16:04:12.242+0000> WARNING <ACTION.JCB> <e646de45-4c09-4a6e-a5d1-3552db8fc0dc-0000000b> Failed to get business interface of sdpinternal.messaging.management.em.ServerTargetImpl, class will not be monitored.  <Ref: N2IOE3DZNWKAYSYLJTN5AWRQJP7DKMTS>
<2019-12-19T16:04:14.745+0000> WARNING <clientToServer.smaap> Response NOT OK --Transport content: <html><head><title>500 Internal Server Error</title></head><body bgcolor="white"><center><h1>500 Internal Server Error</h1></center><hr><center> </center></body></html>
 --ServiceInfo: PostDataService - client: A201ACCDC3DAB47C0B4BF021D11785DFD49F1863 ,tenant: 19b0be25fd5248588f0631a820a43c88 ,payloadType: apm_metric ,messageForClient: false Observations: HostMetric[1] DeploymentMetric[1] JVMMetric[1] InfrastructureMetric[1] MappingMetric[1] RequestTypeMetric[1] LinkMetric[1] ThreadPoolMetric[1] AppServerMetric[1] ConnectionPoolMetric[1]  --Transport info: HTTP method: POST ,URL: https://omc-19b0be25fd5248588f0631a820a43c88.api.omc.ocp.oraclecloud.com/static/data.storage/apm_metric ,response status: 500 ,response headers: Connection=keep-alive , Content-Length=182 , Date=Thu, 19 Dec 2019 16:04:14 GMT , Content-Type=text/html   <Ref: PU5HERXNVLSVAG33LIOPJVKYSZJC4R73>
<2019-12-19T16:04:14.753+0000> WARNING <clientToServer.transport> Error connecting to https://oc-19b0be25fd5248588f0631a820a43c88.api.smouloud.com/static/data.storage/m_metric  <Ref: NL6XDJAALZ23BM4PPRRFWRFFBC6KLYSE>"""

df = pd.read_csv(StringIO(tmp), comment=' --', sep='0> ', names=['Time','Text'])
indexNames = df[ (df['Time'].str.startswith(' --')) ].index
df.drop(indexNames , inplace=True)

# remove < by strip and convert column Time to_datetime:
df.Time = pd.to_datetime(df.Time.str.strip('<'), format='%Y-%m-%dT%H:%M:%S.%f+0000')
df.Text = df.Text.str.strip()

print (df)
print (df.dtypes)

For some reason, I am unable to remove the continuation rows (the ones that begin with ' --') from the DataFrame.

I am using pandas 0.24.2 with Python 3.7.3. Any ideas?

2 Answers

The affected rows do not start with a space. Replace startswith(' --') with startswith('--'):

indexNames = df[ (df['Time'].str.startswith('--')) ].index

As a side note, the comment=' --' argument you pass to pd.read_csv() does not work as intended, because comment accepts only a single character. According to the docs:

comment : str, optional

Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character.
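
Since comment cannot take a multi-character marker, one possible workaround is to drop the continuation lines before the text ever reaches pandas. The sketch below is only an illustration under assumptions, not part of the original answer: it assumes every record starts with a `<timestamp>` in the format shown in the question, the sample lines are abbreviated versions of the question's log, and the helper name `parse_records` is made up for this example.

```python
import re
from datetime import datetime

# Assumed record start: "<2019-12-19T16:02:14.776+0000> rest of the line"
RECORD_START = re.compile(
    r"^<(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)\+0000>\s*(.*)$"
)


def parse_records(lines):
    """Return (timestamp, text) tuples, skipping continuation lines."""
    records = []
    for line in lines:
        match = RECORD_START.match(line.rstrip("\n"))
        if match is None:
            # Continuation lines such as " --ServiceInfo: ..." carry no timestamp.
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        records.append((stamp, match.group(2).strip()))
    return records


# Abbreviated sample lines in the shape of the question's log.
sample = [
    "<2019-12-19T16:02:14.776+0000> WARNING <clientToServer.smaap> Response NOT OK\n",
    " --ServiceInfo: PostDataService ,response status: 500\n",
    "<2019-12-19T16:04:12.242+0000> WARNING <ACTION.JCB> Failed to get business interface\n",
]
records = parse_records(sample)
print(len(records))
```

The resulting list could then be handed to pd.DataFrame(records, columns=['Time', 'Text']), which avoids both the comment parameter and the later drop step.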

1 Comment

You are correct. And thanks for the comment on comment :-) Very useful.

I am trying to find and plot the hourly error rate in a custom log file that looks like this.

A log file is already split into lines, so if you have access to the file itself, I recommend iterating over it line by line instead of pasting its contents into a string.

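from datetime import datetime

import pandas as pd

errors = []
with open("log.txt", "r") as log:
    for line in log:
        # The error text sits on the record's first line, which starts with the timestamp.
        if "500 Internal Server Error" in line:
            # First whitespace-separated token, e.g. <2019-12-19T16:02:14.776+0000>
            errors.append(datetime.strptime(line.strip().split()[0], '<%Y-%m-%dT%H:%M:%S.%f+0000>'))

df = pd.DataFrame({'Time': errors})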

Testing

log = [
"""<2019-12-19T16:02:14.776+0000> WARNING <clientToServer.smaap> Response NOT OK --Transport content: <html><head><title>500 Internal Server Error</title></head><body bgcolor="white"><center><h1>500 Internal Server Error</h1></center><hr><center> </center></body></html>""",
"""--ServiceInfo: PostDataService - client: A201ACCDC3DAB47C0B4BF021D11785DFD49F1863 ,tenant: 19b0be25fd5248588f0631a820a43c88 ,payloadType: apm_metric ,messageForClient: false Observations: DeploymentMetric[1] MappingMetric[1] RequestTypeMetric[1] LinkMetric[1]  --Transport info: HTTP method: POST ,URL: https://oc-19b0be25fd5248588f0631a820a43c88.api.smouloud.com/static/data.storage/m_metric ,response status: 500 ,response headers: Connection=keep-alive , Content-Length=182 , Date=Thu, 19 Dec 2019 16:02:14 GMT , Content-Type=text/html   <Ref: WKWLUTVWBDJ2GAGOIRZL5VHMK3ZTJIWX>""",
"""<2019-12-19T16:04:12.242+0000> WARNING <ACTION.JCB> <e646de45-4c09-4a6e-a5d1-3552db8fc0dc-0000000b> Failed to get business interface of sdpinternal.messaging.management.em.ServerTargetImpl, class will not be monitored.  <Ref: N2IOE3DZNWKAYSYLJTN5AWRQJP7DKMTS>""",
"""<2019-12-19T16:04:14.745+0000> WARNING <clientToServer.smaap> Response NOT OK --Transport content: <html><head><title>500 Internal Server Error</title></head><body bgcolor="white"><center><h1>500 Internal Server Error</h1></center><hr><center> </center></body></html>""",
"""--ServiceInfo: PostDataService - client: A201ACCDC3DAB47C0B4BF021D11785DFD49F1863 ,tenant: 19b0be25fd5248588f0631a820a43c88 ,payloadType: apm_metric ,messageForClient: false Observations: HostMetric[1] DeploymentMetric[1] JVMMetric[1] InfrastructureMetric[1] MappingMetric[1] RequestTypeMetric[1] LinkMetric[1] ThreadPoolMetric[1] AppServerMetric[1] ConnectionPoolMetric[1]  --Transport info: HTTP method: POST ,URL: https://omc-19b0be25fd5248588f0631a820a43c88.api.omc.ocp.oraclecloud.com/static/data.storage/apm_metric ,response status: 500 ,response headers: Connection=keep-alive , Content-Length=182 , Date=Thu, 19 Dec 2019 16:04:14 GMT , Content-Type=text/html   <Ref: PU5HERXNVLSVAG33LIOPJVKYSZJC4R73>""",
"""<2019-12-19T16:04:14.753+0000> WARNING <clientToServer.transport> Error connecting to https://oc-19b0be25fd5248588f0631a820a43c88.api.smouloud.com/static/data.storage/m_metric  <Ref: NL6XDJAALZ23BM4PPRRFWRFFBC6KLYSE>"""
]

errors = []

#with open("log.txt", "r") as log:
for line in log:
    if "500 Internal Server Error" in line:
        errors.append(datetime.strptime(line.strip().split()[0], '<%Y-%m-%dT%H:%M:%S.%f+0000>'))

df = pd.DataFrame({'Time': errors})
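
The question also asks for the number of errors per hour, which the code above stops short of. One way to get there, sketched here as an assumption rather than as part of the original answer, is to resample the timestamps into one-hour bins. The three timestamps are invented stand-ins for the `errors` list, and the `count` column name is arbitrary.

```python
from datetime import datetime

import pandas as pd

# Hypothetical timestamps standing in for the `errors` list built above.
errors = [
    datetime(2019, 12, 19, 16, 2, 14),
    datetime(2019, 12, 19, 16, 4, 14),
    datetime(2019, 12, 19, 18, 30, 0),
]

df = pd.DataFrame({"Time": errors})

# Give every error a weight of 1, index by time, and sum within one-hour bins.
# Hours with no errors between the first and last bin show up with a count of 0.
hourly = (
    df.assign(count=1)
    .set_index("Time")["count"]
    .resample(pd.Timedelta(hours=1))
    .sum()
)
print(hourly)

# With matplotlib installed, the counts could be drawn as a bar chart:
# hourly.plot(kind="bar")
```

Resampling, unlike a plain groupby on the floored hour, keeps the empty hours in the result, so gaps in the error stream stay visible in the plot.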

1 Comment

Nice. That's a super, easy way.
