I have a log file in the format below. For each line I need to capture the third column (the ID, e.g. 0102b69880c4b330), the corresponding message (e.g. DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG), and their respective counts (see the expected output below). I thought regular expressions would make the solution easier.
Explanation:
Case 1: The ID 0102b69880c4b330 occurs 3 times (lines 1, 2 and 3), so the ID count is 3. The corresponding message DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG also occurs 3 times, so its count is 3.
Case 2: The ID 0102b69880c4e3b2 on lines 4 and 5 has two different messages, JMS DO_METHOD TRACE LAUNCH and DO_METHOD TRACE LAUNCH. The ID count is 2, but the message counts should be 1 and 1 respectively.
Case 3: The ID 0102b6988000000c on lines 10 to 12 has the message DM_WORKFLOW_E_PROCESS_AUTO_TASK. The ID count is 3 and the message count is 3. Here I also need to capture the task ID and the workflow ID that follow the error message.
I used [Ignore for this] in the output only to show that I don't need those IDs for the other messages.
Finally, I also need to maintain the total count of DM_WORKFLOW_E_PROCESS_AUTO_TASK.
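The counting rules in the three cases can be sketched with one Counter per ID. This is only a sketch: it assumes each line has already been reduced to an (id, message) pair, and the names `records`, `count_by_id`, and `workflow_total` are hypothetical, not part of the original program.

```python
from collections import Counter, defaultdict

# Hypothetical pre-parsed (id, message) pairs, taken from the sample log.
records = [
    ("0102b69880c4b330", "DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG"),
    ("0102b69880c4b330", "DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG"),
    ("0102b69880c4b330", "DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG"),
    ("0102b69880c4e3b2", "JMS DO_METHOD TRACE LAUNCH"),
    ("0102b69880c4e3b2", "DO_METHOD TRACE LAUNCH"),
    ("0102b6988000000c", "DM_WORKFLOW_E_PROCESS_AUTO_TASK"),
    ("0102b6988000000c", "DM_WORKFLOW_E_PROCESS_AUTO_TASK"),
    ("0102b6988000000c", "DM_WORKFLOW_E_PROCESS_AUTO_TASK"),
]

def count_by_id(pairs):
    """Return {id: Counter({message: count})} for the given pairs."""
    per_id = defaultdict(Counter)
    for log_id, message in pairs:
        per_id[log_id][message] += 1
    return per_id

per_id = count_by_id(records)

# The ID count is the sum of that ID's message counts.
for log_id, messages in per_id.items():
    print(log_id, sum(messages.values()), dict(messages))

# Total for one message across all IDs (a Counter returns 0 for a missing key).
workflow_total = sum(c["DM_WORKFLOW_E_PROCESS_AUTO_TASK"] for c in per_id.values())
print("DM_WORKFLOW_E_PROCESS_AUTO_TASK total:", workflow_total)
```

Keeping the message counts nested under the ID means the ID count never has to be tracked separately.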
Input:
2019-05-05T00:05:11.507245 12090[12090] 0102b69880c4b330 [DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG] info: Attempting to status Index Agent Instance host-address_9200_IndexAgent
2019-05-05T00:05:11.759829 12090[12090] 0102b69880c4b330 [DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG] info : Response from HTTP_POST command: HTTP/1.1 200 OK Status: 0 , Time Taken: 0 seconds.
2019-05-05T00:05:11.759898 12090[12090] 0102b69880c4b330 [DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG] info : HTTP_POST with args -command status -docbase SubWayX -user dm_fulltext_index_user -ticket ****** -instance host-address_9200_IndexAgent -details false to Index Agent host-address_9200_IndexAgent is successful.
2019-05-05T01:40:53.148751 20135[20135] 0102b69880c4e3b2 JMS DO_METHOD TRACE LAUNCH: do_method launch: successful: user: Xie Xiaoke, session id: 0102b69880c4e3b2, JMS id: 0802b69880003535, method: D2LifecycleChangeStateMethod, host:host-address.net, port:9082, path:/DmMethods/servlet/DoMethod
2019-05-05T01:40:53.148877 20135[20135] 0102b69880c4e3b2 DO_METHOD TRACE LAUNCH: method launch: successful, user: Xie Xiaoke, session id: 0102b69880c4e3b2, method: D2LifecycleChangeStateMethod
2019-05-07T05:42:21.171087 22484[22484] 0102b6988000000b [DM_WORKFLOW_E_PROCESS_AUTO_TASK]error: "Workflow Agent failed to process task 4a02b698800aad04 of workflow 4d02b6988000f709. The task is using method 'D2WFLifeCycleMethod'. Activity: 'Demote to Draft with new Version'. Check the Java Method Server log for errors."
2019-05-05T05:24:48.483966 17114[17114] 0102b69880c4fb1e JMS DO_METHOD TRACE LAUNCH: user: dmadmin, session id: 0102b69880c4fb1e, JMS id: 0802b69880003535, method: D2LifecycleChangeStateMethod, host:host-address.net, port:9082, path:/DmMethods/servlet/DoMethod, arguments:-method_verb com.emc.d2.api.methods.D2Method -class_name com.emc.d2.api.methods.D2LifecycleChangeStateMethod -__dm_docbase__ SubWayX -__dm_server_config__ host-address_SubWayX -docbase_name SubWayX -user_name dmadmin -method_return_id "0802b6988167b46e" -locale en
2019-05-05T05:24:50.362650 17114[17114] 0102b69880c4fb1e JMS DO_METHOD TRACE LAUNCH: do_method launch: successful: user: dmadmin, session id: 0102b69880c4fb1e, JMS id: 0802b69880003535, method: D2LifecycleChangeStateMethod, host:host-address.net, port:9082, path:/DmMethods/servlet/DoMethod
2019-05-05T05:24:50.362702 17114[17114] 0102b69880c4fb1e DO_METHOD TRACE LAUNCH: method launch: successful, user: dmadmin, session id: 0102b69880c4fb1e, method: D2LifecycleChangeStateMethod
2019-05-05T05:44:35.410674 12791[12791] 0102b6988000000c [DM_WORKFLOW_E_PROCESS_AUTO_TASK]error: "Workflow Agent failed to process task 4a02b698800a977c of workflow 4d02b698800107e9. The task is using method 'D2WFLifeCycleMethod'. Activity: 'validate entry conditions for Effective'. Method timed out within 60 secs."
2019-05-05T05:50:31.383668 12791[12791] 0102b6988000000c [DM_WORKFLOW_E_PROCESS_AUTO_TASK]error: "Workflow Agent failed to process task 4a02b698800a9782 of workflow 4d02b6988001081e. The task is using method 'D2WFLifeCycleMethod'. Activity: 'validate entry conditions for Effective'. Method timed out within 60 secs."
2019-05-05T05:53:49.978053 12791[12791] 0102b6988000000c [DM_WORKFLOW_E_PROCESS_AUTO_TASK]error: "Workflow Agent failed to process task 4a02b698800a9784 of workflow 4d02b6988001081c. The task is using method 'D2WFLifeCycleMethod'. Activity: 'validate entry conditions for Effective'. Method timed out within 60 secs."
2019-05-05T00:50:11.761273 2591[2591] 0102b69880c4ccde [DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG] info: Attempting to status Index Agent Instance phchbs-sp220333_9200_IndexAgent
2019-05-05T00:50:12.015521 2591[2591] 0102b69880c4ccde [DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG] info : Response from HTTP_POST command: HTTP/1.1 200 OK Status: 0 , Time Taken: 1 seconds.
2019-05-05T00:50:12.015563 2591[2591] 0102b69880c4ccde [DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG] info : HTTP_POST with args -command status -docbase SubWayX -user dm_fulltext_index_user -ticket ****** -instance phchbs-sp220333_9200_IndexAgent -details false to Index Agent phchbs-sp220333_9200_IndexAgent is successful.
I need to get the below output:
Output:
ID | Count | Message | Message Count | Task ID | Workflow ID
0102b69880c4b330 | 3 | DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG | 3 | [Ignore for this] | [Ignore for this]
0102b69880c4e3b2 | 2 | JMS DO_METHOD TRACE LAUNCH, DO_METHOD TRACE LAUNCH | 1, 1 | [Ignore for this] | [Ignore for this]
0102b6988000000b | 1 | DM_WORKFLOW_E_PROCESS_AUTO_TASK | 1 | 4a02b698800aad04 | 4d02b6988000f709
0102b69880c4fb1e | 3 | JMS DO_METHOD TRACE LAUNCH, DO_METHOD TRACE LAUNCH | 2, 1 | [Ignore for this] | [Ignore for this]
0102b6988000000c | 3 | DM_WORKFLOW_E_PROCESS_AUTO_TASK | 3 | 4a02b698800a977c, 4a02b698800a9782, 4a02b698800a9784 | 4d02b698800107e9, 4d02b6988001081e, 4d02b6988001081c
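One possible way to pull the ID, message, task ID, and workflow ID out of a single line is sketched here. It rests on assumptions drawn only from the sample input: the message is either a bracketed code or free text ending at the first colon, and the task and workflow IDs are 16 hex characters in the phrase "process task ... of workflow ...". The names `LINE_RE`, `TASK_RE`, and `parse_line` are hypothetical.

```python
import re

# Assumed layout: timestamp, pid[pid], 16-character ID, then either a
# bracketed code ([DM_...]) or free text that ends at the first colon.
LINE_RE = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6})\s+'
    r'(?P<un_num>\d+\[\d+\])\s+'
    r'(?P<id>[0-9a-z]{16})\s+'
    r'(?:\[(?P<code>[^\]]+)\]|(?P<text>[^:]+):)'
)

# Assumed wording of the workflow error, as seen in the sample lines.
TASK_RE = re.compile(
    r'process task (?P<task>[0-9a-f]{16}) of workflow (?P<workflow>[0-9a-f]{16})'
)

def parse_line(line):
    """Return (id, message, task_id, workflow_id), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    message = m.group('code') or m.group('text').strip()
    t = TASK_RE.search(line)
    task, workflow = (t.group('task'), t.group('workflow')) if t else (None, None)
    return m.group('id'), message, task, workflow

# Two lines from the sample input (the second is shortened).
samples = [
    "2019-05-05T01:40:53.148877 20135[20135] 0102b69880c4e3b2 DO_METHOD TRACE LAUNCH: "
    "method launch: successful, user: Xie Xiaoke, session id: 0102b69880c4e3b2, "
    "method: D2LifecycleChangeStateMethod",
    '2019-05-05T05:44:35.410674 12791[12791] 0102b6988000000c '
    '[DM_WORKFLOW_E_PROCESS_AUTO_TASK]error: "Workflow Agent failed to process task '
    '4a02b698800a977c of workflow 4d02b698800107e9. Method timed out within 60 secs."',
]

for sample in samples:
    print(parse_line(sample))
```

The alternation in the last part of `LINE_RE` is what lets the unbracketed DO_METHOD lines match. The separate `TASK_RE` keeps the task and workflow lookup optional, so lines without those IDs simply yield None for both fields.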
The program I tried for testing is below. My regex does not handle what follows the ID column properly: it only picks up a message enclosed in [], so it skips lines where the message is not bracketed. It also doesn't capture the task ID and workflow ID. How can I modify my code to get the correct counts, task IDs, and workflow IDs?
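import re
import collections

# Matches: timestamp, pid[pid], 16-character ID, then a [BRACKETED] message.
# Raw strings keep the backslash escapes intact for the regex engine.
regexp = re.compile(
    r'(?P<date>[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{6}\s*)'
    r'(?P<un_num>[0-9]{3,5}\[[0-9]{3,5}\]\s*)'
    r'(?P<id>[a-z0-9]{16}\s*)'
    r'(?P<message>\[(.*?)\])'
)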
ls = ["2019-05-05T00:05:11.507245 12090[12090] 0102b69880c4b330 [DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG] info: Attempting to status Index Agent Instance host-address_9200_IndexAgent",
"2019-05-05T00:05:11.759829 12090[12090] 0102b69880c4b330 [DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG] info : Response from HTTP_POST command: HTTP/1.1 200 OK Status: 0 , Time Taken: 0 seconds.",
"2019-05-05T00:05:11.759898 12090[12090] 0102b69880c4b330 [DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG] info : HTTP_POST with args -command status -docbase SubWayX -user dm_fulltext_index_user -ticket ****** -instance host-address_9200_IndexAgent -details false to Index Agent host-address_9200_IndexAgent is successful.",
"2019-05-05T01:40:53.148751 20135[20135] 0102b69880c4e3b2 JMS DO_METHOD TRACE LAUNCH: do_method launch: successful: user: Xie Xiaoke, session id: 0102b69880c4e3b2, JMS id: 0802b69880003535, method: D2LifecycleChangeStateMethod, host:host-address.net, port:9082, path:/DmMethods/servlet/DoMethod",
"2019-05-05T01:40:53.148877 20135[20135] 0102b69880c4e3b2 DO_METHOD TRACE LAUNCH: method launch: successful, user: Xie Xiaoke, session id: 0102b69880c4e3b2, method: D2LifecycleChangeStateMethod",
"2019-05-07T05:42:21.171087 22484[22484] 0102b6988000000b [DM_WORKFLOW_E_PROCESS_AUTO_TASK]error: 'Workflow Agent failed to process task 4a02b698800aad04 of workflow 4d02b6988000f709. The task is using method 'D2WFLifeCycleMethod'. Activity: 'Demote to Draft with new Version'. Check the Java Method Server log for errors.'",
"2019-05-05T05:44:35.410674 12791[12791] 0102b6988000000c [DM_WORKFLOW_E_PROCESS_AUTO_TASK]error: 'Workflow Agent failed to process task 4a02b698800a977c of workflow 4d02b698800107e9. The task is using method 'D2WFLifeCycleMethod'. Activity: 'validate entry conditions for Effective'. Method timed out within 60 secs.'",
"2019-05-05T05:50:31.383668 12791[12791] 0102b6988000000c [DM_WORKFLOW_E_PROCESS_AUTO_TASK]error: 'Workflow Agent failed to process task 4a02b698800a9782 of workflow 4d02b6988001081e. The task is using method 'D2WFLifeCycleMethod'. Activity: 'validate entry conditions for Effective'. Method timed out within 60 secs.'",
"2019-05-05T05:53:49.978053 12791[12791] 0102b6988000000c [DM_WORKFLOW_E_PROCESS_AUTO_TASK]error: 'Workflow Agent failed to process task 4a02b698800a9784 of workflow 4d02b6988001081c. The task is using method 'D2WFLifeCycleMethod'. Activity: 'validate entry conditions for Effective'. Method timed out within 60 secs.'"
]
id_counter = collections.Counter()
message_counter = collections.Counter()

print("started......!!!!!")
for line in ls:
    # The pattern starts at the beginning of the line, so match() alone is enough.
    m = regexp.match(line)
    if m is None:
        print("None")
        continue
    print("-----------------")
    print(m.group('date'))
    print(m.group('un_num'))
    print(m.group('id'))
    id_counter.update([m.group('id')])
    print(m.group('message'))
    message_counter.update([m.group('message')])
print("end....!!!")

print(id_counter)
print(message_counter)

def print_counts(cdict):
    # Prints a running index followed by each (key, count) pair.
    for index, item in enumerate(cdict.items()):
        print(index, item)

print_counts(id_counter)
print_counts(message_counter)
The output for this is:
started......!!!!!
-----------------
2019-05-05T00:05:11.507245
12090[12090]
0102b69880c4b330
[DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG]
-----------------
2019-05-05T00:05:11.759829
12090[12090]
0102b69880c4b330
[DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG]
-----------------
2019-05-05T00:05:11.759898
12090[12090]
0102b69880c4b330
[DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG]
None
None
-----------------
2019-05-07T05:42:21.171087
22484[22484]
0102b6988000000b
[DM_WORKFLOW_E_PROCESS_AUTO_TASK]
-----------------
2019-05-05T05:44:35.410674
12791[12791]
0102b6988000000c
[DM_WORKFLOW_E_PROCESS_AUTO_TASK]
-----------------
2019-05-05T05:50:31.383668
12791[12791]
0102b6988000000c
[DM_WORKFLOW_E_PROCESS_AUTO_TASK]
-----------------
2019-05-05T05:53:49.978053
12791[12791]
0102b6988000000c
[DM_WORKFLOW_E_PROCESS_AUTO_TASK]
end....!!!
Counter({'0102b69880c4b330\t': 3, '0102b6988000000c\t': 3, '0102b6988000000b ': 1})
Counter({'[DM_WORKFLOW_E_PROCESS_AUTO_TASK]': 4, '[DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG]': 3})
0 ('0102b69880c4b330\t', 3)
1 ('0102b6988000000b ', 1)
2 ('0102b6988000000c\t', 3)
0 ('[DM_FT_INDEX_T_INIT_INDEX_AGENT_MSG]', 3)
1 ('[DM_WORKFLOW_E_PROCESS_AUTO_TASK]', 4)
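The ID keys in the output above end in a tab or a space because the `\s*` sits inside the `(?P<id>...)` group, so the separator is captured along with the ID. A minimal sketch of the difference follows, using made-up one-field samples; `with_ws`, `without_ws`, and `samples` are hypothetical names.

```python
import re
from collections import Counter

with_ws = re.compile(r'(?P<id>[a-z0-9]{16}\s*)')     # whitespace is captured in the group
without_ws = re.compile(r'(?P<id>[a-z0-9]{16})\s*')  # whitespace is consumed but not captured

# The same ID followed by a space in one line and a tab in the other.
samples = ["0102b6988000000b [X]", "0102b6988000000b\t[X]"]

keys_with = Counter(with_ws.match(s).group('id') for s in samples)
keys_without = Counter(without_ws.match(s).group('id') for s in samples)

print(keys_with)
print(keys_without)
```

With the whitespace inside the group, the same ID followed by different separators would be counted under different keys.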