Say I have a file that looks like this:
'2021-06-23T08:02:08Z UTC [ db=dev LOG: BEGIN;
'2021-06-23T08:02:08Z UTC [ db=dev LOG: SET datestyle TO ISO;
'2021-06-23T08:02:08Z UTC [ db=dev LOG: SET TRANSACTION READ ONLY;
'2021-06-23T08:02:08Z UTC [ db=dev LOG: SET STATEMENT_TIMEOUT TO 300000;
'2021-06-23T08:02:08Z UTC [ db=dev LOG: /* hash: 8d9692aa66628f2ea5b0b9de8e4ea59b */
SELECT action,
status,
COUNT(*) AS num_req
FROM stl_datashare_changes_consumer
WHERE actiontime > getdate() - INTERVAL '1 day'
GROUP BY 1,2;
'2021-06-23T08:02:08Z UTC [ db=dev LOG: SELECT pg_catalog.stll_datashare_changes_consumer.action AS action, pg_catalog.stll_datashare_changes_consumer.status AS status, COUNT(*) AS num_req FROM pg_catalog.stll_datashare_changes_consumer WHERE pg_catalog.stll_datashare_changes_consumer.actiontime > getdate() - interval '1 day'::Interval GROUP BY 1, 2;
'2021-06-23T08:02:08Z UTC [ db=dev LOG: COMMIT;
'2021-06-23T08:02:08Z UTC [ db=dev LOG: SET query_group to ''
'2021-06-23T08:02:22Z UTC [ db=dev LOG: SELECT 1
'2021-06-23T08:02:30Z UTC [ db=dev LOG: /* hash: 64f5dca78e917617f51632257854cb2f */
WITH per_commit_info AS
(
SELECT date_trunc('day', startwork) AS day,
c.xid,
SUM(num_metadata_blocks_retained) AS sum_retained,
SUM(total_metadata_blocks) AS sum_total,
AVG(num_metadata_blocks_retained) AS avg_retained,
AVG(total_metadata_blocks) AS avg_total
FROM stl_commit_stats c,
stl_commit_internal_stats i
WHERE c.xid = i.xid
< ...even more sql >;
'2021-06-23T08:02:30Z UTC [ db=dev LOG: SELECT per_commit_info.day AS day, COUNT(*) AS commits,
and I want to eventually get a data store that looks like this:
[
    {
        'timestamp': '2021-06-23T08:02:08Z UTC',
        'db': 'dev',
        'query': 'LOG: BEGIN;',
    },
    {
        'timestamp': '2021-06-23T08:02:08Z UTC',
        'db': 'dev',
        'query': 'LOG: <Extremely long query string>',
    },
]
One problem is that the queries can span multiple lines, so a newline does not necessarily mark the end of an entry.
So I have a regex pattern that looks like this:
"(?P<query_date>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z UTC) \[ db=(?P<db>\w*) LOG:(?P<query_text>.*)",
which I think is close to right. How do I use this to capture all of the matching groups in this file? Can anyone help with this code?
Is the code something like this:
import re
pattern = re.compile(<my pattern>)
for i, line in enumerate(open(<my file>)):
    for match in re.finditer(pattern, line):
        <add matching group to empty array after making a dictionary>
Is it something like that? One thing to note is that some of the queries do not end in a semi-colon!
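
One possible approach, sketched under some assumptions: looping line by line cannot work for the multiline queries, because `.` stops at a newline and each continuation line would simply not match. A minimal alternative is to read the whole file into one string and treat the timestamp prefix as the start-of-entry marker, capturing everything up to the next timestamp (or the end of the text). The names `ENTRY`, `parse_log`, and `sample` below are hypothetical, and the sketch assumes that no line inside a query body begins with something that looks like a timestamp followed by ` [`. It does not rely on a trailing semicolon.

```python
import re

# Assumption: every entry starts at the beginning of a line with an optional
# quote, then a timestamp like 2021-06-23T08:02:08Z UTC, then " [ db=<name> ".
ENTRY = re.compile(
    r"^'?(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z UTC) \[ db=(?P<db>\w*) "
    r"(?P<query>LOG:.*?)"  # lazy, so it stops at the first following entry
    # Lookahead: the next entry's timestamp at a line start, or end of text.
    r"(?=^'?\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z UTC \[|\Z)",
    re.DOTALL | re.MULTILINE,  # DOTALL lets "." cross newlines
)


def parse_log(text):
    """Return a list of dicts with 'timestamp', 'db' and 'query' keys."""
    records = []
    for match in ENTRY.finditer(text):
        record = match.groupdict()
        # Drop the trailing newline that separates one entry from the next.
        record["query"] = record["query"].strip()
        records.append(record)
    return records


# A few entries copied from the log above, including one multiline query
# and one query that does not end in a semicolon.
sample = (
    "'2021-06-23T08:02:08Z UTC [ db=dev LOG: BEGIN;\n"
    "'2021-06-23T08:02:08Z UTC [ db=dev LOG: /* hash: 8d9692aa66628f2ea5b0b9de8e4ea59b */\n"
    "SELECT action,\n"
    "status,\n"
    "COUNT(*) AS num_req\n"
    "FROM stl_datashare_changes_consumer\n"
    "WHERE actiontime > getdate() - INTERVAL '1 day'\n"
    "GROUP BY 1,2;\n"
    "'2021-06-23T08:02:22Z UTC [ db=dev LOG: SELECT 1\n"
)

records = parse_log(sample)
```

For a real file, the same function could be called on the result of `open(...).read()`. For very large logs, reading everything into memory may not be practical, and a line-by-line loop that starts a new record whenever a line matches the timestamp prefix would be the streaming equivalent.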