I have written a fairly basic system monitor for my router to track when the signal is dropping (and all the stats that have occurred at that time) as the very excellent routerstatslite doesn't gather everything that I need.

Here's the Gist, but I want to sanitize the data before I upload it to loggly, so I need to remove the dB and Mbps suffixes wherever they appear.

https://gist.github.com/scottharman/6ca07a7c46ca09de3e3b2f0a5094d86e

script =  stats.findAll('script')[1]
pattern = re.compile('(\w+)="(.*?)Mbps\|dB"')
fields = dict(re.findall(pattern, script.text))
clean_fields = { k:v.strip() for k, v in fields.iteritems()}
if old_fields != clean_fields:
    logger.info(json.dumps(clean_fields))
old_fields = clean_fields
print clean_fields
sleep(5)

As I'm putting it straight into a dict, I want to discard Mbps or dB when found, but obviously what I've got isn't going to work. It's tidier if I can simply remove the two strings from the 70-80 odd status lines that I've got when extracting the fields, but is it just not possible?

Cheers

Sample input from script tag:

var conn_down="    13.35 Mbps";
var conn_up="     0.82 Mbps";
var line_down="    34.60 dB";
var line_up="    19.70 dB";
var noise_down="     6.10 dB";
var noise_up="     6.50 dB";

var sys_uptime="74523";
var lan_status="Link up";
var lan_txpkts="1294024";
var lan_rxpkts="2256747";
var lan_collisions="0";
var lan_txbs="10004";
var lan_rxbs="35259";
var lan_systime="74523";

Then the processed data looks like this:

u'noise_up': u'6.50 dB', u'lan_rxbs': u'35259', u'an_rxpkts': u'2857867', u'bgn_status': u'600M', u'lan_status0': u'100M/Full', 
u'lan_status3': u'1000M/Full', u'lan_status2': u'100M/Full', u'conn_up': u'0.82 Mbps',
  • Could you provide example input and expected output? Commented Jan 30, 2017 at 2:38
  • Hi @niemmi - have just added that for clarity Commented Jan 30, 2017 at 2:49
  • And you'd like the output to look like u'noise_up': u'6.50', u'conn_up': u'0.82', ...? Commented Jan 30, 2017 at 2:51
  • "obviously what I've got isn't going to work." Not obvious. What is the actual problem here? Commented Jan 30, 2017 at 2:59

2 Answers

You could use an optional non-capturing group to match ' Mbps' or ' dB':

import re
import pprint

s = '''var conn_down="    13.35 Mbps";
var conn_up="     0.82 Mbps";
var line_down="    34.60 dB";
var line_up="    19.70 dB";
var noise_down="     6.10 dB";
var noise_up="     6.50 dB";

var sys_uptime="74523";
var lan_status="Link up";
var lan_txpkts="1294024";
var lan_rxpkts="2256747";
var lan_collisions="0";
var lan_txbs="10004";
var lan_rxbs="35259";
var lan_systime="74523";'''

pattern = re.compile(r'(\w+)=\"\s*(.*?)(?:\sMbps|\sdB)?\"')
fields = dict(re.findall(pattern, s))
pprint.pprint(fields)

Output:

{'conn_down': '13.35',
 'conn_up': '0.82',
 'lan_collisions': '0',
 'lan_rxbs': '35259',
 'lan_rxpkts': '2256747',
 'lan_status': 'Link up',
 'lan_systime': '74523',
 'lan_txbs': '10004',
 'lan_txpkts': '1294024',
 'line_down': '34.60',
 'line_up': '19.70',
 'noise_down': '6.10',
 'noise_up': '6.50',
 'sys_uptime': '74523'}

In the pattern above, (\w+)= captures one or more word characters (letters, digits, or underscores) and then matches =. \"\s* matches a quotation mark followed by zero or more whitespace characters. (.*?) non-greedily captures any text, and (?:\sMbps|\sdB)? is an optional non-capturing group that matches ' Mbps' or ' dB'. See regex101 demo.
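As a rough sketch, the same pattern can be written with re.VERBOSE so that each piece carries its own comment. The names FIELD_PATTERN and parse_fields are made up for this illustration and are not part of the original script:

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each piece is commented.
# FIELD_PATTERN and parse_fields are hypothetical names used only in this sketch.
FIELD_PATTERN = re.compile(r'''
    (\w+)=              # group 1: the variable name, then a literal =
    "\s*                # opening quote, then any leading whitespace
    (.*?)               # group 2: the value, matched non-greedily
    (?:\sMbps|\sdB)?    # optional unit suffix, matched but not captured
    "                   # closing quote
''', re.VERBOSE)


def parse_fields(text):
    """Return a dict of name -> value with any ' Mbps' or ' dB' suffix dropped."""
    return dict(FIELD_PATTERN.findall(text))


sample = 'var conn_down="    13.35 Mbps";\nvar lan_status="Link up";'
print(parse_fields(sample))
```

Because the unit group is optional, values without a suffix, such as "Link up", pass through unchanged.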

1 Comment

Yes! That's the syntax I was looking for! I couldn't work out how to get that working, as it was matching nothing every time I tried it, so I assumed it was simply never going to work. Many thanks!

Try changing your pattern to:

pattern = re.compile(r'(\w+)="\s*(.+?)\s*(?:Mbps|dB)?"')

I think that will work, if I'm correctly understanding what you want. It's basically what you have now, but with a non-capturing section for the "Mbps/dB" so the units won't be included in the match.
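A quick way to check the pattern from this answer is to run it against a few lines of the sample input from the question. This is only a sketch; the sample string below is a shortened copy of that input:

```python
import re

# The pattern from this answer, written as a raw string.
pattern = re.compile(r'(\w+)="\s*(.+?)\s*(?:Mbps|dB)?"')

# A few lines copied from the question's sample input.
sample = '''var conn_down="    13.35 Mbps";
var noise_up="     6.50 dB";
var lan_status="Link up";
var lan_collisions="0";'''

fields = dict(pattern.findall(sample))
print(fields)
```

Each value comes back without its unit, and "Link up" is kept as-is. One difference from the other answer is that (.+?) needs at least one character, so an empty value such as name="" would not be captured by this pattern.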

2 Comments

Yeah sort of - I wanted to capture the Link status too, as I'm going to be dumping a rolling log to dropbox as well as to loggly so I can have something to send to my telco if the dropouts continue to get worse
@Scott Oh, sorry! I totally missed the fact that you were trying to capture non-number data for the value. I updated my answer, but it seems like niemmi already gave you what you need. Good luck with your project.
