I am trying to read a large text file containing variable names and their corresponding values (see below for a small example). The names are all upper case, and each value is usually separated from its name by periods and whitespace; if the variable name is too long, it is separated by whitespace only.
WATER DEPTH .......... 20.00 M TENSION AT TOUCHDOWN . 382.47 KN
TOUCHDOWN X-COORD. ... -206.75 M BOTTOM SLOPE ANGLE ... 0.000 DEG
PROJECTED SPAN LENGTH 166.74 M PIPE LENGTH GAIN ..... 1.72 M
I am able to find the values using the following expression:
line = ' PROJECTED SPAN LENGTH 166.74 M PIPE LENGTH GAIN ..... 1.72 M \n'
re.findall(r"[-+]?\d*\.\d+|\d+", line)
['166.74', '1.72']
But when I try to extract the variable names using the expression below, I get leading and trailing whitespace, which I would like to leave out.
re.findall(r'(?<=\s.)[A-Z\s]+', line)
[' PROJECTED SPAN LENGTH ', ' PIPE LENGTH GAIN ', ' ', ' \n']
I believe it should involve something like ^\s, but I can't get it to work. Once that works, I'd like to store the data in a dataframe, with the variable names as the index and the values as a column.
A pattern that matches runs of upper-case words without the surrounding whitespace:
r'[A-Z]+(?:\s+[A-Z]+)*'
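One caveat with matching upper-case runs on their own: on the sample line the unit M is itself upper case, so it gets folded into the following name ('M PIPE LENGTH GAIN'), and a hyphenated name such as X-COORD would be split. A possible alternative, sketched below, is to capture name, value and unit together in one pattern. The names PAIR, parse_line and records are made up for this illustration, and the sketch assumes every value is followed by an upper-case unit and that names contain only upper-case letters, spaces and hyphens (the trailing period of an abbreviation like X-COORD. is treated as separator and dropped).

```python
import re

# Hypothetical pattern: lazy name, then a separator of periods/whitespace,
# then a signed number, then an upper-case unit (assumed always present).
PAIR = re.compile(
    r"([A-Z][A-Z\- ]*?)"          # name: starts with a letter, matched lazily
    r"[\s.]+"                     # separator: whitespace and/or periods
    r"([-+]?\d+(?:\.\d+)?)"       # value: optional sign, digits, optional fraction
    r"\s+([A-Z]+)"                # unit: e.g. M, KN, DEG
)

def parse_line(line):
    """Return a list of (name, value, unit) tuples found in one line."""
    return [(name, float(value), unit) for name, value, unit in PAIR.findall(line)]

sample = [
    "WATER DEPTH .......... 20.00 M TENSION AT TOUCHDOWN . 382.47 KN",
    "TOUCHDOWN X-COORD. ... -206.75 M BOTTOM SLOPE ANGLE ... 0.000 DEG",
    " PROJECTED SPAN LENGTH 166.74 M PIPE LENGTH GAIN ..... 1.72 M \n",
]

# Flatten all lines into one list of records.
records = [rec for line in sample for rec in parse_line(line)]

for name, value, unit in records:
    print(f"{name!r}: {value} {unit}")
```

Because the name is matched lazily and must be followed by the separator and a number, it comes out without leading or trailing whitespace. If pandas is available, pd.DataFrame(records, columns=["name", "value", "unit"]).set_index("name") should then give a frame indexed by variable name.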