Global Constants
block_names = ("object", "region", "info", "table ")
keywords = ("define", "select", "reject", "take", "using", "sort", "weight", "histo")
table_k_v = "# val err- err+ xmin xmax"
These should be defined in UPPER_CASE, to indicate they are constants.
Suspicious value
In block_names = ("object", "region", "info", "table "), the final value has a trailing space, none of the other values do. This is suspicious.
Reading the code, I see the test line.startswith(block_names), and reviewing the sample file, I see the word tabletype, so I'm guessing you've used "table " with the trailing space to avoid accidentally matching tabletype. However, this is confusing and fragile. A new term like "regional" could accidentally match, and a change in white space, such as using a tab character (\t) instead of a space would prevent the correct match in "table\tbtagdeepCSVmedium.
Close resources
When opening resources, use a with statement, to ensure the resource is closed. You never explicitly call f.close() in your code, so the resource may be held open for much longer than you intend. However, you should prefer using the with-syntax to avoid the need to call f.close() explicitly.
# Read the ADL file
with open(adl_file, 'r') as f:
...
Multiple ways to parse blank lines
if line in ['\n', '\r\n']:
if parsing_block:
json_parsed[block_type].append(block_data)
block_data = {}
parsing_block = False
parsing_table = False
else:
continue
line = line.strip()
if not line:
continue
If a blank line, then if in a parsing block you do some work and then proceed to the line = line.strip(). If not in a parsing block, you continue the loop. After the blank line check, you strip white space from the start/end, and if that results in a blank line, you continue without checking if you are in a parsing block.
Uh ... so two line which are visually the same are handled differently if they contain spaces or tabs?
Plus you later have to handle "when new block starts without an empty line" anyway.
It would be simpler to strip whitespace, and then check for empty/blank lines, and rely on the existing "when new block starts without an empty line" code:
line = line.strip()
if not line:
continue
Comments
if line.startswith("#"):
continue
This happens before stripping off leading white-space, so indented comments would still be processed.
From the ADL definition you linked to, comments can also appear at end of lines:
keyword3 value3 # comment about value3
As long as you can't have any # character embedded inside quoted string, it would simplify later parsing if these were removed. Since removing the comment can turn a non-blank line into a blank line, let's strip off comments first, then check for blank lines:
if '#' in line:
line = line[:line.index('#')]
line = line.strip()
if not line:
continue
Splitting on Spaces
values = list(filter(None, line.split(" ")))
It took me a while to figure out what you were trying to do here, but I finally got it. You are splitting a line on the space character, which is giving you a bunch of empty strings if terms are separated by multiple spaces. The filter(None, ...) produces a generator that removes falsy empty strings, which you then turn into a list.
This does practically the same thing:
values = line.split()
It is better, actually, since it will treat any combination of one or more white-space characters as one separator, allowing the tables to use tabs to line things up nicely.
Last element of list
This bugged me ... keyword_value[-1] ... until I realized what you were doing with .split(" "). Using .split() fixes that issue, and you can safely use keyword_value[1] to reference the second item in the line, even if multiple spaces exist between the first and second items.
Exception Handling
for line in f:
...
try:
...
else:
raise ValueError('Parsing of Keyword {} is not supported.'.format(keyword_value[0]))
except Exception:
block_data = {}
parsing_block = False
parsing_table = False
Catching any Exception is bad. If you make an error, like using keyword_val instead of keyword_value, or using a bad list index, the exception (NameError, IndexError) is caught, and your code will blank out the current parsing and muddle-on without ever telling you what happened. Yikes!
You are explicitly raising a ValueError. You should explicitly catch that, and only that.
However, you've only got one raise statement, which is followed immediately by the catch clause. You could eliminate the entire try...except, and move the cleanup code into the else: statement.
You might want to print out the unrecognized keywords too. You might think you've coded for them all, but you might not have.
(To be continued)