I've been sent data in a horrible format, consisting of a series of points, each followed by its attributes. Each point is bounded by square brackets [], but the whole thing is currently a string.

I've tried the standard list() function to convert the string; however, this puts the entire string, consisting of many points, into a list. I want the existing square brackets within the string to be recognised as lists, rather than creating an over-arching list containing one item.

The data in string form looks like the sample below. This is just one group of points, and I have hundreds to iterate through; the double opening and closing square brackets at the beginning and end mark a group.

[[451166.32,719761.36,20.37,0.06,],[451162.97,719765.06,20.41,0.048,1],[451161.63,719766.54,10.17,0.048,],[451158.26,719770.23,20.44,0.048,],[451156.19,719772.54,20.05,0.048,0],[451148.7,719780.68,-10.77,0.048,],[451138.57,719791.95,-10.2,0.048,],[451129.33,719802.15,-10.38,0.048,],[451118.07,719814.56,10.06,0.048,],[451105.98,719827.91,-10.64,0.048,],[451095.10,719839.91,-10.47,0.048,],[451087.17,719848.66,-10.72,0.048,],[451082.94,719853.31,10.92,0.048,0],[451078.,719858.77,2.75,10.048,],[451076.79,719860.10,5.2,10.06,1]]

I've tried list(xsData.split(",")), [i.strip("[],").split(",") for i in myList] and a couple of other methods, but all of them either put the string into an over-arching list or put each character into its own list.

The end goal is to be able to iterate through each item in every list, in order to write the data to a friendlier format, e.g. TXT/CSV.

Edit: ast.literal_eval() works with all groups of points except the group below, for which it throws an invalid syntax error. I cannot see the reason why.

[[455972.1700000000128057,786651.7399999999906868,44.4499999999999993,0.045,],[455976.5700000000069849,786652.7800000000279397,10.2899999999999991,4.04,1],[455977.7000000000116415,786653.0500000000465661,12.8300000000000001,1.04,],[455979.0499999999883585,786653.3699999999953434,2.8800000000000008,0.04,],[455979.6900000000023283,786653.5200000000186265,3.4199999999999999,5.04,],[455983.9299999999930151,786654.5200000000186265,9.75,0.04,],[455990.8900000000139698,786656.1700000000419095,0.8499999999999996,0.04,],[455993.5100000000093132,786656.7900000000372529,0.4100000000000001,0.04,],[455993.7900000000081491,786656.8499999999767169,0.3300000000000001,0.04,],[455994.8699999999953434,786657.1099999999860302,4.5199999999999996,0.04,],[455997.0499999999883585,786657.6300000000046566,4.6100000000000003,0.04,],[455997.5899999999965075,786657.75,4.8600000000000003,0.04,],[455998.7099999999918509,786658.0200000000186265,1.0099999999999998,0.045,1],[456000.3200000000069849,786658.4000000000232831,1.3699999999999992,0.045,],[456002.2799999999988358,786658.8599999999860302,17.6400000000000006,0.045,],[456006.2900000000081491,786659.8100000000558794,14.8100000000000005,0.045,],[456009.5899999999965075,786660.5899999999674037,10.4399999999999995,,],[456017.0,786662.3499999999767169,19.1099999999999994,,]]

  • Would you ever have a group of groups, e.g. [[[xxx,yyy],[...]][[,...][...]]]? Commented Feb 15, 2019 at 15:22

1 Answer

If the string looks like a syntactically valid Python list, then you can get that list data by calling ast.literal_eval:

>>> import ast
>>> s = "[[451166.32,719761.36,20.37,0.06,],[451162.97,719765.06,20.41,0.048,1],[451161.63,719766.54,10.17,0.048,],[451158.26,719770.23,20.44,0.048,],[451156.19,719772.54,20.05,0.048,0],[451148.7,719780.68,-10.77,0.048,],[451138.57,719791.95,-10.2,0.048,],[451129.33,719802.15,-10.38,0.048,],[451118.07,719814.56,10.06,0.048,],[451105.98,719827.91,-10.64,0.048,],[451095.10,719839.91,-10.47,0.048,],[451087.17,719848.66,-10.72,0.048,],[451082.94,719853.31,10.92,0.048,0],[451078.,719858.77,2.75,10.048,],[451076.79,719860.10,5.2,10.06,1]]"
>>> x = ast.literal_eval(s)
>>> type(x)
<class 'list'>
>>> x
[[451166.32, 719761.36, 20.37, 0.06], [451162.97, 719765.06, 20.41, 0.048, 1], [451161.63, 719766.54, 10.17, 0.048], [451158.26, 719770.23, 20.44, 0.048], [451156.19, 719772.54, 20.05, 0.048, 0], [451148.7, 719780.68, -10.77, 0.048], [451138.57, 719791.95, -10.2, 0.048], [451129.33, 719802.15, -10.38, 0.048], [451118.07, 719814.56, 10.06, 0.048], [451105.98, 719827.91, -10.64, 0.048], [451095.1, 719839.91, -10.47, 0.048], [451087.17, 719848.66, -10.72, 0.048], [451082.94, 719853.31, 10.92, 0.048, 0], [451078.0, 719858.77, 2.75, 10.048], [451076.79, 719860.1, 5.2, 10.06, 1]]

I'm not completely sure, but it sounds like your string might actually look like multiple lists concatenated together, in which case you can't just call literal_eval on it:

>>> import ast
>>> s = "[1,2][3,4,[[5,6],7]][8,9]"
>>> ast.literal_eval(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Programming\Python 3.6\lib\ast.py", line 85, in literal_eval
    return _convert(node_or_string)
  File "C:\Programming\Python 3.6\lib\ast.py", line 84, in _convert
    raise ValueError('malformed node or string: ' + repr(node))
ValueError: malformed node or string: <_ast.Subscript object at 0x02E25650>

If this is the case, you can separate your data into individual groups so you can eval them independently.

import ast

def separate_groups(s):
    """Find matching square brackets within `s` and yield successive portions that resemble valid list literals.

    Note: may not operate correctly on data that contains quoted brackets, for example `"[1, '[', 2][3,4]"`.
    """
    depth = 0  # current bracket nesting level
    last_seen_group_end = -1  # index of the "]" that closed the previous top-level group
    for i, c in enumerate(s):
        if c == "[":
            depth += 1
        elif c == "]":
            depth -= 1
            if depth == 0:
                # A top-level group just closed: yield everything since the previous group ended.
                yield s[last_seen_group_end + 1:i + 1]
                last_seen_group_end = i

s = "[1,2][3,4,[[5,6],7]][8,9]"
result = [ast.literal_eval(group) for group in separate_groups(s)]
print(result)

Result:

[[1, 2], [3, 4, [[5, 6], 7]], [8, 9]]
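
Once the groups are parsed, getting to the CSV end goal mentioned in the question could look something like the sketch below. The function name write_points_csv and the leading group-index column are my own assumptions, not anything from your data, and io.StringIO only stands in for a real file opened with open("points.csv", "w", newline="").

```python
import ast
import csv
import io

def write_points_csv(groups, out):
    """Write one CSV row per point, prefixed with the index of its group."""
    writer = csv.writer(out)
    for group_id, group in enumerate(groups):
        for point in group:
            # Points may have 4 or 5 values, so rows can differ in length.
            writer.writerow([group_id, *point])

# Two points taken from the sample group in the question.
s = "[[451166.32,719761.36,20.37,0.06,],[451162.97,719765.06,20.41,0.048,1]]"
groups = [ast.literal_eval(s)]

buffer = io.StringIO()  # stand-in for a real output file
write_points_csv(groups, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Note that the rows come out ragged, because literal_eval treats a trailing comma as nothing rather than as an empty field.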

2 Comments

I forgot to mention I had tried ast.literal_eval() (I'm picking this up again after a couple of weeks). With a bit more playing, this works for all bar one group of points; for this one group, it throws an invalid syntax error and I cannot figure out the reason why as it looks no different to the others. Too long to post in comment but I'll edit OP.
Hmm, could be because of the ,,] at the end of some of the lines. [1,2,] is legal list syntax, but [1,2,,] isn't. I guess the quick fix would be to run s = s.replace(",,", ",") before trying to parse it.
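
One caveat with the replace approach: it drops the empty field entirely, so you can no longer tell which attribute was missing. If that matters, a rough alternative is to skip literal_eval and split each point by hand, keeping empty fields as None. This is only a sketch, assuming every point is a flat list of plain numbers with no nested brackets; parse_points and POINT_RE are names I made up for illustration.

```python
import re

# Matches an innermost [...] pair, i.e. one with no brackets inside it.
POINT_RE = re.compile(r"\[([^\[\]]*)\]")

def parse_points(group_text):
    """Parse one group string into a list of points, using None for empty fields."""
    points = []
    for match in POINT_RE.finditer(group_text):
        fields = match.group(1).split(",")
        # An empty (or whitespace-only) field becomes None; everything else a float.
        points.append([float(f) if f.strip() else None for f in fields])
    return points

# Shortened points in the same shape as the failing group.
s = "[[456006.29,786659.81,14.81,0.045,],[456009.59,786660.59,10.44,,],[455998.71,786658.02,1.01,0.045,1]]"
points = parse_points(s)
print(points)
```

Every point then has the same number of columns, which is handy for CSV output, at the cost of the trailing 0/1 flag becoming a float (1.0) rather than an int.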
