I have a list (row_list) like the one below:

[
 [{'bottom': Decimal('58.650'),
   'text': 'Hi there!',
   'top': Decimal('40.359'),
   'x0': Decimal('21.600'),
   'x1': Decimal('65.644')}
 ],
 [{'bottom': Decimal('74.101'),
   'text': 'Your email',
   'top': Decimal('37.519'),
   'x0': Decimal('223.560'),
   'x1': Decimal('300')},
  {'bottom': Decimal('77.280'),
   'text': '[email protected]',
   'top': Decimal('62.506'),
   'x0': Decimal('21.600'),
   'x1': Decimal('140.775')}]
]

As you can see, it is a list of lists: each inner list is a row holding one or more word dicts. The text in each row can be represented as:

[0] = 'Hi there!'
[1] = 'Your email'
[1] = '[email protected]'

This is the code that generates row_list:

word_list = sorted(first_page.extract_words(),
                    key=lambda x: x['bottom'])
threshold = float('10')
current_row = [word_list[0], ]
row_list = [current_row, ]

for word in word_list[1:]:
    if abs(current_row[-1]['bottom'] - word['bottom']) <= threshold:
        # distance is small, use same row
        current_row.append(word)
    else:
        # distance is big, create new row
        current_row = [word, ]
        row_list.append(current_row)
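
To reproduce the grouping without the PDF, the same logic can be run on hard-coded data. This is only a sketch: group_rows and sample_words are hypothetical names, the sample dicts carry just the keys needed here, and the threshold is a Decimal rather than a float. Unlike the loop above, it also tolerates an empty word list.

```python
from decimal import Decimal


def group_rows(words, threshold=Decimal('10')):
    """Group word dicts into rows by how close their 'bottom' values are."""
    rows = []
    for word in sorted(words, key=lambda w: w['bottom']):
        # Compare against the last word of the current row, as in the loop above.
        if rows and abs(rows[-1][-1]['bottom'] - word['bottom']) <= threshold:
            rows[-1].append(word)   # distance is small, use same row
        else:
            rows.append([word])     # distance is big, create new row
    return rows


# Hypothetical stand-in for first_page.extract_words()
sample_words = [
    {'text': 'Your email', 'bottom': Decimal('74.101'), 'x0': Decimal('223.560')},
    {'text': 'Hi there!', 'bottom': Decimal('58.650'), 'x0': Decimal('21.600')},
    {'text': '[email protected]', 'bottom': Decimal('77.280'), 'x0': Decimal('21.600')},
]

row_list = group_rows(sample_words)
```

With these three words, the first one ends up alone in row 0 and the other two share row 1, which matches the list shown at the top.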

What I am trying to do is map the output of the code above to something like:

new = {
       1: {
          1: {'text': 'Hi there!', 'x0': Decimal('21.600')}
       },
       2: {
          1: {'text':'Your email', 'x0': Decimal('223.560')},
          2: {'text': '[email protected]', 'x0': Decimal('21.600')}
       }
      }

I have tried all sorts of things and just can't figure it out: my original data is a list, and I am trying to represent it as a dict.

  • The posted expected result does not correlate much with any "threshold". It's a simple mapping. Commented Jun 28, 2019 at 18:17
  • But the sorting and threshold have to be applied to the list first, and only after that do I need to map the values. Commented Jun 28, 2019 at 18:18
  • Can you clarify the sorting you want? You don't mention it in the text, but it's in your code, and since the desired output seems to be sorted the same as the input, it's not clear if it matters. Commented Jun 28, 2019 at 18:34
  • The sorting only matters in the first loop (the code that I posted), since I use it to check against a threshold. The row_list is sorted, but it doesn't need to be in the final format. Commented Jun 28, 2019 at 18:36

2 Answers

For succinct code with reliable input, you can use a short recursive function. This will work with multiple levels of nesting (if that's needed):

def nest(l):
    if not isinstance(l, list):
        return {'text': l['text'], 'x0': l['x0']}
    return {i+1:nest(v) for i,v in enumerate(l)}

With your input (assuming l is the nested list and pp is a pprint.PrettyPrinter instance), it returns:

> pp.pprint(nest(l))

{ 1: {1: {'text': 'Hi there!', 'x0': Decimal('21.600')}},
  2: {1: {'text': 'Your email', 'x0': Decimal('223.560')},
      2: {'text': '[email protected]', 'x0': Decimal('21.600')}
  }
}
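
For anyone who wants to run this end to end, here is a self-contained sketch of the same recursive idea. The rows list is a trimmed copy of the question's data (only some keys kept), and enumerate(..., start=1) replaces the i+1 arithmetic; the names rows and nested are just for illustration.

```python
from decimal import Decimal


def nest(item):
    # Leaf: a word dict, so keep only the two fields of interest.
    if not isinstance(item, list):
        return {'text': item['text'], 'x0': item['x0']}
    # List: number the children from 1 and recurse into each one.
    return {i: nest(v) for i, v in enumerate(item, start=1)}


rows = [
    [{'text': 'Hi there!', 'x0': Decimal('21.600'), 'x1': Decimal('65.644')}],
    [{'text': 'Your email', 'x0': Decimal('223.560'), 'x1': Decimal('300')},
     {'text': '[email protected]', 'x0': Decimal('21.600'), 'x1': Decimal('140.775')}],
]

nested = nest(rows)
print(nested[2][1]['text'])  # Your email
```

Because the function recurses on any list, wrapping rows in another list would simply add one more numbered level on the outside.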

This could be written as a one-liner, but it would be nasty:

result = {}
for index in range(len(l)):
    append = {}
    for index2 in range(len(l[index])):
        append[index2 + 1] = {key: val for key, val in l[index][index2].items() if key in ('x0', 'text')}
    result[index + 1] = append

# (an equivalent one-liner is given at the end of this answer)

import json
# default=str is needed because json cannot serialize Decimal values natively
print(json.dumps(result, indent=2, default=str))

Output:

{
  "1": {
    "1": {
      "text": "Hi there!",
      "x0": "21.600"
    }
  },
  "2": {
    "1": {
      "text": "Your email",
      "x0": "223.560"
    },
    "2": {
      "text": "[email protected]",
      "x0": "21.600"
    }
  }
}

Note that the keys are printed as strings, but in result they are actually ints. JSON object keys must be strings, so json.dumps(...), which I use only to print the dict nicely, converts them.
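
The key conversion is easy to check with a round trip through JSON. This is a small sketch on a one-entry dict; default=str is passed so that the Decimal value can be serialized, and the names encoded and decoded are just for illustration.

```python
import json
from decimal import Decimal

result = {1: {1: {'text': 'Hi there!', 'x0': Decimal('21.600')}}}

# Serialize, then parse the JSON text back into Python objects.
encoded = json.dumps(result, default=str)
decoded = json.loads(encoded)

print(list(result))   # [1]   -> keys are still ints in the original dict
print(list(decoded))  # ['1'] -> keys became strings in the JSON text
```

So the strings only exist in the printed JSON; anything that indexes result directly should keep using ints.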

The one-liner:

result = {index + 1: {index2 + 1: {key: val for key, val in l[index][index2].items() if key in ('x0', 'text')} for index2 in range(len(l[index]))} for index in range(len(l))}
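
If the one-liner is too dense, the same comprehension can be spread over several lines and driven by enumerate instead of range(len(...)). A sketch, with to_numbered_dict and KEEP as hypothetical names and a trimmed copy of the question's data:

```python
from decimal import Decimal

KEEP = ('text', 'x0')  # keys to carry over into the result


def to_numbered_dict(rows):
    """Turn a list of rows (lists of word dicts) into a 1-based nested dict."""
    return {
        row_no: {
            word_no: {key: val for key, val in word.items() if key in KEEP}
            for word_no, word in enumerate(row, start=1)
        }
        for row_no, row in enumerate(rows, start=1)
    }


rows = [
    [{'text': 'Hi there!', 'x0': Decimal('21.600'), 'top': Decimal('40.359')}],
    [{'text': 'Your email', 'x0': Decimal('223.560'), 'top': Decimal('37.519')},
     {'text': '[email protected]', 'x0': Decimal('21.600'), 'top': Decimal('62.506')}],
]

result = to_numbered_dict(rows)
```

It builds the same structure as the loop version, just with the nesting visible in the indentation.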

3 Comments

Just curious - why is it nasty?
Extremely unreadable. I will add it to the end of the answer.
@oliverbj Added
