How to concatenate a list with a nested list?

Question

I have two lists:

The first one is a regular list which contains links of Sitemaps:

ur = ['https://www.hi.de/hu/sitemap.xml', 
      'https://www.hi.de/ma/sitemap.xml', 
      'https://www.hi.de/au/sitemap.xml', 
      ]

The second list is nested and contains links which were indexed on the sitemaps and a date for every link:

wh = [['No-Date', 'https://www.hi.de/hu/artikel/xxx', ''],        
      ['2019-11-13', 'https://www.hi.de/ma/artikel/xxx'], 
      ['2019-11-12', 'https://www.hi.de/ma/artikel/xxx'], 
      ['2019-11-11', 'https://www.hi.de/au/artikel/xxx']]

Now I want to merge the list with the nested list, based on the sitemap each link came from, like this:

ui = [['https://www.hi.de/hu/sitemap.xml', 'No-Date', 'https://www.hi.de/hu/artikel/xxx', ''],        
      ['https://www.hi.de/ma/sitemap.xml', '2019-11-13', 'https://www.hi.de/ma/artikel/xxx'], 
      ['https://www.hi.de/ma/sitemap.xml', '2019-11-12', 'https://www.hi.de/ma/artikel/xxx'], 
      ['https://www.hi.de/au/sitemap.xml', '2019-11-11', 'https://www.hi.de/au/artikel/xxx']]

But with my code:

ui = [[(url2, x) for url2 in ur for x in y if url2.rsplit('/', 1)[0] in x] for y in wh]

The date in every sublist gets deleted and additionally the entries are stored in a tuple like this:

...
[[('https://www.hi.de/hu/sitemap.xml', 'https://www.hi.de/hu/artikel/xxx', '')],
...

How can I change the code to get the desired result in the variable ui?
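For comparison, here is a minimal sketch of the comprehension from the question with two changes, assuming every entry of wh keeps its link at index 1: build a new list `[url2] + y` instead of the tuple `(url2, x)`, and test only the link `y[1]` instead of iterating over every field of `y` (which is what filters out the date). The `startswith(... + '/')` check is an added assumption so that a folder such as `ma` cannot accidentally match a longer one such as `mars`.

```python
ur = ['https://www.hi.de/hu/sitemap.xml',
      'https://www.hi.de/ma/sitemap.xml',
      'https://www.hi.de/au/sitemap.xml']

wh = [['No-Date', 'https://www.hi.de/hu/artikel/xxx', ''],
      ['2019-11-13', 'https://www.hi.de/ma/artikel/xxx'],
      ['2019-11-12', 'https://www.hi.de/ma/artikel/xxx'],
      ['2019-11-11', 'https://www.hi.de/au/artikel/xxx']]

# For each entry y, prepend the sitemap whose folder prefix
# (e.g. 'https://www.hi.de/hu/') starts the link y[1].
# [url2] + y builds a new list, so the date and any extra fields survive.
ui = [[url2] + y
      for y in wh
      for url2 in ur
      if y[1].startswith(url2.rsplit('/', 1)[0] + '/')]

print(ui)
```

Looping over wh in the outer clause keeps the rows in the same order as wh.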

Austin · Accepted Answer · 2019-11-18 13:33:25Z

5

You can use a list comprehension that checks for the matching sitemap between the two lists to get your desired result:

ur = ['https://www.hi.de/hu/sitemap.xml', 
      'https://www.hi.de/ma/sitemap.xml', 
      'https://www.hi.de/au/sitemap.xml', 
      ]

wh = [['No-Date', 'https://www.hi.de/hu/artikel/xxx', ''],        
      ['2019-11-13', 'https://www.hi.de/ma/artikel/xxx'], 
      ['2019-11-12', 'https://www.hi.de/ma/artikel/xxx'], 
      ['2019-11-11', 'https://www.hi.de/au/artikel/xxx']]

print([[u] + x for x in wh for u in ur if x[1].split('/')[3] == u.split('/')[3]])

which outputs:

[['https://www.hi.de/hu/sitemap.xml', 'No-Date', 'https://www.hi.de/hu/artikel/xxx', ''],
 ['https://www.hi.de/ma/sitemap.xml', '2019-11-13', 'https://www.hi.de/ma/artikel/xxx'],
 ['https://www.hi.de/ma/sitemap.xml', '2019-11-12', 'https://www.hi.de/ma/artikel/xxx'],
 ['https://www.hi.de/au/sitemap.xml', '2019-11-11', 'https://www.hi.de/au/artikel/xxx']]

answered Nov 18, 2019 at 13:33



12 Comments

gython Over a year ago

Hi @Austin, great solution! However, it does not work if my lists of sitemaps and URLs are longer.

Austin Over a year ago

@gython, I assume the sitemap section is always at the 3rd split of / (index 3). Please give an example of a case where it fails.

gython Over a year ago

Hi @Austin, for example, I am using a list with 11 sitemaps and a nested list with several thousand links. When I use your code in this case, I get the error: IndexError: list index out of range. You are right, the sitemap is always at the 3rd split of /, but my problem is that my lists have more entries. Can you help me out?

Austin Over a year ago

Can you make sure that every list in wh has 2 elements and that the second element of each list has a 3rd split, and that the same holds for the ur list (each URL has a 3rd split)? It seems one of these conditions is not met.

Austin Over a year ago

For easier debugging, convert this list comprehension into normal for loops and print the elements from wh and ur inside them. At some point you will get the IndexError; the element printed just before it is the culprit.
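A sketch of that loop version, written so that it collects the offending inputs instead of stopping at the first IndexError. The helper names `section` and `merge_with_report` are hypothetical, the "index 3 after splitting on /" rule is the assumption stated in the comments above, and the malformed sample entries are made up for illustration.

```python
def section(url):
    # Hypothetical helper: the path segment at index 3 of url.split('/'),
    # e.g. 'hu' for 'https://www.hi.de/hu/sitemap.xml'.
    # Raises IndexError when the URL has fewer than four '/'-separated parts.
    return url.split('/')[3]


def merge_with_report(ur, wh):
    # Hypothetical loop version of the comprehension that records
    # offending inputs in `bad` instead of crashing.
    ui, bad = [], []

    # Check every sitemap URL once and remember its section.
    sitemaps = []
    for u in ur:
        try:
            sitemaps.append((section(u), u))
        except IndexError:
            bad.append(('ur', u))

    for entry in wh:
        try:
            # IndexError here means the entry has no second element
            # or its link has no segment at index 3.
            link_section = section(entry[1])
        except IndexError:
            bad.append(('wh', entry))
            continue
        for sec, u in sitemaps:
            if sec == link_section:
                ui.append([u] + entry)
    return ui, bad


# Made-up sample data: the last two wh entries are deliberately malformed.
ur = ['https://www.hi.de/hu/sitemap.xml', 'https://www.hi.de/ma/sitemap.xml']
wh = [['2019-11-13', 'https://www.hi.de/ma/artikel/xxx'],
      ['No-Date'],
      ['2019-11-11', 'https://www.hi.de']]

ui, bad = merge_with_report(ur, wh)
print(ui)
print(bad)
```

Printing `bad` after a run over the full data set lists every entry that would have raised the IndexError, not just the first one.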


Ajax1234 · Accepted Answer · 2019-11-18 13:33:48Z

You can transform ur to a dictionary for easier lookup:

import re
ur = ['https://www.hi.de/hu/sitemap.xml', 'https://www.hi.de/ma/sitemap.xml', 'https://www.hi.de/au/sitemap.xml']
data = [['No-Date', 'https://www.hi.de/hu/artikel/xxx'], ['2019-11-13', 'https://www.hi.de/ma/artikel/xxx'], ['2019-11-12', 'https://www.hi.de/ma/artikel/xxx'], ['2019-11-11', 'https://www.hi.de/au/artikel/xxx']]
# Key: the sitemap URL without its trailing '/sitemap.xml', e.g. 'https://www.hi.de/hu'
d = {re.split(r'/(?=sitemap\.)', i)[0]: i for i in ur}
# Cut each link before the first segment of 3+ word characters followed by '/' (here 'artikel/')
result = [[d[re.split(r'/(?=\w{3,}/)', b)[0]], a, b] for a, b in data]
print(result)

Output:

[['https://www.hi.de/hu/sitemap.xml', 'No-Date', 'https://www.hi.de/hu/artikel/xxx'], 
['https://www.hi.de/ma/sitemap.xml', '2019-11-13', 'https://www.hi.de/ma/artikel/xxx'], 
['https://www.hi.de/ma/sitemap.xml', '2019-11-12', 'https://www.hi.de/ma/artikel/xxx'], 
['https://www.hi.de/au/sitemap.xml', '2019-11-11', 'https://www.hi.de/au/artikel/xxx']]
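A variant of the same dictionary idea, swapping the regular expressions for the `split('/')[3]` rule used in the other answers. It assumes each sitemap URL has a distinct section at index 3 (a later duplicate would overwrite an earlier one) and keeps every field of the original wh entries, including the trailing `''`.

```python
ur = ['https://www.hi.de/hu/sitemap.xml',
      'https://www.hi.de/ma/sitemap.xml',
      'https://www.hi.de/au/sitemap.xml']

wh = [['No-Date', 'https://www.hi.de/hu/artikel/xxx', ''],
      ['2019-11-13', 'https://www.hi.de/ma/artikel/xxx'],
      ['2019-11-12', 'https://www.hi.de/ma/artikel/xxx'],
      ['2019-11-11', 'https://www.hi.de/au/artikel/xxx']]

# Map each sitemap's section (e.g. 'hu') to its full URL.
by_section = {u.split('/')[3]: u for u in ur}

# One dictionary lookup per link instead of a scan over ur.
# .get returns None when a link's section has no matching sitemap.
ui = [[by_section.get(entry[1].split('/')[3])] + entry for entry in wh]
print(ui)
```

Because each link costs a single lookup, this should scale to a few thousand links without the nested loop over every sitemap.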

Filip Młynarski · Accepted Answer · 2019-11-18 13:43:09Z

2

You could combine the elements of both lists with a double for loop, unpack the values of each sublist of the second list with the * operator, and collect everything in a list comprehension.

ui = [
    [i, *j] 
    for i in ur for j in wh 
    if i.split('/')[3] == j[1].split('/')[3]
]

print(ui)

Output:

[
    ['https://www.hi.de/hu/sitemap.xml', 'No-Date', 'https://www.hi.de/hu/artikel/xxx', ''],
    ['https://www.hi.de/ma/sitemap.xml', '2019-11-13', 'https://www.hi.de/ma/artikel/xxx'],
    ['https://www.hi.de/ma/sitemap.xml', '2019-11-12', 'https://www.hi.de/ma/artikel/xxx'],
    ['https://www.hi.de/au/sitemap.xml', '2019-11-11', 'https://www.hi.de/au/artikel/xxx']
]

edited Nov 18, 2019 at 13:43

answered Nov 18, 2019 at 13:30


1 Comment

Karl Anka Over a year ago

What happened to this row ['2019-11-11', 'https://www.hi.de/au/artikel/xxx']? I think OP wants a lookup to sitemap, not just merge the lists.

han solo · Accepted Answer · 2019-11-18 13:32:10Z

1

You could do a simple list comprehension like:

>>> ur
['https://www.hi.de/hu/sitemap.xml', 'https://www.hi.de/ma/sitemap.xml', 'https://www.hi.de/au/sitemap.xml']
>>> wh
[['No-Date', 'https://www.hi.de/hu/artikel/xxx', ''], ['2019-11-13', 'https://www.hi.de/ma/artikel/xxx'], ['2019-11-12', 'https://www.hi.de/ma/artikel/xxx'], ['2019-11-11', 'https://www.hi.de/au/artikel/xxx']]
>>> [[u] + w for u,w in zip(ur, wh)]
[['https://www.hi.de/hu/sitemap.xml', 'No-Date', 'https://www.hi.de/hu/artikel/xxx', ''], ['https://www.hi.de/ma/sitemap.xml', '2019-11-13', 'https://www.hi.de/ma/artikel/xxx'], ['https://www.hi.de/au/sitemap.xml', '2019-11-12', 'https://www.hi.de/ma/artikel/xxx']]
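zip pairs items strictly by position and stops at the end of the shorter input. With three sitemaps and four links, that is why the result above has only three rows and why its third row pairs the au sitemap with an ma link. A small check of the sections being paired, using the same ur and wh:

```python
ur = ['https://www.hi.de/hu/sitemap.xml',
      'https://www.hi.de/ma/sitemap.xml',
      'https://www.hi.de/au/sitemap.xml']

wh = [['No-Date', 'https://www.hi.de/hu/artikel/xxx', ''],
      ['2019-11-13', 'https://www.hi.de/ma/artikel/xxx'],
      ['2019-11-12', 'https://www.hi.de/ma/artikel/xxx'],
      ['2019-11-11', 'https://www.hi.de/au/artikel/xxx']]

# zip pairs ur[i] with wh[i] and stops after the shorter list (3 items here),
# so the fourth wh entry is never visited.
pairs = [(u.split('/')[3], w[1].split('/')[3]) for u, w in zip(ur, wh)]
print(pairs)
```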

answered Nov 18, 2019 at 13:32



brandonbanks · Accepted Answer · 2019-11-18 15:27:18Z

1

You can also use enumerate.

ui = [[x] + wh[i] for i,x in enumerate(ur)]
print(ui)

Output:

[
    ['https://www.hi.de/hu/sitemap.xml','No-Date','https://www.hi.de/hu/artikel/xxx',''],
    ['https://www.hi.de/ma/sitemap.xml', '2019-11-13','https://www.hi.de/ma/artikel/xxx'],
    ['https://www.hi.de/au/sitemap.xml','2019-11-12','https://www.hi.de/ma/artikel/xxx']
]

edited Nov 18, 2019 at 15:27

answered Nov 18, 2019 at 13:36


1 Comment

Filip Młynarski Over a year ago

enumerate is useless there: you're not using x. You should use range if you just want the index, but simply iterating over your list is the best option in this example.

Daniel Marchand · Accepted Answer · 2019-11-18 13:30:15Z

0

Try using a zip:

[(x[0],x[1][0],x[1][1]) for x in zip(ur, wh)]
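Indexing with `x[1][0]` and `x[1][1]` copies exactly two fields from each sublist, so the trailing `''` of the first wh entry is dropped. A sketch of the same zip with star-unpacking, which keeps however many fields a sublist has (the pairing is still positional, as in the other zip answers):

```python
ur = ['https://www.hi.de/hu/sitemap.xml',
      'https://www.hi.de/ma/sitemap.xml',
      'https://www.hi.de/au/sitemap.xml']

wh = [['No-Date', 'https://www.hi.de/hu/artikel/xxx', ''],
      ['2019-11-13', 'https://www.hi.de/ma/artikel/xxx'],
      ['2019-11-12', 'https://www.hi.de/ma/artikel/xxx'],
      ['2019-11-11', 'https://www.hi.de/au/artikel/xxx']]

# Indexing picks exactly two fields per sublist.
indexed = [(x[0], x[1][0], x[1][1]) for x in zip(ur, wh)]
# Star-unpacking copies every field of the sublist into the tuple.
unpacked = [(u, *w) for u, w in zip(ur, wh)]

print(indexed[0])
print(unpacked[0])
```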

answered Nov 18, 2019 at 13:30

