0

I'm receiving the following error in my program:

Traceback (most recent call last):
  File "bookmarks.py", line 26, in <module>
    zipping = dict(zip(datelist, matchhref))
TypeError: unhashable type: 'list'

I want to make a dictionary from two lists (datelist and matchhref), but somehow when I use zip(), it returns a list instead of a tuple.

Here's my code:

import re

bm_raw = open('bookmarks.txt', 'r')

bm_line = bm_raw.read()

matchhref = re.findall('(<DT><A HREF=".*?</A>)', bm_line)
massive = list()
datelist = list()
a = 0

for i in matchhref:

    temp = matchhref[a]
    found = re.findall('(\d\d\d\d\d\d\d\d\d\d)', temp)
    datelist.append(found)
    a=a+1

print datelist
print matchhref
zipping = dict(zip(datelist, matchhref))

And here are the contents of bookmarks.txt:

 <DT><A HREF="some random data" ADD_DATE="1460617925" ICON="some random data">priomap</A>
 <DT><A HREF="some random data" ADD_DATE="1455024833" ICON="some random data">V.34</A>
3
  • 6
    First off, don't use regex to parse HTML. Your code fails because findall returns a list, which you append to a list and then try to use as a dict key. If you want a single element, use re.search, call .group() on the match, and append that. Commented Jun 3, 2016 at 23:12
  • 2
    Why not just findall('\d{10}') on each line of the file? Commented Jun 3, 2016 at 23:16
  • 2
    Obligatory - Don't parse html/xml with regex Commented Jun 3, 2016 at 23:18

2 Answers

1

As I commented, you can call re.search and then .group() to append the matched string rather than the list that findall returns, so the string can be used as the key. But BeautifulSoup will make your life a lot easier:

In [50]: from bs4 import BeautifulSoup

In [51]: soup = BeautifulSoup(h, "xml")  # h is assumed to hold the text of bookmarks.txt

In [52]: print(dict((dt["ADD_DATE"], dt["HREF"],) for dt in soup.select("DT A[HREF]")))
{u'1455024833': u'some random data', u'1460617925': u'some random data'}

select("DT A[HREF]") finds all the anchor tags (A) inside a DT tag that have an HREF attribute.

The regex solution would be:

    found = re.search(r'(\d{10})', temp)
    datelist.append(found.group())
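
Dropped into the question's loop, that change might look like the sketch below. The string bm_line stands in for the contents of bookmarks.txt, and the None check plus the parallel entries list are additions that are not in the original code; they keep keys and values aligned if some entry has no 10-digit run.

```python
import re

# Stand-in for the contents of bookmarks.txt (sample lines from the question).
bm_line = '''
 <DT><A HREF="some random data" ADD_DATE="1460617925" ICON="some random data">priomap</A>
 <DT><A HREF="some random data" ADD_DATE="1455024833" ICON="some random data">V.34</A>
'''

matchhref = re.findall(r'(<DT><A HREF=".*?</A>)', bm_line)

datelist = []
entries = []
for entry in matchhref:
    # re.search returns one match object (or None), not a list.
    found = re.search(r'\d{10}', entry)
    if found is None:
        continue  # skip entries without a 10-digit run (added safeguard)
    datelist.append(found.group())  # a plain string, which is hashable
    entries.append(entry)

zipping = dict(zip(datelist, entries))
print(zipping)
```

Because every key is now a string, dict(zip(...)) no longer raises TypeError.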

But use an HTML parser like bs4 or something similar.
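
If installing bs4 is not an option, "something similar" could be the standard library's parser. The following is only a sketch in Python 3 (in Python 2 the module is named HTMLParser); the class name BookmarkParser and the sample string are made up for illustration. Note that this parser lowercases tag and attribute names, so the lookups use "a", "href" and "add_date".

```python
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Hypothetical helper: collects ADD_DATE -> HREF for every anchor tag."""

    def __init__(self):
        super().__init__()
        self.bookmarks = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "a":
            return
        attributes = dict(attrs)
        if "add_date" in attributes and "href" in attributes:
            self.bookmarks[attributes["add_date"]] = attributes["href"]


# Stand-in for the contents of bookmarks.txt (sample lines from the question).
sample = '''
 <DT><A HREF="some random data" ADD_DATE="1460617925" ICON="some random data">priomap</A>
 <DT><A HREF="some random data" ADD_DATE="1455024833" ICON="some random data">V.34</A>
'''

parser = BookmarkParser()
parser.feed(sample)
print(parser.bookmarks)
```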


1

zip returns a list of tuples, not a tuple.

Besides, a tuple is only hashable if each of its elements is hashable. So a tuple of lists will not be hashable either.

That said, there's nothing wrong with dict(zip(keys, values)) if keys is a list of hashable elements. Your problem is that datelist contains lists (results of re.findall) which are not hashable and cannot be used as dict keys.
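
A small sketch of the difference, using made-up sample keys and a hypothetical is_hashable helper:

```python
def is_hashable(obj):
    """Return True if obj can be used as a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


values = ['priomap', 'V.34']
string_keys = ['1460617925', '1455024833']    # what re.search(...).group() yields
list_keys = [['1460617925'], ['1455024833']]  # what re.findall(...) yields

# Strings are hashable, so this works.
ok = dict(zip(string_keys, values))

# Lists are not hashable, so the same call raises TypeError.
try:
    dict(zip(list_keys, values))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(ok)
print(error_message)
print(is_hashable(('a', 'b')))    # tuple of strings
print(is_hashable((['a'], 'b')))  # tuple containing a list
```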

But really, read the advice given by others and don't use re to parse HTML. BeautifulSoup is my preferred tool.
