Vcard parser with Python

Question

I am parsing my vCard data (copied to a txt file) to extract name:number pairs and put them into a dictionary.

Data sample:

BEGIN:VCARD  
VERSION:2.1  
N:MEO;Apoio;;;  
FN:Apoio MEO  
TEL;CELL;PREF:1696  
TEL;CELL:162 00  
END:VCARD  
BEGIN:VCARD  
VERSION:2.1  
N:estrangeiro;Apoio MEO;no;;  
FN:Apoio MEO no estrangeiro  
TEL;CELL;PREF:+35196169000  
END:VCARD

import re
file = open('Contacts.txt', 'r')
contacts = dict()

    for line in file:
            name = re.findall('FN:(.*)', line)
            nm = ''.join(name)

            if len(nm) == 0:
                continue
            contacts[nm] = contacts.get(nm)
    print(contacts)

With this I am getting a dictionary with names but for numbers I am getting None. {'name': None, 'name': None}.

Can I do this with re? To extract both name and number with the same re.findall expression?

contacts[nm] = contacts.get(nm) of course this gives you None.
– wong2, Commented Mar 6, 2016 at 10:50
Perhaps if you posted a portion of the contents of 'contacts.txt' (with personal data removed, of course), you might get more helpful responses.
– PaulMcG, Commented Mar 6, 2016 at 17:57
At the point you call contacts.get(nm), nm is not in the dict so it returns None. The question, then, is what values do you want in the contacts dict?
– tdelaney, Commented Mar 7, 2016 at 18:55
I'm getting to the point @tdelaney: I want to put the data in my dict as name:number
– Paulo Vitorino, Commented Mar 9, 2016 at 8:22

zmo · Accepted Answer · 2023-05-15 23:27:05Z

28

Edit 2023: the vobject library hasn't been updated since 2018, so I no longer recommend it; I believe there should be more modern and better alternatives. I don't know of one myself, though, so if you have a recommendation, please leave it in a comment.

You would be better off using an existing library instead of reinventing the wheel:

pip install vobject

And then within Python:

>>> import vobject
>>> s = """\
... BEGIN:VCARD
... VERSION:2.1
... N:MEO;Apoio;;;
... FN:Apoio MEO
... TEL;CELL;PREF:0123456789
... TEL;CELL:0123456768
... END:VCARD
... BEGIN:VCARD
... VERSION:2.1
... N:estrangeiro;Apoio MEO;no;;
... FN:Apoio MEO no estrangeiro
... TEL;CELL;PREF:+0123456789
... END:VCARD """
>>> vcard = vobject.readOne(s)
>>> vcard.prettyPrint()
 VCARD
    VERSION: 2.1
    TEL: 0123456789
    TEL: 0123456768
    FN: Apoio MEO
    N:  Apoio  MEO

and you're done!

so if you want to make a dictionary out of that, all you need to do is:

>>> {vcard.contents['fn'][0].value: [tel.value for tel in vcard.contents['tel']] }
{'Apoio MEO': ['0123456789', '0123456768']}

so you could make all that into a function:

def parse_vcard(path):
    with open(path, 'r') as f:
        vcard = vobject.readOne(f.read())
        return {vcard.contents['fn'][0].value: [tel.value for tel in vcard.contents['tel']] }

From there, you can improve the code to handle multiple vcards in a single vobject file, and update the dict with more phones.

N.B.: I leave it to you as an exercise to change the code above from reading one and only one vcard in a file into code that can read several vcards. Hint: read the documentation of vobject.

N.B.: I'm using your data on the assumption that it is not sensitive. To be safe, though, I have modified the phone numbers.

Just for fun, let's have a look at your code. First, there's an indentation issue, but I'll assume that's down to a bad copy/paste ☺.

① import re
② file = open('Contacts.txt', 'r')
③ contacts = dict()

④ for line in file:
⑤     name = re.findall('FN:(.*)', line)
⑥     nm = ''.join(name)

⑦     if len(nm) == 0:
⑧         continue
⑨     contacts[nm] = contacts.get(nm)

⑩ print(contacts)

So first, there are two issues at line ②. You're opening a file using open(), but you never close it. If this code were called to open a very large number of files, you would exhaust the file descriptors available to your process, because none of them is ever closed. As a good habit, you should always use the with construct instead:

with open('...', '...') as f:
    … your code here …

that takes care of the fd for you, and better shows where you can make use of your opened file.
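As a minimal sketch of that pattern: the snippet below first writes a tiny throwaway file so it can run anywhere (the file name and its two lines are made up for illustration), then reads it back inside a with block and checks that the file object was closed on leaving the block.

```python
import os
import tempfile

# Create a small throwaway file so the example runs anywhere
# (the name and contents are illustrative only).
path = os.path.join(tempfile.mkdtemp(), 'Contacts.txt')
with open(path, 'w') as f:
    f.write('FN:Apoio MEO\nTEL;CELL:0123456789\n')

# The file is open only inside the with block...
with open(path, 'r') as contacts_file:
    lines = [line.rstrip('\n') for line in contacts_file]

# ...and has been closed automatically once the block ends.
print(contacts_file.closed)
print(lines)
```

Note that the variable is called contacts_file rather than file, which also sidesteps the naming problem.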

The second issue is that you're calling your variable file, which shadows the built-in file type (in Python 2; Python 3 no longer has it). Fortunately, the file type is very rarely used, but it's a bad habit to have, as you might one day struggle with a bug that happens because you've shadowed a type with a variable. Just don't do it, it'll save you trouble one day.

Lines ⑤ and ⑥: you're applying re.findall to each line. re.match() would be a better fit, since you're already iterating line by line and FN: can only occur once, at the start of a line. That also removes the need for ''.join(name). But rather than using a regex for something this simple, you'd be better off with plain string methods:

if line.startswith('FN:'):
    # split on the first colon only, and drop the trailing newline
    name = line.split(':', 1)[1].strip()

Lines ⑦ and ⑧ are not only superfluous if you use the if above, but actually wrong: they skip every line that does not contain FN:, which means you'll never extract the phone numbers, only the names.

Finally, line ⑨ makes absolutely no sense. Basically, what you're doing is equivalent to:

if nm in contacts.keys():
    contacts[nm] = contacts[nm]
else:
    contacts[nm] = None
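To see that behaviour in isolation, here is a small sketch. The name and numbers are placeholders, and setdefault is shown only as one possible way to collect several numbers under one name, not as the required approach.

```python
contacts = {}
nm = 'Apoio MEO'

# nm is not in the dict yet, so get() falls back to its default, None,
# and that None is what gets stored.
contacts[nm] = contacts.get(nm)
print(contacts)   # {'Apoio MEO': None}

# One alternative: create an empty list the first time a name is seen,
# then append each number found for it.
numbers = {}
numbers.setdefault(nm, []).append('0123456789')
numbers.setdefault(nm, []).append('0123456768')
print(numbers)    # {'Apoio MEO': ['0123456789', '0123456768']}
```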

All in all, your code only extracts names and never even looks at the telephone numbers. So when you say:

With this I am getting a dictionary with names but for numbers I am getting None

it makes no sense, as you're actually not trying to extract phone numbers.
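If you do want to stay with plain string handling, a rough sketch of a loop that picks up both the name and the numbers could look like this. The function name parse_contacts is my own, and the sketch assumes simple cards like your sample: one FN line per card, no folded (continued) lines, and no quoted-printable values.

```python
def parse_contacts(text):
    """Map each FN value to the list of TEL values found in the same card."""
    contacts = {}
    name = None
    phones = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith('BEGIN:VCARD'):
            # New card: forget whatever the previous one held
            name, phones = None, []
        elif line.startswith('FN:'):
            name = line.split(':', 1)[1]
        elif line.startswith('TEL') and ':' in line:
            # TEL;CELL;PREF:1234 -> keep what follows the first colon
            phones.append(line.split(':', 1)[1])
        elif line.startswith('END:VCARD') and name is not None:
            contacts.setdefault(name, []).extend(phones)
    return contacts


sample = """\
BEGIN:VCARD
VERSION:2.1
N:MEO;Apoio;;;
FN:Apoio MEO
TEL;CELL;PREF:0123456789
TEL;CELL:0123456768
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:estrangeiro;Apoio MEO;no;;
FN:Apoio MEO no estrangeiro
TEL;CELL;PREF:+0123456789
END:VCARD
"""

print(parse_contacts(sample))
```

Storing a list per name keeps every number of a card instead of overwriting the first with the second.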

Can I do this with re? To extract both name and number with the same re.findall expression?

Yes, you could, with something like the following (an untested regex that quite likely needs adjusting), applied with re.DOTALL over the whole file, or at least over each vcard:

FN:(?P<name>[^\n]*).*?TEL[^:]*:(?P<phone>[^\n]*)

but why bother, when you've got a lib that does it perfectly for you!
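For the curious, here is one way the regex idea might be made to work, as a sketch rather than a recommendation. It assumes the FN line comes before the TEL lines in every card and that every card has at least one TEL line; a card without one would make the pattern run on into the next card. It also keeps only the first number of each card.

```python
import re

# One match per card: the FN line, then (lazily) anything up to the
# first TEL line. re.DOTALL lets '.' cross line breaks.
CARD_RE = re.compile(
    r'FN:(?P<name>[^\n]*).*?TEL[^:\n]*:(?P<phone>[^\n]*)',
    re.DOTALL,
)

sample = (
    'BEGIN:VCARD\n'
    'VERSION:2.1\n'
    'N:MEO;Apoio;;;\n'
    'FN:Apoio MEO\n'
    'TEL;CELL;PREF:0123456789\n'
    'TEL;CELL:0123456768\n'
    'END:VCARD\n'
    'BEGIN:VCARD\n'
    'VERSION:2.1\n'
    'N:estrangeiro;Apoio MEO;no;;\n'
    'FN:Apoio MEO no estrangeiro\n'
    'TEL;CELL;PREF:+0123456789\n'
    'END:VCARD\n'
)

# Only the first TEL after each FN is captured by this pattern.
contacts = {m.group('name'): m.group('phone')
            for m in CARD_RE.finditer(sample)}
print(contacts)
```

The second number of the first card is silently dropped here, which is a good illustration of why a single regex is a fragile fit for this format.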

edited May 15, 2023 at 23:27

answered Mar 7, 2016 at 18:55

zmo


7 Comments

Paulo Vitorino Over a year ago

Thanks for your time @zmo! Your answer is excellent! At the moment I'm trying to get all the vcards. Thanks for your analysis of my code. I'm starting to study these data structures in Python... I have to go deeper into dictionaries.

Uwe Kleine-König Over a year ago

it must be vcard.prettyPrint() (i.e. with brackets) in the code listing

zmo Over a year ago

thanks, and fixed! (though, technically, those are parenthesis not brackets ;-) )

SanthoshSolomon Over a year ago

I have used this answer for my project and it is working like a charm! Thanks! By the way I have modified the part where vCard data has been changed into dict. vcard_values = {i: vcard.contents[i][0].value for i in vcard.contents}

Mykola Vasilaki Over a year ago

Well, I have just one issue with vobject: it looks like it doesn't understand version 2.1, only 3.0. That's why TEL;WORK;VOICE:XXX and TEL;CELL;VOICE:YYY both come out as plain tel, not tel.work and tel.cell. The only workaround I see is to change the string from TEL;WORK to TEL;TYPE=WORK and then access the type via v.tel.type_param.


Schoenix · Accepted Answer · 2018-02-02 10:18:24Z

14

My answer is based on zmo's answer (you need to install vobject).

To get all vobjects from a vcf-file you can do something like this:

import vobject

infile = 'Contacts.txt'  # path to your vCard file (example name)

with open(infile) as inf:
    indata = inf.read()

# readComponents yields one component per vCard found in the data
for vo in vobject.readComponents(indata):
    vo.prettyPrint()
The documentation of vobject (on GitHub) is a little bit crappy, so I looked into the code and figured out that readOne just calls next on readComponents. So you can use readComponents to get all the components in the file; it gives you an iterator rather than a list.

edited Feb 2, 2018 at 10:18

answered Dec 30, 2017 at 16:45

Schoenix


1 Comment

ykoavlil Over a year ago

When using this example I get an error. Most likely it is related to the fact that I have Cyrillic. Perhaps you have recommendations on how to fix this?

raise ParseError("Failed to parse line: {0!s}".format(line), lineNumber) vobject.base.ParseError: At line 19: Failed to parse line: =D0=BD=D0=B8=D1=8F;;;
