
I'm writing a script that scrapes data from a website and saves it into a DB. Some of the values are merged, and I need to split them. I have something like this:

Endokrynologia (bez st.),Położnictwo i ginekologia (II st.)

So I need to get:

Endokrynologia (bez st.)
Położnictwo i ginekologia (II st.)

So I wrote some code in Python:

#!/usr/bin/env python
# -*- encoding: utf-8

import MySQLdb as mdb
from lxml import html, etree
import urllib
import sys
import re

Nr = 17268
Link = "http://rpwdl.csioz.gov.pl/rpz/druk/wyswietlKsiegaServletPub?idKsiega="

sock = urllib.urlopen(Link+str(Nr))  
htmlSource = sock.read()                             
sock.close()
root = etree.HTML(htmlSource)
result = etree.tostring(root, pretty_print=True, method="html")
Spec = etree.XPath("string(//html/body/div/table[2]/tr[18]/td[2]/text())")
Specjalizacja = Spec(root)
if re.search(r'(,)\b', Specjalizacja):
    text = Specjalizacja.split()
    print text[0]
    print text[1]

and I get:

Endokrynologia
(bez

What am I doing wrong?
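With no argument, `str.split()` splits on runs of whitespace, not on commas, so `text[0]` and `text[1]` are simply the first two space-separated words. A minimal illustration on the sample value from the question (written in Python 3 syntax, unlike the Python 2 code above, and using a string literal instead of the scraped page):

```python
# The merged value from the question, as a literal instead of scraped HTML.
s = "Endokrynologia (bez st.),Położnictwo i ginekologia (II st.)"

# split() with no separator splits on whitespace, so the comma is ignored
# and "(bez" becomes its own item.
words = s.split()
print(words[0])
print(words[1])
```

This prints `Endokrynologia` and `(bez`, which is exactly the output shown above.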

    Why not just use the built-in split(',')? Commented Apr 22, 2013 at 19:01
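Passing the separator explicitly splits only on commas. A value with no comma comes back as a one-element list rather than raising, so looping over the result is a bit safer than indexing `text[1]` directly. A short sketch in Python 3 syntax on the sample value:

```python
s = "Endokrynologia (bez st.),Położnictwo i ginekologia (II st.)"

# split(',') uses the comma as the only separator.
parts = s.split(',')
for part in parts:
    print(part)

# A value without a comma is returned unchanged as a single-element list.
single = "Endokrynologia (bez st.)".split(',')
print(single)
```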

1 Answer


You could try replacing

text = Specjalizacja.split()

with

text = Specjalizacja.split(',')

I don't know whether that would fix your problem.
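If some scraped values might also contain spaces after the separator, or commas inside the parenthesised part (for example `(bez st., X)`), a plain `split(',')` would break those apart too. The following is a minimal sketch under that assumption, which the question does not confirm; `split_specializations` is a hypothetical helper name. It splits only on commas that are not inside parentheses, assuming parentheses are never nested, and trims whitespace:

```python
import re

# Hypothetical separator: a comma plus optional spaces, unless the text that
# follows reaches a ')' before any '(' (meaning the comma is inside a group).
# Assumes parentheses are not nested.
_SEP = re.compile(r',\s*(?![^()]*\))')


def split_specializations(value):
    """Return the non-empty, whitespace-trimmed items of a merged field."""
    return [item.strip() for item in _SEP.split(value) if item.strip()]


merged = "Endokrynologia (bez st.),Położnictwo i ginekologia (II st.)"
for name in split_specializations(merged):
    print(name)
```

For the sample value this gives the same two items as `split(',')`; the regex only changes the result when a comma appears inside parentheses.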


Comment:

I don't know why I didn't write .split(',').
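Since the goal is to save the split values into a DB, here is a sketch of storing one row per item with a parameterized `executemany`. It uses the standard-library `sqlite3` module as a stand-in for the `MySQLdb` connection in the question, and the table and column names are hypothetical. Both modules follow the Python DB-API, but `MySQLdb` uses `%s` placeholders where `sqlite3` uses `?`.

```python
import sqlite3

# Stand-in for the MySQL database: an in-memory SQLite DB.
# Table and column names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specializations (record_id INTEGER, name TEXT)")

nr = 17268  # the idKsiega value used in the question
merged = "Endokrynologia (bez st.),Położnictwo i ginekologia (II st.)"

# One (record_id, name) row per comma-separated item, skipping empty items.
rows = [(nr, part.strip()) for part in merged.split(',') if part.strip()]

# "with conn" commits the transaction if the block succeeds.
with conn:
    conn.executemany(
        "INSERT INTO specializations (record_id, name) VALUES (?, ?)", rows
    )

stored = conn.execute(
    "SELECT name FROM specializations ORDER BY rowid"
).fetchall()
print(stored)
```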
