Python split string to multiple substrings with single quotations and a trailing comma

Question

I am trying to split the string on each line of a csv into three substrings that stay on the same line, with single quotation marks around substrings 2 and 3 and a comma after each substring.

The lines in the csv are in the following format:

12345678/ABCDE.pdf
12345678/ABCDE.pdf
12345678/ABCDE.pdf

As I am new to Python, I have tried a split on the lines, which returns the first two substrings without the /, but I am not sure how to obtain the final desired output.

'12345678', 'ABCDE.pdf'

I would like the output to look like this:

12345678,'/ABCDE.pdf','ABCDE',
12345678,'/ABCDE.pdf','ABCDE',
12345678,'/ABCDE.pdf','ABCDE',

with the final string containing the title of the pdf without the file extension.

Any help would be greatly appreciated.

So split it again...

– Eugene Sh., Commented Aug 15, 2017 at 16:19

tdube · Accepted Answer · 2017-08-15 16:38:35Z

Using split again, you can easily construct the desired output string without the need for regex.

In [22]: %%timeit
    ...: s = '''12345678/ABCDE.pdf
    ...: 12345678/ABCDE.pdf
    ...: 12345678/ABCDE.pdf'''
    ...: for l in s.splitlines():
    ...:     s_parts = l.split('/')
    ...:     new_s = "{},'/{}','{}',".format(s_parts[0], s_parts[1], s_parts[1].split('.')[0])
    ...:
100000 loops, best of 3: 3.55 µs per loop

Output:

Out[24]: "12345678,'/ABCDE.pdf','ABCDE',"
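
The same idea can be wrapped in a small function and applied to every line. This is only a sketch: the name `format_line` is made up for illustration, and it assumes each line contains exactly one `/`. It swaps `split('.')` for `os.path.splitext`, so a file name with several dots keeps everything before the last one.

```python
import os


def format_line(line):
    """Turn '12345678/ABCDE.pdf' into "12345678,'/ABCDE.pdf','ABCDE',"."""
    # Assumption: exactly one '/' separates the number from the file name.
    number, filename = line.strip().split('/')
    # splitext drops only the final extension, e.g. 'a.b.pdf' -> 'a.b'.
    title = os.path.splitext(filename)[0]
    return "{},'/{}','{}',".format(number, filename, title)


s = '''12345678/ABCDE.pdf
12345678/ABCDE.pdf
12345678/ABCDE.pdf'''

for l in s.splitlines():
    print(format_line(l))
```

A line without a `/`, or with more than one, raises a `ValueError` at the unpacking step, which is probably preferable to silently writing a malformed row.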

For comparison, the regex solution posted in the other answer, which also works fine, has the following runtime performance. The difference (about 3.55 µs versus 11.6 µs per loop for three lines) is small in absolute terms, but with a large number of lines to process it could be a factor.

In [25]: %%timeit
    ...: s = ["12345678/ABCDE.pdf",
    ...:       "12345678/ABCDE.pdf",
    ...:       "12345678/ABCDE.pdf"]
    ...: new_s = [[re.findall(r"\d+", i)[0], "/"+i.split("/")[-1], re.findall(r"[A-Z]+", i)[0]] for i in s]
    ...:
100000 loops, best of 3: 11.6 µs per loop
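
Outside IPython, a similar comparison could be run with the standard `timeit` module. This is a rough sketch: the function names `with_split` and `with_regex` are invented here, both build lists rather than strings so that the results can be compared directly, and the timings will vary by machine and Python version.

```python
import re
import timeit

lines = ["12345678/ABCDE.pdf"] * 3


def with_split():
    # Plain str.split, as in this answer.
    out = []
    for l in lines:
        s_parts = l.split('/')
        out.append([s_parts[0], '/' + s_parts[1], s_parts[1].split('.')[0]])
    return out


def with_regex():
    # re.findall, as in the regex answer.
    return [[re.findall(r"\d+", i)[0], "/" + i.split("/")[-1],
             re.findall(r"[A-Z]+", i)[0]] for i in lines]


# Both approaches should agree on this input before timing them.
assert with_split() == with_regex()

print("split:", timeit.timeit(with_split, number=100000))
print("regex:", timeit.timeit(with_regex, number=100000))
```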

Ajax1234 · Accepted Answer · 2017-08-15 16:24:47Z

You can use str.split() and re.findall():

import re

s = ["12345678/ABCDE.pdf",
      "12345678/ABCDE.pdf",
      "12345678/ABCDE.pdf"]
# For each line: leading digits, "/" plus the file name, and the upper-case title
new_s = [[re.findall(r"\d+", i)[0], "/"+i.split("/")[-1], re.findall(r"[A-Z]+", i)[0]] for i in s]

Output:

[['12345678', '/ABCDE.pdf', 'ABCDE'], ['12345678', '/ABCDE.pdf', 'ABCDE'], ['12345678', '/ABCDE.pdf', 'ABCDE']]
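
To get from these lists to the comma-separated lines asked for in the question, each inner list can be formatted into a string. A minimal sketch, assuming `new_s` has the shape shown above (the name `output_lines` is just for illustration). Note that the `[A-Z]+` pattern only matches upper-case letters, so a title containing lower-case letters or digits would need a different pattern.

```python
new_s = [['12345678', '/ABCDE.pdf', 'ABCDE'],
         ['12345678', '/ABCDE.pdf', 'ABCDE'],
         ['12345678', '/ABCDE.pdf', 'ABCDE']]

# Quote the second and third items and end each line with a comma.
output_lines = ["{},'{}','{}',".format(number, path, title)
                for number, path, title in new_s]

print("\n".join(output_lines))
```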

