I have a text file that contains a bunch of website URLs.
text = '"wadouri:https:\/\/dev.pluginslab.com\/dicomviewer\/wp-content\/plugins\/pl-dicom-viewer-amazon-s3\/assets\/cases\/8255\/20191209113141\/sagittal-00000001.dcm","wadouri:https:\/\/dev.pluginslab.com\/dicomviewer\/wp-content\/plugins\/pl-dicom-viewer-amazon-s3\/assets\/cases\/8255\/20191209113141\/sagittal-00000002.dcm","wadouri:https:\/\/dev.pluginslab.com\/dicomviewer\/wp-content\/plugins\/pl-dicom-viewer-amazon-s3\/assets\/cases\/8255\/20191209113141\/sagittal-00000003.dcm", etc'
I was able to extract each URL into a list.
However, there are '/' characters in my list that I can't seem to remove.
Could someone tell me where I went wrong?
Thanks
import re
import bs4 as bs
import urllib.request
import os
myfile = open('C:/test/test.txt', 'r')
regex = re.compile(r'(?<=https).*?(?=dcm)')
dcm = []
for line in myfile:
    matches = regex.findall(line)
    for m in matches:
        dcm.append(str('https' + m + 'dcm'))
for d in dcm:
    d.replace('/','')
    print(d)
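d.replace('/','') returns a new string and leaves d unchanged, so either print the result directly with print(d.replace('/','')) or assign the result to a variable and print that.
Better, use the json module to parse it, not do your own string processing.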
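The json suggestion can be sketched roughly as follows. This is only a sketch under some assumptions: that the file content is a comma-separated run of JSON string literals like the sample in the question (without the trailing ", etc"), so that wrapping it in square brackets yields a valid JSON array. The helper name extract_urls and the shortened two-entry sample are made up for illustration. JSON treats `\/` as an escaped `/`, so the backslashes disappear during parsing.

```python
import json

# Shortened sample in the same shape as the text in the question
# (raw strings so the backslashes are kept exactly as in the file).
text = (
    r'"wadouri:https:\/\/dev.pluginslab.com\/dicomviewer\/wp-content\/plugins'
    r'\/pl-dicom-viewer-amazon-s3\/assets\/cases\/8255\/20191209113141\/sagittal-00000001.dcm",'
    r'"wadouri:https:\/\/dev.pluginslab.com\/dicomviewer\/wp-content\/plugins'
    r'\/pl-dicom-viewer-amazon-s3\/assets\/cases\/8255\/20191209113141\/sagittal-00000002.dcm"'
)


def extract_urls(raw, prefix="wadouri:"):
    """Hypothetical helper: parse comma-separated JSON strings into plain URLs."""
    # Assumption: wrapping the text in [ ] makes it a valid JSON array.
    items = json.loads("[" + raw + "]")
    # json.loads has already turned every "\/" into "/"; drop the scheme prefix.
    return [s[len(prefix):] if s.startswith(prefix) else s for s in items]


dcm = extract_urls(text)
for d in dcm:
    print(d)
```

If you would rather keep the regex approach, note that strings are immutable, so the result of replace has to be kept, e.g. d = d.replace('\\', ''), and that the character to remove appears to be the backslash rather than '/'.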