
I'm trying to extract some IDs and their status from an XML file. I have reached the point where I have a list of strings containing this info, and I just need to extract the codes and the status (pass or fail). The problem is that the strings are extremely messy, and I'm new to Python, so I'm not sure how to do this. The relevant piece of code:

l = len(res_smaller)
print(res_smaller)
for field in range(1, l, 2):
    aux = res_smaller[field]
    print(aux)

Output:

<big class="Heading3">1.2 <a name="i__786909744_34">Test Case CODE_2096: RANDOMTEXT</a>: Failed</big>
<big class="Heading3">1.3 <a name="i__786909744_1424">Test Case CODE_2101: RANDOMTEXT</a>: Failed</big>
<big class="Heading3">1.4 <a name="i__786909744_2814">Test Case CODE_2111: RANDOMTEXT</a>: Failed</big>   
<big class="Heading3">1.5 <a name="i__786909744_2850">Test Case CODE_2098: RANDOMTEXT</a>: Failed</big>

I used the BeautifulSoup library to find_all Heading3 classes, parsed a bit more, and now I have a list from which I printed the lines that are of interest to me (this is why I used a step of 2 starting from 1). My idea is to create a dictionary of the form CODE_NUMBER: STATUS, but I don't know how to extract these from each field. I thought of using aux.split(" ") to split on whitespace and take the 5th and 7th elements of each field, but this gives me an error, so I'm not sure whether this is possible in Python. Any ideas?

EDIT: Here's the code with the aux.split, I've also added the list printed as a whole:

l = len(res_smaller)
print(res_smaller)

for field in range(1, l, 2):
    aux = res_smaller[field]
    print(aux.split(" "))  

Output:

[<big class="Heading3">1.1 <a name="i__786909744_13">RANDOMTEXT</a>: Passed</big>, <big class="Heading3">1.2 <a name="i__786909744_34">Test Case CODE_2096: RANDOMTEXT</a>: Failed</big>, <big class="Heading3">Main Part of Test Case</big>, <big class="Heading3">1.3 <a name="i__786909744_1424">Test Case CODE_2101: RANDOMTEXT</a>: Failed</big>, <big class="Heading3">Main Part of Test Case</big>, <big class="Heading3">1.4 <a name="i__786909744_2814">Test Case CODE_2111: RANDOMTEXT</a>: Failed</big>, 
<big class="Heading3">Main Part of Test Case</big>, <big class="Heading3">1.5 <a name="i__786909744_2850">Test Case CODE_2098: RANDOMTEXT</a>: Failed</big>, <big class="Heading3">Main Part of Test Case</big>]
Traceback (most recent call last):
  File "D:\Code\Python\Projects\HTML_parser.py", line 43, in <module>
    print(aux.split(" "))
TypeError: 'NoneType' object is not callable
  • Please show your code using aux.split() because that should work Commented Dec 1, 2021 at 21:33
  • I hope you remember that Python uses 0-based indexing. Commented Dec 1, 2021 at 21:34
  • @PM77-1 yes I would need 4th and 6th element, I wrote it like that for better understanding Commented Dec 1, 2021 at 21:36
  • @ErikMcKelvey If I try print(aux.split(" ")), I get the error TypeError: 'NoneType' object is not callable Commented Dec 1, 2021 at 21:38
  • You're not iterating a list of strings, but a list of bs4 objects. I think that's why the calls to split are failing. You can use the methods and attributes from that library to get access to the data, or you can cast them to strings with aux = str(res_smaller[field]) Commented Dec 1, 2021 at 23:48
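
To illustrate the last comment: Beautiful Soup treats attribute access on a tag as a lookup for a child tag of that name, so aux.split most likely evaluates to None (there is no <split> child), and calling it raises the TypeError shown above. Below is a minimal sketch of the "use the library's own methods" route, assuming the headings look like the ones in the question. The html sample and the names headings and results are made up for illustration; get_text() returns a plain str, on which split() works as expected.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question's output (assumed representative).
html = """
<big class="Heading3">1.1 <a name="i__786909744_13">RANDOMTEXT</a>: Passed</big>
<big class="Heading3">1.2 <a name="i__786909744_34">Test Case CODE_2096: RANDOMTEXT</a>: Failed</big>
<big class="Heading3">Main Part of Test Case</big>
<big class="Heading3">1.3 <a name="i__786909744_1424">Test Case CODE_2101: RANDOMTEXT</a>: Failed</big>
"""

soup = BeautifulSoup(html, "html.parser")
headings = soup.find_all("big", class_="Heading3")  # list of bs4 Tag objects

results = {}
for tag in headings:
    # get_text() strips the markup, e.g.
    # "1.2 Test Case CODE_2096: RANDOMTEXT: Failed"
    words = tag.get_text().split()
    # Keep only headings shaped like "<number> Test Case <CODE>: ... <status>".
    if len(words) >= 5 and words[1:3] == ["Test", "Case"]:
        code = words[3].rstrip(":")  # "CODE_2096:" -> "CODE_2096"
        status = words[-1]           # last word is the status
        results[code] = status

print(results)  # {'CODE_2096': 'Failed', 'CODE_2101': 'Failed'}
```

Filtering on the heading's shape instead of stepping through odd indices means the "Main Part of Test Case" entries are skipped wherever they appear in the list.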

1 Answer


I highly suggest using findall from the re module. Since the input is not included, I am working with what I have:

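import re

my_dict = {}
# The headings of interest sit at the odd indices of res_smaller.
for field in range(1, len(res_smaller), 2):
    # res_smaller holds bs4 objects; re.findall needs a str, so cast first.
    aux = str(res_smaller[field])
    # Status is the text between "</a>: " and "</big>".
    status = re.findall('</a>: (.*?)</big>', aux, re.DOTALL)
    # Code is the text between "Case " and the next colon.
    code = re.findall('Case (.*?):', aux, re.DOTALL)
    my_dict[code[0]] = status[0]
print(my_dict)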

output:

{'CODE_2096': 'Failed', 'CODE_2101': 'Failed', 'CODE_2111': 'Failed', 'CODE_2098': 'Failed'}
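
As a possible variation on the same idea, here is a sketch that uses one compiled pattern with named groups and skips lines that do not match, so it does not depend on the odd-index layout. It assumes each heading has already been converted to a str (for example with str(...)); the names HEADING_RE, parse_results, and sample are hypothetical and not part of the original code.

```python
import re

# One pattern for both fields: the code after "Test Case " and the status
# between "</a>:" and "</big>".
HEADING_RE = re.compile(
    r"Test Case (?P<code>\w+):.*?</a>:\s*(?P<status>\w+)\s*</big>",
    re.DOTALL,
)

def parse_results(headings):
    """Map each test case code to its status, ignoring non-matching headings."""
    results = {}
    for heading in headings:
        match = HEADING_RE.search(heading)
        if match:  # "Main Part of Test Case" and similar lines are skipped
            results[match.group("code")] = match.group("status")
    return results

# Strings copied from the question's output (assumed representative).
sample = [
    '<big class="Heading3">1.1 <a name="i__786909744_13">RANDOMTEXT</a>: Passed</big>',
    '<big class="Heading3">1.2 <a name="i__786909744_34">Test Case CODE_2096: RANDOMTEXT</a>: Failed</big>',
    '<big class="Heading3">Main Part of Test Case</big>',
    '<big class="Heading3">1.3 <a name="i__786909744_1424">Test Case CODE_2101: RANDOMTEXT</a>: Failed</big>',
]

print(parse_results(sample))  # {'CODE_2096': 'Failed', 'CODE_2101': 'Failed'}
```

Checking for a match before indexing avoids the IndexError that code[0] or status[0] would raise on a heading without a code.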