I am trying to parse the certificate of a URL in Python 3. This is my Python code:

class CertUtil:

    @staticmethod
    def check_cert_imp(domain):
        string_io = StringIO()
        parsed_url = urlparse(domain)
        if parsed_url.scheme == '' or parsed_url is None:
            domain = domain + "https://"
        comm = f"curl -Ivs {domain} --connect-timeout 10"
        result = subprocess.getstatusoutput(comm)
        string_io.write(result[1])
        cert_result = re.search('start date: (.*?)\n.*?expire date: (.*?)\n.*?common name: (.*?)\n.*?issuer: CN=(.*?)\n', string_io.getvalue(), re.S)
        if cert_result is None:
            logger.error("cert result is null:" + domain)
            return None
        start_date = cert_result.group(1)
        expire_date = cert_result.group(2)
        common_name = cert_result.group(3)
        issuer = cert_result.group(4)

To my surprise, cert_result is always None. Where am I going wrong, and what should I do to parse it correctly? By the way, the domain I am trying to parse is https://admin.poemhub.top, and the value of string_io.getvalue() is:

*   Trying 121.196.199.223:443...
* Connected to admin.poemhub.top (121.196.199.223) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [112 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [3815 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [115 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [37 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-ECDSA-AES256-GCM-SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=*.poemhub.top
*  start date: Jul 12 03:06:28 2021 GMT
*  expire date: Oct 10 03:06:27 2021 GMT
*  subjectAltName: host "admin.poemhub.top" matched cert's "*.poemhub.top"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
} [5 bytes data]
> HEAD / HTTP/1.1
> Host: admin.poemhub.top
> User-Agent: curl/7.69.1
> Accept: */*
> 
{ [5 bytes data]
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.20.1
< Date: Sun, 25 Jul 2021 05:46:29 GMT
< Content-Type: text/html
< Content-Length: 4357
< Last-Modified: Wed, 23 Jun 2021 10:53:59 GMT
< Connection: keep-alive
< ETag: "60d312c7-1105"
< Accept-Ranges: bytes
< 
* Connection #0 to host admin.poemhub.top left intact
HTTP/1.1 200 OK
Server: nginx/1.20.1
Date: Sun, 25 Jul 2021 05:46:29 GMT
Content-Type: text/html
Content-Length: 4357
Last-Modified: Wed, 23 Jun 2021 10:53:59 GMT
Connection: keep-alive
ETag: "60d312c7-1105"
Accept-Ranges: bytes
  • Can you please post the output of string_io.getvalue()? Commented Jul 25, 2021 at 6:26
  • I have pasted the string_io.getvalue() value.@purple Commented Jul 25, 2021 at 6:31
  • Your regex does not match string_io.getvalue(). For example, the string "common name" is not in string_io.getvalue(). Therefore, re.search(...) returns None. Commented Jul 25, 2021 at 6:39
  • First thing to note is that you should review your code after urlparse(). You should check for None before trying to dereference it. Also, the domain modification is inverted. That aside, it looks like the issue is with your regular expression. Try breaking it down into discrete steps, i.e. get 'start date', then 'expire date', and so on. Commented Jul 25, 2021 at 6:40

2 Answers

I've tried curl -Ivs "https://admin.poemhub.top" --connect-timeout 10, and the interesting part of the result is this:

* Server certificate:
*  subject: CN=*.poemhub.top
*  start date: Jul 12 03:06:28 2021 GMT
*  expire date: Oct 10 03:06:27 2021 GMT
*  subjectAltName: host "admin.poemhub.top" matched cert's "*.poemhub.top"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.

So your regex string is almost right, except for 2 things:

  • the common name: part is missing from the output, so I'd make it optional with (?:common name: (.*?)\n.*?)? (wrap the optional match in (?: ... )?)
  • there's text between issuer: and CN=, so I'd change it to issuer: .*?CN=(.*?)\n (mind the .*? part before CN=)

Then the resulting regex string should be:

start date: (.*?)\n.*?expire date: (.*?)\n.*?(?:common name: (.*?)\n.*?)?issuer: .*?CN=(.*?)\n

(Note that common_name, which is group(3) of the result, will now be None when the common name line is missing.)
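As a quick check, here is a minimal sketch that runs the corrected pattern against the certificate lines from the curl output above. The names CURL_OUTPUT and CERT_PATTERN are only illustrative, and the sample text is an excerpt, not the full output.

```python
import re

# Excerpt of the `curl -Ivs` output shown in the question.
CURL_OUTPUT = """\
* Server certificate:
*  subject: CN=*.poemhub.top
*  start date: Jul 12 03:06:28 2021 GMT
*  expire date: Oct 10 03:06:27 2021 GMT
*  subjectAltName: host "admin.poemhub.top" matched cert's "*.poemhub.top"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
"""

# The corrected pattern: "common name" is optional, and arbitrary text
# is allowed between "issuer: " and "CN=".
CERT_PATTERN = re.compile(
    r"start date: (.*?)\n.*?expire date: (.*?)\n.*?"
    r"(?:common name: (.*?)\n.*?)?issuer: .*?CN=(.*?)\n",
    re.S,
)

match = CERT_PATTERN.search(CURL_OUTPUT)
if match is None:
    raise ValueError("certificate block not found in curl output")

# common_name is None here because curl printed no "common name" line.
start_date, expire_date, common_name, issuer = match.groups()
print(start_date, expire_date, common_name, issuer, sep=" | ")
```

Code that uses common_name afterwards should handle the None case, for example by falling back to the CN from the subject line.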

Unrelated to the question: I think you made a mistake in domain = domain + "https://", which should be domain = "https://" + domain (you want to prepend the protocol to the domain, not the domain to the protocol).
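A small sketch of that fix, assuming a helper named ensure_https (a made-up name, not part of the original code). It only covers plain host names and full URLs; inputs such as host:port without a scheme are not handled here.

```python
from urllib.parse import urlparse


def ensure_https(domain: str) -> str:
    """Return `domain` with "https://" prepended when it has no scheme."""
    # urlparse() always returns a result object, so only the scheme
    # needs checking; an empty scheme means none was given.
    if not urlparse(domain).scheme:
        return "https://" + domain
    return domain


print(ensure_https("admin.poemhub.top"))
print(ensure_https("https://admin.poemhub.top"))
```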

1 Comment

It should work for now, but I also support Andy Knight's comment that you should parse each line with a separate regular expression since the order may eventually change.
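One way to follow that suggestion is sketched below: each field gets its own small pattern and the output is scanned line by line, so the order of the lines no longer matters. FIELD_PATTERNS and parse_cert_fields are hypothetical names, and the patterns assume the line format shown in the curl output above (in particular, that CN= is the last component on the subject and issuer lines).

```python
import re

# One independent pattern per field.
FIELD_PATTERNS = {
    "start_date": re.compile(r"start date: (.*)"),
    "expire_date": re.compile(r"expire date: (.*)"),
    "common_name": re.compile(r"subject: .*?CN=(.*)"),
    "issuer": re.compile(r"issuer: .*?CN=(.*)"),
}


def parse_cert_fields(curl_output):
    """Collect the first match of each field; missing fields are omitted."""
    fields = {}
    for line in curl_output.splitlines():
        for name, pattern in FIELD_PATTERNS.items():
            found = pattern.search(line)
            if found and name not in fields:
                fields[name] = found.group(1).strip()
    return fields


SAMPLE = """\
*  subject: CN=*.poemhub.top
*  start date: Jul 12 03:06:28 2021 GMT
*  expire date: Oct 10 03:06:27 2021 GMT
*  issuer: C=US; O=Let's Encrypt; CN=R3
"""

print(parse_cert_fields(SAMPLE))
```

The caller can then check which keys are present instead of relying on a single all-or-nothing match.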

Not an answer to your original question but you can use ssl.SSLSocket.getpeercert to parse the SSL certificate:

test.py:

import pprint
import socket
import ssl


HOSTNAME = "admin.poemhub.top"

def main():
    context = ssl.create_default_context()

    with socket.socket(socket.AF_INET) as s:
        conn = context.wrap_socket(s, server_hostname=HOSTNAME)
        conn.connect((HOSTNAME, 443))
        cert = conn.getpeercert()

    pprint.pprint(cert)


if __name__ == "__main__":
    main()

Test:

$ python test.py
{'OCSP': ('http://r3.o.lencr.org',),
 'caIssuers': ('http://r3.i.lencr.org/',),
 'issuer': ((('countryName', 'US'),),
            (('organizationName', "Let's Encrypt"),),
            (('commonName', 'R3'),)),
 'notAfter': 'Oct 10 03:06:27 2021 GMT',
 'notBefore': 'Jul 12 03:06:28 2021 GMT',
 'serialNumber': '04EAC6ED15CFE60424FD96AD9B24D9293D4E',
 'subject': ((('commonName', '*.poemhub.top'),),),
 'subjectAltName': (('DNS', '*.poemhub.top'),),
 'version': 3}
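To get the same four values the question was after (start date, expire date, common name, issuer) out of that dictionary, something like the following sketch could be used. CERT is a trimmed copy of the output above so the example runs without a network connection; name_field and summarize are hypothetical helper names. ssl.cert_time_to_seconds is the standard library function for converting the notBefore/notAfter strings.

```python
import ssl
from datetime import datetime, timezone

# Trimmed copy of the getpeercert() result printed above.
CERT = {
    "issuer": ((("countryName", "US"),),
               (("organizationName", "Let's Encrypt"),),
               (("commonName", "R3"),)),
    "notAfter": "Oct 10 03:06:27 2021 GMT",
    "notBefore": "Jul 12 03:06:28 2021 GMT",
    "subject": ((("commonName", "*.poemhub.top"),),),
}


def name_field(rdns, key):
    """Look up one attribute in the nested tuples used for subject/issuer."""
    for rdn in rdns:
        for attr, value in rdn:
            if attr == key:
                return value
    return None


def to_datetime(cert_time):
    """Convert a certificate time string to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ssl.cert_time_to_seconds(cert_time), tz=timezone.utc)


def summarize(cert):
    return {
        "start_date": to_datetime(cert["notBefore"]),
        "expire_date": to_datetime(cert["notAfter"]),
        "common_name": name_field(cert["subject"], "commonName"),
        "issuer": name_field(cert["issuer"], "commonName"),
    }


print(summarize(CERT))
```

Working with datetime objects rather than strings makes it straightforward to compare the expiry date against the current time.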
