I am trying to download the NVD CVE data feeds. Here is my Python code:

import requests
import re

r = requests.get('https://nvd.nist.gov/vuln/data-feeds#JSON_FEED')
for filename in re.findall(r"nvdcve-1\.0-[0-9]*\.json\.zip", r.text):
    print(filename)
    r_file = requests.get("https://static.nvd.nist.gov/feeds/json/cve/1.0/" + filename, stream=True)
    with open("nvd/" + filename, 'wb') as f:
        for chunk in r_file:
            f.write(chunk)

Now I want to write all the JSON files into a CSV file with this format:

Name, Value, Description, ..., ...  
Name, Value, Description, ..., ...

Can somebody help me?

1 Answer

The following should get you started. It writes five columns: ID, URL, VendorName, Description and VersionValues:

import requests
import re
import zipfile
import io
import json
import csv

r = requests.get('https://nvd.nist.gov/vuln/data-feeds#JSON_FEED')

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    # Header order matches the five values written for each entry below
    csv_output.writerow(['ID', 'URL', 'VendorName', 'Description', 'VersionValues'])

    for filename in re.findall(r"nvdcve-1\.0-[0-9]*\.json\.zip", r.text):
        print("Downloading {}".format(filename))
        r_zip_file = requests.get("https://static.nvd.nist.gov/feeds/json/cve/1.0/" + filename, stream=True)
        zip_file_bytes = io.BytesIO()

        for chunk in r_zip_file:
            zip_file_bytes.write(chunk)

        zip_file = zipfile.ZipFile(zip_file_bytes)

        for json_filename in zip_file.namelist():
            print("Extracting {}".format(json_filename))
            json_raw = zip_file.read(json_filename).decode('utf-8')
            json_data = json.loads(json_raw)

            for entry in json_data['CVE_Items']:
                try:
                    vendor_name = entry['cve']['affects']['vendor']['vendor_data'][0]['vendor_name']
                except IndexError:
                    vendor_name = "unknown"

                try:
                    url = entry['cve']['references']['reference_data'][0]['url']
                except IndexError:
                    url = ''

                try:
                    vv = []

                    # Collect every version value listed for the first vendor's products
                    for product in entry['cve']['affects']['vendor']['vendor_data'][0]['product']['product_data']:
                        for version in product['version']['version_data']:
                            vv.append(version['version_value'])

                    version_values = '/'.join(vv)
                except IndexError:
                    version_values = ''

                csv_output.writerow([
                    entry['cve']['CVE_data_meta']['ID'],
                    url,
                    vendor_name,
                    entry['cve']['description']['description_data'][0]['value'],
                    version_values])

This downloads each zip file into memory. It then extracts the files one at a time, also in memory, and converts the JSON into a Python data structure using json.loads(). For each entry in CVE_Items it extracts a few of the fields and writes them to a CSV file.
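
The in-memory steps can be tried offline. The following is a minimal sketch, assuming a tiny hand-built archive in place of a real feed download; the member name sample-feed.json and the single CVE entry are made up for illustration.

```python
import io
import json
import zipfile

# Build a small zip archive in memory to stand in for a downloaded feed.
# The member name and the CVE entry are hypothetical sample data.
sample = {"CVE_Items": [{"cve": {"CVE_data_meta": {"ID": "CVE-0000-0001"}}}]}
zip_file_bytes = io.BytesIO()
with zipfile.ZipFile(zip_file_bytes, "w") as zf:
    zf.writestr("sample-feed.json", json.dumps(sample))

# Open the same buffer for reading; ZipFile seeks within it as needed.
zip_file = zipfile.ZipFile(zip_file_bytes)

ids = []
for json_filename in zip_file.namelist():
    # Each member is read as bytes, decoded, then parsed into dicts and lists.
    json_data = json.loads(zip_file.read(json_filename).decode("utf-8"))
    for entry in json_data["CVE_Items"]:
        ids.append(entry["cve"]["CVE_data_meta"]["ID"])

print(ids)
```

Nothing is written to disk here, which is the same property the script relies on when it collects the response chunks in a BytesIO buffer.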

As the JSON data is deeply nested, you will need to decide how you want to represent the fields in a flat CSV file. Currently the script extracts five "useful" fields (ID, URL, vendor name, description and version values) and stores those.
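
If you end up extracting more fields, the repeated try/except blocks can be folded into one helper. This is only a sketch of one possible approach: dig and entry_to_row are hypothetical names, not part of any library, and the two sample entries are invented and trimmed to the keys the script uses.

```python
def dig(data, path, default=""):
    """Follow a list of keys/indexes into nested data, or return default."""
    for step in path:
        try:
            data = data[step]
        except (KeyError, IndexError, TypeError):
            return default
    return data


def entry_to_row(entry):
    """Turn one CVE_Items entry into a flat list of CSV cells."""
    cve = entry.get("cve", {})
    # First vendor only, as in the script; {} when vendor_data is empty.
    vendor = dig(cve, ["affects", "vendor", "vendor_data", 0], default={})
    versions = [
        dig(version, ["version_value"])
        for product in dig(vendor, ["product", "product_data"], default=[])
        for version in dig(product, ["version", "version_data"], default=[])
    ]
    return [
        dig(cve, ["CVE_data_meta", "ID"]),
        dig(cve, ["references", "reference_data", 0, "url"]),
        dig(vendor, ["vendor_name"], default="unknown"),
        dig(cve, ["description", "description_data", 0, "value"]),
        "/".join(versions),
    ]


# Invented sample entries: one complete, one without vendor or references.
full = {"cve": {
    "CVE_data_meta": {"ID": "CVE-0000-0001"},
    "affects": {"vendor": {"vendor_data": [{
        "vendor_name": "examplecorp",
        "product": {"product_data": [{"version": {"version_data": [
            {"version_value": "1.0"}, {"version_value": "1.1"}]}}]},
    }]}},
    "references": {"reference_data": [{"url": "https://example.com/advisory"}]},
    "description": {"description_data": [{"value": "Example flaw"}]},
}}
sparse = {"cve": {
    "CVE_data_meta": {"ID": "CVE-0000-0002"},
    "affects": {"vendor": {"vendor_data": []}},
    "references": {"reference_data": []},
    "description": {"description_data": [{"value": "No vendor listed"}]},
}}

rows = [entry_to_row(full), entry_to_row(sparse)]
for row in rows:
    print(row)
```

Each row can be passed straight to csv_output.writerow(). Adding a column then means adding one dig() call and one header name.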

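Alternatively, instead of building the CSV yourself, you could work with Pandas:

import pandas as pd

df = pd.read_json(io.StringIO(json_raw))
df.to_csv(f_output)

If you go this way, remove the csv_output lines. Note that this writes each nested entry into a single cell, so it would need some extra work to decide on how the output should be formatted.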
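
One way to do that extra formatting is pandas.json_normalize, which spreads nested dictionaries into dotted column names. A minimal sketch, assuming pandas is installed and using a made-up, heavily trimmed stand-in for the parsed feed (the two entries and the choice of output columns are illustrative only):

```python
import io

import pandas as pd

# Hypothetical stand-in for json.loads(json_raw) on one feed file.
json_data = {
    "CVE_Items": [
        {"cve": {"CVE_data_meta": {"ID": "CVE-0000-0001"},
                 "description": {"description_data": [{"value": "Example flaw"}]}}},
        {"cve": {"CVE_data_meta": {"ID": "CVE-0000-0002"},
                 "description": {"description_data": [{"value": "Another flaw"}]}}},
    ]
}

# Nested dicts become dotted columns; lists (description_data) stay as lists.
df = pd.json_normalize(json_data["CVE_Items"])

# Pick the columns of interest and unpack the first description by hand.
table = pd.DataFrame({
    "ID": df["cve.CVE_data_meta.ID"],
    "Description": df["cve.description.description_data"].apply(
        lambda items: items[0]["value"] if items else ""
    ),
})

f_output = io.StringIO()  # stands in for the open CSV file
table.to_csv(f_output, index=False)
csv_text = f_output.getvalue()
print(csv_text)
```

Fields that are lists, such as the vendor and version data, still need to be unpacked manually, so this mostly helps with the purely dictionary-shaped parts of each entry.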


Comments

Hi, thanks for your answer. I tried this with pandas, but I get this format: "0,"{'cve': {'data_type': 'CVE', 'data_format': ...". Your first solution is better because I have customization options. I had a solution similar to yours, but I made a mistake with the keys in the for-loop.
Just add ID to the header and then add a new first entry to the list in csv_output.writerow(). I have updated the script to show this. I agree this approach is quite flexible depending on what you are trying to extract.
If I use this code: entry['cve']['references']['reference_data'][0]['url'] I get the error "list index out of range"
The code is correct, but not all entries have a url, so you would need to add another try except block.
You are right. For the version number: version_value = entry['cve']['affects']['vendor']['vendor_data'][0]['product']['product_data'][0]['version']['version_data'][0]['version_value'] Now it works with try/except.