NVD - JSON to CSV with Python

Question

I am trying to download the NVD CVE data. Here is my Python code:

import requests
import re

r = requests.get('https://nvd.nist.gov/vuln/data-feeds#JSON_FEED')
for filename in re.findall(r"nvdcve-1\.0-[0-9]*\.json\.zip", r.text):
    print(filename)
    r_file = requests.get("https://static.nvd.nist.gov/feeds/json/cve/1.0/" + filename, stream=True)
    with open("nvd/" + filename, 'wb') as f:
        for chunk in r_file:
            f.write(chunk)

Now I want to write all the JSON files into a CSV file with this format:

Name, Value, Description, ..., ...  
Name, Value, Description, ..., ...

Can somebody help me?

Martin Evans · Accepted Answer · 2017-11-16 13:17:34Z

3

The following should get you started. For each entry it writes the ID, the URL, the vendor name, the description and the version values:

import requests
import re
import zipfile
import io
import json
import csv

# Fetch the feed index page, which lists the available zip files
r = requests.get('https://nvd.nist.gov/vuln/data-feeds#JSON_FEED')

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    # One header per value written by writerow() below
    csv_output.writerow(['ID', 'URL', 'VendorName', 'Description', 'VersionValues'])

    for filename in re.findall(r"nvdcve-1\.0-[0-9]*\.json\.zip", r.text):
        print("Downloading {}".format(filename))
        r_zip_file = requests.get("https://static.nvd.nist.gov/feeds/json/cve/1.0/" + filename, stream=True)
        zip_file_bytes = io.BytesIO()

        for chunk in r_zip_file:
            zip_file_bytes.write(chunk)

        zip_file = zipfile.ZipFile(zip_file_bytes)

        for json_filename in zip_file.namelist():
            print("Extracting {}".format(json_filename))
            json_raw = zip_file.read(json_filename).decode('utf-8')
            json_data = json.loads(json_raw)

            for entry in json_data['CVE_Items']:
                try:
                    vendor_name = entry['cve']['affects']['vendor']['vendor_data'][0]['vendor_name']
                except IndexError:
                    vendor_name = "unknown"

                try:
                    url = entry['cve']['references']['reference_data'][0]['url']
                except IndexError:
                    url = ''

                try:
                    vv = []

                    for product in entry['cve']['affects']['vendor']['vendor_data'][0]['product']['product_data']:
                        for version in product['version']['version_data']:
                            vv.append(version['version_value'])

                    version_values = '/'.join(vv)
                except IndexError:
                    version_values = ''

                csv_output.writerow([
                    entry['cve']['CVE_data_meta']['ID'],
                    url,
                    vendor_name,
                    entry['cve']['description']['description_data'][0]['value'],
                    version_values])

This downloads each zip file into memory. It then extracts the files one at a time, also in memory, and converts the JSON into a Python data structure using json.loads(). For each entry in CVE_Items it then extracts a few fields and writes them to a CSV file.
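As a minimal sketch of the in-memory step on its own, the snippet below builds a tiny stand-in archive instead of downloading a real feed, so it runs without network access. The helper name read_json_members, the file name sample.json and the ID CVE-0000-0001 are placeholders made up for illustration and are not part of the NVD feeds.

```python
import io
import json
import zipfile


def read_json_members(zip_bytes):
    """Yield (member_name, parsed_json) for each file in a zip held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            yield name, json.loads(archive.read(name).decode("utf-8"))


# Build a small stand-in archive (placeholder data, not a real feed).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    sample = {"CVE_Items": [{"cve": {"CVE_data_meta": {"ID": "CVE-0000-0001"}}}]}
    archive.writestr("sample.json", json.dumps(sample))

# Nothing is written to disk: the archive is read straight from the bytes.
for name, data in read_json_members(buffer.getvalue()):
    print(name, len(data["CVE_Items"]))
```

With a real download, the bytes collected from the response would take the place of buffer.getvalue().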

As the JSON data is highly structured, you will need to consider how you want to represent all of the fields in a CSV file. Currently it extracts a few "useful" fields and stores those.
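One possible representation, shown here only as a sketch, is to flatten each nested entry into a single level with dotted keys, so that every leaf value becomes a candidate CSV column. The flatten helper and the sample entry are assumptions made for illustration. The sample keeps only the key names used in the script above and is not a complete NVD record.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict with dotted keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}."))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing dot from the accumulated prefix.
        flat[prefix[:-1]] = value
    return flat


# Placeholder entry with the same shape as the fields used above.
entry = {
    "cve": {
        "CVE_data_meta": {"ID": "CVE-0000-0001"},
        "description": {"description_data": [{"value": "example text"}]},
    }
}

for column, value in flatten(entry).items():
    print(column, "=", value)
```

Empty lists and dicts produce no columns at all, so entries can end up with different sets of keys. A csv.DictWriter with a fixed fieldnames list and a restval default is one way to handle that.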

Alternatively, instead of building the CSV yourself, you could work with pandas:

import pandas as pd

df = pd.read_json(io.StringIO(json_raw))
df.to_csv(f_output)

If you do this, remove the csv_output lines. It would still need some extra work to decide how the output should be formatted.

edited Nov 16, 2017 at 13:17

answered Nov 15, 2017 at 18:41

Martin Evans


12 Comments

TigerClaw Over a year ago

Hi, thanks for your answer. I tried this with pandas, but I get this format: "0,"{'cve': {'data_type': 'CVE', 'data_format': ...". Your first solution is better because it gives me customization options. I had a solution similar to yours, but I made a mistake with the keys in the for loop.

Martin Evans Over a year ago

Just add ID to the header and then add a new first entry to the list in csv_output.writerow(). I have updated the script to show this. I agree this approach is quite flexible depending on what you are trying to extract.

TigerClaw Over a year ago

If I use this code: entry['cve']['references']['reference_data'][0]['url'], I get the error "list index out of range".

Martin Evans Over a year ago

The code is correct, but not all entries have a URL, so you would need to add another try/except block.

TigerClaw Over a year ago

You are right. For the version number: version_value = entry['cve']['affects']['vendor']['vendor_data'][0]['product']['product_data'][0]['version']['version_data'][0]['version_value']. Now it works with try/except.
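The lookups discussed in these comments all fail in the same way when a list is empty. As a sketch of one alternative to repeating try/except blocks, a small helper can walk a path of keys and indexes and fall back to a default. The name dig and the sample entries are hypothetical, not part of the answer's script, and the URL is a placeholder.

```python
def dig(data, path, default=""):
    """Follow dict keys and list indexes in turn; return default if a step is missing."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


url_path = ["cve", "references", "reference_data", 0, "url"]

# Entry with an empty reference list: the lookup falls back to the default.
without_url = {"cve": {"references": {"reference_data": []}}}
print(repr(dig(without_url, url_path)))

# Entry with one reference: the lookup returns the stored value.
with_url = {"cve": {"references": {"reference_data": [{"url": "https://example.com/advisory"}]}}}
print(dig(with_url, url_path))
```

Unlike the blocks in the script, this also catches KeyError and TypeError, so it will quietly hide misspelled keys as well. Whether that trade-off is acceptable depends on how much you trust the feed structure.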
