I'm a beginner with Pandas. I want to take the table of G-Sync gaming monitors on Nvidia's website (https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/) and convert it to a Pandas data frame in Python.

The first thing I tried to do was

import pandas as pd
df = pd.read_html('https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/')

but that didn't seem to work. I got a ValueError: No tables found.

Then I tried to do

import requests
import lxml.html as lh
page = requests.get('https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/')

but somehow I got ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check')).

If someone could explain why the first two ways didn't work and how to actually get the table into a data frame, that would be very helpful. Thank you!

  • Looks like all that data is available in an XHR: nvidia.com/content/dam/en-zz/Solutions/geforce/… You can requests.get() this, then json.loads() the content to use it as a Python dictionary. Commented May 29, 2020 at 0:28
  • See this comment: "The table doesn't exist in the page HTML; it loads asynchronously after the rest of the page. Pandas doesn't wait for the page to load JavaScript content. You may need some sort of automation like Selenium to load the page before trying to parse it." From this post: stackoverflow.com/questions/53398785/… Commented May 29, 2020 at 0:28

1 Answer

The data is loaded dynamically via a JSON request, so the table is not present in the HTML that the server initially returns.
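To illustrate why a parser that only sees the initial HTML finds no tables, here is a minimal sketch using only the standard library. The two HTML strings are made-up stand-ins (not the real Nvidia page), and the `/hypothetical/specs.json` path and `loadSpecs` call are hypothetical names for illustration only.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags in an HTML string."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1


def count_tables(html_text):
    parser = TableCounter()
    parser.feed(html_text)
    return parser.tables


# Made-up example of what a server might send: an empty placeholder plus a
# script that fetches the data later (hypothetical path and function name).
static_html = (
    "<html><body><div id='specs'></div>"
    "<script>loadSpecs({'url': '/hypothetical/specs.json'});</script>"
    "</body></html>"
)

# Made-up example of the same page after a browser has run the script.
rendered_html = (
    "<html><body><div id='specs'>"
    "<table><tr><td>Acer</td></tr></table>"
    "</div></body></html>"
)

print(count_tables(static_html))    # no <table> in the initial HTML
print(count_tables(rendered_html))  # the table exists only after rendering
```

A plain HTTP fetch corresponds to the first string, which would be consistent with the "No tables found" error in the question.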

This script loads the JSON data into a DataFrame and prints it:

import re
import json
import requests
import pandas as pd

url = 'https://www.nvidia.com/en-us/geforce/products/g-sync-monitors/specs/'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}

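# Fetch the page HTML, sending a browser-like User-Agent header
html_txt = requests.get(url, headers=headers).text

# The page source contains the relative path of the JSON file in a 'url' entry
json_url = 'https://www.nvidia.com' + re.search(r"'url': '(.*?)'", html_txt).group(1)

# Download the JSON file and parse it into Python objects
data = requests.get(json_url, headers=headers).json()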

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

# Replace each dict cell with its English ('en') value; keep other cells as-is
def fn(x):
    out = []
    for v in x:
        if isinstance(v, dict):
            out.append(v['en'])
        else:
            out.append(v)
    return out

# max_level=0 keeps nested dicts as cell values; apply(fn) processes each column
df = pd.json_normalize(data['data'], max_level=0).apply(fn)
print(df)

Prints:

                  type manufacturer      model  hdr     size lcd type        resolution variable refresh rate range variable overdrive variable refresh input    driver needed
0      G-SYNC ULTIMATE         Acer    CP7271K  Yes       27      IPS    3840x2160 (4K)                     1-144Hz                Yes           Display Port              N/A
1      G-SYNC ULTIMATE         Acer        X27  Yes       27      IPS    3840x2160 (4K)                     1-144Hz                Yes           Display Port              N/A
2      G-SYNC ULTIMATE         Acer        X32  Yes       32      IPS    3840x2160 (4K)                     1-144Hz                Yes           Display Port              N/A
3      G-SYNC ULTIMATE         Acer        X35  Yes       35       VA  3440x1440 (WQHD)                     1-200Hz                Yes           Display Port              N/A
4      G-SYNC ULTIMATE         Asus       PG65  Yes       65       VA    3840x2160 (4K)                     1-144Hz                Yes           Display Port              N/A
..                 ...          ...        ...  ...      ...      ...               ...                         ...                ...                    ...              ...
159  G-SYNC Compatible           LG    2020 ZX  Yes   77, 88     OLED    7680x4320 (8K)                    40-120Hz                 No                   HDMI  445.51 or newer
160  G-SYNC Compatible          MSI   MAG251RX  Yes     24.5      IPS   1920x1080 (FHD)                    48-240Hz                 No           Display Port  441.66 or newer
161  G-SYNC Compatible        Razer  Raptor 27  Yes       27      IPS   2560x1440 (QHD)                    48-144Hz                 No           Display Port  431.60 or newer
162  G-SYNC Compatible      Samsung       CRG5   No       27       VA   1920x1080 (FHD)                    48-240Hz                 No           Display Port  430.86 or newer
163  G-SYNC Compatible    ViewSonic      XG270   No       27      IPS   1920x1080 (FHD)                    48-240Hz                 No           Display Port  441.41 or newer

[164 rows x 11 columns]
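For readers unfamiliar with the regular-expression step, here is a small sketch of how the JSON URL is pulled out of the page text. The HTML snippet and the `/hypothetical/path/monitors.json` path are invented for illustration; the real page source may look different.

```python
import re

# Invented stand-in for the page source; only the "'url': '...'" shape matters.
sample_html = """
<script>
    var config = {'url': '/hypothetical/path/monitors.json', 'other': 'x'};
</script>
"""

# (.*?) captures, non-greedily, everything between the quotes after 'url':
match = re.search(r"'url': '(.*?)'", sample_html)

if match is None:
    # re.search returns None when the pattern is absent, e.g. if the page changes
    raise ValueError("No 'url' entry found in the page source")

relative_path = match.group(1)  # the text captured by the parentheses
json_url = 'https://www.nvidia.com' + relative_path

print(json_url)
```

The captured value is a site-relative path, which is why the script prefixes it with the site's domain before requesting it.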

1 Comment

Thank you for the answer. Can you explain what each part does in more detail though? Sorry if it might be trivial but I don't really understand the html and json stuff.
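As a rough illustration of the last step of the answer's script, the sketch below runs the same `json_normalize` and `apply(fn)` calls on a tiny invented dataset. The sample records only assume the shape implied by the script (some cells are dicts with an `'en'` key, others are plain values); the `'de'` key is hypothetical.

```python
import pandas as pd

# Invented sample mimicking the assumed structure of the downloaded JSON.
sample = {
    "data": [
        {"type": {"en": "G-SYNC ULTIMATE", "de": "G-SYNC ULTIMATE"},
         "manufacturer": "Acer",
         "size": {"en": "27"}},
        {"type": {"en": "G-SYNC Compatible"},
         "manufacturer": "MSI",
         "size": {"en": "24.5"}},
    ]
}


def fn(x):
    # x is one column; dict cells are replaced by their 'en' value
    return [v["en"] if isinstance(v, dict) else v for v in x]


# max_level=0 stops json_normalize from expanding nested dicts into extra
# columns, so each record becomes one row and dicts stay as cell values.
raw = pd.json_normalize(sample["data"], max_level=0)

# apply() calls fn once per column and assembles the returned lists.
df = raw.apply(fn)

print(raw)
print(df)
```

Without `max_level=0`, `json_normalize` would instead flatten the nested dicts into separate columns such as `type.en`, so the script keeps them intact and picks the English value itself.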
