I'm quite new to Python. I'm trying to get data from a website and add part of the webpage to a pandas DataFrame.

This is the code I have so far, but I get an error when adding data to the DataFrame.

The code I have:

url = 'https://oldschool.runescape.wiki/w/Module:Exchange/Anglerfish/Data'
r = requests.get(url)

soup = BeautifulSoup(r.content, 'html.parser')

price_data = soup.find_all('span', class_='s1')
df = pd.DataFrame()

for data in price_data:
  a = pd.DataFrame(data.text.split(":")[0],data.text.split(":")[1])
  df.append(a)

print(df)

The error I'm getting:

ValueError                                Traceback (most recent call last)
<ipython-input-33-963d51917cf2> in <module>()
 10 
 11 for data in price_data:
---> 12   a = pd.DataFrame(data.text.split(":")[0],data.text.split(":")[1])
 13   df.append(a)
 14 

/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
507                 )
508             else:
--> 509                 raise ValueError("DataFrame constructor not properly called!")
510 
511         NDFrame.__init__(self, mgr, fastpath=True)

ValueError: DataFrame constructor not properly called!
Hey, I'm a big rs fan! Commented Jul 9, 2020 at 0:36

3 Answers

1

The two strings you get from data.text.split(":")[0], data.text.split(":")[1] are not in a form that pd.DataFrame() accepts: passed positionally, the first is treated as the data and the second as the index, and a bare string is not valid data. First take a look at the documentation of the function to understand what it expects and how to pass data to it. You can either pass a dictionary that maps column names to values (arrays must be of equal length, or an index must be specified), or lists/arrays as YOBEN_S proposed, for example:

a = pd.DataFrame({'Column_1': [data.text.split(":")[0]], 'Column_2': [data.text.split(":")[1]]})
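
To make the difference between these call forms concrete, here is a minimal sketch. The sample string and the column names timestamp and price are assumptions for illustration; the real span text on the page may look different.

```python
import pandas as pd

# Hypothetical sample of one scraped string in "timestamp:price" form.
text = "1594166400:1789"
timestamp, price = text.split(":", 1)

# Two positional strings are read as (data, index). A bare string is not
# valid data, so pandas raises the ValueError shown in the question.
try:
    pd.DataFrame(timestamp, price)
    constructor_failed = False
except ValueError:
    constructor_failed = True

# A dict of equal-length lists gives one column per key.
from_dict = pd.DataFrame({"timestamp": [timestamp], "price": [price]})

# A list of rows gives one row per inner list; column names are passed separately.
from_rows = pd.DataFrame([[timestamp, price]], columns=["timestamp", "price"])

print(constructor_failed)
print(from_dict)
print(from_rows)
```

Note that a dict of plain scalars (without the surrounding lists) would also fail unless an index is passed, which is why each value is wrapped in a list here.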

Since you are dealing with HTML data, you could also try a different approach using pandas.read_html(), which parses <table> elements in a page into a list of DataFrames; see the pandas documentation for more information.


0

Fix your code by passing the values as a list of rows:

pd.DataFrame([[data.text.split(":")[0],data.text.split(":")[1]]])
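
The constructor call is only half of the problem in the original loop: df.append(a) is called without keeping its result, and in the pandas versions that had DataFrame.append it returned a new frame rather than changing df in place. A minimal sketch of one way around this, assuming the scraped strings look like the hypothetical samples below, is to collect the single-row frames and join them once with pd.concat:

```python
import pandas as pd

# Hypothetical stand-ins for the text of each scraped <span>.
texts = ["1594166400:1789", "1594252800:1802", "1594339200:1795"]

frames = []
for text in texts:
    timestamp, price = text.split(":", 1)
    # One single-row frame per entry, built from a list of rows.
    frames.append(pd.DataFrame([[timestamp, price]], columns=["timestamp", "price"]))

# Join the pieces once; ignore_index renumbers the rows from 0.
df = pd.concat(frames, ignore_index=True)
print(df)
```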


0

After some more research, this is what worked best for me:

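import requests
import pandas as pd
from bs4 import BeautifulSoup

# Get the price data from the Old School RuneScape wiki
url = 'https://oldschool.runescape.wiki/w/Module:Exchange/Anglerfish/Data'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
price_data = soup.find_all('span', class_='s1')
df = pd.DataFrame(columns=['timestamp', 'price'])

for data in price_data:
  # Split the span text once into its timestamp and price parts
  timestamp, price = data.text.split(":")[:2]
  # append() returns a new frame, so reassign it (DataFrame.append was removed in pandas 2.0)
  df = df.append({'timestamp': timestamp, 'price': price}, ignore_index=True)

print(df)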
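
On pandas 2.0 and later, DataFrame.append no longer exists, so the loop above needs a different shape there. A minimal sketch of the same idea, assuming each span's text is a plain "timestamp:price" pair of integers (the sample strings are hypothetical, and the real page text may need extra cleaning such as stripping quotes), collects one dict per row and builds the DataFrame once:

```python
import pandas as pd

# Hypothetical stand-ins for data.text of each scraped <span>.
texts = ["1594166400:1789", "1594252800:1802", "1594339200:1795"]

records = []
for text in texts:
    timestamp, price = text.split(":", 1)
    # Assumption: both parts are whole numbers, so convert them up front.
    records.append({"timestamp": int(timestamp), "price": int(price)})

# Build the frame in a single call instead of growing it row by row.
df = pd.DataFrame(records, columns=["timestamp", "price"])
print(df)
```

Building the frame once also avoids copying the whole DataFrame on every iteration, which is what a row-by-row append does.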
