
Hi, I'm learning data science and am trying to build a list of big data companies from a list of companies in various industries.

I have a list of row numbers for the big data companies, named comp_rows. I'm trying to build a new dataframe containing only those companies, selected by row number. To do that I tried appending rows to an existing dataframe, but I got an error. Could someone help?

My dataframe looks like this:

    company_url                  company     tag_line                                            product                                             data
0   https://angel.co/billguard   BillGuard   The fastest smartest way to track your spendin...   BillGuard is a personal finance security app t...   New York City · Financial Services · Security ...
1   https://angel.co/tradesparq  Tradesparq  The world's largest social network for global ...   Tradesparq is Alibaba.com meets LinkedIn. Trad...   Shanghai · B2B · Marketplaces · Big Data · Soc...
2   https://angel.co/sidewalk    Sidewalk    Hoovers (D&B) for the social era                    Sidewalk helps companies close more sales to s...   New York City · Lead Generation · Big Data · S...
3   https://angel.co/pangia      Pangia      The Internet of Things Platform: Big data mana...   We collect and manage data from sensors embedd...   San Francisco · SaaS · Clean Technology · Big ...
4   https://angel.co/thinknum    Thinknum    Financial Data Analysis                             Thinknum is a powerful web platform to value c...   New York City · Enterprise Software · Financia...

My code is below:

bigdata_comp = DataFrame(data=None,columns=['company_url','company','tag_line','product','data'])

for count, item in enumerate(data.iterrows()):
    for number in comp_rows:
        if int(count) == int(number):
            bigdata_comp.append(item)

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-234-1e4ea9bd9faa> in <module>()
      4     for number in comp_rows:
      5         if int(count) == int(number):
----> 6             bigdata_comp.append(item)
      7 

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in append(self, other, ignore_index, verify_integrity)
   3814         from pandas.tools.merge import concat
   3815         if isinstance(other, (list, tuple)):
-> 3816             to_concat = [self] + other
   3817         else:
   3818             to_concat = [self, other]

TypeError: can only concatenate list (not "tuple") to list
  • There's probably a way to do this without loops using indexing or boolean indexing. Please post your desired output for clarification. (Commented May 6, 2015 at 17:39)
  • Thanks! fixxxer explained it for me very well. (Commented May 7, 2015 at 1:25)

2 Answers

8

It seems you are trying to filter an existing dataframe based on indices (stored in your variable comp_rows). You can do this without loops by using loc, as shown below:

In [1161]: df1.head()
Out[1161]: 
          A         B         C         D
a  1.935094 -0.160579 -0.173458  0.433267
b  1.669632 -1.130893 -1.210353  0.822138
c  0.494622  1.014013  0.215655  1.045139
d -0.628889  0.223170 -0.616019 -0.264982
e -0.823133  0.385790 -0.654533  0.582255

To get the rows with index labels 'a', 'b' and 'c', for all columns:

In [1162]: df1.loc[['a','b','c'],:]
Out[1162]: 
          A         B         C         D
a  1.935094 -0.160579 -0.173458  0.433267
b  1.669632 -1.130893 -1.210353  0.822138
c  0.494622  1.014013  0.215655  1.045139

You can read more about it here.
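Applied to the frame in the question, the same idea might look like the sketch below. The two-column `data` frame and the contents of `comp_rows` are stand-ins I made up for illustration; only the company names and URLs come from the question. Since `comp_rows` holds row numbers, `iloc` (position-based) is probably the closest match, and `loc` returns the same rows here only because the index is the default 0..n-1.

```python
import pandas as pd

# Stand-in for the asker's `data` frame (assumption: default integer index).
data = pd.DataFrame({
    "company_url": [
        "https://angel.co/billguard",
        "https://angel.co/tradesparq",
        "https://angel.co/sidewalk",
        "https://angel.co/pangia",
        "https://angel.co/thinknum",
    ],
    "company": ["BillGuard", "Tradesparq", "Sidewalk", "Pangia", "Thinknum"],
})

# Assumed contents of comp_rows: integer row positions of the wanted companies.
comp_rows = [1, 2, 3]

# Position-based selection: no loop and no append needed.
bigdata_comp = data.iloc[comp_rows]

# Label-based selection picks the same rows because the labels are 0..n-1.
same_rows = data.loc[comp_rows, :]

print(bigdata_comp)
```

If the index had been reset, sorted, or replaced by labels, `iloc` would still select by position while `loc` would look the numbers up as labels.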

About your code:

1. You do not need to iterate through a list to see if an item is present in it; use the in operator. For example:

In [1199]: 1 in [1,2,3,4,5]
Out[1199]: True

So, instead of

for number in comp_rows:
    if int(count) == int(number):

do this:

if count in comp_rows:

2. pandas append does not happen in place; it returns a new dataframe, so you have to store the result in a variable. See here. (In pandas 2.0 and later, DataFrame.append has been removed and pd.concat plays the same role.)

3. Appending one row at a time is a slow way to do what you want. Instead, save each row that you want to add into a list of lists, make a dataframe from it, and append it to the target dataframe in one go. Something like this:

temp = []
for count, item in enumerate(df1.loc[['a','b','c'],:].iterrows()):
    # if count in comp_rows:
    temp.append( list(item[1]))


In [1233]: temp
Out[1233]: 
[[1.9350940285526077,
  -0.16057932637141861,
  -0.17345827000000605,
  0.43326722021644282],
 [1.66963201034217,
  -1.1308932586268696,
  -1.2103527446031515,
  0.82213753819050794],
 [0.49462218161377397,
  1.0140133740187862,
  0.2156547595968879,
  1.0451391564351897]]

In [1236]: df2 = df1.append(pd.DataFrame(temp, columns=['A','B','C','D']))

In [1237]: df2
Out[1237]: 
          A         B         C         D
a  1.935094 -0.160579 -0.173458  0.433267
b  1.669632 -1.130893 -1.210353  0.822138
c  0.494622  1.014013  0.215655  1.045139
d -0.628889  0.223170 -0.616019 -0.264982
e -0.823133  0.385790 -0.654533  0.582255
f -0.872135  2.938475 -0.099367 -1.472519
0  1.935094 -0.160579 -0.173458  0.433267
1  1.669632 -1.130893 -1.210353  0.822138
2  0.494622  1.014013  0.215655  1.045139
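For readers on a recent pandas, points 1 to 3 could be combined roughly as follows. This is only a sketch: the small `df1` and the `comp_rows` set are invented for illustration, and `pd.concat` is swapped in for `DataFrame.append`, which no longer exists in pandas 2.0 and later.

```python
import pandas as pd

# Small made-up frame standing in for df1.
df1 = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]},
    index=["a", "b", "c", "d"],
)
comp_rows = {0, 2}  # hypothetical row positions to keep

# Collect the wanted rows in a plain Python list first...
temp = []
for count, (label, row) in enumerate(df1.iterrows()):
    if count in comp_rows:          # membership test instead of an inner loop
        temp.append(row.tolist())

# ...then build one frame from them and concatenate a single time.
picked = pd.DataFrame(temp, columns=df1.columns)
df2 = pd.concat([df1, picked])      # returns a new frame; df1 is unchanged

print(df2)
```

As in the output above, the added rows get fresh integer labels (0, 1, ...) because `picked` is built without an index; passing `ignore_index=True` to `pd.concat` would renumber the whole result instead.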

0

Replace the following line:

for count, item in enumerate(data.iterrows()):

by

for count, (index, item) in enumerate(data.iterrows()):

or, when data has the default integer index (so each index label equals the row position), even simply as

for count, item in data.iterrows():
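The reason this helps is that `iterrows()` yields `(index, row)` tuples, so in the original loop `item` was a tuple, which is what `append` tripped over. A minimal sketch with a made-up two-row frame:

```python
import pandas as pd

# Made-up frame; only the company names are taken from the question.
data = pd.DataFrame({"company": ["BillGuard", "Tradesparq"]})

# Each element produced by iterrows() is a 2-tuple: (index label, row as a Series).
first = next(data.iterrows())
print(type(first))  # <class 'tuple'>

# enumerate() puts a counter in front, so the inner tuple has to be unpacked.
for count, (index, row) in enumerate(data.iterrows()):
    print(count, index, row["company"])
```

Note that unpacking only fixes the TypeError; the result of the append still has to be assigned to a variable to be kept.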
