I have initialised a dataframe like this:

df = pd.DataFrame(columns=["stockname","timestamp","price","volume"])
df.timestamp = pd.to_datetime(df.timestamp, format = "%Y-%m-%d %H:%M:%S:%f")
df.set_index(['stockname', 'timestamp'], inplace = True)

The data actually arrives as a stream, but for the sake of the example let me read it from a file:

filehandle = open("datasource")

for line in filehandle:
    line = line.rstrip()
    data = line.split(",")
    stockname = data[4]
    price = float(data[3])
    timestamp = pd.to_datetime(data[0], format = "%Y-%m-%d %H:%M:%S:%f")
    volume = int(data[6])

    df.loc[stockname, timestamp] = [price, volume]

filehandle.close()

print(df)

But this raises an error:

ValueError: cannot set using a multi-index selection indexer with a different length than the value

  • Can you add a sample of "datasource"? Commented Dec 10, 2017 at 9:55
  • You don't have to do all the heavy work of stripping and splitting; just use pd.read_csv. If you add a sample of the text file, I can show you how to do that. Commented Dec 10, 2017 at 9:57

3 Answers

11

Specify the columns you are assigning to as the second argument to .loc, and pass the index levels as a tuple, e.g.:

df = pd.DataFrame(columns=["a","b","c","d"])
df.set_index(['a', 'b'], inplace = True)

df.loc[('3','4'),['c','d']] = [4,5]

df.loc[('4','4'),['c','d']] = [3,1]

      c    d
a b          
3 4  4.0  5.0
4 4  3.0  1.0
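
For a stream of lines like the one in the question, an alternative to assigning row by row is to collect the parsed records in a list and build the frame once at the end. The following is only a sketch under assumptions: the field positions (0 = timestamp, 3 = price, 4 = stock name, 6 = volume) are taken from the question's loop, the sample lines are the ones quoted elsewhere in this thread, and the names `lines`, `records` and `ticks` are made up for illustration.

```python
import pandas as pd

# Sample lines standing in for the real stream (assumption: same layout
# as the question's "datasource" file).
lines = [
    "2017-12-08 15:29:58:740657,245.0,426001,248.65,APPL,190342,2075673,249.35,244.2",
    "2017-12-08 16:29:58:740657,245.0,426001,248.65,GOOGL,190342,2075673,249.35,244.2",
]

records = []
for line in lines:
    data = line.rstrip().split(",")
    # One dict per tick; keys become column names.
    records.append({
        "stockname": data[4],
        "timestamp": pd.to_datetime(data[0], format="%Y-%m-%d %H:%M:%S:%f"),
        "price": float(data[3]),
        "volume": int(data[6]),
    })

# Build the frame once, then move the two key columns into a MultiIndex.
ticks = pd.DataFrame(records).set_index(["stockname", "timestamp"])
print(ticks)
```

This avoids growing the DataFrame on every iteration, which tends to be slow for long streams.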

Also, if you have a comma-separated file, you can use read_csv instead of parsing each line yourself, e.g.:

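import io
import pandas as pd

# Sample data; the lines must not be indented, or the timestamps pick up leading spaces
st = '''2017-12-08 15:29:58:740657,245.0,426001,248.65,APPL,190342,2075673,249.35,244.2
2017-12-08 16:29:58:740657,245.0,426001,248.65,GOOGL,190342,2075673,249.35,244.2
2017-12-08 18:29:58:740657,245.0,426001,248.65,GOOGL,190342,2075673,249.35,244.2
'''
# Replace io.StringIO(st) with the path to your file
df = pd.read_csv(io.StringIO(st), header=None)
# Parse the timestamps (note the ':' before the microseconds)
df[0] = pd.to_datetime(df[0], format="%Y-%m-%d %H:%M:%S:%f")
# Now set the index and select the columns you want
df.set_index([0, 4])[[1, 7]]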

                                   1       7
 0                          4                   
2017-12-08 15:29:58.740657 APPL   245.0  249.35
2017-12-08 16:29:58.740657 GOOGL  245.0  249.35
2017-12-08 18:29:58.740657 GOOGL  245.0  249.35
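
To tie this back to the question's layout, here is a hedged sketch that names the columns and then pulls out all rows for one stock. The column positions (0, 3, 4, 6) and the names `stockname`, `timestamp`, `price` and `volume` come from the question; the variable names `raw`, `ticks` and `googl` are illustrative only.

```python
import io
import pandas as pd

raw = (
    "2017-12-08 15:29:58:740657,245.0,426001,248.65,APPL,190342,2075673,249.35,244.2\n"
    "2017-12-08 16:29:58:740657,245.0,426001,248.65,GOOGL,190342,2075673,249.35,244.2\n"
    "2017-12-08 18:29:58:740657,245.0,426001,248.65,GOOGL,190342,2075673,249.35,244.2\n"
)

# Read everything, keep the four fields the question uses, and name them.
ticks = pd.read_csv(io.StringIO(raw), header=None)[[0, 3, 4, 6]].rename(
    columns={0: "timestamp", 3: "price", 4: "stockname", 6: "volume"}
)
ticks["timestamp"] = pd.to_datetime(ticks["timestamp"], format="%Y-%m-%d %H:%M:%S:%f")
ticks = ticks.set_index(["stockname", "timestamp"]).sort_index()

# All ticks for one stock: select on the first index level.
googl = ticks.xs("GOOGL", level="stockname")
print(googl)
```

Because `stockname` is the first level of the index, `ticks.loc["GOOGL"]` should give the same rows as the `xs` call.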

8 Comments

This worked. Now, as you can see, I have stockname and timestamp in the index. I want to access all the data for a particular stock, but I cannot write df[df.index == "XYZ"]. How do I select a particular stock's data from the dataframe?
@Tahseen I need to see what the data actually looks like.
2017-12-08 15:29:58:740657,245.0,426001,248.65,APPL,190342,2075673,249.35,244.2
This is one line of the log. There will be many more, with the same or different timestamps and the same or different stock names. So a single file stores tick prices for multiple stocks at any particular time. That is why I indexed on time and stockname.
But this has worked perfectly fine for me the way you did it. Thanks.
2

You might want to use df.at[index, column_name] = value to avoid this error.
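
A minimal sketch of what that could look like with a two-level index. `df.at` reads or writes a single cell, so it takes one row label and one column label; with a MultiIndex the row label is a tuple, and price and volume need separate assignments. The frame below is built up front with made-up sample rows (an assumption, so that the rows being written already exist), and the names `idx` and `key` are illustrative.

```python
import pandas as pd

# Small frame with the question's index levels and columns.
idx = pd.MultiIndex.from_tuples(
    [
        ("APPL", pd.Timestamp("2017-12-08 15:29:58.740657")),
        ("GOOGL", pd.Timestamp("2017-12-08 16:29:58.740657")),
    ],
    names=["stockname", "timestamp"],
)
df = pd.DataFrame({"price": [248.65, 248.65], "volume": [2075673, 2075673]}, index=idx)

# The row label is the full (stockname, timestamp) tuple; the column is named explicitly.
key = ("GOOGL", pd.Timestamp("2017-12-08 16:29:58.740657"))
df.at[key, "price"] = 250.0
df.at[key, "volume"] = 100

print(df)
```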

1

I think what you are looking for is:

df.loc[(a, b), :] = [c, d]

Here is an example with your dataframe:

for i in range(3):
    for j in range(3):
        df.loc[(str(i),str(j)),:] = [i,j]

Output:

     c  d
a b      
0 0  0  0
  1  0  1
  2  0  2
1 0  1  0
  1  1  1
  2  1  2
2 0  2  0
  1  2  1
  2  2  2
