
I have a series of dataframes that look like this:

   a    b    r
1  43  630  587    

   d    b    c
1  34  30  87

I want to create a new dataframe which looks like:

    a    b    r   d   c
   43  630  587   0   0
    0   30    0  34  87

I have used the code:

appended_data= pd.concat(appended_data, axis=0)

where the list appended_data contains the individual dataframes as elements. Earlier, when I used this with another dataset, it didn't throw any error, but with the new dataset it raises ValueError: Plan shapes are not aligned

Note: the earlier dataset also had duplicate columns and it worked fine then. I have also updated pandas. These were the suggestions I found online.

full code:

dir_list = [benign_freq_dir, malign_freq_dir]

appended_data = []

for l in dir_list:
    for root, dirs, files in os.walk(l):
        #print(root)
        for name in files:

            file = open(root+"/"+name,'r')
            print(name)
            print("\n")
            df = pd.read_csv(file,header=None,error_bad_lines=False)   #In windows and python3 always pass file object not the path directly in pd.read_csv
            #print(df)
            df = df.rename(columns={0: 'col'})
            #print(df)
            df = pd.DataFrame(df.col.str.split(' ',1).tolist(), columns = ['col1','col2']).T.reset_index(drop=True)
            df = df.rename(columns=df.iloc[0]).drop(df.index[0])
            print(df)

            appended_data.append(df)
            if l==benign_freq_dir:
                df['class']=0
            else:
                df['class']=1

#for l in appended_data:
#   print(l)
#   print(type(l))
appended_data= pd.concat(appended_data, axis=0,sort=False)


Edit: output of the following (posted as an image, not reproduced here):

for dfx in appended_data:
    print(dfx.head(2).to_dict())

  • Those do not look like series, though. It looks like you have two dataframes. (Commented Oct 21, 2018 at 8:30)

2 Answers


You will need an outer join for that.

import pandas as pd

df1 = pd.DataFrame({
    'a': [43],
    'b': [630],
    'r': [587]
})

df2 = pd.DataFrame({
    'd': [34],
    'b': [30],
    'c': [87]
})

df3 = df1.merge(df2, how='outer').fillna(0)
print(df3)

Yields what you need.

      a    b      r     d     c
0  43.0  630  587.0   0.0   0.0
1   0.0   30    0.0  34.0  87.0

Docs on pd.merge
Docs on outer join

EDIT: OP, pd.concat should work as expected and Anton has proven that.

Since pd.merge was my answer, I have to stick with that.

Here is a sketch if you want to merge a list of dataframes.

from functools import reduce

def merge(dfs):
    # Outer-merge the dataframes pairwise, left to right.
    # The merge raises at the first pair that cannot be combined.
    merged = reduce(lambda left, right: left.merge(right, how='outer'), dfs)
    # Replace the gaps left by the outer join with 0.
    return merged.fillna(0)

df = merge(list_of_dfs)

That way the exception is raised at the first dataframe that cannot be merged, which points you to the one at fault, because clearly there's a problem with your data. Catch the exception and use .describe() and .info() on the offending dataframe to debug this issue.
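One way to narrow this down is to check each dataframe for repeated column labels before merging or concatenating. The MergeError quoted in the comments lists '---' twice, so duplicate labels are a plausible, though unconfirmed, cause. A minimal sketch, where find_duplicate_columns and the sample frames are hypothetical names and data, not part of the original post:

```python
import pandas as pd

def find_duplicate_columns(frames):
    """Map the position of each frame with repeated column labels to those labels."""
    report = {}
    for i, frame in enumerate(frames):
        # Index.duplicated() flags every occurrence after the first.
        dupes = frame.columns[frame.columns.duplicated()].unique().tolist()
        if dupes:
            report[i] = dupes
    return report

# Hypothetical sample data: the second frame repeats the label '---'.
good = pd.DataFrame([[43, 630, 587]], columns=['a', 'b', 'r'])
bad = pd.DataFrame([[1, 2, 3]], columns=['read', '---', '---'])
print(find_duplicate_columns([good, bad]))
```

Running this over the list of dataframes before calling merge or concat shows which files produced the repeated labels, so they can be inspected individually.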


2 Comments

I am storing the dataframes in a list and then I need to stack them up.
This gives the error: pandas.errors.MergeError: Data columns not unique: Index(['getpid', 'msgget', 'ioctl', 'mmap2', 'mprotect', 'clone', 'recv', 'close', 'munmap', 'class', 'getuid32', 'semget', 'open', 'write', 'dup', 'access', 'stat64', 'fstat64', '_llseek', 'read', 'lseek', 'fcntl64', 'flock', 'pread', 'gettimeofday', 'brk', 'sigprocmask', 'getpriority', 'getdents64', 'writev', 'ipc_subcall', 'chmod', 'sched_yield', 'pipe', 'fork', '---', '---'], dtype='object')

You can use pd.concat. You should however pass both dataframes.

pd.concat([df1,df2], axis=0, sort=False).fillna(0) #.astype(int) for ints

#      a    b      r     d     c
#0  43.0  630  587.0   0.0   0.0
#0   0.0   30    0.0  34.0  87.0

Sample data from Tom Wojcik.
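If the dataframes in the list turn out to contain repeated column labels, one possible workaround is to drop the repeats before concatenating. This is only a sketch: it assumes that keeping the first of each repeated column is acceptable for the data, and concat_unique_columns and the frame named dup are hypothetical, not from the original post.

```python
import pandas as pd

def concat_unique_columns(frames):
    """Keep the first of each repeated column label per frame, then stack the frames."""
    cleaned = [f.loc[:, ~f.columns.duplicated()] for f in frames]
    # ignore_index=True gives the stacked rows a fresh 0..n-1 index.
    return pd.concat(cleaned, axis=0, sort=False, ignore_index=True).fillna(0)

df1 = pd.DataFrame({'a': [43], 'b': [630], 'r': [587]})
# Hypothetical frame in which the label 'c' appears twice.
dup = pd.DataFrame([[34, 30, 87, 5]], columns=['d', 'b', 'c', 'c'])

result = concat_unique_columns([df1, dup])
print(result)
```

Dropping columns silently discards data, so it is worth checking first why the labels repeat in the source files.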

9 Comments

As I have mentioned, I have done exactly the same, but I still get the error ValueError: Plan shapes are not aligned
@ubuntu_noob In that case I suggest you share some data we can play with, just like Tom provided. See minimal reproducible example for more info.
That is the type of data I have... that's why it's confusing... it worked before, but now with the new dataset it does not.
@ubuntu_noob Yes, but you should share your data as a verifiable example. If you look at Tom's code, for instance, it is runnable. If you create a runnable example of your problem and point out what is wrong, you get help faster and it serves the community.
Yes, I understand your points and they are valid ones too... the example provided by Tom is actually a correct representation of the data I have... could you help me identify where the problem is with my data?
