2

I am reading in some data which looks like this:

Original data

In this dataset, a number of rows have null in column 16. I need to shift the values in such rows to the right, so that the values which begin with "*" (eg. column 16 row 4, column 13 row 5 etc.) will move to the columns right of them. (Eventually I will do this in a loop so that those values will go into column 16) .

The data to the left of these values also have to move too. For example when the data in {column 7 row 16} moves to {column 8, row 16}, the data in {column 2 row 16} should move to {column 3 row 16}.

However, I do not want the data in column 1 (zero index column 0) to move as I will be using that as an index for my data.

Hence my expected output is this:

Expected output after iteration 1

I am using the code below to achieve this:

import StringIO
import pandas

# Store the csv string in a variable and turn that into a dataframe
# This string here is the same as the data in the image above.
gps_string = """2010-01-12 18:00:00,$GPGGA,180439,7249.2150,N,11754.4238,W,2.0,10,0.9,-8.1,M,-12.4,M,,*57,,,
2010-01-12 17:30:00,$GPGGA,173439,7249.2160,N,11754.4233,W,2.0,11,0.8,-4.5,M,-12.4,M,,*5B,,,
2010-01-12 17:00:00,$GPGGA,170439,7249.2152,N,11754.4235,W,2.0,11,0.8,-3.1,M,-12.4,M,,*5C,,,
2010-01-12 16:30:00,N,11754.4210,W,2,9.0,1.1,-13.1,M,-12.4,M,,*6C,,,,,,
2010-01-12 16:00:00,N,11754.4229,W,2,10.0,0.9,-2.9,M,-12.4,M,,*53,,,,,,
2010-01-12 15:30:00,N,11754.4269,W,2,9.0,0.8,-4.3,M,-12.4,M,,*54,,,,,,
2010-01-12 15:00:00,N,11754.4267,W,2,10.0,0.8,-1.6,M,-12.4,M,,*56,,,,,,
2010-01-12 14:30:00,$GPGGA,143439,7249.2152,N,11754.4253,W,2.0,11,0.7,-4.3,M,-12.4,M,,*56,,,
2010-01-12 14:00:00,N,11754.4245,W,2,10.0,0.9,-7.0,M,-12.4,M,,*50,,,,,,
2010-01-12 13:30:00,$GPGGA,133439,7249.2134,N,11754.4243,W,2.0,11,0.7,-10.7,M,-12.4,M,,*61,,,
2010-01-12 13:00:00,N,11754.4245,W,2,10.0,0.8,-5.5,M,-12.4,M,,*56,,,,,,
2010-01-12 12:30:00,N,11754.4226,W,2,10.0,0.9,-7.1,M,-12.4,M,,*59,,,,,,
2010-01-12 12:00:00,N,11754.4238,W,2,10.0,0.8,-6.5,M,-12.4,M,,*51,,,,,,
2010-01-12 11:30:00,N,11754.4227,W,2,10.0,0.8,0.1,M,-12.4,M,,*73,,,,,,
2010-01-12 11:00:00,-7.4,M,-12.4,M,,*5F,,,,,,,,,,,,
2010-01-12 10:30:00,N,11754.4271,W,2,8.0,1.1,-8.4,M,-12.4,M,,*5A,,,,,,
""" 
# Read the csv string into a dataframe, with no headers
# Make the first column with timestamp values the index column.
gps_df = pd.read_csv(StringIO.StringIO(gps_string), header=None, 
index_col=0)
rows_to_shift = gps_df[gps_df[15].isnull()].index

# Shift the rows here.
gps_df.loc[rows_to_shift] = gps_df.loc[rows_to_shift].shift(periods=1, axis=1)
gps_df.to_csv("f.csv") # Creates a file after shift to see the output

I get the following output file when the code is executed.

Output after Execution

From this I see that the shift function creates a column of null(s) at column 5 for some reason, and it also moves the data that was originally in column 10 into column 15, any idea why this might be the case?

Could this be a bug in the dataframe.shift() function? or am I doing something wrong here?

7
  • It would be really helpful if you could provide some data as text rather than as pictures so people could test and compare solutions to what you've tried Commented Jun 17, 2019 at 20:17
  • @G.Anderson, I have added some test data to the question as requested Commented Jun 18, 2019 at 4:07
  • Please see how to create good pandas examples and show both your sample input and desired output, as your problem description is confusing Commented Jun 18, 2019 at 15:58
  • @G.Anderson I've made some edits to the question, is it better now? Commented Jun 18, 2019 at 21:19
  • 1
    I'm also seeing some very weird behavior with .shift() in my testing, trying to figure it out on my end as well. It seems to reorder text columns when applied along axis 1 Commented Jun 18, 2019 at 22:00

1 Answer 1

2

This is a bug in pandas, and more details can be found here .

it seems that shifting object columns will automatically shift to the next column that has an object dtype.

In order to work around this issue, I select the indexes I want to shift, convert all the data in my dataframe to strings, perform the shift, get the data as a csv string again, and then recreate the dataframe to get the previous datatypes.

Below is the code I have used to work around this issue:

import pandas as pd
import StringIO

gps_string = """
"2010-01-12 18:00:00","$GPGGA","180439","7249.2150","N","11754.4238","W","2","10","0.9","-8.1","M","-12.4","M","","*57","","",""
"2010-01-12 17:30:00","$GPGGA","173439","7249.2160","N","11754.4233","W","2","11","0.8","-4.5","M","-12.4","M","","*5B","","",""
"2010-01-12 17:00:00","$GPGGA","170439","7249.2152","N","11754.4235","W","2","11","0.8","-3.1","M","-12.4","M","","*5C","","",""
"2010-01-12 16:30:00","N","11754.4210","W","2","09","1.1","-13.1","M","-12.4","M","","*6C","","","","","",""
"2010-01-12 16:00:00","N","11754.4229","W","2","10","0.9","-2.9","M","-12.4","M","","*53","","","","","",""
"2010-01-12 15:30:00","N","11754.4269","W","2","09","0.8","-4.3","M","-12.4","M","","*54","","","","","",""
"2010-01-12 15:00:00","N","11754.4267","W","2","10","0.8","-1.6","M","-12.4","M","","*56","","","","","",""
"2010-01-12 14:30:00","$GPGGA","143439","7249.2152","N","11754.4253","W","2","11","0.7","-4.3","M","-12.4","M","","*56","","",""
"2010-01-12 14:00:00","N","11754.4245","W","2","10","0.9","-7.0","M","-12.4","M","","*50","","","","","",""
"2010-01-12 13:30:00","$GPGGA","133439","7249.2134","N","11754.4243","W","2","11","0.7","-10.7","M","-12.4","M","","*61","","",""
"2010-01-12 13:00:00","N","11754.4245","W","2","10","0.8","-5.5","M","-12.4","M","","*56","","","","","",""
"2010-01-12 12:30:00","N","11754.4226","W","2","10","0.9","-7.1","M","-12.4","M","","*59","","","","","",""
"2010-01-12 12:00:00","N","11754.4238","W","2","10","0.8","-6.5","M","-12.4","M","","*51","","","","","",""
"2010-01-12 11:30:00","N","11754.4227","W","2","10","0.8","0.1","M","-12.4","M","","*73","","","","","",""
"2010-01-12 11:00:00","-7.4","M","-12.4","M","","*5F","","","","","","","","","","","",""
"2010-01-12 10:30:00","N","11754.4271","W","2","08","1.1","-8.4","M","-12.4","M","","*5A","","","","","",""

 """

gps_df = pd.read_csv(StringIO.StringIO(gps_string), header=None, index_col=0)
rows_to_shift = gps_df[gps_df[15].isnull()].index  # get the indexes to shift
gps_df_all_strings = gps_df.astype(str)  # Convert all the data to be of type str (string)

# Shift the data
gps_df_all_strings.loc[rows_to_shift] = gps_df_all_strings.loc[rows_to_shift].shift(periods=1, axis=1)
s = gps_df_all_strings.to_csv(header=None)  # Put shifted csv data into a string after shifting.
new_gps_df = pd.read_csv(StringIO.StringIO(s), header=None, index_col=0)  # re read csv data.

Sign up to request clarification or add additional context in comments.

1 Comment

This is an involved and for me an unacceptable workaround. Have you found any simpler approach in the past few years?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.