I am trying to read a data file with pandas.read_csv. The first four lines (the header and three data rows) look like this:

num NED IAU z N det TOF type alph.4 alph1 alph5 alph10 alph15 alph20 alph50  dSIbydnu  DWORRY UVNONE
1 ESO473-G007 J001605-234  0.06401 19.51 det  8.59 u  -0.432  -0.428  -0.413 -0.402 -0.395 -0.389 -0.369 0.017 53.53 UV
2 PKS0023-26 J0025-2602  0.32162 18.36 det  7.95 a  -0.272  -0.437  -0.726 -0.849 -0.919 -0.972 -1.135 -0.414 53.57 UV
3 NGC0315 0055+30  0.01648 18.84 det  7.41 a  -0.248  -0.306  -0.379 -0.398 -0.406 -0.411 -0.417 -0.119 53.60 UV

I call data = pandas.read_csv('radio.dat', sep=' ', header=0), but when I print data, the first three columns become row labels, the column names are shifted three columns to the right of where they belong, and three extra NaN columns appear:

                                           num      NED    IAU    z   N   det ...
1  ESO473-G007             J001605-234     NaN  0.06401  19.51  det NaN  8.59 ...
2  PKS0023-26              J0025-2602      NaN  0.32162  18.36  det NaN  7.95 ...
3  NGC0315                 0055+30         NaN  0.01648  18.84  det NaN  7.41 ...

num should head the column containing 1, 2, 3, ..., NED the next column, IAU the one after that, and the NaN columns shouldn't be there at all.

I have tried setting index_col=0 but that gives me this error:

  File "C:\Python27\lib\site-packages\pandas\io\parsers.py", line 1184, in read
    values = data.pop(self.index_col[i])
IndexError: list index out of range

Setting index_col=False gives:

  File "C:\Python27\lib\site-packages\pandas\io\parsers.py", line 1164, in read
    data = self._reader.read(nrows)
  File "pandas\parser.pyx", line 758, in pandas.parser.TextReader.read (pandas\parser.c:7411)
  File "pandas\parser.pyx", line 780, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:7651)
  File "pandas\parser.pyx", line 855, in pandas.parser.TextReader._read_rows (pandas\parser.c:8484)
  File "pandas\parser.pyx", line 936, in pandas.parser.TextReader._convert_column_data (pandas\parser.c:9490)
  File "pandas\parser.pyx", line 1208, in pandas.parser.TextReader._get_column_name (pandas\parser.c:13172)
IndexError: list index out of range

How do I read this file properly?

Answer

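Your file has a varying number of spaces between fields (one in some places, two in others), so use sep=r'\s+', a regular expression that treats any run of whitespace as a single separator: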

import io
import pandas as pd

t="""num NED IAU z N det TOF type alph.4 alph1 alph5 alph10 alph15 alph20 alph50  dSIbydnu  DWORRY UVNONE
1 ESO473-G007 J001605-234  0.06401 19.51 det  8.59 u  -0.432  -0.428  -0.413 -0.402 -0.395 -0.389 -0.369 0.017 53.53 UV
2 PKS0023-26 J0025-2602  0.32162 18.36 det  7.95 a  -0.272  -0.437  -0.726 -0.849 -0.919 -0.972 -1.135 -0.414 53.57 UV
3 NGC0315 0055+30  0.01648 18.84 det  7.41 a  -0.248  -0.306  -0.379 -0.398 -0.406 -0.411 -0.417 -0.119 53.60 UV"""
# Raw string so '\s' is passed to the regex unchanged (avoids an invalid-escape warning)
data = pd.read_csv(io.StringIO(t), sep=r'\s+', header=0)
print(data)
Output:
   num          NED          IAU        z      N  det   TOF type  alph.4  \
0    1  ESO473-G007  J001605-234  0.06401  19.51  det  8.59    u  -0.432   
1    2   PKS0023-26   J0025-2602  0.32162  18.36  det  7.95    a  -0.272   
2    3      NGC0315      0055+30  0.01648  18.84  det  7.41    a  -0.248   

   alph1  alph5  alph10  alph15  alph20  alph50  dSIbydnu  DWORRY UVNONE  
0 -0.428 -0.413  -0.402  -0.395  -0.389  -0.369     0.017   53.53     UV  
1 -0.437 -0.726  -0.849  -0.919  -0.972  -1.135    -0.414   53.57     UV  
2 -0.306 -0.379  -0.398  -0.406  -0.411  -0.417    -0.119   53.60     UV  
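Once the separator is fixed, the index_col=0 attempt from the question should most likely work too, since the header and the data rows now have the same number of fields. A minimal sketch, reusing the sample text from above; for the real file you would pass 'radio.dat' in place of the io.StringIO object:

```python
import io

import pandas as pd

# Sample copied from the question; spacing (including double spaces) preserved.
sample = """num NED IAU z N det TOF type alph.4 alph1 alph5 alph10 alph15 alph20 alph50  dSIbydnu  DWORRY UVNONE
1 ESO473-G007 J001605-234  0.06401 19.51 det  8.59 u  -0.432  -0.428  -0.413 -0.402 -0.395 -0.389 -0.369 0.017 53.53 UV
2 PKS0023-26 J0025-2602  0.32162 18.36 det  7.95 a  -0.272  -0.437  -0.726 -0.849 -0.919 -0.972 -1.135 -0.414 53.57 UV
3 NGC0315 0055+30  0.01648 18.84 det  7.41 a  -0.248  -0.306  -0.379 -0.398 -0.406 -0.411 -0.417 -0.119 53.60 UV"""

# Whitespace-run separator plus the first column ('num') as the row index.
data = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=0, index_col=0)

print(data.index.name)     # num
print(data.loc[2, "NED"])  # PKS0023-26
```

Note that '\s+' is the one regular-expression separator pandas handles with its default C engine; other multi-character separators fall back to the slower Python engine.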

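The cause is the spacing. The header has two spaces between alph50 and dSIbydnu and between dSIbydnu and DWORRY, and the data rows contain double spaces too (for example before the z values and around the first alph columns). With sep=' ', every extra space becomes an empty field, so each data row ends up with 23 fields against 20 in the header. When the rows have more fields than the header, pandas uses the surplus leading columns as the index, which is where the three row labels and the NaN columns come from.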
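To check the field-count mismatch without pandas, one can split the header and the first data row in the two ways. This is only an approximation of the parser (str.split(' ') standing in for sep=' ', and str.split() with no argument standing in for sep=r'\s+'), but for this data it reproduces the 20 vs. 23 counts:

```python
# Header and first data row copied from the question, spacing preserved.
lines = [
    "num NED IAU z N det TOF type alph.4 alph1 alph5 alph10 alph15 alph20 alph50  dSIbydnu  DWORRY UVNONE",
    "1 ESO473-G007 J001605-234  0.06401 19.51 det  8.59 u  -0.432  -0.428  -0.413 -0.402 -0.395 -0.389 -0.369 0.017 53.53 UV",
]

# Splitting on exactly one space turns each extra space into an empty field.
single_space_counts = [len(line.split(" ")) for line in lines]

# split() with no argument treats any run of whitespace as one separator.
whitespace_counts = [len(line.split()) for line in lines]

print("single space:", single_space_counts)  # [20, 23]
print("whitespace run:", whitespace_counts)  # [18, 18]
```

The difference of 3 between the header and the row under single-space splitting matches the three index columns in the question's output.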
