This question continues from my previous question. Thanks to many people's help, I was able to modify my code as shown below.

import csv
with open("SURFACE2", "rb") as infile, open("output.txt", "wb") as outfile:
    reader = csv.reader(infile, delimiter=" ")
    writer = csv.writer(outfile, delimiter=" ")
    for row in reader:
        row[18] = "999"                  

        writer.writerow(row)

I only changed the delimiter from "\t" to " ". With the previous delimiter, the code only worked up to row[0]; with " " it works up to row[18].

15.20000           120.60000 98327      get data information here.  SURFACE DATA FROM ??????????? SOURCE    FM-12 SYNOP                                                                                155.00000         1         0         0         0         0         T         F         F   -888888   -888888      20020601030000 100820.00000   

In the data line above, row[18] falls in the gap between 15.20000 and 120.60000.

I am not sure what happens between these two values. Maybe the delimiter changes? Visually, however, I can't see any difference. Is there a way to tell whether the delimiter changed and, if so, do you have any idea how to handle multiple delimiters in one script?

Any idea or help would be really appreciated.

Thank you, Isaac


The results from repr(next(infile)):

'            15.20000           120.60000 98327      get data information here.  SURFACE DATA FROM ??????????? SOURCE    FM-12 SYNOP                                                                                155.00000         1         0         0         0         0         T         F         F   -888888   -888888      20020601030000 100820.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0\n'
'  99070.00000      0    155.00000      0    303.20001      0    297.79999      0      3.00000      0    140.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0\n'
'-777777.00000      0-777777.00000      0      1.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0\n'
'      1      0      0\n'
'            55.10000            -3.60000 03154      get data information here.  SURFACE DATA FROM ??????????? SOURCE    FM-12 SYNOP                                                                                 16.00000         1         0         0         0         0         T         F         F   -888888   -888888      20020601030000-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0\n'
'-888888.00000      0     16.00000      0    281.20001      0    279.89999      0      0.00000      0      0.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0\n'
'-777777.00000      0-777777.00000      0      1.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0\n'
'      1      0      0\n'

As you can see, the first four lines should actually be one line. For some reason, the full line seems to be divided into 4 parts. Do you have any idea why? Thank you, Isaac

16
  • 1
    Can you clarify what it means when you say "the code can now work until row[18]"? Commented Feb 27, 2015 at 3:50
  • 1
    I don't understand your question - what is the problem you are facing? Commented Feb 27, 2015 at 3:50
  • 1
    Ok so maybe there are exactly 19 fields in the row (row[18] being the last one)? Commented Feb 27, 2015 at 3:56
  • 1
    There must be a row that just doesn't have that many columns. In your loop you can say print(len(row)) to see how many columns there are in each row. Commented Feb 27, 2015 at 4:00
  • 1
    Right but the elements in the list row are fields, not characters. Commented Feb 27, 2015 at 4:02

2 Answers

2

N.B. The file format is discussed on page 19 of this document. This more-or-less agrees with the sample data.

EDIT

OK, after considering the various comments, the additional answers, and rereading the original question, it seems that the file in question is not a CSV file. It is weather observation data formatted as "little_r", which uses fixed-width fields padded with spaces. There is not much info available so I'm guessing, but each group of 4 lines seems to comprise a single observation. From your previous question it seems that you want to update the 3rd column in the first line? The other 3 lines would be skipped. Then update the 3rd column in the first line of the next set of 4 lines, and so on.

An example from the OP:

            15.20000           120.60000 98327      get data information here.  SURFACE DATA FROM ??????????? SOURCE    FM-12 SYNOP                                                                                155.00000         1         0         0         0         0         T         F         F   -888888   -888888      20020601030000 100820.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0
  99070.00000      0    155.00000      0    303.20001      0    297.79999      0      3.00000      0    140.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0
-777777.00000      0-777777.00000      0      1.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0-888888.00000      0
      1      0      0

The first 2 columns of the first line are (I'm guessing) the latitude and longitude for the observations. I have no idea what the 3rd column 98327 is, but this is the column that the OP wants to update (based on previous question).

It's not a CSV file, so don't process it as one. Instead, because the fields have fixed widths, we know the offset and width of the field that needs to be updated. Based on the sample data, the 3rd column occupies the 5 characters at 0-based indices 41 to 45, i.e. the slice line[41:46]. So, to update the data and write to a new file:

offset_col_3 = 41
length_col_3 = 5

with open('SURFACE2') as infile, open('output.txt', 'w') as outfile:
    for line_no, line in enumerate(infile):
        if line_no % 4 == 0:    # every 4th line starting with the first
            line = '{}{:>5}{}'.format(line[:offset_col_3], 999, line[offset_col_3+length_col_3:])
        outfile.write(line)
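
Before running this over the whole file, it may be worth checking the offset on a single line. The following is a minimal sketch, not part of the original answer: `replace_fixed_field` is a hypothetical helper, and `sample` is a reconstruction of the start of the first line from the question, assuming the first two columns are each 20 characters wide and right-aligned (which is what the sample suggests).

```python
# Reconstructed start of the first sample line (assumption: two right-aligned
# 20-character columns, then a space and the 5-character 3rd column).
sample = "15.20000".rjust(20) + "120.60000".rjust(20) + " 98327" + " " * 6

OFFSET = 41  # 0-based start of the 3rd column, as measured on the sample
WIDTH = 5    # number of characters in the 3rd column


def replace_fixed_field(line, offset, width, value):
    """Return line with the field at [offset, offset + width) replaced by
    value, right-aligned so the overall line length does not change."""
    text = str(value)
    if len(text) > width:
        raise ValueError("value does not fit in the field")
    return line[:offset] + text.rjust(width) + line[offset + width:]


current = sample[OFFSET:OFFSET + WIDTH]
updated = replace_fixed_field(sample, OFFSET, WIDTH, 999)

print(repr(current))                            # '98327'
print(repr(updated[OFFSET:OFFSET + WIDTH]))     # '  999'
print(len(updated) == len(sample))              # True
```

Keeping the line length unchanged matters here because every later field is located by position. Note also that the loop above relies on every observation spanning exactly four lines, as in the sample; if that does not hold for the whole file, the `line_no % 4` test would pick the wrong lines.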

Original answer

Try reading line 20 (row[19]) (assuming no header line in the CSV file, otherwise line 21) from the file and inspecting it in Python:

with open("SURFACE2") as infile:
    for i in range(20):
        print(repr(next(infile)))  # parenthesized form works in Python 2 and 3

The last line displayed will be line 20 of the file (index 19). If, for example, tabs are delimiters then you might see \t in between the columns of data. Compare the previous line to the last line to see if there is a difference in the delimiter used.

If you find that your CSV file is mixing delimiters, then you might have to split the fields manually.
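
As an illustration of what the csv module does with this kind of data, here is a small sketch (Python 3) that feeds a reconstructed start of the question's first line to `csv.reader` with a single-space delimiter. The `line` string is an assumption: it rebuilds the first three columns as two right-aligned 20-character fields followed by a space and the station number.

```python
import csv

# Reconstructed start of the first sample line (assumed layout, see above).
line = "15.20000".rjust(20) + "120.60000".rjust(20) + " 98327"

# csv.reader accepts any iterable of strings, so a one-element list will do.
row = next(csv.reader([line], delimiter=" "))

# Every single space is treated as a separator, so runs of padding spaces
# turn into empty fields.
print(len(row))                     # 25
print(row[12], row[23], row[24])    # 15.20000 120.60000 98327
print(repr(row[18]))                # ''
```

So the delimiter does not change partway through the line: row[18] is simply one of the empty fields produced by the padding between the first two values, which is why writing "999" there lands in the gap. Dropping the empty strings would recover these three values, but later in the line some fields touch each other (for example `0-888888.00000`), so splitting on whitespace cannot separate everything.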


7 Comments

He doesn't seem to be talking about row 18 but rather column 18 in a particular row.
@AndrewMagee. Oh, well if rows and columns are being confused then it's difficult to know what is being asked.
@Isaac : no problem. This answer is still useful for you to inspect the data.
Well since my post was deleted by the moderator I assume that means we do not need no one else around here. That is what I have been in the USMC for the past 10 years, where we can handle things like that as gentlemen in a secluded location. I do this in good faith but your answer is helpful mine is not. Sorry kid but the Tyranny of the Majority has spoken. I do not live from programming, that is why I can live for it, by pure interest, but that is not helpful around here, got it, enjoy.
I spent near 10 hours editing and adding information to that post, but that annoys people and take some valuable resources that we can spend on your answers. Again thank you assholes, enjoy.
1

The csv module is not the right tool to use when you have fixed-width fields in your file. What you need to do is explicitly use the field lengths to split up the lines. For example:

# This would be your whole file
data = "\n".join([
    "abc  def gh i",
    "jk   lm  n  o",
    "p    q   r  s",
])
field_widths = [5, 4, 3, 1]

def fields(line, field_widths):
    pos = 0
    for length in field_widths:
        yield line[pos:pos + length].strip()
        pos += length

for line in data.split("\n"):
    print(list(fields(line, field_widths)))

will give you:

['abc', 'def', 'gh', 'i']
['jk', 'lm', 'n', 'o']
['p', 'q', 'r', 's']
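
The same generator can be tried on the data from the question. This is only a sketch: the widths `[20, 20, 6]` are an assumption read off the sample line (two 20-character columns, then a space plus the 5-character station number), not taken from the format documentation, and `header_start` is a reconstruction of the start of that line.

```python
def fields(line, field_widths):
    """Yield successive fixed-width slices of line, stripped of padding."""
    pos = 0
    for length in field_widths:
        yield line[pos:pos + length].strip()
        pos += length


# Reconstructed start of the first sample line from the question.
header_start = "15.20000".rjust(20) + "120.60000".rjust(20) + " 98327"

# Assumed widths for the first three columns, inferred from the sample.
widths = [20, 20, 6]

parsed = list(fields(header_start, widths))
print(parsed)   # ['15.20000', '120.60000', '98327']
```

Because the slices are taken by position, this keeps working where values are not separated by any space at all, which is where a delimiter-based split breaks down.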

1 Comment

thank you, I will try your solution and let you know the results.
