I have a CSV file that contains multiple lines of numeric string values in the following format:

Sample of two lines from the CSV:

[['ASA00211063', '2005'], [-0.434358, -0.793407, -1.070576, nan, nan,...(365 values)], [0.354615, -0.108102,nan,...(365 values)]]

[['AFR02516075', '1998'], [-0.434358, -0.7934039, -1.0705767, nan, nan,...(365 values)], [0.3546153, -0.1081022, nan,...(365 values)]]

How can I split and rejoin the contents of the CSV file into lists, such that the output is:

list[0] = ['ASA00211063', '2005'], ['AFR02516075', '1998']...
list[1] = [-0.434358, -0.793407, -1.070576, nan, nan,..., 0.354615, -0.108102,nan,...(730 values)]
list[2] = [-0.434358, -0.7934039, -1.0705767, nan, nan,..., 0.3546153, -0.1081022, nan,...(730 values)]
  • does the csv contain the [[ and ]] symbols? Commented Jun 7, 2015 at 21:53
  • Yes, it does have the [[ and ]] symbols and gets treated as strings Commented Jun 7, 2015 at 21:57

2 Answers

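To read a Python literal structure from a text file, use ast.literal_eval(). It only evaluates Python literals (strings, numbers, lists, tuples, dicts and so on), which prevents anyone from embedding anything nasty in an input file. It cannot parse a bare nan, though, which is why the code below quotes each nan first.

This code goes through each line in your input file and appends the parsed result to a list, from which you can decide what to do.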

import ast

l = []
with open('inputfile.txt') as f:
    for line in f:
        # literal_eval cannot parse a bare nan, so quote it as a string first
        edited_line = line.replace('nan', '"nan"')
        l.append(ast.literal_eval(edited_line))
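
One caveat with the plain str.replace() above: it quotes every occurrence of the letters nan, including any that happen to sit inside a station code, and the result would still parse, so the code would be altered silently. A minimal sketch that quotes only standalone nan tokens, assuming a word-boundary regex is acceptable for this data (parse_line and the code 'BAnanA01' are made up for illustration):

```python
import ast
import re


def parse_line(line):
    """Parse one input line, quoting only standalone nan tokens."""
    # \b matches at word boundaries, so 'nan' embedded in a longer
    # identifier such as 'BAnanA01' is left untouched.
    quoted = re.sub(r'\bnan\b', '"nan"', line)
    return ast.literal_eval(quoted)


# Hypothetical sample line, shortened to two values per list.
sample = "[['BAnanA01', '2005'], [-0.5, nan], [0.25, nan]]"
row = parse_line(sample)
print(row)
```

The parsed row still holds the string 'nan' as a placeholder, exactly as with str.replace(), so any later conversion step works unchanged.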

This version additionally converts each 'nan' string to numpy.nan and then regroups the lists:

import ast
from numpy import nan

l = []
with open('inputfile.txt') as f:
    for line in f:
        edited_line = line.replace('nan', '"nan"')
        edited_line = ast.literal_eval(edited_line)
        # swap the placeholder strings for numpy.nan
        edited_line = [[nan if v == 'nan' else v for v in vals] for vals in edited_line]
        l.append(edited_line)

# combine elements [1] and [2] of each sublist into one list of length 730
# l[0] is the list of ['code', 'yyyy'] pairs
# l[1], ..., l[n] each hold the 730 data values of one input row
l = [[subl[0] for subl in l]] + [subl[1] + subl[2] for subl in l]

gives output:

>>> for row in l: print(row)
[['ASA00211063', '2005'], ['AFR02516075', '1998']]
[-0.434358, -0.793407, -1.070576, nan, nan, 0.354615, -0.108102, nan]
[-0.434358, -0.7934039, -1.0705767, nan, nan, 0.3546153, -0.1081022, nan]
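
On the question of merging the two 365-value lists: plain list concatenation with + is usually sufficient. If numpy is not available, float('nan') can stand in for numpy.nan. A minimal sketch of the regrouping step on already-parsed rows, where regroup is a hypothetical helper name and the sample values are shortened:

```python
import math


def regroup(records):
    """Turn [[key, first, second], ...] into [[key, ...], series, series, ...]."""
    keys = [rec[0] for rec in records]
    # float() accepts the string 'nan' as well as ordinary numbers,
    # so the quoted placeholders become real NaN values here.
    series = [[float(v) for v in rec[1] + rec[2]] for rec in records]
    return [keys] + series


# Hypothetical parsed rows, two values per half instead of 365.
records = [
    [['ASA00211063', '2005'], [-0.434358, 'nan'], [0.354615, 'nan']],
    [['AFR02516075', '1998'], [-0.434358, 'nan'], [0.3546153, 'nan']],
]
result = regroup(records)
print(result[0])
print(len(result[1]), math.isnan(result[1][1]))
```

Since NaN never compares equal to itself, math.isnan() is the reliable way to test for it afterwards.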

6 Comments

Thanks for your guidance. I am getting the following TypeError. Any further guidance? TypeError: literal_eval() takes exactly 1 argument (0 given)
Thanks again. Yes, I get a "ValueError: malformed string" and I was suspecting it could be nan in my strings
@ASG hope that's solved your problem... I've also added an adjustment to convert all nan to numpy objects
Thanks so much @alexmcf, I will try that code snippet now
Quick question. I see that len(l[0][1]) is 365 and len(l[0][2]) is 365. If I were to combine these two into one list, what is the most efficient way to do that?
I think this code satisfies your requirements (note that the values are kept as strings):

#!/usr/bin/python

import re

data = [[]]

with open('in') as f:
    for line in f:
        line = line.strip()
        # drop the outermost pair of brackets
        line = re.match(r'\[?(.*)\]', line).group(1)

        # split on the ", " that precedes each inner "["
        res = re.split(r', (?=\[)', line)

        data[0].append(res[0])
        # join the two value fragments into one string
        string = res[1] + res[2]
        data.append([string])

for v in data:
    print("{}\n".format(v))

Input:

[['ASA00211063', '2005'], [-0.434358, -0.793407, -1.070576, nan, nan,...(365 values)], [0.354615, -0.108102,nan,...(365 values)]]
[['AFR02516075', '1998'], [-0.434358, -0.7934039, -1.0705767, nan, nan,...(365 values)], [0.3546153, -0.1081022, nan,...(365 values)]]
[['XXX02516075', '1998'], [-1.434358, -1.7934039, -1.1705767, nan, nan,...(365 values)], [0.7546153, -0.7081022, nan,...(365 values)]]

Output:

data[0]:
["['ASA00211063', '2005']", "['AFR02516075', '1998']", "['XXX02516075', '1998']"]

data[1]:
['[-0.434358, -0.793407, -1.070576, nan, nan,...(365 values)][0.354615, -0.108102,nan,...(365 values)]']

data[2]:
['[-0.434358, -0.7934039, -1.0705767, nan, nan,...(365 values)][0.3546153, -0.1081022, nan,...(365 values)]']

data[3]:
['[-1.434358, -1.7934039, -1.1705767, nan, nan,...(365 values)][0.7546153, -0.7081022, nan,...(365 values)]']
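
The entries in data are still strings. If numeric lists are needed, one possible follow-up step is to pull the tokens out of each joined fragment and pass them to float(), which accepts 'nan' directly. A rough sketch, assuming the fragments contain only numbers and nan (the elided "...(365 values)" placeholders shown above would not parse); parse_numbers is a hypothetical helper:

```python
import math
import re


def parse_numbers(fragment):
    """Convert a string like '[1.0, nan][2.5, nan]' into a flat list of floats."""
    # Grab every run of characters that is not a bracket, comma or whitespace.
    tokens = re.findall(r'[^\[\],\s]+', fragment)
    return [float(tok) for tok in tokens]


# Shortened, hypothetical fragment in the same shape as data[1][0].
values = parse_numbers('[-0.434358, -0.793407, nan][0.354615, -0.108102,nan]')
print(len(values), math.isnan(values[2]))
```

Because the brackets are discarded, the two halves end up in one flat list, which matches the 730-value layout asked for in the question.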

1 Comment

Thanks @Stevieb. I will try this code snippet as well to learn how to use regex better as I was struggling with these. Much thanks
