
Is there a Pythonic way to refer to columns of 2D lists by name?

I import a lot of tables from the web, so I made a general-purpose function that creates two-dimensional lists out of various HTML tables. So far so good. But the next step is often to parse the table row by row.

# Sample table. 
# In real life I would do something like: table = HTML_table('url', 'table id')
table = [
    ['Column A', 'Column B', 'Column C'],
    ['One', 'Two', 3],
    ['Four', 'Five', 6]
]

# Current code:
iA = table[0].index('Column A')
iC = table[0].index('Column C')
for row in table[1:]:
    process_row(row[iA], row[iC])

# Desired code:
for row in table[1:]:
    process_row(row['Column A'], row['Column C'])

3 Answers


I think you'll really like the pandas module! http://pandas.pydata.org/

Put your list into a DataFrame

This could also be done directly from HTML, CSV, etc.

import pandas as pd

df = pd.DataFrame(table[1:], columns=table[0]).astype(str)

Access columns

df['Column A']

Access first row by index

df.iloc[0]

Process row by row

df.apply(lambda x: '_'.join(x), axis=1)  # axis=1 applies the function to each row

for index, row in df.iterrows():
    process_row(row['Column A'], row['Column C'])

Process a column

df['Column C'].astype(int).sum()

5 Comments

Does the pandas module work well with text in tables or is it primarily for numbers? (I edited the question to include text in the table to show representative data).
No, it's ideal for mixed data. Numbers, strings, dates, whatever! Think of it as an in memory database. I'll edit my answer with the pandas version of your question in just a minute
Sounds very cool! What is the pandas equivalent to: for row in table[1:]: process_row(row['Column A'], row['Column C'])?
I added the answer to your question under "process row by row"
Under Process row by row shouldn't it be: process_row(row['Column A'], row['Column C'])?

Wouldn't an OrderedDict, with column names as keys and each value a list of that column's rows, be a better approach for your problem? I would go with something like:

table = {
    'Column A': [1, 4],
    'Column B': [2, 5],
    'Column C': [3, 6]
}

# And you would parse column by column...

for col, rows in table.items():  # use iteritems() on Python 2
    pass  # do something with each column
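
The column-keyed layout above can also be built from the question's 2D list rather than typed out by hand. The following is only a minimal sketch, assuming the first row is the header and every row has the same length; the names `header`, `rows`, `columns`, `total_c`, and `pairs` are illustrative and not part of the original answer. It uses a plain dict, which keeps insertion order on Python 3.7+.

```python
# Sample data copied from the question's table.
table = [
    ['Column A', 'Column B', 'Column C'],
    ['One', 'Two', 3],
    ['Four', 'Five', 6],
]

# Split the header row from the data rows.
header, *rows = table

# zip(*rows) transposes the row-oriented data into columns;
# pairing each column with its header name gives a column-keyed dict.
columns = {name: list(values) for name, values in zip(header, zip(*rows))}

# Column access by name.
total_c = sum(columns['Column C'])

# Row-wise access by name is still possible by zipping columns back together.
pairs = list(zip(columns['Column A'], columns['Column C']))
```

The trade-off is that whole-column operations become trivial, while row-by-row processing needs the columns zipped back together.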

1 Comment

I should add that this is an example implementation with just a dict, not an OrderedDict.

My QueryList is simple to use.

ql.filter(portfolio='123')

ql.group_by(['portfolio', 'ticker'])

from collections import OrderedDict, defaultdict


class QueryList(list):
    """filter and/or group_by a list of objects."""

    def group_by(self, attrs) -> dict:
        """Like a database group_by function.

        args:
            attrs: str or list.

        Returns:
            {value_of_the_group: list_of_matching_objects, ...}
            When attrs is a list, each key is a tuple.
            Ex:
            {'AMZN': QueryList(),
            'MSFT': QueryList(),
            ...
            }
            -- or --
            {('Momentum', 'FB'): QueryList(),
             ...,
            }
        """
        result = defaultdict(QueryList)
        if isinstance(attrs, str):
            for item in self:
                result[getattr(item, attrs)].append(item)
        else:
            for item in self:
                result[tuple(getattr(item, x) for x in attrs)].append(item)

        return result

    def filter(self, **kwargs):
        """Returns the subset of this QueryList that has matching attributes.
        args:
            kwargs: Attribute name/value pairs.

        Example:
            foo.filter(portfolio='123', account='ABC').
        """
        ordered_kwargs = OrderedDict(kwargs)
        match = tuple(ordered_kwargs.values())

        def is_match(item):
            # Compare the item's attribute values, in keyword order, to the requested values.
            return tuple(getattr(item, y) for y in ordered_kwargs.keys()) == match

        result = QueryList([x for x in self if is_match(x)])

        return result

    def scalar(self, default=None, attr=None):
        """Returns the first item in this QueryList.

        args:
            default: The value to return if there is less than one item,
                or if the attr is not found.
            attr: Returns getattr(item, attr) if not None.
        """
        item, = self[0:1] or [default]

        if attr is None:
            result = item
        else:
            result = getattr(item, attr, default)
        return result

I tried pandas. I wanted to like it, I really did. But ultimately it is too complicated for my needs.

For example:

df[(df['portfolio'] == '123') & (df['ticker'] == 'MSFT')]

is not as simple as

ql.filter(portfolio='123', ticker='MSFT')

Furthermore, creating a QueryList is simpler than creating a df.

That's because you tend to use custom classes with a QueryList. The data conversion code would naturally be placed into the custom class, which keeps it separate from the rest of the logic. But data conversion for a df would normally be done inline with the rest of the code.
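
To illustrate the point about keeping conversion inside the custom class, here is a minimal sketch. The `Position` class, its fields, the `from_row` method, and the sample rows are all hypothetical and chosen only for illustration. A plain list comprehension stands in for the attribute filter, so the block runs without the QueryList class.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """Hypothetical record type; the field names are illustrative only."""
    portfolio: str
    ticker: str
    shares: int

    @classmethod
    def from_row(cls, row):
        # All cleanup and type conversion lives here, so the calling
        # code never has to deal with raw strings from the table.
        portfolio, ticker, shares = row
        return cls(portfolio.strip(), ticker.upper(), int(shares))


# Made-up raw rows, as they might come out of a scraped table.
raw_rows = [
    [' 123 ', 'msft', '10'],
    ['123', 'amzn', '5'],
    ['456', 'msft', '7'],
]

positions = [Position.from_row(r) for r in raw_rows]

# Attribute-based filtering, written out as a plain comprehension.
matches = [p for p in positions if p.portfolio == '123' and p.ticker == 'MSFT']
```

With the QueryList class above, wrapping the objects as `QueryList(positions)` should allow the same lookup to be written as `.filter(portfolio='123', ticker='MSFT')`.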
