Reference a table column by its column header in Python

Question

Is there a Pythonic way to refer to columns of 2D lists by name?

I import a lot of tables from the web, so I wrote a general-purpose function that turns various HTML tables into two-dimensional lists. So far, so good. But the next step is usually to parse the table row by row.

# Sample table.
# In real life I would do something like: table = HTML_table('url', 'table id')
table = [
    ['Column A', 'Column B', 'Column C'],
    ['One', 'Two', 3],
    ['Four', 'Five', 6]
]

# Current code: look up each column's position in the header row first.
iA = table[0].index('Column A')
iC = table[0].index('Column C')
for row in table[1:]:
    process_row(row[iA], row[iC])

# Desired code:
for row in table[1:]:
    process_row(row['Column A'], row['Column C'])
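For comparison, a minimal standard-library sketch that gives exactly the desired `row['Column A']` syntax, assuming the first row is always the header: zip the header with each data row into a dict. The helper name `rows_as_dicts` is hypothetical.

```python
table = [
    ['Column A', 'Column B', 'Column C'],
    ['One', 'Two', 3],
    ['Four', 'Five', 6],
]


def rows_as_dicts(table):
    """Yield each data row as a dict keyed by the header row (hypothetical helper)."""
    header = table[0]
    for row in table[1:]:
        # zip pairs each header with the cell in the same position;
        # it silently stops at the shorter of the two if a row is ragged.
        yield dict(zip(header, row))


for row in rows_as_dicts(table):
    print(row['Column A'], row['Column C'])
```

If the data comes from a CSV file instead, `csv.DictReader` from the standard library produces the same kind of per-row mapping.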

fivetentaylor · Accepted Answer · 2015-07-01 15:54:37Z


I think you'll really like the pandas module! http://pandas.pydata.org/

Put your list into a DataFrame

This can also be done directly from HTML, CSV, etc. (for example with pd.read_html or pd.read_csv).

import pandas as pd

df = pd.DataFrame(table[1:], columns=table[0]).astype(str)
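As a sketch of loading the same table straight from CSV text (assuming the data is available as a string; `io.StringIO` stands in for a real file here), `pd.read_csv` takes the header from the first line and infers a dtype per column, so `Column C` comes back as integers without any manual conversion. For web pages, `pd.read_html` returns a list of DataFrames, one per `<table>`, and can select a table with `attrs={'id': ...}`; it needs an HTML parser such as lxml installed, so it is not run here.

```python
import io

import pandas as pd

# Same data as the question's sample table, written as CSV text.
csv_text = """Column A,Column B,Column C
One,Two,3
Four,Five,6
"""

# The first line becomes the column headers; dtypes are inferred per column.
df = pd.read_csv(io.StringIO(csv_text))

print(df['Column A'].tolist())  # ['One', 'Four']
print(df['Column C'].sum())     # 9
```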

Access columns

df['Column A']

Access first row by index

df.iloc[0]
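A small sketch of what that returns, using the question's sample table: `df.iloc[0]` selects by integer position and gives back a Series whose index is the column headers, so the question's `row['Column A']` syntax works on it directly. `df.loc` selects by index label instead; with the default integer index the two coincide.

```python
import pandas as pd

table = [
    ['Column A', 'Column B', 'Column C'],
    ['One', 'Two', 3],
    ['Four', 'Five', 6],
]
df = pd.DataFrame(table[1:], columns=table[0]).astype(str)

first = df.iloc[0]            # a Series indexed by the column headers
print(first['Column A'])      # One
print(df.loc[0, 'Column C'])  # 3 (stored as the string '3' because of .astype(str))
```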

Process row by row

df.apply(lambda x: '_'.join(x), axis=1)

for index, row in df.iterrows():
    process_row(row['Column A'], row['Column C'])
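The `axis` argument decides what `apply` hands to the function: with `axis=1` it receives one row at a time (a Series indexed by column name), with `axis=0` (the default) one column at a time. A short sketch on the question's sample table shows the difference:

```python
import pandas as pd

table = [
    ['Column A', 'Column B', 'Column C'],
    ['One', 'Two', 3],
    ['Four', 'Five', 6],
]
df = pd.DataFrame(table[1:], columns=table[0]).astype(str)

# axis=1: the lambda is called once per row.
per_row = df.apply(lambda row: '_'.join(row), axis=1)
# axis=0: the lambda is called once per column.
per_column = df.apply(lambda col: '_'.join(col), axis=0)

print(per_row.tolist())     # ['One_Two_3', 'Four_Five_6']
print(per_column.tolist())  # ['One_Four', 'Two_Five', '3_6']
```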

Process a column

df['Column C'].astype(int).sum()

edited Jul 1, 2015 at 15:54

answered Jun 18, 2015 at 4:48

fivetentaylor



5 Comments

ChaimG Over a year ago

Does the pandas module work well with text in tables or is it primarily for numbers? (I edited the question to include text in the table to show representative data).

fivetentaylor Over a year ago

No, it's ideal for mixed data. Numbers, strings, dates, whatever! Think of it as an in-memory database. I'll edit my answer with the pandas version of your question in just a minute.

ChaimG Over a year ago

Sounds very cool! What is the pandas equivalent to: for row in table[1:]: process_row(row['Column A'], row['Column C'])?

fivetentaylor Over a year ago

I added the answer to your question under "process row by row"

ChaimG Over a year ago

Under Process row by row shouldn't it be: process_row(row['Column A'], row['Column C'])?

Diogo Martins · Accepted Answer · 2015-06-18 04:52:18Z


Wouldn't an OrderedDict, with column names as keys and each column's values as a list, be a better approach for your problem? I would go with something like:

table = {
    'Column A': [1, 4],
    'Column B': [2, 5],
    'Column C': [3, 6]
}

# And you would parse column by column...

for col, values in table.items():  # table.iteritems() on Python 2
    pass  # do something with each column
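A minimal sketch of building that column-oriented dict from the question's row-oriented table, assuming the first row is the header: `zip(*rows)` transposes the data rows into columns, which are then paired with the header names. Since Python 3.7 a plain dict preserves insertion order, so the columns stay in header order without an OrderedDict.

```python
table = [
    ['Column A', 'Column B', 'Column C'],
    ['One', 'Two', 3],
    ['Four', 'Five', 6],
]

header, *rows = table
# zip(*rows) yields one tuple per column; pair each with its header name.
columns = {name: list(values) for name, values in zip(header, zip(*rows))}

for name, values in columns.items():
    print(name, values)
```

One caveat of this design: if the table has a header but no data rows, `zip(*rows)` yields nothing and the result is an empty dict rather than a dict of empty lists.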

answered Jun 18, 2015 at 4:52

Diogo Martins


1 Comment

Diogo Martins Over a year ago

I should add that this example uses a plain dict, not an OrderedDict.

Community · Accepted Answer · 2020-06-20 09:12:55Z

My QueryList is simple to use.

ql.filter(portfolio='123')

ql.group_by(['portfolio', 'ticker'])

from collections import OrderedDict, defaultdict


class QueryList(list):
    """filter and/or group_by a list of objects."""

    def group_by(self, attrs) -> dict:
        """Like a database group_by function.

        args:
            attrs: str or list.

        Returns:
            {value_of_the_group: list_of_matching_objects, ...}
            When attrs is a list, each key is a tuple.
            Ex:
            {'AMZN': QueryList(),
            'MSFT': QueryList(),
            ...
            }
            -- or --
            {('Momentum', 'FB'): QueryList(),
             ...,
            }
        """
        result = defaultdict(QueryList)
        if isinstance(attrs, str):
            for item in self:
                result[getattr(item, attrs)].append(item)
        else:
            for item in self:
                result[tuple(getattr(item, x) for x in attrs)].append(item)

        return result

    def filter(self, **kwargs):
        """Returns the subset of this QueryList that has matching attributes.

        args:
            kwargs: Attribute name/value pairs.

        Example:
            foo.filter(portfolio='123', account='ABC').
        """
        ordered_kwargs = OrderedDict(kwargs)
        match = tuple(ordered_kwargs.values())

        def is_match(item):
            # True when every requested attribute equals its requested value.
            return tuple(getattr(item, y) for y in ordered_kwargs.keys()) == match

        result = QueryList([x for x in self if is_match(x)])

        return result

    def scalar(self, default=None, attr=None):
        """Returns the first item in this QueryList.

        args:
            default: The value to return if there is less than one item,
                or if the attr is not found.
            attr: Returns getattr(item, attr) if not None.
        """
        item, = self[0:1] or [default]

        if attr is None:
            result = item
        else:
            result = getattr(item, attr, default)
        return result

I tried pandas. I wanted to like it, I really did. But ultimately it is too complicated for my needs.

For example:

df[(df['portfolio'] == '123') & (df['ticker'] == 'MSFT')]

is not as simple as

ql.filter(portfolio='123', ticker='MSFT')

Furthermore, creating a QueryList is simpler than creating a df.

That's because you tend to use custom classes with a QueryList. The data conversion code naturally goes into the custom class, which keeps it separate from the rest of the logic. With a df, the data conversion is normally done inline with the rest of the code.
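A minimal sketch of that pattern, assuming a dataclass as the custom class: the hypothetical `Holding` class and its `from_row` constructor hold all the cleaning and type conversion for one raw table row, so the filtering code only ever sees typed attributes. The table contents and field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Holding:
    """Hypothetical record type for one table row; field names are illustrative."""
    portfolio: str
    ticker: str
    shares: int

    @classmethod
    def from_row(cls, row):
        # All cleaning and type conversion for a raw row lives here.
        portfolio, ticker, shares = row
        return cls(portfolio=str(portfolio).strip(),
                   ticker=ticker.strip().upper(),
                   shares=int(shares))


# Hypothetical scraped table: a header row plus raw string cells.
table = [
    ['portfolio', 'ticker', 'shares'],
    ['123', ' msft', '10'],
    ['123', 'AMZN', '5'],
    ['456', 'MSFT', '7'],
]

holdings = [Holding.from_row(row) for row in table[1:]]

# A plain comprehension doing the same selection as the filter call above.
msft_123 = [h for h in holdings if h.portfolio == '123' and h.ticker == 'MSFT']
print(msft_123)  # [Holding(portfolio='123', ticker='MSFT', shares=10)]
```

Wrapping `holdings` in the QueryList class from this answer is intended to let `filter` and `group_by` work on these attributes directly, for example `QueryList(holdings).filter(portfolio='123', ticker='MSFT')`.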
