Assume that I have a pandas dataframe called df similar to:

source      tables
src1        table1       
src1        table2          
src1        table3       
src2        table1        
src2        table2 

I'm currently able to output a JSON file that iterates through the various sources, creating an object for each, with the code below:

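import json

all_data = []

# unique() visits each source once; iterating df['source'] directly
# would append one object per row rather than one per source
for src in df['source'].unique():
    source_data = {
        src: {
        }
    }
    all_data.append(source_data)

with open('data.json', 'w') as f:
    json.dump(all_data, f, indent=2)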

This yields the following output:

[
  {
    "src1": {}
  },
  {
    "src2": {}
  }
]

Essentially, I also want to iterate through that list of sources and add the table objects corresponding to each source. My desired output would look like this:

[
  {
    "src1": {
      "table1": {},
      "table2": {},
      "table3": {}
    }
  },
  {
    "src2": {
      "table1": {},
      "table2": {}
    }
  }
]

Any assistance on how I can modify my code to also iterate through the tables column and append that to the respective source values would be greatly appreciated. Thanks in advance.

1 Answer

Is this what you're looking for?

import json

# One {source: {table: {}}} object per source
data = [
    {k: v}
    for k, v in df.groupby('source')['tables'].agg(
        lambda x: {v: {} for v in x}).items()
]

with open('data.json', 'w') as f:
    json.dump(data, f, indent=2)

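There are two layers to this answer. The inner layer groups the tables by source: groupby('source')['tables'].agg(...) runs a dict comprehension over each group, mapping every table name to an empty dict. The outer layer is a list comprehension that wraps each (source, tables) pair in its own single-key dict, which gives the list-of-objects format you asked for. The resulting data.json contains: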

[
  {
    "src1": {
      "table1": {},
      "table2": {},
      "table3": {}
    }
  },
  {
    "src2": {
      "table1": {},
      "table2": {}
    }
  }
]
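To make the two layers easier to see, here is a minimal sketch of the same grouping written as an explicit loop. The DataFrame is rebuilt from the sample in the question so the snippet runs on its own. It is an unrolled equivalent for illustration, not a replacement for the comprehension above.

```python
import json

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "source": ["src1", "src1", "src1", "src2", "src2"],
    "tables": ["table1", "table2", "table3", "table1", "table2"],
})

data = []
# Iterating a grouped column yields (group key, Series of that group's values)
for source, tables in df.groupby("source")["tables"]:
    # Inner layer: one empty object per table name
    inner = {table: {} for table in tables}
    # Outer layer: wrap each source in its own single-key dict
    data.append({source: inner})

print(json.dumps(data, indent=2))
```

The loop builds the same list as the comprehension, so it can be passed to json.dump in the same way.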

Example using .apply with arbitrary data

df['tables2'] = 'abc'

def func(g): 
    return {x: y for x, y in zip(g['tables'], g['tables2'])}

data = [{k: v} for k, v in df.groupby('source').apply(func).items()]
data
# [{'src1': {'table1': 'abc', 'table2': 'abc', 'table3': 'abc'}},
#  {'src2': {'table1': 'abc', 'table2': 'abc'}}]

Note that this does not work with pandas 1.0, probably because of a bug.
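A follow-up in the comments asks for a further layer, with column objects nested inside each table. Below is a rough sketch of one way to do that with nested groupby loops. The `columns` column and its values are hypothetical, since the question never shows where the column names are stored. Because the loops build plain dicts and never return a dict from .apply, this form may also be an option where the .apply example above does not work.

```python
import pandas as pd

# Hypothetical data: the `columns` column is an assumption, not part of the question
df = pd.DataFrame({
    "source":  ["src1", "src1", "src1", "src1", "src1"],
    "tables":  ["table1", "table1", "table2", "table2", "table2"],
    "columns": ["col1", "col2", "col1", "col2", "col3"],
})

data = []
# Outer loop: one sub-DataFrame per source
for source, src_group in df.groupby("source"):
    tables = {}
    # Inner loop: one sub-DataFrame per table within that source
    for table, tbl_group in src_group.groupby("tables"):
        # Innermost layer: one empty object per column name
        tables[table] = {col: {} for col in tbl_group["columns"]}
    data.append({source: tables})

print(data)
```

With real data, the same pattern should extend to deeper levels by adding another grouped loop per layer.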


6 Comments

Yes, this works perfectly, thank you! Assuming I'd need to take this a step further and also add a list of columns within each respective table (similar to how the list of tables was added to the respective source), how would I be able to do this though?
@weovibewvoibweoivwoiv Change agg to apply in the groupby condition, and then you can do arbitrary stuff with your data there, similar to how I've shown you.
I'm not quite sure I see how exactly this works yet. If you don't mind, can you append this extra step to your original answer? It would be really helpful, thanks.
@weovibewvoibweoivwoiv There is a bug in pandas 1.0 that prevents such expressions, what is your version? I've added an example. Hope it helps you.
I don't have pandas 1.0, so the code runs fine. However, the output isn't exactly what I'm asking for. I'm looking for something more like [{'src1': {'table1': {'col1': {}, 'col2': {}}, 'table2': {'col1': {}, 'col2': {}, 'col3': {}}}}]. Essentially the same as before with the srcs and tables, but now with another column layer as well. Can we take this to private messages?