I am confused about how to insert Counter objects (from collections) into a dataframe.

My dataframe looks like this:

doc_cluster_key_freq=pd.DataFrame(index=[], columns=['doc_parent_id','keyword_id','key_count_in_doc_cluster'])

sim_docs_ids=[3342,3783]  

The counters generated for the sim_docs_ids are given below:

id=3342
Counter({133: 9, 79749: 7})

id=3783
Counter({133: 10, 12072: 5, 79749: 1})

The counter is generated in a loop, once for each ID in sim_docs_ids.

My code looks like:

for doc_ids in sim_docs_ids:
    #generate counter for doc_ids
    #insert the counter into dataframe (doc_cluster_key_freq) here

The output I am looking for is as below:

 doc_cluster_key_freq=
     doc_parent_id       keyword_id          key_count_in_doc_cluster     
 0     3342                  133                       9
 1     3342                 79749                      7
 2     3783                  133                       10
 3     3783                 12072                      5
 4     3783                 79749                      1

I tried using counter.keys() and counter.values(), but I get something like the output below, and I have no idea how to separate the lists into different rows:

    doc_parent_id       keyword_id          key_count_in_doc_cluster     
 0      3342           [133, 79749]                [9, 7]
 1      3783        [12072, 133, 79749]          [5, 10, 1]

1 Answer

If every doc_id has the same set of keywords, you can compute the row index for each record in advance and use the code below to write one row per keyword for every doc_id:

keywords = ['key1', 'key2', 'key3', ...]
number_of_keywords = len(keywords)

for i, doc_id in enumerate(sim_docs_ids):
    # Generate keyword Counter (counter) for doc_id
    for j, key in enumerate(keywords):
        doc_cluster_key_freq.loc[i * number_of_keywords + j] = [doc_id, key, counter[key]]

An example:

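import random
from collections import Counter

import pandas as pd

keywords = ['a', 'b', 'c']
N = len(keywords)
ids = range(5)
# Empty frame whose columns match the output below
a = pd.DataFrame(columns=['id', 'keyword', 'count'])

for i, idd in enumerate(ids):
    # Random counts stand in for a real per-document Counter
    counter = Counter({'a': random.randint(0, 10),
                       'b': random.randint(0, 10),
                       'c': random.randint(0, 10)})
    for j, key in enumerate(keywords):
        # Row i*N + j holds keyword j of document i
        a.loc[i*N+j] = [idd, key, counter[key]]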

Output (the counts are random, so the values differ on each run):

    id  keyword count
0   0   a   10
1   0   b   9
2   0   c   9
3   1   a   1
4   1   b   10
5   1   c   10
6   2   a   9
7   2   b   0
8   2   c   5
9   3   a   6
10  3   b   0
11  3   c   8
12  4   a   0
13  4   b   3
14  4   c   8
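
If the documents do not all share the same keywords, as with the two counters in the question, one possible alternative is to collect a (doc_id, keyword_id, count) tuple for every counter entry and build the dataframe once at the end. The following is only a minimal sketch: the `counters` dict is a hypothetical stand-in for whatever generates each Counter inside the loop, filled here with the values from the question.

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for the per-document Counter generation;
# the values are the two counters shown in the question.
counters = {
    3342: Counter({133: 9, 79749: 7}),
    3783: Counter({133: 10, 12072: 5, 79749: 1}),
}

rows = []
for doc_id, counter in counters.items():
    # items() yields one (keyword_id, count) pair per distinct keyword
    for keyword_id, count in counter.items():
        rows.append((doc_id, keyword_id, count))

# Build the frame in one step instead of growing it row by row
doc_cluster_key_freq = pd.DataFrame(
    rows,
    columns=['doc_parent_id', 'keyword_id', 'key_count_in_doc_cluster'],
)
print(doc_cluster_key_freq)
```

Collecting plain tuples first avoids computing row indices by hand, so each document can contribute a different number of rows.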