Split a dataframe into two dataframe using first column that have a string values in python

Question

I have two .txt file where I want to separate the data frame into two parts using the first column value. If the value is less than "H1000", we want in a first dataframe and if it is greater or equal to "H1000" we want in a second dataframe.First column starts the value with H followed by a four numbers. I want to ignore H when comparing numbers less than 1000 or greater than 1000 in python.

What I have tried this thing,but it is not working.

ht_data = all_dfs.index[all_dfs.iloc[:, 0] == "H1000"][0]
print(ht_data)

Here is my code:

    if (".txt" in str(path_txt).lower()) and path_txt.is_file():
        txt_files = [Path(path_txt)]
    else:
        txt_files = list(Path(path_txt).glob("*.txt"))
            
    for fn in txt_files:
        all_dfs = pd.read_csv(fn,sep="\t", header=None) #Reading file
        all_dfs = all_dfs.dropna(axis=1, how='all') #Drop the columns where all columns are NaN
        all_dfs = all_dfs.dropna(axis=0, how='all') #Drop the rows where all columns are NaN
        print(all_dfs)
        
        ht_data = all_dfs.index[all_dfs.iloc[:, 0] == "H1000"][0]
        print(ht_data)
        
        df_h = all_dfs[0:ht_data]  # Head Data
        df_t = all_dfs[ht_data:]  # Tene Data

Can anyone help me how to achieve this task in python?

Nicolai B. Thomsen · Accepted Answer · 2021-12-19 01:06:06Z

1

Assuming this data

import pandas as pd
data = pd.DataFrame(
    [
        ["H0002",   "Version", "5"], 
        ["H0003",   "Date_generated",  "8-Aug-11"], 
        ["H0004",   "Reporting_period_end_date",   "19-Jun-11"],
        ["H0005",   "State",   "AW"],                                                                                           
        ["H1000",   "Tene_no/Combined_rept_no",    "E75/3794"],                                                                                                                      
        ["H1001",   "Tenem_holder Magnetic Resources", "NL"],
    ],
    columns = ["id", "col1", "col2"]
)

We can create a mask of over and under a pre set threshold, like 1000.

mask = data["id"].str.strip("H").astype(int) < 1000
df_h = data[mask]
df_t = data[~mask]

answered Dec 19, 2021 at 1:06

Nicolai B. Thomsen

9047 silver badges17 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Muhammed Jaseem · Accepted Answer · 2021-12-19 00:38:24Z

0

If you want to compare values of the format val = HXXXX where X is a digit represented as a character, try this:

val = 'H1003'
val_cmp = int(val[1:])
if val_cmp < 1000:
   # First Dataframe
else:
   # Second Dataframe

answered Dec 19, 2021 at 0:38

Muhammed Jaseem

85010 silver badges21 bronze badges

1 Comment

xyz Over a year ago

Can you tell me what is the code according to the data I have provided.@muhammed jaseem. ht_data = all_dfs.index[all_dfs.iloc[:, 0] == "H1000"][0]

Collectives™ on Stack Overflow

Split a dataframe into two dataframe using first column that have a string values in python

2 Answers 2

Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related