0

I have two .txt file where I want to separate the data frame into two parts using the first column value. If the value is less than "H1000", we want in a first dataframe and if it is greater or equal to "H1000" we want in a second dataframe.First column starts the value with H followed by a four numbers. I want to ignore H when comparing numbers less than 1000 or greater than 1000 in python.

What I have tried this thing,but it is not working.

ht_data = all_dfs.index[all_dfs.iloc[:, 0] == "H1000"][0]
print(ht_data)

Here is my code:

    if (".txt" in str(path_txt).lower()) and path_txt.is_file():
        txt_files = [Path(path_txt)]
    else:
        txt_files = list(Path(path_txt).glob("*.txt"))
            
    for fn in txt_files:
        all_dfs = pd.read_csv(fn,sep="\t", header=None) #Reading file
        all_dfs = all_dfs.dropna(axis=1, how='all') #Drop the columns where all columns are NaN
        all_dfs = all_dfs.dropna(axis=0, how='all') #Drop the rows where all columns are NaN
        print(all_dfs)
        
        ht_data = all_dfs.index[all_dfs.iloc[:, 0] == "H1000"][0]
        print(ht_data)
        
        df_h = all_dfs[0:ht_data]  # Head Data
        df_t = all_dfs[ht_data:]  # Tene Data

Can anyone help me how to achieve this task in python?

2 Answers 2

1

Assuming this data

import pandas as pd
data = pd.DataFrame(
    [
        ["H0002",   "Version", "5"], 
        ["H0003",   "Date_generated",  "8-Aug-11"], 
        ["H0004",   "Reporting_period_end_date",   "19-Jun-11"],
        ["H0005",   "State",   "AW"],                                                                                           
        ["H1000",   "Tene_no/Combined_rept_no",    "E75/3794"],                                                                                                                      
        ["H1001",   "Tenem_holder Magnetic Resources", "NL"],
    ],
    columns = ["id", "col1", "col2"]
)

We can create a mask of over and under a pre set threshold, like 1000.

mask = data["id"].str.strip("H").astype(int) < 1000
df_h = data[mask]
df_t = data[~mask]
Sign up to request clarification or add additional context in comments.

Comments

0

If you want to compare values of the format val = HXXXX where X is a digit represented as a character, try this:

val = 'H1003'
val_cmp = int(val[1:])
if val_cmp < 1000:
   # First Dataframe
else:
   # Second Dataframe

1 Comment

Can you tell me what is the code according to the data I have provided.@muhammed jaseem. ht_data = all_dfs.index[all_dfs.iloc[:, 0] == "H1000"][0]

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.