
I have the following data frame:

       window_start  window_end  dataset
29125       1828457     1828868     129C
29126       1891493     1891904     129C
29127       2312557     2312968     129C
29128       3745905     3746316     129C
29129       5036701     5037112     129C
  ...           ...         ...      ...
49838     185443673   185444084     172C
49840     186261905   186262316     172C
49841     186888969   186889380     172C
49980     187896721   187897132     172C
49987     190067549   190067960     172C
530 rows × 3 columns

I wish to get two results:
1. identify the overlapping regions numerically over all the intervals (e.g. [1828450, 1828860], etc.);
2. visualize all the intervals with a matplotlib diagram similar to the one I report below.

[Image: example of the desired diagram]
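One way to approach point 1 is a sweep over the interval endpoints: every window adds +1 at its start and -1 at its end, and walking the positions in sorted order gives the number of windows covering each stretch. The sketch below assumes half-open intervals [window_start, window_end), so windows that merely touch are not counted as overlapping; `overlapping_regions` and `min_count` are hypothetical names, and the toy data is made up for illustration. It pools all datasets together, so filter the data frame first if only overlaps within one dataset are wanted.

```python
from collections import defaultdict

import pandas as pd


def overlapping_regions(df, min_count=2):
    """Return the regions covered by at least `min_count` windows.

    Sketch assumption: windows are half-open [window_start, window_end).
    """
    # Net change in coverage at every endpoint position
    delta = defaultdict(int)
    for start, end in zip(df['window_start'], df['window_end']):
        delta[start] += 1
        delta[end] -= 1

    regions = []
    depth = 0            # number of windows covering the current position
    region_start = None  # start of the overlap region currently open, if any
    for pos in sorted(delta):
        depth += delta[pos]
        if depth >= min_count and region_start is None:
            region_start = pos
        elif depth < min_count and region_start is not None:
            regions.append((region_start, pos))
            region_start = None
    return pd.DataFrame(regions, columns=['overlap_start', 'overlap_end'])


# Hypothetical toy windows (the real data frame has 530 rows)
toy = pd.DataFrame({
    'window_start': [100, 150, 400, 420, 430],
    'window_end':   [200, 250, 500, 450, 440],
    'dataset':      ['129C', '172C', '129C', '172C', '172C'],
})
print(overlapping_regions(toy))
print(overlapping_regions(toy, min_count=3))
```

Applied to the real data, `overlapping_regions(AllC_chr1)` would list the stretches shared by at least two windows; the sort dominates, so the cost is O(n log n) in the number of windows.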

I already tried the following code to solve point 2, but it shows nothing:

import matplotlib.pyplot as pl

# Intervals of dataset '129C' only
subset = AllC_chr1[AllC_chr1.dataset == '129C']
x_start = subset.window_start.to_numpy()
x_end = subset.window_end.to_numpy()
y = subset.index
pl.figure()
# One horizontal bar per interval, from window_start to window_end
pl.barh(y / 1000, width=x_end - x_start, left=x_start)
pl.show()

Any suggestions will be welcome.

Thank you for your support


1 Answer


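The main problem is that the bars are extremely narrow compared to the range of the x-axis: each interval is about 411 units wide, while the intervals together span close to 190 million. At that scale each bar is far thinner than a pixel, so its interior is not visible; at best you see its outline. Giving the bars an explicit edge color (such as edgecolor='blue' below) makes them visible, because the edge line has a fixed thickness in points that does not depend on the data scale.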
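To put numbers on this, a short sketch computes how many pixels one bar would occupy. It uses four intervals from the question's data frame and assumes, hypothetically, an x-axis about 1500 pixels wide; the real width depends on the figure size and DPI.

```python
import pandas as pd

# A few intervals from the question's data frame
df = pd.DataFrame({
    'window_start': [1828457, 1891493, 185443673, 190067549],
    'window_end':   [1828868, 1891904, 185444084, 190067960],
})

widths = df['window_end'] - df['window_start']                # length of each interval
x_range = df['window_end'].max() - df['window_start'].min()   # span of the whole x-axis
axis_pixels = 1500  # assumption: the plotted x-axis is about 1500 pixels wide

# Fraction of the axis covered by each bar, converted to pixels
bar_pixels = widths / x_range * axis_pixels
print(widths.tolist())
print(bar_pixels.round(4).tolist())
```

Each bar comes out at a few thousandths of a pixel, which is why only a drawn edge line makes it show up.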

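You can use the 'dataset' column for the y-axis to get one automatically labeled row per dataset (matplotlib treats the strings as categories). Bar plots are drawn with "sticky edges": autoscaling adds no margin beyond the bars' base, which for barh means the x-axis starts exactly at the leftmost bar. If that isn't desired, set ax.use_sticky_edges = False.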
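A small sketch makes the sticky-edge behavior measurable: it draws a few of the intervals with the non-interactive Agg backend and reads back the automatic x-limits with and without sticky edges. The helper `x_limits` is a hypothetical name for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: nothing is shown, limits are still computed
import matplotlib.pyplot as plt

starts = [1828457, 5036701, 185443673]
ends = [1828868, 5037112, 185444084]
labels = ['129C', '129C', '172C']
widths = [e - s for s, e in zip(starts, ends)]


def x_limits(sticky):
    fig, ax = plt.subplots()
    ax.barh(labels, widths, left=starts)
    ax.use_sticky_edges = sticky
    ax.autoscale_view()  # recompute the limits with the current sticky-edge setting
    limits = ax.get_xlim()
    plt.close(fig)
    return limits


with_sticky = x_limits(True)
without_sticky = x_limits(False)
print(with_sticky[0], without_sticky[0])
```

With sticky edges on, the lower x-limit should coincide with the leftmost window_start; with them off, autoscaling adds the usual margin (5% by default) on the left as well.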

With matplotlib, it is highly recommended to import matplotlib.pyplot as plt (rather than pl), which makes the code easier to compare with example code and quicker for others to understand. Also, the object-oriented interface (creating fig, ax = plt.subplots() and then calling methods such as ax.barh) makes it easier to understand what's going on.
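For comparison, a minimal sketch of the same horizontal bars drawn in both styles; the Agg backend is used only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

starts = [1828457, 1891493, 2312557]
ends = [1828868, 1891904, 2312968]
widths = [e - s for s, e in zip(starts, ends)]
labels = ['129C'] * len(starts)

# Implicit (pyplot) style: plt.* calls act on whichever figure/Axes is "current"
plt.figure()
plt.barh(labels, widths, left=starts, edgecolor='blue')
implicit_ax = plt.gca()

# Explicit (object-oriented) style: the Figure and Axes are named objects,
# so every call states which plot it modifies
fig, ax = plt.subplots()
ax.barh(labels, widths, left=starts, edgecolor='blue')
```

Both produce the same bars; the explicit version keeps working unchanged when a figure has several subplots, because each call names its Axes.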

import matplotlib.pyplot as plt
import pandas as pd

# A 10-row sample of the question's data frame
AllC_chr1 = pd.DataFrame({
    'window_start': [1828457, 1891493, 2312557, 3745905, 5036701, 185443673, 186261905, 186888969, 187896721,
                     190067549],
    'window_end': [1828868, 1891904, 2312968, 3746316, 5037112, 185444084, 186262316, 186889380, 187897132, 190067960],
    'dataset': ['129C', '129C', '129C', '129C', '129C', '172C', '172C', '172C', '172C', '172C']},
    index=[29125, 29126, 29127, 29128, 29129, 49838, 49840, 49841, 49980, 49987])

df = AllC_chr1
# To plot a single dataset only, use instead:
# df = AllC_chr1[AllC_chr1['dataset'] == '129C']

fig, ax = plt.subplots(figsize=(15, 3))
ax.barh(df['dataset'], left=df['window_start'],
        width=df['window_end'] - df['window_start'], edgecolor='blue')
# Disable sticky edges
ax.use_sticky_edges = False
# Set the x-axis tick labels to millions
ax.xaxis.set_major_formatter(lambda x, pos: f"{x / 1000000:g}M")

plt.tight_layout()
plt.show()

[Image: horizontal bar plot showing differences]


2 Comments

Dear @JohanC, thank you very much for your suggestion. Please find my comments here (I used the "Post Answer" button to share the picture).
You can create a wider plot by replacing fig, ax = plt.subplots() with, e.g., fig, ax = plt.subplots(figsize=(20, 5)). The maximum size depends on the environment in which you run matplotlib. However, because your screen (or printout) has limited resolution, a wider figure won't really help to make such narrow bars visible. In an interactive plot, you could show bar information while hovering. Alternatively, or in addition, the bars could be color-coded according to their width. It depends on what you want to achieve.
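The color-coding idea from the comment above could look roughly like this. The intervals are hypothetical, with varying widths so that the colors differ; the viridis colormap and the colorbar are choices made for this sketch, not requirements.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; use plt.show() in an interactive session instead
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

# Hypothetical intervals with different widths
df = pd.DataFrame({
    'window_start': [1828457, 5036701, 185443673, 190067549],
    'window_end':   [1828868, 5040701, 185445673, 190067960],
    'dataset':      ['129C', '129C', '172C', '172C'],
})
widths = df['window_end'] - df['window_start']

# Map each width to a color: narrowest -> start of the colormap, widest -> end
norm = Normalize(vmin=widths.min(), vmax=widths.max())
cmap = plt.get_cmap('viridis')
colors = cmap(norm(widths.to_numpy()))

fig, ax = plt.subplots(figsize=(15, 3))
# Use the same color for the edge, so that even very narrow bars show their color
ax.barh(df['dataset'], widths, left=df['window_start'], color=colors, edgecolor=colors)
ax.use_sticky_edges = False
fig.colorbar(ScalarMappable(norm=norm, cmap=cmap), ax=ax, label='interval width')
```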
