
Here I'm making a scatterplot with data from four sample columns of my dataframe. The columns are the different samples and the row names are genes (so they are the same for every column). I would like the y-axis to be log-transformed. However, the standard approach, `ax.set_yscale('log')`, does not work: somehow it limits the y-axis without making it logarithmic, and data points get lost.

This is the code before the log transformation:

```python
import matplotlib.pyplot as plt

# One scatter call per sample column, all drawn onto the same Axes
ax = mini_df.plot(kind="scatter", x="gene", y="sample_02A", color="b", label="sample 02A")
mini_df.plot(kind="scatter", x="gene", y="sample_04A", color="g", label="sample 04A", ax=ax)
mini_df.plot(kind="scatter", x="gene", y="sample_06A", color="r", label="sample 06A", ax=ax)
mini_df.plot(kind="scatter", x="gene", y="sample_08A", color="purple", label="sample 08A", ax=ax)
ax.set_xlabel("gene")
ax.set_ylabel("expression value")

plt.show()
```


This is the code after switching the y-axis to a logarithmic scale:

```python
import matplotlib.pyplot as plt

ax = mini_df.plot(kind="scatter", x="gene", y="sample_02A", color="b", label="sample 02A")
mini_df.plot(kind="scatter", x="gene", y="sample_04A", color="g", label="sample 04A", ax=ax)
mini_df.plot(kind="scatter", x="gene", y="sample_06A", color="r", label="sample 06A", ax=ax)
mini_df.plot(kind="scatter", x="gene", y="sample_08A", color="purple", label="sample 08A", ax=ax)
ax.set_xlabel("gene")
ax.set_ylabel("expression value")
ax.set_yscale('log')  # after this call, some points are no longer visible

plt.show()
```

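For reference, the four repeated `plot` calls can be written as a loop over the sample columns. This is a minimal, self-contained sketch: the `mini_df` built here is hypothetical stand-in data (numeric gene IDs and made-up expression values, including some zeros), and the `Agg` backend is selected only so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for mini_df: numeric gene IDs and made-up values,
# with a few zeros to mimic genes that are not expressed in a sample
mini_df = pd.DataFrame({
    "gene": [1, 2, 3, 4],
    "sample_02A": [0.0, 5.0, 120.0, 3000.0],
    "sample_04A": [0.0, 2.0, 80.0, 1500.0],
    "sample_06A": [1.0, 7.0, 0.0, 900.0],
    "sample_08A": [4.0, 3.0, 60.0, 250.0],
})

# Column name -> marker color, matching the colors in the question
colors = {"sample_02A": "b", "sample_04A": "g",
          "sample_06A": "r", "sample_08A": "purple"}

fig, ax = plt.subplots()
for col, color in colors.items():
    # Every call draws onto the same Axes; "sample_02A" becomes "sample 02A"
    mini_df.plot(kind="scatter", x="gene", y=col, color=color,
                 label=col.replace("_", " "), ax=ax)
ax.set_xlabel("gene")
ax.set_ylabel("expression value")
ax.set_yscale("log")  # the zero-valued points cannot be shown on this axis
```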


Answer


Option 1:

You can log-transform the values yourself before plotting. Use `np.log1p` (which computes log(1 + x)) rather than `np.log`: `np.log(0)` returns `-inf`, which cannot be placed on the plot either, whereas `np.log1p(0)` is 0, so genes with zero expression stay visible. The example below transforms the columns one by one; you can also transform all the sample columns at once.

```python
import matplotlib.pyplot as plt
import numpy as np

# Work on a copy so the original expression values are kept
mini_df_copy = mini_df.copy(deep=True)
# log1p(x) = log(1 + x): zeros map to 0 instead of -inf
mini_df_copy["sample_02A"] = np.log1p(mini_df_copy["sample_02A"])
mini_df_copy["sample_04A"] = np.log1p(mini_df_copy["sample_04A"])
mini_df_copy["sample_06A"] = np.log1p(mini_df_copy["sample_06A"])
mini_df_copy["sample_08A"] = np.log1p(mini_df_copy["sample_08A"])

ax = mini_df_copy.plot(kind="scatter", x="gene", y="sample_02A", color="b", label="sample 02A")
mini_df_copy.plot(kind="scatter", x="gene", y="sample_04A", color="g", label="sample 04A", ax=ax)
mini_df_copy.plot(kind="scatter", x="gene", y="sample_06A", color="r", label="sample 06A", ax=ax)
mini_df_copy.plot(kind="scatter", x="gene", y="sample_08A", color="purple", label="sample 08A", ax=ax)
ax.set_xlabel("gene")
ax.set_ylabel("log(1 + expression value)")

plt.show()
```
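The transformation does not have to be done column by column. A minimal sketch of a vectorized version, applying `np.log1p` to all sample columns in one assignment; `mini_df` is hypothetical stand-in data and `sample_cols` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for mini_df (made-up values, some zeros)
mini_df = pd.DataFrame({
    "gene": [1, 2, 3, 4],
    "sample_02A": [0.0, 5.0, 120.0, 3000.0],
    "sample_04A": [0.0, 2.0, 80.0, 1500.0],
    "sample_06A": [1.0, 7.0, 0.0, 900.0],
    "sample_08A": [4.0, 3.0, 60.0, 250.0],
})

# Hypothetical list of the columns to transform (the gene column is left alone)
sample_cols = ["sample_02A", "sample_04A", "sample_06A", "sample_08A"]

mini_df_log = mini_df.copy()
# np.log1p works element-wise on the whole sub-DataFrame;
# log1p(0) == 0, so zero expression values stay finite
mini_df_log[sample_cols] = np.log1p(mini_df_log[sample_cols])
```

The result can then be plotted with the same `plot` calls as in the question, just on `mini_df_log`.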

Option 2:

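You can set the y-axis limits explicitly so that the automatically chosen view does not cut off points, e.g. `ax.set_ylim(1, ymax)`, where `ymax` is the largest value you want to display. On a log axis the lower limit has to be positive, so pick one at or below your smallest positive value; points below it, including all zeros, are still not drawn.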
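Rather than typing the bounds by hand, they can be derived from the data. A sketch, assuming the hypothetical stand-in `mini_df` below; the factors 0.5 and 2 that add a little margin are arbitrary choices, not something Matplotlib requires. The lower bound comes from the smallest positive value, because a log axis cannot start at 0.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for mini_df (made-up values, some zeros)
mini_df = pd.DataFrame({
    "gene": [1, 2, 3, 4],
    "sample_02A": [0.0, 5.0, 120.0, 3000.0],
    "sample_04A": [0.0, 2.0, 80.0, 1500.0],
    "sample_06A": [1.0, 7.0, 0.0, 900.0],
    "sample_08A": [4.0, 3.0, 60.0, 250.0],
})
sample_cols = ["sample_02A", "sample_04A", "sample_06A", "sample_08A"]

# All expression values as one flat array
values = mini_df[sample_cols].to_numpy().ravel()
positive = values[values > 0]  # zeros cannot be placed on a log axis anyway

fig, ax = plt.subplots()
for col in sample_cols:
    ax.scatter(mini_df["gene"], mini_df[col], label=col)
ax.set_yscale("log")
# Explicit limits also switch off y-autoscaling for this Axes
ax.set_ylim(positive.min() * 0.5, values.max() * 2)
ax.legend()
```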

Option 3:

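Use `'symlog'` instead of `'log'`: `ax.set_yscale('symlog')`. A symmetric log scale is linear in a small range around 0 and logarithmic outside it, so zero values remain visible.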
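A minimal sketch of the symlog option on hypothetical stand-in data. `linthresh` sets the extent of the linear region around 0; the value 1 here is an arbitrary choice for illustration. This assumes Matplotlib 3.3 or newer, where the keyword is `linthresh` (older versions used `linthreshy`).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for mini_df (made-up values, some zeros)
mini_df = pd.DataFrame({
    "gene": [1, 2, 3, 4],
    "sample_02A": [0.0, 5.0, 120.0, 3000.0],
    "sample_04A": [0.0, 2.0, 80.0, 1500.0],
    "sample_06A": [1.0, 7.0, 0.0, 900.0],
    "sample_08A": [4.0, 3.0, 60.0, 250.0],
})
colors = {"sample_02A": "b", "sample_04A": "g",
          "sample_06A": "r", "sample_08A": "purple"}

fig, ax = plt.subplots()
for col, color in colors.items():
    ax.scatter(mini_df["gene"], mini_df[col], color=color,
               label=col.replace("_", " "))
# Linear between -1 and 1, logarithmic beyond: zeros are drawn at y = 0
ax.set_yscale("symlog", linthresh=1)
ax.set_xlabel("gene")
ax.set_ylabel("expression value")
ax.legend()
```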

As for the points with value 0: log(0) is undefined, so Matplotlib cannot place zero (or negative) values on a logarithmic axis, and those points simply do not appear on the plot.
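To see in advance which points a log axis will drop, you can check for values that are zero or negative. A small sketch on hypothetical stand-in data; `not_plottable`, `per_sample` and `hidden_genes` are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in for mini_df (made-up values, some zeros)
mini_df = pd.DataFrame({
    "gene": [1, 2, 3, 4],
    "sample_02A": [0.0, 5.0, 120.0, 3000.0],
    "sample_04A": [0.0, 2.0, 80.0, 1500.0],
    "sample_06A": [1.0, 7.0, 0.0, 900.0],
    "sample_08A": [4.0, 3.0, 60.0, 250.0],
})
sample_cols = ["sample_02A", "sample_04A", "sample_06A", "sample_08A"]

# True where log(value) is undefined, i.e. the point cannot go on a log axis
not_plottable = mini_df[sample_cols] <= 0

# Number of such points per sample column
per_sample = not_plottable.sum()

# Genes that lose at least one point on a log-scaled plot
hidden_genes = mini_df.loc[not_plottable.any(axis=1), "gene"].tolist()
```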
