I think the good comments by @TobiasFünke and the good answer by @hft have already pointed out the missing point, but I want to give you two examples.
Ex. 1: No degeneracy
Consider the matrices below, with $a_1,a_2,a_3$ pairwise distinct and $b_1,b_2,b_3$ pairwise distinct (I'll consider finite-dimensional Hilbert spaces for simplicity):
$$
A =
\begin{pmatrix}
a_1 & 0 & 0\newline
0 & a_2 & 0\newline
0 & 0 & a_3
\end{pmatrix}
\quad
B =
\begin{pmatrix}
b_1 & 0 & 0\newline
0 & b_2 & 0\newline
0 & 0 & b_3
\end{pmatrix}
$$
Both observables are already diagonal in our initial basis, let's say $\{|0\rangle,|1\rangle,|2\rangle\}$. The probabilities of the results are given by the expressions
$$
p(a_j) = \operatorname{tr}(|\psi\rangle\!\langle \psi| \, |j\rangle \!\langle j |)\quad
p(b_k) = \operatorname{tr}(|\psi\rangle\!\langle \psi| \, |k\rangle \!\langle k |)
$$
where I'm using the one-dimensional projectors associated with each measurement result. Since there is no degeneracy and both observables are diagonal in the same basis, the same projectors describe the probabilities of the $A$ results and of the $B$ results. Now, if we want to calculate the joint probability distribution,
$$
p(a_j,b_k) = \operatorname{tr}(|\psi\rangle\!\langle\psi| \, |j\rangle\!\langle j|k\rangle\!\langle k| ) = 0, \quad j\neq k.
$$
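A quick numerical check of Ex. 1 may help. This is a minimal sketch assuming NumPy; the eigenvalues and the random state are arbitrary, hypothetical choices. It builds the rank-one projectors and evaluates $p(a_j,b_k)=\operatorname{tr}(|\psi\rangle\!\langle\psi|\,|j\rangle\!\langle j|k\rangle\!\langle k|)$ for all pairs; the Python indices 0, 1, 2 stand for the basis vectors $|0\rangle,|1\rangle,|2\rangle$.

```python
import numpy as np

# Hypothetical, pairwise-distinct eigenvalues (non-degenerate case).
A = np.diag([1.0, 2.0, 3.0])
B = np.diag([-1.0, 0.5, 4.0])
print(np.allclose(A @ B, B @ A))  # True: both diagonal in the same basis

# Random normalized state |psi> in C^3 and its density matrix rho = |psi><psi|.
rng = np.random.default_rng(seed=0)
psi = rng.normal(size=3) + 1j * rng.normal(size=3)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())

# Rank-one projectors |j><j| onto the shared eigenbasis |0>, |1>, |2>.
P = [np.outer(e, e) for e in np.eye(3)]

# joint[j, k] = p(a_j, b_k) = tr(rho |j><j| |k><k|)
joint = np.array([[np.trace(rho @ P[j] @ P[k]).real for k in range(3)]
                  for j in range(3)])
# Marginals p(a_j) = tr(rho |j><j|), which coincide with p(b_j) here.
marginal = np.array([np.trace(rho @ P[j]).real for j in range(3)])

print(np.round(joint, 3))
print(np.allclose(joint, np.diag(marginal)))  # True: only j == k survives
```

The joint distribution is diagonal, so a result of $A$ fixes the result of $B$ completely, which is the point made in the next paragraph.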
So in this case, the joint probability of the results of $A$ and $B$ is zero whenever the labels differ. But labelling is an arbitrary convention rather than a property of nature, and we could in principle assign any other labels to the results of $A$. In this case, the list of one-dimensional projectors is enough to describe the physics, and it makes little sense to talk about $A$ and $B$ as different physical quantities: each is just a function of the other ($a_j \mapsto b_j$).
This example is very artificial: it is rare to find commuting observables without degeneracy. Usually, degeneracies are a consequence of the symmetries of the problem, which are expressed as commutation relations between the system's Hamiltonian and unitary operators representing those symmetries.
Ex. 2: Degenerate case
Consider the case below, again with $a_1,a_2,a_3$ pairwise distinct, but now with $b_1 \neq b_2$ and $b_1$ repeated:
$$
A =
\begin{pmatrix}
a_1 & 0 & 0\newline
0 & a_2 & 0\newline
0 & 0 & a_3
\end{pmatrix}
\quad
B =
\begin{pmatrix}
b_1 & 0 & 0\newline
0 & b_1 & 0\newline
0 & 0 & b_2
\end{pmatrix}
$$
Now the observable $B$ is degenerate in the initial basis. This means that the projector associated with the result $b_1$ is a two-dimensional projector
$$
\Pi_{b_1} = |0\rangle\!\langle 0| + |1\rangle\!\langle 1|
$$
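In this case, $p(a_1,b_1) = |\langle 0|\psi\rangle|^2$ and $p(a_2,b_1) = |\langle 1|\psi\rangle|^2$, and these probabilities are not necessarily zero (depending on the state, of course): unlike in Ex. 1, the single result $b_1$ is compatible with two different results of $A$. But more than that, the system could be in the state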
$$
|\psi\rangle = \frac{1}{\sqrt 2} \left ( |0\rangle + |1\rangle\right )
$$
which is an eigenstate of $B$ (with eigenvalue $b_1$) but not an eigenstate of $A$, since $a_1 \neq a_2$. This example shows that not every eigenstate of $B$ is an eigenstate of $A$, even though the two operators commute.
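Ex. 2 can be checked numerically in the same way. This is a sketch assuming NumPy, with hypothetical numerical values for $a_1,a_2,a_3,b_1,b_2$; the helper names `is_eigenvector` and `joint_prob` are my own, not from any library. It verifies that $A$ and $B$ commute, that $(|0\rangle+|1\rangle)/\sqrt 2$ is an eigenvector of $B$ but not of $A$, and that $p(a_1,b_1)$ and $p(a_2,b_1)$ depend on the state.

```python
import numpy as np

# Hypothetical values: a1, a2, a3 distinct; b1 is the degenerate eigenvalue of B.
a1, a2, a3 = 1.0, 2.0, 3.0
b1, b2 = 5.0, 7.0
A = np.diag([a1, a2, a3])
B = np.diag([b1, b1, b2])
print(np.allclose(A @ B, B @ A))  # True: A and B commute

# Spectral projectors: |j><j| for A, and the rank-2 projector for b1.
P_a = [np.diag(row) for row in np.eye(3)]
P_b1 = np.diag([1.0, 1.0, 0.0])  # |0><0| + |1><1|

def is_eigenvector(M, v):
    """Check whether the normalized vector v satisfies M v = lambda v."""
    w = M @ v
    lam = np.vdot(v, w)  # <v|M|v>, the only candidate eigenvalue for normalized v
    return np.allclose(w, lam * v)

def joint_prob(psi, P, Q):
    """p = tr(|psi><psi| P Q) for commuting projectors P and Q."""
    rho = np.outer(psi, psi.conj())
    return np.trace(rho @ P @ Q).real

psi = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)  # (|0> + |1>)/sqrt(2)
print(is_eigenvector(B, psi))  # True (eigenvalue b1)
print(is_eigenvector(A, psi))  # False, because a1 != a2

# Another state in the b1 eigenspace: p(a1, b1) and p(a2, b1) now differ.
phi = np.array([1.0, 2.0, 0.0]) / np.sqrt(5)
print(joint_prob(phi, P_a[0], P_b1), joint_prob(phi, P_a[1], P_b1))  # about 0.2 and 0.8
```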
There are other situations where commuting observables arise, for example from the tensor-product structure of composite systems. I recommend working out specific examples, preferably ones with a clear physical meaning, such as angular momentum. Building intuition from purely mathematical examples is harder, but if you want, try working out a finite-dimensional example.
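As a small illustration of the tensor-product case, here is a sketch assuming NumPy. The two-qubit system and the choice of Pauli matrices are my own illustrative assumptions: $A=\sigma_z\otimes I$ and $B=I\otimes\sigma_x$ act on different factors, so they commute, both are degenerate, and $|0\rangle|0\rangle$ is an eigenvector of $A$ but not of $B$.

```python
import numpy as np

# Standard Pauli matrices and the 2x2 identity.
sz = np.array([[1.0, 0.0], [0.0, -1.0]])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)

# Observables on different factors of C^2 (x) C^2 (a two-qubit system).
A = np.kron(sz, I2)  # sigma_z acting on the first qubit
B = np.kron(I2, sx)  # sigma_x acting on the second qubit
print(np.allclose(A @ B, B @ A))  # True

# Each has eigenvalues -1, -1, 1, 1 on a 4-dimensional space: both are degenerate.
print(np.linalg.eigvalsh(A))
print(np.linalg.eigvalsh(B))

# |0>|0> is an eigenvector of A but not of B, despite [A, B] = 0.
v = np.kron([1.0, 0.0], [1.0, 0.0])
print(np.allclose(A @ v, v))  # True: eigenvalue +1 of A
# B only has eigenvalues +1 and -1, so checking both rules out an eigenvector.
print(np.allclose(B @ v, v), np.allclose(B @ v, -v))  # False False
```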
You started from a mathematical condition, derived mathematically consistent results, and then asked about their physical plausibility. I recommend instead that you start from a physically plausible condition, derive mathematically consistent results, and then ask about the physical plausibility of those results.