What kinds of things might be going on here?
There could be all kinds of things going on here. Without details of how the study was conducted and what the research question is, all we can do is offer a few ideas.
You are right to point out the small sample size. With only 10 observations and, apparently, 8 estimated parameters (including the intercept), this model is very likely over-fitted. In such a case you would expect to find a large $R^2$, which is indeed what you found.
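To see how easily this happens, here is a minimal sketch in R (pure noise, no real relationships at all): with 10 observations and 7 predictors plus an intercept, $R^2$ will typically be large even though the response is unrelated to every predictor:
set.seed(1)
n <- 10
# 7 pure-noise predictors and a pure-noise response: no true relationship
X <- as.data.frame(matrix(rnorm(n * 7), nrow = n))
X$Y <- rnorm(n)
summary(lm(Y ~ ., data = X))$r.squared
# With only 2 residual degrees of freedom, R^2 will usually be large
In fact, with $k$ independent noise predictors the expected $R^2$ under the null is $k/(n-1)$, i.e. about $7/9 \approx 0.78$ here.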
For the second model, the concern seems to be that $R^2$ is high. Well, "high" in relation to what? 0.44 is not particularly high. Moreover, there is no reason why you can't get a "high" $R^2$ when only one variable in the model is "significant". Here is a simple simulation to demonstrate: we use a 7 x 7 correlation matrix to create 7 covariates, as per the OP, one of which is used to generate the response vector, so the regression should show a "significant" p-value for that one variable and "non-significant" ones for the other 6. First in R:
library(MASS) # needed for the mvrnorm function
N <- 43
# Correlation matrix for the covariates. Not strictly necessary, but more realistic than having them independent:
sigma <- matrix(c(1,   0.2, 0.3, 0.1, 0.4, 0.2, 0.3,
                  0.2, 1,   0.5, 0.2, 0.3, 0.1, 0.2,
                  0.3, 0.5, 1,   0.3, 0.4, 0.2, 0.1,
                  0.1, 0.2, 0.3, 1,   0.3, 0.5, 0.2,
                  0.4, 0.3, 0.4, 0.3, 1,   0.2, 0.1,
                  0.2, 0.1, 0.2, 0.5, 0.2, 1,   0.4,
                  0.3, 0.2, 0.1, 0.2, 0.1, 0.4, 1),
                nrow = 7, ncol = 7)
# Means
mu <- rep(0, 7)
# Now generate the dataset
set.seed(15)
data <- as.data.frame(mvrnorm(n = N, mu = mu, Sigma = sigma))
data$Y <- 10 +          # intercept
  data$V1 +             # beta = 1
  rnorm(N, 0, 1)        # noise
lm(Y ~ ., data = data) |> summary()
which yields:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.08403 0.17363 58.078 < 2e-16 ***
V1 0.99665 0.16281 6.122 5.34e-07 ***
V2 0.08413 0.18381 0.458 0.650
V3 -0.20236 0.18788 -1.077 0.289
V4 0.17449 0.19707 0.885 0.382
V5 0.23005 0.16789 1.370 0.179
V6 -0.23291 0.18362 -1.268 0.213
V7 -0.19477 0.19615 -0.993 0.328
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.9352 on 35 degrees of freedom
Multiple R-squared: 0.606, Adjusted R-squared: 0.5272
F-statistic: 7.691 on 7 and 35 DF, p-value: 1.271e-05
As you can see, we get a similar adjusted $R^2$ of 0.53 and a "significant" coefficient only for V1, just as in the 2nd model in the question.
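As an aside, you can check the adjusted $R^2$ directly from the multiple $R^2$ via $R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p-1}$, with $n = 43$ observations and $p = 7$ predictors:
1 - (1 - 0.606) * (43 - 1) / (43 - 7 - 1)
#> [1] 0.5272
which matches the adjusted R-squared in the summary output above.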
For those who prefer Python, here is some equivalent code:
import numpy as np
import pandas as pd
import statsmodels.api as sm
np.random.seed(15)
N = 43
# Correlation matrix for the covariates. Not strictly necessary, but more realistic than having them independent
sigma = np.array([
    [1,   0.2, 0.3, 0.1, 0.4, 0.2, 0.3],
    [0.2, 1,   0.5, 0.2, 0.3, 0.1, 0.2],
    [0.3, 0.5, 1,   0.3, 0.4, 0.2, 0.1],
    [0.1, 0.2, 0.3, 1,   0.3, 0.5, 0.2],
    [0.4, 0.3, 0.4, 0.3, 1,   0.2, 0.1],
    [0.2, 0.1, 0.2, 0.5, 0.2, 1,   0.4],
    [0.3, 0.2, 0.1, 0.2, 0.1, 0.4, 1]
])
# Now generate the dataset
data = np.random.multivariate_normal(mean=np.zeros(7), cov=sigma, size=N)
df = pd.DataFrame(data, columns=[f'V{i+1}' for i in range(7)])
# Compute the response variable Y
df['Y'] = 10 + df['V1'] + np.random.normal(0, 1, N)
# Fit the linear model
X = sm.add_constant(df.drop(columns=['Y'])) # Add intercept
model = sm.OLS(df['Y'], X).fit()
# Retrieve adjusted R2, p-values, and coefficients
adjusted_r_squared = model.rsquared_adj
p_values = model.pvalues
coefficients = model.params
# Combine coefficients and p-values into a single DataFrame for display
results_df = pd.DataFrame({
    'Coefficient': coefficients,
    'P-value': p_values
})
# Display results
print("Adjusted R²:", adjusted_r_squared)
print("\nCoefficients and P-values:\n", results_df)
which gives us:
Adjusted R²: 0.47987115035473205
Coefficients and P-values:
Coefficient P-value
const 9.852533 2.418140e-36
V1 1.093086 4.513510e-07
V2 -0.091330 6.783820e-01
V3 -0.031855 8.752859e-01
V4 0.081758 6.878198e-01
V5 0.045772 8.283457e-01
V6 -0.012240 9.509285e-01
V7 -0.206075 2.707206e-01
which is quite similar to what we obtained in R. It is not identical because R's and NumPy's random number generators produce different draws for the same seed, so the simulated datasets differ.
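To see that this gap is just sampling noise, you can rerun the simulation over many seeds (a minimal sketch in R, reusing N, mu, and sigma from the R code above) and look at the spread of the adjusted $R^2$:
adj_r2 <- sapply(1:200, function(s) {
  set.seed(s)
  d <- as.data.frame(mvrnorm(n = N, mu = mu, Sigma = sigma))
  d$Y <- 10 + d$V1 + rnorm(N, 0, 1)
  summary(lm(Y ~ ., data = d))$adj.r.squared
})
summary(adj_r2)  # both 0.53 (R) and 0.48 (Python) fall well within this spread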
Other things that could be going on include:
- unmeasured confounding
- multicollinearity (see the sketch after this list)
- mediation
- over-adjustment
to name a few.
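To illustrate the multicollinearity point, here is a minimal sketch in R (the variable names are just for illustration), where two nearly identical predictors produce a model with a respectable $R^2$ and a highly significant F-test, yet neither coefficient need look "significant" on its own:
set.seed(2)
n <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, 0, 0.05)  # x2 is almost a copy of x1
y  <- x1 + rnorm(n)           # y truly depends only on x1
summary(lm(y ~ x1 + x2))
# The near-collinearity inflates the standard errors of both coefficients,
# so the individual t-tests can easily come out "non-significant"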