I am working on the orthogonal decomposition of a signal to compute the best linear approximation of a system. Given an input signal $x[n]$ and system output $y[n]$, the output can be decomposed into two components: $$y[n] = y_o[n] + y_c[n]$$ The correlated component $y_c[n]$ is the part of the output that can be expressed as the output of an equivalent linear system: $$y_c[n] = x[n]*h_{eq}[n]$$ where $h_{eq}[n]$ is the equivalent impulse response of the system.
In matrix form, the problem is $$ \textbf{X} \textbf{h} = \textbf{y}$$
with $$ \textbf{h} = \begin{bmatrix} h_0 \\ h_1 \\ \vdots \\ h_{M-1} \end{bmatrix}, \quad \textbf{y} = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{bmatrix}, \quad \textbf{X} = \begin{bmatrix} x_0 & x_{-1} & \cdots & x_{-M+1} \\ x_1 & x_0 & \cdots & x_{-M+2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N-1} & x_{N-2} & \cdots & x_{N-M} \end{bmatrix}. $$
M is the lenght of the estimated impulse response, N is the length of the input and output sequences, and the values of $x[n]$ with negative indices are of course set to zero.
To test my implementation in MATLAB, I started with a purely linear system, defined by a known impulse response $h[n]$. Thus I set $M$ to the length of $h[n]$. As for the input sequence, I am using a normalized PAM4 sequence, generated using MATLAB's randi function.
After solving the system, I compare $h_{eq}$ with $h$ and $y$ with $y_{c}$. The difference between these reaches numerical precision. Same result for the magnitude of $y_o$.
Now, if I add an AWGN source before the decomposition, when looking at the average of $|h_{eq} - h|$ and the average of $|y_o - n|$, where $n$ is the AWGN vector, these magnitudes are no longer at the level of numerical precision. In fact, they are roughly on the order of the inverse of the SNR used in the simulation.
My questions are:
Is this in fact what we would expect? How can I see this analytically?
Since the noise is in theory orthogonal to the sequence, I understand it should not have a projection into the space of past samples. But as opposed to the theory, in MATLAB the actual cross correlation between noise and signal is very small, but not zero. And this explains the difference in results. Is this correct?
Finally, can I reduce this (take it closer to numerical precision) by simulating the same sequence multiple times, with different noise vectors on each iteration, and averaging the results?
Edit : my original orthogonal decomposition code is the following
function [y_correlated,y_orthogonal,h_eq,X] = OD(x,y,K)
% Inputs
% x : input of the nonlinear system x[n]
% y : output of the nonlinear system y[n]
% M : order of the filter
% Outputs
% y_orthogonal : orthogonal component
% y_correlated : correlated component
% c : vector of coefficients
% X : regressors matrix
x = x(:);
y = y(:);
N = length(y);
X = zeros(N,K);
for j = 1:N % row index
for i = 1:K % column index
idx_in_x = j-i + 1;
if (idx_in_x > 0 && idx_in_x<N+1)
X(j,i) = x(idx_in_x);
else
X(j,i) = 0;
end
end
end
A = (X')*X;
B = (X')*y;
h_eq = A\B; % Solution to Ah = B
y_correlated = X*h_eq;
y_orthogonal = y-y_correlated;
And the way I use it is:
N = 200000; % Number of symbols
v_m = [-3 -1 1 3]; % PAM4 levels
d = v_m(randi(4,1,N)); % PAM4 sequence
d = d/max(d); % Normalized
d = d(:); % Column vector
%% Impulse response
h = impulse_response_channel;
h = (1 / sum(abs(h)))*h;
OS = 1;
% Received sequence
y_linear = filter(h,1,d_up); % Linear component
% AWGN
SNR = 25; SNR_linear = 10^(SNR/10);
P_signal = mean(y_linear.^2);
noise_variance = P_signal/SNR_linear;
n = sqrt(noise_variance)*randn(size(y_linear));
y= y_linear + n;
% Orthogonal decomposition
M = length(h);
[y_c, y_o, h_eq,X] = OD(d,y_down,M);
convmtx()in Matlab. $\endgroup$