Let's first do some maths.
- You want 40,000 cycles per second.
- Each cycle needs splitting up into samples. For example, let's use 128 samples.
- Assume you want an 8-bit amplitude resolution.
Each second of signal would need 40,000 cycles * 128 samples. That's 5,120,000 samples per second.
If using PWM to generate the audio, 8-bit resolution takes 256 clock cycles (2^8) per sample. That means 5,120,000 * 256 = 1,310,720,000 clock cycles per second.
That means a "carrier frequency" of over 1.3 GHz. There's no way an Arduino can do that.
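The arithmetic above can be checked with a small sketch like the one below. The function names `samplesPerSecond` and `pwmClockHz` are made up for illustration; the numbers are the ones used in this answer.

```cpp
#include <cstdint>

// Samples needed per second: signal cycles per second times samples per cycle.
constexpr std::uint64_t samplesPerSecond(std::uint64_t cyclesPerSecond,
                                         std::uint64_t samplesPerCycle) {
    return cyclesPerSecond * samplesPerCycle;
}

// PWM needs 2^bits timer ticks to express one sample at the given resolution.
constexpr std::uint64_t pwmClockHz(std::uint64_t sampleRate, unsigned bits) {
    return sampleRate * (std::uint64_t{1} << bits);
}

// 40,000 cycles/s * 128 samples = 5,120,000 samples/s
static_assert(samplesPerSecond(40000, 128) == 5120000, "sample rate");
// 5,120,000 samples/s * 256 ticks = 1,310,720,000 ticks/s (over 1.3 GHz)
static_assert(pwmClockHz(5120000, 8) == 1310720000, "PWM clock");
```

Changing the samples per cycle or the bit depth shows how quickly the required clock grows: each extra bit of resolution doubles it.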
If instead you used a simple DAC (e.g., an R-2R ladder) connected to the 8 pins of port D (that's pins 0-7 on the Uno) you don't need the 256 clock cycles per sample. You would still have to output a new sample to PORTD 5,120,000 times per second. That's outputting at over 5 MHz.
With some well-crafted assembly code that might be possible, but the Arduino certainly won't be able to do anything else at all at the same time.
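To see why this is so tight, here is a rough cycle-budget calculation. It assumes a 16 MHz CPU clock, as on a stock Uno; the name `cyclesPerSample` is only illustrative.

```cpp
// Assumption: 16 MHz CPU clock (stock Arduino Uno).
constexpr double kCpuHz = 16000000.0;
// Sample rate worked out earlier: 40,000 cycles/s * 128 samples.
constexpr double kSampleRateHz = 5120000.0;

// CPU clock cycles available between two consecutive samples.
constexpr double cyclesPerSample(double cpuHz, double sampleRateHz) {
    return cpuHz / sampleRateHz;
}

// Only about three clock cycles are available per sample.
static_assert(cyclesPerSample(kCpuHz, kSampleRateHz) > 3.0, "budget lower bound");
static_assert(cyclesPerSample(kCpuHz, kSampleRateHz) < 3.2, "budget upper bound");
```

Roughly three clock cycles have to cover fetching the next sample, writing it to the port, and any loop overhead, which leaves no headroom for anything else.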
You would be much better off using a chip that has audio capabilities and DMA. Maybe a Teensy 3.x would be a better choice for your project.