I am building a Python service to compute option Greeks, but I have never worked with Python before, so I would really appreciate advice from experienced Python devs.
### Flow
The service will later integrate with an existing Spring Boot backend.
Currently, I upload an Excel/CSV file (up to 700k rows) from a UI, which contains option data for which I need to calculate Greeks.
I’m using:

- FastAPI → async API server (for streaming responses)
- Pandas → data manipulation, reading Excel/CSV
- NumPy → vectorized math
- SciPy → Black-Scholes & Greeks computations
- orjson → fast JSON serialization
- ProcessPoolExecutor → parallel chunk-based processing
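To make the math step concrete, here is a minimal sketch of the vectorized Black-Scholes Greeks (European calls only; the function name and input layout are illustrative, not my exact code):

```python
import numpy as np
from scipy.stats import norm

def bs_greeks(S, K, T, r, sigma):
    """Vectorized Black-Scholes Greeks for European calls.

    All inputs are NumPy arrays of the same shape, so one call
    covers an entire chunk of rows with no Python-level loop.
    """
    sqrt_T = np.sqrt(T)
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt_T)
    d2 = d1 - sigma * sqrt_T
    pdf_d1 = norm.pdf(d1)
    disc = np.exp(-r * T)  # discount factor e^{-rT}

    return {
        "delta": norm.cdf(d1),
        "gamma": pdf_d1 / (S * sigma * sqrt_T),
        "vega": S * pdf_d1 * sqrt_T,
        "theta": -(S * pdf_d1 * sigma) / (2 * sqrt_T) - r * K * disc * norm.cdf(d2),
        "rho": K * T * disc * norm.cdf(d2),
    }
```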
My processing pipeline:

1. File reading (main process) – pandas for CSV (C engine), openpyxl for Excel
2. Split into chunks – about 20,000 rows per chunk
3. Parallel computation – ProcessPoolExecutor
4. Vectorized Black-Scholes calculations – NumPy
5. Error checks – NaN, negatives, type mismatches
6. Convert results to dicts and calculate aggregates
7. Merge results – combine all chunk outputs and totals
8. Serialize & stream – orjson and StreamingResponse
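The chunk / parallel-compute / merge steps above can be sketched like this (`compute_chunk` and `process_all` are hypothetical names, and the worker body is a stand-in for the real vectorized Greeks math):

```python
import multiprocessing as mp
import numpy as np
from concurrent.futures import ProcessPoolExecutor

CHUNK_SIZE = 20_000  # ~20k rows per chunk, as described above

def compute_chunk(chunk: np.ndarray) -> dict:
    """Worker: vectorized math on one chunk.

    Must be a module-level function so it can be pickled and
    sent to the worker processes.
    """
    spot = chunk[:, 0]
    # Real code would compute delta/gamma/vega/... here; a sum stands in.
    return {"rows": len(chunk), "sum_spot": float(spot.sum())}

def process_all(data: np.ndarray, workers: int = 4) -> dict:
    # Step: split into ~20k-row chunks
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    # "fork" inherits the parent's memory instead of re-importing the
    # module in each worker (POSIX only; structure modules properly
    # and drop this argument for portable production code).
    ctx = mp.get_context("fork")
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        results = list(pool.map(compute_chunk, chunks))
    # Step: merge chunk outputs and compute aggregates
    return {
        "total_rows": sum(r["rows"] for r in results),
        "total_spot": sum(r["sum_spot"] for r in results),
    }
```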
Below are my performance numbers; the response time for 700k records from Excel is currently 9-11 seconds.
### 700k Rows
| Configuration | Read File | Calculate | Build Results | JSON | Total |
|---------------------|-----------|-----------|---------------|------|--------|
| **Single Process** | 1-2s | 5-6s | 8-10s | 3-4s | 17-22s |
| **4 Workers** | 1-2s | 3-4s* | 3-4s* | 3-4s | 10-14s |
| **8 Workers** | 1-2s | 2-3s* | 2-3s* | 3-4s | 8-12s |
*Parallel processing time (multiple chunks at once)
### 60k Rows
| Configuration | Total Time | Notes |
|---------------------|------------|------------------------------------|
| **Single Process** | 2-3s | No overhead, pure speed |
| **4 Workers** | 3-4s | ⚠️ Overhead > benefit |
| **8 Workers**       | 4-5s       | ⚠️ Too much overhead               |
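For reference, the "serialize & stream" step can be sketched as NDJSON emitted per chunk, so the client starts receiving data before the full payload is built (this sketch falls back to stdlib `json` where orjson isn't installed):

```python
# orjson returns bytes directly and is much faster than stdlib json,
# but the stdlib works as a drop-in fallback for this sketch.
try:
    import orjson
    def dumps(obj) -> bytes:
        return orjson.dumps(obj)
except ImportError:
    import json
    def dumps(obj) -> bytes:
        return json.dumps(obj).encode()

def ndjson_stream(chunk_results):
    """Yield one JSON line per chunk result (newline-delimited JSON)."""
    for chunk in chunk_results:
        yield dumps(chunk) + b"\n"

# In FastAPI this generator would be wrapped as:
#   StreamingResponse(ndjson_stream(results),
#                     media_type="application/x-ndjson")
```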
Questions (sorry if these sound stupid, but I want to build production-grade applications and learn best practices):

1. Is it wise to use worker processes in this API? They take a decent amount of memory and might affect the server. Do people use them in production, and what should I keep in mind?
2. Is my tech stack (FastAPI + Pandas + NumPy + SciPy + orjson) appropriate for this type of workload, or should I consider something else (e.g., Polars, Cython, or PyPy)?
3. Apart from JSON serialization overhead, are there other bottlenecks I should be aware of (e.g., inter-process communication, the GIL, or blocking I/O)?
Any help would be appreciated