
Environment:

  • Database: AWS Aurora PostgreSQL
  • ORM: SQLAlchemy
  • API Framework: Python FastAPI

Issue: I'm experiencing significant query performance degradation when my API receives concurrent requests. I ran a performance test comparing single execution vs. concurrent execution of the same query, and the results are concerning.

Real-World Observations: When monitoring our production API endpoint during load tests with 100 concurrent users, I've observed concerning behavior:

When running the same complex query through pgAdmin without concurrent load, it consistently completes in ~60ms. However, during periods of high concurrency (100 simultaneous users), response times for this same query become wildly inconsistent:

  • Some executions still complete in 60-100ms
  • Others suddenly take up to 2 seconds
  • No clear pattern to which queries are slow

Test Results:

Single query execution time: 0.3098 seconds

Simulating 100 concurrent clients - all requests starting simultaneously...

Results Summary:
Total execution time: 32.7863 seconds
Successful queries: 100 out of 100
Failed queries: 0
Average query time: 0.5591 seconds (559ms)
Min time: 0.2756s, Max time: 1.9853s
Queries exceeding 500ms threshold: 21 (21.0%)
50th percentile (median): 0.3114s (311ms)
95th percentile: 1.7712s (1771ms)
99th percentile: 1.9853s (1985ms)

With 100 concurrent threads:

  • Each query takes ~12.4x longer on average (3.62s vs 0.29s)
  • Huge variance between fastest (0.5s) and slowest (4.8s) query
  • Overall throughput is ~17.2 queries/second (better than sequential, but still concerning)
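
For reference, the numbers above come from a harness that is essentially the pattern below. This is a minimal sketch rather than the actual test script: the DSN and the SELECT 1 stand-in are placeholders, run_query is an illustrative helper, and it drives concurrency with asyncio tasks rather than threads.

import asyncio
import time

from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine

# Placeholder DSN; the real test runs the production query against Aurora.
engine = create_async_engine(
    "postgresql+asyncpg://user:pass@host/db",
    pool_size=20,
    max_overflow=100,
)

async def run_query() -> float:
    """Run the test query once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    async with engine.connect() as conn:
        await conn.execute(text("SELECT 1"))  # stand-in for the complex query
    return time.perf_counter() - start

async def main() -> None:
    # 100 concurrent "clients", all started at (roughly) the same time.
    durations = sorted(await asyncio.gather(*(run_query() for _ in range(100))))
    print(
        f"avg {sum(durations) / len(durations):.4f}s, "
        f"min {durations[0]:.4f}s, max {durations[-1]:.4f}s, "
        f"p95 {durations[94]:.4f}s"
    )

asyncio.run(main())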

Query Details: The query is moderately complex, involving several JOINs across multiple tables, a subquery using EXISTS, and ORDER BY and LIMIT clauses.
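
For illustration, the shape is roughly the following (placeholder table and column names, not the real schema):

from sqlalchemy import text

# Placeholder schema, illustrating structure only: several JOINs,
# an EXISTS subquery, then ORDER BY and LIMIT.
stmt = text("""
    SELECT o.id, o.created_at, c.name, p.title
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    JOIN products p ON p.id = o.product_id
    WHERE EXISTS (
        SELECT 1
        FROM shipments s
        WHERE s.order_id = o.id AND s.status = 'pending'
    )
    ORDER BY o.created_at DESC
    LIMIT 50
""")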

My Setup

SQLAlchemy Configuration:

from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine

engine = create_async_engine(
    settings.ASYNC_DATABASE_URL,
    echo=settings.SQL_DEBUG,
    pool_pre_ping=True,
    pool_use_lifo=True,
    pool_size=20,
    max_overflow=100,
    pool_timeout=30,
    pool_recycle=30,
)

AsyncSessionLocal = async_sessionmaker(
    bind=engine,
    class_=AsyncSession,
    expire_on_commit=False,
    autocommit=False,
    autoflush=False,
)
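
To separate time spent waiting for a pooled connection from time spent in the database, I can log pool saturation while the load test runs. The helper name and interval below are illustrative; pool.status() is standard SQLAlchemy QueuePool API.

import asyncio
import logging

logger = logging.getLogger("pool-monitor")

async def log_pool_status(interval: float = 1.0) -> None:
    """Periodically log connection pool saturation during the load test."""
    while True:
        # engine.sync_engine.pool is the QueuePool behind the async engine;
        # status() reports pool size, checked-out connections and overflow.
        logger.info("pool status: %s", engine.sync_engine.pool.status())
        await asyncio.sleep(interval)

# Started once at application startup, e.g.:
# asyncio.create_task(log_pool_status())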

FastAPI Dependency:

from typing import AsyncGenerator

async def get_db() -> AsyncGenerator[AsyncSession, None]:
    """Get database session"""
    async with AsyncSessionLocal() as session:
        try:
            yield session
            await session.commit()
        except Exception:
            await session.rollback()
            raise
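
For context, the dependency is injected in the usual FastAPI way. The route, function name and SELECT 1 below are placeholders; the production handler runs the complex JOIN/EXISTS query.

from fastapi import Depends, FastAPI
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

app = FastAPI()

@app.get("/items")
async def list_items(db: AsyncSession = Depends(get_db)):
    # Placeholder query; the real endpoint runs the JOIN/EXISTS query.
    result = await db.execute(text("SELECT 1"))
    return {"value": result.scalar_one()}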

Questions:

  • Connection Pool Settings: Are my SQLAlchemy pool settings appropriate for handling 100 concurrent requests? What would be optimal?
  • Aurora Configuration: What Aurora PostgreSQL parameters should I tune to improve concurrent query performance?
  • Query Optimization: Is there a standard approach to optimize complex queries with JOINs and EXISTS subqueries for better concurrency?
  • ORM vs Raw SQL: Would bypassing SQLAlchemy ORM help performance?

Any guidance or best practices would be greatly appreciated. I'd be happy to provide additional details if needed.

Update:

Hardware Configuration

  1. Aurora regional cluster with 1 instance
  2. Capacity Type: Provisioned (Min: 0.5 ACUs (1GiB), Max: 16 ACUs (32 GiB))
  3. Storage Config: Standard

Performance Insights

  1. Max ACU utilization: 70%
  2. Max CPU Utilization: 45%
  3. Max DB connection: 111
  4. EBS IO Balance: 100%
  5. Buffer Cache Hit Ratio: 100%
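
For completeness, connection counts and temp-file spill counters can also be read directly from pg_stat_database (numbackends, temp_files and temp_bytes are standard columns of that view). A minimal sketch, assuming the async engine defined above:

from sqlalchemy import text

async def temp_file_stats() -> None:
    """Print backend count and temp-file spill counters for the current database."""
    async with engine.connect() as conn:
        row = (await conn.execute(text(
            "SELECT numbackends, temp_files, temp_bytes "
            "FROM pg_stat_database "
            "WHERE datname = current_database()"
        ))).one()
    print(
        f"backends={row.numbackends}, "
        f"temp_files={row.temp_files}, "
        f"temp_bytes={row.temp_bytes}"
    )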
  • Have you called AWS support? What is the hardware configuration in use? What does Performance Insights show - where are the resource usage and waits? With 100 concurrent queries - how many CPUs (back to the hardware configuration)? Have EXPLAINs been run on the queries to check how much IO and buffer space is in use? Does that align with the hardware? Are the queries spilling over work_mem and thrashing on temp IO? A lot more research can be done to narrow down the issue. Commented May 21 at 12:35
  • @Craig Thanks for your response. 1. No, I haven't called AWS support yet, since I wanted to rule out problems on my end first. 2. I have updated the original post with what you asked. 3. Yes, I have run EXPLAIN ANALYZE on the query and there is no evidence of a spill. 4. I am not sure how to check for work_mem spilling and temp IO thrashing, but I checked: temp_files is 20663 and temp_bytes is 100 MB, while work_mem is 4 MB. Is that the problem? Commented May 21 at 15:19
  • I cannot answer the question. My questions now are to help you debug. If you are looking into performance, start by moving off of serverless. It is hard to find non-uniform performance issues when the hardware, buffer sizes and IO are changing. If temp files are being created, then the queries are spilling. Check the explains for where temp IO is used. Removing all temp IO is not always right, since there might be less buffer space, but it is a debugging step. Compare explains from single-user vs. multi-user runs. These are pointers to help you. Call AWS, who can watch and look at the system. Commented May 23 at 12:13
  • I looked up ACUs. I do not use serverless, since I want predictable, explainable, and repeatable performance. My guess is that 16 ACUs will give you 4 vCPUs, but you can verify. If you are running 100 concurrent queries, then most are waiting for an available vCPU, so every query will take longer. This is not the case if you have 100 concurrent connections but only 2 concurrent queries. Try it on a dedicated instance with enough vCPUs to handle the concurrent load. Also look in Performance Insights for the longest waits. I find the PI plot useful for initial debugging direction. Commented May 24 at 12:13
