I’m trying to deploy a model on GCP Vertex AI, but I keep encountering the error:
> Machine type temporarily unavailable, please deploy with a different machine type or retry.
I've attempted the following machine types (with and without GPU) in the us-central1 region:
- n1-standard-16
- n1-standard-32
- a2-highgpu-1g (1x A100 GPU)
The error takes a long time to appear, and the deployment ultimately fails. I have previously deployed the Gemma and Llama 3 models from Model Garden successfully, but now I cannot get them to deploy to an endpoint.
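For reference, here is roughly the deployment call I'm using via the Python SDK (the project ID, model resource name, and replica settings below are placeholders, not my actual values):

```python
# Rough sketch of my deployment call using the google-cloud-aiplatform SDK.
# "my-project" and MODEL_ID are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Model imported from Model Garden
model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/MODEL_ID"
)

# This is the call that eventually fails with
# "Machine type temporarily unavailable".
endpoint = model.deploy(
    machine_type="a2-highgpu-1g",
    accelerator_type="NVIDIA_TESLA_A100",
    accelerator_count=1,
    min_replica_count=1,
    max_replica_count=1,
)
```

I also tried the same call with `machine_type="n1-standard-16"` and `"n1-standard-32"` (without the accelerator arguments), with the same result.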
Here’s one stack trace I encountered, though most of the time I only receive the "Machine type temporarily unavailable" error:
```
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/vllm/vllm/entrypoints/api_server.py", line 458, in <module>
    asyncio.run(run_server(args))
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/workspace/vllm/vllm/entrypoints/api_server.py", line 421, in run_server
    app = await init_app(args, llm_engine)
  File "/workspace/vllm/vllm/entrypoints/api_server.py", line 402, in init_app
    if llm_engine is not None else AsyncLLMEngine.from_engine_args(
  File "/workspace/vllm/vllm/engine/async_llm_engine.py", line 679, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/workspace/vllm/vllm/engine/arg_utils.py", line 951, in create_engine_config
    device_config = DeviceConfig(device=self.device)
  File "/workspace/vllm/vllm/config.py", line 1088, in __init__
    raise RuntimeError("Failed to infer device type")
RuntimeError: Failed to infer device type
```
I have the following questions:
- Is there any way to check whether a machine type is available before deploying to an endpoint, to avoid a long wait followed by failure?
- Which machine types should I use to successfully deploy language generation models on GCP Vertex AI?
- Are there any recommended regions for this type of deployment?