I’m trying to deploy a model on GCP Vertex AI, but I keep encountering the error:
> Machine type temporarily unavailable, please deploy with a different machine type or retry.
I've attempted the following machine types (with and without GPU) in the us-central1 region:
- n1-standard-16
- n1-standard-32
- a2-highgpu-1g (1x A100 GPU)
The error takes a long time to appear, and the deployment ultimately fails. I have previously deployed the Gemma and Llama 3 models from Model Garden successfully, but now I cannot get them to deploy to an endpoint.
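For reference, here is roughly the deployment call I'm using via the Python SDK (the project ID, model resource name, and replica settings below are placeholders, not my actual values):

```python
# Rough sketch of my deployment call using the google-cloud-aiplatform SDK.
# "my-project" and MODEL_ID are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Model imported from Model Garden
model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/MODEL_ID"
)

# This is the call that eventually fails with
# "Machine type temporarily unavailable".
endpoint = model.deploy(
    machine_type="a2-highgpu-1g",
    accelerator_type="NVIDIA_TESLA_A100",
    accelerator_count=1,
    min_replica_count=1,
    max_replica_count=1,
)
```

I also tried the same call with `machine_type="n1-standard-16"` and `"n1-standard-32"` (without the accelerator arguments), with the same result.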
Here’s one stack trace I encountered, though most of the time I only receive the "Machine type temporarily unavailable" error:
```
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/vllm/vllm/entrypoints/api_server.py", line 458, in <module>
    asyncio.run(run_server(args))
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/workspace/vllm/vllm/entrypoints/api_server.py", line 421, in run_server
    app = await init_app(args, llm_engine)
  File "/workspace/vllm/vllm/entrypoints/api_server.py", line 402, in init_app
    if llm_engine is not None else AsyncLLMEngine.from_engine_args(
  File "/workspace/vllm/vllm/engine/async_llm_engine.py", line 679, in from_engine_args
    engine_config = engine_args.create_engine_config()
  File "/workspace/vllm/vllm/engine/arg_utils.py", line 951, in create_engine_config
    device_config = DeviceConfig(device=self.device)
  File "/workspace/vllm/vllm/config.py", line 1088, in __init__
    raise RuntimeError("Failed to infer device type")
RuntimeError: Failed to infer device type
```
I have the following questions:
- Is there any way to check whether a machine type is available before deploying to an endpoint, to avoid a long wait followed by failure?
- Which machine types should I use to successfully deploy language generation models on GCP Vertex AI?
- Are there any recommended regions for this type of deployment?