# Create a Hugging Face inference endpoint

**PUT /_inference/{task_type}/{huggingface_inference_id}**

Create an inference endpoint to perform an inference task with the `hugging_face` service. Supported tasks include: `text_embedding`, `completion`, `chat_completion`, and `rerank`.

To configure the endpoint, first visit the Hugging Face Inference Endpoints page and create a new endpoint. Select a model that supports the task you intend to use.

For Elastic's `text_embedding` task: the selected model must support the `Sentence Embeddings` task. On the new endpoint creation page, select the `Sentence Embeddings` task under the `Advanced Configuration` section. After the endpoint has initialized, copy the generated endpoint URL.

Recommended models for the `text_embedding` task:

* `all-MiniLM-L6-v2`
* `all-MiniLM-L12-v2`
* `all-mpnet-base-v2`
* `e5-base-v2`
* `e5-small-v2`
* `multilingual-e5-base`
* `multilingual-e5-small`

For Elastic's `chat_completion` and `completion` tasks: the selected model must support the `Text Generation` task and expose the OpenAI API. Hugging Face supports both serverless and dedicated endpoints for `Text Generation`. When creating a dedicated endpoint, select the `Text Generation` task. After the endpoint is initialized (for dedicated) or ready (for serverless), ensure that it supports the OpenAI API and that its URL includes the `/v1/chat/completions` path. Then copy the full endpoint URL for use.

Recommended models for the `chat_completion` and `completion` tasks:

* `Mistral-7B-Instruct-v0.2`
* `QwQ-32B`
* `Phi-3-mini-128k-instruct`

For Elastic's `rerank` task: the selected model must support the `sentence-ranking` task and expose the OpenAI API. Hugging Face so far supports only dedicated (not serverless) endpoints for `Rerank`. After the endpoint is initialized, copy the full endpoint URL for use.
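Once you have copied the endpoint URL, you can assemble the creation request. The sketch below builds the path and JSON body for a `text_embedding` endpoint; the Hugging Face URL, token, and inference ID are placeholders, and the `url`/`api_key` fields shown in `service_settings` are an assumption about the minimal configuration, not an exhaustive list.

```python
import json

def build_create_request(task_type: str, inference_id: str, hf_url: str, hf_api_key: str):
    """Return the (path, body) pair for PUT /_inference/{task_type}/{inference_id}."""
    path = f"/_inference/{task_type}/{inference_id}"
    body = {
        "service": "hugging_face",
        "service_settings": {
            "url": hf_url,          # endpoint URL copied from Hugging Face
            "api_key": hf_api_key,  # Hugging Face access token (placeholder)
        },
    }
    return path, body

path, body = build_create_request(
    "text_embedding",
    "hugging-face-embeddings",                       # hypothetical inference ID
    "https://example.endpoints.huggingface.cloud",   # placeholder endpoint URL
    "hf_xxx",                                        # placeholder token
)
print(path)
print(json.dumps(body, indent=2))
```

Sending this body with `PUT` to the path shown creates the endpoint; the same shape applies to the other task types, with the URL constraint for `chat_completion` and `completion` noted above.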
Tested models for the `rerank` task:

* `bge-reranker-base`
* `jina-reranker-v1-turbo-en-GGUF`

## Required authorization

* Cluster privileges: `manage_inference`

## Servers

- http://api.example.com

## Authentication methods

- API key auth
- Basic auth
- Bearer auth

## Parameters

### Path parameters

- **task_type** (string) The type of the inference task that the model will perform.
- **huggingface_inference_id** (string) The unique identifier of the inference endpoint.

### Query parameters

- **timeout** (string) The amount of time to wait for the inference endpoint to be created.

### Body: application/json (object)

- **chunking_settings** (object) The chunking configuration object.
- **service** (string) The type of service supported for the specified task type. In this case, `hugging_face`.
- **service_settings** (object) Settings used to install the inference model. These settings are specific to the `hugging_face` service.
- **task_settings** (object) Settings to configure the inference task. These settings are specific to the task type you specified.

## Responses

### 200

#### Body: application/json (object)

- **chunking_settings** (object) The chunking configuration object. Applies only to the `sparse_embedding` and `text_embedding` task types; not applicable to the `rerank`, `completion`, or `chat_completion` task types.
- **service** (string) The service type.
- **service_settings** (object) Settings specific to the service.
- **task_settings** (object) Task settings specific to the service and task type.
- **inference_id** (string) The inference ID.
- **task_type** (string) The task type.
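Putting the parameters together, the sketch below assembles a full `PUT` request for a `chat_completion` endpoint, including the optional `timeout` query parameter and an API-key `Authorization` header. The host, inference ID, and credentials are placeholders, and nothing is actually sent; `urllib.request.urlopen(req)` would submit it against a live cluster.

```python
import json
import urllib.request

ES_HOST = "http://localhost:9200"  # assumed local cluster (placeholder)

req = urllib.request.Request(
    url=f"{ES_HOST}/_inference/chat_completion/hugging-face-chat?timeout=30s",
    data=json.dumps({
        "service": "hugging_face",
        "service_settings": {
            # Per the description above, chat endpoints must include the
            # /v1/chat/completions path in the copied URL.
            "url": "https://example.endpoints.huggingface.cloud/v1/chat/completions",
            "api_key": "hf_xxx",  # placeholder Hugging Face token
        },
    }).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "ApiKey <elasticsearch-api-key>",  # placeholder credential
    },
    method="PUT",
)
print(req.get_method(), req.full_url)
```

A successful response (200) echoes the endpoint configuration along with its `inference_id` and `task_type`, as described in the response body above.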