# Create a Hugging Face inference endpoint

**PUT /_inference/{task_type}/{huggingface_inference_id}**

Create an inference endpoint to perform an inference task with the `hugging_face` service. Supported tasks include: `text_embedding`, `completion`, `chat_completion`, and `rerank`.

To configure the endpoint, first visit the Hugging Face Inference Endpoints page and create a new endpoint. Select a model that supports the task you intend to use.

For Elastic's `text_embedding` task: the selected model must support the `Sentence Embeddings` task. On the new endpoint creation page, select the `Sentence Embeddings` task under the `Advanced Configuration` section. After the endpoint has initialized, copy the generated endpoint URL.

Recommended models for the `text_embedding` task:

* `all-MiniLM-L6-v2`
* `all-MiniLM-L12-v2`
* `all-mpnet-base-v2`
* `e5-base-v2`
* `e5-small-v2`
* `multilingual-e5-base`
* `multilingual-e5-small`

For Elastic's `chat_completion` and `completion` tasks: the selected model must support the `Text Generation` task and expose the OpenAI API. Hugging Face supports both serverless and dedicated endpoints for `Text Generation`. When creating a dedicated endpoint, select the `Text Generation` task. After the endpoint is initialized (for dedicated) or ready (for serverless), ensure that it supports the OpenAI API and that its URL includes the `/v1/chat/completions` path. Then copy the full endpoint URL for use.

Recommended models for the `chat_completion` and `completion` tasks:

* `Mistral-7B-Instruct-v0.2`
* `QwQ-32B`
* `Phi-3-mini-128k-instruct`

For Elastic's `rerank` task: the selected model must support the `sentence-ranking` task and expose the OpenAI API. Hugging Face so far supports only dedicated (not serverless) endpoints for `Rerank`. After the endpoint is initialized, copy the full endpoint URL for use.
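Once you have copied the endpoint URL, you can assemble the creation request. The sketch below builds the path and JSON body for a `text_embedding` endpoint; the Hugging Face URL, token, and inference ID are placeholders, and the `url`/`api_key` fields shown in `service_settings` are an assumption about the minimal configuration, not an exhaustive list.

```python
import json

def build_create_request(task_type: str, inference_id: str, hf_url: str, hf_api_key: str):
    """Return the (path, body) pair for PUT /_inference/{task_type}/{inference_id}."""
    path = f"/_inference/{task_type}/{inference_id}"
    body = {
        "service": "hugging_face",
        "service_settings": {
            "url": hf_url,          # endpoint URL copied from Hugging Face
            "api_key": hf_api_key,  # Hugging Face access token (placeholder)
        },
    }
    return path, body

path, body = build_create_request(
    "text_embedding",
    "hugging-face-embeddings",                       # hypothetical inference ID
    "https://example.endpoints.huggingface.cloud",   # placeholder endpoint URL
    "hf_xxx",                                        # placeholder token
)
print(path)
print(json.dumps(body, indent=2))
```

Sending this body with `PUT` to the path shown creates the endpoint; the same shape applies to the other task types, with the URL constraint for `chat_completion` and `completion` noted above.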
Tested models for the `rerank` task:

* `bge-reranker-base`
* `jina-reranker-v1-turbo-en-GGUF`

## Required authorization

* Cluster privileges: `manage_inference`

## Servers

- http://api.example.com

## Authentication methods

- API key auth
- Basic auth
- Bearer auth

## Parameters

### Path parameters

- **task_type** (string) The type of the inference task that the model will perform.
- **huggingface_inference_id** (string) The unique identifier of the inference endpoint.

### Query parameters

- **timeout** (string) The amount of time to wait for the inference endpoint to be created.

### Body: application/json (object)

- **chunking_settings** (object) The chunking configuration object.
- **service** (string) The type of service supported for the specified task type. In this case, `hugging_face`.
- **service_settings** (object) Settings used to install the inference model. These settings are specific to the `hugging_face` service.
- **task_settings** (object) Settings to configure the inference task. These settings are specific to the task type you specified.

## Responses

### 200

#### Body: application/json (object)

- **chunking_settings** (object) The chunking configuration object. Applies only to the `sparse_embedding` and `text_embedding` task types; not applicable to the `rerank`, `completion`, or `chat_completion` task types.
- **service** (string) The service type.
- **service_settings** (object) Settings specific to the service.
- **task_settings** (object) Task settings specific to the service and task type.
- **inference_id** (string) The inference ID.
- **task_type** (string) The task type.
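Putting the parameters together, the sketch below assembles a full `PUT` request for a `chat_completion` endpoint, including the optional `timeout` query parameter and an API-key `Authorization` header. The host, inference ID, and credentials are placeholders, and nothing is actually sent; `urllib.request.urlopen(req)` would submit it against a live cluster.

```python
import json
import urllib.request

ES_HOST = "http://localhost:9200"  # assumed local cluster (placeholder)

req = urllib.request.Request(
    url=f"{ES_HOST}/_inference/chat_completion/hugging-face-chat?timeout=30s",
    data=json.dumps({
        "service": "hugging_face",
        "service_settings": {
            # Per the description above, chat endpoints must include the
            # /v1/chat/completions path in the copied URL.
            "url": "https://example.endpoints.huggingface.cloud/v1/chat/completions",
            "api_key": "hf_xxx",  # placeholder Hugging Face token
        },
    }).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "ApiKey <elasticsearch-api-key>",  # placeholder credential
    },
    method="PUT",
)
print(req.get_method(), req.full_url)
```

A successful response (200) echoes the endpoint configuration along with its `inference_id` and `task_type`, as described in the response body above.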