1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
|
---
base_model:
- {base_model}
---
# {model_name} GGUF
Recommended way to run this model:
```sh
llama-server -hf {namespace}/{model_name}-GGUF --embeddings
```
Then the endpoint can be accessed at http://localhost:8080/embedding, for
example using `curl`:
```console
curl --request POST \
--url http://localhost:8080/embedding \
--header "Content-Type: application/json" \
--data '{{"input": "Hello embeddings"}}' \
--silent
```
Alternatively, the `llama-embedding` command line tool can be used:
```sh
llama-embedding -hf {namespace}/{model_name}-GGUF --verbose-prompt -p "Hello embeddings"
```
#### embd_normalize
When a model uses pooling, or the pooling method is specified using `--pooling`,
the normalization can be controlled by the `embd_normalize` parameter.
The default value is `2` which means that the embeddings are normalized using
the Euclidean norm (L2). Other options are:
* -1 No normalization
* 0 Max absolute
* 1 Taxicab
* 2 Euclidean/L2
* \>2 P-Norm
This can be passed in the request body to `llama-server`, for example:
```sh
--data '{{"input": "Hello embeddings", "embd_normalize": -1}}' \
```
And for `llama-embedding`, by passing `--embd-normalize <value>`, for example:
```sh
llama-embedding -hf {namespace}/{model_name}-GGUF --embd-normalize -1 -p "Hello embeddings"
```
|