---
base_model:
- {base_model}
---
# {model_name} GGUF

Recommended way to run this model:

```sh
llama-server -hf {namespace}/{model_name}-GGUF --embeddings
```

The embedding endpoint is then available at http://localhost:8080/embedding and
can be queried, for example, with `curl`:
```sh
curl --request POST \
    --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" \
    --data '{{"input": "Hello embeddings"}}' \
    --silent
```

Alternatively, the `llama-embedding` command line tool can be used:
```sh
llama-embedding -hf {namespace}/{model_name}-GGUF --verbose-prompt -p "Hello embeddings"
```

#### embd_normalize
When a model uses pooling, or a pooling method is specified with `--pooling`,
normalization of the resulting embeddings is controlled by the `embd_normalize`
parameter.

The default value is `2`, which normalizes the embeddings using the Euclidean
(L2) norm. The available options are:
* `-1`: no normalization
* `0`: max absolute
* `1`: taxicab (L1)
* `2`: Euclidean (L2)
* `>2`: p-norm
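
To make these options concrete, here is a minimal Python sketch of what each
`embd_normalize` value could compute. The function name `normalize_embedding`
is hypothetical, and this is an illustration rather than the llama.cpp
implementation. In particular, the max-absolute mode below simply divides by
the largest magnitude, and llama.cpp may apply additional scaling in that mode.

```python
def normalize_embedding(vec, embd_normalize=2):
    """Return a normalized copy of vec (hypothetical helper, not llama.cpp code)."""
    if embd_normalize == -1:
        # -1: no normalization, values are returned unchanged
        return list(vec)
    if embd_normalize == 0:
        # 0: max absolute, divide by the largest magnitude in the vector
        # (assumption: any extra rescaling done by llama.cpp is omitted)
        norm = max((abs(x) for x in vec), default=0.0)
    elif embd_normalize >= 1:
        # 1: taxicab, 2: Euclidean, >2: general p-norm with p = embd_normalize
        p = embd_normalize
        norm = sum(abs(x) ** p for x in vec) ** (1.0 / p)
    else:
        raise ValueError("embd_normalize must be -1, 0, or a positive integer")
    if norm == 0:
        # An all-zero (or empty) vector has nothing to scale
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


sample = [3.0, -4.0]
print(normalize_embedding(sample, -1))  # unchanged
print(normalize_embedding(sample, 0))   # divided by max(|x|) = 4
print(normalize_embedding(sample, 1))   # divided by |3| + |4| = 7
print(normalize_embedding(sample, 2))   # divided by sqrt(9 + 16)
```

With the default value of `2`, the returned vector has unit Euclidean length,
which is the usual choice when embeddings are compared by cosine similarity.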

For `llama-server`, pass it in the request body, for example by replacing the
`--data` line of the `curl` command above:
```sh
    --data '{{"input": "Hello embeddings", "embd_normalize": -1}}' \
```
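
The same request can be built from a script. The following is a small Python
sketch using only the standard library. The helper names
`build_embedding_request` and `fetch_embedding` are hypothetical, the URL is
the one used in the `curl` example, and the shape of the JSON response is not
assumed here.

```python
import json
import urllib.request

# Same endpoint as in the curl example
EMBEDDING_URL = "http://localhost:8080/embedding"


def build_embedding_request(text, embd_normalize=None, url=EMBEDDING_URL):
    """Build a POST request for the embedding endpoint (hypothetical helper)."""
    # dict(...) is used instead of a brace literal because single braces
    # appear to mark placeholders in this template
    body = dict(input=text)
    if embd_normalize is not None:
        # Only send the parameter when the caller overrides the default
        body["embd_normalize"] = embd_normalize
    request = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), method="POST"
    )
    request.add_header("Content-Type", "application/json")
    return request


def fetch_embedding(text, embd_normalize=None):
    """Send the request and return the decoded JSON response as-is."""
    request = build_embedding_request(text, embd_normalize)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the request needs no running server; print the body that would be sent
print(build_embedding_request("Hello embeddings", -1).data.decode("utf-8"))
```

Calling `fetch_embedding("Hello embeddings", -1)` requires `llama-server` to be
running as shown earlier.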

For `llama-embedding`, pass `--embd-normalize <value>`, for example:
```sh
llama-embedding -hf {namespace}/{model_name}-GGUF --embd-normalize -1 -p "Hello embeddings"
```