# Vector Similarity
**Vectors** (also called "embeddings") represent an AI model's impression (or understanding) of a piece of unstructured data such as text, images, audio, or video. Vector Similarity Search (VSS) is the process of finding vectors in a vector database that are similar to a given query vector. Popular uses of VSS include recommendation systems, image and video search, document retrieval, and question answering.
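
To make "similar" concrete, here is a minimal sketch of cosine distance, the metric used by the index created later in this notebook. It uses only the Python standard library; the function name `cosine_distance` is an illustrative choice, and the sketch assumes neither vector is all zeros.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0]
same_direction = [2.0, 0.0]   # points the same way as the query
orthogonal = [0.0, 3.0]       # shares no direction with the query

print(cosine_distance(query, same_direction))  # 0.0
print(cosine_distance(query, orthogonal))      # 1.0
```

A smaller distance means the vectors point in more similar directions; vector length does not affect the result.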

## Index Creation
Before doing vector search, first define the schema and create an index.

In [1]:
import redis
from redis.commands.search.field import TagField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)

INDEX_NAME = "index"                              # Vector Index Name
DOC_PREFIX = "doc:"                               # RediSearch Key Prefix for the Index

def create_index(vector_dimensions: int):
    try:
        # check to see if index exists
        r.ft(INDEX_NAME).info()
        print("Index already exists!")
    except redis.exceptions.ResponseError:
        # FT.INFO raises a ResponseError when the index does not exist yet
        # schema
        schema = (
            TagField("tag"),                       # Tag Field Name
            VectorField("vector",                  # Vector Field Name
                "FLAT", {                          # Vector Index Type: FLAT or HNSW
                    "TYPE": "FLOAT32",             # FLOAT32 or FLOAT64
                    "DIM": vector_dimensions,      # Number of Vector Dimensions
                    "DISTANCE_METRIC": "COSINE",   # Vector Search Distance Metric
                }
            ),
        )

        # index Definition
        definition = IndexDefinition(prefix=[DOC_PREFIX], index_type=IndexType.HASH)

        # create Index
        r.ft(INDEX_NAME).create_index(fields=schema, definition=definition)

We'll start by working with vectors that have 1536 dimensions.

In [2]:
# define vector dimensions
VECTOR_DIMENSIONS = 1536

# create the index
create_index(vector_dimensions=VECTOR_DIMENSIONS)

## Adding Vectors to Redis

Next, we add vectors (dummy data) to Redis using `hset`. The search index listens to keyspace notifications and will include any written HASH objects prefixed by `DOC_PREFIX`.

In [None]:
%pip install numpy

In [3]:
import numpy as np

In [4]:
# instantiate a redis pipeline
pipe = r.pipeline()

# define some dummy data
objects = [
    {"name": "a", "tag": "foo"},
    {"name": "b", "tag": "foo"},
    {"name": "c", "tag": "bar"},
]

# write data
for obj in objects:
    # define key
    key = f"{DOC_PREFIX}{obj['name']}"
    # create a random "dummy" vector
    obj["vector"] = np.random.rand(VECTOR_DIMENSIONS).astype(np.float32).tobytes()
    # HSET
    pipe.hset(key, mapping=obj)

res = pipe.execute()
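
The `vector` field written above is a raw byte string of FLOAT32 values. As a rough illustration of what `.astype(np.float32).tobytes()` produces, the sketch below packs and unpacks a small vector with the standard-library `array` module. The helper names `to_float32_bytes` and `from_float32_bytes` are hypothetical; with NumPy, `np.frombuffer(blob, dtype=np.float32)` plays the role of the decoding step.

```python
from array import array

def to_float32_bytes(values):
    """Pack a sequence of numbers into raw float32 bytes (native byte order)."""
    return array("f", values).tobytes()

def from_float32_bytes(blob):
    """Unpack raw float32 bytes back into a list of Python floats."""
    out = array("f")
    out.frombytes(blob)
    return out.tolist()

vector = [0.5, 0.25, -1.0, 2.0]   # values chosen to be exact in float32
blob = to_float32_bytes(vector)

print(len(blob))                  # 16 (4 dimensions * 4 bytes each)
print(from_float32_bytes(blob))   # [0.5, 0.25, -1.0, 2.0]
```

The same decoding applies to a `vector` field returned from a query with `decode_field=False`.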

## Searching
You can use VSS queries with the `.ft(...).search(...)` query command. To use a VSS query, you must specify the option `.dialect(2)`.

There are two supported types of vector queries in Redis: `KNN` and `Range`. `Hybrid` queries can work in both settings and combine elements of traditional search and VSS.

### KNN Queries
KNN (k-nearest neighbors) queries find the top K vectors most similar to a given query vector.

In [5]:
query = (
    Query("*=>[KNN 2 @vector $vec as score]")
     .sort_by("score")
     .return_fields("id", "score")
     .return_field("vector", decode_field=False) # return the vector field as bytes
     .paging(0, 2)
     .dialect(2)
)

query_params = {
    "vec": np.random.rand(VECTOR_DIMENSIONS).astype(np.float32).tobytes()
}
r.ft(INDEX_NAME).search(query, query_params).docs

[Document {'id': 'doc:c', 'payload': None, 'score': '0.251625061035', 'vector': b'\xf8\x1ed?\xbf\t\x90<\xfd\x9b\x10?\xe3\x1b\xed>... (output truncated)
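
Conceptually, a KNN query against a `FLAT` index is an exhaustive scan: compute the distance from the query to every indexed vector and keep the K smallest. The sketch below mimics that in plain Python on three tiny 2-dimensional vectors. It is an illustration of the idea only, not how Redis implements it, and the `knn` helper and the sample `docs` are hypothetical.

```python
import heapq
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def knn(query, docs, k):
    """Return the k (doc_id, distance) pairs closest to query, nearest first."""
    scored = ((doc_id, cosine_distance(query, vec)) for doc_id, vec in docs.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])

docs = {
    "doc:a": [1.0, 0.0],
    "doc:b": [0.0, 1.0],
    "doc:c": [1.0, 1.0],
}

nearest = knn([1.0, 0.1], docs, k=2)
print([doc_id for doc_id, _ in nearest])  # ['doc:a', 'doc:c']
```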

### Range Queries
Range queries provide a way to filter results by the distance between a vector field in Redis and a query vector based on some pre-defined threshold (radius).

In [6]:
query = (
    Query("@vector:[VECTOR_RANGE $radius $vec]=>{$YIELD_DISTANCE_AS: score}")
     .sort_by("score")
     .return_fields("id", "score")
     .paging(0, 3)
     .dialect(2)
)

# Find all vectors within 0.8 of the query vector
query_params = {
    "radius": 0.8,
    "vec": np.random.rand(VECTOR_DIMENSIONS).astype(np.float32).tobytes()
}
r.ft(INDEX_NAME).search(query, query_params).docs

[Document {'id': 'doc:a', 'payload': None, 'score': '0.243115246296'},
 Document {'id': 'doc:c', 'payload': None, 'score': '0.24981123209'},
 Document {'id': 'doc:b', 'payload': None, 'score': '0.251443207264'}]
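
Unlike KNN, a range query does not return a fixed number of results: the count depends on how many vectors fall inside the radius. A minimal plain-Python sketch of that behavior follows. The `vector_range` helper and sample data are hypothetical, and treating the radius as inclusive is an assumption of this sketch.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def vector_range(query, docs, radius):
    """Return (doc_id, distance) pairs with distance <= radius, nearest first."""
    hits = []
    for doc_id, vec in docs.items():
        distance = cosine_distance(query, vec)
        if distance <= radius:  # inclusive bound: an assumption of this sketch
            hits.append((doc_id, distance))
    return sorted(hits, key=lambda pair: pair[1])

docs = {
    "doc:a": [1.0, 0.0],
    "doc:b": [0.0, 1.0],
    "doc:c": [1.0, 1.0],
}
query = [1.0, 0.1]

print([d for d, _ in vector_range(query, docs, radius=0.3)])   # ['doc:a', 'doc:c']
print([d for d, _ in vector_range(query, docs, radius=0.95)])  # ['doc:a', 'doc:c', 'doc:b']
```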

See additional Range Query examples in [this Jupyter notebook](https://github.com/RediSearch/RediSearch/blob/master/docs/docs/vecsim-range_queries_examples.ipynb).

### Hybrid Queries
Hybrid queries contain both traditional filters (numeric, tags, text) and VSS in one single Redis command.

In [7]:
query = (
    Query("(@tag:{ foo })=>[KNN 2 @vector $vec as score]")
     .sort_by("score")
     .return_fields("id", "tag", "score")
     .paging(0, 2)
     .dialect(2)
)

query_params = {
    "vec": np.random.rand(VECTOR_DIMENSIONS).astype(np.float32).tobytes()
}
r.ft(INDEX_NAME).search(query, query_params).docs

[Document {'id': 'doc:b', 'payload': None, 'score': '0.24422544241', 'tag': 'foo'},
 Document {'id': 'doc:a', 'payload': None, 'score': '0.259926855564', 'tag': 'foo'}]
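
The only difference between the plain KNN query and the hybrid query is the expression to the left of `=>`: `*` matches every document, while `(@tag:{ foo })` restricts the KNN search to documents with that tag. The small helper below, a hypothetical convenience that is not part of redis-py, builds either form of the query string. It assumes tag values contain no characters that need escaping.

```python
def knn_query_string(k, tag=None, vector_field="vector", tag_field="tag", score_alias="score"):
    """Build a KNN query string, optionally pre-filtered by a single tag value."""
    # "*" means "no filter"; otherwise restrict candidates to the given tag
    prefilter = f"(@{tag_field}:{{ {tag} }})" if tag is not None else "*"
    return f"{prefilter}=>[KNN {k} @{vector_field} $vec as {score_alias}]"

print(knn_query_string(2))             # *=>[KNN 2 @vector $vec as score]
print(knn_query_string(2, tag="foo"))  # (@tag:{ foo })=>[KNN 2 @vector $vec as score]
```

The resulting string can be passed to `Query(...)` exactly like the literal strings used in the cells above.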

See additional Hybrid Query examples in [this Jupyter notebook](https://github.com/RediSearch/RediSearch/blob/master/docs/docs/vecsim-hybrid_queries_examples.ipynb).

## Vector Creation and Storage Examples
The examples above use dummy data as vectors. In practice, most use cases rely on production-grade AI models to create embeddings. Below we take some sample text, pass it to the OpenAI and Cohere APIs in turn, and write the resulting embeddings to Redis.

In [8]:
texts = [
    "Today is a really great day!",
    "The dog next door barks really loudly.",
    "My cat escaped and got out before I could close the door.",
    "It's supposed to rain and thunder tomorrow."
]

### OpenAI Embeddings
Before working with OpenAI Embeddings, we clean up our existing search index and create a new one.

In [9]:
# delete index
r.ft(INDEX_NAME).dropindex(delete_documents=True)

# make a new one
create_index(vector_dimensions=VECTOR_DIMENSIONS)

In [None]:
%pip install openai

In [10]:
import openai

# set your OpenAI API key - get one at https://platform.openai.com
openai.api_key = "YOUR OPENAI API KEY"

In [11]:
# Create Embeddings with OpenAI text-embedding-ada-002
# https://openai.com/blog/new-and-improved-embedding-model
response = openai.Embedding.create(input=texts, engine="text-embedding-ada-002")
embeddings = np.array([item["embedding"] for item in response["data"]], dtype=np.float32)

# Write to Redis
pipe = r.pipeline()
for i, embedding in enumerate(embeddings):
    pipe.hset(f"doc:{i}", mapping = {
        "vector": embedding.tobytes(),
        "content": texts[i],
        "tag": "openai"
    })
res = pipe.execute()

In [12]:
embeddings

array([[ 0.00509819,  0.0010873 , -0.00228475, ..., -0.00457579,
         0.01329307, -0.03167175],
       [-0.00357223, -0.00550784, -0.01314328, ..., -0.02915693,
         0.01470436, -0.01367203],
       [-0.01284631,  0.0034875 , -0.01719686, ..., -0.01537451,
         0.01953256, -0.05048691],
       [-0.01145045, -0.00785481,  0.00206323, ..., -0.02070181,
        -0.01629098, -0.00300795]], dtype=float32)

### Search with OpenAI Embeddings

Now that we've created embeddings with OpenAI, we can search for the documents most relevant to some input text.


In [13]:
text = "animals"

# create query embedding
response = openai.Embedding.create(input=[text], engine="text-embedding-ada-002")
query_embedding = np.array([item["embedding"] for item in response["data"]], dtype=np.float32)[0]

query_embedding

array([ 0.00062901, -0.0070723 , -0.00148926, ..., -0.01904645,
       -0.00436092, -0.01117944], dtype=float32)

In [14]:
# query for similar documents that have the openai tag
query = (
    Query("(@tag:{ openai })=>[KNN 2 @vector $vec as score]")
     .sort_by("score")
     .return_fields("content", "tag", "score")
     .paging(0, 2)
     .dialect(2)
)

query_params = {"vec": query_embedding.tobytes()}
r.ft(INDEX_NAME).search(query, query_params).docs

# the two pieces of content related to animals are returned

[Document {'id': 'doc:1', 'payload': None, 'score': '0.214349985123', 'content': 'The dog next door barks really loudly.', 'tag': 'openai'},
 Document {'id': 'doc:2', 'payload': None, 'score': '0.237052619457', 'content': 'My cat escaped and got out before I could close the door.', 'tag': 'openai'}]
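
Because the index uses the `COSINE` distance metric, the `score` values are distances (lower is closer) and are returned as strings. If a similarity value is easier to reason about, cosine similarity is `1 - distance`. The sketch below applies that to the two scores shown above; the `to_similarity` helper is a hypothetical name.

```python
def to_similarity(score):
    """Convert a cosine-distance score (a string from Redis) to cosine similarity."""
    return 1.0 - float(score)

# (id, score) pairs copied from the search output above
results = [
    ("doc:1", "0.214349985123"),
    ("doc:2", "0.237052619457"),
]

for doc_id, score in results:
    print(doc_id, round(to_similarity(score), 4))
# doc:1 0.7857
# doc:2 0.7629
```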

### Cohere Embeddings
Before working with Cohere Embeddings, we clean up our existing search index and create a new one.

In [15]:
# delete index
r.ft(INDEX_NAME).dropindex(delete_documents=True)

# make a new one for cohere embeddings (1024 dimensions)
VECTOR_DIMENSIONS = 1024
create_index(vector_dimensions=VECTOR_DIMENSIONS)
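
The index is recreated here because `DIM` is fixed when the index is created, and the two models produce vectors of different sizes (1536 versus 1024 dimensions). A vector blob whose byte length does not match `DIM` will generally fail to be indexed, so a check before writing can save debugging time. The following is a minimal sketch of such a check; `check_vector_blob` is a hypothetical helper, and it assumes the `FLOAT32` type used by the index above.

```python
FLOAT32_SIZE = 4  # bytes per FLOAT32 component

def check_vector_blob(blob, expected_dimensions):
    """Raise ValueError unless blob holds exactly expected_dimensions float32 values."""
    expected_bytes = expected_dimensions * FLOAT32_SIZE
    if len(blob) != expected_bytes:
        raise ValueError(
            f"expected {expected_bytes} bytes for {expected_dimensions} dimensions, "
            f"got {len(blob)}"
        )
    return blob

# a 1024-dimensional blob passes the check
ok = check_vector_blob(bytes(1024 * FLOAT32_SIZE), 1024)
print(len(ok))  # 4096

# a 1536-dimensional blob is rejected for a 1024-dimensional index
try:
    check_vector_blob(bytes(1536 * FLOAT32_SIZE), 1024)
except ValueError as err:
    print(err)  # expected 4096 bytes for 1024 dimensions, got 6144
```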

In [None]:
%pip install cohere

In [16]:
import cohere

co = cohere.Client("YOUR COHERE API KEY")

In [17]:
# Create Embeddings with Cohere
# https://docs.cohere.ai/docs/embeddings
response = co.embed(texts=texts, model="small")
embeddings = np.array(response.embeddings, dtype=np.float32)

# Write to Redis
for i, embedding in enumerate(embeddings):
    r.hset(f"doc:{i}", mapping = {
        "vector": embedding.tobytes(),
        "content": texts[i],
        "tag": "cohere"
    })

In [18]:
embeddings

array([[-0.3034668 , -0.71533203, -0.2836914 , ...,  0.81152344,
         1.0253906 , -0.8095703 ],
       [-0.02560425, -1.4912109 ,  0.24267578, ..., -0.89746094,
         0.15625   , -3.203125  ],
       [ 0.10125732,  0.7246094 , -0.29516602, ..., -1.9638672 ,
         1.6630859 , -0.23291016],
       [-2.09375   ,  0.8588867 , -0.23352051, ..., -0.01541138,
         0.17053223, -3.4042969 ]], dtype=float32)

### Search with Cohere Embeddings

Now that we've created embeddings with Cohere, we can search for the documents most relevant to some input text.

In [19]:
text = "animals"

# create query embedding
response = co.embed(texts=[text], model="small")
query_embedding = np.array(response.embeddings[0], dtype=np.float32)

query_embedding

array([-0.49682617,  1.7070312 ,  0.3466797 , ...,  0.58984375,
        0.1060791 , -2.9023438 ], dtype=float32)

In [20]:
# query for similar documents that have the cohere tag
query = (
    Query("(@tag:{ cohere })=>[KNN 2 @vector $vec as score]")
     .sort_by("score")
     .return_fields("content", "tag", "score")
     .paging(0, 2)
     .dialect(2)
)

query_params = {"vec": query_embedding.tobytes()}
r.ft(INDEX_NAME).search(query, query_params).docs

# the two pieces of content related to animals are returned

[Document {'id': 'doc:1', 'payload': None, 'score': '0.658673524857', 'content': 'The dog next door barks really loudly.', 'tag': 'cohere'},
 Document {'id': 'doc:2', 'payload': None, 'score': '0.662699103355', 'content': 'My cat escaped and got out before I could close the door.', 'tag': 'cohere'}]

For more example apps, tutorials, and projects using Redis Vector Similarity Search, check out the [Redis AI resources repo](https://github.com/redis-developer/redis-ai-resources/tree/main).