Vector Types

We have seen vector type in the previous section. In this section, we will show other vector types.

`bvector` binary vector

The bvector type is a binary vector type in pgvecto.rs. It represents a binary vector, which is a vector where each component can take on two possible values, typically 0 and 1.

Here's an example of creating a table with a bvector column and inserting values:

sql

CREATE TABLE items (
  id bigserial PRIMARY KEY,
  embedding bvector(3) NOT NULL
);

INSERT INTO items (embedding) VALUES ('[1,0,1]'), ('[0,1,0]');

Index can be created on bvector type as well.

sql

CREATE INDEX your_index_name ON items USING vectors (embedding bvector_l2_ops);

SELECT * FROM items ORDER BY embedding <-> '[1,0,1]' LIMIT 5;

We support three operators to calculate the distance between two bvector values.

<-> (bvector_l2_ops): squared Euclidean distance, defined as . The Hamming distance is equivalent to the squared Euclidean distance for binary vectors.
<#> (bvector_dot_ops): negative dot product, defined as .
<=> (bvector_cos_ops): cosine distance, defined as .
<~> (bvector_jaccard_ops): Jaccard distance, defined as .

There is also a function binarize to build bvector from vector:

sql

SELECT binarize('[-2, -1, 0, 1, 2]');;
-- [0, 0, 0, 1, 1]

### Performance

The `bvector` type is optimized for storage and performance. It uses a bit-packed representation to store the binary vector. The distance calculation is also optimized for binary vectors.

Here are some performance benchmarks for the `bvector` type. We use the [dbpedia-entities-openai3-text-embedding-3-large-3072-1M](https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-large-3072-1M) dataset for the benchmark. The VM is n2-standard-8 (8 vCPUs, 32 GB memory) on Google Cloud.

We upsert 1M binary vectors into the table and then run a KNN query for each embedding. It only takes about 600MB memory to index 1M binary vectors, while the `vector` type takes about 18GB memory to index the same number of vectors.

![bvector](./images/bvector.png)

We can see that the `bvector`'s accuracy is not as good as the `vector` type, but it exceeds 95%  if we adopt [adaptive retrieval](/use-case/adaptive-retrieval).

## `svector` sparse vector

Unlike dense vectors, sparse vectors are very high-dimensional but contain few non-zero values.

Typically, sparse vectors can be created from:
- Word co-occurrence matrices
- Term frequency-inverse document frequency (TF-IDF) vectors
- User-item interaction matrices
- Network adjacency matrices

Sparse vectors in `pgvecto.rs` are called `svector`.

Here's an example of creating a table with a svector column and inserting values:

```sql {3}
CREATE TABLE items (
  id bigserial PRIMARY KEY,
  embedding svector(10) NOT NULL
);

INSERT INTO items (embedding) VALUES ('[0.1,0,0,0,0,0,0,0,0,0]'), ('[0,0,0,0,0,0,0,0,0,0.5]');

Index can be created on svector type as well.

sql

CREATE INDEX your_index_name ON items USING vectors (embedding svector_l2_ops);

SELECT * FROM items ORDER BY embedding <-> '[0.3,0,0,0,0,0,0,0,0,0]' LIMIT 1;

We support three operators to calculate the distance between two svector values.

<-> (svector_l2_ops): squared Euclidean distance, defined as .
<#> (svector_dot_ops): negative dot product, defined as .
<=> (svector_cos_ops): cosine distance, defined as .

There is also a function to_svector to build svector:

sql

SELECT to_svector(5, '{0,4}', '{0.3,0.5}');
-- [0.3, 0, 0, 0, 0.5]

## `vecf16` half-precision vector

Stored as a half-precision number format, `vecf16` takes advantage of the 16-bit float, which uses half the memory and bandwidth compared to `vector`.
It is often faster than the regular `vector` datatype, but may lose some precision.

Here's an example of creating a table with a vecf16 column and inserting values:

```sql {3}
CREATE TABLE items (
  id bigserial PRIMARY KEY,
  embedding vecf16(3) NOT NULL
);

INSERT INTO items (embedding) VALUES ('[0.1, 0.2, 0]'), ('[0, 0.1, 0.2]');

Index can be created on vecf16 type as well.

sql

CREATE INDEX your_index_name ON items USING vectors (embedding vecf16_l2_ops);

SELECT * FROM items ORDER BY embedding <-> '[0.3,0.2,0.1]' LIMIT 1;

We support three operators to calculate the distance between two vecf16 values.

<-> (vecf16_l2_ops): squared Euclidean distance, defined as .
<#> (vecf16_dot_ops): negative dot product, defined as .
<=> (vecf16_cos_ops): cosine distance, defined as .

Casts

For vector types, these casts are guaranteed:

From ARRAY to vector
From vector to bvector or svector
From vector to vecf16 or veci8
From svector to svecf16

This diagram shows the conversion between different data types, where the types connected by arrows can be cast to each other:

Among them, ARRAY is a native type of postgreSQL, others are types defined by pgvecto.rs.

Vector Types ​

bvector binary vector ​

Casts ​

Vector Types

`bvector` binary vector

Casts