Dgraph 24.0.0-alpha3 is now available on GitHub and Docker Hub

Supports similarity search in GraphQL

We are excited to announce v24-alpha3, which adds GraphQL support for the vector type and similarity search. It builds on the recently announced Dgraph v24-alpha release, which added the vector type to Dgraph along with DQL-based similarity search queries.

If you’d like a free dedicated cluster to test vector capabilities, please use these steps:

1. Create a Dedicated cluster with 1 vCPU on General Purpose Class.
2. Email [email protected] with the subject “v24 for {{YOUR BACKEND NAME}} and {{OWNER EMAIL ADDRESS}}”.

Once we get your email, we’ll upgrade your instance to v24-alpha and ensure the account is credited correctly.

The remainder of this post shows a simple example of a schema with vector embeddings, along with a corresponding mutation and queries.

Deploy the following GraphQL schema:

type Project {
  id: ID!
  title: String! @id
  title_v: [Float!] @embedding @search(by: ["hnsw(metric: euclidean, exponent: 4)"])
}

In this schema, the field title_v is an embedding on which the HNSW algorithm is used to create a vector search index. The metric used to compute the distance between vectors (in this example) is Euclidean distance. A new directive, @embedding, has been introduced to designate one or more fields as vector embeddings. The @search directive has been extended to define the HNSW index and its distance metric. The exponent value is used to set reasonable defaults for HNSW internal tuning parameters. It is an integer giving the approximate number of vectors expected in the index, expressed as a power of 10. The default is “4” (10^4 vectors).

GraphQL schema deployed

Once deployed successfully:

Schema deployed successfully

Let’s add some data using the auto-generated addProject mutation.

mutation {
  addProject(input: [
    { title: "iCreate with a Mini iPad", title_v: [0.12, 0.53, 0.9, 0.11, 0.32] },
    { title: "Resistive Touchscreen", title_v: [0.72, 0.89, 0.54, 0.15, 0.26] },
    { title: "Fitness Band", title_v: [0.56, 0.91, 0.93, 0.71, 0.24] },
    { title: "Smart Ring", title_v: [0.38, 0.62, 0.99, 0.44, 0.25] }
  ]) {
    project {
      id
      title
      title_v
    }
  }
}

Mutation to insert data
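If you would rather send this mutation from a script than from a GraphQL console, the sketch below builds the JSON request body for a standard GraphQL-over-HTTP POST. It is only an illustration under a few assumptions: the helper name build_add_projects_body is hypothetical, the input type name AddProjectInput is assumed to be what the schema generates, and the dimension check is our own sanity check rather than something this post says Dgraph requires you to do client-side.

```python
import json

# Mutation text using a GraphQL variable so the project list travels as JSON.
# ASSUMPTION: AddProjectInput is the name of the auto-generated input type.
ADD_PROJECTS = """
mutation AddProjects($input: [AddProjectInput!]!) {
  addProject(input: $input) {
    project {
      id
      title
      title_v
    }
  }
}
"""


def build_add_projects_body(projects):
    """Return the JSON body for a GraphQL-over-HTTP POST (hypothetical helper)."""
    expected_dims = len(projects[0]["title_v"])
    for project in projects:
        # Our own sanity check: distances only make sense between
        # vectors that have the same number of dimensions.
        if len(project["title_v"]) != expected_dims:
            raise ValueError(f"dimension mismatch in {project['title']!r}")
    return json.dumps({"query": ADD_PROJECTS, "variables": {"input": projects}})


projects = [
    {"title": "iCreate with a Mini iPad", "title_v": [0.12, 0.53, 0.9, 0.11, 0.32]},
    {"title": "Resistive Touchscreen", "title_v": [0.72, 0.89, 0.54, 0.15, 0.26]},
    {"title": "Fitness Band", "title_v": [0.56, 0.91, 0.93, 0.71, 0.24]},
    {"title": "Smart Ring", "title_v": [0.38, 0.62, 0.99, 0.44, 0.25]},
]

body = build_add_projects_body(projects)
# POST `body` with the header "Content-Type: application/json"
# to your backend's GraphQL endpoint using any HTTP client.
```

Passing the projects as a variable keeps the mutation text fixed, so the same body builder works for any batch of embeddings.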

The auto-generated querySimilarProjectByEmbedding query allows us to run semantic (aka similarity) search using the vector index specified in our schema.

Execute the query:

query {
  querySimilarProjectByEmbedding(by: title_v, topK: 3, vector: [0.1, 0.2, 0.3, 0.4, 0.5]) {
    id
    title
    vector_distance
  }
}

Running the auto-generated query

The results returned by the querySimilarProjectByEmbedding query include the 3 closest projects, ordered by vector_distance. The vector_distance is the Euclidean distance between the title_v embedding vector and the input vector used in our query.

Note: you can omit the vector_distance field from the query; the results will still be ordered by vector_distance.
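To see what this ordering means for our sample data, here is a minimal brute-force sketch in Python that computes the Euclidean distance from the query vector to each of the four projects and keeps the 3 closest. This is not how Dgraph evaluates the query: the HNSW index is an approximate method, so the server is not guaranteed to match an exact scan on larger datasets. The function names euclidean and top_k are our own.

```python
import math


def euclidean(a, b):
    """Exact Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# The four projects inserted by the mutation in this post.
PROJECTS = {
    "iCreate with a Mini iPad": [0.12, 0.53, 0.9, 0.11, 0.32],
    "Resistive Touchscreen": [0.72, 0.89, 0.54, 0.15, 0.26],
    "Fitness Band": [0.56, 0.91, 0.93, 0.71, 0.24],
    "Smart Ring": [0.38, 0.62, 0.99, 0.44, 0.25],
}


def top_k(query_vector, projects, k):
    """Exhaustive nearest-neighbour scan (illustration only, not HNSW)."""
    scored = [(title, euclidean(vec, query_vector)) for title, vec in projects.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


for title, distance in top_k([0.1, 0.2, 0.3, 0.4, 0.5], PROJECTS, 3):
    print(f"{title}: {distance:.4f}")
```

For this data the exact scan ranks "iCreate with a Mini iPad" first (distance roughly 0.77), then "Smart Ring" (roughly 0.89), then "Resistive Touchscreen" (roughly 1.02).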

The distance metric is specified when the index is defined. In this example we used:

title_v: [Float!] @embedding @search(by: ["hnsw(metric: euclidean, exponent: 4)"])

We can also query for objects similar to an existing object, given its ID, using the querySimilar<Object>ById query.

query {
  querySimilarProjectById(by: title_v, topK: 3, id: "0xef7") {
    id
    title
    vector_distance
  }
}

Query using existing object based on id

In the example below, we use title to identify the project for which we want to find similar projects. Here the title field is an external ID, annotated with the @id directive in the schema. You can designate multiple fields as external IDs using the @id directive.

query {
  querySimilarProjectById(by: title_v, topK: 3, title: "Smart Ring") {
    title
    vector_distance
  }
}

Query using existing object based on title
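Conceptually, a "similar to an existing object" query looks up the stored embedding of the reference object and then runs a nearest-neighbour search around it. The Python sketch below illustrates that idea with an exact scan over the sample data from this post. It rests on assumptions: the function name similar_to is hypothetical, the scan is exhaustive rather than HNSW-based, and the reference project is excluded from its own results, which this post does not confirm as Dgraph's behavior.

```python
import math

# The four projects inserted earlier in this post, keyed by their @id title.
PROJECTS = {
    "iCreate with a Mini iPad": [0.12, 0.53, 0.9, 0.11, 0.32],
    "Resistive Touchscreen": [0.72, 0.89, 0.54, 0.15, 0.26],
    "Fitness Band": [0.56, 0.91, 0.93, 0.71, 0.24],
    "Smart Ring": [0.38, 0.62, 0.99, 0.44, 0.25],
}


def similar_to(title, projects, k):
    """Rank the other projects by Euclidean distance to the named project."""
    reference = projects[title]  # raises KeyError if the title is unknown
    scored = [
        (other, math.dist(reference, vec))  # math.dist needs Python 3.8+
        for other, vec in projects.items()
        if other != title  # ASSUMPTION: leave the reference itself out
    ]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


for other, distance in similar_to("Smart Ring", PROJECTS, 3):
    print(f"{other}: {distance:.4f}")
```

With this data, "Fitness Band" and "iCreate with a Mini iPad" come out nearly tied as the closest matches to "Smart Ring", with "Resistive Touchscreen" further away.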