Full-text search

As you insert data into Xata, it is automatically indexed for full-text search. You can run a search by using the /search endpoint. While the /query endpoint exists only at the table level, the /search endpoint exists at both the database branch level and the table level, because it is possible to search across tables.

The search index is updated asynchronously after each insert or update, so search results are eventually consistent with the results you get from the /query endpoint. Another fundamental difference from the /query endpoint is that /search doesn't support following links. This means that for link columns you can filter by the ID of the linked record, but not by any of its other columns. If you need to filter or search by linked columns other than the ID, we recommend that you denormalize the data.
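
As a rough illustration of what denormalizing can look like, the sketch below copies a value from a linked record onto the record that links to it, so that the value can be searched and filtered directly. The User and Team shapes, the teamName column, and the denormalizeTeamName helper are all hypothetical and are not part of the Xata SDK.

```typescript
// Hypothetical record shapes, for illustration only.
interface Team {
  id: string;
  name: string;
}

interface User {
  id: string;
  name: string;
  team: { id: string } | null; // link column: only the ID is usable in /search
  teamName?: string | null;    // denormalized copy of the linked team's name
}

// Copy the linked team's name onto each user record.
function denormalizeTeamName(users: User[], teams: Team[]): User[] {
  const namesById: Record<string, string> = {};
  for (const team of teams) {
    namesById[team.id] = team.name;
  }
  return users.map((user) => {
    const teamName = user.team ? namesById[user.team.id] ?? null : null;
    return { ...user, teamName };
  });
}

const exampleTeams: Team[] = [{ id: "team_1", name: "Matrix Cast" }];
const exampleUsers: User[] = [
  { id: "rec_1", name: "Carrie-Anne Moss", team: { id: "team_1" } },
  { id: "rec_2", name: "Keanu Reeves", team: null },
];

console.log(denormalizeTeamName(exampleUsers, exampleTeams));
```

You would then write the copied value back to the table and keep it in sync whenever the linked record changes.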

Searching across tables

The format of a search request at the branch level (across tables) is as follows:

const records = await xata.search.all("<search phrase>", {
    tables: [
      { 
        table: "...",
        target: [...],
        filter: {...},
        boosters: [...]
      },
      { ... },
      ...
    ],
    fuzziness: 1,
    prefix: "phrase",
    highlight: {...}
});

A simple example, which searches across all tables with default relevancy settings, looks like this:

const results = await xata.search.all("new st")

Which returns results in the following format:

for (const res of results) {
  // result record
  console.log(res.record);
  /*
  {
    "email": "carrie@example.com",
    "id": "rec_cd8s4c0avc42pi67m14g",
    "name": "Carrie-Anne Moss",
    "bio": null,
    "address": null,
    "team": null
  }
  */

  // found in table
  console.log(res.table);
  // Users

  // meta information about the result
  console.log(res.record.getMetadata())
  /*
  {
    "highlight": {
      "address": {
        "street": [
          "123 Main <em>St</em>"
        ]
      }
    },
    "score": 0.2876821,
    "table": "Users",
    "version": 0
  }
  */
}

The REST API responses include the special xata field, which contains metadata about the result. This is accessible via the SDK by calling the getMetadata() function on the record.

The metadata includes:

  • the table in which the result was found.
  • the relevancy score of the result. See Relevancy control for more information.
  • the version of the record.
  • the highlight field, which contains the highlighted search terms.

Restricting Search By Tables

If you'd like to search only some of the tables in the database, you can specify the tables to search in the tables field of the request. It expects an array, for example:

  const results = await xata.search.all("new st", {
    tables: ["Users", "Posts"]
  })

  // equivalent to:
  const sameResults = await xata.search.all("new st", {
    tables: [
      { table: "Users" },
      { table: "Posts" }
    ]
  })

Searching in a single table

If you want to search in a single table, it's easier to use the table-level search API. It looks like this:

const records = await xata.db.Users.search("<search phrase>", {
  target: [...],
  filter: {...},
  boosters: [...],
  fuzziness: 1,
  prefix: "phrase",
  highlight: {...}
});

In other words, the table-level settings from the branch-level search API (filter, target, boosters) become top-level settings in the table-level search API.

Fuzziness and Typo Tolerance

By default, Xata searches tolerate typos of one character. You can control this behavior by setting the fuzziness parameter, which represents the maximum Levenshtein distance for the search terms. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. Xata accepts 3 possible values:

  • 0: no typo tolerance
  • 1: one letter changed/added/removed (default)
  • 2: two letters changed/added/removed
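
To make the distance concrete, here is a minimal sketch of the textbook dynamic-programming Levenshtein distance. It is not Xata's implementation, and the levenshtein function name is only illustrative. The inputs are lowercased by hand, on the assumption that matching is case-insensitive.

```typescript
// Illustrative only: classic Levenshtein distance using two rolling rows.
function levenshtein(a: string, b: string): number {
  // prev[j] is the distance between the first i-1 characters of a
  // and the first j characters of b.
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) {
    prev.push(j);
  }
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein("keanu", "kaanu")); // 1 (one letter replaced)
console.log(levenshtein("keanu", "kenu"));  // 1 (one letter missing)
console.log(levenshtein("keanu", "kaano")); // 2 (two letters replaced)
```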

For example, instead of Keanu you can search for kaanu (one letter replaced) or kenu (one letter missing) and still get the same result:

const results = await xata.search.all("kaanu")

The above matches Keanu. You can disable this typo tolerance by setting the fuzziness field to 0:

const results = await xata.search.all("kaanu", { fuzziness: 0 })

The above won't match Keanu.

You can also increase the fuzziness to accept two typos per word:

const results = await xata.search.all("kaano", { fuzziness: 2 })

The above will match records containing Keanu.

Filtering

Filtering allows you to filter out records before passing them through the search algorithm. The filtering syntax is the same as for the query API, with the limitation that you cannot filter by linked columns.

The filtering is applied at the table level. For example:

const records = await xata.search.all("new st", {
  tables: [
    {
      table: "Users",
      filter: {
        "address.city": "New York",
      },
    },
  ],
});

Targeting specific columns

By default, Xata searches across all columns from the selected tables. You can restrict the search to specific columns by using the target field.

const records = await xata.search.all("new st", {
  tables: [
    {
      table: "Users",
      target: ["name", "address.street"],
    },
  ],
});

Relevancy control

When using the search API, Xata assigns a relevancy score to each result, and the results are returned sorted by their relevancy to the provided query. Behind the scenes, Xata uses the BM25 algorithm to rank the results. The algorithm takes into account the frequency of the search terms in the document, the length of the document, and the frequency of the search terms in the database.
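
To show how those three inputs interact, here is a sketch of the textbook BM25 score for a single search term against a single document. The bm25TermScore function is hypothetical, and the k1 and b values are commonly used defaults, not settings confirmed for Xata.

```typescript
// Illustrative only: textbook BM25 for one term and one document.
function bm25TermScore(
  termFreq: number,     // occurrences of the term in the document
  docLength: number,    // number of terms in the document
  avgDocLength: number, // average document length in the collection
  totalDocs: number,    // number of documents in the collection
  docsWithTerm: number, // number of documents that contain the term
  k1: number = 1.2,     // assumed default: term frequency saturation
  b: number = 0.75      // assumed default: strength of length normalization
): number {
  // Terms that are rare across the collection get a higher weight.
  const idf = Math.log(1 + (totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
  // Documents longer than average are penalized.
  const lengthNorm = 1 - b + b * (docLength / avgDocLength);
  return (idf * (termFreq * (k1 + 1))) / (termFreq + k1 * lengthNorm);
}

// More occurrences of the term raise the score.
console.log(bm25TermScore(1, 100, 100, 1000, 10) < bm25TermScore(3, 100, 100, 1000, 10));
// A longer document with the same term count scores lower.
console.log(bm25TermScore(2, 400, 100, 1000, 10) < bm25TermScore(2, 100, 100, 1000, 10));
// A term that appears in fewer documents scores higher.
console.log(bm25TermScore(2, 100, 100, 1000, 500) < bm25TermScore(2, 100, 100, 1000, 10));
```

Each comparison prints true, which matches the three factors described above.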

The relevancy score is returned for each result in the metadata. See Searching across tables for sample responses.

You can fine-tune the relevancy of your searches by using column weights and boosters. We recommend using the web UI to experiment with these settings, then use the "Get Code Snippet" button to get the code to use in your app.

Column weights

You can assign an integer weight to each column. The default weight is 1. The higher the weight, the higher the relevancy score will be for matches in that column.

const records = await xata.search.all("matrix", {
  tables: [
    {
      table: "Posts",
      target: [
        { column: "title", weight: 5 },
        { column: "labels", weight: 2 },
        "*"
      ],
    },
  ],
});

In the above example, all columns are still targeted (* is included in target), but the title and labels columns are boosted.

Numeric Booster

The numeric booster lets numeric columns influence the relevancy score. This is particularly useful when you have columns that contain metrics that matter for relevancy, like "number of stars" or "number of views".

const records = await xata.search.all("matrix", {
  tables: [
    {
      table: "Posts",
      boosters: [{ numericBooster: { column: "views", factor: 3 } }],
    },
  ]
});

In this example, the value of the views column is multiplied by the factor of 3 and then added to the relevancy score.
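
Read literally, that description amounts to the arithmetic below. This is a sketch of that reading only, not Xata's scoring code, and applyNumericBooster is a hypothetical helper.

```typescript
// Illustrative only: add (column value * factor) to the base relevancy score.
function applyNumericBooster(score: number, columnValue: number, factor: number): number {
  return score + columnValue * factor;
}

// A post with a base score of 1.5 and 10 views, boosted with factor 3.
console.log(applyNumericBooster(1.5, 10, 3)); // 31.5
```

Because the boost grows with the column value, large metrics such as view counts can easily outweigh the text match itself, so a small factor is usually a sensible starting point.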

Exact value booster

The exact value booster allows boosting the relevancy of records that have an exact value in a column. This can be useful to boost, for example, articles in a given category. Or you can use it to "pin" a particular result at the top of the results.

const records = await xata.search.all("matrix", {
  tables: [
    {
      table: "Posts",
      boosters: [
        { valueBooster: { column: "labels", value: "movies", factor: 5 } },
      ],
    },
  ],
});

In this example, records that have the label "movies" in the labels column will have the factor of 5 added to their relevancy score.
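
The same idea can be sketched for the exact value booster. The applyValueBooster helper is hypothetical, and it assumes that the column holds either a single string or a list of strings.

```typescript
// Illustrative only: add the factor when the column contains the exact value.
function applyValueBooster(
  score: number,
  columnValue: string | string[],
  value: string,
  factor: number
): number {
  const values = Array.isArray(columnValue) ? columnValue : [columnValue];
  return values.indexOf(value) !== -1 ? score + factor : score;
}

console.log(applyValueBooster(1.5, ["movies", "sci-fi"], "movies", 5)); // 6.5
console.log(applyValueBooster(1.5, ["music"], "movies", 5));            // 1.5
```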

Date booster

The date booster boosts the relevancy of records based on how close the value of a date column is to a given date. This can be used, for example, to boost more recent articles.

const records = await xata.search.all("matrix", {
  tables: [
    {
      table: "Posts",
      boosters: [
        { 
          dateBooster: { 
            column: "createdAt", 
            decay: 0.5, 
            scale: "30d" 
          } 
        },
      ],
    },
  ]
});

The date booster is configured via the origin, scale, and decay parameters. The further a record's date is from the origin, the more its score is decayed. The decay is exponential.

In the example above, the parameters can be interpreted as follows: a post from 30 days ago is scored at 50% of what an equivalent post from today would be scored.

The parameter definitions are:

  • column: The column in which to look for the value.
  • origin: The datetime from which to apply the score decay function. If it is not specified, the current date and time is used.
  • scale: The distance from origin at which the score is decayed by the decay factor, using an exponential function. It is formatted as number + units, for example: 5d, 20m, 10s.
  • decay: The decay factor to expect at "scale" distance from the "origin".
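
One way to read these definitions is as a multiplier of decay raised to the power of (distance from origin / scale). The sketch below assumes that shape, which is consistent with the 30-day example above but is not confirmed as Xata's exact formula. The decayMultiplier helper is hypothetical and takes the scale in milliseconds rather than parsing strings like "30d".

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Illustrative only: exponential decay that equals `decay` at `scaleMs` from origin.
function decayMultiplier(date: Date, origin: Date, scaleMs: number, decay: number): number {
  const distanceMs = Math.abs(origin.getTime() - date.getTime());
  return Math.pow(decay, distanceMs / scaleMs);
}

const origin = new Date("2024-01-31T00:00:00Z");
const thirtyDaysAgo = new Date(origin.getTime() - 30 * DAY_MS);
const sixtyDaysAgo = new Date(origin.getTime() - 60 * DAY_MS);

console.log(decayMultiplier(origin, origin, 30 * DAY_MS, 0.5));        // 1
console.log(decayMultiplier(thirtyDaysAgo, origin, 30 * DAY_MS, 0.5)); // 0.5
console.log(decayMultiplier(sixtyDaysAgo, origin, 30 * DAY_MS, 0.5));  // 0.25
```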

Next Steps

If you're here, there's a good chance you've completed our guide on working with records. If so, we recommend exploring the API reference to get more familiar with our API. If not, feel free to visit the pages on getting data from a database. Alternatively, you can look into updating data, inserting data, or deleting data. We have guides for each of these operations.