From natural language to SQL queries: How we built Generate SQL with AI

Discover how Xata used Large Language Models (LLMs) to turn natural language into accurate PostgreSQL queries, simplifying BI workflows and boosting productivity.

Author

Divyendu Singh

Date published

Large Language Models (LLMs) have made an enormous progress in the last few months and are very well suited to help with precise tasks embedded in applications. One of the most common business intelligence (BI) task is to query the database to find information to make key decisions.

In the past, this was a slow workflow where PMs and marketing folks (or anyone who doesn’t speak SQL, usually the people closest to the bottom line) needed some data. A simple SQL query would turn into a support request and the database engineers or data scientists would unintentionally become data gatekeepers, constantly unblocking others for even the simplest requests.

What if you could ask for data in plain English (or German or your mother tongue)? With modern LLMs this is possible and we have built this feature in Xata.

Let’s start with a quick demo (or your can play with it yourself here):

How we did it

Simplicity (in implementation)

We have a simple one message chat context window, roughly following the structure below:

Then we initialize the model with the following user message:

There aren’t too many moving parts and the simplicity is by design. Each request to the model to generate SQL takes the following inputs:

  • The database schema (we automatically cache and fetch it, as we host the database we have access to it on-demand).
  • The current SQL contents in the UI (or CLI) editor. It doesn’t matter if the current SQL is to be discarded and the user needs a new SQL. That handling happens in the system prompt.
  • A solid system prompt that tells the model to produce an executable SQL and instructions on how to check its work.

Simplicity (for the customer)

The simplistic implementation above translates directly to a simpler interface for the customer. Since the context window is always two messages long, we don’t need to maintain the context window length with features like compaction and summarization.

Each message can

  1. Generate a new SQL and it has access to the database schema, that’s all it needs for that
  2. Edit / fix an SQL query and it has access to the database schema and the current query, and that’s all that it needs

As you saw in the demo, from the user’s perspective, this turns into a simple natural language query to see the insights they seek.

Premium tokens

There are several schools of thought about which models to use for a task. Many tools resort to model pickers and thereby transferring the responsibility to the end user. We make this choice for the user (as it should be) and we trust a premium model to do the task for generating the right SQL.

At the same time, using right tool for the job is important, so for tasks like naming a query automatically with AI, we use a cheaper commodity model.

This way, we get the best of both worlds, meaningful generated query names and high quality SQL queries.

Platform bonus

Xata is a PostgreSQL database platform designed to help developers operate and scale databases with enhanced productivity and performance, featuring instant copy-on-write database branches, zero-downtime schema changes, data anonymization, and AI-powered performance monitoring.

Naturally for this use case there are a lot of benefits that you derive from the platform like being able to run heavy analytical queries on a separate branch created instantly from production.

What we learned

Simplicity is very important in building and debugging AI features. When something breaks, you want to have a smaller set of potential breaking points to debug. It also led to a cohesive and simple UI to break the data silos and get insights fast with natural language, our customers love it.

Conclusion

In this post, we explored how we built the natural language to SQL feature at Xata.

Thank you for reading! We look forward to having you try the platform. If you'd like early access, you can join our private beta today.

Related Posts

From DBA to DB Agents

Discover how a PostgreSQL DBA’s decade of expertise evolved into designing Xata’s AI-powered Postgres agent, automating monitoring and observability.