XataByte: Basics of database indexing

A basic overview of database indexing - what is a database index and what does it do?

Written by

J Edwards

Published on

August 11, 2023

Want to grasp a complex concept in the time it takes to brew your morning coffee? In our XataByte: 3-minute byte-sized read series we offer concise explanations to help you unravel intricate database concepts without the time commitment. Today we are looking at database indexing.

#

What is database indexing?

At its core, a database index is pretty much like an index in a book - it allows you to quickly find information without having to read through all of the content. In a database, indexing involves creating a structured roadmap that speeds up the retrieval of specific information from a huge collection of data.

Imagine you’ve created a movie database app. Users want to explore films based on filters: genre, release year, lead actors, and more. But as your user base grows, running a search for these filters becomes as slow as watching a buffering video.

Here's where indexes step in, like movie genres neatly organized on your shelf. Let's say you decide to index the "release year" column. Behind the scenes, your database engine creates a digital list with two columns: a release year value and a link to the corresponding movie record.

When a user hunts for movies from a specific year, your query instantly knows where to find the desired films. Thanks to this index magic, the query's efficiency is supercharged, taking mere seconds instead of super long wait times.

Without indexing, databases might need to scan through every row of data to find the desired information. By creating a roadmap that guides database systems to relevant data points, indexing can reduce the time required for queries and can help ensure that query tasks are performed quickly, without unnecessary strain on system resources.

When dealing with database indexing, there are a few points to consider:

  • Selectivity: Not all columns are created equal for indexing. Columns with high selectivity, meaning they have a wide range of distinct values, are better candidates for indexing because they can quickly narrow down search results.
  • Size vs performance: While indexing speeds up queries, it does come with some overhead in terms of storage and update times. Striking the right balance between indexing and overall system performance is important.
  • Update impact: As data changes, indexes must be updated as well. This update process can impact database performance, so choosing when and how to update indexes is a strategic decision.

Database indexing is applied using SQL statements that define which columns to index and how. Different types of indexes, such as B-tree, hash, and full-text indexes, cater to different data structures and use cases.

Types of database indexes include:

The name "B-Tree" comes from the structure it forms - a balanced tree. This index type is good at maintaining balance between efficient range queries (like selecting values between A and Z) and quick equality searches (finding a specific value).

The term "Hash" refers to the method this index uses. It uses a hash function to create a unique identifier (hash) for each data value. The identifier acts as a pointer to the data, making searches for exact matches super fast.

"Bitmap" uses a series of 0s and 1s, similar to a bitmap image. Each bit represents the presence or absence of a data value in a particular category. This is pretty efficient for low-cardinality columns.

"Full-Text" indicates the scope of this index type. It's tailored for analyzing large text fields. Instead of just keywords, it considers the entire text, which is great for thorough keyword searches.

"Spatial" refers to the index's focus on spatial, or geographic, data. This index type arranges geographical coordinates (latitude, longitude) to allow for quick spatial queries.

Database indexes are typically created on columns in database tables. These columns are usually used in search conditions or as part of the sorting and filtering operations in queries. When a query is executed, the database engine first checks the index associated with the relevant column(s). This index provides information about the data distribution and the location of data values in the table. By using this information, the database engine can navigate directly to the relevant rows, avoiding the need to scan the entire table.

Let's consider a scenario where we have a products table with millions of records. We want to find products with a specific category:

// Without an index (slower)
SELECT * FROM products WHERE category = 'Tees';
// With an index (faster)
CREATE INDEX idx_category ON products(category);
SELECT * FROM products WHERE category = 'Tees';

In the first query, without an index, the database would need to scan the entire products table to find matching rows. In the second query, after creating an index on the category column, the database engine can jump to the rows related to 'Tees'.

Indexes are not limited to single-column scenarios. They can work in join operations too. Consider a scenario where you're joining the orders and customers tables:

// Without indexes (slower)
SELECT * FROM orders
JOIN customers ON orders.customer_id = customers.id;
// With indexes (faster)
CREATE INDEX idx_customer_id ON orders(customer_id);
CREATE INDEX idx_id ON customers(id);
SELECT * FROM orders
JOIN customers ON orders.customer_id = customers.id;

Creating indexes on the columns involved in the join improves the efficiency of the join operation; the query runs smoothly even as the dataset grows.

Aggregation functions like SUM, AVG, and COUNT can benefit from indexes too. Let's say you're calculating the total sales amount from the sales table:

// Without index (slower)
SELECT SUM(amount) FROM sales WHERE date BETWEEN '2023-01-01' AND '2023-08-30';
// With index (faster)
CREATE INDEX idx_date ON sales(date);
SELECT SUM(amount) FROM sales WHERE date BETWEEN '2023-01-01' AND '2023-08-30';

By indexing the date column, the query's execution time is reduced.

So there you have it – database indexing demystified (a little). While we've only scratched the surface of this topic, you now hold the key to understanding how indexes work their magic in optimizing data retrieval.

Want to add to the discussion? Reach out to us on Discord or follow us on Twitter. We'd love to hear your thoughts, answer your questions, and keep you updated on the latest at Xata.

Start free,
pay as you grow

Xata provides the best free plan in the industry. It is production ready by default and doesn't pause or cool-down. Take your time to build your business and upgrade when you're ready to scale.

Free plan includes
  • Single team member
  • 10 database branches
  • High availability
  • 15 GB data storage
Start freeExplore all plans
Free plan includes
  • Single team member
  • 10 database branches
  • High availability
  • 15 GB data storage

Sign up to our newsletter

By subscribing, you agree with Xata’s Terms of Service and Privacy Policy.

Copyright © 2024 Xatabase Inc.
All rights reserved.

Product

Feature requestsPricingStatusAI solutions