How Indexes Improve Database Read Performance

Mon Apr 14 2025 5 min read

Have you ever noticed how much of building apps and software engineering revolves around handling data? Most of it boils down to four operations, create, read, update, and delete (CRUD), usually performed against a database. The stored data then feeds all kinds of business logic and manipulation. So, at its core, it's all about data, right?

Now, here's something intriguing: in many applications, reads happen far more often than the other three operations. Database operations are also costly. Every query consumes resources such as CPU, memory, and disk I/O. As the database grows, operations that scan large data sets can slow response times significantly. Handling concurrent requests, ensuring data consistency, and managing network latency add another layer of complexity. So optimizing these operations, especially reads, becomes crucial. And that's where an interesting database engineering concept comes into play: indexing.

But what exactly is indexing?! Well, let's explore an example of indexing. Have you ever used a dictionary? Yes, the big book, not the Python dictionary :) When you're searching for a word in a dictionary, you begin with the first letter and locate the word from there. The first letter serves as an index. The words in the dictionary are arranged by their first letter at the highest level.
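To make the dictionary analogy concrete, here is a minimal sketch in Python. The word list and the names `index_by_letter` and `lookup` are made up for illustration; the point is only that a lookup searches one small bucket instead of the whole list.

```python
from collections import defaultdict

# A tiny "dictionary" of words (illustrative data, not from any real source)
words = ["apple", "banana", "avocado", "blueberry", "cherry"]

# Build the index: first letter -> list of words starting with that letter
index_by_letter = defaultdict(list)
for word in words:
    index_by_letter[word[0]].append(word)


def lookup(word):
    """Check only the bucket for the word's first letter, not every word."""
    return word in index_by_letter.get(word[0], [])


print(lookup("banana"))  # searches only the "b" bucket
print(lookup("date"))    # there is no "d" bucket, so nothing is scanned
```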

Alright, enough with the analogies and beating around the bush—let's dive into the technical stuff!

In any data-driven application, a well-designed database is the backbone of performance. One of the most critical optimizations is indexing, a technique that can dramatically speed up how quickly your database retrieves data. In this post, we'll see how indexes work behind the scenes and why they are essential for efficient data retrieval.

The Cost of Disk I/O

At its core, a database consists of records stored on disk. Consider a simple SQL table, such as a users table, with columns like id, name, age, bio, and username. Each row in this table is converted into a series of bytes, say 200 bytes per row. Data is read from and written to disk in fixed-size blocks. Real systems often use 4 KB blocks; for our example, assume a block is 600 bytes, so each block holds 3 rows. This means that even reading a small piece of data requires loading an entire block into memory.

Imagine the table has 100 rows. At 3 rows per block (600 / 200), it needs 34 blocks to store all the data. In the worst case, if you need to scan the entire table, you have to read all 34 disk blocks. If each block read took one second (a deliberately exaggerated figure, to keep the math easy), your query would take 34 seconds to finish. This is a significant performance cost, especially when dealing with millions of records or frequent queries in a large app.
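The block count above is simple arithmetic, sketched here in Python. The constant names are made up for illustration, and the sketch assumes a row is never split across two blocks.

```python
import math

ROW_BYTES = 200    # size of one stored row in the example
BLOCK_BYTES = 600  # example block size (real systems often use 4 KB)
ROW_COUNT = 100    # rows in the users table

# Assumption: rows are never split across blocks, so only whole rows fit.
rows_per_block = BLOCK_BYTES // ROW_BYTES

# The last block may be only partly full, hence the ceiling.
table_blocks = math.ceil(ROW_COUNT / rows_per_block)

print(rows_per_block, table_blocks)
```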

Here's How Indexes Can Help You Out

Think of indexes as your database's table of contents. Instead of going through every row one by one to find what you need, an index cuts down the number of disk reads you have to do. Here's the scoop:

  1. Without an Index:

    • The database has to read all 34 blocks to find every row with age = 20.

  2. With an Index:

    • The index on age is tiny (just 8 bytes per entry in our example), so all 100 entries fit in 2 disk blocks.

    • The database reads these 2 index blocks to find the matching row IDs. Then it reads only the data blocks that hold those rows, 2 in our example.

    • All in all, that is just 4 disk block reads, a big drop from 34.

This kind of reduction (34 reads down to 4, roughly an 8.5x improvement in our example) can turn a slow, expensive task into a much quicker one. Just think about the impact in a real-world system where database speed really matters.
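The numbers in this comparison can be checked with a few lines of Python. This is only a sketch of the example's arithmetic: the constant names are made up, and the figure of 2 matching data blocks is taken from the example as given rather than derived.

```python
import math

BLOCK_BYTES = 600         # example block size
INDEX_ENTRY_BYTES = 8     # size of one index entry in the example
ROW_COUNT = 100           # one index entry per row
TABLE_BLOCKS = 34         # blocks read by a full table scan
MATCHING_DATA_BLOCKS = 2  # assumption from the example: matches sit in 2 blocks

entries_per_block = BLOCK_BYTES // INDEX_ENTRY_BYTES
index_blocks = math.ceil(ROW_COUNT / entries_per_block)

reads_with_index = index_blocks + MATCHING_DATA_BLOCKS
improvement = TABLE_BLOCKS / reads_with_index

print(index_blocks, reads_with_index, improvement)
```

The ratio of 34 to 4 is 8.5. The gap widens quickly on larger tables, because a full scan grows with the table while an indexed lookup grows mostly with the number of matching rows.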

Behind the Scenes: How It Works

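When you run a query like SELECT * FROM users WHERE age = 20, here's the lowdown: without an index on age, the database reads every block of the table and checks each row. With an index on age, it looks up age = 20 in the index, gets the matching row IDs, and reads only the blocks that contain those rows.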
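One way to watch this happen is with SQLite, which ships with Python. This is a sketch under assumptions: the post does not name a database engine, so SQLite stands in for illustration; the column types, the sample rows, and the index name idx_users_age are made up. EXPLAIN QUERY PLAN is SQLite's own way of asking how a query would be executed, and the exact wording of its output varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, name TEXT, age INTEGER, bio TEXT, username TEXT)"
)

# 100 illustrative rows; ages cycle through 18..67, so two rows have age = 20.
conn.executemany(
    "INSERT INTO users (name, age, bio, username) VALUES (?, ?, ?, ?)",
    [(f"user{i}", 18 + i % 50, "bio", f"u{i}") for i in range(100)],
)


def query_plan(connection):
    """Return SQLite's description of how it would run the age = 20 query."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM users WHERE age = 20"
    ).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " ".join(row[3] for row in rows)


plan_without_index = query_plan(conn)  # typically a SCAN of the whole table

# Hypothetical index name; any name would do.
conn.execute("CREATE INDEX idx_users_age ON users (age)")
plan_with_index = query_plan(conn)     # typically a SEARCH using the index

matching_rows = conn.execute(
    "SELECT COUNT(*) FROM users WHERE age = 20"
).fetchone()[0]

print(plan_without_index)
print(plan_with_index)
print(matching_rows)
```

The query itself never changes. Only the plan does, which is why adding an index is often one of the least invasive ways to speed up a read-heavy workload.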

You see, in environments where queries run frequently or databases are large, optimizing read operations makes a dramatic difference. Without proper indexing, even a simple query can trigger a full table scan, causing heavy disk I/O that degrades performance and strains system resources. With a well-chosen index, the same query touches only a handful of blocks, and that is what lets a database keep its reads fast as the data grows.