How Indexes Improve Database Read Performance

Have you ever noticed how much of building apps and software engineering revolves around handling data? Most of the work comes down to four operations (create, read, update, and delete, or CRUD), usually performed against a database, and the stored data then feeds all kinds of business logic. So, at its core, it's all about data, right? Here's the intriguing part: in most applications, reads happen far more often than the other three operations. And database operations are not cheap. Every query consumes CPU, memory, and disk I/O, and as the database grows, operations that scan large data sets can slow response times significantly. Handling concurrent requests, keeping data consistent, and coping with network latency add another layer of complexity. That makes optimizing reads especially important, and that's where an interesting database engineering concept comes into play: indexing.

But what exactly is indexing? Let's start with an analogy. Have you ever used a dictionary? Yes, the big book, not the Python dictionary :) When you look up a word, you start with its first letter and narrow down from there, because the words are arranged alphabetically, grouped by first letter at the highest level. That first letter acts as an index: it lets you jump straight to the right section instead of reading every page.

Alright, enough analogies and beating around the bush. Let's dive into the technical stuff!

In any data-driven application, a well-designed database is the backbone of performance. One of its most important optimizations is indexing, a technique that can dramatically reduce the time your database needs to find data. In this post, we'll look at how indexes work behind the scenes and why they are essential for efficient reads.

The Cost of Disk I/O

At its core, a database consists of records stored on disk. Consider a simple SQL table, such as a users table, with columns like id, name, age, bio, and username. Each row is serialized into a sequence of bytes, say 200 bytes per row. Data is written to and read from disk in fixed-size blocks, commonly 4 KB; to keep the numbers small, our example uses 600-byte blocks, so each block holds 3 rows. This means that even reading a small piece of data requires loading an entire block into memory.
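To make the "whole block at a time" point concrete, here is a minimal sketch of the example's storage model. It assumes fixed-size 200-byte rows stored in row-ID order, 600-byte blocks, no row split across two blocks, and a 100-row table; `block_for_row` and `rows_loaded_with` are hypothetical helpers, not part of any database API.

```python
ROW_SIZE = 200     # bytes per row (example value)
BLOCK_SIZE = 600   # bytes per block (example value)
TABLE_ROWS = 100   # table size used in the example
ROWS_PER_BLOCK = BLOCK_SIZE // ROW_SIZE  # 3 whole rows fit in one block


def block_for_row(row_id):
    """Block number holding `row_id`, assuming rows are stored in ID order."""
    return row_id // ROWS_PER_BLOCK


def rows_loaded_with(row_id):
    """Row IDs that come into memory when the block holding `row_id` is read."""
    first = block_for_row(row_id) * ROWS_PER_BLOCK
    last = min(first + ROWS_PER_BLOCK, TABLE_ROWS)  # the final block may be partly empty
    return list(range(first, last))


print(block_for_row(7))                   # 2
print(rows_loaded_with(7))                # [6, 7, 8]: asking for one row loads three
print(block_for_row(TABLE_ROWS - 1) + 1)  # 34 blocks for the whole table
```

Reading row 7 alone still costs one full block read, and the other two rows in that block come along whether you need them or not.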

Imagine the table has 100 rows. At 3 rows per block, storing them takes 34 blocks (100 / 3, rounded up). In the worst case, when a query has to scan the entire table, the database must read all 34 blocks. If each block read took one second (a deliberately slow figure to keep the arithmetic simple), the query would take 34 seconds to finish. This cost becomes significant with millions of records or with frequent queries in a large app.
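A full table scan can be sketched as a loop that reads every block and counts the reads. This is a toy model under the example's assumptions (200-byte rows, 600-byte blocks, 100 rows, one second per block read); the ages are made-up data, and `make_table` and `full_scan` are hypothetical names.

```python
ROW_SIZE = 200    # bytes per row (example value)
BLOCK_SIZE = 600  # bytes per block (example value)
ROWS_PER_BLOCK = BLOCK_SIZE // ROW_SIZE  # 3
SECONDS_PER_READ = 1  # the example's deliberately slow, assumed cost per block


def make_table(row_count):
    """Build hypothetical user rows and pack them into blocks of whole rows."""
    rows = [{"id": i, "age": 18 + i % 10} for i in range(row_count)]  # made-up ages
    return [rows[i:i + ROWS_PER_BLOCK] for i in range(0, row_count, ROWS_PER_BLOCK)]


def full_scan(blocks, age):
    """Read every block, keep rows with a matching age, and count block reads."""
    reads = 0
    matches = []
    for block in blocks:
        reads += 1  # the block is loaded whether or not it holds a match
        matches.extend(row for row in block if row["age"] == age)
    return matches, reads


blocks = make_table(100)
matches, reads = full_scan(blocks, 20)
print(len(blocks), reads, reads * SECONDS_PER_READ)  # 34 34 34
```

The read count depends only on the table's size, not on how many rows match: even if a single row had age 20, the scan would still load all 34 blocks.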

Here's How Indexes Can Help You Out

Think of indexes as your database's table of contents. Instead of going through every row one by one to find what you need, an index cuts down the number of disk reads you have to do. Here's the scoop:

Efficient Data Mapping: An index is like a mini table that maps key values (like age or name) to where the matching records are stored. For an index on age, each entry pairs an age value with a row ID, so finding users who are 20 years old means looking up the entries whose key is 20.
Sorted Structure: Because the index is sorted by the indexed column, a query can jump straight to the entries for a given value (like 20) and read only those, saving you from unnecessary reads.

Let's break down a simplified example:

Without an Index:
- The database has to go through all 34 blocks to find every row with age = 20.
With an Index:
- The index is tiny: at 8 bytes per entry in our example, 100 entries take 800 bytes, which fits in just 2 disk blocks.
- The database reads these 2 index blocks to find the matching row IDs, then reads only the 2 table blocks that hold those rows (assuming the matching rows are stored close together).
- All in all, that's just 4 disk block reads, down from 34.
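The block counts in this comparison can be checked with a few lines of arithmetic. The sketch uses the example's sizes and, like the example, simply assumes the age = 20 rows sit in 2 table blocks; that figure is an input here, not something the code derives.

```python
import math

BLOCK_SIZE = 600  # bytes per disk block (example value)
ROW_SIZE = 200    # bytes per table row (example value)
ENTRY_SIZE = 8    # bytes per index entry: key + row ID (example value)
ROW_COUNT = 100

# Whole rows and whole entries per block (nothing is split across blocks).
table_blocks = math.ceil(ROW_COUNT / (BLOCK_SIZE // ROW_SIZE))    # 100 / 3  -> 34
index_blocks = math.ceil(ROW_COUNT / (BLOCK_SIZE // ENTRY_SIZE))  # 100 / 75 -> 2
data_blocks_touched = 2  # assumption from the example: matching rows span 2 blocks

reads_without_index = table_blocks
reads_with_index = index_blocks + data_blocks_touched

print(reads_without_index, reads_with_index)             # 34 4
print(round(reads_without_index / reads_with_index, 1))  # 8.5
```

The exact ratio is 34 / 4 = 8.5, which the text rounds to about 8x.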

This kind of reduction (34 reads down to 4, roughly an 8x improvement in our example) can turn a slow, expensive query into a much quicker one. Just think about the impact in a real-world system where database speed really matters.

Behind the Scenes: How It Works

When you run a query like SELECT * FROM users WHERE age = 20, here's the lowdown:

Step 1: Index Scan
The system quickly checks the small index table to find where the value 20 pops up. Since the index is smaller and sorted, this scan is super fast, using just a couple of blocks.
Step 2: Data Fetch
With the row IDs from the index, the system goes straight to the right rows in the main table. This skips the need to scan the whole table.
Net Result:
The time it takes to get the query results is way shorter because there are fewer disk I/O operations.
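The two steps can be sketched in Python with a sorted list of (age, row_id) pairs standing in for the index and `bisect` doing the binary search. This is a simplified model, assuming rows stored in ID order at 3 rows per block and made-up ages; real databases typically use B-tree structures rather than a flat sorted list, and `lookup` is a hypothetical name.

```python
import bisect

ROWS_PER_BLOCK = 3  # 600-byte blocks / 200-byte rows, as in the example

# Hypothetical users table: the row ID is the row's position in the table,
# so the block holding a row is row_id // ROWS_PER_BLOCK.
rows = [{"id": i, "age": 18 + i % 10} for i in range(100)]  # made-up ages

# The index: (age, row_id) pairs kept sorted by age, then by row ID.
index = sorted((row["age"], row["id"]) for row in rows)


def lookup(age):
    """Step 1: binary-search the sorted index. Step 2: fetch only matching rows."""
    # Step 1: index scan. bisect locates the run of entries for `age` with
    # O(log n) comparisons instead of checking every entry.
    lo = bisect.bisect_left(index, (age, -1))
    hi = bisect.bisect_right(index, (age, float("inf")))
    row_ids = [row_id for _, row_id in index[lo:hi]]

    # Step 2: data fetch. Only the blocks containing those row IDs are read.
    blocks_touched = {row_id // ROWS_PER_BLOCK for row_id in row_ids}
    return [rows[row_id] for row_id in row_ids], blocks_touched


matches, blocks_touched = lookup(20)
print(len(matches), len(blocks_touched))  # 10 10
```

Step 1 is cheap because the index is sorted. How many table blocks step 2 touches depends on where the matching rows live: in this made-up data the ten age-20 rows are spread over 10 of the 34 blocks, while the earlier example assumes they sit together in 2.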

You see, in environments where queries run frequently or databases are large, optimizing reads makes a dramatic difference. Without proper indexing, even a simple query can trigger a full table scan, causing heavy disk I/O that degrades performance and strains system resources. With the right indexes, the same queries read only a handful of blocks, which lets databases scale and keep response times low as data grows.
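You can watch this switch from a full scan to an index lookup in a real database using SQLite's `EXPLAIN QUERY PLAN`, available through Python's built-in `sqlite3` module. The sketch below builds a small users table shaped like the article's example; the sample rows and the index name `idx_users_age` are made up, and the exact plan wording varies between SQLite versions (for example "SCAN users" vs. "SCAN TABLE users").

```python
import sqlite3

# In-memory database with a users table shaped like the article's example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, "
    "bio TEXT, username TEXT)"
)
conn.executemany(
    "INSERT INTO users (name, age, bio, username) VALUES (?, ?, ?, ?)",
    [(f"user{i}", 18 + i % 10, "", f"user{i}") for i in range(100)],  # sample rows
)


def plan(sql):
    """Return the 'detail' text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT * FROM users WHERE age = 20"
before = plan(query)  # no index on age yet: SQLite reports a full table SCAN
conn.execute("CREATE INDEX idx_users_age ON users (age)")
after = plan(query)   # now a SEARCH ... USING INDEX idx_users_age (age=?)

print(before)
print(after)
print(len(conn.execute(query).fetchall()))  # same 10 rows either way
```

The query returns the same rows before and after `CREATE INDEX`; only the access path changes.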