Database Indexing Strategy: B-Tree, Hash, and GIN Index for Developers

Developer May 30, 2026 · OTPZap Team

Classic scenario: you develop an app fast on 1000-row dataset. Push to production, dataset becomes 100,000 rows, queries start slowing. 1 million rows, queries take 5-10 seconds. Production crashes.

Diagnosis: missing index. Add index, query 50ms. Problem solved.

But indexing isn't as simple as "add to all columns". Wrong indexes can slow writes, bloat storage, and confuse the query optimizer to pick weird plans. Understanding indexing trade-offs is the skill separating mid-level from senior backend devs.

This article covers the main index types and when to use which, focusing on PostgreSQL (most popular open source DB in 2026).

Basic Index Concepts

Without index, finding row by column = full table scan. Database has to read all rows, check conditions.

With index, database has sorted/hashed data structure that can be searched efficiently. Tradeoff: index stores additional data (storage cost), needs updating when rows change (write cost).

Practical rule: indexing speeds up reads, slows down writes. Optimization depends on workload.

B-Tree Index: Default and Most Common

B-Tree (Balanced Tree) is default index in PostgreSQL, MySQL, and most databases. Designed for operations: equality, range, sort.

Simple How It Works

Data organized into balanced tree. Each node has multiple keys. Search uses tree navigation: O(log n) complexity.

For 1 million rows, B-Tree depth around 3-4 levels. Only 3-4 disk reads needed to find specific value. Compare to full scan needing 1 million reads.

Suitable Operations

WHERE col = value (equality)
WHERE col < value, WHERE col BETWEEN a AND b (range)
WHERE col LIKE 'prefix%' (prefix match, NOT contains/suffix)
ORDER BY col (sort)
GROUP BY col (if high cardinality)

Unsuitable Operations

WHERE col LIKE '%suffix' or '%contains%'
WHERE function(col) = value - unless creating functional index
Boolean columns with skewed distribution (90% true, 10% false). Index doesn't help much.

Composite Index: Order Matters

Multi-column index CREATE INDEX ON users (country, age):

Helps: WHERE country = 'ID' AND age = 25 ✓
Helps: WHERE country = 'ID' ✓ (leftmost prefix)
Helps: WHERE country = 'ID' AND age > 25 ✓
Doesn't help: WHERE age = 25 ✗ (skips leftmost)
Sub-optimal: WHERE country = 'ID' AND city = 'Jakarta' (city not in index)

Rule: most selective column at leftmost. Equality columns before range columns.

Hash Index: For Equality Only

Hash index uses hash function for lookups. Faster than B-Tree for equality, but doesn't support range or sort.

Specific Use Cases

Lookup by exact value: session tokens, cache keys, hash-based lookups
Columns with high cardinality (many unique values)
Read-dominant workloads

Drawbacks

No range support
No ordering
PostgreSQL hash index before version 10 wasn't WAL-logged (not safe for replication). Modern versions OK.

Practical advice: 99% of cases, B-Tree is enough. Hash index useful for specific use cases where you've measured a bottleneck.

GIN Index: For Arrays, JSON, Full-Text Search

GIN (Generalized Inverted Index) designed for multi-value columns. Each "element" indexed, not whole value.

Powerful Use Cases

1. Array Columns

-- Table with tag array
CREATE TABLE posts (
  id serial,
  tags text[]
);

CREATE INDEX idx_tags ON posts USING GIN (tags);

-- Query: find posts with tag 'tech' or 'ai'
SELECT * FROM posts WHERE tags && ARRAY['tech', 'ai'];

2. JSONB Columns

CREATE INDEX idx_data ON events USING GIN (data jsonb_path_ops);

-- Query: find events with specific key
SELECT * FROM events WHERE data @> '{"user_id": 123}';

3. Full-Text Search

CREATE INDEX idx_content_fts ON articles 
  USING GIN (to_tsvector('english', content));

SELECT * FROM articles 
WHERE to_tsvector('english', content) @@ to_tsquery('postgresql & performance');

GIN Trade-Offs

Build time slow (much slower than B-Tree)
Large storage size
Insert/update slow (many elements, many index entries)
Read super fast for multi-value queries

Suitable for read-heavy workloads with tag/jsonb/fulltext search.

BRIN Index: For Sequential Big Data

BRIN (Block Range Index) suitable for big tables where values correlate with physical order. Stores min/max per block range, not per row.

Classic Use Case

Time-series data: logs, metrics, audit trails. Insert by time, naturally sorted by timestamp.

CREATE TABLE access_logs (
  id bigserial,
  ip varchar(45),
  url text,
  timestamp timestamp
);

-- Data inserted in time order, BRIN works well
CREATE INDEX idx_logs_time ON access_logs USING BRIN (timestamp);

-- Query: logs in date range
SELECT * FROM access_logs 
WHERE timestamp BETWEEN '2026-01-01' AND '2026-01-07';

Why BRIN

Storage 1000x smaller than B-Tree (only min/max per block range)
Fast build
Insert doesn't slow down significantly
Query speed acceptable for range queries

Drawback: data not sequentially correlated, BRIN doesn't help much.

GiST Index: Geometric, Range, Custom

GiST (Generalized Search Tree) framework for custom indexes. Built-in support for: geometric data (PostGIS), range types, full-text search alternatives.

Use Cases

1. Geographic Search

-- PostGIS extension
CREATE INDEX idx_location ON places USING GIST (geom);

-- Find nearest places
SELECT name FROM places 
ORDER BY geom <-> ST_MakePoint(106.8, -6.2)
LIMIT 10;

2. Range Types

CREATE TABLE bookings (
  id serial,
  room_id int,
  during tsrange  -- timestamp range
);

CREATE INDEX idx_during ON bookings USING GIST (during);

-- Find overlapping bookings
SELECT * FROM bookings 
WHERE during && '[2026-05-30 10:00, 2026-05-30 12:00]'::tsrange;

Common Indexing Pitfalls

1. Over-Indexing

Creating index for every column. Result: write performance destroyed, storage bloat 5-10x, query optimizer confused.

Rule: create index because there's a query needing it, not "for safety". Profile with EXPLAIN ANALYZE first, then add index if needed.

2. Index for Low-Cardinality Columns

Boolean or enum columns with only 2-5 unique values. B-Tree index doesn't help much, might even slow things down (because optimizer chooses sequential scan anyway).

Exception: if distribution is severely skewed (99% value A, 1% value B), partial index for minority value is useful:

CREATE INDEX idx_unread_messages 
  ON messages(user_id) WHERE read = false;

3. Functional Operations Without Functional Index

-- Slow: doesn't use existing index
SELECT * FROM users WHERE LOWER(email) = '[email protected]';

-- Fast: create functional index
CREATE INDEX idx_email_lower ON users (LOWER(email));

4. Not Following Rebuild Schedule

Indexes can "bloat" over time, especially after many DELETE/UPDATE. Performance degrades. Schedule periodic REINDEX or use CREATE INDEX CONCURRENTLY for no-downtime rebuilds.

5. Wrong Composite Order

See earlier. Common mistake: creating (date, user_id) but often querying WHERE user_id = X. Index doesn't help. Reverse order: (user_id, date).

How to Decide: Practical Workflow

Profile first: EXPLAIN (ANALYZE, BUFFERS) SELECT ... identify slow queries.
Check query plan: if there's "Seq Scan" on large table, likely needs index.
Identify columns in WHERE / JOIN / ORDER BY clauses.
Pick type:
- Equality + range + sort on scalar column → B-Tree
- Array, JSONB, full-text search → GIN
- Time-series large data → BRIN
- Geographic, range types → GiST
- Pure equality high cardinality → Hash (sometimes)
Test impact: EXPLAIN again after index. Speedup makes sense? Also check write performance impact.
Monitor production: query performance metrics, index hit rates.

Helpful Tools

pg_stat_statements: PostgreSQL extension tracking slow queries. Identify indexing candidates.
pg_stat_user_indexes: see index hit rates. Indexes with low rates = candidates to drop.
EXPLAIN.depesz.com: visualize query plans, easier to identify bottlenecks.
pgBadger: log analyzer that can highlight problem query patterns.

Closing

Indexing is a trade-off art. Speed up reads, slow down writes. Storage cost. Maintenance burden. But for most apps, query performance bottlenecks can be solved with right index setup.

Start simple: B-Tree on columns frequently in WHERE/JOIN. Composite index for multi-column predicates. Profile first, add indexes later. Avoid premature indexing.

For apps with complex data types (JSONB, arrays, geographic), specialized indexes (GIN, GiST) can be game-changers for specific queries. But still, validate via profiling first.

What you shouldn't do: skip learning EXPLAIN. Reading query plans is core skill for backend devs. 30 minutes invested learning saves a lot of debugging time in production.