Database Indexing Strategy: B-Tree, Hash, and GIN Index for Developers
Classic scenario: you develop an app fast on 1000-row dataset. Push to production, dataset becomes 100,000 rows, queries start slowing. 1 million rows, queries take 5-10 seconds. Production crashes.
Diagnosis: missing index. Add index, query 50ms. Problem solved.
But indexing isn't as simple as "add to all columns". Wrong indexes can slow writes, bloat storage, and confuse the query optimizer to pick weird plans. Understanding indexing trade-offs is the skill separating mid-level from senior backend devs.
This article covers the main index types and when to use which, focusing on PostgreSQL (most popular open source DB in 2026).
Basic Index Concepts
Without index, finding row by column = full table scan. Database has to read all rows, check conditions.
With index, database has sorted/hashed data structure that can be searched efficiently. Tradeoff: index stores additional data (storage cost), needs updating when rows change (write cost).
Practical rule: indexing speeds up reads, slows down writes. Optimization depends on workload.
B-Tree Index: Default and Most Common
B-Tree (Balanced Tree) is default index in PostgreSQL, MySQL, and most databases. Designed for operations: equality, range, sort.
Simple How It Works
Data organized into balanced tree. Each node has multiple keys. Search uses tree navigation: O(log n) complexity.
For 1 million rows, B-Tree depth around 3-4 levels. Only 3-4 disk reads needed to find specific value. Compare to full scan needing 1 million reads.
Suitable Operations
WHERE col = value(equality)WHERE col < value,WHERE col BETWEEN a AND b(range)WHERE col LIKE 'prefix%'(prefix match, NOT contains/suffix)ORDER BY col(sort)GROUP BY col(if high cardinality)
Unsuitable Operations
WHERE col LIKE '%suffix'or'%contains%'WHERE function(col) = value- unless creating functional index- Boolean columns with skewed distribution (90% true, 10% false). Index doesn't help much.
Composite Index: Order Matters
Multi-column index CREATE INDEX ON users (country, age):
- Helps:
WHERE country = 'ID' AND age = 25✓ - Helps:
WHERE country = 'ID'✓ (leftmost prefix) - Helps:
WHERE country = 'ID' AND age > 25✓ - Doesn't help:
WHERE age = 25✗ (skips leftmost) - Sub-optimal:
WHERE country = 'ID' AND city = 'Jakarta'(city not in index)
Rule: most selective column at leftmost. Equality columns before range columns.
Hash Index: For Equality Only
Hash index uses hash function for lookups. Faster than B-Tree for equality, but doesn't support range or sort.
Specific Use Cases
- Lookup by exact value: session tokens, cache keys, hash-based lookups
- Columns with high cardinality (many unique values)
- Read-dominant workloads
Drawbacks
- No range support
- No ordering
- PostgreSQL hash index before version 10 wasn't WAL-logged (not safe for replication). Modern versions OK.
Practical advice: 99% of cases, B-Tree is enough. Hash index useful for specific use cases where you've measured a bottleneck.
GIN Index: For Arrays, JSON, Full-Text Search
GIN (Generalized Inverted Index) designed for multi-value columns. Each "element" indexed, not whole value.
Powerful Use Cases
1. Array Columns
-- Table with tag array
CREATE TABLE posts (
id serial,
tags text[]
);
CREATE INDEX idx_tags ON posts USING GIN (tags);
-- Query: find posts with tag 'tech' or 'ai'
SELECT * FROM posts WHERE tags && ARRAY['tech', 'ai'];
2. JSONB Columns
CREATE INDEX idx_data ON events USING GIN (data jsonb_path_ops);
-- Query: find events with specific key
SELECT * FROM events WHERE data @> '{"user_id": 123}';
3. Full-Text Search
CREATE INDEX idx_content_fts ON articles
USING GIN (to_tsvector('english', content));
SELECT * FROM articles
WHERE to_tsvector('english', content) @@ to_tsquery('postgresql & performance');
GIN Trade-Offs
- Build time slow (much slower than B-Tree)
- Large storage size
- Insert/update slow (many elements, many index entries)
- Read super fast for multi-value queries
Suitable for read-heavy workloads with tag/jsonb/fulltext search.
BRIN Index: For Sequential Big Data
BRIN (Block Range Index) suitable for big tables where values correlate with physical order. Stores min/max per block range, not per row.
Classic Use Case
Time-series data: logs, metrics, audit trails. Insert by time, naturally sorted by timestamp.
CREATE TABLE access_logs (
id bigserial,
ip varchar(45),
url text,
timestamp timestamp
);
-- Data inserted in time order, BRIN works well
CREATE INDEX idx_logs_time ON access_logs USING BRIN (timestamp);
-- Query: logs in date range
SELECT * FROM access_logs
WHERE timestamp BETWEEN '2026-01-01' AND '2026-01-07';
Why BRIN
- Storage 1000x smaller than B-Tree (only min/max per block range)
- Fast build
- Insert doesn't slow down significantly
- Query speed acceptable for range queries
Drawback: data not sequentially correlated, BRIN doesn't help much.
GiST Index: Geometric, Range, Custom
GiST (Generalized Search Tree) framework for custom indexes. Built-in support for: geometric data (PostGIS), range types, full-text search alternatives.
Use Cases
1. Geographic Search
-- PostGIS extension
CREATE INDEX idx_location ON places USING GIST (geom);
-- Find nearest places
SELECT name FROM places
ORDER BY geom <-> ST_MakePoint(106.8, -6.2)
LIMIT 10;
2. Range Types
CREATE TABLE bookings (
id serial,
room_id int,
during tsrange -- timestamp range
);
CREATE INDEX idx_during ON bookings USING GIST (during);
-- Find overlapping bookings
SELECT * FROM bookings
WHERE during && '[2026-05-30 10:00, 2026-05-30 12:00]'::tsrange;
Common Indexing Pitfalls
1. Over-Indexing
Creating index for every column. Result: write performance destroyed, storage bloat 5-10x, query optimizer confused.
Rule: create index because there's a query needing it, not "for safety". Profile with EXPLAIN ANALYZE first, then add index if needed.
2. Index for Low-Cardinality Columns
Boolean or enum columns with only 2-5 unique values. B-Tree index doesn't help much, might even slow things down (because optimizer chooses sequential scan anyway).
Exception: if distribution is severely skewed (99% value A, 1% value B), partial index for minority value is useful:
CREATE INDEX idx_unread_messages
ON messages(user_id) WHERE read = false;
3. Functional Operations Without Functional Index
-- Slow: doesn't use existing index
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
-- Fast: create functional index
CREATE INDEX idx_email_lower ON users (LOWER(email));
4. Not Following Rebuild Schedule
Indexes can "bloat" over time, especially after many DELETE/UPDATE. Performance degrades. Schedule periodic REINDEX or use CREATE INDEX CONCURRENTLY for no-downtime rebuilds.
5. Wrong Composite Order
See earlier. Common mistake: creating (date, user_id) but often querying WHERE user_id = X. Index doesn't help. Reverse order: (user_id, date).
How to Decide: Practical Workflow
- Profile first:
EXPLAIN (ANALYZE, BUFFERS) SELECT ...identify slow queries. - Check query plan: if there's "Seq Scan" on large table, likely needs index.
- Identify columns in WHERE / JOIN / ORDER BY clauses.
- Pick type:
- Equality + range + sort on scalar column → B-Tree
- Array, JSONB, full-text search → GIN
- Time-series large data → BRIN
- Geographic, range types → GiST
- Pure equality high cardinality → Hash (sometimes)
- Test impact:
EXPLAINagain after index. Speedup makes sense? Also check write performance impact. - Monitor production: query performance metrics, index hit rates.
Helpful Tools
- pg_stat_statements: PostgreSQL extension tracking slow queries. Identify indexing candidates.
- pg_stat_user_indexes: see index hit rates. Indexes with low rates = candidates to drop.
- EXPLAIN.depesz.com: visualize query plans, easier to identify bottlenecks.
- pgBadger: log analyzer that can highlight problem query patterns.
Closing
Indexing is a trade-off art. Speed up reads, slow down writes. Storage cost. Maintenance burden. But for most apps, query performance bottlenecks can be solved with right index setup.
Start simple: B-Tree on columns frequently in WHERE/JOIN. Composite index for multi-column predicates. Profile first, add indexes later. Avoid premature indexing.
For apps with complex data types (JSONB, arrays, geographic), specialized indexes (GIN, GiST) can be game-changers for specific queries. But still, validate via profiling first.
What you shouldn't do: skip learning EXPLAIN. Reading query plans is core skill for backend devs. 30 minutes invested learning saves a lot of debugging time in production.