Database Indexing Strategy: B-Tree, Hash, dan GIN Index Untuk Developer

Developer 30 Mei 2026 · OTPZap Team

Skenario klasik: lo dev app yang fast saat dataset 1000 row. Push to production, dataset jadi 100,000 row, query mulai lambat. 1 juta row, query 5-10 detik. Production crash.

Diagnosis: missing index. Tambah index, query 50ms. Problem solved.

Tapi indexing ngga sesimple "tambah aja semua kolom". Index yang salah bisa bikin write slow, storage bloat, dan query optimizer pilih plan yang aneh. Paham trade-off indexing adalah skill yang separating mid-level dari senior backend dev.

Artikel ini bahas type index utama dan kapan pakai yang mana, dengan focus PostgreSQL (most popular open source DB di 2026).

Konsep Dasar Index

Tanpa index, find row by column = full table scan. Database harus baca semua row, cek kondisi.

Dengan index, database punya data structure sorted/hashed yang bisa di-search efficient. Tradeoff: index menyimpan data tambahan (storage cost), perlu di-update saat row berubah (write cost).

Aturan praktis: indexing speed up read, slow down write. Optimasi tergantung workload.

B-Tree Index: Default dan Most Common

B-Tree (Balanced Tree) adalah default index di PostgreSQL, MySQL, dan most database. Di-design untuk operasi: equality, range, sort.

Cara Kerja Sederhana

Data di-organize jadi tree balanced. Setiap node ada multiple keys. Search pakai navigation tree: O(log n) complexity.

Untuk 1 juta row, B-Tree depth sekitar 3-4 levels. Cuma butuh 3-4 disk read untuk find specific value. Compare ke full scan yang butuh 1 juta read.

Operasi yang Cocok

WHERE col = value (equality)
WHERE col < value, WHERE col BETWEEN a AND b (range)
WHERE col LIKE 'prefix%' (prefix match, NOT contains/suffix)
ORDER BY col (sort)
GROUP BY col (kalau cardinality tinggi)

Operasi yang Ngga Cocok

WHERE col LIKE '%suffix' atau '%contains%'
WHERE function(col) = value - kecuali bikin functional index
Boolean column dengan distribution skewed (90% true, 10% false). Index ngga banyak help.

Composite Index: Order Matters

Multi-column index CREATE INDEX ON users (country, age):

Help untuk: WHERE country = 'ID' AND age = 25 ✓
Help untuk: WHERE country = 'ID' ✓ (leftmost prefix)
Help untuk: WHERE country = 'ID' AND age > 25 ✓
Tidak help untuk: WHERE age = 25 ✗ (skip leftmost)
Sub-optimal: WHERE country = 'ID' AND city = 'Jakarta' (city ngga di index)

Rule: kolom paling selective di leftmost. Equality column sebelum range column.

Hash Index: Untuk Equality Saja

Hash index pakai hash function untuk lookup. Lebih cepat dari B-Tree untuk equality, tapi ngga support range atau sort.

Use Case Spesifik

Lookup by exact value: session token, cache key, hash-based lookup
Column dengan high cardinality (banyak unique value)
Workload dominantly read

Drawback

No range support
No ordering
PostgreSQL hash index sebelum versi 10 ngga WAL-logged (ngga safe untuk replication). Modern version OK.

Practical advice: 99% case, B-Tree udah cukup. Hash index berguna untuk specific use case yang lo udah measure ada bottleneck.

GIN Index: Untuk Array, JSON, Full-Text Search

GIN (Generalized Inverted Index) di-design untuk multi-value column. Setiap "element" di-index, bukan whole value.

Use Case Powerful

1. Array Column

-- Table dengan tag array
CREATE TABLE posts (
  id serial,
  tags text[]
);

CREATE INDEX idx_tags ON posts USING GIN (tags);

-- Query: find post yang punya tag 'tech' atau 'ai'
SELECT * FROM posts WHERE tags && ARRAY['tech', 'ai'];

2. JSONB Column

CREATE INDEX idx_data ON events USING GIN (data jsonb_path_ops);

-- Query: find event dengan key spesifik
SELECT * FROM events WHERE data @> '{"user_id": 123}';

3. Full-Text Search

CREATE INDEX idx_content_fts ON articles 
  USING GIN (to_tsvector('english', content));

SELECT * FROM articles 
WHERE to_tsvector('english', content) @@ to_tsquery('postgresql & performance');

Trade-Off GIN

Build time slow (jauh lebih lambat dari B-Tree)
Storage size besar
Insert/update slow (banyak elements, banyak entries di index)
Read super cepat untuk multi-value queries

Cocok untuk read-heavy workload dengan tag/jsonb/fulltext search.

BRIN Index: Untuk Big Data Sequential

BRIN (Block Range Index) cocok untuk table yang big tapi value-nya correlate dengan physical order. Storing min/max per block range, bukan per row.

Use Case Klasik

Time-series data: logs, metrics, audit trail. Insert by time, naturally sorted by timestamp.

CREATE TABLE access_logs (
  id bigserial,
  ip varchar(45),
  url text,
  timestamp timestamp
);

-- Data inserted in time order, BRIN works well
CREATE INDEX idx_logs_time ON access_logs USING BRIN (timestamp);

-- Query: logs in date range
SELECT * FROM access_logs 
WHERE timestamp BETWEEN '2026-01-01' AND '2026-01-07';

Why BRIN

Storage 1000x lebih kecil dari B-Tree (only min/max per block range)
Build cepat
Insert ngga slow down significantly
Query speed acceptable untuk range query

Drawback: data yang ngga sequentially correlated, BRIN ngga banyak help.

GiST Index: Geometric, Range, Custom

GiST (Generalized Search Tree) framework untuk index custom. Built-in support untuk: geometric data (PostGIS), range types, full-text search alternative.

Use Case

1. Geographic Search

-- PostGIS extension
CREATE INDEX idx_location ON places USING GIST (geom);

-- Find nearest places
SELECT name FROM places 
ORDER BY geom <-> ST_MakePoint(106.8, -6.2)
LIMIT 10;

2. Range Type

CREATE TABLE bookings (
  id serial,
  room_id int,
  during tsrange  -- timestamp range
);

CREATE INDEX idx_during ON bookings USING GIST (during);

-- Find overlapping bookings
SELECT * FROM bookings 
WHERE during && '[2026-05-30 10:00, 2026-05-30 12:00]'::tsrange;

Common Pitfalls Indexing

1. Over-Indexing

Bikin index untuk setiap kolom. Result: write performance hancur, storage bloat 5-10x, query optimizer confused.

Rule: bikin index karena ada query yang butuh, bukan "for safety". Profile dengan EXPLAIN ANALYZE dulu, baru add index kalau perlu.

2. Index untuk Low-Cardinality Column

Column boolean atau enum yang cuma 2-5 unique values. Index B-Tree ngga banyak help, mungkin malah slow down (karena optimizer pilih sequential scan tetep).

Exception: kalau distribution skewed parah (99% value A, 1% value B), partial index untuk minority value bermanfaat:

CREATE INDEX idx_unread_messages 
  ON messages(user_id) WHERE read = false;

3. Functional Operation Tanpa Functional Index

-- Slow: ngga pakai index existing
SELECT * FROM users WHERE LOWER(email) = '[email protected]';

-- Fast: bikin functional index
CREATE INDEX idx_email_lower ON users (LOWER(email));

4. Tidak Mengikuti Rebuild Schedule

Index bisa "bloat" over time, especially after banyak DELETE/UPDATE. Performance degrade. Schedule periodic REINDEX atau pakai CREATE INDEX CONCURRENTLY untuk no-downtime rebuild.

5. Wrong Composite Order

Lihat sebelumnya. Common mistake: bikin (date, user_id) tapi sering query WHERE user_id = X. Index ngga help. Reverse order: (user_id, date).

Cara Decide: Workflow Praktis

Profile dulu: EXPLAIN (ANALYZE, BUFFERS) SELECT ... identifikasi slow query.
Cek query plan: kalau ada "Seq Scan" di table besar, kemungkinan butuh index.
Identify column di WHERE / JOIN / ORDER BY clause.
Pilih type:
- Equality + range + sort di scalar column → B-Tree
- Array, JSONB, full-text search → GIN
- Time-series large data → BRIN
- Geographic, range type → GiST
- Pure equality high cardinality → Hash (sometimes)
Test impact: EXPLAIN lagi after index. Speedup-nya masuk akal? Cek juga write performance impact.
Monitor production: query performance metrics, index hit rate.

Tools yang Membantu

pg_stat_statements: extension PostgreSQL yang track slow query. Identifikasi candidate untuk indexing.
pg_stat_user_indexes: lihat index hit rate. Index dengan rate rendah = candidate untuk drop.
EXPLAIN.depesz.com: visualize query plan, easier untuk identify bottleneck.
pgBadger: log analyzer yang bisa highlight problem query patterns.

Penutup

Indexing adalah trade-off art. Speed up read, slow down write. Storage cost. Maintenance burden. Tapi untuk most app, query performance bottleneck bisa di-solve dengan right index setup.

Start simple: B-Tree di kolom yang sering di WHERE/JOIN. Composite index untuk multi-column predicate. Profile dulu, tambah index later. Avoid premature indexing.

Untuk app dengan data type kompleks (JSONB, arrays, geographic), specialized index (GIN, GiST) bisa game-changer untuk specific query. Tapi tetep, validate via profiling dulu.

Yang ngga boleh: skip belajar EXPLAIN. Reading query plan adalah skill core untuk backend dev. 30 menit invest belajar, save banyak debugging time di production.