# Database Infrastructure Glossary — Agent Context File

This document is a structured reference of database infrastructure terminology for AI agents, coding assistants, and LLMs. Use it as a skill file, CLAUDE.md context, or tool definition input.

Each term: **Term**: Definition. Terms marked **[Xata]** are Xata-specific. Terms marked **[OSS]** are Xata open-source projects. All other terms are industry-standard.

258 terms. 26 categories. Last updated: February 2026.

Source: https://xata.io/glossary

---

## Core Database Concepts

- **ACID**: Atomicity, Consistency, Isolation, Durability — four properties guaranteeing reliable transaction processing.
- **BASE**: Basically Available, Soft state, Eventually consistent — alternative to ACID for distributed systems prioritizing availability.
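- **CAP Theorem**: A distributed data store cannot simultaneously guarantee Consistency, Availability, and Partition tolerance; during a network partition it must give up either consistency or availability. Conjectured by Eric Brewer (2000), proven by Gilbert and Lynch (2002).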
- **Cardinality**: Number of distinct values in a column relative to total rows. High = many unique values. Low = few.
- **Catalog**: System tables containing metadata about all database objects (tables, columns, types, functions, indexes). In PostgreSQL: pg_catalog schema.
- **Column**: A named attribute of a table with a defined data type.
- **Composite Type**: A data type composed of multiple named fields, each with its own type.
- **Constraint**: A rule enforced by the database for data integrity. Types: PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, NOT NULL, EXCLUDE.
- **DDL (Data Definition Language)**: SQL statements defining database structure: CREATE, ALTER, DROP, TRUNCATE.
- **DML (Data Manipulation Language)**: SQL statements that read or modify data: SELECT, INSERT, UPDATE, DELETE, MERGE.
- **Data Type**: Classification of a column's allowed values. Common: INTEGER, TEXT, BOOLEAN, TIMESTAMP, UUID, JSONB, NUMERIC.
- **Database**: A named collection of schemas, tables, and other objects managed by a database server.
- **Default Value**: Expression evaluated and stored when a row is inserted without a value for that column.
- **Domain**: User-defined data type wrapping an existing type with additional constraints.
- **Enum Type**: User-defined type consisting of a static ordered set of labeled values.
- **Foreign Key**: Constraint linking columns in one table to the primary key/unique columns of another. Enforces referential integrity.
- **Index**: Data structure maintained alongside a table to accelerate lookups. Trades write performance for faster reads.
- **Materialized View**: A view whose query results are stored on disk and must be explicitly refreshed. Can be indexed.
- **Normalization**: Organizing tables to reduce redundancy and improve integrity. Normal forms: 1NF through 5NF.
- **Null**: Marker indicating absence of a value. Not equal to zero, empty string, or false. NULL = NULL evaluates to NULL (unknown).
- **OID (Object Identifier)**: 32-bit unsigned integer identifying catalog objects internally in PostgreSQL.
- **Partition**: Physical division of a large table into smaller pieces based on a partition key. Types: range, list, hash.
- **Primary Key**: Constraint guaranteeing unique, non-null identification of each row. Creates a unique B-tree index.
- **Relation**: Any named table-like object: tables, views, materialized views, indexes, sequences.
- **Row / Tuple**: A single record in a table containing one value per column.
- **Schema**: A named namespace within a database containing tables, views, functions, types, and other objects.
- **Sequence**: Object generating unique incrementing numeric values. Used for auto-incrementing primary keys.
- **Stored Procedure**: Named procedural code block that can manage transactions (commit/rollback within body). PostgreSQL 11+.
- **Table**: Fundamental data storage structure organized as rows and columns.
- **Tablespace**: Named disk location where PostgreSQL stores data files.
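- **Trigger**: Function automatically executed on INSERT, UPDATE, DELETE, or TRUNCATE events. Fires BEFORE, AFTER, or INSTEAD OF (INSTEAD OF applies to views only).
- **Unique Constraint**: Ensures all values in specified columns are distinct. By default NULLs are treated as distinct, so multiple NULLs are allowed; UNIQUE NULLS NOT DISTINCT (PostgreSQL 15+) permits at most one.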
- **View**: Named query stored in the database, referenced like a table. Executes its query on each access.
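
The three-valued logic behind the **Null** entry trips up many queries. Below is a minimal Python sketch, assuming SQL NULL and UNKNOWN are both modeled as `None` (a convention of this sketch, not a PostgreSQL API). It shows why NULL = NULL is UNKNOWN, how IS NOT DISTINCT FROM differs, and why WHERE discards rows whose predicate is UNKNOWN:

```python
from typing import Optional

# SQL truth values: True, False, and None standing in for NULL / UNKNOWN.
Tri = Optional[bool]


def sql_eq(a, b) -> Tri:
    """SQL '=': a NULL on either side makes the result UNKNOWN."""
    if a is None or b is None:
        return None
    return a == b


def sql_and(p: Tri, q: Tri) -> Tri:
    """FALSE wins in AND; otherwise UNKNOWN propagates."""
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True


def sql_or(p: Tri, q: Tri) -> Tri:
    """TRUE wins in OR; otherwise UNKNOWN propagates."""
    if p is True or q is True:
        return True
    if p is None or q is None:
        return None
    return False


def sql_not(p: Tri) -> Tri:
    return None if p is None else not p


def is_not_distinct_from(a, b) -> bool:
    """NULL-safe equality: two NULLs compare equal; never returns UNKNOWN."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def where(rows, predicate):
    """WHERE keeps a row only if the predicate is TRUE; UNKNOWN is filtered out."""
    return [row for row in rows if predicate(row) is True]


if __name__ == "__main__":
    print(sql_eq(None, None))                # None (UNKNOWN)
    print(is_not_distinct_from(None, None))  # True
    rows = [{"x": 1}, {"x": None}, {"x": 2}]
    # x <> 1 is FALSE for 1 and UNKNOWN for NULL, so only x = 2 survives.
    print(where(rows, lambda r: sql_not(sql_eq(r["x"], 1))))  # [{'x': 2}]
```

In PostgreSQL itself, test for NULL with IS NULL or IS NOT DISTINCT FROM, never with = NULL.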

## PostgreSQL Architecture

- **Async I/O**: Asynchronous I/O allowing PostgreSQL to issue multiple storage requests concurrently. Implemented in PostgreSQL 18 (2025) via io_method=io_uring (Linux) and io_method=worker.
- **Backend Process**: Server process handling a single client connection. Has own memory space.
- **Background Worker**: Extension-registered process performing work independent of client connections.
- **Checkpoint**: Periodic operation writing dirty pages to disk and creating a WAL checkpoint record.
- **CTID**: Tuple identifier — (block number, offset) pointing to physical row location. Changes on update.
- **Dead Tuple**: Row version no longer visible to any transaction, not yet reclaimed by VACUUM.
- **Extension**: Packaged additional functionality: types, functions, operators, indexes, background workers. Managed via CREATE EXTENSION.
- **Fillfactor**: Storage parameter controlling how full pages are packed. Lower values leave room for HOT updates.
- **Function Manager (fmgr)**: Subsystem managing function registration, invocation, and parameter handling.
- **GUC (Grand Unified Configuration)**: PostgreSQL's configuration system for all server parameters.
- **Heap**: Default unordered table storage. Rows stored in 8KB pages.
- **HOT Update**: Heap-Only Tuple update. Stores the new row version on the same page, avoiding index updates. Requires that no indexed column changes and that the page has enough free space.
- **Memory Context**: Hierarchical memory management. Freeing a parent context frees all children.
- **palloc / pfree**: PostgreSQL memory allocation/deallocation within current memory context.
- **Planner / Optimizer**: Generates execution plans, selecting lowest-cost strategy based on table statistics.
- **Postmaster**: Main server process. Forks backends, manages background workers, checkpointing, replication.
- **Postgres Cluster**: Complete set of databases managed by a single postmaster in one data directory.
- **Skip Scan**: B-tree index optimization (PostgreSQL 18) that skips to next distinct value of leading column. Efficient when leading column has low cardinality.
- **SPI (Server Programming Interface)**: C API for executing SQL from within extensions and stored procedures.
- **Shared Buffers**: Shared memory cache for frequently accessed data pages. A common starting point is 25% of RAM.
- **System Columns**: Hidden columns: tableoid, xmin, xmax, cmin, cmax, ctid.
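- **TOAST**: Mechanism for storing large field values. When a row exceeds roughly 2KB, large values are compressed and/or moved out of line into chunks in a separate TOAST table.
- **Vacuum**: Reclaims dead tuple storage, updates the visibility and free space maps, and freezes old transaction IDs to prevent XID wraparound. VACUUM ANALYZE (and autovacuum) also refreshes planner statistics.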
- **Visibility Map**: Bitmap tracking all-visible heap pages. Enables index-only scans.
- **work_mem**: Per-operation memory limit for sorts, hash tables, bitmap scans.
- **WAL Writer**: Background process periodically flushing WAL buffers to disk.

## SQL & Query Language

- **Aggregate Function**: Computes single result from row set: COUNT, SUM, AVG, MIN, MAX, ARRAY_AGG.
- **CTE (Common Table Expression)**: Temporary named result set via WITH clause. Can be recursive. In PostgreSQL 12+, non-recursive CTEs can be inlined.
- **EXPLAIN / EXPLAIN ANALYZE**: Shows query execution plan (estimated or actual with timings).
- **GROUP BY**: Partitions results into groups for aggregate computation.
- **JOIN**: Combines rows from tables. Types: INNER, LEFT, RIGHT, FULL, CROSS, LATERAL.
- **Prepared Statement**: Pre-parsed SQL with parameter placeholders. Avoids repeated parsing.
- **Subquery**: Query nested inside another query (in FROM, WHERE, or SELECT).
- **Window Function**: Calculation across related rows without collapsing them: ROW_NUMBER, RANK, LAG, LEAD.
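
The difference between ROW_NUMBER, RANK, LAG and LEAD is easiest to see on data with ties. This sketch emulates them over PARTITION BY ... ORDER BY ... in plain Python; `window` is a hypothetical helper, LAG/LEAD return the neighboring row's ordering value rather than an arbitrary expression, and ties keep input order (SQL leaves ROW_NUMBER order among ties unspecified):

```python
from itertools import groupby
from operator import itemgetter


def window(rows, partition_key, order_key):
    """Emulate ROW_NUMBER(), RANK(), LAG() and LEAD()
    OVER (PARTITION BY partition_key ORDER BY order_key).

    Every input row is returned (window functions never collapse rows),
    with four computed columns added.
    """
    out = []
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    for _, part in groupby(ordered, key=itemgetter(partition_key)):
        part = list(part)
        rank = 0
        for i, row in enumerate(part):
            # RANK: tied values share a rank; the next distinct value skips ahead.
            if i == 0 or row[order_key] != part[i - 1][order_key]:
                rank = i + 1
            out.append({
                **row,
                "row_number": i + 1,
                "rank": rank,
                "lag": part[i - 1][order_key] if i > 0 else None,
                "lead": part[i + 1][order_key] if i + 1 < len(part) else None,
            })
    return out


if __name__ == "__main__":
    sales = [
        {"region": "eu", "amount": 30},
        {"region": "eu", "amount": 10},
        {"region": "eu", "amount": 30},
        {"region": "us", "amount": 5},
    ]
    for r in window(sales, "region", "amount"):
        print(r)
```

The two tied eu rows get row numbers 2 and 3 but share rank 2; the us partition restarts numbering at 1.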

## Indexing & Query Optimization

- **B-tree**: Default index. Balanced tree for equality and range queries. Best for high-cardinality columns.
- **BRIN (Block Range Index)**: Summary index for naturally ordered large tables. Extremely compact.
- **Bitmap Index Scan**: Produces bitmap of matching rows, fetches in physical order. Efficient for combining conditions.
- **Covering Index**: Contains all columns needed by query. Avoids heap access. Uses INCLUDE clause (PostgreSQL 11+).
- **GIN (Generalized Inverted Index)**: For multi-element values: arrays, JSONB, full-text search, trigrams.
- **GiST (Generalized Search Tree)**: For complex types: geometry, ranges, nearest-neighbor queries.
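- **Hash Index**: Equality (=) lookups only. Stores 32-bit hash codes, so it can be smaller than a B-tree on long keys and sometimes faster for pure equality; WAL-logged and crash-safe since PostgreSQL 10.
- **Index-Only Scan**: Answers a query from the index alone, skipping the heap for pages the visibility map marks all-visible.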
- **Operator Class**: Defines how a data type is used with an index type.
- **Partial Index**: Built on a row subset filtered by WHERE predicate. Reduces size and maintenance.
- **Partition Pruning**: Optimizer eliminates irrelevant partitions during query planning.
- **Sequential Scan**: Reads every row in physical order. Optimal for large-fraction selects.
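- **SP-GiST (Space-Partitioned GiST)**: For non-balanced, space-partitioned structures such as quad-trees, k-d trees, and radix trees. Suits points, IP addresses, and text prefixes.
- **Statistics Collector**: Process that tracked the activity counters shown in pg_stat_* views; replaced by shared-memory cumulative statistics in PostgreSQL 15. Planner statistics (value distributions, histograms) are gathered separately by ANALYZE into pg_statistic.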
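
A BRIN index, as described above, keeps only a small summary per range of blocks. The sketch below models that with one min/max pair per range; `pages_per_range` mirrors the real BRIN storage parameter (default 128), while the data layout and helper names are hypothetical. As in a real BRIN scan, results are lossy: every row in a matching range must still be rechecked.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RangeSummary:
    first_block: int
    last_block: int
    min_value: int
    max_value: int


def build_brin(blocks: List[List[int]], pages_per_range: int) -> List[RangeSummary]:
    """Summarize each run of `pages_per_range` blocks by its min and max value.

    Assumes every block holds at least one value.
    """
    summaries = []
    for start in range(0, len(blocks), pages_per_range):
        chunk = blocks[start:start + pages_per_range]
        values = [v for block in chunk for v in block]
        summaries.append(
            RangeSummary(start, start + len(chunk) - 1, min(values), max(values))
        )
    return summaries


def blocks_to_scan(summaries: List[RangeSummary], lo: int, hi: int) -> List[int]:
    """Block numbers whose range *might* contain values in [lo, hi]."""
    result = []
    for s in summaries:
        if s.max_value >= lo and s.min_value <= hi:   # ranges overlap
            result.extend(range(s.first_block, s.last_block + 1))
    return result


if __name__ == "__main__":
    # 8 blocks of naturally ordered values: block i holds 10*i .. 10*i + 9.
    blocks = [[i * 10 + j for j in range(10)] for i in range(8)]
    summaries = build_brin(blocks, pages_per_range=2)
    print(blocks_to_scan(summaries, 25, 44))   # [2, 3, 4, 5]
```

The summaries stay tiny, but pruning only works when values correlate with physical order (append-only timestamps, for example); on randomly ordered data nearly every range overlaps every query.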

## Transactions & Concurrency

- **Advisory Lock**: Application-controlled lock not tied to any table or row.
- **Deadlock**: Circular lock dependency. PostgreSQL detects and aborts one transaction.
- **Isolation Level**: Protection from concurrent modifications. Levels: READ COMMITTED (default), REPEATABLE READ, SERIALIZABLE.
- **Lock**: Mechanism preventing concurrent access conflicts. Row-level, table-level, advisory, predicate.
- **LWLock (Lightweight Lock)**: Internal lock for shared memory structures. Not visible to SQL.
- **MVCC**: Multi-Version Concurrency Control. Multiple row versions; readers see consistent snapshots.
- **Savepoint**: Named point within a transaction for partial rollback.
- **SSI (Serializable Snapshot Isolation)**: PostgreSQL's SERIALIZABLE implementation tracking read/write dependencies.
- **Snapshot**: Consistent database view at a point in time. Foundation of MVCC.
- **Transaction**: Atomic unit of work. COMMIT or ROLLBACK.
- **Transaction ID (XID)**: 32-bit identifier for write transactions. Used by MVCC for visibility.
- **Two-Phase Commit (2PC)**: Protocol coordinating commits across multiple instances.
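
The MVCC, Snapshot and Transaction ID entries combine into a visibility rule. The sketch below is a heavily simplified model, assuming a snapshot is just (xmax, in-progress XIDs) and that a dict stands in for the commit log; it ignores the current transaction's own changes, subtransactions, command IDs, hint bits, freezing, and wraparound-aware XID comparison:

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Snapshot:
    xmax: int               # first XID not yet assigned when the snapshot was taken
    xip: FrozenSet[int]     # XIDs that were still in progress at that moment


@dataclass
class TupleVersion:
    value: str
    xmin: int                   # XID that created this row version
    xmax: Optional[int] = None  # XID that deleted or superseded it, if any


def effects_visible(xid: int, snap: Snapshot, clog: Dict[int, str]) -> bool:
    """True if `xid` had committed before `snap` was taken."""
    if xid >= snap.xmax or xid in snap.xip:
        return False    # started after, or was still running at, snapshot time
    return clog.get(xid) == "committed"   # clog: stand-in for the commit log


def visible(t: TupleVersion, snap: Snapshot, clog: Dict[int, str]) -> bool:
    """Visible if the creator's effects are visible and the deleter's (if any) are not."""
    if not effects_visible(t.xmin, snap, clog):
        return False
    return t.xmax is None or not effects_visible(t.xmax, snap, clog)


if __name__ == "__main__":
    # clog reflects the current state: XID 105 committed after `during` was taken.
    clog = {100: "committed", 105: "committed"}
    old = TupleVersion("v1", xmin=100, xmax=105)   # version replaced by XID 105's UPDATE
    new = TupleVersion("v2", xmin=105)
    during = Snapshot(xmax=106, xip=frozenset({105}))
    after = Snapshot(xmax=106, xip=frozenset())
    print([t.value for t in (old, new) if visible(t, during, clog)])  # ['v1']
    print([t.value for t in (old, new) if visible(t, after, clog)])   # ['v2']
```

Each snapshot sees exactly one version of the row, which is why readers never block writers under MVCC.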

## Replication & High Availability

- **Asynchronous Replication**: Primary doesn't wait for replica confirmation. Lower latency, potential data loss window.
- **Failover**: Promoting standby to primary when primary fails. Automatic or manual.
- **Hot Standby**: Standby accepting read-only queries while applying WAL.
- **Hybrid Database Strategy**: Using PostgreSQL alongside specialized databases for different workload patterns. Example: PG for read-heavy OLTP, sharded systems for write-heavy workloads.
- **Logical Decoding**: Extracting row-level changes from WAL in logical format. Requires wal_level=logical.
- **Logical Replication**: Row-level DML replay. Selective, cross-version. Uses publications/subscriptions.
- **Physical Replication**: Streaming raw WAL bytes. Exact byte-for-byte replica.
- **Primary**: Instance accepting writes. Source of truth in replication.
- **Promotion**: Converting standby to primary.
- **Publication**: Logical replication publisher specifying which tables to replicate.
- **Quorum Commit**: Synchronous replication waiting for a quorum, not all replicas.
- **Read Replica**: Instance replicating data for read-only queries.
- **Read Replica Scaling**: Adding read-only replicas to distribute query load. OpenAI scales to ~50 replicas per primary, millions of QPS. Connection routing directs reads to replicas.
- **Replica Identity**: Configuration for which columns appear in logical replication UPDATE/DELETE. Options: DEFAULT, FULL, INDEX, NOTHING.
- **Replication Lag**: Delay between primary commit and replica application. Measured in bytes or time.
- **Replication Slot**: Server-side object tracking consumer's WAL position. Prevents WAL discard.
- **Standby**: Replica continuously applying changes. Candidate for failover.
- **Streaming Replication**: Continuous WAL record transfer over TCP. The standard basis for PostgreSQL high-availability setups.
- **Subscription**: Logical replication subscriber connecting to a publication.
- **Synchronous Replication**: Primary waits for synchronous standby confirmation before reporting a commit. No committed data is lost on failover to a synchronous standby; commits have higher latency.
- **WAL Receiver**: Standby process receiving WAL from primary.
- **WAL Sender**: Primary process sending WAL to standbys.
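
Quorum Commit and Synchronous Replication boil down to one comparison. The sketch assumes an ANY-style setting such as synchronous_standby_names = 'ANY 2 (s1, s2, s3)' and treats LSNs as plain byte offsets; the function names are hypothetical. A waiting commit is released once enough standbys report flushing WAL at or beyond the commit's LSN, so the effective position is the num_sync-th largest flush LSN:

```python
from typing import Dict


def quorum_flush_lsn(flushed: Dict[str, int], num_sync: int) -> int:
    """Highest WAL position that at least `num_sync` standbys have flushed:
    the num_sync-th largest reported flush LSN."""
    if not 0 < num_sync <= len(flushed):
        raise ValueError("num_sync must be between 1 and the number of standbys")
    return sorted(flushed.values(), reverse=True)[num_sync - 1]


def commit_can_return(commit_lsn: int, flushed: Dict[str, int], num_sync: int) -> bool:
    """A waiting commit is released once the quorum position reaches its LSN."""
    return quorum_flush_lsn(flushed, num_sync) >= commit_lsn


if __name__ == "__main__":
    flushed = {"s1": 1_000, "s2": 1_500, "s3": 900}
    print(quorum_flush_lsn(flushed, 2))          # 1000
    print(commit_can_return(1_200, flushed, 2))  # False: only s2 has reached 1200
```

A slow or failed standby (s3 here) does not block commits as long as the other two keep up, which is the main advantage of quorum over waiting for every replica.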

## Write-Ahead Logging (WAL)

- **Archive Command**: Shell command copying completed WAL segments to archive storage.
- **Continuous Archiving**: Ongoing WAL segment archiving enabling point-in-time recovery.
- **Full-Page Write**: Writing entire 8KB page to WAL on first modification after checkpoint.
- **LSN (Log Sequence Number)**: 64-bit position in WAL stream. Format: 16/B374D848.
- **WAL**: Sequential log of all changes written before applying to data files. Ensures durability.
- **WAL Buffer**: In-memory buffer for WAL records before disk flush.
- **WAL Level**: How much WAL info is written. Levels: minimal, replica, logical.
- **WAL Segment**: Fixed-size file (default 16MB) in pg_wal containing WAL records.
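
The LSN format above packs a 64-bit WAL position into two hexadecimal halves. The sketch below parses it, computes byte distances (the quantity pg_wal_lsn_diff() returns, and how replication lag in bytes is measured), and derives the name of the pg_wal segment file containing a given LSN, assuming the default 16MB segment size and timeline 1. Helper names are hypothetical:

```python
def parse_lsn(text: str) -> int:
    """Convert the 'X/Y' textual LSN (two hex halves) into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def lsn_diff(a: str, b: str) -> int:
    """Bytes of WAL between two LSNs (a - b)."""
    return parse_lsn(a) - parse_lsn(b)


def wal_segment_name(lsn: str, timeline: int = 1,
                     seg_size: int = 16 * 1024 * 1024) -> str:
    """Name of the segment file holding the byte at `lsn`.

    The name is three 8-hex-digit fields: timeline, "log" number, and the
    segment number within that log (each log spans 2**32 bytes).
    """
    segno = parse_lsn(lsn) // seg_size
    segs_per_log = 0x100000000 // seg_size
    return f"{timeline:08X}{segno // segs_per_log:08X}{segno % segs_per_log:08X}"


if __name__ == "__main__":
    print(parse_lsn("16/B374D848"))
    print(lsn_diff("16/B374D848", "16/B3000000"))   # 7657544 bytes
    print(wal_segment_name("16/B374D848"))          # 0000000100000016000000B3
```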

## Change Data Capture (CDC)

- **CDC**: Capturing row-level changes and delivering them to downstream systems in near-real-time.
- **Checkpointing (CDC)**: Recording last processed position for resume-without-loss.
- **Connector**: Component bridging CDC to a target (Kafka, Elasticsearch, PostgreSQL).
- **Debezium**: Open-source CDC platform on Kafka Connect for PostgreSQL, MySQL, MongoDB.
- **Event Sourcing**: Storing state changes as immutable events rather than mutable records.
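- **Logical Message**: Application-defined message written to WAL with pg_logical_emit_message() and delivered through logical decoding alongside row changes. Used for markers or to carry non-data events such as DDL.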
- **Outbox Pattern**: Writing events to an outbox table in the same transaction. CDC publishes them.
- **Output Plugin**: Logical decoding plugin formatting WAL changes. Examples: pgoutput, wal2json.
- **Snapshot (CDC)**: Initial full data read before starting continuous capture.
- **wal2json**: Output plugin formatting WAL changes as JSON.
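
The Outbox Pattern entry hinges on one transaction covering both writes. A minimal sketch follows, using Python's built-in sqlite3 as a stand-in for PostgreSQL so it runs without a server; the orders/outbox tables and the `relay` function are hypothetical, and `relay` plays the part of a CDC consumer that resumes from a checkpoint (see **Checkpointing (CDC)**):

```python
import json
import sqlite3

# SQLite stands in for PostgreSQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        topic TEXT NOT NULL,
        payload TEXT NOT NULL
    );
""")


def place_order(order_id: int, total_cents: int) -> None:
    # `with conn` commits both inserts together, or rolls both back on error.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total_cents))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("order.created", json.dumps({"id": order_id, "total_cents": total_cents})),
        )


def relay(after_id: int):
    """Stand-in for the CDC consumer: return events past a checkpoint,
    plus the new checkpoint to persist."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox WHERE id > ? ORDER BY id", (after_id,)
    ).fetchall()
    events = [(topic, json.loads(payload)) for _, topic, payload in rows]
    new_checkpoint = rows[-1][0] if rows else after_id
    return events, new_checkpoint
```

With PostgreSQL, the relay would typically be a logical decoding consumer (e.g., Debezium or pgstream) reading outbox inserts from WAL rather than polling the table.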

## Database Branching & Copy-on-Write

- **Base Branch**: Source branch from which children are created. **[Xata]**
- **Branch**: Logically independent database copy via CoW snapshotting. Fully writable standard PostgreSQL. **[Xata]**
- **Child Branch**: Branch derived from a parent. Independent after creation. **[Xata]**
- **Clone**: Writable copy from a snapshot sharing data blocks until modified.
- **Copy-on-Write (CoW)**: Storage optimization sharing data blocks until modification. Enables instant branching.
- **Data Block**: Fundamental CoW storage unit. Copied only when modified.
- **Delta Storage**: Space consumed by branch modifications only. Idle branches = near-zero cost.
- **Ephemeral Environment**: Short-lived database instance for CI, dev, or agent work. Destroyed when done.
- **Scale to Zero**: Idle branches release compute automatically. Data preserved. Resume on connect. **[Xata]**
- **Snapshot**: Read-only point-in-time capture. Shares storage with live data.
- **Thin Provisioning**: Capacity assigned on demand, not pre-allocated.
- **Volume**: Logical storage unit. Can be snapshotted and cloned.
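
The Copy-on-Write, Delta Storage and Thin Provisioning entries can be modeled in a few lines. This toy is an assumption-laden sketch, not how ZFS or Xata store blocks: a branch starts from a frozen map of block references shared with its parent, writes go to a private map, and the private map's size is the branch's delta. (A real CoW filesystem avoids even copying the reference map.)

```python
from types import MappingProxyType
from typing import Dict, Mapping, Optional

BLOCK_SIZE = 8192
ZERO_BLOCK = bytes(BLOCK_SIZE)  # unwritten blocks read as zeros (thin provisioning)


class Volume:
    """Toy copy-on-write volume. A branch starts from a frozen block map shared
    with its parent and records only the blocks it rewrites (its delta)."""

    def __init__(self, base: Optional[Mapping[int, bytes]] = None):
        self._base: Mapping[int, bytes] = base if base is not None else MappingProxyType({})
        self._own: Dict[int, bytes] = {}

    def read(self, block_no: int) -> bytes:
        if block_no in self._own:
            return self._own[block_no]
        return self._base.get(block_no, ZERO_BLOCK)

    def write(self, block_no: int, data: bytes) -> None:
        # Shared blocks are never modified; the new block is private to this volume.
        self._own[block_no] = data

    def snapshot(self) -> Mapping[int, bytes]:
        """Freeze the current block map. Only references are copied, never block data."""
        return MappingProxyType({**self._base, **self._own})

    def branch(self) -> "Volume":
        return Volume(self.snapshot())

    def delta_blocks(self) -> int:
        return len(self._own)
```

A fresh branch has a delta of zero blocks, which is why idle branches cost almost nothing in storage.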

## Schema Management & Migrations

- **Backfill**: Populating new/transformed column with values from existing data. Runs in batches.
- **Baseline Migration**: Initial migration capturing existing schema state.
- **Breaking Change**: Schema change requiring application code modifications.
- **Column Duplication**: Creating new column alongside old, synchronizing via triggers during migration. **[OSS]**
- **Declarative Migration**: Migration specified as desired state, not raw SQL.
- **Down Migration**: Reverse transformation enabling rollback. **[OSS]**
- **Expand / Contract Pattern**: Two-phase migration: expand (add new), contract (remove old). Zero-downtime. **[OSS]**
- **Idempotent Migration**: Produces same result whether run once or multiple times.
- **Lock Timeout**: Max time DDL waits for lock before aborting. pgroll default: 500ms. **[OSS]**
- **Migration**: Versioned, ordered schema changes tracked in metadata.
- **Schema Diff**: Comparison between two schema versions. **[Xata]**
- **Schema Version**: Labeled point-in-time schema state.
- **Up Migration**: Forward transformation applying schema change. **[OSS]**
- **Version Schema**: PostgreSQL schema with views exposing table versions for multi-version coexistence. **[OSS]**
- **Zero-Downtime Migration**: Schema change without downtime. Uses expand/contract, backfilling, versioned views.
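
The Expand / Contract, Column Duplication and Backfill entries can be sketched end to end for one change: normalizing emails into a new column. SQLite (Python's built-in sqlite3) stands in for PostgreSQL so the example runs anywhere; table, column and trigger names are hypothetical, and this illustrates the general pattern rather than pgroll's implementation (which also uses versioned views). Only the expand phase and backfill are shown; the contract phase would later drop the old column and triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    INSERT INTO users (email)
        VALUES ('Ada@Example.com'), ('BOB@example.com'), ('carol@example.com');
""")

# Expand: add the new column next to the old one; old readers are unaffected.
conn.executescript("""
    ALTER TABLE users ADD COLUMN email_normalized TEXT;
    -- Keep the new column in sync with writes from the old application version.
    CREATE TRIGGER users_sync_insert AFTER INSERT ON users BEGIN
        UPDATE users SET email_normalized = lower(NEW.email) WHERE id = NEW.id;
    END;
    CREATE TRIGGER users_sync_update AFTER UPDATE OF email ON users BEGIN
        UPDATE users SET email_normalized = lower(NEW.email) WHERE id = NEW.id;
    END;
""")


def backfill(batch_size: int = 2) -> int:
    """Fill pre-existing rows in small transactions; safe to re-run (idempotent)."""
    batches = 0
    while True:
        with conn:   # one short transaction per batch keeps locks brief
            cur = conn.execute(
                """UPDATE users SET email_normalized = lower(email)
                   WHERE id IN (SELECT id FROM users
                                WHERE email_normalized IS NULL
                                ORDER BY id LIMIT ?)""",
                (batch_size,),
            )
        if cur.rowcount == 0:
            return batches
        batches += 1
```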

## Data Privacy & Anonymization

- **Anonymization**: Transforming data to prevent identification while preserving utility. Irreversible.
- **Association Distortion**: Metric measuring inter-column relationship preservation after anonymization using NPMI. **[Xata]**
- **Column Distortion**: Metric measuring per-column distribution change after anonymization (frequency or KDE comparison). **[Xata]**
- **Data Masking**: Replacing sensitive values with obfuscated versions preserving format.
- **De-identification**: Removing/transforming PII so individuals cannot be reasonably identified.
- **Differential Privacy**: Mathematical framework adding calibrated noise for provable privacy guarantees.
- **DID (Direct Identifier)**: Field that alone identifies a person: name, SSN, email, phone.
- **Distortion**: Measure of how anonymized data deviates from original. Lower = better utility.
- **Entity Recognition (PII Detection)**: Automated column classification as DID, QID, or Safe using NLP/pattern matching. Uses Presidio with custom recognizers covering all 18 HIPAA identifier families. **[Xata]**
- **Equivalence Class**: Records sharing identical quasi-identifier values after anonymization.
- **Expert Determination**: HIPAA method where a qualified expert certifies low re-identification risk.
- **Fair Aggregation**: Randomized median/mode selection during microaggregation preventing statistical bias and information leakage. **[Xata]**
- **Faking**: Replacing sensitive values with synthetic data preserving format but unrelated to original.
- **GDPR**: EU regulation governing personal data protection. Effective May 2018.
- **Generalization**: Reducing value precision to make it less identifying (exact age → age range).
- **Geographic-Aware Clustering**: Microaggregation using GeoNames lat/long for ZIP code proximity grouping. Linked column hierarchies keep city/state/ZIP consistent. **[Xata]**
- **HIPAA**: US law governing protected health information. 18 identifier types. Methods: Safe Harbor and Expert Determination.
- **k-Anonymity**: Every record indistinguishable from k-1 others on quasi-identifier values. Introduced by Sweeney (2002).
- **l-Diversity**: Each equivalence class has at least l distinct sensitive attribute values.
- **Linkage Attack**: Re-identifying individuals by joining anonymized data with external datasets on shared quasi-identifiers. Risk assessed via Monte Carlo simulation with synthetic adversary datasets.
- **Linked Column Hierarchy**: Configuration ensuring hierarchically related columns (city → state → ZIP) are treated together during anonymization. **[Xata]**
- **Microaggregation**: Grouping similar records into clusters of at least k using ANN search, then replacing QID values with fair aggregates. Preserves data types, minimizes distortion.
- **NPMI**: Normalized Pointwise Mutual Information. Measures inter-column relationship preservation [-1, 1].
- **PII**: Personally Identifiable Information. Direct (name, SSN) or indirect (age, ZIP, gender combined).
- **Presidio**: Open-source PII detection framework by Microsoft. NLP + pattern matching.
- **Pseudonymization**: Replacing identifiers with artificial ones. Reversible. GDPR treats as personal data.
- **QID (Quasi-Identifier)**: Field identifying individuals through combination. About 87% of the US population is likely uniquely identifiable by {5-digit ZIP, gender, date of birth} (Sweeney 2000).
- **Re-identification Risk**: Probability an individual can be identified in anonymized data. Assessed via simulated linkage attacks.
- **Redaction**: Complete removal, replaced with placeholder or NULL.
- **Risk Score (Phisher K-Threshold)**: Worst-case re-identification probability from Monte Carlo simulation of 50 synthetic adversary datasets. Computed before and after treatment. **[Xata]**
- **Safe Harbor**: HIPAA method requiring removal of all 18 identifier types.
- **Suppression**: Removing entire column or specific values.
- **Synthetic Data**: Artificially generated data mimicking statistical properties of real data. Used for DID faking (Mimesis) and risk scoring adversary simulation.
- **Treatment Pipeline**: 5-stage anonymization: entity recognition → risk assessment → DID suppression → k-member microaggregation → distortion analysis. Configurable per column. **[Xata]**
- **t-Closeness**: Sensitive attribute distribution in each equivalence class is close to overall distribution.
- **Transformer**: Configurable anonymization function for a column (email, phone, date, etc.). **[Xata]**
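
k-Anonymity, l-Diversity and Generalization are easiest to compare on a toy table. This sketch, with hypothetical data and helper names, measures k as the size of the smallest equivalence class and l as that class's smallest count of distinct sensitive values (the simplest, "distinct" variant of l-diversity). It is not Xata's treatment pipeline:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: replace an exact age with a range such as '30-39'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"


def equivalence_classes(rows: List[Dict], qids: Sequence[str]) -> Dict[Tuple, List[Dict]]:
    """Group records that share identical quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in qids)].append(row)
    return dict(classes)


def k_of(rows: List[Dict], qids: Sequence[str]) -> int:
    """Largest k for which the table is k-anonymous (size of the smallest class)."""
    return min(len(c) for c in equivalence_classes(rows, qids).values())


def l_of(rows: List[Dict], qids: Sequence[str], sensitive: str) -> int:
    """Smallest number of distinct sensitive values in any class."""
    return min(len({r[sensitive] for r in c})
               for c in equivalence_classes(rows, qids).values())


if __name__ == "__main__":
    raw = [
        {"age": 34, "zip": "94107", "diagnosis": "flu"},
        {"age": 36, "zip": "94107", "diagnosis": "asthma"},
        {"age": 52, "zip": "94110", "diagnosis": "flu"},
        {"age": 58, "zip": "94110", "diagnosis": "flu"},
    ]
    qids = ["age", "zip"]
    print(k_of(raw, qids))   # 1: every record is unique on (age, zip)
    gen = [{**r, "age": generalize_age(r["age"]), "zip": r["zip"][:3] + "**"} for r in raw]
    print(k_of(gen, qids), l_of(gen, qids, "diagnosis"))   # 2 1
```

The generalized table is 2-anonymous yet only 1-diverse: everyone aged 50-59 in ZIP 941** has the same diagnosis, which is exactly the homogeneity l-diversity is designed to catch.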

## Storage Engines & Architecture

- **Block Device**: Storage that reads/writes fixed-size blocks. Examples: NVMe SSDs, virtual volumes.
- **Columnar Storage**: Data organized by column. Efficient for analytical queries scanning few columns over many rows.
- **Compression**: Reducing data size. PostgreSQL: pglz, lz4. Storage layers: ZFS lz4, zstd.
- **Copy-on-Write Filesystem**: Modifications create new blocks. Enables instant snapshots/clones. Examples: ZFS, Btrfs.
- **Data Page**: 8KB I/O unit in PostgreSQL containing row data, free space info, item pointers.
- **Direct I/O**: Bypassing OS page cache. Avoids double-caching.
- **Heap Storage**: PostgreSQL default. Unordered rows in pages.
- **Write Amplification**: Ratio of physical writes to logical writes. Caused by WAL, MVCC, indexing, CoW.
- **ZFS**: Filesystem + volume manager with CoW, snapshots, clones, compression, checksumming.
- **zvol**: ZFS volume exposing block device interface. Supports thin provisioning, snapshots, clones.

## Backup, Recovery & Disaster Recovery

- **Base Backup**: Full physical copy of cluster data directory via pg_basebackup.
- **Continuous Backup**: Base backups + continuous WAL archiving for any-point-in-time recovery.
- **pg_basebackup**: PostgreSQL utility for physical base backups over replication connection.
- **pg_dump / pg_restore**: Logical backup/restore. Selective schema/table support. Parallel restore.
- **PITR (Point-in-Time Recovery)**: Restoring to exact state at a specific timestamp/LSN.
- **RPO (Recovery Point Objective)**: Maximum acceptable data loss in time.
- **RTO (Recovery Time Objective)**: Maximum acceptable downtime during recovery.
- **WAL Archiving**: Copying WAL segments to external storage for backup/recovery.

## Connection Management & Networking

- **Cache Locking**: Only one process recomputes an expired cache entry; others wait or serve stale data. Prevents thundering herd on cache expiry. Also called cache stampede prevention.
- **Connection Pooling**: Sharing database connections across processes. Tools: PgBouncer, pgpool-II.
- **Connection String**: URI specifying host, port, database, user, password, SSL mode. Format: postgresql://user:pass@host:port/db.
- **Connection Limit**: Maximum simultaneous connections. Controlled by max_connections.
- **Connection Warmup**: Pre-establishing database connections before needed. OpenAI reduced connection overhead from 50ms to 5ms through warmup optimization.
- **Idle Connection**: Open but not executing. Consumes process memory and file descriptors.
- **PgBouncer**: Lightweight connection pooler. Session, transaction, or statement pooling.
- **SSL/TLS**: Encryption for database connections. PostgreSQL's default minimum protocol version (ssl_min_protocol_version) is TLS 1.2.
- **Thundering Herd**: Many clients simultaneously retry after an outage, overwhelming the recovering system. Mitigated by exponential backoff with jitter, connection pooling, cache locking.
- **Wire Protocol**: PostgreSQL's TCP message protocol for client-server communication. Increasingly adopted as compatibility target by non-PostgreSQL databases.
- **Wire Protocol Compatibility**: Non-PostgreSQL databases accepting PostgreSQL client connections. Over 20 databases now support this. Enables reuse of existing drivers, ORMs, and tooling.
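
The Thundering Herd entry recommends exponential backoff with jitter. A minimal sketch of the "full jitter" variant follows; the function names, the defaults (0.1 s base, 10 s cap), and the choice to retry only ConnectionError are assumptions to adapt per driver:

```python
import random
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.1, cap: float = 10.0,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Exponential backoff with full jitter: the n-th delay is drawn uniformly
    from [0, min(cap, base * 2**n)] seconds, so clients that failed together
    do not all retry together."""
    rng = rng or random.Random()
    for n in range(retries):
        yield rng.uniform(0, min(cap, base * 2 ** n))


def retry(op: Callable[[], T], attempts: int = 5,
          sleep: Callable[[float], None] = time.sleep,
          rng: Optional[random.Random] = None) -> T:
    """Call `op`, retrying on ConnectionError with jittered exponential backoff."""
    delays = backoff_delays(attempts - 1, rng=rng)
    for attempt in range(attempts):
        try:
            return op()
        except ConnectionError:
            if attempt == attempts - 1:
                raise               # out of attempts: surface the last error
            sleep(next(delays))
    raise ValueError("attempts must be >= 1")
```

Passing a seeded random.Random and a fake sleep makes the behavior testable; in production each client draws fresh randomness, which is what spreads reconnections out.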

## Authentication, Authorization & Security

- **Authentication**: Verifying client identity. Methods: scram-sha-256, md5, certificate, LDAP, GSSAPI.
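- **pg_hba.conf**: Host-based authentication configuration. Records are evaluated top-to-bottom; the first matching record determines the authentication method.
- **RBAC (Role-Based Access Control)**: Permissions are granted to roles, and users are granted roles.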
- **Role**: PostgreSQL principal owning objects and receiving privileges. LOGIN roles can connect.
- **RLS (Row-Level Security)**: Per-row access control via policies on tables.
- **SCRAM-SHA-256**: Recommended password auth. Challenge-response with salted hashing.
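
The SCRAM-SHA-256 entry's "challenge-response with salted hashing" means the server stores derived keys, never the password. The sketch follows the RFC 5802/7677 key derivation and the SCRAM-SHA-256$iterations:salt$StoredKey:ServerKey verifier layout PostgreSQL keeps in pg_authid.rolpassword. Assumptions: function names are hypothetical, the AuthMessage is an opaque placeholder (in the real exchange it concatenates the client and server messages), nonces and channel binding are omitted, and SASLprep is skipped, which is fine for ASCII passwords. The 4096 default is PostgreSQL's default iteration count.

```python
import base64
import hashlib
import hmac


def _hmac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode()


def _keys(password: str, salt: bytes, iterations: int):
    # SaltedPassword = PBKDF2-HMAC-SHA-256 (the "Hi" function of RFC 5802).
    salted = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    client_key = _hmac(salted, b"Client Key")
    stored_key = hashlib.sha256(client_key).digest()
    server_key = _hmac(salted, b"Server Key")
    return client_key, stored_key, server_key


def make_verifier(password: str, salt: bytes, iterations: int = 4096) -> str:
    """What the server stores instead of the password."""
    _, stored_key, server_key = _keys(password, salt, iterations)
    return f"SCRAM-SHA-256${iterations}:{_b64(salt)}${_b64(stored_key)}:{_b64(server_key)}"


def client_proof(password: str, salt: bytes, iterations: int, auth_message: bytes) -> bytes:
    """Client side: ClientProof = ClientKey XOR HMAC(StoredKey, AuthMessage)."""
    client_key, stored_key, _ = _keys(password, salt, iterations)
    signature = _hmac(stored_key, auth_message)
    return bytes(a ^ b for a, b in zip(client_key, signature))


def server_check(verifier: str, auth_message: bytes, proof: bytes) -> bool:
    """Server side: recover ClientKey from the proof and compare H(ClientKey) to StoredKey."""
    stored_key = base64.b64decode(verifier.split("$")[2].split(":")[0])
    signature = _hmac(stored_key, auth_message)
    client_key = bytes(a ^ b for a, b in zip(proof, signature))
    return hmac.compare_digest(hashlib.sha256(client_key).digest(), stored_key)


if __name__ == "__main__":
    salt = b"0123456789abcdef"   # fixed for the demo; use os.urandom(16) in practice
    v = make_verifier("s3cret", salt)
    msg = b"placeholder-auth-message"
    print(server_check(v, msg, client_proof("s3cret", salt, 4096, msg)))   # True
    print(server_check(v, msg, client_proof("wrong", salt, 4096, msg)))    # False
```

Because the server only holds StoredKey and ServerKey, a stolen verifier does not directly reveal the password, and the proof itself is useless for a different AuthMessage.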

## Cloud Infrastructure & Deployment

- **Availability Zone (AZ)**: Physically distinct data center within a cloud region.
- **BYOC (Bring Your Own Cloud)**: Vendor software runs in customer's infrastructure. **[Xata]**
- **Cloud Region**: Geographic area with one or more data centers.
- **Horizontal Scaling**: Adding more database nodes. Methods: read replicas (reads), sharding (writes), distributed PostgreSQL (both). More complex, removes single-node limits.
- **IaC (Infrastructure as Code)**: Managing infrastructure via declarative config files. Terraform, Pulumi.
- **Managed Database**: Provider handles provisioning, patching, backups, failover, scaling.
- **Multi-Cloud**: Running across 2+ cloud providers. Easier with portable, standards-based components (e.g., vanilla PostgreSQL) than with provider-specific services.
- **Object Storage**: HTTP-accessible storage for unstructured data. S3, GCS, Azure Blob.
- **PostgreSQL DBaaS**: Database-as-a-Service for PostgreSQL. AWS RDS/Aurora, Cloud SQL/AlloyDB, Neon, Supabase, Xata, Crunchy Bridge, Tembo, Aiven. Market consolidation accelerated in 2025.
- **Serverless**: Resources allocated on demand; capacity scales automatically, typically down to zero when idle.
- **Vertical Scaling**: Increasing capacity via a larger instance (more CPU, RAM, faster storage). PostgreSQL scales vertically well: OpenAI serves millions of QPS with a single primary whose read traffic is spread across replicas.
- **VPC (Virtual Private Cloud)**: Isolated virtual network within a cloud provider.

## Kubernetes & Container Orchestration

- **CloudNativePG (CNPG)**: Kubernetes operator for PostgreSQL lifecycle management.
- **Container**: Lightweight isolated process from a container image.
- **CRD (Custom Resource Definition)**: Kubernetes extension defining new resource types.
- **Helm**: Package manager for Kubernetes. Charts define templated manifests.
- **HPA (Horizontal Pod Autoscaler)**: Auto-scales pod replicas based on metrics.
- **Operator**: Kubernetes pattern: custom controllers managing complex application lifecycle.
- **PV / PVC**: PersistentVolume / PersistentVolumeClaim. Kubernetes durable storage abstractions.
- **Pod**: Smallest Kubernetes unit. One or more containers sharing networking/storage.
- **StatefulSet**: Workload for apps needing stable identities and persistent storage.

## Monitoring, Observability & Performance

- **Bloat**: Wasted space from dead tuples and fragmentation. Resolved by VACUUM, pg_repack.
- **Cache Hit Ratio**: Fraction of reads served from shared buffers. Healthy: >99% for OLTP.
- **IOPS**: Input/Output Operations Per Second. Typically reported separately for reads and writes.
- **Latency**: Time from request to response. Measured as P50, P95, P99 percentiles.
- **pg_stat_activity**: System view showing per-process activity, query text, state, wait events.
- **pg_stat_statements**: Extension tracking execution statistics for all SQL statements.
- **Prometheus**: Open-source time-series monitoring. Pull model. Paired with Grafana.
- **Slow Query Log**: Queries exceeding time threshold. Configured via log_min_duration_statement.
- **TPS (Transactions Per Second)**: Throughput measured as committed transactions per second, read-only or read-write. Commonly benchmarked with pgbench.
- **Wait Event**: What a backend is waiting for: lock, I/O, buffer, WAL, network, client.
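
Latency percentiles and Cache Hit Ratio are simple to compute from raw numbers. The sketch assumes nearest-rank percentiles (monitoring tools differ; many interpolate or use histograms) and takes blks_hit and blks_read as exposed by pg_stat_database; function names are hypothetical:

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[rank - 1]


def cache_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Share of block requests served from shared buffers.

    blks_read counts blocks requested from the OS, which may still serve
    them from its own page cache.
    """
    total = blks_hit + blks_read
    return blks_hit / total if total else 1.0


if __name__ == "__main__":
    latencies_ms = list(range(1, 101))
    print([percentile(latencies_ms, p) for p in (50, 95, 99)])   # [50, 95, 99]
    print(cache_hit_ratio(990, 10))                               # 0.99
```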

## Postgres Extensions & Ecosystem

- **Citus**: Horizontal scaling via distributed tables. Multi-tenant and real-time analytics.
- **pg_cron**: Cron-based job scheduling within PostgreSQL.
- **pg_duckdb**: Embeds DuckDB analytics engine for fast OLAP queries.
- **pg_partman**: Automated partition management.
- **pg_repack**: Online table/index reorganization without exclusive locks.
- **pg_trgm**: Trigram-based fuzzy text search.
- **pgAudit**: Detailed session/object audit logging.
- **pgroll**: Zero-downtime schema migrations via expand/contract. **[OSS]**
- **pgstream**: CDC streaming: PostgreSQL to Kafka, Elasticsearch, webhooks, other PostgreSQL. **[OSS]**
- **pgvector**: Vector similarity search. L2, inner product, cosine distance. IVFFlat/HNSW indexes.
- **pgzx**: PostgreSQL extensions in Zig. Safe wrappers for PG internals. **[OSS]**
- **PostGIS**: Geographic objects, spatial indexes, spatial functions.
- **TimescaleDB**: Time-series: hypertables, continuous aggregates, compression.
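
The three pgvector distances differ in ways that matter when choosing an operator. This sketch computes them in plain Python; comments name the corresponding pgvector operators, and <#> returns the negative inner product so that ascending ORDER BY ranks the most similar rows first. It also includes an exact brute-force nearest-neighbor search, which IVFFlat and HNSW approximate. Function names are hypothetical:

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def l2_distance(a: Vector, b: Vector) -> float:             # pgvector <->
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: Vector, b: Vector) -> float:  # pgvector <#>
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:         # pgvector <=>
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (math.hypot(*a) * math.hypot(*b))


def nearest(query: Vector, rows: Dict[str, Vector],
            dist: Callable[[Vector, Vector], float] = l2_distance,
            k: int = 1) -> List[str]:
    """Exact k-NN: what ORDER BY embedding <-> query LIMIT k returns
    when no approximate index is used."""
    return sorted(rows, key=lambda key: dist(query, rows[key]))[:k]


if __name__ == "__main__":
    docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
    print(nearest([1.0, 0.05], docs, k=2))   # ['a', 'c']
```

For unit-length embeddings the three orderings agree, so negative inner product is often chosen as the cheapest to compute.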

## NVMe & Block Storage

- **Block Size**: Smallest I/O unit. 512B (legacy) or 4KB (modern). PG uses 8KB pages.
- **Controller**: NVMe entity processing host commands. Manages queues and I/O.
- **io_uring**: Linux async I/O via shared ring buffers. Lower overhead than read/write syscalls.
- **LBA (Logical Block Address)**: Address of a data block on storage device.
- **Namespace**: NVMe logical block collection. Analogous to a partition.
- **NVMe**: Host controller interface for flash storage over PCIe. Multi-queue, low-latency.
- **NVMe-oF**: NVMe over Fabrics (TCP, RDMA). Remote block storage with near-local performance.
- **PDU (Protocol Data Unit)**: NVMe-oF/TCP message container. Header + payload.
- **Queue Depth**: Number of outstanding I/O commands. NVMe allows up to 65,535 I/O queues, each holding up to 65,536 commands.
- **SPDK**: User-space libraries bypassing kernel for NVMe access. Minimal latency.
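
Queue Depth, io_uring and Async I/O are related through Little's law: sustained IOPS ≈ outstanding requests ÷ per-request latency. The sketch below is an idealized calculation, assuming latency stays flat as depth rises (real devices slow down near saturation); it shows why keeping many reads in flight matters on fast NVMe. Function names are hypothetical:

```python
def max_iops(queue_depth: int, latency_s: float) -> float:
    """Little's law: completions per second = requests in flight / time per request."""
    return queue_depth / latency_s


def depth_needed(target_iops: float, latency_s: float) -> float:
    """In-flight requests needed to sustain `target_iops` at a given latency."""
    return target_iops * latency_s


if __name__ == "__main__":
    # Hypothetical device with a flat 100 microsecond read latency.
    for qd in (1, 8, 32):
        print(qd, round(max_iops(qd, 100e-6)))   # 1 10000, 8 80000, 32 320000
    print(round(depth_needed(500_000, 80e-6)))   # 40
```

A synchronous one-read-at-a-time backend is stuck at queue depth 1, which is the gap that PostgreSQL 18's asynchronous I/O aims to close.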

## Distributed Systems

- **Consensus**: Protocol for distributed agreement. Algorithms: Raft, Paxos, Zab.
- **Disaggregated Storage-Compute**: Architecture separating compute (query processing) from storage (data persistence) into independently scalable layers. Used by Neon, Aurora, AlloyDB, Xata.
- **Distributed PostgreSQL**: Extending PostgreSQL across multiple nodes. Approaches: sharding middleware (Citus, PgDog, Multigres), consensus-based NewSQL (CockroachDB, YugabyteDB), cloud-native (Aurora Limitless). Active area of development 2025-2026.
- **Eventual Consistency**: All replicas converge given time. Reads may return stale data.
- **Leader Election**: Designating one node as coordinator. Raft, Paxos, or external (etcd).
- **Partition Tolerance**: Operating despite network partitions. One of three CAP theorem properties.
- **Raft**: Understandable consensus algorithm. Used in etcd, CockroachDB, TiKV.
- **Sharding**: Distributing data across instances by partition key. Horizontal scaling.
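
The Sharding entry's routing step can be sketched as hash-mod placement. The helper names are hypothetical and SHA-256 is an arbitrary choice of stable hash. The second function shows the drawback of plain modulo: changing the shard count relocates most keys, which is why production systems typically assign hash ranges or use consistent hashing instead:

```python
import hashlib
from typing import Sequence


def shard_for(key: str, num_shards: int) -> int:
    """Route a sharding key to a shard by hashing it.

    A stable hash (not Python's per-process salted hash()) keeps routing
    identical across processes and restarts.
    """
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(keys: Sequence[str], old_n: int, new_n: int) -> float:
    """Share of keys whose shard changes when the shard count changes."""
    moved = sum(shard_for(k, old_n) != shard_for(k, new_n) for k in keys)
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"tenant-{i}" for i in range(10_000)]
    print(shard_for("tenant-42", 8))
    print(round(moved_fraction(keys, 4, 5), 2))   # roughly 0.8
```

Going from 4 to 5 shards keeps a key in place only when its hash gives the same remainder mod 4 and mod 5, which happens for about one key in five.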

## Data Warehousing & Analytics

- **Data Lake**: Centralized raw data repository accepting unstructured/semi-structured data.
- **Data Warehouse**: Database optimized for analytical queries. Columnar storage, compression, parallelism.
- **dbt**: SQL-based data transformation tool managing model DAGs, tests, documentation.
- **ELT**: Extract, Load, Transform. Transform using warehouse compute.
- **ETL**: Extract, Transform, Load. Transform before loading.
- **HTAP (Hybrid Transactional/Analytical Processing)**: Combining OLTP and OLAP in a single system. Approaches: real-time materialized views, columnar extensions (pg_duckdb), dual-format storage.
- **OLAP**: Online Analytical Processing. Complex aggregations, read-heavy, few writes.
- **OLTP**: Online Transaction Processing. Many short transactions, reads and writes.
- **Parquet**: Open-source columnar storage format. Predicate pushdown, column pruning, compression. Interchange format between databases, data lakes, ML pipelines.
- **Star Schema**: Fact table + dimension tables for analytical queries.
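A star schema and the kind of OLAP query it is built for can be sketched with Python's built-in `sqlite3` module standing in for a warehouse. The table names, columns, and rows are made up for illustration; a columnar warehouse would run the same join-then-aggregate shape over far more rows.

```python
import sqlite3

# Hypothetical star schema: a central fact table referencing two dimension tables.
SCHEMA = """
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    date_id    INTEGER REFERENCES dim_date (date_id),
    product_id INTEGER REFERENCES dim_product (product_id),
    quantity   INTEGER,
    amount     INTEGER
);
"""

# Typical analytical query: join facts to dimensions, aggregate by dimension attributes.
REVENUE_BY_MONTH_AND_CATEGORY = """
SELECT d.year, d.month, p.category, SUM(f.amount) AS revenue
FROM fact_sales AS f
JOIN dim_date    AS d ON d.date_id = f.date_id
JOIN dim_product AS p ON p.product_id = f.product_id
GROUP BY d.year, d.month, p.category
ORDER BY d.year, d.month, p.category
"""


def build_demo(conn: sqlite3.Connection) -> None:
    """Create the schema and load a few made-up rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2025, 1), (2, 2025, 2)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(10, "books"), (20, "games")])
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 10, 2, 30), (1, 10, 1, 15), (1, 20, 1, 60), (2, 10, 5, 75), (2, 20, 2, 120)],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_demo(conn)
    for row in conn.execute(REVENUE_BY_MONTH_AND_CATEGORY):
        print(row)
```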

## Developer Tooling & Workflows

- **CI/CD**: Automated build, test, deploy. Database branches give each pipeline run an isolated test database.
- **Database Migration Tool**: Manages versioned schema changes. pgroll, Alembic, Flyway, Liquibase.
- **Feature Branch**: Short-lived code branch. Can have its own database branch.
- **Fixture Data**: Predefined test data for consistent testing.
- **ORM**: Maps database tables to language objects. SQLAlchemy, Prisma, TypeORM, Sequelize.
- **psql**: PostgreSQL interactive terminal.
- **Seed Data**: Initial data for development/testing. Branching can replace it with anonymized production data.
- **Staging Environment**: Pre-production testing environment. Branching provides lightweight staging.
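The core bookkeeping of a database migration tool (apply numbered schema changes once, in order, and record what ran) can be sketched with `sqlite3`. This is a hypothetical minimal runner, not the behavior of pgroll (which uses expand/contract), Flyway, Alembic, or Liquibase, and the `schema_migrations` table name is an assumption. It relies on SQLite supporting transactional DDL, as PostgreSQL also does for most DDL.

```python
import sqlite3

# Hypothetical migrations as (version, SQL). Real tools keep these in versioned files.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN created_at TEXT"),
    (3, "CREATE UNIQUE INDEX users_email_idx ON users (email)"),
]


def migrate(conn: sqlite3.Connection, migrations=MIGRATIONS) -> list:
    """Apply pending migrations in version order and return the versions applied.

    Each migration and its history row commit together, so a failing step
    leaves both the schema and the history table unchanged.
    """
    conn.isolation_level = None  # autocommit mode; transactions are opened explicitly below
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    newly_applied = []
    for version, sql in sorted(migrations):
        if version in applied:
            continue  # already recorded, so re-running is a no-op
        conn.execute("BEGIN")
        try:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
            conn.execute("COMMIT")
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            raise
        newly_applied.append(version)
    return newly_applied
```

Recording applied versions in the database itself is what lets every environment, including a fresh database branch, converge to the same schema by running the same command.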

## Scaling & Performance

- **Connection Saturation**: All available connections in use; new requests queue or fail. Common with high-concurrency agent workloads. Address with connection pooling and per-role limits.
- **Database Consolidation**: Industry trend of fewer, larger providers acquiring smaller ones. In 2025: Databricks acquired Neon. Open-source and standard PostgreSQL reduce consolidation risk.
- **Latency Budget**: Maximum acceptable query response time, defined as a percentile target (e.g., p99). OpenAI targets low double-digit ms p99 for PostgreSQL reads.
- **QPS (Queries Per Second)**: Throughput metric. Simple lookups: millions of QPS possible (OpenAI). Complex analytics: tens per second. Agent workloads tend toward high QPS of simple queries.
- **Rate Limiting**: Restricting queries/connections per unit time. Applied at application, proxy, and database levels. Important when agents generate queries without built-in backpressure.
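One common way to implement the rate limiting described above is a token bucket: tokens refill at a fixed rate up to a capacity, and each query spends one, so short bursts are allowed while the sustained rate stays bounded. The class below is a hypothetical, single-threaded sketch; the clock is injectable so its behavior can be checked without sleeping.

```python
import time


class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return False when the caller must wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=50.0, capacity=10.0)  # about 50 queries/s, bursts of 10
    print([bucket.try_acquire() for _ in range(12)])
```

A caller that gets `False` can queue, back off, or return an error, which supplies the backpressure the entry says agents often lack. Under concurrent use the state update would need a lock.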

## AI & Agent Infrastructure

- **Agent Database Provisioning**: Automatically creating isolated database environments when agents start, cleaning up when done. Requires fast provisioning (seconds), real data, low per-environment cost. CoW branching makes this economically viable. **[Xata]**
- **Agent Isolation**: Each AI agent gets its own database branch. Prevents cross-contamination. Governance by architecture, not policy. **[Xata]**
- **AGENTS.md**: Convention file instructing AI agents on database setup, connection, migration, and cleanup steps. Part of emerging agent-aware infrastructure standards. **[Xata]**
- **Agentic Development**: Workflow where AI agents perform substantial coding/testing. Database demand scales with machine activity, not headcount. Requires isolated environments, fast provisioning, real data, automatic cleanup.
- **AI Coding Agent**: Software autonomously generating/testing/deploying code. Cursor, Copilot, Devin, Claude Code, Windsurf. May operate multiple concurrent database sessions.
- **Context Window**: Maximum tokens an LLM processes per request. Determines how much schema, data, and instructions an agent considers simultaneously.
- **Database Agent Loop**: Iterative cycle: read schema → generate SQL/migration → execute → observe results → adjust. Each iteration requires writable, isolated database with production-representative data.
- **llms.txt**: Convention file at website root providing LLMs with site context and resource links. Like robots.txt for AI agents. Proposed by Jeremy Howard.
- **MCP (Model Context Protocol)**: JSON-RPC 2.0 protocol for AI agent interaction with external systems. Defines tools (operations), resources (data), prompts (templates). Enables programmatic database operation discovery and execution. Originated at Anthropic, adopted across industry in 2025.
- **Query Guardrails**: Constraints on agent-generated SQL: read-only mode, row limits, timeouts, DDL restrictions, cost thresholds. Critical for safe agentic database access.
- **Text-to-SQL**: Translating natural language to valid SQL. Accuracy depends on schema context, naming conventions, and dialect awareness. Core capability for database-connected agents.
- **Tool Use (Function Calling)**: Mechanism for LLMs to invoke external functions. In database contexts: SQL execution, schema inspection, migration commands, branch management.
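Several of the query guardrails listed above (read-only access, row limits, timeouts) can be sketched with Python's `sqlite3` module, which exposes an authorizer hook and a progress handler. SQLite stands in for PostgreSQL here; on PostgreSQL the comparable controls would be a read-only role or `default_transaction_read_only`, plus `statement_timeout`. The limits and function names are illustrative assumptions, and the denylist is deliberately small: a production guardrail would allowlist permitted actions instead.

```python
import sqlite3
import time

# Illustrative limits; real values depend on the workload.
MAX_ROWS = 100
TIMEOUT_S = 2.0

# SQLite authorizer action codes that write data or change schema.
WRITE_ACTIONS = {
    sqlite3.SQLITE_INSERT, sqlite3.SQLITE_UPDATE, sqlite3.SQLITE_DELETE,
    sqlite3.SQLITE_CREATE_TABLE, sqlite3.SQLITE_DROP_TABLE,
    sqlite3.SQLITE_CREATE_INDEX, sqlite3.SQLITE_DROP_INDEX,
    sqlite3.SQLITE_ALTER_TABLE,
}


def _deny_writes(action, arg1, arg2, db_name, source):
    """Authorizer callback: reject any statement that compiles to a write action."""
    return sqlite3.SQLITE_DENY if action in WRITE_ACTIONS else sqlite3.SQLITE_OK


def make_agent_connection(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Install the read-only guardrail on a connection reserved for agent queries."""
    conn.set_authorizer(_deny_writes)
    return conn


def run_guarded(conn, sql, max_rows=MAX_ROWS, timeout_s=TIMEOUT_S):
    """Execute agent SQL with a row cap and a wall-clock timeout.

    Denied writes raise sqlite3.DatabaseError; a timeout raises
    sqlite3.OperationalError ("interrupted").
    """
    deadline = time.monotonic() + timeout_s
    # SQLite calls this every 1,000 VM instructions; a true result aborts the statement.
    conn.set_progress_handler(lambda: time.monotonic() > deadline, 1000)
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchmany(max_rows)  # never hand back more than max_rows rows
    finally:
        cur.close()
        conn.set_progress_handler(None, 0)
```

Enforcing the rules inside the database engine, rather than by pattern-matching the SQL text, means a cleverly phrased write is still rejected at statement compile time.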

## Xata Platform (Product-Specific)

- **Branch (Xata)**: Writable PostgreSQL instance via CoW. Seconds to create, delta-only storage, scale-to-zero. **[Xata]**
- **Cell**: Regional deployment unit running the CloudNativePG (CNPG) operator, which manages PostgreSQL cluster lifecycle. **[Xata]**
- **Control Plane**: Centralized service for API, business logic, user/org management. **[Xata]**
- **Data Plane**: Component with PostgreSQL clusters and SQL gateway. Per-region. In BYOC, runs in customer infrastructure. **[Xata]**
- **Hibernation**: Paused branch state. Compute released, data preserved. Resume on connect. Default: 30 min inactivity. **[Xata]**
- **pgroll**: Zero-downtime schema migrations via expand/contract pattern. **[OSS]**
- **pgstream**: PostgreSQL CDC and streaming replication with transformation/anonymization. **[OSS]**
- **pgzx**: Build PostgreSQL extensions in Zig. **[OSS]**
- **SQL Gateway**: PostgreSQL wire-protocol proxy handling SSL termination and routing connections to branches. **[Xata]**
- **Serverless Proxy**: HTTP/WebSocket database access for edge/serverless functions. Compatible with Neon serverless driver. **[Xata]**
- **Xata Agent**: AI-powered PostgreSQL monitoring and optimization. **[OSS]**
- **Xata CLI**: CLI for projects, branches, migrations, replication. **[Xata]**
- **Xata Core**: Open-source CoW branching engine for standard PostgreSQL. **[OSS]**
- **xatastor**: NVMe-oF storage target with ZFS-based CoW. Storage backbone for branching. **[Xata]**
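The "delta-only storage" property of copy-on-write branches can be illustrated with a toy page store in Python. This is a hypothetical model, not xatastor's or ZFS's implementation, which work at the block level: pages written before a branch point are frozen into shared read-only layers, and each branch records only the pages it writes afterwards, so creating a branch copies no page data.

```python
from types import MappingProxyType


class Branch:
    """Toy copy-on-write branch over numbered pages (not Xata's implementation)."""

    def __init__(self, layers: tuple = ()):
        self._layers = layers  # frozen page maps shared with other branches, oldest first
        self._delta: dict = {}  # pages written on this branch since its last branch point

    def branch(self) -> "Branch":
        # Freeze pending writes into a shared read-only layer. The dict is moved,
        # not copied, so the cost does not depend on how many pages exist.
        if self._delta:
            self._layers = self._layers + (MappingProxyType(self._delta),)
            self._delta = {}
        return Branch(self._layers)

    def write(self, page_no: int, data: bytes) -> None:
        # Writes never touch shared layers, so other branches are unaffected.
        self._delta[page_no] = data

    def read(self, page_no: int) -> bytes:
        if page_no in self._delta:
            return self._delta[page_no]
        for layer in reversed(self._layers):  # newest layer wins
            if page_no in layer:
                return layer[page_no]
        raise KeyError(page_no)

    def delta_pages(self) -> int:
        """Pages this branch owns exclusively: its incremental storage cost."""
        return len(self._delta)


if __name__ == "__main__":
    main = Branch()
    for i in range(1_000):
        main.write(i, b"v1")
    dev = main.branch()  # no pages copied
    dev.write(7, b"dev")
    print(dev.read(7), main.read(7), dev.delta_pages())
```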

---

Sources: PostgreSQL documentation (postgresql.org/docs), Pavlo 2025 Databases Retrospective (CMU), NVMe specification (nvmexpress.org), HIPAA de-identification guidance (hhs.gov), Model Context Protocol specification (modelcontextprotocol.io), Xata documentation, pgroll/pgstream/pgzx repositories (github.com/xataio).

Maintained by Xata — https://xata.io/glossary
Last updated: February 2026.
