Database Infrastructure Glossary
A structured reference for every term in modern database infrastructure — built for AI agents, coding assistants, and the humans who work with them. Vendor-neutral definitions. Xata-specific terms clearly labeled.
Maintained by
Xata
258 terms across 26 categories
Last updated February 2026
Agent context files
Download these files as context for AI coding agents, LLM tool definitions, CLAUDE.md skills, or offline reference. Same content as this page in a flat, parseable format optimized for token efficiency.
Also referenced in /llms.txt
258 terms
A
ACID
Core Database · The four properties guaranteeing reliable transaction processing: Atomicity (all or nothing), Consistency (valid state transitions), Isolation (concurrent transactions don't interfere), Durability (committed data survives failures). Coined by Theo Härder and Andreas Reuter in 1983.
See also: Transaction, BASE
Async I/O
PostgreSQL · Asynchronous input/output allowing PostgreSQL to issue multiple storage requests concurrently rather than waiting for each to complete sequentially. Implemented in PostgreSQL 18 (2025) via io_method=io_uring on Linux and io_method=worker on other platforms. Significantly improves throughput on NVMe and SSD storage.
See also: NVMe-oF
Aggregate Function
SQL · A function that computes a single result from a set of rows: COUNT, SUM, AVG, MIN, MAX, ARRAY_AGG, STRING_AGG. Used with GROUP BY to compute per-group results.
See also: Window Function
Advisory Lock
Transactions · An application-controlled lock managed by PostgreSQL but not tied to any table or row. Acquired and released explicitly by application code. Used for coordinating distributed processes or preventing duplicate work.
Asynchronous Replication
Replication & HA · Replication where the primary does not wait for replicas to confirm receipt of changes before committing. Offers lower latency but allows a small window of potential data loss on primary failure.
See also: Synchronous Replication
Anonymization
Privacy & Anonymization · The process of transforming data to prevent identification of individuals while preserving the data's utility for analysis, testing, or development. Distinct from pseudonymization, which is reversible.
See also: Pseudonymization, De-identification
A metric measuring how well inter-column relationships are preserved after anonymization. Uses NPMI (Normalized Pointwise Mutual Information) to compare column pair correlations before and after treatment. Critical for maintaining the analytical value of anonymized datasets.
See also: Column Distortion, NPMI
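NPMI itself is simple to compute. The Python sketch below scores the association of one column pair; the frequency-weighted mean used as the pair score is an assumption for illustration — the exact weighting and before/after comparison in anonymization tools may differ.

```python
import math
from collections import Counter

def mean_npmi(pairs):
    """Frequency-weighted mean NPMI over one column pair.
    1.0 = values always co-occur, ~0.0 = columns are independent."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    score = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        if p_ab == 1.0:
            return 1.0  # degenerate case: a single repeated pair
        pmi = math.log(p_ab / ((left[a] / n) * (right[b] / n)))
        score += p_ab * (pmi / -math.log(p_ab))  # normalize PMI into [-1, 1]
    return score

# City and ZIP move together, so their association is fully preserved:
correlated = [("NY", "10001"), ("NY", "10001"), ("SF", "94110"), ("SF", "94110")]
independent = [("NY", "A"), ("NY", "B"), ("SF", "A"), ("SF", "B")]
```

Comparing this score before and after anonymization shows how much of the inter-column structure survived treatment.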
Agent Isolation
AI & Agents · Providing each AI coding agent with its own isolated database environment (branch) to prevent agent-generated mutations from affecting production or other agents' work. Branching provides governance by architecture rather than by policy.
See also: Branch, Ephemeral Environment, Agent Database Provisioning
Agent Database Provisioning
AI & Agents · Automatically creating isolated database environments when AI agents start work and cleaning them up when done. Requires fast provisioning (seconds, not minutes), production-representative data, and low per-environment cost. Copy-on-write branching makes this economically viable — agents now create the majority of new database instances at scale.
See also: Agent Isolation, Branch, Scale to Zero
AGENTS.md
AI & Agents · A convention file placed in a repository root that instructs AI coding agents on project setup, database configuration, and safe development practices. Specifies branch creation, connection strings, migration commands, and cleanup steps. Part of the emerging standard for agent-aware infrastructure.
See also: llms.txt
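There is no fixed schema for this file; a hypothetical minimal example (all commands, names, and variables below are illustrative, not a standard):

```markdown
# AGENTS.md

## Database setup
- Create an isolated branch before any schema change
  (e.g. `yourcli branch create agent-$TASK_ID` — substitute your tooling).
- Read the connection string from the `DATABASE_URL` environment variable;
  never connect to the `main` branch directly.

## Migrations
- Apply schema changes with the project's migration tool (`make db-migrate`);
  do not run raw `ALTER TABLE` against shared environments.

## Cleanup
- Delete the branch when the task is finished.
```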
AI Coding Agent
AI & Agents · Software that autonomously generates, tests, and deploys code with minimal human intervention. Examples: Cursor, GitHub Copilot, Devin, Claude Code, Windsurf. Agents create database schema changes, run migrations, execute tests, and may operate multiple concurrent database sessions — all requiring isolated, writable database access with real data.
Agentic Development
AI & Agents · A software development workflow where AI agents perform substantial portions of coding, testing, and debugging tasks. Database demand scales with machine activity, not headcount. Requires infrastructure providing: isolated environments per agent, fast provisioning, production-representative data, and automatic cleanup.
B
BASE
Core Database · An alternative consistency model to ACID used in distributed systems: Basically Available, Soft state, Eventually consistent. Common in NoSQL and distributed databases where availability is prioritized over immediate consistency.
See also: ACID, Eventual Consistency
Backend Process
PostgreSQL · A server process spawned by the postmaster to handle a single client connection. Each backend has its own memory space (work_mem, temp_buffers) and communicates with other backends through shared memory.
Background Worker
PostgreSQL · A process registered by an extension to perform work independently of client connections. Used for periodic maintenance, monitoring, custom replication, or application logic.
B-tree
Indexing · The default index type in PostgreSQL. A balanced tree structure supporting equality and range queries. Maintains sorted order. Optimal for high-cardinality columns.
See also: Index, GIN, GiST
BRIN (Block Range Index)
Indexing · An index storing summary information (min/max) for ranges of physical table blocks. Extremely compact for large, naturally ordered tables (e.g., time-series data). Not useful for randomly ordered data.
Branch
Branching & CoW · A logically independent copy of a database created through copy-on-write snapshotting. Branches share unchanged data blocks with their parent, making creation near-instant regardless of database size. Each branch is a fully writable, standard PostgreSQL instance.
See also: Copy-on-Write, Base Branch, Child Branch
Base Branch
Branching & CoW · The source branch from which child branches are created. Typically the "main" or "production" branch. The base branch's data serves as the starting point for all derived branches.
See also: Branch, Child Branch
Backfill
Schema Migrations · The process of populating a new or transformed column with values derived from existing data. During zero-downtime migrations, backfilling runs in batches to avoid long-running locks.
Baseline Migration
Schema Migrations · An initial migration that captures the current state of an existing database schema without applying changes. Establishes the starting point for version-controlled schema management.
Base Backup
Backup & Recovery · A full physical copy of a PostgreSQL cluster's data directory, taken while the server is running. Combined with WAL archives, enables point-in-time recovery. Created with pg_basebackup.
See also: PITR, Continuous Archiving
BYOC (Bring Your Own Cloud)
Cloud & Deployment · A deployment model where the vendor's software runs in the customer's own cloud account or infrastructure. The customer retains control of their data plane while the vendor manages the control plane.
See also: Control Plane, Data Plane
Bloat
Monitoring · Wasted space in tables and indexes caused by dead tuples, unused pages, and fragmentation. Measured as the ratio of actual size to minimum possible size. Resolved by VACUUM, REINDEX, or pg_repack.
See also: Vacuum, Dead Tuple
C
CAP Theorem
Core Database · States that a distributed data store can guarantee at most two of three properties simultaneously: Consistency (every read returns the most recent write), Availability (every request receives a response), Partition tolerance (the system continues operating despite network partitions). Formulated by Eric Brewer in 2000.
See also: Partition Tolerance, Eventual Consistency
Cardinality
Core Database · The number of distinct values in a column relative to total rows. High cardinality (many unique values, e.g., email addresses) vs. low cardinality (few unique values, e.g., boolean flags). Affects index selection and query optimization.
Catalog
Core Database · System tables maintained by the database engine containing metadata about all database objects: tables, columns, data types, functions, indexes, constraints, and permissions. In PostgreSQL, accessible via the pg_catalog schema.
Column
Core Database · A named attribute of a table with a defined data type. Each row in the table contains one value for each column. Also called a "field" or "attribute" in relational theory.
Composite Type
Core Database · A data type composed of multiple named fields, each with its own type. In PostgreSQL, every table automatically defines a composite type with the same name and structure.
Constraint
Core Database · A rule enforced by the database to maintain data integrity. Types include: PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, NOT NULL, and EXCLUDE. Constraints are evaluated on every insert, update, or delete that affects the constrained columns.
See also: Primary Key, Foreign Key, Unique Constraint
Checkpoint
PostgreSQL · A periodic operation where PostgreSQL writes all dirty (modified) pages from shared buffers to disk and creates a WAL checkpoint record. Ensures that recovery only needs to replay WAL from the last checkpoint. Controlled by checkpoint_timeout and max_wal_size.
See also: WAL, Shared Buffers
CTID
PostgreSQL · Tuple identifier — a pair (block number, offset) pointing to the physical location of a row on disk. Changes when a row is updated (due to MVCC creating a new tuple version). Not stable across vacuums.
See also: Tuple, MVCC
CTE (Common Table Expression)
SQL · A temporary named result set defined with the WITH clause. CTEs improve readability and can be recursive. In PostgreSQL 12+, non-recursive CTEs can be inlined by the optimizer.
Covering Index
Indexing · An index containing all columns needed to satisfy a query, allowing the executor to return results directly from the index without accessing the heap table. Created with the INCLUDE clause in PostgreSQL 11+.
See also: Index-Only Scan
Continuous Archiving
Backup & Recovery · The practice of continuously archiving WAL segments as they are completed. Combined with a base backup, enables point-in-time recovery to any moment within the archive retention period.
See also: WAL Archiving, PITR
CDC (Change Data Capture)
CDC · The practice of capturing row-level changes (inserts, updates, deletes) from a database and delivering them to downstream systems in near-real-time. Implemented via WAL logical decoding in PostgreSQL.
See also: Logical Decoding, pgstream
Child Branch
Branching & CoW · A branch derived from a parent/base branch. Inherits all data at the moment of creation. Subsequent writes to either the parent or child do not affect the other.
See also: Branch, Base Branch
Clone
Branching & CoW · A writable copy of a filesystem, volume, or dataset created from a snapshot. Clones share all data blocks with the source snapshot until modified. In ZFS: zfs clone snapshot@name clone/name.
See also: Snapshot, Copy-on-Write
Copy-on-Write (CoW)
Branching & CoW · A storage-level optimization where a copy operation does not duplicate data immediately. Instead, the copy shares the same physical data blocks as the source. Blocks are duplicated only when one copy modifies them. Enables instant, space-efficient database branching.
See also: Branch, Delta Storage, ZFS
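The share-until-write behavior can be illustrated with a toy model in Python (real CoW operates on filesystem or volume blocks, not dictionaries — this only shows the semantics):

```python
class Branch:
    """Toy copy-on-write branch: blocks are read from the parent
    until this branch writes to them (illustrative only)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.local = {}  # block_id -> data written on this branch

    def read(self, block_id):
        if block_id in self.local:
            return self.local[block_id]
        return self.parent.read(block_id) if self.parent else None

    def write(self, block_id, data):
        self.local[block_id] = data  # block diverges only now

    def delta_blocks(self):
        return len(self.local)  # branch storage cost = changed blocks


main = Branch()
main.write("b1", "orders v1")
dev = Branch(parent=main)              # instant "branch": nothing copied
assert dev.read("b1") == "orders v1"   # shared with the parent
dev.write("b1", "orders v2")           # copy-on-write happens here
assert main.read("b1") == "orders v1"  # parent is unaffected
```

Creating `dev` is O(1) regardless of how much data `main` holds, which is why CoW branches are near-instant at any database size.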
A migration technique where a new column is created alongside the existing one, data is backfilled, and triggers synchronize both columns until all applications have migrated.
Column Distortion
Privacy & Anonymization · A metric measuring how much a single column's value distribution changed after anonymization. Computed by comparing frequency distributions (categorical) or kernel density estimates (numeric) between original and treated data. Lower distortion indicates better data utility preservation.
See also: Distortion, NPMI
Columnar Storage
Storage · A storage format where data is organized by column rather than by row. Efficient for analytical queries that scan a few columns across many rows. Examples: Parquet, pg_duckdb, cstore_fdw.
See also: OLAP
Compression
Storage · Reducing data size on disk. PostgreSQL supports TOAST compression (pglz, lz4). Storage layers may add transparent compression (ZFS lz4, zstd). Columnar formats achieve high compression through column-wise encoding.
See also: TOAST
Connection Pooling
Networking · Sharing a pool of database connections across multiple application processes. Reduces the overhead of establishing new connections. Tools: PgBouncer, pgpool-II, built-in application-level pools.
See also: PgBouncer
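A minimal sketch of the pattern in Python (production poolers such as PgBouncer add health checks, timeouts, and transaction-level pooling; the stand-in factory below replaces a real connection-opening call):

```python
import queue

class ConnectionPool:
    """Fixed-size pool: connections are created once and reused."""

    def __init__(self, factory, size):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(factory())  # pay connection cost up front

    def acquire(self, timeout=None):
        return self._q.get(timeout=timeout)  # blocks when the pool is empty

    def release(self, conn):
        self._q.put(conn)


# Stand-in "connections"; a real factory would open a PostgreSQL socket.
pool = ConnectionPool(factory=lambda: object(), size=2)
c1 = pool.acquire()
c2 = pool.acquire()
pool.release(c1)
c3 = pool.acquire()  # reuses c1 instead of opening a new connection
assert c3 is c1
```

Acquire/release is the whole interface: callers borrow a connection, run queries, and return it, so connection setup cost is paid `size` times rather than once per request.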
Connection String
Networking · A URI or key-value format specifying database connection parameters: host, port, database name, user, password, SSL mode. PostgreSQL format: postgresql://user:password@host:port/dbname?sslmode=require.
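The URI form can be unpacked with standard URL parsing; a Python sketch (drivers such as libpq and psycopg accept the URI directly, so this is purely illustrative):

```python
from urllib.parse import urlsplit, parse_qs

def parse_dsn(dsn):
    """Split a PostgreSQL connection URI into its parts."""
    u = urlsplit(dsn)
    return {
        "user": u.username,
        "password": u.password,
        "host": u.hostname,
        "port": u.port or 5432,          # PostgreSQL default port
        "dbname": u.path.lstrip("/"),
        "options": {k: v[0] for k, v in parse_qs(u.query).items()},
    }

cfg = parse_dsn("postgresql://app:s3cret@db.example.com:5432/orders?sslmode=require")
assert cfg["host"] == "db.example.com"
assert cfg["options"]["sslmode"] == "require"
```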
Cache Locking
Networking · A concurrency pattern where only one process recomputes an expired cache entry while others wait or serve stale data. Prevents thundering herd on cache expiry. Also called cache stampede prevention. Essential for high-QPS database workloads.
See also: Thundering Herd
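The core of the pattern — a per-key lock with a double-check after acquiring it — can be sketched in Python (the slow "query" and key name are stand-ins for a real database call):

```python
import threading
import time

cache = {}
locks = {}
locks_guard = threading.Lock()
recomputes = 0

def expensive_query(key):
    global recomputes
    recomputes += 1
    time.sleep(0.05)  # simulate a slow database query
    return f"result:{key}"

def get(key):
    """Only one thread recomputes a missing entry; the rest wait on the
    same per-key lock and then read the freshly cached value."""
    if key in cache:
        return cache[key]
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    with lock:
        if key not in cache:  # re-check: another thread may have filled it
            cache[key] = expensive_query(key)
    return cache[key]

threads = [threading.Thread(target=get, args=("top_orders",)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
assert recomputes == 1  # eight concurrent readers, one database hit
```

Without the lock, all eight threads would miss the cache simultaneously and issue eight identical queries — the thundering herd this pattern prevents.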
Connection Warmup
Networking · Pre-establishing database connections before they are needed to avoid latency spikes from cold connection pools. OpenAI reduced PostgreSQL connection establishment overhead from 50ms to 5ms through connection warmup optimization.
See also: Connection Pooling
CloudNativePG (CNPG)
Kubernetes · An open-source Kubernetes operator for managing PostgreSQL clusters. Handles provisioning, HA, failover, backups, scaling, and rolling updates. Uses custom resource definitions (CRDs) to define cluster specifications.
See also: Operator
Cache Hit Ratio
Monitoring · The fraction of data requests served from shared buffers without reading from disk. A healthy ratio is typically above 99% for OLTP workloads.
See also: Shared Buffers
Citus
Extensions · A PostgreSQL extension that distributes tables across multiple nodes for horizontal scaling. Supports distributed queries, multi-tenant schemas, and real-time analytics. Maintained by Microsoft.
Consensus
Distributed Systems · A protocol by which distributed nodes agree on a single value or sequence of values. Used for leader election, log replication, and configuration management. Algorithms: Raft, Paxos, Zab.
CI/CD
Developer Tools · Continuous Integration / Continuous Delivery — automated processes that build, test, and deploy code changes. In database infrastructure: schema migrations run in CI pipelines, branched databases provide isolated test environments.
Connection Saturation
Scaling & Performance · The state where all available database connections are in use, causing new requests to queue or fail. A common bottleneck in high-concurrency environments, especially with agent workloads that open many parallel connections. Addressed by connection pooling, query optimization, and per-role connection limits.
See also: Connection Pooling, PgBouncer
Context Window
AI & Agents · The maximum number of tokens an LLM can process in a single request. Determines how much schema metadata, query results, error messages, and instructions an agent can consider simultaneously. Larger context windows enable more complex database interactions but increase latency and cost.
Cell
A regional deployment unit in the Xata data plane. Each cell runs a CloudNativePG operator managing PostgreSQL cluster lifecycle, provisioning, scaling, backups, and high availability.
See also: Data Plane, CloudNativePG
Control Plane
The centralized Xata service handling API access, business logic, user/organization management, project configuration, and branch metadata. Region-independent.
See also: Data Plane
D
DDL (Data Definition Language)
Core Database · SQL statements that define or modify database structure: CREATE, ALTER, DROP, TRUNCATE, COMMENT, RENAME. DDL changes are typically auto-committed and may acquire exclusive locks.
See also: DML
DML (Data Manipulation Language)
Core Database · SQL statements that read or modify data within existing structures: SELECT, INSERT, UPDATE, DELETE, MERGE. DML operations run within transactions.
See also: DDL, Transaction
Data Type
Core Database · The classification of a column's allowed values. Determines storage format, valid operations, and comparison behavior. Common types: INTEGER, TEXT, BOOLEAN, TIMESTAMP, UUID, JSONB, NUMERIC, BYTEA.
Database
Core Database · A named collection of schemas, tables, and other objects managed by a database server. In PostgreSQL, a single cluster can host multiple databases, each isolated with its own catalog.
Default Value
Core Database · An expression automatically evaluated and stored when a row is inserted without an explicit value for that column. Can be a literal, a function call (e.g., now()), or a sequence reference.
Domain
Core Database · A user-defined data type based on an existing type with additional constraints. Useful for enforcing business rules (e.g., a positive_integer domain that wraps INTEGER with a CHECK (value > 0) constraint).
Dead Tuple
PostgreSQL · A row version that is no longer visible to any active transaction but has not yet been reclaimed by VACUUM. Accumulation of dead tuples leads to table bloat and degraded performance.
Deadlock
Transactions · A situation where two or more transactions each hold a lock the other needs, creating a circular dependency. PostgreSQL detects deadlocks automatically and aborts one transaction to break the cycle.
See also: Lock
Debezium
CDC · An open-source distributed platform for CDC built on Kafka Connect. Captures changes from PostgreSQL, MySQL, MongoDB, and other databases. Uses PostgreSQL's logical decoding API.
See also: CDC
Delta Storage
Branching & CoW · Storage consumed by branch modifications. In a CoW system, a branch's storage cost equals the delta (changed blocks) relative to its source. Idle branches with no modifications consume near-zero additional storage.
See also: Copy-on-Write
Data Masking
Privacy & Anonymization · Replacing sensitive data values with obfuscated versions that preserve format and length. Examples: john@example.com becomes j***@e******.com, phone number 555-1234 becomes XXX-XXXX.
See also: Anonymization, Redaction
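A Python sketch of the two masking rules from the examples above (the exact policy — how many characters to keep, what to do with the TLD — varies by tool; this is one plausible choice):

```python
import re

def mask_email(email):
    """Format-preserving mask: keep the first character of the local
    part and domain name plus the TLD, star out the rest."""
    local, _, domain = email.partition("@")
    name, _, tld = domain.rpartition(".")
    return f"{local[0]}{'*' * (len(local) - 1)}@{name[0]}{'*' * (len(name) - 1)}.{tld}"

def mask_phone(phone):
    """Replace every digit with X while preserving punctuation/layout."""
    return re.sub(r"\d", "X", phone)

assert mask_email("john@example.com") == "j***@e******.com"
assert mask_phone("555-1234") == "XXX-XXXX"
```

Because length and punctuation survive, masked values still pass format validation in downstream test environments.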
De-identification
Privacy & Anonymization · The process of removing or transforming personally identifiable information from a dataset so that individuals cannot be reasonably identified. Regulatory frameworks (HIPAA, GDPR) define specific requirements.
Differential Privacy
Privacy & Anonymization · A mathematical framework providing provable privacy guarantees. Adds calibrated noise to query results or data, ensuring that the presence or absence of any individual's data does not significantly change the output.
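The classic mechanism for counting queries adds Laplace noise with scale sensitivity/epsilon; a minimal Python sketch (inverse-CDF sampling, seeded here only to make the example reproducible):

```python
import math
import random

def laplace_noise(scale, rng):
    # Inverse-CDF sample from Laplace(0, scale)
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count, epsilon, sensitivity=1.0, seed=None):
    """Release a count with epsilon-differential privacy: noise scale is
    sensitivity/epsilon, so larger epsilon means less noise (less privacy)."""
    rng = random.Random(seed)
    return true_count + laplace_noise(sensitivity / epsilon, rng)

noisy = dp_count(100, epsilon=1.0, seed=7)
```

A counting query has sensitivity 1 (adding or removing one person changes the count by at most 1), so with epsilon = 1 the noise has scale 1 and the released value is typically within a few units of the truth.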
DID (Direct Identifier)
Privacy & Anonymization · A field that alone identifies a person: name, Social Security number, email address, phone number, medical record number. Typically suppressed (removed, masked, or faked) during anonymization.
See also: QID, PII
Data Page
Storage · The fundamental I/O unit in PostgreSQL: an 8KB block containing row data, free space information, and item pointers. Tables and indexes are organized as sequences of pages.
Disaggregated Storage-Compute
Distributed Systems · An architecture separating database compute (query processing) from storage (data persistence) into independently scalable layers. Enables scaling reads without duplicating storage and scaling storage without adding compute. Used by Neon, Aurora, AlloyDB, and Xata. A defining trend in modern PostgreSQL infrastructure.
See also: Scale to Zero, Copy-on-Write
Distributed PostgreSQL
Distributed Systems · Extending PostgreSQL to operate across multiple nodes for horizontal scalability. Approaches include: transparent sharding middleware (Citus, PgDog, Multigres), consensus-based NewSQL (CockroachDB, YugabyteDB), and cloud-native multi-writer (Aurora Limitless). An active area of development in 2025-2026 with multiple new entrants.
See also: Sharding, Wire Protocol Compatibility
dbt (Data Build Tool)
Analytics · An open-source tool for transforming data in warehouses using SQL. Manages a DAG of SQL models, handles dependencies, runs tests, and generates documentation.
Database Migration Tool
Developer Tools · Software that manages versioned schema changes. Tracks which migrations have been applied and applies pending ones in order. Examples: pgroll, Alembic, Flyway, Liquibase, golang-migrate, Prisma Migrate.
Database Consolidation
Scaling & Performance · The industry trend of fewer, larger database providers acquiring smaller ones. In 2025: Databricks acquired Neon, EDB consolidated several PostgreSQL companies. Impacts vendor selection — open-source core and standard PostgreSQL reduce consolidation risk by ensuring portability.
Database Agent Loop
AI & Agents · The iterative cycle where an AI agent reads schema → generates SQL or migration → executes against a database → observes results → adjusts and repeats. Each iteration requires a writable, isolated database with production-representative data. The speed and cost of this loop determine agent productivity.
Data Plane
The component containing PostgreSQL clusters and the SQL gateway. Deployed per region/cell. Handles actual database operations. In BYOC deployments, the data plane runs in the customer's infrastructure.
See also: Control Plane, BYOC
E
Enum Type
Core Database · A user-defined data type consisting of a static, ordered set of values. Example: CREATE TYPE status AS ENUM ('active', 'inactive', 'suspended'). Values are stored as integers internally but displayed as labels.
Extension
PostgreSQL · A packaged unit of additional functionality that can be installed into a PostgreSQL database. Extensions can add types, functions, operators, indexes, and background workers. Managed with CREATE EXTENSION / DROP EXTENSION.
EXPLAIN / EXPLAIN ANALYZE
Diagnostic commands showing the query execution plan. EXPLAIN shows the estimated plan; EXPLAIN ANALYZE executes the query and shows actual timings, row counts, and buffer usage.
See also: Query Plan
Event Sourcing
An architectural pattern where state changes are stored as a sequence of immutable events rather than mutable current-state records. CDC streams can be used to derive event sourcing patterns from traditional databases.
Ephemeral Environment
Branching & CoW · A short-lived database instance (typically a branch) created for a specific task — CI test run, developer feature work, AI agent session — and destroyed when done. Copy-on-write branching makes ephemeral environments economically viable at scale.
Expand/Contract
Schema Migrations · A two-phase migration strategy: Expand adds the new schema elements alongside the old ones. Contract removes the old elements after all applications have migrated to the new schema. Enables zero-downtime changes.
See also: Zero-Downtime Migration, pgroll
Equivalence Class
Privacy & Anonymization · A set of records that share identical values on all quasi-identifier columns after anonymization. In a k-anonymous dataset, every equivalence class contains at least k records.
See also: k-Anonymity, QID
Automated classification of database columns as Direct Identifiers (DIDs), Quasi-Identifiers (QIDs), or Safe using NLP and pattern matching. Privacy Dynamics uses Microsoft Presidio with custom recognizers covering all 18 HIPAA identifier families plus additional types (coordinates, advertising IDs, license plates).
See also: DID, QID, Presidio
Eventual Consistency
Distributed Systems · A consistency model where all replicas converge to the same state given sufficient time without new updates. Reads may temporarily return stale data.
See also: BASE, CAP Theorem
F
Foreign Key
Core Database · A constraint linking one or more columns in a table to the primary key or unique columns of another table. Enforces referential integrity. Supports cascading actions: CASCADE, SET NULL, SET DEFAULT, RESTRICT, NO ACTION.
See also: Primary Key, Constraint
Failover
Replication & HA · The process of promoting a standby/replica to primary when the current primary becomes unavailable. Can be automatic (via Patroni, pg_auto_failover, CloudNativePG) or manual.
Fair Aggregation
Privacy & Anonymization · A randomized aggregation method used during microaggregation that selects median or mode values with controlled randomness to prevent statistical bias and information leakage. Unlike deterministic aggregation, fair aggregation prevents adversaries from inferring the original value distribution within an equivalence class.
See also: Microaggregation, Equivalence Class
G
GUC (Grand Unified Configuration)
PostgreSQL · PostgreSQL's configuration system managing all server parameters. Settings can be applied at server, database, role, session, or transaction level. Parameters have contexts (e.g., SUSET, USERSET) controlling who can change them.
GIN (Generalized Inverted Index)
Indexing · An index type for values containing multiple elements: arrays, JSONB, full-text search vectors, and trigrams. Maps each element to the rows containing it. Excellent for containment queries (@>, ?, @@).
See also: B-tree, pg_trgm
GiST (Generalized Search Tree)
Indexing · A balanced tree index supporting complex data types: geometric shapes, ranges, full-text search, and nearest-neighbor queries. Extensible — custom data types can define their own GiST operator classes.
GDPR
Privacy & Anonymization · General Data Protection Regulation — European Union regulation governing personal data protection. Requires lawful basis for processing personal data, data minimization, purpose limitation, and rights to access, rectification, and erasure. Effective May 2018.
Geographic-Aware Clustering
Privacy & Anonymization · Microaggregation clustering that uses geographic coordinates (latitude/longitude) from GeoNames reference data to group records by physical proximity rather than raw ZIP code values. Produces more meaningful anonymized locations. Linked column hierarchies keep city, state, and ZIP code values consistent within each cluster.
See also: Microaggregation, Linked Column Hierarchy
H
Heap
PostgreSQL · The default table storage method in PostgreSQL. Rows are stored unordered in 8KB pages. The heap is accessed via sequential scans or through index lookups that reference heap pages by CTID.
HOT Update
PostgreSQL · Heap-Only Tuple update — an optimization where an updated row can be stored on the same heap page as the original, avoiding index updates. Requires that no indexed columns changed and the page has free space.
See also: Fillfactor
Hash Index
Indexing · An index using a hash function for exact-match lookups only. Faster than B-tree for simple equality queries on high-cardinality columns. Does not support range queries. WAL-logged since PostgreSQL 10.
Hot Standby
Replication & HA · A standby server that accepts read-only queries while continuously applying WAL from the primary. Enables read scaling and reporting workloads without impacting the primary.
See also: Standby, Read Replica
Hybrid Database Strategy
Replication & HA · Using PostgreSQL alongside specialized databases for different workload patterns. Example: PostgreSQL for read-heavy OLTP, sharded systems for write-heavy workloads. Chosen when a single system cannot optimally serve all access patterns. OpenAI uses this approach at scale.
See also: Horizontal Scaling
HIPAA
Privacy & Anonymization · Health Insurance Portability and Accountability Act — US federal law governing protected health information (PHI). The Privacy Rule defines 18 identifier types that must be addressed for de-identification. Two methods: Safe Harbor and Expert Determination.
See also: Safe Harbor, Expert Determination
Horizontal Scaling
Cloud & Deployment · Increasing database capacity by adding more nodes. Methods include read replicas (for read scaling), sharding (for write scaling), and distributed PostgreSQL (for both). More operationally complex than vertical scaling but removes single-node limits.
See also: Vertical Scaling, Read Replica, Distributed PostgreSQL
Helm
Kubernetes · A package manager for Kubernetes. Helm charts define templated Kubernetes manifests for deploying applications. Used for installing database operators, monitoring stacks, and application workloads.
HTAP (Hybrid Transactional/Analytical Processing)
Database workloads combining OLTP and OLAP in a single system. Eliminates the need to replicate data to a separate analytical store. Approaches: real-time materialized views, columnar extensions (pg_duckdb), or dual-format storage. Growing in importance as agents need both transactional and analytical access.
Hibernation
The paused state of an idle branch where compute resources are released but data remains on disk. Branches automatically hibernate after a configurable inactivity period (default 30 minutes) and resume on the next connection.
See also: Scale to Zero
I
Index
Core Database · A data structure maintained alongside a table to accelerate lookups on specific columns. Trades write performance and storage for faster reads.
See also: B-tree, GIN, GiST
Index-Only Scan
Indexing · A query execution strategy where all required data is retrieved from the index without touching the heap table. Requires the index to cover all columns in the query and the visibility map to confirm all-visible pages.
See also: Covering Index, Visibility Map
Isolation Level
Transactions · The degree to which a transaction is protected from concurrent modifications by other transactions. PostgreSQL supports: READ COMMITTED (default), REPEATABLE READ, and SERIALIZABLE.
See also: MVCC, SSI
IaC (Infrastructure as Code)
Cloud & Deployment · Managing cloud infrastructure through declarative configuration files rather than manual processes. Tools: Terraform, Pulumi, AWS CDK, CloudFormation. Enables version control, review, and reproducibility.
IOPS
Monitoring · Input/Output Operations Per Second — a measure of storage throughput. Read IOPS and write IOPS are typically reported separately. Database workload characteristics determine IOPS requirements.
io_uring
NVMe & Storage · A Linux kernel interface for asynchronous I/O. Uses submission and completion ring buffers shared between userspace and kernel. Provides significantly lower overhead than traditional read/write system calls.
J
JOIN
SQL · An operation combining rows from two or more tables based on a related column. Types: INNER JOIN (matching rows only), LEFT JOIN (all left + matching right), RIGHT JOIN, FULL JOIN (all from both), CROSS JOIN (Cartesian product).
K
k-Anonymity
Privacy & Anonymization · A privacy property where every record in a dataset is indistinguishable from at least k-1 other records on quasi-identifier values. Prevents singling out individuals through quasi-identifier combinations. Introduced by Latanya Sweeney in 2002.
See also: Equivalence Class, l-Diversity, Microaggregation
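Checking k-anonymity reduces to finding the smallest equivalence class; a Python sketch with hypothetical generalized QID values:

```python
from collections import Counter

def min_k(rows, qid_cols):
    """Smallest equivalence-class size over the quasi-identifier columns.
    The dataset is k-anonymous for every k up to this value."""
    classes = Counter(tuple(row[c] for c in qid_cols) for row in rows)
    return min(classes.values())

rows = [
    {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "021**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "946**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "946**", "age": "40-49", "diagnosis": "diabetes"},
]
assert min_k(rows, ["zip", "age"]) == 2  # every class has >= 2 records
```

Note that adding more columns to the quasi-identifier set can only shrink classes: including `diagnosis` above drops the minimum class size to 1, breaking 2-anonymity.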
L
Lock
Transactions · A mechanism preventing concurrent access conflicts. PostgreSQL uses multiple lock types: row-level (shared/exclusive), table-level (eight modes from ACCESS SHARE to ACCESS EXCLUSIVE), advisory, and predicate locks.
See also: Deadlock, Advisory Lock
Logical Decoding
Replication & HA · PostgreSQL's mechanism for extracting row-level changes from WAL in a logical format. Foundation for logical replication, CDC, and tools like pgstream. Requires wal_level = logical.
See also: Logical Replication, CDC, Output Plugin
Logical Replication
Replication & HA · A replication method that captures and replays DML operations (INSERT, UPDATE, DELETE) at the row level. Supports selective replication of specific tables, cross-version replication, and conflict resolution. Uses publications and subscriptions.
See also: Physical Replication, Publication, Subscription
LSN (Log Sequence Number)
A 64-bit integer identifying a specific byte position in the WAL stream. Format: 16/B374D848. Used to track replication progress, recovery targets, and WAL consumption. Monotonically increasing.
See also: WAL, Replication Slot
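The textual form is two 32-bit hex halves; converting it to the underlying 64-bit position makes lag arithmetic trivial (PostgreSQL's pg_wal_lsn_diff does the equivalent server-side):

```python
def lsn_to_int(lsn):
    """Convert 'high/low' hex LSN text to its 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def replication_lag_bytes(primary_lsn, replica_lsn):
    """How many WAL bytes the replica still has to replay."""
    return lsn_to_int(primary_lsn) - lsn_to_int(replica_lsn)

assert lsn_to_int("0/0") == 0
assert lsn_to_int("16/B374D848") == (0x16 << 32) | 0xB374D848
assert replication_lag_bytes("16/B374D848", "16/B374D800") == 72
```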
Linkage Attack
Privacy & Anonymization · An attack that re-identifies individuals by joining anonymized data with external datasets on shared quasi-identifiers. Sweeney (2000) demonstrated that 87% of the US population is uniquely identifiable by {ZIP code, gender, date of birth}. Risk assessment simulates these attacks using Monte Carlo methods with synthetic adversary datasets.
See also: QID, Re-identification Risk, Risk Score
Linked Column Hierarchy
Privacy & Anonymization · A configuration specifying that certain columns have a hierarchical relationship (e.g., city → state → ZIP code) and must be treated together during anonymization. Ensures that after microaggregation, geographic values remain internally consistent rather than producing impossible combinations.
See also: Geographic-Aware Clustering, Microaggregation
Latency Budget
Scaling & Performance · The maximum acceptable response time for a database query, typically defined as a percentile target (p50, p95, p99). OpenAI targets low double-digit millisecond p99 latency for PostgreSQL reads. Drives decisions about indexing, caching, replica placement, and connection pooling configuration.
llms.txt
AI & Agents · A convention for placing a structured text file at a website's root (similar to robots.txt) to provide LLMs and AI agents with site context, navigation, and key content links. Proposed by Jeremy Howard. Agents check this file to understand a site's purpose and discover relevant resources like documentation and glossaries.
See also: AGENTS.md
M
Materialized View
Core DatabaseA view whose results are physically stored on disk and must be explicitly refreshed. Useful for caching expensive query results. Unlike regular views, materialized views can be indexed.
See also: View
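A minimal sketch, assuming a hypothetical orders table; a unique index is what permits the non-blocking CONCURRENTLY refresh:

```sql
CREATE MATERIALIZED VIEW daily_sales AS
SELECT order_date, SUM(amount) AS total
FROM orders
GROUP BY order_date;

-- A unique index enables REFRESH ... CONCURRENTLY (readers are not blocked):
CREATE UNIQUE INDEX ON daily_sales (order_date);

REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;
```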
Memory Context
PostgreSQLPostgreSQL's hierarchical memory management system. Allocations are made within named contexts (e.g., TopMemoryContext, MessageContext). Freeing a parent context automatically frees all child contexts and allocations.
MVCC (Multi-Version Concurrency Control)
TransactionsPostgreSQL's concurrency model. Instead of locking rows during reads, PostgreSQL maintains multiple versions of each row. Readers see a consistent snapshot based on their transaction's start time. Writers create new row versions rather than overwriting in place.
Migration
Schema MigrationsA versioned, ordered set of schema changes applied to a database. Migrations are typically stored as files (SQL or declarative definitions) and tracked in a metadata table.
Microaggregation
Privacy & AnonymizationAn anonymization technique that groups similar records into clusters of at least k members and replaces quasi-identifier values with aggregated values (median, mode). Preserves data types and statistical properties better than suppression or generalization.
See also: k-Anonymity
Managed Database
Cloud & DeploymentA database service where the cloud provider handles provisioning, patching, backups, failover, and scaling. Examples: AWS RDS, Google Cloud SQL, Azure Database for PostgreSQL.
Multi-Cloud
Cloud & DeploymentRunning infrastructure across two or more cloud providers. Reduces vendor lock-in and enables best-of-breed service selection. Requires cloud-agnostic tooling and standard protocols (e.g., vanilla Postgres).
MCP (Model Context Protocol)
AI & AgentsA JSON-RPC 2.0 protocol for AI agents to interact with external systems including databases. Defines tools (executable operations), resources (readable data), and prompts (templated instructions) as structured capabilities. Enables agents to discover and execute database operations programmatically without hardcoded integrations. Originated at Anthropic, adopted across the industry in 2025.
See also: Tool Use
N
Normalization
Core DatabaseThe process of organizing tables to reduce data redundancy and improve integrity. Normal forms (1NF through 5NF) provide progressively stricter rules. Most production schemas target third normal form (3NF) as a practical balance.
Null
Core DatabaseA special marker indicating the absence of a value. Not equal to zero, empty string, or false. NULL comparisons require IS NULL / IS NOT NULL operators because NULL = NULL evaluates to NULL (unknown), not true.
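The three-valued logic described above can be demonstrated directly:

```sql
SELECT NULL = NULL;    -- NULL (unknown), not true
SELECT NULL IS NULL;   -- true
SELECT 1 WHERE NULL;   -- returns no rows: only true satisfies a WHERE clause
```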
NVMe
NVMe & StorageNon-Volatile Memory Express — a host controller interface and storage protocol designed for flash storage over PCIe. Provides low-latency, high-throughput access with multi-queue architecture. Replaced SATA/SAS for high-performance storage.
NVMe-oF (NVMe over Fabrics)
NVMe & StorageAn extension of the NVMe protocol over network fabrics (TCP, RDMA, Fibre Channel). Enables remote block storage access with near-local NVMe performance. Used for disaggregated storage architectures.
O
OID (Object Identifier)
Core DatabaseA 32-bit unsigned integer used internally by PostgreSQL to identify catalog objects (tables, types, functions). Every database object has an OID. Exposed in system catalogs but typically not used in application code.
Outbox Pattern
CDCA design pattern where application writes are paired with event records in the same transaction. A CDC connector reads the outbox table and publishes events to a message broker, ensuring reliable delivery semantics.
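A sketch of the pattern, with hypothetical orders and outbox tables; atomicity of the transaction is what guarantees the event is published if and only if the write committed:

```sql
BEGIN;
INSERT INTO orders (id, customer_id, total) VALUES (42, 7, 99.50);
INSERT INTO outbox (aggregate_id, event_type, payload)
VALUES (42, 'order_created', '{"order_id": 42, "total": 99.50}'::jsonb);
COMMIT;  -- both rows commit atomically; a CDC connector later publishes the outbox row
```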
Output Plugin
CDCA PostgreSQL logical decoding plugin that formats WAL changes into a specific format. Examples: pgoutput (built-in, used by logical replication), wal2json (JSON format), test_decoding (text format).
See also: Logical Decoding, wal2json
Operator
KubernetesA Kubernetes pattern where custom controllers manage the lifecycle of complex applications. Database operators (CloudNativePG, Zalando PostgreSQL Operator, CrunchyData PGO) automate PostgreSQL operations.
See also: CloudNativePG
OLAP
AnalyticsOnline Analytical Processing — query patterns characterized by complex aggregations over large datasets, typically read-heavy with few writes. Columnar storage and analytical engines are optimized for OLAP.
See also: OLTP, Columnar Storage
OLTP
AnalyticsOnline Transaction Processing — query patterns characterized by many short, simple transactions with reads and writes. Row-oriented storage and B-tree indexes are optimized for OLTP.
See also: OLAP
ORM (Object-Relational Mapping)
Developer ToolsA library that maps database tables to programming language objects. Handles query generation, result hydration, and schema definition. Examples: SQLAlchemy, Django ORM, Prisma, TypeORM, Sequelize, Hibernate.
P
Partition
Core DatabaseA physical division of a large table into smaller, more manageable pieces based on a partition key. PostgreSQL supports range, list, and hash partitioning. Queries targeting a specific partition value can skip unrelated partitions (partition pruning).
See also: Partition Pruning
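A minimal range-partitioning sketch (table and column names are hypothetical):

```sql
CREATE TABLE events (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_01 PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

-- Partition pruning: only events_2026_01 is scanned for this predicate.
SELECT * FROM events
WHERE created_at >= '2026-01-15' AND created_at < '2026-01-16';
```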
Primary Key
Core DatabaseA constraint guaranteeing that one or more columns uniquely identify each row and contain no NULLs. A table can have at most one primary key. Automatically creates a unique B-tree index.
See also: Constraint, Unique Constraint, B-tree
Planner / Optimizer
PostgreSQLThe PostgreSQL component that analyzes a parsed query and generates an execution plan. Evaluates multiple strategies (sequential scan, index scan, join methods) and selects the one with the lowest estimated cost based on table statistics.
See also: Query Plan, Statistics Collector
Postmaster
PostgreSQLThe main PostgreSQL server process. Listens for client connections and forks backend processes to handle them. Also manages background workers, checkpointing, WAL archiving, and replication processes.
Postgres Cluster
PostgreSQLThe complete set of databases managed by a single PostgreSQL server instance (postmaster). Stored in a single data directory (PGDATA). Not to be confused with a multi-node cluster for high availability.
Prepared Statement
SQLA pre-parsed, pre-planned SQL statement with parameter placeholders. Avoids repeated parsing/planning overhead for frequently executed queries. Created with PREPARE, executed with EXECUTE.
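A minimal PREPARE/EXECUTE round trip (hypothetical users table):

```sql
PREPARE get_user (bigint) AS
    SELECT * FROM users WHERE id = $1;

EXECUTE get_user(42);

DEALLOCATE get_user;  -- prepared statements otherwise live for the session
```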
Partial Index
IndexingAn index built on a subset of table rows, filtered by a WHERE predicate. Reduces index size and maintenance cost. Example: CREATE INDEX ON orders (created_at) WHERE status = 'pending'.
Partition Pruning
IndexingAn optimization where the query planner eliminates partitions that cannot contain rows matching the query's WHERE clause. Reduces the amount of data scanned in partitioned tables.
See also: Partition
Physical Replication
Replication & HAReplication by streaming raw WAL bytes from primary to standby. The standby maintains an exact byte-for-byte copy of the primary's data files. Cannot selectively replicate individual tables. Also called streaming replication.
See also: Logical Replication, Streaming Replication
Primary
Replication & HAThe database instance that accepts write operations. In a replication setup, the primary is the source of truth.
See also: Standby, Read Replica
Promotion
Replication & HAThe act of converting a standby/replica into a primary that accepts writes. Triggered during failover. The promoted instance stops replaying WAL and begins accepting write transactions.
See also: Failover
Publication
Replication & HAA logical replication publisher definition. Specifies which tables' changes should be made available to subscribers. Created with CREATE PUBLICATION.
See also: Subscription, Logical Replication
PII (Personally Identifiable Information)
Privacy & AnonymizationAny data that can identify an individual, directly or indirectly. Direct PII: name, SSN, email. Indirect PII: age, ZIP code, gender (can identify when combined). Classification varies by regulation and jurisdiction.
See also: DID, QID
pg_dump / pg_restore
Backup & RecoveryLogical backup and restore utilities. pg_dump exports database objects and data as SQL or custom-format archives. pg_restore loads these archives. Supports selective schema/table backup and parallel restore.
PITR (Point-in-Time Recovery)
Backup & RecoveryRestoring a database to its exact state at a specific timestamp or LSN. Requires a base backup plus all WAL archives from the backup to the target recovery point.
See also: Base Backup, WAL Archiving
PgBouncer
NetworkingA lightweight PostgreSQL connection pooler. Sits between applications and the database, multiplexing many client connections onto fewer server connections. Supports session, transaction, and statement pooling modes.
See also: Connection Pooling
pg_hba.conf
SecurityPostgreSQL's host-based authentication configuration file. Each line specifies: connection type, database, user, client address, and authentication method. Evaluated top-to-bottom; first matching rule applies.
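An illustrative pg_hba.conf fragment (database, user, and address values are hypothetical); because rules are evaluated top-to-bottom, the final reject line only applies to connections not matched above it:

```
# TYPE  DATABASE  USER      ADDRESS       METHOD
local   all       postgres                peer
host    shop      app_user  10.0.0.0/8    scram-sha-256
host    all       all       0.0.0.0/0     reject
```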
PostgreSQL DBaaS
Cloud & DeploymentDatabase-as-a-Service offerings for PostgreSQL. Major providers: AWS RDS/Aurora, Google Cloud SQL/AlloyDB, Azure Database for PostgreSQL, Neon, Supabase, Xata, Crunchy Bridge, Tembo, Aiven. Market consolidation accelerated in 2025 (Databricks acquired Neon, EDB acquired multiple PG companies).
See also: Managed Database
pg_stat_activity
MonitoringA PostgreSQL system view showing one row per server process with current activity: query text, state, wait events, transaction start time, and client information. Primary tool for monitoring active connections.
pg_stat_statements
MonitoringA PostgreSQL extension tracking planning and execution statistics for all SQL statements. Records call count, total/mean/min/max time, rows returned, and shared buffer hits. Essential for identifying slow queries.
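A common diagnostic query against the view (column names per PostgreSQL 13+; the extension must be in shared_preload_libraries and created with CREATE EXTENSION):

```sql
SELECT query, calls, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```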
Prometheus
MonitoringAn open-source monitoring system collecting time-series metrics via a pull model. Database exporters (postgres_exporter) expose PostgreSQL metrics. Paired with Grafana for visualization and alerting.
pg_cron
ExtensionsA PostgreSQL extension providing cron-based job scheduling within the database. Runs SQL commands or functions on configurable schedules without external schedulers.
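A sketch of scheduling a job with pg_cron (job name is illustrative):

```sql
CREATE EXTENSION pg_cron;

-- Run VACUUM nightly at 03:00 (standard cron syntax: min hour day month dow):
SELECT cron.schedule('nightly-vacuum', '0 3 * * *', 'VACUUM');
```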
pgroll
Schema MigrationsAn open-source PostgreSQL schema migration tool enabling zero-downtime, reversible schema changes using the expand/contract pattern. Maintains multiple schema versions simultaneously through versioned views.
See also: Expand / Contract Pattern, Zero-Downtime Migration
pgstream
CDCAn open-source PostgreSQL CDC and streaming replication tool. Captures row-level changes via logical decoding and streams them to targets: PostgreSQL, Kafka, Elasticsearch, webhooks. Supports data transformation and anonymization.
See also: CDC, Logical Decoding
pgvector
ExtensionsA PostgreSQL extension adding vector similarity search. Stores embeddings as the vector type with support for L2 distance, inner product, and cosine distance. Uses IVFFlat and HNSW indexes for approximate nearest neighbor search.
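A minimal pgvector sketch (table, dimension, and query vector are illustrative); <=> is pgvector's cosine distance operator:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);

-- Five approximate nearest neighbors by cosine distance:
SELECT id FROM items
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector
LIMIT 5;
```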
pgzx
A framework for building PostgreSQL extensions in the Zig programming language. Provides safe wrappers for PostgreSQL internals: memory contexts, SPI, error handling, data type conversion, and GUC configuration.
PostGIS
ExtensionsA PostgreSQL extension adding geographic object support. Provides spatial data types (point, line, polygon), spatial indexes (GiST R-tree), and thousands of spatial functions. The standard for geospatial databases.
Partition Tolerance
Distributed SystemsThe ability of a distributed system to continue operating when network communication between nodes is interrupted. One of the three properties in the CAP theorem. All practical distributed databases must handle partitions.
See also: CAP Theorem, Consensus
Parquet
AnalyticsAn open-source columnar storage format for efficient analytical queries. Supports predicate pushdown, column pruning, and compression. Increasingly used as an interchange format between databases, data lakes, and ML pipelines. Accessible from PostgreSQL via pg_duckdb and foreign data wrappers.
See also: Columnar Storage, OLAP
psql
Developer ToolsPostgreSQL's interactive terminal. Provides a command-line interface for executing SQL, managing databases, and inspecting server state. Supports tab completion, command history, and formatting options.
Q
Query Plan
IndexingThe execution strategy chosen by the optimizer for a SQL statement. Represented as a tree of plan nodes (Seq Scan, Index Scan, Nested Loop, Hash Join, Sort, Aggregate, etc.). Viewable with EXPLAIN.
See also: EXPLAIN / EXPLAIN ANALYZE, Planner / Optimizer
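Plans are inspected with EXPLAIN; ANALYZE executes the query and reports actual row counts alongside estimates (tables here are hypothetical):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'pending';
```

The output is a tree of plan nodes with estimated cost, estimated vs. actual rows, and buffer usage per node.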
QID (Quasi-Identifier)
Privacy & AnonymizationA field that is not directly identifying alone but can identify individuals when combined with other quasi-identifiers. Classic example: 87% of the US population can be uniquely identified by {ZIP code, gender, date of birth} (Sweeney, 2000).
See also: DID, k-Anonymity
Queue Depth
NVMe & StorageThe number of I/O commands that can be outstanding (submitted but not completed) simultaneously. Higher queue depth enables the storage device to optimize command scheduling. NVMe supports up to 65,535 commands per queue.
QPS (Queries Per Second)
Scaling & PerformanceThroughput metric for database workloads. Varies by query complexity: simple key lookups may achieve millions of QPS on a single primary (OpenAI), while analytical queries may sustain only tens per second. Agent workloads tend toward high volumes of simple queries.
See also: Latency Budget
Query Guardrails
AI & AgentsConstraints applied to agent-generated SQL to prevent destructive or expensive operations. Includes: read-only transaction mode, row count limits, query timeout enforcement, DDL restrictions, and cost-based query rejection. Critical for safe agentic database access where agents may generate unbounded queries.
See also: Agent Isolation, Rate Limiting
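Several of these guardrails are plain session settings; a sketch with illustrative values:

```sql
-- Per-session guardrails for an agent connection:
SET default_transaction_read_only = on;           -- reject writes
SET statement_timeout = '5s';                     -- cancel long-running queries
SET idle_in_transaction_session_timeout = '10s';  -- drop abandoned transactions
```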
R
Relation
Core DatabaseIn relational theory, any named table-like object. In PostgreSQL, relations include tables, views, materialized views, indexes, sequences, and composite types. Metadata stored in pg_class.
Row
Core DatabaseA single record in a table containing one value per column. Also called a "tuple" in relational theory and PostgreSQL internals.
See also: Tuple
Read Replica
Replication & HAA database instance that replicates data from a primary and serves read-only queries. Used for read scaling, analytics, and geographic distribution.
See also: Primary, Hot Standby
Replica Identity
Replication & HAConfiguration determining which column values are included in WAL for UPDATE and DELETE during logical replication. Options: DEFAULT (primary key only), FULL (all columns), INDEX (specified unique index), NOTHING.
See also: Logical Replication
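Replica identity is set per table (table and index names are hypothetical; USING INDEX requires a unique, non-partial index on non-null columns):

```sql
-- Include all old column values in WAL for UPDATE/DELETE:
ALTER TABLE orders REPLICA IDENTITY FULL;

-- Or identify rows by a specific unique index:
ALTER TABLE orders REPLICA IDENTITY USING INDEX orders_external_id_key;
```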
Replication Lag
Replication & HAThe delay between a change committed on the primary and its application on a replica. Measured in bytes (WAL position difference) or time. High replication lag indicates the replica is falling behind.
Replication Slot
Replication & HAA server-side object tracking a consumer's position in the WAL stream. Prevents the server from discarding WAL segments that haven't been consumed. Physical slots track LSN; logical slots also track decoding state.
See also: LSN, WAL
Read Replica Scaling
Replication & HAAdding read-only replicas to distribute query load across multiple PostgreSQL instances. Each replica receives changes via streaming replication. Connection routing directs reads to replicas and writes to the primary. OpenAI scales PostgreSQL to ~50 read replicas per primary, handling millions of QPS.
See also: Read Replica, Streaming Replication, Connection Pooling
Re-identification Risk
Privacy & AnonymizationThe probability that an individual can be identified in an anonymized dataset. Assessed through simulated linkage attacks, k-anonymity metrics, or probabilistic models.
See also: Risk Score, k-Anonymity, Linkage Attack
Risk Score
Privacy & AnonymizationA quantified upper bound on re-identification probability computed by simulating linkage attacks. Privacy Dynamics generates 50 synthetic adversary datasets via Monte Carlo simulation and reports the worst-case re-identification probability. Computed before and after treatment to measure anonymization effectiveness.
See also: Re-identification Risk, Linkage Attack
RPO (Recovery Point Objective)
Backup & RecoveryThe maximum acceptable amount of data loss measured in time. An RPO of zero requires synchronous replication. An RPO of 1 hour means up to 1 hour of data loss is acceptable.
RTO (Recovery Time Objective)
Backup & RecoveryThe maximum acceptable downtime during a failure recovery. Determines the required speed of failover, restore, and catchup processes.
RBAC (Role-Based Access Control)
SecurityAn authorization model where permissions are assigned to roles, and users are granted roles. PostgreSQL implements RBAC natively: roles can own objects, receive GRANT privileges, and be members of other roles.
Role
SecurityA PostgreSQL principal that can own database objects and receive privileges. Roles with LOGIN privilege can connect to the database. Roles can be granted membership in other roles for privilege inheritance.
RLS (Row-Level Security)
SecurityA PostgreSQL feature enabling per-row access control. Policies define which rows a user can see (SELECT) or modify (INSERT, UPDATE, DELETE). Enabled with ALTER TABLE ... ENABLE ROW LEVEL SECURITY.
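A minimal RLS sketch (table and owner column are hypothetical):

```sql
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Each role sees and modifies only its own rows:
CREATE POLICY documents_owner ON documents
    USING (owner = current_user);
```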
Raft
Distributed SystemsA consensus algorithm designed for understandability. Used in etcd, CockroachDB, and TiKV. Elects a leader that manages log replication. Provides strong consistency guarantees.
See also: Consensus
Rate Limiting
Scaling & PerformanceRestricting the number of queries or connections per unit time. Applied at application, proxy (PgBouncer), and database levels. Prevents individual clients from monopolizing resources. Especially important when AI agents generate high query volumes without built-in backpressure.
See also: Connection Saturation, PgBouncer
S
Schema
Core DatabaseA named namespace within a database that contains tables, views, functions, types, and other objects. PostgreSQL databases have a public schema by default. Used to organize objects and control access.
Sequence
Core DatabaseA database object generating unique, incrementing numeric values. Commonly used for auto-incrementing primary keys. PostgreSQL sequences support BIGINT values, custom start/increment, and cycling.
Stored Procedure
Core DatabaseA named block of procedural code stored in the database and executed via CALL. Unlike functions, stored procedures can manage transactions (commit/rollback within the procedure body). Introduced in PostgreSQL 11.
SPI (Server Programming Interface)
PostgreSQLA C API allowing PostgreSQL extensions and stored procedures to execute SQL statements from within server-side code. Provides functions like SPI_execute(), SPI_prepare(), and result set iteration.
System Columns
PostgreSQLHidden columns present in every table: tableoid (OID of the containing table), xmin (inserting transaction ID), xmax (deleting transaction ID), cmin/cmax (command IDs within transaction), ctid (physical row location).
See also: CTID, Transaction ID
Skip Scan
PostgreSQLA B-tree index optimization added in PostgreSQL 18 that efficiently handles queries where the leading index column has low cardinality. Instead of scanning the entire index, the executor skips to the next distinct value of the leading column. Eliminates the need for certain multi-column indexes.
Subquery
SQLA query nested inside another query. Can appear in FROM (derived table), WHERE (scalar or correlated subquery), or SELECT (scalar subquery). Correlated subqueries reference the outer query and execute once per outer row.
Sequential Scan
IndexingReading every row in a table from disk in physical storage order. Optimal when selecting a large fraction of the table or when no suitable index exists.
SP-GiST
IndexingSpace-Partitioned GiST — an index type for data that can be partitioned into non-overlapping regions: points, IP addresses, phone numbers, text (radix tree). More efficient than GiST for certain data distributions.
See also: GiST
Statistics Collector
IndexingPostgreSQL subsystem that gathers information about table contents (row counts, most common values, histograms, correlation). The planner uses these statistics for cost estimation. Updated by ANALYZE.
Savepoint
TransactionsA named point within a transaction that can be rolled back to without aborting the entire transaction. Created with SAVEPOINT, rolled back with ROLLBACK TO SAVEPOINT.
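A sketch showing partial rollback within one transaction (hypothetical audit_log table):

```sql
BEGIN;
INSERT INTO audit_log (msg) VALUES ('step 1');
SAVEPOINT step2;
INSERT INTO audit_log (msg) VALUES ('step 2');
ROLLBACK TO SAVEPOINT step2;  -- undoes only 'step 2'
COMMIT;                       -- 'step 1' is persisted
```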
Snapshot
TransactionsA consistent view of the database at a specific point in time. Each transaction operates on its own snapshot, seeing only data committed before the snapshot was taken. Foundation of MVCC.
See also: MVCC, Transaction
SSI (Serializable Snapshot Isolation)
TransactionsPostgreSQL's implementation of SERIALIZABLE isolation. Tracks read/write dependencies between concurrent transactions and aborts transactions that would produce results inconsistent with some serial execution order.
See also: Isolation Level
Standby
Replication & HAA replica server that continuously applies changes from the primary. Can be a hot standby (accepting reads) or warm standby (no queries). Standby servers are candidates for failover promotion.
See also: Hot Standby, Failover
Streaming Replication
Replication & HAContinuous replication of WAL records from primary to standby over a TCP connection. Changes are sent as they occur, providing near-real-time replication. The default replication method for PostgreSQL HA.
See also: Physical Replication
Subscription
Replication & HAA logical replication subscriber definition. Connects to a publication on a remote server and applies replicated changes locally. Created with CREATE SUBSCRIPTION.
See also: Publication
Synchronous Replication
Replication & HAReplication where the primary waits for one or more replicas to confirm that changes have been written before committing the transaction. Guarantees zero data loss at the cost of increased commit latency.
See also: Asynchronous Replication
Scale-to-Zero
The ability for idle database branches to automatically release compute resources (CPU, memory) while preserving data on disk. Branches resume when a connection is established. Eliminates the cost of idle environments.
See also: Hibernation
Schema Diff
A comparison between two schema versions showing added, removed, and modified objects. Useful for reviewing branch changes before merging.
Synthetic Data
Privacy & AnonymizationArtificially generated data that mimics the statistical properties of real data without containing actual records. Used for DID faking (format-preserving synthetic values via Mimesis) and for generating adversary datasets in risk scoring simulations. Distinct from anonymized data, which starts from real records.
See also: Faking, Risk Score (Phisher K-Threshold)
SCRAM-SHA-256
SecurityThe recommended password authentication mechanism in PostgreSQL 10+. Uses a challenge-response protocol with salted hashing, preventing password interception and replay attacks.
Serverless
Cloud & DeploymentA compute model where resources are allocated on-demand per request or query, with automatic scaling to zero during inactivity. Eliminates idle resource costs.
StatefulSet
KubernetesA Kubernetes workload type for applications requiring stable identities, ordered deployment, and persistent storage. Used for databases where each instance needs a consistent hostname and dedicated storage.
Staging Environment
Developer ToolsA pre-production environment that mirrors production for testing. Traditionally requires full database copies. Database branching provides staging environments as lightweight, instant branches.
See also: Ephemeral Environment, Branch
The service providing PostgreSQL wire protocol access to Xata branches. Handles SSL termination, authentication, and routing to the correct PostgreSQL cluster based on the branch ID encoded in the connection hostname.
An HTTP/WebSocket interface for connecting to Xata databases from serverless functions and edge environments where persistent TCP connections are impractical. Compatible with the Neon serverless driver.
T
Table
Core DatabaseThe fundamental data storage structure in a relational database. Organized as rows and columns with a defined schema. Tables can have constraints, indexes, triggers, and policies.
Tablespace
Core DatabaseA named location on disk where PostgreSQL stores data files. Allows distributing tables and indexes across different physical storage devices for performance or capacity management.
Trigger
Core DatabaseA function automatically executed in response to INSERT, UPDATE, DELETE, or TRUNCATE events on a table. Can fire BEFORE, AFTER, or INSTEAD OF the triggering event. Used for auditing, validation, and data synchronization.
Tuple
Core DatabasePostgreSQL's internal term for a row. A heap tuple is the physical on-disk representation of a row, including system columns (xmin, xmax, ctid) used for MVCC.
See also: Row, MVCC
TOAST
PostgreSQLThe Oversized-Attribute Storage Technique — PostgreSQL's mechanism for storing large column values (over ~2KB). Values are compressed and/or split across a separate TOAST table. Transparent to applications.
See also: Compression
Transaction
TransactionsAn atomic unit of work that either fully completes (COMMIT) or fully reverts (ROLLBACK). Transactions maintain ACID properties and provide isolation from concurrent operations.
See also: ACID, Isolation Level
Transaction ID (XID)
TransactionsA 32-bit identifier assigned to each write transaction. Used by MVCC to determine row visibility. Stored in the xmin and xmax system columns of each tuple. Subject to wraparound, managed by VACUUM.
See also: MVCC, Vacuum
Two-Phase Commit (2PC)
TransactionsA protocol for coordinating transaction commits across multiple database instances. Phase 1: all participants prepare (PREPARE TRANSACTION). Phase 2: coordinator issues COMMIT PREPARED or ROLLBACK PREPARED to all participants.
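The two phases map directly onto SQL commands (requires max_prepared_transactions > 0; table and transaction identifier are illustrative):

```sql
-- Phase 1, on each participant: the prepared transaction is persisted
-- and survives crashes until resolved.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
PREPARE TRANSACTION 'txn_42';

-- Phase 2, issued by the coordinator on every participant:
COMMIT PREPARED 'txn_42';   -- or ROLLBACK PREPARED 'txn_42'
```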
Thin Provisioning
Branching & CoWA storage allocation strategy where capacity is assigned on demand rather than pre-allocated. A thin-provisioned volume appears to have a large size but consumes only the space actually written.
The multi-stage anonymization process: (1) entity recognition to classify columns, (2) risk assessment via simulated linkage attacks, (3) DID suppression (redact, mask, fake, or hash), (4) k-member microaggregation for QIDs, (5) distortion analysis measuring information loss. Each stage is configurable per column and per dataset.
See also: Entity Recognition (PII Detection), Microaggregation, Distortion
Thundering Herd
NetworkingA failure pattern where many clients simultaneously retry or reconnect after a service interruption, overwhelming the recovering system. Mitigated by exponential backoff with jitter, connection pooling, and cache locking. Critical at scale — OpenAI prevents this across 50+ PostgreSQL read replicas.
See also: Cache Locking, Connection Pooling
TPS (Transactions Per Second)
MonitoringA measure of database throughput for write-heavy workloads. Commonly benchmarked with pgbench. Affected by transaction isolation level, WAL configuration, and storage performance.
TimescaleDB
ExtensionsA PostgreSQL extension for time-series data. Provides automatic partitioning (hypertables), continuous aggregates, compression, and retention policies optimized for time-series workloads.
Text-to-SQL
AI & AgentsTranslating natural language queries into valid SQL statements. Accuracy depends on schema awareness, join path inference, dialect-specific syntax, and column naming conventions. Modern approaches use LLMs with schema context rather than rule-based parsers. A core capability for database-connected AI agents.
Tool Use (Function Calling)
AI & AgentsThe mechanism by which an LLM invokes external functions during response generation. In database contexts, tools typically wrap SQL execution, schema inspection, migration commands, branch management, and data export operations. The bridge between agent reasoning and database actions.
See also: MCP
U
Unique Constraint
Core DatabaseA constraint ensuring that all values in the specified column(s) are distinct across all rows. NULLs are treated as distinct, so multiple NULLs are allowed by default (PostgreSQL 15 added NULLS NOT DISTINCT to change this). Automatically creates a unique index.
See also: Constraint, Primary Key
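A sketch of NULL handling under a unique constraint (hypothetical tables):

```sql
CREATE TABLE users (email text UNIQUE);
INSERT INTO users VALUES (NULL), (NULL);          -- both succeed: NULLs are distinct
INSERT INTO users VALUES ('a@x.io'), ('a@x.io');  -- second row fails

-- PostgreSQL 15+: treat NULLs as equal for uniqueness
CREATE TABLE codes (code text UNIQUE NULLS NOT DISTINCT);
```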
V
View
Core DatabaseA named query stored in the database and referenced like a table. Views do not store data; they execute their underlying query each time they are accessed. Can be used for access control, simplification, and abstraction.
See also: Materialized View
Vacuum
PostgreSQLA maintenance operation that reclaims storage from dead tuples, updates visibility maps, and refreshes planner statistics. VACUUM runs without exclusive locks; VACUUM FULL rewrites the entire table and requires an exclusive lock.
See also: Dead Tuple, Visibility Map, Bloat
Visibility Map
PostgreSQLA bitmap tracking which heap pages contain only tuples visible to all current transactions. Allows index-only scans to skip heap fetches for pages marked all-visible. Updated by VACUUM.
See also: Vacuum, Index-Only Scan
Versioned Schema
A PostgreSQL schema created during a migration containing views that expose the database tables with the appropriate column names and types for that schema version. Allows multiple application versions to coexist.
See also: pgroll, Expand / Contract Pattern
Vertical Scaling
Cloud & DeploymentIncreasing database capacity by upgrading to a larger instance with more CPU, RAM, or faster storage. PostgreSQL scales vertically to large instance sizes — OpenAI runs single PostgreSQL primaries handling millions of queries per second. Simple but has upper limits.
See also: Horizontal Scaling
W
work_mem
PostgreSQLPer-operation memory limit for internal sort operations, hash tables, and bitmap index scans. Set per session/query. Too low causes disk spills to temporary files; too high risks memory exhaustion under concurrency.
Window Function
SQLA function that performs a calculation across a set of rows related to the current row without collapsing them into a single output row. Examples: ROW_NUMBER(), RANK(), LAG(), LEAD(), SUM() OVER(). Defined with the OVER() clause.
See also: Aggregate Function
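A sketch contrasting window functions with plain aggregation (hypothetical orders table); every input row appears in the output, annotated with per-partition results:

```sql
SELECT customer_id,
       amount,
       ROW_NUMBER() OVER (PARTITION BY customer_id
                          ORDER BY amount DESC) AS rank_in_customer,
       SUM(amount)  OVER (PARTITION BY customer_id) AS customer_total
FROM orders;
```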
WAL (Write-Ahead Log)
WALThe mechanism ensuring data durability by writing all changes to a sequential log before applying them to data files. On crash recovery, PostgreSQL replays the WAL from the last checkpoint to restore consistency.
See also: LSN, Checkpoint, WAL Level
WAL Level
WALConfiguration controlling how much information is written to WAL. Levels: minimal (crash recovery only), replica (physical replication, default), logical (logical decoding and replication).
See also: Logical Decoding
WAL Segment
WALA fixed-size file (default 16MB) in the pg_wal directory containing sequential WAL records. Segments are recycled after checkpoint and archiving. Segment size is configurable at cluster initialization.
wal2json
CDCA PostgreSQL output plugin for logical decoding that formats WAL changes as JSON objects. Each change includes the table name, operation type, column names, types, and values.
See also: Output Plugin, CDC
Write Amplification
StorageThe ratio of actual physical writes to logical writes requested by the application. Caused by journaling, MVCC versioning, WAL, index maintenance, and CoW operations.
Wire Protocol
NetworkingThe message-based protocol PostgreSQL uses for client-server communication over TCP. Defines message formats for authentication, query execution, result transmission, and error reporting. Increasingly adopted as a compatibility target by non-PostgreSQL databases (CockroachDB, YugabyteDB, Aurora, AlloyDB, QuestDB, ClickHouse).
See also: Wire Protocol Compatibility
Wire Protocol Compatibility
NetworkingBuilding non-PostgreSQL databases that accept PostgreSQL client connections by implementing the PostgreSQL wire protocol. Allows reuse of existing drivers, ORMs, and tooling. A major industry trend — over 20 databases now advertise PostgreSQL wire protocol support.
See also: Wire Protocol
X
Xata Agent
An AI-powered PostgreSQL monitoring and optimization tool. Analyzes query performance, suggests index improvements, and identifies configuration opportunities.
Xata CLI
Command-line tool for managing Xata projects, branches, migrations, and replication. Commands include xata init, xata branch create, xata clone, xata stream, and xata roll.
The open-source core of the Xata platform implementing copy-on-write database branching on standard PostgreSQL. Enables instant branch creation, delta-only storage, and scale-to-zero compute.
See also: Branch, Copy-on-Write
Z
Zero-Downtime Migration
Schema MigrationsA schema migration that completes without application downtime or degraded performance. Achieved through techniques like expand/contract, column duplication, background backfilling, and versioned views.
See also: Expand / Contract Pattern, pgroll
References
- PostgreSQL Official Documentation
- Pavlo (2026) — 2025 Databases Retrospective
- NVMe Specification
- HHS HIPAA De-identification Guidance
- Sweeney (2000) — Simple Demographics Often Identify People Uniquely
- pgroll — Zero-downtime schema migrations
- pgstream — PostgreSQL CDC and streaming
- pgzx — PostgreSQL extensions in Zig
- CloudNativePG
- Model Context Protocol (MCP) Specification