Database Infrastructure Glossary

A structured reference for every term in modern database infrastructure — built for AI agents, coding assistants, and the humans who work with them. Vendor-neutral definitions. Xata-specific terms clearly labeled.

Maintained by

Xata

258 terms across 26 categories

Last updated February 2026

Agent context files

Download these files as context for AI coding agents, LLM tool definitions, CLAUDE.md skills, or offline reference. Same content as this page in a flat, parseable format optimized for token efficiency.

Also referenced in /llms.txt


A

ACID

Core Database

The four properties guaranteeing reliable transaction processing: Atomicity (all or nothing), Consistency (valid state transitions), Isolation (concurrent transactions don't interfere), Durability (committed data survives failures). Coined by Theo Härder and Andreas Reuter in 1983.

See also: Transaction, BASE
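A minimal sketch of atomicity in PostgreSQL, assuming a hypothetical accounts table: either both updates take effect, or neither does.

```sql
-- Transfer 100 between two accounts; COMMIT makes both changes
-- durable together, ROLLBACK (or a crash) discards both.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
```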

Async I/O

PostgreSQL

Asynchronous input/output allowing PostgreSQL to issue multiple storage requests concurrently rather than waiting for each to complete sequentially. Implemented in PostgreSQL 18 (2025) via io_method=io_uring on Linux and io_method=worker on other platforms. Significantly improves throughput on NVMe and SSD storage.

See also: NVMe-oF

Aggregate Function

SQL

A function that computes a single result from a set of rows: COUNT, SUM, AVG, MIN, MAX, ARRAY_AGG, STRING_AGG. Used with GROUP BY to compute per-group results.

See also: Window Function
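A minimal sketch, assuming a hypothetical orders table with customer_id and amount columns:

```sql
-- One result row per customer: order count and total spend.
SELECT customer_id, COUNT(*) AS orders, SUM(amount) AS total
FROM orders
GROUP BY customer_id;
```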

Advisory Lock

Transactions

An application-controlled lock managed by PostgreSQL but not tied to any table or row. Acquired and released explicitly by application code. Used for coordinating distributed processes or preventing duplicate work.
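A minimal sketch using PostgreSQL's session-level advisory lock functions, keyed by an arbitrary application-chosen integer (42 here is illustrative):

```sql
SELECT pg_advisory_lock(42);      -- blocks until the lock is acquired
-- ... perform work that must not run concurrently ...
SELECT pg_advisory_unlock(42);    -- release explicitly

-- Non-blocking variant: returns false instead of waiting.
SELECT pg_try_advisory_lock(42);
```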

Asynchronous Replication

Replication & HA

Replication where the primary does not wait for replicas to confirm receipt of changes before committing. Offers lower latency but allows a small window of potential data loss on primary failure.

See also: Synchronous Replication

Anonymization

Privacy & Anonymization

The process of transforming data to prevent identification of individuals while preserving the data's utility for analysis, testing, or development. Distinct from pseudonymization, which is reversible.

See also: Pseudonymization, De-identification

Association Distortion

Privacy & Anonymization (Xata)

A metric measuring how well inter-column relationships are preserved after anonymization. Uses NPMI (Normalized Pointwise Mutual Information) to compare column pair correlations before and after treatment. Critical for maintaining the analytical value of anonymized datasets.

See also: Column Distortion, NPMI

Agent Isolation

AI & Agents (Xata)

Providing each AI coding agent with its own isolated database environment (branch) to prevent agent-generated mutations from affecting production or other agents' work. Branching provides governance by architecture rather than by policy.

See also: Branch, Ephemeral Environment, Agent Database Provisioning

Agent Database Provisioning

AI & Agents

Automatically creating isolated database environments when AI agents start work and cleaning them up when done. Requires fast provisioning (seconds, not minutes), production-representative data, and low per-environment cost. Copy-on-write branching makes this economically viable — agents now create the majority of new database instances at scale.

See also: Agent Isolation, Branch, Scale to Zero

AGENTS.md

AI & Agents (Xata)

A convention file placed in a repository root that instructs AI coding agents on project setup, database configuration, and safe development practices. Specifies branch creation, connection strings, migration commands, and cleanup steps. Part of the emerging standard for agent-aware infrastructure.

See also: llms.txt

AI Coding Agent

AI & Agents

Software that autonomously generates, tests, and deploys code with minimal human intervention. Examples: Cursor, GitHub Copilot, Devin, Claude Code, Windsurf. Agents create database schema changes, run migrations, execute tests, and may operate multiple concurrent database sessions — all requiring isolated, writable database access with real data.

A software development workflow where AI agents perform substantial portions of coding, testing, and debugging tasks. Database demand scales with machine activity, not headcount. Requires infrastructure providing: isolated environments per agent, fast provisioning, production-representative data, and automatic cleanup.

B

BASE

Core Database

An alternative consistency model to ACID used in distributed systems: Basically Available, Soft state, Eventually consistent. Common in NoSQL and distributed databases where availability is prioritized over immediate consistency.

See also: ACID, Eventual Consistency

Backend Process

PostgreSQL

A server process spawned by the postmaster to handle a single client connection. Each backend has its own memory space (work_mem, temp_buffers) and communicates with other backends through shared memory.

Background Worker

PostgreSQL

A process registered by an extension to perform work independently of client connections. Used for periodic maintenance, monitoring, custom replication, or application logic.

B-tree

Indexing

The default index type in PostgreSQL. A balanced tree structure supporting equality and range queries. Maintains sorted order. Optimal for high-cardinality columns.

See also: Index, GIN, GiST

BRIN

Indexing

An index storing summary information (min/max) for ranges of physical table blocks. Extremely compact for large, naturally ordered tables (e.g., time-series data). Not useful for randomly ordered data.
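A minimal sketch, assuming a hypothetical append-only events table whose rows are inserted in created_at order:

```sql
-- A BRIN index on a naturally ordered timestamp column is tiny
-- compared with a B-tree on the same column.
CREATE INDEX events_created_brin ON events USING brin (created_at);
```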

Branch

Branching & CoW (Xata)

A logically independent copy of a database created through copy-on-write snapshotting. Branches share unchanged data blocks with their parent, making creation near-instant regardless of database size. Each branch is a fully writable, standard PostgreSQL instance.

See also: Copy-on-Write, Base Branch, Child Branch

Base Branch

Branching & CoW (Xata)

The source branch from which child branches are created. Typically the "main" or "production" branch. The base branch's data serves as the starting point for all derived branches.

See also: Branch, Child Branch

Backfill

Schema Migrations

The process of populating a new or transformed column with values derived from existing data. During zero-downtime migrations, backfilling runs in batches to avoid long-running locks.

Baseline Migration

Schema Migrations

An initial migration that captures the current state of an existing database schema without applying changes. Establishes the starting point for version-controlled schema management.

Base Backup

Backup & Recovery

A full physical copy of a PostgreSQL cluster's data directory, taken while the server is running. Combined with WAL archives, enables point-in-time recovery. Created with pg_basebackup.

See also: PITR, Continuous Archiving

BYOC (Bring Your Own Cloud)

Cloud & Deployment (Xata)

A deployment model where the vendor's software runs in the customer's own cloud account or infrastructure. The customer retains control of their data plane while the vendor manages the control plane.

See also: Control Plane, Data Plane

Bloat

Monitoring

Wasted space in tables and indexes caused by dead tuples, unused pages, and fragmentation. Measured as the ratio of actual size to minimum possible size. Resolved by VACUUM, REINDEX, or pg_repack.

See also: Vacuum, Dead Tuple

C

CAP Theorem

Core Database

States that a distributed data store can guarantee at most two of three properties simultaneously: Consistency (every read returns the most recent write), Availability (every request receives a response), Partition tolerance (the system continues operating despite network partitions). Formulated by Eric Brewer in 2000.

See also: Partition Tolerance, Eventual Consistency

Cardinality

Core Database

The number of distinct values in a column relative to total rows. High cardinality (many unique values, e.g., email addresses) vs. low cardinality (few unique values, e.g., boolean flags). Affects index selection and query optimization.

Catalog

Core Database

System tables maintained by the database engine containing metadata about all database objects: tables, columns, data types, functions, indexes, constraints, and permissions. In PostgreSQL, accessible via the pg_catalog schema.

Column

Core Database

A named attribute of a table with a defined data type. Each row in the table contains one value for each column. Also called a "field" or "attribute" in relational theory.

Composite Type

Core Database

A data type composed of multiple named fields, each with its own type. In PostgreSQL, every table automatically defines a composite type with the same name and structure.

Constraint

Core Database

A rule enforced by the database to maintain data integrity. Types include: PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, NOT NULL, and EXCLUDE. Constraints are evaluated on every insert, update, or delete that affects the constrained columns.

See also: Primary Key, Foreign Key, Unique Constraint
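A minimal sketch combining several constraint types on a hypothetical products table:

```sql
CREATE TABLE products (
    id    INTEGER PRIMARY KEY,          -- unique, not null
    sku   TEXT NOT NULL UNIQUE,         -- required and distinct
    price NUMERIC CHECK (price > 0)     -- rejects non-positive prices
);
```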

Checkpoint

PostgreSQL

A periodic operation where PostgreSQL writes all dirty (modified) pages from shared buffers to disk and creates a WAL checkpoint record. Ensures that recovery only needs to replay WAL from the last checkpoint. Controlled by checkpoint_timeout and max_wal_size.

See also: WAL, Shared Buffers

CTID

PostgreSQL

Tuple identifier — a pair (block number, offset) pointing to the physical location of a row on disk. Changes when a row is updated (due to MVCC creating a new tuple version). Not stable across vacuums.

See also: Tuple, MVCC

CTE (Common Table Expression)

SQL

A temporary named result set defined with the WITH clause. CTEs improve readability and can be recursive. In PostgreSQL 12+, non-recursive CTEs can be inlined by the optimizer.
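A minimal sketch, assuming a hypothetical orders table:

```sql
-- "recent" exists only for the duration of this query.
WITH recent AS (
    SELECT * FROM orders
    WHERE created_at > now() - interval '7 days'
)
SELECT customer_id, COUNT(*) AS orders_last_week
FROM recent
GROUP BY customer_id;
```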

Covering Index

Indexing

An index containing all columns needed to satisfy a query, allowing the executor to return results directly from the index without accessing the heap table. Created with INCLUDE clause in PostgreSQL 11+.

See also: Index-Only Scan
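A minimal sketch, assuming a hypothetical users table: email is the searchable key, and the INCLUDE columns are carried as non-key payload so lookups can be served from the index alone.

```sql
CREATE INDEX users_email_idx
    ON users (email)
    INCLUDE (first_name, last_name);  -- PostgreSQL 11+
```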

Continuous Archiving

Backup & Recovery

The practice of continuously archiving WAL segments as they are completed. Combined with a base backup, enables point-in-time recovery to any moment within the archive retention period.

See also: WAL Archiving, PITR

CDC (Change Data Capture)

Replication & HA

The practice of capturing row-level changes (inserts, updates, deletes) from a database and delivering them to downstream systems in near-real-time. Implemented via WAL logical decoding in PostgreSQL.

See also: Logical Decoding, pgstream

Child Branch

Branching & CoW (Xata)

A branch derived from a parent/base branch. Inherits all data at the moment of creation. Subsequent writes to either the parent or child do not affect the other.

See also: Branch, Base Branch

Clone

Branching & CoW

A writable copy of a filesystem, volume, or dataset created from a snapshot. Clones share all data blocks with the source snapshot until modified. In ZFS: zfs clone snapshot@name clone/name.

See also: Snapshot, Copy-on-Write

Copy-on-Write (CoW)

Branching & CoW

A storage-level optimization where a copy operation does not duplicate data immediately. Instead, the copy shares the same physical data blocks as the source. Blocks are duplicated only when one copy modifies them. Enables instant, space-efficient database branching.

See also: Branch, Delta Storage, ZFS

A migration technique where a new column is created alongside the existing one, data is backfilled, and triggers synchronize both columns until all applications have migrated.

See also: pgroll, Backfill

Column Distortion

Privacy & Anonymization (Xata)

A metric measuring how much a single column's value distribution changed after anonymization. Computed by comparing frequency distributions (categorical) or kernel density estimates (numeric) between original and treated data. Lower distortion indicates better data utility preservation.

See also: Distortion, NPMI

Columnar Storage

Storage

A storage format where data is organized by column rather than by row. Efficient for analytical queries that scan a few columns across many rows. Examples: Parquet, pg_duckdb, cstore_fdw.

See also: OLAP

Compression

Storage

Reducing data size on disk. PostgreSQL supports TOAST compression (pglz, lz4). Storage layers may add transparent compression (ZFS lz4, zstd). Columnar formats achieve high compression through column-wise encoding.

See also: TOAST

Connection Pooling

Networking

Sharing a pool of database connections across multiple application processes. Reduces the overhead of establishing new connections. Tools: PgBouncer, pgpool-II, built-in application-level pools.

See also: PgBouncer

Connection String

Networking

A URI or key-value format specifying database connection parameters: host, port, database name, user, password, SSL mode. PostgreSQL format: postgresql://user:password@host:port/dbname?sslmode=require.

Cache Locking

Networking

A concurrency pattern where only one process recomputes an expired cache entry while others wait or serve stale data. Prevents thundering herd on cache expiry. Also called cache stampede prevention. Essential for high-QPS database workloads.

See also: Thundering Herd

Connection Warmup

Networking

Pre-establishing database connections before they are needed to avoid latency spikes from cold connection pools. OpenAI reduced PostgreSQL connection establishment overhead from 50ms to 5ms through connection warmup optimization.

See also: Connection Pooling

CloudNativePG

Kubernetes

An open-source Kubernetes operator for managing PostgreSQL clusters. Handles provisioning, HA, failover, backups, scaling, and rolling updates. Uses custom resource definitions (CRDs) to define cluster specifications.

See also: Operator

Cache Hit Ratio

Monitoring

The fraction of data requests served from shared buffers without reading from disk. A healthy ratio is typically above 99% for OLTP workloads.

See also: Shared Buffers

Citus

Extensions

A PostgreSQL extension that distributes tables across multiple nodes for horizontal scaling. Supports distributed queries, multi-tenant schemas, and real-time analytics. Maintained by Microsoft.

Consensus

Distributed Systems

A protocol by which distributed nodes agree on a single value or sequence of values. Used for leader election, log replication, and configuration management. Algorithms: Raft, Paxos, Zab.

CI/CD

Developer Tools

Continuous Integration / Continuous Delivery — automated processes that build, test, and deploy code changes. In database infrastructure: schema migrations run in CI pipelines, branched databases provide isolated test environments.

Connection Saturation

Scaling & Performance

The state where all available database connections are in use, causing new requests to queue or fail. A common bottleneck in high-concurrency environments, especially with agent workloads that open many parallel connections. Addressed by connection pooling, query optimization, and per-role connection limits.

See also: Connection Pooling, PgBouncer

Context Window

AI & Agents

The maximum number of tokens an LLM can process in a single request. Determines how much schema metadata, query results, error messages, and instructions an agent can consider simultaneously. Larger context windows enable more complex database interactions but increase latency and cost.

Cell

Xata Platform (Xata)

A regional deployment unit in the Xata data plane. Each cell runs a CloudNativePG operator managing PostgreSQL cluster lifecycle, provisioning, scaling, backups, and high availability.

See also: Data Plane, CloudNativePG

Control Plane

Xata Platform (Xata)

The centralized Xata service handling API access, business logic, user/organization management, project configuration, and branch metadata. Region-independent.

See also: Data Plane

D

DDL (Data Definition Language)

SQL

SQL statements that define or modify database structure: CREATE, ALTER, DROP, TRUNCATE, COMMENT, RENAME. DDL changes are typically auto-committed and may acquire exclusive locks.

See also: DML

DML (Data Manipulation Language)

SQL

SQL statements that read or modify data within existing structures: SELECT, INSERT, UPDATE, DELETE, MERGE. DML operations run within transactions.

See also: DDL, Transaction

Data Type

Core Database

The classification of a column's allowed values. Determines storage format, valid operations, and comparison behavior. Common types: INTEGER, TEXT, BOOLEAN, TIMESTAMP, UUID, JSONB, NUMERIC, BYTEA.

Database

Core Database

A named collection of schemas, tables, and other objects managed by a database server. In PostgreSQL, a single cluster can host multiple databases, each isolated with its own catalog.

Default Value

Core Database

An expression automatically evaluated and stored when a row is inserted without an explicit value for that column. Can be a literal, a function call (e.g., now()), or a sequence reference.

Domain

Core Database

A user-defined data type based on an existing type with additional constraints. Useful for enforcing business rules (e.g., a positive_integer domain that wraps INTEGER with a CHECK (value > 0) constraint).
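A minimal sketch of the positive_integer example above (note that inside a domain definition the current value is referenced with the VALUE keyword):

```sql
CREATE DOMAIN positive_integer AS INTEGER CHECK (VALUE > 0);

-- The domain is then usable anywhere a type is expected.
CREATE TABLE line_items (
    quantity positive_integer NOT NULL
);
```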

Dead Tuple

PostgreSQL

A row version that is no longer visible to any active transaction but has not yet been reclaimed by VACUUM. Accumulation of dead tuples leads to table bloat and degraded performance.

See also: Vacuum, Bloat

Deadlock

Transactions

A situation where two or more transactions each hold a lock the other needs, creating a circular dependency. PostgreSQL detects deadlocks automatically and aborts one transaction to break the cycle.

See also: Lock

Debezium

Replication & HA

An open-source distributed platform for CDC built on Kafka Connect. Captures changes from PostgreSQL, MySQL, MongoDB, and other databases. Uses PostgreSQL's logical decoding API.

See also: CDC

Delta Storage

Branching & CoW

Storage consumed by branch modifications. In a CoW system, a branch's storage cost equals the delta (changed blocks) relative to its source. Idle branches with no modifications consume near-zero additional storage.

See also: Copy-on-Write

Data Masking

Privacy & Anonymization

Replacing sensitive data values with obfuscated versions that preserve format and length. Examples: john@example.com becomes j***@e******.com, phone number 555-1234 becomes XXX-XXXX.

See also: Anonymization, Redaction

De-identification

Privacy & Anonymization

The process of removing or transforming personally identifiable information from a dataset so that individuals cannot be reasonably identified. Regulatory frameworks (HIPAA, GDPR) define specific requirements.

See also: HIPAA, GDPR

Differential Privacy

Privacy & Anonymization

A mathematical framework providing provable privacy guarantees. Adds calibrated noise to query results or data, ensuring that the presence or absence of any individual's data does not significantly change the output.

DID (Direct Identifier)

Privacy & Anonymization

A field that alone identifies a person: name, Social Security number, email address, phone number, medical record number. Typically suppressed (removed, masked, or faked) during anonymization.

See also: QID, PII

Data Page

Storage

The fundamental I/O unit in PostgreSQL: an 8KB block containing row data, free space information, and item pointers. Tables and indexes are organized as sequences of pages.

An architecture separating database compute (query processing) from storage (data persistence) into independently scalable layers. Enables scaling reads without duplicating storage and scaling storage without adding compute. Used by Neon, Aurora, AlloyDB, and Xata. A defining trend in modern PostgreSQL infrastructure.

See also: Scale to Zero, Copy-on-Write

Distributed PostgreSQL

Distributed Systems

Extending PostgreSQL to operate across multiple nodes for horizontal scalability. Approaches include: transparent sharding middleware (Citus, PgDog, Multigres), consensus-based NewSQL (CockroachDB, YugabyteDB), and cloud-native multi-writer (Aurora Limitless). An active area of development in 2025-2026 with multiple new entrants.

See also: Sharding, Wire Protocol Compatibility

dbt

Analytics

An open-source tool for transforming data in warehouses using SQL. Manages a DAG of SQL models, handles dependencies, runs tests, and generates documentation.

Database Migration Tool

Schema Migrations

Software that manages versioned schema changes. Tracks which migrations have been applied and applies pending ones in order. Examples: pgroll, Alembic, Flyway, Liquibase, golang-migrate, Prisma Migrate.

See also: Migration, pgroll

Database Consolidation

Scaling & Performance

The industry trend of fewer, larger database providers acquiring smaller ones. In 2025: Databricks acquired Neon, EDB consolidated several PostgreSQL companies. Impacts vendor selection — open-source core and standard PostgreSQL reduce consolidation risk by ensuring portability.

The iterative cycle where an AI agent reads schema → generates SQL or migration → executes against a database → observes results → adjusts and repeats. Each iteration requires a writable, isolated database with production-representative data. The speed and cost of this loop determines agent productivity.

Data Plane

Xata Platform (Xata)

The component containing PostgreSQL clusters and the SQL gateway. Deployed per region/cell. Handles actual database operations. In BYOC deployments, the data plane runs in the customer's infrastructure.

See also: Control Plane, BYOC

E

Enum Type

Core Database

A user-defined data type consisting of a static, ordered set of values. Example: CREATE TYPE status AS ENUM ('active', 'inactive', 'suspended'). Values are stored as integers internally but displayed as labels.

Extension

PostgreSQL

A packaged unit of additional functionality that can be installed into a PostgreSQL database. Extensions can add types, functions, operators, indexes, and background workers. Managed with CREATE EXTENSION / DROP EXTENSION.

See also: pgroll, pgvector, PostGIS

EXPLAIN / EXPLAIN ANALYZE

SQL

Diagnostic commands showing the query execution plan. EXPLAIN shows the estimated plan; EXPLAIN ANALYZE executes the query and shows actual timings, row counts, and buffer usage.

See also: Query Plan
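A minimal sketch, assuming a hypothetical orders table. Note that EXPLAIN ANALYZE actually executes the statement, so wrap writes in a transaction you roll back.

```sql
-- ANALYZE runs the query and reports real timings;
-- BUFFERS adds shared-buffer hit/read counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
```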

Event Sourcing

Analytics

An architectural pattern where state changes are stored as a sequence of immutable events rather than mutable current-state records. CDC streams can be used to derive event sourcing patterns from traditional databases.

Ephemeral Environment

Branching & CoW

A short-lived database instance (typically a branch) created for a specific task — CI test run, developer feature work, AI agent session — and destroyed when done. Copy-on-write branching makes ephemeral environments economically viable at scale.

Expand/Contract

Schema Migrations

A two-phase migration strategy: Expand adds the new schema elements alongside the old ones. Contract removes the old elements after all applications have migrated to the new schema. Enables zero-downtime changes.

See also: Zero-Downtime Migration, pgroll

Equivalence Class

Privacy & Anonymization

A set of records that share identical values on all quasi-identifier columns after anonymization. In a k-anonymous dataset, every equivalence class contains at least k records.

See also: k-Anonymity, QID

Entity Recognition (PII Detection)

Privacy & Anonymization (Xata)

Automated classification of database columns as Direct Identifiers (DIDs), Quasi-Identifiers (QIDs), or Safe using NLP and pattern matching. Privacy Dynamics uses Microsoft Presidio with custom recognizers covering all 18 HIPAA identifier families plus additional types (coordinates, advertising IDs, license plates).

See also: DID, QID, Presidio

Eventual Consistency

Distributed Systems

A consistency model where all replicas converge to the same state given sufficient time without new updates. Reads may temporarily return stale data.

See also: BASE, CAP Theorem

ELT

Analytics

Extract, Load, Transform — a data pipeline pattern where data is first loaded into a warehouse, then transformed using the warehouse's compute.

See also: ETL

ETL

Analytics

Extract, Transform, Load — a data pipeline pattern where data is extracted from sources, transformed in an intermediate system, then loaded into the target.

See also: ELT

F

Foreign Key

Core Database

A constraint linking one or more columns in a table to the primary key or unique columns of another table. Enforces referential integrity. Supports cascading actions: CASCADE, SET NULL, SET DEFAULT, RESTRICT, NO ACTION.

See also: Primary Key, Constraint
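A minimal sketch, assuming a hypothetical customers table with an id primary key:

```sql
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
        REFERENCES customers (id)
        ON DELETE CASCADE   -- deleting a customer removes their orders
);
```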

Failover

Replication & HA

The process of promoting a standby/replica to primary when the current primary becomes unavailable. Can be automatic (via Patroni, pg_auto_failover, CloudNativePG) or manual.

See also: Promotion, Standby

Fair Aggregation

Privacy & Anonymization (Xata)

A randomized aggregation method used during microaggregation that selects median or mode values with controlled randomness to prevent statistical bias and information leakage. Unlike deterministic aggregation, fair aggregation prevents adversaries from inferring the original value distribution within an equivalence class.

See also: Microaggregation, Equivalence Class

G

GUC (Grand Unified Configuration)

PostgreSQL

PostgreSQL's configuration system managing all server parameters. Settings can be applied at server, database, role, session, or transaction level. Parameters have contexts (e.g., SUSET, USERSET) controlling who can change them.

GIN (Generalized Inverted Index)

Indexing

An index type for values containing multiple elements: arrays, JSONB, full-text search vectors, and trigrams. Maps each element to the rows containing it. Excellent for containment queries (@>, ?, @@).

See also: B-tree, pg_trgm
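A minimal sketch, assuming a hypothetical docs table with a JSONB payload column:

```sql
CREATE INDEX docs_payload_gin ON docs USING gin (payload);

-- Containment query served by the GIN index.
SELECT * FROM docs WHERE payload @> '{"status": "active"}';
```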

GiST (Generalized Search Tree)

Indexing

A balanced tree index supporting complex data types: geometric shapes, ranges, full-text search, and nearest-neighbor queries. Extensible — custom data types can define their own GiST operator classes.

See also: PostGIS, SpGiST

GDPR

Privacy & Anonymization

General Data Protection Regulation — European Union regulation governing personal data protection. Requires lawful basis for processing personal data, data minimization, purpose limitation, and rights to access, rectification, and erasure. Effective May 2018.

Geographic-Aware Clustering

Privacy & Anonymization (Xata)

Microaggregation clustering that uses geographic coordinates (latitude/longitude) from GeoNames reference data to group records by physical proximity rather than raw ZIP code values. Produces more meaningful anonymized locations. Linked column hierarchies keep city, state, and ZIP code values consistent within each cluster.

See also: Microaggregation, Linked Column Hierarchy

H

Heap

PostgreSQL

The default table storage method in PostgreSQL. Rows are stored unordered in 8KB pages. The heap is accessed via sequential scans or through index lookups that reference heap pages by CTID.

See also: Data Page, CTID

HOT Update

PostgreSQL

Heap-Only Tuple update — an optimization where an updated row can be stored on the same heap page as the original, avoiding index updates. Requires that no indexed columns changed and the page has free space.

See also: Fillfactor

Hash Index

Indexing

An index using a hash function for exact-match lookups only. Faster than B-tree for simple equality queries on high-cardinality columns. Does not support range queries. WAL-logged since PostgreSQL 10.

Hot Standby

Replication & HA

A standby server that accepts read-only queries while continuously applying WAL from the primary. Enables read scaling and reporting workloads without impacting the primary.

See also: Standby, Read Replica

Hybrid Database Strategy

Replication & HA

Using PostgreSQL alongside specialized databases for different workload patterns. Example: PostgreSQL for read-heavy OLTP, sharded systems for write-heavy workloads. Chosen when a single system cannot optimally serve all access patterns. OpenAI uses this approach at scale.

See also: Horizontal Scaling

HIPAA

Privacy & Anonymization

Health Insurance Portability and Accountability Act — US federal law governing protected health information (PHI). The Privacy Rule defines 18 identifier types that must be addressed for de-identification. Two methods: Safe Harbor and Expert Determination.

See also: Safe Harbor, Expert Determination

Horizontal Scaling

Cloud & Deployment

Increasing database capacity by adding more nodes. Methods include read replicas (for read scaling), sharding (for write scaling), and distributed PostgreSQL (for both). More operationally complex than vertical scaling but removes single-node limits.

See also: Vertical Scaling, Read Replica, Distributed PostgreSQL

Helm

Kubernetes

A package manager for Kubernetes. Helm charts define templated Kubernetes manifests for deploying applications. Used for installing database operators, monitoring stacks, and application workloads.

HTAP (Hybrid Transactional/Analytical Processing)

Analytics

Database workloads combining OLTP and OLAP in a single system. Eliminates the need to replicate data to a separate analytical store. Approaches: real-time materialized views, columnar extensions (pg_duckdb), or dual-format storage. Growing in importance as agents need both transactional and analytical access.

See also: OLTP, OLAP

Hibernation

Xata Platform (Xata)

The paused state of an idle branch where compute resources are released but data remains on disk. Branches automatically hibernate after a configurable inactivity period (default 30 minutes) and resume on the next connection.

See also: Scale to Zero

I

Index

Core Database

A data structure maintained alongside a table to accelerate lookups on specific columns. Trades write performance and storage for faster reads.

See also: B-tree, GIN, GiST

Index-Only Scan

Indexing

A query execution strategy where all required data is retrieved from the index without touching the heap table. Requires the index to cover all columns in the query and the visibility map to confirm all-visible pages.

See also: Covering Index, Visibility Map

Isolation Level

Transactions

The degree to which a transaction is protected from concurrent modifications by other transactions. PostgreSQL supports: READ COMMITTED (default), REPEATABLE READ, and SERIALIZABLE.

See also: MVCC, SSI
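A minimal sketch of requesting a stricter isolation level for one transaction:

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- Queries here behave as if this transaction ran alone;
-- serialization failures (SQLSTATE 40001) must be retried
-- by the application.
SELECT SUM(balance) FROM accounts;
COMMIT;
```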

Infrastructure as Code (IaC)

Cloud & Deployment

Managing cloud infrastructure through declarative configuration files rather than manual processes. Tools: Terraform, Pulumi, AWS CDK, CloudFormation. Enables version control, review, and reproducibility.

IOPS

Monitoring

Input/Output Operations Per Second — a measure of storage throughput. Read IOPS and write IOPS are typically reported separately. Database workload characteristics determine IOPS requirements.

io_uring

NVMe & Storage

A Linux kernel interface for asynchronous I/O. Uses submission and completion ring buffers shared between userspace and kernel. Provides significantly lower overhead than traditional read/write system calls.

J

JOIN

SQL

An operation combining rows from two or more tables based on a related column. Types: INNER JOIN (matching rows only), LEFT JOIN (all left + matching right), RIGHT JOIN, FULL JOIN (all from both), CROSS JOIN (Cartesian product).
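A minimal sketch, assuming hypothetical customers and orders tables:

```sql
-- LEFT JOIN keeps every customer; customers without orders
-- get NULL in the order columns.
SELECT c.name, o.amount
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id;
```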

K

k-Anonymity

Privacy & Anonymization

A privacy property where every record in a dataset is indistinguishable from at least k-1 other records on quasi-identifier values. Prevents singling out individuals through quasi-identifier combinations. Introduced by Latanya Sweeney in 2002.

See also: Equivalence Class, l-Diversity, Microaggregation
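A minimal sketch of checking k-anonymity in SQL, assuming a hypothetical anonymized_patients table whose quasi-identifiers are zip_code, gender, and birth_year:

```sql
-- Any rows returned are equivalence classes that violate 5-anonymity.
SELECT zip_code, gender, birth_year, COUNT(*) AS class_size
FROM anonymized_patients
GROUP BY zip_code, gender, birth_year
HAVING COUNT(*) < 5;
```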

L

Lock

Transactions

A mechanism preventing concurrent access conflicts. PostgreSQL uses multiple lock types: row-level (shared/exclusive), table-level (eight modes from ACCESS SHARE to ACCESS EXCLUSIVE), advisory, and predicate locks.

See also: Deadlock, Advisory Lock

Logical Decoding

Replication & HA

PostgreSQL's mechanism for extracting row-level changes from WAL in a logical format. Foundation for logical replication, CDC, and tools like pgstream. Requires wal_level = logical.

See also: Logical Replication, CDC, Output Plugin

Logical Replication

Replication & HA

A replication method that captures and replays DML operations (INSERT, UPDATE, DELETE) at the row level. Supports selective replication of specific tables, cross-version replication, and conflict resolution. Uses publications and subscriptions.

See also: Physical Replication, Publication, Subscription
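A minimal sketch of the publication/subscription pair, assuming hypothetical table and connection names and wal_level = logical on the publisher:

```sql
-- On the publisher:
CREATE PUBLICATION app_pub FOR TABLE users, orders;

-- On the subscriber:
CREATE SUBSCRIPTION app_sub
    CONNECTION 'host=primary dbname=app user=repl'
    PUBLICATION app_pub;
```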

LSN (Log Sequence Number)

Replication & HA

A 64-bit integer identifying a specific byte position in the WAL stream. Format: 16/B374D848. Used to track replication progress, recovery targets, and WAL consumption. Monotonically increasing.

See also: WAL, Replication Slot
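
The two hex halves encode the high and low 32 bits of the byte position, so LSN arithmetic (what pg_wal_lsn_diff does server-side) is plain integer math. A sketch:

```python
def parse_lsn(lsn: str) -> int:
    """'16/B374D848' -> absolute WAL byte position (high32/low32, hex)."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def lsn_diff(a: str, b: str) -> int:
    """Bytes of WAL between two positions, like pg_wal_lsn_diff(a, b)."""
    return parse_lsn(a) - parse_lsn(b)
```

Useful for computing replication lag in bytes from positions reported by pg_stat_replication.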

Linkage Attack

Privacy & Anonymization

An attack that re-identifies individuals by joining anonymized data with external datasets on shared quasi-identifiers. Sweeney (2000) demonstrated that 87% of the US population is uniquely identifiable by {ZIP code, gender, date of birth}. Risk assessment simulates these attacks using Monte Carlo methods with synthetic adversary datasets.

See also: QID, Re-identification Risk, Risk Score
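
A toy linkage attack, joining an "anonymized" table to a public dataset on shared quasi-identifiers (names and fields are illustrative; real assessments run many simulated adversaries):

```python
from collections import Counter

def linkage_attack(anonymized, external, qids):
    """Count records whose quasi-identifier combination matches exactly
    one identity in the external dataset -- those are re-identified."""
    key = lambda r: tuple(r[q] for q in qids)
    ext_counts = Counter(key(r) for r in external)
    return sum(1 for r in anonymized if ext_counts[key(r)] == 1)

medical = [{"zip": "02139", "dob": "1980-01-02", "gender": "F", "dx": "flu"}]
voter_roll = [
    {"name": "Alice", "zip": "02139", "dob": "1980-01-02", "gender": "F"},
    {"name": "Bob",   "zip": "02139", "dob": "1975-05-05", "gender": "M"},
]
hits = linkage_attack(medical, voter_roll, ["zip", "dob", "gender"])
```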

Linked Column Hierarchy

Privacy & Anonymization, Xata

A configuration specifying that certain columns have a hierarchical relationship (e.g., city → state → ZIP code) and must be treated together during anonymization. Ensures that after microaggregation, geographic values remain internally consistent rather than producing impossible combinations.

See also: Geographic-Aware Clustering, Microaggregation

Latency Budget

Scaling & Performance

The maximum acceptable response time for a database query, typically defined as a percentile target (p50, p95, p99). OpenAI targets low double-digit millisecond p99 latency for PostgreSQL reads. Drives decisions about indexing, caching, replica placement, and connection pooling configuration.

llms.txt

AI & Agents

A convention for placing a structured text file at a website's root (similar to robots.txt) to provide LLMs and AI agents with site context, navigation, and key content links. Proposed by Jeremy Howard. Agents check this file to understand a site's purpose and discover relevant resources like documentation and glossaries.

See also: AGENTS.md

M

Materialized View

Core Database

A view whose results are physically stored on disk and must be explicitly refreshed. Useful for caching expensive query results. Unlike regular views, materialized views can be indexed.

See also: View

Memory Context

PostgreSQL

PostgreSQL's hierarchical memory management system. Allocations are made within named contexts (e.g., TopMemoryContext, MessageContext). Freeing a parent context automatically frees all child contexts and allocations.

MVCC (Multi-Version Concurrency Control)

Transactions

PostgreSQL's concurrency model. Instead of locking rows during reads, PostgreSQL maintains multiple versions of each row. Readers see a consistent snapshot based on their transaction's start time. Writers create new row versions rather than overwriting in place.

See also: Snapshot, Transaction ID, Tuple

Migration

Schema Migrations

A versioned, ordered set of schema changes applied to a database. Migrations are typically stored as files (SQL or declarative definitions) and tracked in a metadata table.

Microaggregation

Privacy & Anonymization

An anonymization technique that groups similar records into clusters of at least k members and replaces quasi-identifier values with aggregated values (median, mode). Preserves data types and statistical properties better than suppression or generalization.

See also: k-Anonymity
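
A minimal sketch of univariate microaggregation, assuming a single numeric quasi-identifier (real implementations cluster multivariate records, e.g. with algorithms like MDAV):

```python
import statistics

def microaggregate(values, k):
    """Replace each numeric QID value with its cluster median.
    Naive sketch: sort, cut into groups of k, fold a short tail
    into the previous group so every cluster has >= k members."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    clusters = [order[i:i + k] for i in range(0, len(order), k)]
    if len(clusters) > 1 and len(clusters[-1]) < k:
        clusters[-2].extend(clusters.pop())
    out = [None] * len(values)
    for cluster in clusters:
        median = statistics.median(values[i] for i in cluster)
        for i in cluster:
            out[i] = median
    return out
```

Every output value is shared by at least k records, while medians keep the distribution close to the original.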

Managed Database

Cloud & Deployment

A database service where the cloud provider handles provisioning, patching, backups, failover, and scaling. Examples: AWS RDS, Google Cloud SQL, Azure Database for PostgreSQL.

Multi-Cloud

Cloud & Deployment

Running infrastructure across two or more cloud providers. Reduces vendor lock-in and enables best-of-breed service selection. Requires cloud-agnostic tooling and standard protocols (e.g., vanilla Postgres).

MCP (Model Context Protocol)

AI & Agents

A JSON-RPC 2.0 protocol for AI agents to interact with external systems including databases. Defines tools (executable operations), resources (readable data), and prompts (templated instructions) as structured capabilities. Enables agents to discover and execute database operations programmatically without hardcoded integrations. Originated at Anthropic, adopted across the industry in 2025.

See also: Tool Use

N

Normalization

Core Database

The process of organizing tables to reduce data redundancy and improve integrity. Normal forms (1NF through 5NF) provide progressively stricter rules. Most production schemas target third normal form (3NF) as a practical balance.

Null

Core Database

A special marker indicating the absence of a value. Not equal to zero, empty string, or false. NULL comparisons require IS NULL / IS NOT NULL operators because NULL = NULL evaluates to NULL (unknown), not true.
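
The three-valued logic is easy to verify from Python; SQLite behaves like PostgreSQL on this point:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL evaluates to NULL (unknown), surfaced in Python as None
eq = conn.execute("SELECT NULL = NULL").fetchone()[0]

# IS NULL is the correct test and is true (1)
is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]
```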

NVMe

NVMe & Storage

Non-Volatile Memory Express — a host controller interface and storage protocol designed for flash storage over PCIe. Provides low-latency, high-throughput access with multi-queue architecture. Replaced SATA/SAS for high-performance storage.

NVMe-oF

NVMe & Storage

An extension of the NVMe protocol over network fabrics (TCP, RDMA, Fibre Channel). Enables remote block storage access with near-local NVMe performance. Used for disaggregated storage architectures.

See also: NVMe, xatastor

O

OID (Object Identifier)

PostgreSQL

A 32-bit unsigned integer used internally by PostgreSQL to identify catalog objects (tables, types, functions). Every database object has an OID. Exposed in system catalogs but typically not used in application code.

Outbox Pattern

Replication & HA

A design pattern where application writes are paired with event records in the same transaction. A CDC connector reads the outbox table and publishes events to a message broker, ensuring reliable delivery semantics.

Output Plugin

Replication & HA

A PostgreSQL logical decoding plugin that formats WAL changes into a specific format. Examples: pgoutput (built-in, used by logical replication), wal2json (JSON format), test_decoding (text format).

See also: Logical Decoding, wal2json

Operator

Kubernetes

A Kubernetes pattern where custom controllers manage the lifecycle of complex applications. Database operators (CloudNativePG, Zalando PostgreSQL Operator, CrunchyData PGO) automate PostgreSQL operations.

See also: CloudNativePG

OLAP

Analytics

Online Analytical Processing — query patterns characterized by complex aggregations over large datasets, typically read-heavy with few writes. Columnar storage and analytical engines are optimized for OLAP.

See also: OLTP, Columnar Storage

OLTP

Analytics

Online Transaction Processing — query patterns characterized by many short, simple transactions with reads and writes. Row-oriented storage and B-tree indexes are optimized for OLTP.

See also: OLAP

ORM (Object-Relational Mapper)

Developer Tools

A library that maps database tables to programming language objects. Handles query generation, result hydration, and schema definition. Examples: SQLAlchemy, Django ORM, Prisma, TypeORM, Sequelize, Hibernate.

P

Partition

Core Database

A physical division of a large table into smaller, more manageable pieces based on a partition key. PostgreSQL supports range, list, and hash partitioning. Queries targeting a specific partition value can skip unrelated partitions (partition pruning).

See also: Partition Pruning

Primary Key

Core Database

A constraint guaranteeing that one or more columns uniquely identify each row and contain no NULLs. A table can have at most one primary key. Automatically creates a unique B-tree index.

See also: Constraint, Unique Constraint, B-tree

Planner / Optimizer

PostgreSQL

The PostgreSQL component that analyzes a parsed query and generates an execution plan. Evaluates multiple strategies (sequential scan, index scan, join methods) and selects the one with the lowest estimated cost based on table statistics.

See also: Query Plan, Statistics Collector

Postmaster

PostgreSQL

The main PostgreSQL server process. Listens for client connections and forks backend processes to handle them. Also manages background workers, checkpointing, WAL archiving, and replication processes.

PostgreSQL Cluster

PostgreSQL

The complete set of databases managed by a single PostgreSQL server instance (postmaster). Stored in a single data directory (PGDATA). Not to be confused with a multi-node cluster for high availability.

Prepared Statement

SQL

A pre-parsed, pre-planned SQL statement with parameter placeholders. Avoids repeated parsing/planning overhead for frequently executed queries. Created with PREPARE, executed with EXECUTE.

Partial Index

Indexing

An index built on a subset of table rows, filtered by a WHERE predicate. Reduces index size and maintenance cost. Example: CREATE INDEX ON orders (created_at) WHERE status = 'pending'.

Partition Pruning

Core Database

An optimization where the query planner eliminates partitions that cannot contain rows matching the query's WHERE clause. Reduces the amount of data scanned in partitioned tables.

See also: Partition

Physical Replication

Replication & HA

Replication by streaming raw WAL bytes from primary to standby. The standby maintains an exact byte-for-byte copy of the primary's data files. Cannot selectively replicate individual tables. Also called streaming replication.

See also: Logical Replication, Streaming Replication

Primary

Replication & HA

The database instance that accepts write operations. In a replication setup, the primary is the source of truth.

See also: Standby, Read Replica

Promotion

Replication & HA

The act of converting a standby/replica into a primary that accepts writes. Triggered during failover. The promoted instance stops replaying WAL and begins accepting write transactions.

See also: Failover

Publication

Replication & HA

A logical replication publisher definition. Specifies which tables' changes should be made available to subscribers. Created with CREATE PUBLICATION.

See also: Subscription, Logical Replication

PII (Personally Identifiable Information)

Privacy & Anonymization

Any data that can identify an individual, directly or indirectly. Direct PII: name, SSN, email. Indirect PII: age, ZIP code, gender (can identify when combined). Classification varies by regulation and jurisdiction.

See also: DID, QID

pg_dump / pg_restore

Backup & Recovery

Logical backup and restore utilities. pg_dump exports database objects and data as SQL or custom-format archives. pg_restore loads these archives. Supports selective schema/table backup and parallel restore.

PITR (Point-in-Time Recovery)

Backup & Recovery

Restoring a database to its exact state at a specific timestamp or LSN. Requires a base backup plus all WAL archives from the backup to the target recovery point.

See also: Base Backup, WAL Archiving

PgBouncer

Networking

A lightweight PostgreSQL connection pooler. Sits between applications and the database, multiplexing many client connections onto fewer server connections. Supports session, transaction, and statement pooling modes.

See also: Connection Pooling

pg_hba.conf

Security

PostgreSQL's host-based authentication configuration file. Each line specifies: connection type, database, user, client address, and authentication method. Evaluated top-to-bottom; first matching rule applies.

PostgreSQL DBaaS

Cloud & Deployment

Database-as-a-Service offerings for PostgreSQL. Major providers: AWS RDS/Aurora, Google Cloud SQL/AlloyDB, Azure Database for PostgreSQL, Neon, Supabase, Xata, Crunchy Bridge, Tembo, Aiven. Market consolidation accelerated in 2025 (Databricks acquired Neon, EDB acquired multiple PG companies).

See also: Managed Database

pg_stat_activity

Monitoring

A PostgreSQL system view showing one row per server process with current activity: query text, state, wait events, transaction start time, and client information. Primary tool for monitoring active connections.

pg_stat_statements

Monitoring

A PostgreSQL extension tracking planning and execution statistics for all SQL statements. Records call count, total/mean/min/max time, rows returned, and shared buffer hits. Essential for identifying slow queries.

Prometheus

Monitoring

An open-source monitoring system collecting time-series metrics via a pull model. Database exporters (postgres_exporter) expose PostgreSQL metrics. Paired with Grafana for visualization and alerting.

pg_cron

Extensions

A PostgreSQL extension providing cron-based job scheduling within the database. Runs SQL commands or functions on configurable schedules without external schedulers.

pgroll

Schema Migrations, Open Source

An open-source PostgreSQL schema migration tool enabling zero-downtime, reversible schema changes using the expand/contract pattern. Maintains multiple schema versions simultaneously through versioned views.

See also: Expand / Contract Pattern, Zero-Downtime Migration

pgstream

Replication & HA, Open Source

An open-source PostgreSQL CDC and streaming replication tool. Captures row-level changes via logical decoding and streams them to targets: PostgreSQL, Kafka, Elasticsearch, webhooks. Supports data transformation and anonymization.

See also: CDC, Logical Decoding

pgvector

Extensions

A PostgreSQL extension adding vector similarity search. Stores embeddings as the vector type with support for L2 distance, inner product, and cosine distance. Uses IVFFlat and HNSW indexes for approximate nearest neighbor search.

pgzx

Extensions, Open Source

A framework for building PostgreSQL extensions in the Zig programming language. Provides safe wrappers for PostgreSQL internals: memory contexts, SPI, error handling, data type conversion, and GUC configuration.

PostGIS

Extensions

A PostgreSQL extension adding geographic object support. Provides spatial data types (point, line, polygon), spatial indexes (GiST R-tree), and thousands of spatial functions. The standard for geospatial databases.

Partition Tolerance

Distributed Systems

The ability of a distributed system to continue operating when network communication between nodes is interrupted. One of the three properties in the CAP theorem. All practical distributed databases must handle partitions.

See also: CAP Theorem, Consensus

Parquet

Analytics

An open-source columnar storage format for efficient analytical queries. Supports predicate pushdown, column pruning, and compression. Increasingly used as an interchange format between databases, data lakes, and ML pipelines. Accessible from PostgreSQL via pg_duckdb and foreign data wrappers.

See also: Columnar Storage, OLAP

psql

Developer Tools

PostgreSQL's interactive terminal. Provides a command-line interface for executing SQL, managing databases, and inspecting server state. Supports tab completion, command history, and formatting options.

Q

Query Plan

Indexing

The execution strategy chosen by the optimizer for a SQL statement. Represented as a tree of plan nodes (Seq Scan, Index Scan, Nested Loop, Hash Join, Sort, Aggregate, etc.). Viewable with EXPLAIN.

See also: EXPLAIN / EXPLAIN ANALYZE, Planner / Optimizer

QID (Quasi-Identifier)

Privacy & Anonymization

A field that is not directly identifying alone but can identify individuals when combined with other quasi-identifiers. Classic example: 87% of the US population can be uniquely identified by {ZIP code, gender, date of birth} (Sweeney, 2000).

See also: DID, k-Anonymity

Queue Depth

NVMe & Storage

The number of I/O commands that can be outstanding (submitted but not completed) simultaneously. Higher queue depth enables the storage device to optimize command scheduling. NVMe supports up to 65,535 commands per queue.

QPS (Queries Per Second)

Scaling & Performance

Throughput metric for database workloads. Varies by query complexity: simple key lookups may achieve millions of QPS on a single primary (OpenAI), while analytical queries may sustain only tens per second. Agent workloads tend toward high QPS of simple queries.

See also: Latency Budget

Query Guardrails

AI & Agents

Constraints applied to agent-generated SQL to prevent destructive or expensive operations. Includes: read-only transaction mode, row count limits, query timeout enforcement, DDL restrictions, and cost-based query rejection. Critical for safe agentic database access where agents may generate unbounded queries.

See also: Agent Isolation, Rate Limiting
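
A deliberately naive sketch of client-side guardrails (keyword screening is not a security boundary; production setups enforce this server-side with read-only roles, default_transaction_read_only, and statement_timeout):

```python
import re

class GuardrailError(Exception):
    pass

# Crude screen for obviously destructive statements
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|create)\b", re.I
)

def guard(sql: str, max_rows: int = 1000) -> str:
    """Reject write/DDL keywords; cap result size if no LIMIT is present."""
    if FORBIDDEN.search(sql):
        raise GuardrailError("only read-only queries are allowed")
    if not re.search(r"\blimit\b", sql, re.I):
        sql = f"{sql.rstrip().rstrip(';')} LIMIT {max_rows}"
    return sql
```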

R

Relation

Core Database

In relational theory, any named table-like object. In PostgreSQL, relations include tables, views, materialized views, indexes, sequences, and composite types. Metadata stored in pg_class.

Row

Core Database

A single record in a table containing one value per column. Also called a "tuple" in relational theory and PostgreSQL internals.

See also: Tuple

Read Replica

Replication & HA

A database instance that replicates data from a primary and serves read-only queries. Used for read scaling, analytics, and geographic distribution.

See also: Primary, Hot Standby

Replica Identity

Replication & HA

Configuration determining which column values are included in WAL for UPDATE and DELETE during logical replication. Options: DEFAULT (primary key only), FULL (all columns), INDEX (specified unique index), NOTHING.

See also: Logical Replication

Replication Lag

Replication & HA

The delay between a change committed on the primary and its application on a replica. Measured in bytes (WAL position difference) or time. High replication lag indicates the replica is falling behind.

Replication Slot

Replication & HA

A server-side object tracking a consumer's position in the WAL stream. Prevents the server from discarding WAL segments that haven't been consumed. Physical slots track LSN; logical slots also track decoding state.

See also: LSN, WAL

Read Replica Scaling

Replication & HA

Adding read-only replicas to distribute query load across multiple PostgreSQL instances. Each replica receives changes via streaming replication. Connection routing directs reads to replicas and writes to the primary. OpenAI scales PostgreSQL to ~50 read replicas per primary, handling millions of QPS.

See also: Read Replica, Streaming Replication, Connection Pooling

Re-identification Risk

Privacy & Anonymization

The probability that an individual can be identified in an anonymized dataset. Assessed through simulated linkage attacks, k-anonymity metrics, or probabilistic models.

See also: Risk Score, k-Anonymity, Linkage Attack

Risk Score (Phisher K-Threshold)

Privacy & Anonymization, Xata

A quantified upper bound on re-identification probability computed by simulating linkage attacks. Privacy Dynamics generates 50 synthetic adversary datasets via Monte Carlo simulation and reports the worst-case re-identification probability. Computed before and after treatment to measure anonymization effectiveness.

See also: Re-identification Risk, Linkage Attack

RPO (Recovery Point Objective)

Backup & Recovery

The maximum acceptable amount of data loss measured in time. An RPO of zero requires synchronous replication. An RPO of 1 hour means up to 1 hour of data loss is acceptable.

RTO (Recovery Time Objective)

Backup & Recovery

The maximum acceptable downtime during a failure recovery. Determines the required speed of failover, restore, and catchup processes.

RBAC (Role-Based Access Control)

Security

An authorization model where permissions are assigned to roles, and users are granted roles. PostgreSQL implements RBAC natively: roles can own objects, receive GRANT privileges, and be members of other roles.

Role

Security

A PostgreSQL principal that can own database objects and receive privileges. Roles with LOGIN privilege can connect to the database. Roles can be granted membership in other roles for privilege inheritance.

Row-Level Security (RLS)

Security

A PostgreSQL feature enabling per-row access control. Policies define which rows a user can see (SELECT) or modify (INSERT, UPDATE, DELETE). Enabled with ALTER TABLE ... ENABLE ROW LEVEL SECURITY.

Raft

Distributed Systems

A consensus algorithm designed for understandability. Used in etcd, CockroachDB, and TiKV. Elects a leader that manages log replication. Provides strong consistency guarantees.

See also: Consensus

Rate Limiting

Scaling & Performance

Restricting the number of queries or connections per unit time. Applied at application, proxy (PgBouncer), and database levels. Prevents individual clients from monopolizing resources. Especially important when AI agents generate high query volumes without built-in backpressure.

See also: Connection Saturation, PgBouncer
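
Client-side rate limiting is often a token bucket. A minimal sketch with an injected clock for testability (class and parameter names are illustrative):

```python
class TokenBucket:
    """Allow up to `rate` queries per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, 0.0

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, then spend one token per query
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```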

S

Schema

Core Database

A named namespace within a database that contains tables, views, functions, types, and other objects. PostgreSQL databases have a public schema by default. Used to organize objects and control access.

Sequence

Core Database

A database object generating unique, incrementing numeric values. Commonly used for auto-incrementing primary keys. PostgreSQL sequences support BIGINT values, custom start/increment, and cycling.

Stored Procedure

Core Database

A named block of procedural code stored in the database and executed via CALL. Unlike functions, stored procedures can manage transactions (commit/rollback within the procedure body). Introduced in PostgreSQL 11.

SPI (Server Programming Interface)

PostgreSQL

A C API allowing PostgreSQL extensions and stored procedures to execute SQL statements from within server-side code. Provides functions like SPI_execute(), SPI_prepare(), and result set iteration.

Shared Buffers

PostgreSQL

A region of shared memory used to cache frequently accessed data pages. All backend processes share this cache. Controlled by the shared_buffers parameter (typically 25% of system RAM). Reduces disk I/O by keeping hot data in memory.

See also: Cache Hit Ratio

System Columns

PostgreSQL

Hidden columns present in every table: tableoid (OID of the containing table), xmin (inserting transaction ID), xmax (deleting transaction ID), cmin/cmax (command IDs within transaction), ctid (physical row location).

See also: CTID, Transaction ID

Skip Scan

PostgreSQL

A B-tree index optimization added in PostgreSQL 18 that efficiently handles queries where the leading index column has low cardinality. Instead of scanning the entire index, the executor skips to the next distinct value of the leading column. Eliminates the need for certain multi-column indexes.

See also: B-tree, Index

Subquery

SQL

A query nested inside another query. Can appear in FROM (derived table), WHERE (scalar or correlated subquery), or SELECT (scalar subquery). Correlated subqueries reference the outer query and execute once per outer row.

Sequential Scan

Indexing

Reading every row in a table from disk in physical storage order. Optimal when selecting a large fraction of the table or when no suitable index exists.

SP-GiST

Indexing

Space-Partitioned GiST — an index type for data that can be partitioned into non-overlapping regions: points, IP addresses, phone numbers, text (radix tree). More efficient than GiST for certain data distributions.

See also: GiST

Statistics Collector

PostgreSQL

PostgreSQL subsystem that gathers information about table contents (row counts, most common values, histograms, correlation). The planner uses these statistics for cost estimation. Updated by ANALYZE.

Savepoint

Transactions

A named point within a transaction that can be rolled back to without aborting the entire transaction. Created with SAVEPOINT, rolled back with ROLLBACK TO SAVEPOINT.
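
The mechanics can be demonstrated with SQLite from Python, which shares PostgreSQL's SAVEPOINT syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("CREATE TABLE items (name TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO items VALUES ('kept')")
conn.execute("SAVEPOINT before_risky")
conn.execute("INSERT INTO items VALUES ('discarded')")
conn.execute("ROLLBACK TO SAVEPOINT before_risky")  # undoes only the second insert
conn.execute("COMMIT")                              # the first insert survives

rows = [r[0] for r in conn.execute("SELECT name FROM items")]
```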

Snapshot

Transactions

A consistent view of the database at a specific point in time. Each transaction operates on its own snapshot, seeing only data committed before the snapshot was taken. Foundation of MVCC.

See also: MVCC, Transaction

SSI (Serializable Snapshot Isolation)

Transactions

PostgreSQL's implementation of SERIALIZABLE isolation. Tracks read/write dependencies between concurrent transactions and aborts transactions that would produce results inconsistent with some serial execution order.

See also: Isolation Level

Standby

Replication & HA

A replica server that continuously applies changes from the primary. Can be a hot standby (accepting reads) or warm standby (no queries). Standby servers are candidates for failover promotion.

See also: Hot Standby, Failover

Streaming Replication

Replication & HA

Continuous replication of WAL records from primary to standby over a TCP connection. Changes are sent as they occur, providing near-real-time replication. The default replication method for PostgreSQL HA.

See also: Physical Replication

Subscription

Replication & HA

A logical replication subscriber definition. Connects to a publication on a remote server and applies replicated changes locally. Created with CREATE SUBSCRIPTION.

See also: Publication

Synchronous Replication

Replication & HA

Replication where the primary waits for one or more replicas to confirm that changes have been written before committing the transaction. Guarantees zero data loss at the cost of increased commit latency.

See also: Asynchronous Replication

Scale to Zero

Branching & CoW, Xata

The ability for idle database branches to automatically release compute resources (CPU, memory) while preserving data on disk. Branches resume when a connection is established. Eliminates the cost of idle environments.

See also: Hibernation

Schema Diff

Schema Migrations, Xata

A comparison between two schema versions showing added, removed, and modified objects. Useful for reviewing branch changes before merging.

Synthetic Data

Privacy & Anonymization

Artificially generated data that mimics the statistical properties of real data without containing actual records. Used for DID faking (format-preserving synthetic values via Mimesis) and for generating adversary datasets in risk scoring simulations. Distinct from anonymized data, which starts from real records.

See also: Faking, Risk Score (Phisher K-Threshold)

SCRAM-SHA-256

Security

The recommended password authentication mechanism in PostgreSQL 10+. Uses a challenge-response protocol with salted hashing, preventing password interception and replay attacks.

Serverless

Cloud & Deployment

A compute model where resources are allocated on-demand per request or query, with automatic scaling to zero during inactivity. Eliminates idle resource costs.

StatefulSet

Kubernetes

A Kubernetes workload type for applications requiring stable identities, ordered deployment, and persistent storage. Used for databases where each instance needs a consistent hostname and dedicated storage.

Sharding

Distributed Systems

Distributing a large dataset across multiple database instances (shards), each holding a subset of the data. Partition key determines which shard holds each row. Enables horizontal scaling beyond single-node limits.

See also: Citus, Distributed PostgreSQL

Staging Environment

Developer Tools

A pre-production environment that mirrors production for testing. Traditionally requires full database copies. Database branching provides staging environments as lightweight, instant branches.

See also: Ephemeral Environment, Branch

SQL Gateway

Xata Platform, Xata

The service providing PostgreSQL wire protocol access to Xata branches. Handles SSL termination, authentication, and routing to the correct PostgreSQL cluster based on the branch ID encoded in the connection hostname.

Serverless Proxy

Xata Platform, Xata

An HTTP/WebSocket interface for connecting to Xata databases from serverless functions and edge environments where persistent TCP connections are impractical. Compatible with the Neon serverless driver.

T

Table

Core Database

The fundamental data storage structure in a relational database. Organized as rows and columns with a defined schema. Tables can have constraints, indexes, triggers, and policies.

Tablespace

Core Database

A named location on disk where PostgreSQL stores data files. Allows distributing tables and indexes across different physical storage devices for performance or capacity management.

Trigger

Core Database

A function automatically executed in response to INSERT, UPDATE, DELETE, or TRUNCATE events on a table. Can fire BEFORE, AFTER, or INSTEAD OF the triggering event. Used for auditing, validation, and data synchronization.

Tuple

Core Database

PostgreSQL's internal term for a row. A heap tuple is the physical on-disk representation of a row, including system columns (xmin, xmax, ctid) used for MVCC.

See also: Row, MVCC

TOAST

PostgreSQL

The Oversized Attribute Storage Technique — PostgreSQL's mechanism for storing large column values (over ~2KB). Values are compressed and/or split across a separate TOAST table. Transparent to applications.

See also: Compression

Transaction

Transactions

An atomic unit of work that either fully completes (COMMIT) or fully reverts (ROLLBACK). Transactions maintain ACID properties and provide isolation from concurrent operations.

See also: ACID, Isolation Level

Transaction ID (XID)

Transactions

A 32-bit identifier assigned to each write transaction. Used by MVCC to determine row visibility. Stored in the xmin and xmax system columns of each tuple. Subject to wraparound, managed by VACUUM.

See also: MVCC, Vacuum

Two-Phase Commit (2PC)

Distributed Systems

A protocol for coordinating transaction commits across multiple database instances. Phase 1: all participants prepare (PREPARE TRANSACTION). Phase 2: coordinator issues COMMIT PREPARED or ROLLBACK PREPARED to all participants.
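
The coordinator logic itself is small; the hard parts in practice are making the decision durable and recovering stalled participants. A toy in-memory sketch (class and method names are illustrative):

```python
class Participant:
    """Toy participant: votes in phase 1, applies the decision in phase 2."""

    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):          # phase 1: analogous to PREPARE TRANSACTION
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):   # phase 2: COMMIT PREPARED / ROLLBACK PREPARED
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    ready = all(p.prepare() for p in participants)  # phase 1: collect votes
    for p in participants:                          # phase 2: broadcast decision
        p.finish(ready)
    return ready
```

A single "no" vote in phase 1 forces every participant to roll back in phase 2.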

Thin Provisioning

Branching & CoW

A storage allocation strategy where capacity is assigned on demand rather than pre-allocated. A thin-provisioned volume appears to have a large size but consumes only the space actually written.

Treatment Pipeline

Privacy & Anonymization, Xata

The multi-stage anonymization process: (1) entity recognition to classify columns, (2) risk assessment via simulated linkage attacks, (3) DID suppression (redact, mask, fake, or hash), (4) k-member microaggregation for QIDs, (5) distortion analysis measuring information loss. Each stage is configurable per column and per dataset.

See also: Entity Recognition (PII Detection), Microaggregation, Distortion

Thundering Herd

Networking

A failure pattern where many clients simultaneously retry or reconnect after a service interruption, overwhelming the recovering system. Mitigated by exponential backoff with jitter, connection pooling, and cache locking. Critical at scale — OpenAI prevents this across 50+ PostgreSQL read replicas.

See also: Cache Locking, Connection Pooling

TPS (Transactions Per Second)

Scaling & Performance

A measure of database throughput for write-heavy workloads. Commonly benchmarked with pgbench. Affected by transaction isolation level, WAL configuration, and storage performance.

TimescaleDB

Extensions

A PostgreSQL extension for time-series data. Provides automatic partitioning (hypertables), continuous aggregates, compression, and retention policies optimized for time-series workloads.

Text-to-SQL

AI & Agents

Translating natural language queries into valid SQL statements. Accuracy depends on schema awareness, join path inference, dialect-specific syntax, and column naming conventions. Modern approaches use LLMs with schema context rather than rule-based parsers. A core capability for database-connected AI agents.

Tool Use

AI & Agents

The mechanism by which an LLM invokes external functions during response generation. In database contexts, tools typically wrap SQL execution, schema inspection, migration commands, branch management, and data export operations. The bridge between agent reasoning and database actions.

See also: MCP

U

Unique Constraint

Core Database

A constraint ensuring that all values in the specified column(s) are distinct across all rows. Because NULLs are treated as distinct by default, multiple NULLs are allowed; PostgreSQL 15 added NULLS NOT DISTINCT to change this. Automatically creates a unique index.

See also: Constraint, Primary Key

V

View

Core Database

A named query stored in the database and referenced like a table. Views do not store data; they execute their underlying query each time they are accessed. Can be used for access control, simplification, and abstraction.

See also: Materialized View

Vacuum

PostgreSQL

A maintenance operation that reclaims storage from dead tuples, updates visibility maps, and refreshes planner statistics. VACUUM runs without exclusive locks; VACUUM FULL rewrites the entire table and requires an exclusive lock.

See also: Dead Tuple, Visibility Map, Bloat

Visibility Map

PostgreSQL

A bitmap tracking which heap pages contain only tuples visible to all current transactions. Allows index-only scans to skip heap fetches for pages marked all-visible. Updated by VACUUM.

See also: Vacuum, Index-Only Scan

Version Schema

Schema Migrations, Open Source

A PostgreSQL schema created during a migration containing views that expose the database tables with the appropriate column names and types for that schema version. Allows multiple application versions to coexist.

See also: pgroll, Expand / Contract Pattern

Vertical Scaling

Cloud & Deployment

Increasing database capacity by upgrading to a larger instance with more CPU, RAM, or faster storage. PostgreSQL scales vertically to large instance sizes — OpenAI runs single PostgreSQL primaries handling millions of queries per second. Simple but has upper limits.

See also: Horizontal Scaling

W

work_mem

PostgreSQL

Per-operation memory limit for internal sort operations, hash tables, and bitmap heap scans. Set per session/query. Too low causes disk spills to temporary files; too high risks memory exhaustion under concurrency.

Window Function

Core Database

A function that performs a calculation across a set of rows related to the current row without collapsing them into a single output row. Examples: ROW_NUMBER(), RANK(), LAG(), LEAD(), SUM() OVER(). Defined with the OVER() clause.

See also: Aggregate Function
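
For example, ranking rows within groups of a hypothetical orders table without collapsing them:

```sql
-- One output row per input row; RANK restarts per customer
SELECT id,
       customer_id,
       total,
       RANK() OVER (PARTITION BY customer_id ORDER BY total DESC) AS rnk
FROM orders;
```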

WAL (Write-Ahead Log)

PostgreSQL

The mechanism ensuring data durability by writing all changes to a sequential log before applying them to data files. On crash recovery, PostgreSQL replays the WAL from the last checkpoint to restore consistency.

See also: LSN, Checkpoint, WAL Level

WAL Level

PostgreSQL

Configuration controlling how much information is written to WAL. Levels: minimal (crash recovery only), replica (physical replication, default), logical (logical decoding and replication).

See also: Logical Decoding
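
Changing the level requires a server restart to take effect:

```sql
-- Enable logical decoding and logical replication
ALTER SYSTEM SET wal_level = 'logical';

-- After restarting the server, confirm the active level
SHOW wal_level;
```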

WAL Segment

PostgreSQL

A fixed-size file (default 16MB) in the pg_wal directory containing sequential WAL records. Segments are recycled after checkpoint and archiving. Segment size is configurable at cluster initialization.

wal2json

PostgreSQL

A PostgreSQL output plugin for logical decoding that formats WAL changes as JSON objects. Each change includes the table name, operation type, column names, types, and values.

See also: Output Plugin, CDC
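
A minimal sketch of consuming changes through wal2json (the slot name is illustrative; the server must run with wal_level = logical and the plugin installed):

```sql
-- Create a logical replication slot that decodes WAL via wal2json
SELECT pg_create_logical_replication_slot('cdc_slot', 'wal2json');

-- Peek at pending changes as JSON without consuming them
SELECT data FROM pg_logical_slot_peek_changes('cdc_slot', NULL, NULL);
```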

Write Amplification

Storage

The ratio of actual physical writes to logical writes requested by the application. Caused by journaling, MVCC versioning, WAL, index maintenance, and CoW operations.

Wire Protocol

Networking

The message-based protocol PostgreSQL uses for client-server communication over TCP. Defines message formats for authentication, query execution, result transmission, and error reporting. Increasingly adopted as a compatibility target by non-PostgreSQL databases (CockroachDB, YugabyteDB, QuestDB, ClickHouse, and others).

See also: Wire Protocol Compatibility

Wire Protocol Compatibility

Networking

Building non-PostgreSQL databases that accept PostgreSQL client connections by implementing the PostgreSQL wire protocol. Allows reuse of existing drivers, ORMs, and tooling. A major industry trend — over 20 databases now advertise PostgreSQL wire protocol support.

See also: Wire Protocol

X

Xata Agent

Xata Platform · Xata

An AI-powered PostgreSQL monitoring and optimization tool. Analyzes query performance, suggests index improvements, and identifies configuration opportunities.

Xata CLI

Xata Platform · Xata

Command-line tool for managing Xata projects, branches, migrations, and replication. Commands include xata init, xata branch create, xata clone, xata stream, and xata roll.

The open-source core of the Xata platform implementing copy-on-write database branching on standard PostgreSQL. Enables instant branch creation, delta-only storage, and scale-to-zero compute.

See also: Branch, Copy-on-Write

xatastor

Xata Platform · Xata

The NVMe-oF storage target providing high-performance block storage with ZFS-based copy-on-write. Exposes volumes via NVMe over TCP with a gRPC management API. The storage backbone enabling instant branching.

See also: NVMe-oF, ZFS, Copy-on-Write

Z

Zero-Downtime Migration

Schema Migrations

A schema migration that completes without application downtime or degraded performance. Achieved through techniques like expand/contract, column duplication, background backfilling, and versioned views.

See also: Expand / Contract Pattern, pgroll

ZFS

Storage

A combined filesystem and volume manager featuring copy-on-write, snapshots, clones, compression, checksumming, and RAID-Z. Used as a storage backend for database branching systems.

See also: Copy-on-Write, zvol

zvol

Storage

A ZFS volume — a dataset that exposes a block device interface rather than a filesystem. Used to provide raw block storage to databases or NVMe targets. Supports thin provisioning, snapshots, and clones.

See also: ZFS