Database Infrastructure Glossary

A structured reference for every term in modern database infrastructure — built for AI agents, coding assistants, and the humans who work with them. Vendor-neutral definitions. Xata-specific terms clearly labeled.

Maintained by

Xata

258 terms across 26 categories

Last updated February 2026

Agent context files

Download these files as context for AI coding agents, LLM tool definitions, CLAUDE.md skills, or offline reference. Same content as this page in a flat, parseable format optimized for token efficiency.

Also referenced in /llms.txt


A

ACID

Core Database

The four properties guaranteeing reliable transaction processing: Atomicity (all or nothing), Consistency (valid state transitions), Isolation (concurrent transactions don't interfere), Durability (committed data survives failures). Coined by Theo Härder and Andreas Reuter in 1983.

See also: Transaction, BASE
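A minimal sketch of atomicity in PostgreSQL, assuming a hypothetical accounts table: either both updates take effect, or neither does.

```sql
-- Transfer 100 between two accounts; COMMIT makes both changes
-- durable together, ROLLBACK (or a crash) discards both.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
```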

Async I/O

PostgreSQL

Asynchronous input/output allowing PostgreSQL to issue multiple storage requests concurrently rather than waiting for each to complete sequentially. Implemented in PostgreSQL 18 (2025) via io_method=io_uring on Linux and io_method=worker on other platforms. Significantly improves throughput on NVMe and SSD storage.

See also: NVMe-oF

Aggregate Function

SQL

A function that computes a single result from a set of rows: COUNT, SUM, AVG, MIN, MAX, ARRAY_AGG, STRING_AGG. Used with GROUP BY to compute per-group results.

See also: Window Function
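A minimal sketch, assuming a hypothetical orders table with customer_id and amount columns:

```sql
-- One result row per customer: order count and total spend.
SELECT customer_id, COUNT(*) AS orders, SUM(amount) AS total
FROM orders
GROUP BY customer_id;
```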

Advisory Lock

Transactions

An application-controlled lock managed by PostgreSQL but not tied to any table or row. Acquired and released explicitly by application code. Used for coordinating distributed processes or preventing duplicate work.
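A minimal sketch using PostgreSQL's session-level advisory lock functions, keyed by an arbitrary application-chosen integer (42 here is illustrative):

```sql
SELECT pg_advisory_lock(42);      -- blocks until the lock is acquired
-- ... perform work that must not run concurrently ...
SELECT pg_advisory_unlock(42);    -- release explicitly

-- Non-blocking variant: returns false instead of waiting.
SELECT pg_try_advisory_lock(42);
```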

Asynchronous Replication

Replication & HA

Replication where the primary does not wait for replicas to confirm receipt of changes before committing. Offers lower latency but allows a small window of potential data loss on primary failure.

See also: Synchronous Replication

Anonymization

Privacy & Anonymization

The process of transforming data to prevent identification of individuals while preserving the data's utility for analysis, testing, or development. Distinct from pseudonymization, which is reversible.

See also: Pseudonymization, De-identification

Association Distortion

Privacy & Anonymization (Xata)

A metric measuring how well inter-column relationships are preserved after anonymization. Uses NPMI (Normalized Pointwise Mutual Information) to compare column pair correlations before and after treatment. Critical for maintaining the analytical value of anonymized datasets.

See also: Column Distortion, NPMI

Agent Isolation

AI & Agents (Xata)

Providing each AI coding agent with its own isolated database environment (branch) to prevent agent-generated mutations from affecting production or other agents' work. Branching provides governance by architecture rather than by policy.

See also: Branch, Ephemeral Environment, Agent Database Provisioning

Agent Database Provisioning

AI & Agents

Automatically creating isolated database environments when AI agents start work and cleaning them up when done. Requires fast provisioning (seconds, not minutes), production-representative data, and low per-environment cost. Copy-on-write branching makes this economically viable — agents now create the majority of new database instances at scale.

See also: Agent Isolation, Branch, Scale to Zero

AGENTS.md

AI & Agents (Xata)

A convention file placed in a repository root that instructs AI coding agents on project setup, database configuration, and safe development practices. Specifies branch creation, connection strings, migration commands, and cleanup steps. Part of the emerging standard for agent-aware infrastructure.

See also: llms.txt

AI Coding Agent

AI & Agents

Software that autonomously generates, tests, and deploys code with minimal human intervention. Examples: Cursor, GitHub Copilot, Devin, Claude Code, Windsurf. Agents create database schema changes, run migrations, execute tests, and may operate multiple concurrent database sessions — all requiring isolated, writable database access with real data.

A software development workflow where AI agents perform substantial portions of coding, testing, and debugging tasks. Database demand scales with machine activity, not headcount. Requires infrastructure providing: isolated environments per agent, fast provisioning, production-representative data, and automatic cleanup.

B

BASE

Core Database

An alternative consistency model to ACID used in distributed systems: Basically Available, Soft state, Eventually consistent. Common in NoSQL and distributed databases where availability is prioritized over immediate consistency.

See also: ACID, Eventual Consistency

Backend Process

PostgreSQL

A server process spawned by the postmaster to handle a single client connection. Each backend has its own memory space (work_mem, temp_buffers) and communicates with other backends through shared memory.

Background Worker

PostgreSQL

A process registered by an extension to perform work independently of client connections. Used for periodic maintenance, monitoring, custom replication, or application logic.

B-tree

Indexing

The default index type in PostgreSQL. A balanced tree structure supporting equality and range queries. Maintains sorted order. Optimal for high-cardinality columns.

See also: Index, GIN, GiST

BRIN

Indexing

An index storing summary information (min/max) for ranges of physical table blocks. Extremely compact for large, naturally ordered tables (e.g., time-series data). Not useful for randomly ordered data.
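A minimal sketch, assuming a hypothetical append-only events table whose rows are inserted in created_at order:

```sql
-- A BRIN index on a naturally ordered timestamp column is tiny
-- compared with a B-tree on the same column.
CREATE INDEX events_created_brin ON events USING brin (created_at);
```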

Branch

Branching & CoW (Xata)

A logically independent copy of a database created through copy-on-write snapshotting. Branches share unchanged data blocks with their parent, making creation near-instant regardless of database size. Each branch is a fully writable, standard PostgreSQL instance.

See also: Copy-on-Write, Base Branch, Child Branch

Base Branch

Branching & CoW (Xata)

The source branch from which child branches are created. Typically the "main" or "production" branch. The base branch's data serves as the starting point for all derived branches.

See also: Branch, Child Branch

Backfill

Schema Migrations

The process of populating a new or transformed column with values derived from existing data. During zero-downtime migrations, backfilling runs in batches to avoid long-running locks.

Baseline Migration

Schema Migrations

An initial migration that captures the current state of an existing database schema without applying changes. Establishes the starting point for version-controlled schema management.

Base Backup

Backup & Recovery

A full physical copy of a PostgreSQL cluster's data directory, taken while the server is running. Combined with WAL archives, enables point-in-time recovery. Created with pg_basebackup.

See also: PITR, Continuous Archiving

BYOC (Bring Your Own Cloud)

Cloud & Deployment (Xata)

A deployment model where the vendor's software runs in the customer's own cloud account or infrastructure. The customer retains control of their data plane while the vendor manages the control plane.

See also: Control Plane, Data Plane

Bloat

Monitoring

Wasted space in tables and indexes caused by dead tuples, unused pages, and fragmentation. Measured as the ratio of actual size to minimum possible size. Resolved by VACUUM, REINDEX, or pg_repack.

See also: Vacuum, Dead Tuple

C

CAP Theorem

Core Database

States that a distributed data store can guarantee at most two of three properties simultaneously: Consistency (every read returns the most recent write), Availability (every request receives a response), Partition tolerance (the system continues operating despite network partitions). Formulated by Eric Brewer in 2000.

See also: Partition Tolerance, Eventual Consistency

Cardinality

Core Database

The number of distinct values in a column relative to total rows. High cardinality (many unique values, e.g., email addresses) vs. low cardinality (few unique values, e.g., boolean flags). Affects index selection and query optimization.

Catalog

Core Database

System tables maintained by the database engine containing metadata about all database objects: tables, columns, data types, functions, indexes, constraints, and permissions. In PostgreSQL, accessible via the pg_catalog schema.

Column

Core Database

A named attribute of a table with a defined data type. Each row in the table contains one value for each column. Also called a "field" or "attribute" in relational theory.

Composite Type

Core Database

A data type composed of multiple named fields, each with its own type. In PostgreSQL, every table automatically defines a composite type with the same name and structure.

Constraint

Core Database

A rule enforced by the database to maintain data integrity. Types include: PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, NOT NULL, and EXCLUDE. Constraints are evaluated on every insert, update, or delete that affects the constrained columns.

See also: Primary Key, Foreign Key, Unique Constraint
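A minimal sketch combining several constraint types on a hypothetical products table:

```sql
CREATE TABLE products (
    id    INTEGER PRIMARY KEY,          -- unique, not null
    sku   TEXT NOT NULL UNIQUE,         -- required and distinct
    price NUMERIC CHECK (price > 0)     -- rejects non-positive prices
);
```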

Checkpoint

PostgreSQL

A periodic operation where PostgreSQL writes all dirty (modified) pages from shared buffers to disk and creates a WAL checkpoint record. Ensures that recovery only needs to replay WAL from the last checkpoint. Controlled by checkpoint_timeout and max_wal_size.

See also: WAL, Shared Buffers

CTID

PostgreSQL

Tuple identifier — a pair (block number, offset) pointing to the physical location of a row on disk. Changes when a row is updated (due to MVCC creating a new tuple version). Not stable across vacuums.

See also: Tuple, MVCC

CTE (Common Table Expression)

SQL

A temporary named result set defined with the WITH clause. CTEs improve readability and can be recursive. In PostgreSQL 12+, non-recursive CTEs can be inlined by the optimizer.
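A minimal sketch, assuming a hypothetical orders table:

```sql
-- "recent" exists only for the duration of this query.
WITH recent AS (
    SELECT * FROM orders
    WHERE created_at > now() - interval '7 days'
)
SELECT customer_id, COUNT(*) AS orders_last_week
FROM recent
GROUP BY customer_id;
```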

Covering Index

Indexing

An index containing all columns needed to satisfy a query, allowing the executor to return results directly from the index without accessing the heap table. Created with INCLUDE clause in PostgreSQL 11+.

See also: Index-Only Scan
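A minimal sketch, assuming a hypothetical users table: email is the searchable key, and the INCLUDE columns are carried as non-key payload so lookups can be served from the index alone.

```sql
CREATE INDEX users_email_idx
    ON users (email)
    INCLUDE (first_name, last_name);  -- PostgreSQL 11+
```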

Continuous Archiving

Backup & Recovery

The practice of continuously archiving WAL segments as they are completed. Combined with a base backup, enables point-in-time recovery to any moment within the archive retention period.

See also: WAL Archiving, PITR

CDC (Change Data Capture)

Replication & HA

The practice of capturing row-level changes (inserts, updates, deletes) from a database and delivering them to downstream systems in near-real-time. Implemented via WAL logical decoding in PostgreSQL.

See also: Logical Decoding, pgstream

Child Branch

Branching & CoW (Xata)

A branch derived from a parent/base branch. Inherits all data at the moment of creation. Subsequent writes to either the parent or child do not affect the other.

See also: Branch, Base Branch

Clone

Branching & CoW

A writable copy of a filesystem, volume, or dataset created from a snapshot. Clones share all data blocks with the source snapshot until modified. In ZFS: zfs clone snapshot@name clone/name.

See also: Snapshot, Copy-on-Write

Copy-on-Write (CoW)

Branching & CoW

A storage-level optimization where a copy operation does not duplicate data immediately. Instead, the copy shares the same physical data blocks as the source. Blocks are duplicated only when one copy modifies them. Enables instant, space-efficient database branching.

See also: Branch, Delta Storage, ZFS

A migration technique where a new column is created alongside the existing one, data is backfilled, and triggers synchronize both columns until all applications have migrated.

See also: pgroll, Backfill

Column Distortion

Privacy & Anonymization (Xata)

A metric measuring how much a single column's value distribution changed after anonymization. Computed by comparing frequency distributions (categorical) or kernel density estimates (numeric) between original and treated data. Lower distortion indicates better data utility preservation.

See also: Distortion, NPMI

Columnar Storage

Storage

A storage format where data is organized by column rather than by row. Efficient for analytical queries that scan a few columns across many rows. Examples: Parquet, pg_duckdb, cstore_fdw.

See also: OLAP

Compression

Storage

Reducing data size on disk. PostgreSQL supports TOAST compression (pglz, lz4). Storage layers may add transparent compression (ZFS lz4, zstd). Columnar formats achieve high compression through column-wise encoding.

See also: TOAST

Connection Pooling

Networking

Sharing a pool of database connections across multiple application processes. Reduces the overhead of establishing new connections. Tools: PgBouncer, pgpool-II, built-in application-level pools.

See also: PgBouncer

Connection String

Networking

A URI or key-value format specifying database connection parameters: host, port, database name, user, password, SSL mode. PostgreSQL format: postgresql://user:password@host:port/dbname?sslmode=require.

Cache Locking

Networking

A concurrency pattern where only one process recomputes an expired cache entry while others wait or serve stale data. Prevents thundering herd on cache expiry. Also called cache stampede prevention. Essential for high-QPS database workloads.

See also: Thundering Herd

Connection Warmup

Networking

Pre-establishing database connections before they are needed to avoid latency spikes from cold connection pools. OpenAI reduced PostgreSQL connection establishment overhead from 50ms to 5ms through connection warmup optimization.

See also: Connection Pooling

CloudNativePG

Kubernetes

An open-source Kubernetes operator for managing PostgreSQL clusters. Handles provisioning, HA, failover, backups, scaling, and rolling updates. Uses custom resource definitions (CRDs) to define cluster specifications.

See also: Operator

Cache Hit Ratio

Monitoring

The fraction of data requests served from shared buffers without reading from disk. A healthy ratio is typically above 99% for OLTP workloads.

See also: Shared Buffers

Citus

Extensions

A PostgreSQL extension that distributes tables across multiple nodes for horizontal scaling. Supports distributed queries, multi-tenant schemas, and real-time analytics. Maintained by Microsoft.

Consensus

Distributed Systems

A protocol by which distributed nodes agree on a single value or sequence of values. Used for leader election, log replication, and configuration management. Algorithms: Raft, Paxos, Zab.

CI/CD

Developer Tools

Continuous Integration / Continuous Delivery — automated processes that build, test, and deploy code changes. In database infrastructure: schema migrations run in CI pipelines, branched databases provide isolated test environments.

Connection Saturation

Scaling & Performance

The state where all available database connections are in use, causing new requests to queue or fail. A common bottleneck in high-concurrency environments, especially with agent workloads that open many parallel connections. Addressed by connection pooling, query optimization, and per-role connection limits.

See also: Connection Pooling, PgBouncer

Context Window

AI & Agents

The maximum number of tokens an LLM can process in a single request. Determines how much schema metadata, query results, error messages, and instructions an agent can consider simultaneously. Larger context windows enable more complex database interactions but increase latency and cost.

Cell

Xata Platform (Xata)

A regional deployment unit in the Xata data plane. Each cell runs a CloudNativePG operator managing PostgreSQL cluster lifecycle, provisioning, scaling, backups, and high availability.

See also: Data Plane, CloudNativePG

Control Plane

Xata Platform (Xata)

The centralized Xata service handling API access, business logic, user/organization management, project configuration, and branch metadata. Region-independent.

See also: Data Plane

D

DDL (Data Definition Language)

SQL

SQL statements that define or modify database structure: CREATE, ALTER, DROP, TRUNCATE, COMMENT, RENAME. DDL changes are typically auto-committed and may acquire exclusive locks.

See also: DML

DML (Data Manipulation Language)

SQL

SQL statements that read or modify data within existing structures: SELECT, INSERT, UPDATE, DELETE, MERGE. DML operations run within transactions.

See also: DDL, Transaction

Data Type

Core Database

The classification of a column's allowed values. Determines storage format, valid operations, and comparison behavior. Common types: INTEGER, TEXT, BOOLEAN, TIMESTAMP, UUID, JSONB, NUMERIC, BYTEA.

Database

Core Database

A named collection of schemas, tables, and other objects managed by a database server. In PostgreSQL, a single cluster can host multiple databases, each isolated with its own catalog.

Default Value

Core Database

An expression automatically evaluated and stored when a row is inserted without an explicit value for that column. Can be a literal, a function call (e.g., now()), or a sequence reference.

Domain

Core Database

A user-defined data type based on an existing type with additional constraints. Useful for enforcing business rules (e.g., a positive_integer domain that wraps INTEGER with a CHECK (value > 0) constraint).
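A minimal sketch of the positive_integer example above (note that inside a domain definition the current value is referenced with the VALUE keyword):

```sql
CREATE DOMAIN positive_integer AS INTEGER CHECK (VALUE > 0);

-- The domain is then usable anywhere a type is expected.
CREATE TABLE line_items (
    quantity positive_integer NOT NULL
);
```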

Dead Tuple

PostgreSQL

A row version that is no longer visible to any active transaction but has not yet been reclaimed by VACUUM. Accumulation of dead tuples leads to table bloat and degraded performance.

See also: Vacuum, Bloat

Deadlock

Transactions

A situation where two or more transactions each hold a lock the other needs, creating a circular dependency. PostgreSQL detects deadlocks automatically and aborts one transaction to break the cycle.

See also: Lock

Debezium

Replication & HA

An open-source distributed platform for CDC built on Kafka Connect. Captures changes from PostgreSQL, MySQL, MongoDB, and other databases. Uses PostgreSQL's logical decoding API.

See also: CDC

Delta Storage

Branching & CoW

Storage consumed by branch modifications. In a CoW system, a branch's storage cost equals the delta (changed blocks) relative to its source. Idle branches with no modifications consume near-zero additional storage.

See also: Copy-on-Write

Data Masking

Privacy & Anonymization

Replacing sensitive data values with obfuscated versions that preserve format and length. Examples: john@example.com becomes j***@e******.com, phone number 555-1234 becomes XXX-XXXX.

See also: Anonymization, Redaction

De-identification

Privacy & Anonymization

The process of removing or transforming personally identifiable information from a dataset so that individuals cannot be reasonably identified. Regulatory frameworks (HIPAA, GDPR) define specific requirements.

See also: HIPAA, GDPR

Differential Privacy

Privacy & Anonymization

A mathematical framework providing provable privacy guarantees. Adds calibrated noise to query results or data, ensuring that the presence or absence of any individual's data does not significantly change the output.

DID (Direct Identifier)

Privacy & Anonymization

A field that alone identifies a person: name, Social Security number, email address, phone number, medical record number. Typically suppressed (removed, masked, or faked) during anonymization.

See also: QID, PII

Data Page

Storage

The fundamental I/O unit in PostgreSQL: an 8KB block containing row data, free space information, and item pointers. Tables and indexes are organized as sequences of pages.

An architecture separating database compute (query processing) from storage (data persistence) into independently scalable layers. Enables scaling reads without duplicating storage and scaling storage without adding compute. Used by Neon, Aurora, AlloyDB, and Xata. A defining trend in modern PostgreSQL infrastructure.

See also: Scale to Zero, Copy-on-Write

Distributed PostgreSQL

Distributed Systems

Extending PostgreSQL to operate across multiple nodes for horizontal scalability. Approaches include: transparent sharding middleware (Citus, PgDog, Multigres), consensus-based NewSQL (CockroachDB, YugabyteDB), and cloud-native multi-writer (Aurora Limitless). An active area of development in 2025-2026 with multiple new entrants.

See also: Sharding, Wire Protocol Compatibility

dbt

Analytics

An open-source tool for transforming data in warehouses using SQL. Manages a DAG of SQL models, handles dependencies, runs tests, and generates documentation.

Database Migration Tool

Schema Migrations

Software that manages versioned schema changes. Tracks which migrations have been applied and applies pending ones in order. Examples: pgroll, Alembic, Flyway, Liquibase, golang-migrate, Prisma Migrate.

See also: Migration, pgroll

Database Consolidation

Scaling & Performance

The industry trend of fewer, larger database providers acquiring smaller ones. In 2025: Databricks acquired Neon, EDB consolidated several PostgreSQL companies. Impacts vendor selection — open-source core and standard PostgreSQL reduce consolidation risk by ensuring portability.

The iterative cycle where an AI agent reads schema → generates SQL or migration → executes against a database → observes results → adjusts and repeats. Each iteration requires a writable, isolated database with production-representative data. The speed and cost of this loop determines agent productivity.

Data Plane

Xata Platform (Xata)

The component containing PostgreSQL clusters and the SQL gateway. Deployed per region/cell. Handles actual database operations. In BYOC deployments, the data plane runs in the customer's infrastructure.

See also: Control Plane, BYOC

E

Enum Type

Core Database

A user-defined data type consisting of a static, ordered set of values. Example: CREATE TYPE status AS ENUM ('active', 'inactive', 'suspended'). Values are stored as integers internally but displayed as labels.

Extension

PostgreSQL

A packaged unit of additional functionality that can be installed into a PostgreSQL database. Extensions can add types, functions, operators, indexes, and background workers. Managed with CREATE EXTENSION / DROP EXTENSION.

See also: pgroll, pgvector, PostGIS

EXPLAIN / EXPLAIN ANALYZE

SQL

Diagnostic commands showing the query execution plan. EXPLAIN shows the estimated plan; EXPLAIN ANALYZE executes the query and shows actual timings, row counts, and buffer usage.

See also: Query Plan
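A minimal sketch, assuming a hypothetical orders table. Note that EXPLAIN ANALYZE actually executes the statement, so wrap writes in a transaction you roll back.

```sql
-- ANALYZE runs the query and reports real timings;
-- BUFFERS adds shared-buffer hit/read counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
```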

Event Sourcing

Analytics

An architectural pattern where state changes are stored as a sequence of immutable events rather than mutable current-state records. CDC streams can be used to derive event sourcing patterns from traditional databases.

Ephemeral Environment

Branching & CoW

A short-lived database instance (typically a branch) created for a specific task — CI test run, developer feature work, AI agent session — and destroyed when done. Copy-on-write branching makes ephemeral environments economically viable at scale.

Expand/Contract

Schema Migrations

A two-phase migration strategy: Expand adds the new schema elements alongside the old ones. Contract removes the old elements after all applications have migrated to the new schema. Enables zero-downtime changes.

See also: Zero-Downtime Migration, pgroll

Equivalence Class

Privacy & Anonymization

A set of records that share identical values on all quasi-identifier columns after anonymization. In a k-anonymous dataset, every equivalence class contains at least k records.

See also: k-Anonymity, QID

Entity Recognition (PII Detection)

Privacy & Anonymization (Xata)

Automated classification of database columns as Direct Identifiers (DIDs), Quasi-Identifiers (QIDs), or Safe using NLP and pattern matching. Privacy Dynamics uses Microsoft Presidio with custom recognizers covering all 18 HIPAA identifier families plus additional types (coordinates, advertising IDs, license plates).

See also: DID, QID, Presidio

Eventual Consistency

Distributed Systems

A consistency model where all replicas converge to the same state given sufficient time without new updates. Reads may temporarily return stale data.

See also: BASE, CAP Theorem

ELT

Analytics

Extract, Load, Transform — a data pipeline pattern where data is first loaded into a warehouse, then transformed using the warehouse's compute.

See also: ETL

ETL

Analytics

Extract, Transform, Load — a data pipeline pattern where data is extracted from sources, transformed in an intermediate system, then loaded into the target.

See also: ELT

F

Foreign Key

Core Database

A constraint linking one or more columns in a table to the primary key or unique columns of another table. Enforces referential integrity. Supports cascading actions: CASCADE, SET NULL, SET DEFAULT, RESTRICT, NO ACTION.

See also: Primary Key, Constraint
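A minimal sketch, assuming a hypothetical customers table with an id primary key:

```sql
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
        REFERENCES customers (id)
        ON DELETE CASCADE   -- deleting a customer removes their orders
);
```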

Failover

Replication & HA

The process of promoting a standby/replica to primary when the current primary becomes unavailable. Can be automatic (via Patroni, pg_auto_failover, CloudNativePG) or manual.

See also: Promotion, Standby

Fair Aggregation

Privacy & Anonymization (Xata)

A randomized aggregation method used during microaggregation that selects median or mode values with controlled randomness to prevent statistical bias and information leakage. Unlike deterministic aggregation, fair aggregation prevents adversaries from inferring the original value distribution within an equivalence class.

See also: Microaggregation, Equivalence Class

G

GUC (Grand Unified Configuration)

PostgreSQL

PostgreSQL's configuration system managing all server parameters. Settings can be applied at server, database, role, session, or transaction level. Parameters have contexts (e.g., SUSET, USERSET) controlling who can change them.

GIN (Generalized Inverted Index)

Indexing

An index type for values containing multiple elements: arrays, JSONB, full-text search vectors, and trigrams. Maps each element to the rows containing it. Excellent for containment queries (@>, ?, @@).

See also: B-tree, pg_trgm
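A minimal sketch, assuming a hypothetical docs table with a JSONB payload column:

```sql
CREATE INDEX docs_payload_gin ON docs USING gin (payload);

-- Containment query served by the GIN index.
SELECT * FROM docs WHERE payload @> '{"status": "active"}';
```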

GiST (Generalized Search Tree)

Indexing

A balanced tree index supporting complex data types: geometric shapes, ranges, full-text search, and nearest-neighbor queries. Extensible — custom data types can define their own GiST operator classes.

See also: PostGIS, SpGiST

GDPR

Privacy & Anonymization

General Data Protection Regulation — European Union regulation governing personal data protection. Requires lawful basis for processing personal data, data minimization, purpose limitation, and rights to access, rectification, and erasure. Effective May 2018.

Geographic-Aware Clustering

Privacy & Anonymization (Xata)

Microaggregation clustering that uses geographic coordinates (latitude/longitude) from GeoNames reference data to group records by physical proximity rather than raw ZIP code values. Produces more meaningful anonymized locations. Linked column hierarchies keep city, state, and ZIP code values consistent within each cluster.

See also: Microaggregation, Linked Column Hierarchy

H

Heap

PostgreSQL

The default table storage method in PostgreSQL. Rows are stored unordered in 8KB pages. The heap is accessed via sequential scans or through index lookups that reference heap pages by CTID.

See also: Data Page, CTID

HOT Update

PostgreSQL

Heap-Only Tuple update — an optimization where an updated row can be stored on the same heap page as the original, avoiding index updates. Requires that no indexed columns changed and the page has free space.

See also: Fillfactor

Hash Index

Indexing

An index using a hash function for exact-match lookups only. Faster than B-tree for simple equality queries on high-cardinality columns. Does not support range queries. WAL-logged since PostgreSQL 10.

Hot Standby

Replication & HA

A standby server that accepts read-only queries while continuously applying WAL from the primary. Enables read scaling and reporting workloads without impacting the primary.

See also: Standby, Read Replica

Hybrid Database Strategy

Replication & HA

Using PostgreSQL alongside specialized databases for different workload patterns. Example: PostgreSQL for read-heavy OLTP, sharded systems for write-heavy workloads. Chosen when a single system cannot optimally serve all access patterns. OpenAI uses this approach at scale.

See also: Horizontal Scaling

HIPAA

Privacy & Anonymization

Health Insurance Portability and Accountability Act — US federal law governing protected health information (PHI). The Privacy Rule defines 18 identifier types that must be addressed for de-identification. Two methods: Safe Harbor and Expert Determination.

See also: Safe Harbor, Expert Determination

Horizontal Scaling

Cloud & Deployment

Increasing database capacity by adding more nodes. Methods include read replicas (for read scaling), sharding (for write scaling), and distributed PostgreSQL (for both). More operationally complex than vertical scaling but removes single-node limits.

See also: Vertical Scaling, Read Replica, Distributed PostgreSQL

Helm

Kubernetes

A package manager for Kubernetes. Helm charts define templated Kubernetes manifests for deploying applications. Used for installing database operators, monitoring stacks, and application workloads.

HTAP (Hybrid Transactional/Analytical Processing)

Analytics

Database workloads combining OLTP and OLAP in a single system. Eliminates the need to replicate data to a separate analytical store. Approaches: real-time materialized views, columnar extensions (pg_duckdb), or dual-format storage. Growing in importance as agents need both transactional and analytical access.

See also: OLTP, OLAP

Hibernation

Xata Platform (Xata)

The paused state of an idle branch where compute resources are released but data remains on disk. Branches automatically hibernate after a configurable inactivity period (default 30 minutes) and resume on the next connection.

See also: Scale to Zero

I

Index

Core Database

A data structure maintained alongside a table to accelerate lookups on specific columns. Trades write performance and storage for faster reads.

See also: B-tree, GIN, GiST

Index-Only Scan

Indexing

A query execution strategy where all required data is retrieved from the index without touching the heap table. Requires the index to cover all columns in the query and the visibility map to confirm all-visible pages.

See also: Covering Index, Visibility Map

Isolation Level

Transactions

The degree to which a transaction is protected from concurrent modifications by other transactions. PostgreSQL supports: READ COMMITTED (default), REPEATABLE READ, and SERIALIZABLE.

See also: MVCC, SSI
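A minimal sketch of requesting a stricter isolation level for one transaction:

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- Queries here behave as if this transaction ran alone;
-- serialization failures (SQLSTATE 40001) must be retried
-- by the application.
SELECT SUM(balance) FROM accounts;
COMMIT;
```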

Infrastructure as Code (IaC)

Cloud & Deployment

Managing cloud infrastructure through declarative configuration files rather than manual processes. Tools: Terraform, Pulumi, AWS CDK, CloudFormation. Enables version control, review, and reproducibility.

IOPS

Monitoring

Input/Output Operations Per Second — a measure of storage throughput. Read IOPS and write IOPS are typically reported separately. Database workload characteristics determine IOPS requirements.

io_uring

NVMe & Storage

A Linux kernel interface for asynchronous I/O. Uses submission and completion ring buffers shared between userspace and kernel. Provides significantly lower overhead than traditional read/write system calls.

J

JOIN

SQL

An operation combining rows from two or more tables based on a related column. Types: INNER JOIN (matching rows only), LEFT JOIN (all left + matching right), RIGHT JOIN, FULL JOIN (all from both), CROSS JOIN (Cartesian product).
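A minimal sketch, assuming hypothetical customers and orders tables:

```sql
-- LEFT JOIN keeps every customer; customers without orders
-- get NULL in the order columns.
SELECT c.name, o.amount
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id;
```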

K

k-Anonymity

Privacy & Anonymization

A privacy property where every record in a dataset is indistinguishable from at least k-1 other records on quasi-identifier values. Prevents singling out individuals through quasi-identifier combinations. Introduced by Latanya Sweeney in 2002.

See also: Equivalence Class, l-Diversity, Microaggregation
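A minimal sketch of checking k-anonymity in SQL, assuming a hypothetical anonymized_patients table whose quasi-identifiers are zip_code, gender, and birth_year:

```sql
-- Any rows returned are equivalence classes that violate 5-anonymity.
SELECT zip_code, gender, birth_year, COUNT(*) AS class_size
FROM anonymized_patients
GROUP BY zip_code, gender, birth_year
HAVING COUNT(*) < 5;
```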

L

Lock

Transactions

A mechanism preventing concurrent access conflicts. PostgreSQL uses multiple lock types: row-level (shared/exclusive), table-level (eight modes from ACCESS SHARE to ACCESS EXCLUSIVE), advisory, and predicate locks.

See also: Deadlock, Advisory Lock

Logical Decoding

Replication & HA

PostgreSQL's mechanism for extracting row-level changes from WAL in a logical format. Foundation for logical replication, CDC, and tools like pgstream. Requires wal_level = logical.

See also: Logical Replication, CDC, Output Plugin

Logical Replication

Replication & HA

A replication method that captures and replays DML operations (INSERT, UPDATE, DELETE) at the row level. Supports selective replication of specific tables, cross-version replication, and conflict resolution. Uses publications and subscriptions.

See also: Physical Replication, Publication, Subscription
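A minimal sketch of the publication/subscription pair, assuming hypothetical table and connection names and wal_level = logical on the publisher:

```sql
-- On the publisher:
CREATE PUBLICATION app_pub FOR TABLE users, orders;

-- On the subscriber:
CREATE SUBSCRIPTION app_sub
    CONNECTION 'host=primary dbname=app user=repl'
    PUBLICATION app_pub;
```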

LSN (Log Sequence Number)

Replication & HA

A 64-bit integer identifying a specific byte position in the WAL stream. Format: 16/B374D848. Used to track replication progress, recovery targets, and WAL consumption. Monotonically increasing.

See also: WAL, Replication Slot
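
The two hex halves encode the high and low 32 bits of the byte position, so LSN arithmetic (what pg_wal_lsn_diff does server-side) is plain integer math. A sketch:

```python
def parse_lsn(lsn: str) -> int:
    """'16/B374D848' -> absolute WAL byte position (high32/low32, hex)."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def lsn_diff(a: str, b: str) -> int:
    """Bytes of WAL between two positions, like pg_wal_lsn_diff(a, b)."""
    return parse_lsn(a) - parse_lsn(b)
```

Useful for computing replication lag in bytes from positions reported by pg_stat_replication.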

Linkage Attack

Privacy & Anonymization

An attack that re-identifies individuals by joining anonymized data with external datasets on shared quasi-identifiers. Sweeney (2000) demonstrated that 87% of the US population is uniquely identifiable by {ZIP code, gender, date of birth}. Risk assessment simulates these attacks using Monte Carlo methods with synthetic adversary datasets.

See also: QID, Re-identification Risk, Risk Score
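
A toy linkage attack, joining an "anonymized" table to a public dataset on shared quasi-identifiers (names and fields are illustrative; real assessments run many simulated adversaries):

```python
from collections import Counter

def linkage_attack(anonymized, external, qids):
    """Count records whose quasi-identifier combination matches exactly
    one identity in the external dataset -- those are re-identified."""
    key = lambda r: tuple(r[q] for q in qids)
    ext_counts = Counter(key(r) for r in external)
    return sum(1 for r in anonymized if ext_counts[key(r)] == 1)

medical = [{"zip": "02139", "dob": "1980-01-02", "gender": "F", "dx": "flu"}]
voter_roll = [
    {"name": "Alice", "zip": "02139", "dob": "1980-01-02", "gender": "F"},
    {"name": "Bob",   "zip": "02139", "dob": "1975-05-05", "gender": "M"},
]
hits = linkage_attack(medical, voter_roll, ["zip", "dob", "gender"])
```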

Linked Column Hierarchy

Privacy & Anonymization, Xata

A configuration specifying that certain columns have a hierarchical relationship (e.g., city → state → ZIP code) and must be treated together during anonymization. Ensures that after microaggregation, geographic values remain internally consistent rather than producing impossible combinations.

See also: Geographic-Aware Clustering, Microaggregation

Latency Budget

Scaling & Performance

The maximum acceptable response time for a database query, typically defined as a percentile target (p50, p95, p99). OpenAI targets low double-digit millisecond p99 latency for PostgreSQL reads. Drives decisions about indexing, caching, replica placement, and connection pooling configuration.

llms.txt

AI & Agents

A convention for placing a structured text file at a website's root (similar to robots.txt) to provide LLMs and AI agents with site context, navigation, and key content links. Proposed by Jeremy Howard. Agents check this file to understand a site's purpose and discover relevant resources like documentation and glossaries.

See also: AGENTS.md

M

Materialized View

Core Database

A view whose results are physically stored on disk and must be explicitly refreshed. Useful for caching expensive query results. Unlike regular views, materialized views can be indexed.

See also: View

Memory Context

PostgreSQL

PostgreSQL's hierarchical memory management system. Allocations are made within named contexts (e.g., TopMemoryContext, MessageContext). Freeing a parent context automatically frees all child contexts and allocations.

MVCC (Multi-Version Concurrency Control)

Transactions

PostgreSQL's concurrency model. Instead of locking rows during reads, PostgreSQL maintains multiple versions of each row. Readers see a consistent snapshot based on their transaction's start time. Writers create new row versions rather than overwriting in place.

See also: Snapshot, Transaction ID, Tuple

Migration

Schema Migrations

A versioned, ordered set of schema changes applied to a database. Migrations are typically stored as files (SQL or declarative definitions) and tracked in a metadata table.

Microaggregation

Privacy & Anonymization

An anonymization technique that groups similar records into clusters of at least k members and replaces quasi-identifier values with aggregated values (median, mode). Preserves data types and statistical properties better than suppression or generalization.

See also: k-Anonymity
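
A minimal sketch of univariate microaggregation, assuming a single numeric quasi-identifier (real implementations cluster multivariate records, e.g. with algorithms like MDAV):

```python
import statistics

def microaggregate(values, k):
    """Replace each numeric QID value with its cluster median.
    Naive sketch: sort, cut into groups of k, fold a short tail
    into the previous group so every cluster has >= k members."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    clusters = [order[i:i + k] for i in range(0, len(order), k)]
    if len(clusters) > 1 and len(clusters[-1]) < k:
        clusters[-2].extend(clusters.pop())
    out = [None] * len(values)
    for cluster in clusters:
        median = statistics.median(values[i] for i in cluster)
        for i in cluster:
            out[i] = median
    return out
```

Every output value is shared by at least k records, while medians keep the distribution close to the original.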

Managed Database

Cloud & Deployment

A database service where the cloud provider handles provisioning, patching, backups, failover, and scaling. Examples: AWS RDS, Google Cloud SQL, Azure Database for PostgreSQL.

Multi-Cloud

Cloud & Deployment

Running infrastructure across two or more cloud providers. Reduces vendor lock-in and enables best-of-breed service selection. Requires cloud-agnostic tooling and standard protocols (e.g., vanilla Postgres).

MCP (Model Context Protocol)

AI & Agents

A JSON-RPC 2.0 protocol for AI agents to interact with external systems including databases. Defines tools (executable operations), resources (readable data), and prompts (templated instructions) as structured capabilities. Enables agents to discover and execute database operations programmatically without hardcoded integrations. Originated at Anthropic, adopted across the industry in 2025.

See also: Tool Use

N

Normalization

Core Database

The process of organizing tables to reduce data redundancy and improve integrity. Normal forms (1NF through 5NF) provide progressively stricter rules. Most production schemas target third normal form (3NF) as a practical balance.

Null

Core Database

A special marker indicating the absence of a value. Not equal to zero, empty string, or false. NULL comparisons require IS NULL / IS NOT NULL operators because NULL = NULL evaluates to NULL (unknown), not true.
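
The three-valued logic is easy to verify from Python; SQLite behaves like PostgreSQL on this point:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL evaluates to NULL (unknown), surfaced in Python as None
eq = conn.execute("SELECT NULL = NULL").fetchone()[0]

# IS NULL is the correct test and is true (1)
is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]
```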

NVMe

NVMe & Storage

Non-Volatile Memory Express — a host controller interface and storage protocol designed for flash storage over PCIe. Provides low-latency, high-throughput access with multi-queue architecture. Replaced SATA/SAS for high-performance storage.

NVMe-oF

NVMe & Storage

An extension of the NVMe protocol over network fabrics (TCP, RDMA, Fibre Channel). Enables remote block storage access with near-local NVMe performance. Used for disaggregated storage architectures.

See also: NVMe, xatastor

O

OID (Object Identifier)

PostgreSQL

A 32-bit unsigned integer used internally by PostgreSQL to identify catalog objects (tables, types, functions). Every database object has an OID. Exposed in system catalogs but typically not used in application code.

Outbox Pattern

Replication & HA

A design pattern where application writes are paired with event records in the same transaction. A CDC connector reads the outbox table and publishes events to a message broker, ensuring reliable delivery semantics.

Output Plugin

Replication & HA

A PostgreSQL logical decoding plugin that formats WAL changes into a specific format. Examples: pgoutput (built-in, used by logical replication), wal2json (JSON format), test_decoding (text format).

See also: Logical Decoding, wal2json

Operator

Kubernetes

A Kubernetes pattern where custom controllers manage the lifecycle of complex applications. Database operators (CloudNativePG, Zalando PostgreSQL Operator, CrunchyData PGO) automate PostgreSQL operations.

See also: CloudNativePG

OLAP

Analytics

Online Analytical Processing — query patterns characterized by complex aggregations over large datasets, typically read-heavy with few writes. Columnar storage and analytical engines are optimized for OLAP.

See also: OLTP, Columnar Storage

OLTP

Analytics

Online Transaction Processing — query patterns characterized by many short, simple transactions with reads and writes. Row-oriented storage and B-tree indexes are optimized for OLTP.

See also: OLAP

ORM (Object-Relational Mapper)

Developer Tools

A library that maps database tables to programming language objects. Handles query generation, result hydration, and schema definition. Examples: SQLAlchemy, Django ORM, Prisma, TypeORM, Sequelize, Hibernate.

P

Partition

Core Database

A physical division of a large table into smaller, more manageable pieces based on a partition key. PostgreSQL supports range, list, and hash partitioning. Queries targeting a specific partition value can skip unrelated partitions (partition pruning).

See also: Partition Pruning

Primary Key

Core Database

A constraint guaranteeing that one or more columns uniquely identify each row and contain no NULLs. A table can have at most one primary key. Automatically creates a unique B-tree index.

See also: Constraint, Unique Constraint, B-tree

Planner / Optimizer

PostgreSQL

The PostgreSQL component that analyzes a parsed query and generates an execution plan. Evaluates multiple strategies (sequential scan, index scan, join methods) and selects the one with the lowest estimated cost based on table statistics.

See also: Query Plan, Statistics Collector

Postmaster

PostgreSQL

The main PostgreSQL server process. Listens for client connections and forks backend processes to handle them. Also manages background workers, checkpointing, WAL archiving, and replication processes.

PostgreSQL Cluster

PostgreSQL

The complete set of databases managed by a single PostgreSQL server instance (postmaster). Stored in a single data directory (PGDATA). Not to be confused with a multi-node cluster for high availability.

Prepared Statement

SQL

A pre-parsed, pre-planned SQL statement with parameter placeholders. Avoids repeated parsing/planning overhead for frequently executed queries. Created with PREPARE, executed with EXECUTE.

Partial Index

Indexing

An index built on a subset of table rows, filtered by a WHERE predicate. Reduces index size and maintenance cost. Example: CREATE INDEX ON orders (created_at) WHERE status = 'pending'.

Partition Pruning

Core Database

An optimization where the query planner eliminates partitions that cannot contain rows matching the query's WHERE clause. Reduces the amount of data scanned in partitioned tables.

See also: Partition

Physical Replication

Replication & HA

Replication by streaming raw WAL bytes from primary to standby. The standby maintains an exact byte-for-byte copy of the primary's data files. Cannot selectively replicate individual tables. Also called streaming replication.

See also: Logical Replication, Streaming Replication

Primary

Replication & HA

The database instance that accepts write operations. In a replication setup, the primary is the source of truth.

See also: Standby, Read Replica

Promotion

Replication & HA

The act of converting a standby/replica into a primary that accepts writes. Triggered during failover. The promoted instance stops replaying WAL and begins accepting write transactions.

See also: Failover

Publication

Replication & HA

A logical replication publisher definition. Specifies which tables' changes should be made available to subscribers. Created with CREATE PUBLICATION.

See also: Subscription, Logical Replication

PII (Personally Identifiable Information)

Privacy & Anonymization

Any data that can identify an individual, directly or indirectly. Direct PII: name, SSN, email. Indirect PII: age, ZIP code, gender (can identify when combined). Classification varies by regulation and jurisdiction.

See also: DID, QID

pg_dump / pg_restore

Backup & Recovery

Logical backup and restore utilities. pg_dump exports database objects and data as SQL or custom-format archives. pg_restore loads these archives. Supports selective schema/table backup and parallel restore.

PITR (Point-in-Time Recovery)

Backup & Recovery

Restoring a database to its exact state at a specific timestamp or LSN. Requires a base backup plus all WAL archives from the backup to the target recovery point.

See also: Base Backup, WAL Archiving

PgBouncer

Networking

A lightweight PostgreSQL connection pooler. Sits between applications and the database, multiplexing many client connections onto fewer server connections. Supports session, transaction, and statement pooling modes.

See also: Connection Pooling

pg_hba.conf

Security

PostgreSQL's host-based authentication configuration file. Each line specifies: connection type, database, user, client address, and authentication method. Evaluated top-to-bottom; first matching rule applies.

PostgreSQL DBaaS

Cloud & Deployment

Database-as-a-Service offerings for PostgreSQL. Major providers: AWS RDS/Aurora, Google Cloud SQL/AlloyDB, Azure Database for PostgreSQL, Neon, Supabase, Xata, Crunchy Bridge, Tembo, Aiven. Market consolidation accelerated in 2025 (Databricks acquired Neon, EDB acquired multiple PG companies).

See also: Managed Database

pg_stat_activity

Monitoring

A PostgreSQL system view showing one row per server process with current activity: query text, state, wait events, transaction start time, and client information. Primary tool for monitoring active connections.

pg_stat_statements

Monitoring

A PostgreSQL extension tracking planning and execution statistics for all SQL statements. Records call count, total/mean/min/max time, rows returned, and shared buffer hits. Essential for identifying slow queries.

Prometheus

Monitoring

An open-source monitoring system collecting time-series metrics via a pull model. Database exporters (postgres_exporter) expose PostgreSQL metrics. Paired with Grafana for visualization and alerting.

pg_cron

Extensions

A PostgreSQL extension providing cron-based job scheduling within the database. Runs SQL commands or functions on configurable schedules without external schedulers.

pgroll

Schema Migrations, Open Source

An open-source PostgreSQL schema migration tool enabling zero-downtime, reversible schema changes using the expand/contract pattern. Maintains multiple schema versions simultaneously through versioned views.

See also: Expand / Contract Pattern, Zero-Downtime Migration

pgstream

Replication & HA, Open Source

An open-source PostgreSQL CDC and streaming replication tool. Captures row-level changes via logical decoding and streams them to targets: PostgreSQL, Kafka, Elasticsearch, webhooks. Supports data transformation and anonymization.

See also: CDC, Logical Decoding

pgvector

Extensions

A PostgreSQL extension adding vector similarity search. Stores embeddings as the vector type with support for L2 distance, inner product, and cosine distance. Uses IVFFlat and HNSW indexes for approximate nearest neighbor search.

pgzx

Extensions, Open Source

A framework for building PostgreSQL extensions in the Zig programming language. Provides safe wrappers for PostgreSQL internals: memory contexts, SPI, error handling, data type conversion, and GUC configuration.

PostGIS

Extensions

A PostgreSQL extension adding geographic object support. Provides spatial data types (point, line, polygon), spatial indexes (GiST R-tree), and thousands of spatial functions. The standard for geospatial databases.

Partition Tolerance

Distributed Systems

The ability of a distributed system to continue operating when network communication between nodes is interrupted. One of the three properties in the CAP theorem. All practical distributed databases must handle partitions.

See also: CAP Theorem, Consensus

Parquet

Analytics

An open-source columnar storage format for efficient analytical queries. Supports predicate pushdown, column pruning, and compression. Increasingly used as an interchange format between databases, data lakes, and ML pipelines. Accessible from PostgreSQL via pg_duckdb and foreign data wrappers.

See also: Columnar Storage, OLAP

psql

Developer Tools

PostgreSQL's interactive terminal. Provides a command-line interface for executing SQL, managing databases, and inspecting server state. Supports tab completion, command history, and formatting options.

Q

Query Plan

Indexing

The execution strategy chosen by the optimizer for a SQL statement. Represented as a tree of plan nodes (Seq Scan, Index Scan, Nested Loop, Hash Join, Sort, Aggregate, etc.). Viewable with EXPLAIN.

See also: EXPLAIN / EXPLAIN ANALYZE, Planner / Optimizer

QID (Quasi-Identifier)

Privacy & Anonymization

A field that is not directly identifying alone but can identify individuals when combined with other quasi-identifiers. Classic example: 87% of the US population can be uniquely identified by {ZIP code, gender, date of birth} (Sweeney, 2000).

See also: DID, k-Anonymity

Queue Depth

NVMe & Storage

The number of I/O commands that can be outstanding (submitted but not completed) simultaneously. Higher queue depth enables the storage device to optimize command scheduling. NVMe supports up to 65,535 commands per queue.

QPS (Queries Per Second)

Scaling & Performance

Throughput metric for database workloads. Varies by query complexity: simple key lookups may achieve millions of QPS on a single primary (OpenAI), while analytical queries may sustain only tens per second. Agent workloads tend toward high QPS of simple queries.

See also: Latency Budget

Query Guardrails

AI & Agents

Constraints applied to agent-generated SQL to prevent destructive or expensive operations. Includes: read-only transaction mode, row count limits, query timeout enforcement, DDL restrictions, and cost-based query rejection. Critical for safe agentic database access where agents may generate unbounded queries.

See also: Agent Isolation, Rate Limiting
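
A deliberately naive sketch of client-side guardrails (keyword screening is not a security boundary; production setups enforce this server-side with read-only roles, default_transaction_read_only, and statement_timeout):

```python
import re

class GuardrailError(Exception):
    pass

# Crude screen for obviously destructive statements
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|create)\b", re.I
)

def guard(sql: str, max_rows: int = 1000) -> str:
    """Reject write/DDL keywords; cap result size if no LIMIT is present."""
    if FORBIDDEN.search(sql):
        raise GuardrailError("only read-only queries are allowed")
    if not re.search(r"\blimit\b", sql, re.I):
        sql = f"{sql.rstrip().rstrip(';')} LIMIT {max_rows}"
    return sql
```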

R

Relation

Core Database

In relational theory, any named table-like object. In PostgreSQL, relations include tables, views, materialized views, indexes, sequences, and composite types. Metadata stored in pg_class.

Row

Core Database

A single record in a table containing one value per column. Also called a "tuple" in relational theory and PostgreSQL internals.

See also: Tuple

Read Replica

Replication & HA

A database instance that replicates data from a primary and serves read-only queries. Used for read scaling, analytics, and geographic distribution.

See also: Primary, Hot Standby

Replica Identity

Replication & HA

Configuration determining which column values are included in WAL for UPDATE and DELETE during logical replication. Options: DEFAULT (primary key only), FULL (all columns), INDEX (specified unique index), NOTHING.

See also: Logical Replication

Replication Lag

Replication & HA

The delay between a change committed on the primary and its application on a replica. Measured in bytes (WAL position difference) or time. High replication lag indicates the replica is falling behind.

Replication Slot

Replication & HA

A server-side object tracking a consumer's position in the WAL stream. Prevents the server from discarding WAL segments that haven't been consumed. Physical slots track LSN; logical slots also track decoding state.

See also: LSN, WAL

Read Replica Scaling

Replication & HA

Adding read-only replicas to distribute query load across multiple PostgreSQL instances. Each replica receives changes via streaming replication. Connection routing directs reads to replicas and writes to the primary. OpenAI scales PostgreSQL to ~50 read replicas per primary, handling millions of QPS.

See also: Read Replica, Streaming Replication, Connection Pooling

Re-identification Risk

Privacy & Anonymization

The probability that an individual can be identified in an anonymized dataset. Assessed through simulated linkage attacks, k-anonymity metrics, or probabilistic models.

See also: Risk Score, k-Anonymity, Linkage Attack

Risk Score (Phisher K-Threshold)

Privacy & Anonymization, Xata

A quantified upper bound on re-identification probability computed by simulating linkage attacks. Privacy Dynamics generates 50 synthetic adversary datasets via Monte Carlo simulation and reports the worst-case re-identification probability. Computed before and after treatment to measure anonymization effectiveness.

See also: Re-identification Risk, Linkage Attack

RPO (Recovery Point Objective)

Backup & Recovery

The maximum acceptable amount of data loss measured in time. An RPO of zero requires synchronous replication. An RPO of 1 hour means up to 1 hour of data loss is acceptable.

RTO (Recovery Time Objective)

Backup & Recovery

The maximum acceptable downtime during a failure recovery. Determines the required speed of failover, restore, and catchup processes.

RBAC (Role-Based Access Control)

Security

An authorization model where permissions are assigned to roles, and users are granted roles. PostgreSQL implements RBAC natively: roles can own objects, receive GRANT privileges, and be members of other roles.

Role

Security

A PostgreSQL principal that can own database objects and receive privileges. Roles with LOGIN privilege can connect to the database. Roles can be granted membership in other roles for privilege inheritance.

Row-Level Security (RLS)

Security

A PostgreSQL feature enabling per-row access control. Policies define which rows a user can see (SELECT) or modify (INSERT, UPDATE, DELETE). Enabled with ALTER TABLE ... ENABLE ROW LEVEL SECURITY.

Raft

Distributed Systems

A consensus algorithm designed for understandability. Used in etcd, CockroachDB, and TiKV. Elects a leader that manages log replication. Provides strong consistency guarantees.

See also: Consensus

Rate Limiting

Scaling & Performance

Restricting the number of queries or connections per unit time. Applied at application, proxy (PgBouncer), and database levels. Prevents individual clients from monopolizing resources. Especially important when AI agents generate high query volumes without built-in backpressure.

See also: Connection Saturation, PgBouncer
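
Client-side rate limiting is often a token bucket. A minimal sketch with an injected clock for testability (class and parameter names are illustrative):

```python
class TokenBucket:
    """Allow up to `rate` queries per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, 0.0

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, then spend one token per query
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```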

S

Schema

Core Database

A named namespace within a database that contains tables, views, functions, types, and other objects. PostgreSQL databases have a public schema by default. Used to organize objects and control access.

Sequence

Core Database

A database object generating unique, incrementing numeric values. Commonly used for auto-incrementing primary keys. PostgreSQL sequences support BIGINT values, custom start/increment, and cycling.

Stored Procedure

Core Database

A named block of procedural code stored in the database and executed via CALL. Unlike functions, stored procedures can manage transactions (commit/rollback within the procedure body). Introduced in PostgreSQL 11.

SPI (Server Programming Interface)

PostgreSQL

A C API allowing PostgreSQL extensions and stored procedures to execute SQL statements from within server-side code. Provides functions like SPI_execute(), SPI_prepare(), and result set iteration.

Shared Buffers

PostgreSQL

A region of shared memory used to cache frequently accessed data pages. All backend processes share this cache. Controlled by the shared_buffers parameter (typically 25% of system RAM). Reduces disk I/O by keeping hot data in memory.

See also: Cache Hit Ratio

System Columns

PostgreSQL

Hidden columns present in every table: tableoid (OID of the containing table), xmin (inserting transaction ID), xmax (deleting transaction ID), cmin/cmax (command IDs within transaction), ctid (physical row location).

See also: CTID, Transaction ID

Skip Scan

PostgreSQL

A B-tree index optimization added in PostgreSQL 18 that efficiently handles queries where the leading index column has low cardinality. Instead of scanning the entire index, the executor skips to the next distinct value of the leading column. Eliminates the need for certain multi-column indexes.

See also: B-tree, Index

Subquery

SQL

A query nested inside another query. Can appear in FROM (derived table), WHERE (scalar or correlated subquery), or SELECT (scalar subquery). Correlated subqueries reference the outer query and execute once per outer row.

Sequential Scan

Indexing

Reading every row in a table from disk in physical storage order. Optimal when selecting a large fraction of the table or when no suitable index exists.

SP-GiST

Indexing

Space-Partitioned GiST — an index type for data that can be partitioned into non-overlapping regions: points, IP addresses, phone numbers, text (radix tree). More efficient than GiST for certain data distributions.

See also: GiST

Statistics Collector

PostgreSQL

PostgreSQL subsystem that gathers information about table contents (row counts, most common values, histograms, correlation). The planner uses these statistics for cost estimation. Updated by ANALYZE.

Savepoint

Transactions

A named point within a transaction that can be rolled back to without aborting the entire transaction. Created with SAVEPOINT, rolled back with ROLLBACK TO SAVEPOINT.
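
The mechanics can be demonstrated with SQLite from Python, which shares PostgreSQL's SAVEPOINT syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("CREATE TABLE items (name TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO items VALUES ('kept')")
conn.execute("SAVEPOINT before_risky")
conn.execute("INSERT INTO items VALUES ('discarded')")
conn.execute("ROLLBACK TO SAVEPOINT before_risky")  # undoes only the second insert
conn.execute("COMMIT")                              # the first insert survives

rows = [r[0] for r in conn.execute("SELECT name FROM items")]
```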

Snapshot

Transactions

A consistent view of the database at a specific point in time. Each transaction operates on its own snapshot, seeing only data committed before the snapshot was taken. Foundation of MVCC.

See also: MVCC, Transaction

SSI (Serializable Snapshot Isolation)

Transactions

PostgreSQL's implementation of SERIALIZABLE isolation. Tracks read/write dependencies between concurrent transactions and aborts transactions that would produce results inconsistent with some serial execution order.

See also: Isolation Level

Standby

Replication & HA

A replica server that continuously applies changes from the primary. Can be a hot standby (accepting reads) or warm standby (no queries). Standby servers are candidates for failover promotion.

See also: Hot Standby, Failover

Streaming Replication

Replication & HA

Continuous replication of WAL records from primary to standby over a TCP connection. Changes are sent as they occur, providing near-real-time replication. The default replication method for PostgreSQL HA.

See also: Physical Replication

Subscription

Replication & HA

A logical replication subscriber definition. Connects to a publication on a remote server and applies replicated changes locally. Created with CREATE SUBSCRIPTION.

See also: Publication

Synchronous Replication

Replication & HA

Replication where the primary waits for one or more replicas to confirm that changes have been written before committing the transaction. Guarantees zero data loss at the cost of increased commit latency.

See also: Asynchronous Replication

Scale to Zero

Branching & CoW, Xata

The ability for idle database branches to automatically release compute resources (CPU, memory) while preserving data on disk. Branches resume when a connection is established. Eliminates the cost of idle environments.

See also: Hibernation

Schema Diff

Schema Migrations, Xata

A comparison between two schema versions showing added, removed, and modified objects. Useful for reviewing branch changes before merging.

Synthetic Data

Privacy & Anonymization

Artificially generated data that mimics the statistical properties of real data without containing actual records. Used for DID faking (format-preserving synthetic values via Mimesis) and for generating adversary datasets in risk scoring simulations. Distinct from anonymized data, which starts from real records.

See also: Faking, Risk Score (Phisher K-Threshold)

SCRAM-SHA-256

Security

The recommended password authentication mechanism in PostgreSQL 10+. Uses a challenge-response protocol with salted hashing, preventing password interception and replay attacks.

Serverless

Cloud & Deployment

A compute model where resources are allocated on-demand per request or query, with automatic scaling to zero during inactivity. Eliminates idle resource costs.

StatefulSet

Kubernetes

A Kubernetes workload type for applications requiring stable identities, ordered deployment, and persistent storage. Used for databases where each instance needs a consistent hostname and dedicated storage.

Sharding

Distributed Systems

Distributing a large dataset across multiple database instances (shards), each holding a subset of the data. Partition key determines which shard holds each row. Enables horizontal scaling beyond single-node limits.

See also: Citus, Distributed PostgreSQL

Staging Environment

Developer Tools

A pre-production environment that mirrors production for testing. Traditionally requires full database copies. Database branching provides staging environments as lightweight, instant branches.

See also: Ephemeral Environment, Branch

SQL Gateway

Xata Platform, Xata

The service providing PostgreSQL wire protocol access to Xata branches. Handles SSL termination, authentication, and routing to the correct PostgreSQL cluster based on the branch ID encoded in the connection hostname.

Serverless Proxy

Xata Platform, Xata

An HTTP/WebSocket interface for connecting to Xata databases from serverless functions and edge environments where persistent TCP connections are impractical. Compatible with the Neon serverless driver.

T

Table

Core Database

The fundamental data storage structure in a relational database. Organized as rows and columns with a defined schema. Tables can have constraints, indexes, triggers, and policies.

Tablespace

Core Database

A named location on disk where PostgreSQL stores data files. Allows distributing tables and indexes across different physical storage devices for performance or capacity management.

Trigger

Core Database

A function automatically executed in response to INSERT, UPDATE, DELETE, or TRUNCATE events on a table. Can fire BEFORE, AFTER, or INSTEAD OF the triggering event. Used for auditing, validation, and data synchronization.

Tuple

Core Database

PostgreSQL's internal term for a row. A heap tuple is the physical on-disk representation of a row, including system columns (xmin, xmax, ctid) used for MVCC.

See also: Row, MVCC

TOAST

PostgreSQL

The Oversized Attribute Storage Technique — PostgreSQL's mechanism for storing large column values (over ~2KB). Values are compressed and/or split across a separate TOAST table. Transparent to applications.

See also: Compression

Transaction

Transactions

An atomic unit of work that either fully completes (COMMIT) or fully reverts (ROLLBACK). Transactions maintain ACID properties and provide isolation from concurrent operations.

See also: ACID, Isolation Level

Transaction ID (XID)

Transactions

A 32-bit identifier assigned to each write transaction. Used by MVCC to determine row visibility. Stored in the xmin and xmax system columns of each tuple. Subject to wraparound, managed by VACUUM.

See also: MVCC, Vacuum

Two-Phase Commit (2PC)

Distributed Systems

A protocol for coordinating transaction commits across multiple database instances. Phase 1: all participants prepare (PREPARE TRANSACTION). Phase 2: coordinator issues COMMIT PREPARED or ROLLBACK PREPARED to all participants.
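
The coordinator logic itself is small; the hard parts in practice are making the decision durable and recovering stalled participants. A toy in-memory sketch (class and method names are illustrative):

```python
class Participant:
    """Toy participant: votes in phase 1, applies the decision in phase 2."""

    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):          # phase 1: analogous to PREPARE TRANSACTION
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):   # phase 2: COMMIT PREPARED / ROLLBACK PREPARED
        self.state = "committed" if commit else "aborted"

def two_phase_commit(participants):
    ready = all(p.prepare() for p in participants)  # phase 1: collect votes
    for p in participants:                          # phase 2: broadcast decision
        p.finish(ready)
    return ready
```

A single "no" vote in phase 1 forces every participant to roll back in phase 2.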

Thin Provisioning

Branching & CoW

A storage allocation strategy where capacity is assigned on demand rather than pre-allocated. A thin-provisioned volume appears to have a large size but consumes only the space actually written.

Treatment Pipeline

Privacy & Anonymization, Xata

The multi-stage anonymization process: (1) entity recognition to classify columns, (2) risk assessment via simulated linkage attacks, (3) DID suppression (redact, mask, fake, or hash), (4) k-member microaggregation for QIDs, (5) distortion analysis measuring information loss. Each stage is configurable per column and per dataset.

See also: Entity Recognition (PII Detection), Microaggregation, Distortion

Thundering Herd

Networking

A failure pattern where many clients simultaneously retry or reconnect after a service interruption, overwhelming the recovering system. Mitigated by exponential backoff with jitter, connection pooling, and cache locking. Critical at scale — OpenAI prevents this across 50+ PostgreSQL read replicas.

See also: Cache Locking, Connection Pooling

TPS (Transactions Per Second)

Scaling & Performance

A measure of database throughput for write-heavy workloads. Commonly benchmarked with pgbench. Affected by transaction isolation level, WAL configuration, and storage performance.

TimescaleDB

Extensions

A PostgreSQL extension for time-series data. Provides automatic partitioning (hypertables), continuous aggregates, compression, and retention policies optimized for time-series workloads.

Text-to-SQL

AI & Agents

Translating natural language queries into valid SQL statements. Accuracy depends on schema awareness, join path inference, dialect-specific syntax, and column naming conventions. Modern approaches use LLMs with schema context rather than rule-based parsers. A core capability for database-connected AI agents.

Tool Use

AI & Agents

The mechanism by which an LLM invokes external functions during response generation. In database contexts, tools typically wrap SQL execution, schema inspection, migration commands, branch management, and data export operations. The bridge between agent reasoning and database actions.

See also: MCP

U

Unique Constraint

Core Database

A constraint ensuring that all values in the specified column(s) are distinct across all rows. Because NULLs are treated as distinct by default, multiple NULLs are allowed; PostgreSQL 15 added NULLS NOT DISTINCT to change this. Automatically creates a unique index.

See also: Constraint, Primary Key

V

View

Core Database

A named query stored in the database and referenced like a table. Views do not store data; they execute their underlying query each time they are accessed. Can be used for access control, simplification, and abstraction.

See also: Materialized View

Vacuum

PostgreSQL

A maintenance operation that reclaims storage from dead tuples, updates visibility maps, and refreshes planner statistics. VACUUM runs without exclusive locks; VACUUM FULL rewrites the entire table and requires an exclusive lock.

See also: Dead Tuple, Visibility Map, Bloat

Visibility Map

PostgreSQL

A bitmap tracking which heap pages contain only tuples visible to all current transactions. Allows index-only scans to skip heap fetches for pages marked all-visible. Updated by VACUUM.

See also: Vacuum, Index-Only Scan

Version Schema

Schema Migrations, Open Source

A PostgreSQL schema created during a migration containing views that expose the database tables with the appropriate column names and types for that schema version. Allows multiple application versions to coexist.

See also: pgroll, Expand / Contract Pattern

Vertical Scaling

Cloud & Deployment

Increasing database capacity by upgrading to a larger instance with more CPU, RAM, or faster storage. PostgreSQL scales vertically to large instance sizes — OpenAI runs single PostgreSQL primaries handling millions of queries per second. Simple but has upper limits.

See also: Horizontal Scaling

W

work_mem

PostgreSQL

Per-operation memory limit for internal sort operations, hash tables, and bitmap heap scans. Set per session/query. Too low causes disk spills to temporary files; too high risks memory exhaustion under concurrency.

Window Function

Core Database

A function that performs a calculation across a set of rows related to the current row without collapsing them into a single output row. Examples: ROW_NUMBER(), RANK(), LAG(), LEAD(), SUM() OVER(). Defined with the OVER() clause.

See also: Aggregate Function
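
For example, ranking rows within groups of a hypothetical orders table without collapsing them:

```sql
-- One output row per input row; RANK restarts per customer
SELECT id,
       customer_id,
       total,
       RANK() OVER (PARTITION BY customer_id ORDER BY total DESC) AS rnk
FROM orders;
```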

WAL (Write-Ahead Log)

PostgreSQL

The mechanism ensuring data durability by writing all changes to a sequential log before applying them to data files. On crash recovery, PostgreSQL replays the WAL from the last checkpoint to restore consistency.

See also: LSN, Checkpoint, WAL Level

WAL Level

PostgreSQL

Configuration controlling how much information is written to WAL. Levels: minimal (crash recovery only), replica (physical replication, default), logical (logical decoding and replication).

See also: Logical Decoding
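
Changing the level requires a server restart to take effect:

```sql
-- Enable logical decoding and logical replication
ALTER SYSTEM SET wal_level = 'logical';

-- After restarting the server, confirm the active level
SHOW wal_level;
```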

WAL Segment

PostgreSQL

A fixed-size file (default 16MB) in the pg_wal directory containing sequential WAL records. Segments are recycled after checkpoint and archiving. Segment size is configurable at cluster initialization.

wal2json

PostgreSQL

A PostgreSQL output plugin for logical decoding that formats WAL changes as JSON objects. Each change includes the table name, operation type, column names, types, and values.

See also: Output Plugin, CDC
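
A minimal sketch of consuming changes through wal2json (the slot name is illustrative; the server must run with wal_level = logical and the plugin installed):

```sql
-- Create a logical replication slot that decodes WAL via wal2json
SELECT pg_create_logical_replication_slot('cdc_slot', 'wal2json');

-- Peek at pending changes as JSON without consuming them
SELECT data FROM pg_logical_slot_peek_changes('cdc_slot', NULL, NULL);
```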

Write Amplification

Storage

The ratio of actual physical writes to logical writes requested by the application. Caused by journaling, MVCC versioning, WAL, index maintenance, and CoW operations.

Wire Protocol

Networking

The message-based protocol PostgreSQL uses for client-server communication over TCP. Defines message formats for authentication, query execution, result transmission, and error reporting. Increasingly adopted as a compatibility target by non-PostgreSQL databases (CockroachDB, YugabyteDB, QuestDB, ClickHouse, and others).

See also: Wire Protocol Compatibility

Wire Protocol Compatibility

Networking

Building non-PostgreSQL databases that accept PostgreSQL client connections by implementing the PostgreSQL wire protocol. Allows reuse of existing drivers, ORMs, and tooling. A major industry trend — over 20 databases now advertise PostgreSQL wire protocol support.

See also: Wire Protocol

X

Xata Agent

Xata Platform · Xata

An AI-powered PostgreSQL monitoring and optimization tool. Analyzes query performance, suggests index improvements, and identifies configuration opportunities.

Xata CLI

Xata Platform · Xata

Command-line tool for managing Xata projects, branches, migrations, and replication. Commands include xata init, xata branch create, xata clone, xata stream, and xata roll.

The open-source core of the Xata platform implementing copy-on-write database branching on standard PostgreSQL. Enables instant branch creation, delta-only storage, and scale-to-zero compute.

See also: Branch, Copy-on-Write

xatastor

Xata Platform · Xata

The NVMe-oF storage target providing high-performance block storage with ZFS-based copy-on-write. Exposes volumes via NVMe over TCP with a gRPC management API. The storage backbone enabling instant branching.

See also: NVMe-oF, ZFS, Copy-on-Write

Z

Zero-Downtime Migration

Schema Migrations

A schema migration that completes without application downtime or degraded performance. Achieved through techniques like expand/contract, column duplication, background backfilling, and versioned views.

See also: Expand / Contract Pattern, pgroll

ZFS

Storage

A combined filesystem and volume manager featuring copy-on-write, snapshots, clones, compression, checksumming, and RAID-Z. Used as a storage backend for database branching systems.

See also: Copy-on-Write, zvol

zvol

Storage

A ZFS volume — a dataset that exposes a block device interface rather than a filesystem. Used to provide raw block storage to databases or NVMe targets. Supports thin provisioning, snapshots, and clones.

See also: ZFS