Supported transformers
PostgreSQL Anonymizer
pg_anonymizer
Description: Integrates with the PostgreSQL Anonymizer extension to provide advanced data anonymization using built-in anonymizer functions.

⚠️ This transformer requires a PostgreSQL database connection to execute transformations, which may impact performance compared to other transformers in this document that generate values locally without database queries.

| Supported PostgreSQL types |
|---|
| Dependent on anonymizer function |
| Parameter | Type | Default | Required | Values |
|---|---|---|---|---|
| anon_function | string | N/A | Yes | Any valid anon.* function |
| postgres_url | string | N/A | Yes | PostgreSQL connection URL |
| salt | string | "" | No | Salt for deterministic functions |
| hash_algorithm | string | sha256 | No | Algorithm for anon.digest. One of md5, sha224, sha256, sha384, sha512 |
| interval | string | N/A | No | Time interval for anon.dnoise function |
| ratio | float | N/A | No | Noise ratio for anon.noise function |
| sigma | float | N/A | No | Blur sigma for anon.image_blur function |
| mask | string | N/A | No | Mask character for anon.partial function |
| mask_prefix_count | int | 0 | No | Prefix count for anon.partial function |
| mask_suffix_count | int | 0 | No | Suffix count for anon.partial function |
| min | string | N/A | No | Minimum value for anon.random_*_between functions |
| max | string | N/A | No | Maximum value for anon.random_*_between functions |
| range | string | "" | No | Range for anon.random_in_* functions |
| locale | string | en_US | No | Locale for dummy supported functions. One of ar_SA, en_US, fr_FR, ja_JP, pt_BR, zh_CN, zh_TW |
| count | int | 0 | No | Count parameter for functions like anon.random_string and anon.lorem_ipsum |
| unit | string | paragraphs | No | Unit for anon.lorem_ipsum function. One of characters, words, paragraphs |
| prefix | string | "" | No | Prefix for anon.random_phone function |
- The transformer executes functions directly in PostgreSQL, ensuring compatibility with all anonymizer features
- Deterministic functions (pseudo_*, hash, digest) produce consistent output for the same input
- Functions that don’t require parameters (like anon.fake_*()) can be used without additional configuration
- PostgreSQL Anonymizer extension must be installed and enabled on the source (or the configured URL)
- Extension must be loaded in shared_preload_libraries
- Run SELECT anon.init(); in order to use the faking functions
- Adding noise
- Randomization
- Faking
- Advanced faking
- Pseudoanonymization
- Generic hashing
- Partial scrambling
- Image blurring
| Input Value | Function Configuration | Output Value |
|---|---|---|
| John | anon_function: anon.fake_first_name() | Michael (random) |
| john@test.com | anon_function: anon.pseudo_email, salt: "key123" | alice@test.com (deterministic) |
| 1234567890 | anon_function: anon.partial, mask: "*", mask_prefix_count: 3, mask_suffix_count: 3 | 123****890 |
| sensitive_data | anon_function: anon.digest, salt: "key", hash_algorithm: "sha256" | a1b2c3d4e5f6... (hash) |
| 100.50 | anon_function: anon.noise, ratio: 0.1 | 95.23 (with 10% noise) |
| 2023-01-15 | anon_function: anon.dnoise, interval: "1 day" | 2023-01-16 (±1 day noise) |
| password123 | anon_function: anon.hash | ef92b778bafe771e89245b89ecbc08a4... |
| Alice Smith | anon_function: anon.pseudo_first_name, salt: "s1" | Bob Smith (deterministic) |
| user@company.com | anon_function: anon.partial_email | u***@company.com |
| 42 | anon_function: anon.random_int_between(1, 100) | 73 (random between 1-100) |
| /path/image.jpg | anon_function: anon.image_blur, sigma: 2.5 | Blurred image data |
| Any value | anon_function: anon.fake_company() | Acme Corporation (random) |
| 25 | anon_function: anon.random_int_between, min: "18", max: "65" | 42 (random between 18-65) |
| Any value | anon_function: anon.random_in, range: "ARRAY['A', 'B', 'C']" | B (random from array) |
| Any value | anon_function: anon.lorem_ipsum, unit: "words", count: 5 | Lorem ipsum dolor sit amet |
| Any value | anon_function: anon.random_string, count: 8 | aB3xY9z1 (random string) |
| Any value | anon_function: anon.random_phone, prefix: "+1-555-" | +1-555-123-4567 |
| John | anon_function: anon.fake_first_name_locale, locale: "fr_FR" | Pierre (French name) |
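
A configuration for this transformer might look like the sketch below. It assumes the rules-file layout (`transformations`, `table_transformers`, `column_transformers`, `name`, `parameters`) and a hypothetical `public.users` table with an `email` column; the connection URL is a placeholder.

```yaml
transformations:
  table_transformers:
    - schema: public          # hypothetical schema
      table: users            # hypothetical table
      column_transformers:
        email:                # hypothetical column
          name: pg_anonymizer
          parameters:
            # Deterministic pseudonymization: same input + salt gives the same output.
            anon_function: anon.pseudo_email
            # Placeholder URL of a database with the anon extension installed.
            postgres_url: "postgres://user:password@localhost:5432/postgres"
            salt: "key123"
```

Because every value is sent to PostgreSQL for transformation, it may be worth reserving this transformer for columns that need anonymizer-specific functions.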
Greenmask
greenmask_boolean
Description: Generates random or deterministic boolean values (true or false).
| Supported PostgreSQL types |
|---|
| boolean |
| Parameter | Type | Default | Required | Values |
|---|---|---|---|---|
| generator | string | random | No | random,deterministic |
| Input Value | Configuration Parameters | Output Value |
|---|---|---|
| true | generator: deterministic | false |
| false | generator: deterministic | true |
| true | generator: random | true or false (random) |
greenmask_choice
Description: Randomly selects a value from a predefined list of choices.

| Supported PostgreSQL types |
|---|
| text, varchar, char, bpchar |
| Parameter | Type | Default | Required | Values |
|---|---|---|---|---|
| generator | string | random | No | random,deterministic |
| choices | string[] | N/A | Yes | N/A |
| Input Value | Configuration Parameters | Output Value |
|---|---|---|
| pending | generator: random | shipped (random) |
| shipped | generator: deterministic | pending |
| delivered | generator: random | cancelled (random) |
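
The required choices parameter is not shown in the rows above. A minimal sketch of how it could be supplied, assuming the rules-file layout used elsewhere in this document and a hypothetical `orders.status` column:

```yaml
transformations:
  table_transformers:
    - schema: public          # hypothetical schema
      table: orders           # hypothetical table
      column_transformers:
        status:               # hypothetical column
          name: greenmask_choice
          parameters:
            generator: deterministic
            # Output values are always drawn from this list.
            choices: ["pending", "shipped", "delivered", "cancelled"]
```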
greenmask_date
Description: Generates random or deterministic dates within a specified range.

| Supported PostgreSQL types |
|---|
| date, timestamp, timestamptz |
| Parameter | Type | Default | Required | Values |
|---|---|---|---|---|
| generator | string | random | No | random,deterministic |
| min_value | string (yyyy-MM-dd) | N/A | Yes | N/A |
| max_value | string (yyyy-MM-dd) | N/A | Yes | N/A |
| Input Value | Configuration Parameters | Output Value |
|---|---|---|
| 2023-01-01 | generator: random, min_value: 2020-01-01, max_value: 2025-12-31 | 2021-05-15 (random) |
| 2022-06-15 | generator: deterministic | 2020-01-01 |
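
Since min_value and max_value are both required, a full column entry might look like the following sketch. The rules-file layout and the `users.birth_date` column are assumptions made for illustration.

```yaml
transformations:
  table_transformers:
    - schema: public          # hypothetical schema
      table: users            # hypothetical table
      column_transformers:
        birth_date:           # hypothetical column
          name: greenmask_date
          parameters:
            generator: random
            # Both bounds use the yyyy-MM-dd format from the parameter table.
            min_value: "2020-01-01"
            max_value: "2025-12-31"
```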
greenmask_firstname
Description: Generates random or deterministic first names, optionally filtered by gender.

| Supported PostgreSQL types |
|---|
| text, varchar, char, bpchar |
| Parameter | Type | Default | Required | Values | Dynamic |
|---|---|---|---|---|---|
| generator | string | random | No | random,deterministic | No |
| gender | string | Any | No | Any,Female,Male | Yes |
gender can also be a dynamic parameter, referring to some other column. Please see the below example config.
Example Configuration:
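
A minimal sketch of a dynamic gender parameter. The `dynamic_parameters` block with a `column` key, the rules-file layout, and the `sex` column are assumptions, not confirmed by this section.

```yaml
transformations:
  table_transformers:
    - schema: public          # hypothetical schema
      table: users            # hypothetical table
      column_transformers:
        firstname:            # hypothetical column
          name: greenmask_firstname
          parameters:
            generator: deterministic
          # Assumed syntax: read the gender from another column of the same row.
          dynamic_parameters:
            gender:
              column: sex     # hypothetical column holding Any, Female or Male
```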
| Input Name | Configuration Parameters | Output Name |
|---|---|---|
| John | preserve_gender: true | Michael |
| Jane | preserve_gender: true | Emily |
| Alex | preserve_gender: false | Jordan |
| Chris | generator: random | Taylor |
greenmask_float
Description: Generates random or deterministic floating-point numbers within a specified range.

| Supported PostgreSQL types |
|---|
| real, double precision |
| Parameter | Type | Default | Required | Values |
|---|---|---|---|---|
| generator | string | random | No | random,deterministic |
| min_value | float | -3.40282346638528859811704183484516925440e+38 | No | N/A |
| max_value | float | 3.40282346638528859811704183484516925440e+38 | No | N/A |
greenmask_integer
Description: Generates random or deterministic integers within a specified range.

| Supported PostgreSQL types |
|---|
| smallint, integer, bigint |
| Parameter | Type | Default | Required | Values |
|---|---|---|---|---|
| generator | string | random | No | random,deterministic |
| size | int | 4 | No | 2,4 |
| min_value | int | -2147483648 | No | N/A |
| max_value | int | 2147483647 | No | N/A |
greenmask_string
Description: Generates random or deterministic strings with customizable length and character set.

| Supported PostgreSQL types |
|---|
| text, varchar, char, bpchar |
| Parameter | Type | Default | Required | Values |
|---|---|---|---|---|
| generator | string | random | No | random,deterministic |
| symbols | string | abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890 | No | N/A |
| min_length | int | 1 | No | N/A |
| max_length | int | 100 | No | N/A |
greenmask_unix_timestamp
Description: Generates random or deterministic unix timestamps.

| Supported PostgreSQL types |
|---|
| smallint, integer, bigint |
| Parameter | Type | Default | Required | Values |
|---|---|---|---|---|
| generator | string | random | No | random,deterministic |
| min_value | string | N/A | Yes | N/A |
| max_value | string | N/A | Yes | N/A |
greenmask_utc_timestamp
Description: Generates random or deterministic UTC timestamps.

| Supported PostgreSQL types |
|---|
| timestamp |
| Parameter | Type | Default | Required | Values |
|---|---|---|---|---|
| generator | string | random | No | random,deterministic |
| truncate_part | string | "" | No | nanosecond,microsecond,millisecond,second,minute,hour,day,month,year |
| min_timestamp | string (RFC3339) | N/A | Yes | N/A |
| max_timestamp | string (RFC3339) | N/A | Yes | N/A |
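
As a sketch of how the RFC3339 bounds and truncate_part fit together, assuming the rules-file layout used elsewhere in this document and a hypothetical `events.created_at` column:

```yaml
transformations:
  table_transformers:
    - schema: public          # hypothetical schema
      table: events           # hypothetical table
      column_transformers:
        created_at:           # hypothetical column
          name: greenmask_utc_timestamp
          parameters:
            generator: random
            # Truncation granularity, one of the values listed in the table.
            truncate_part: second
            # Required RFC3339 bounds.
            min_timestamp: "2022-01-01T00:00:00Z"
            max_timestamp: "2024-01-01T00:00:00Z"
```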
greenmask_uuid
Description: Generates random or deterministic UUIDs.

| Supported PostgreSQL types |
|---|
| uuid, text, varchar, char, bpchar |
| Parameter | Type | Default | Required | Values |
|---|---|---|---|---|
| generator | string | random | No | random,deterministic |
Neosync
neosync_email
Description: Anonymizes email addresses while optionally preserving length and domain.

| Supported PostgreSQL types |
|---|
| text, varchar, char, bpchar, citext |
| Parameter | Type | Default | Required | Values |
|---|---|---|---|---|
| preserve_length | bool | false | No | |
| preserve_domain | bool | false | No | |
| excluded_domains | string[] | N/A | No | |
| max_length | int | 100 | No | |
| email_type | string | uuidv4 | No | uuidv4,fullname,any |
| invalid_email_action | string | 100 | No | reject,passthrough,null,generate |
| seed | int | Rand | No | |
| Input Email | Configuration Parameters | Output Email |
|---|---|---|
| john.doe@example.com | preserve_length: true, preserve_domain: true | abcd.efg@example.com |
| jane.doe@company.org | preserve_length: false, preserve_domain: true | random@company.org |
| user123@gmail.com | preserve_length: true, preserve_domain: false | abcde123@random.com |
| invalid-email | invalid_email_action: passthrough | invalid-email |
| invalid-email | invalid_email_action: null | NULL |
| invalid-email | invalid_email_action: generate | generated@random.com |
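
The first and fourth rows above could be combined in a single column entry along these lines. This is a sketch only: the rules-file layout and the `users.email` column are assumptions.

```yaml
transformations:
  table_transformers:
    - schema: public          # hypothetical schema
      table: users            # hypothetical table
      column_transformers:
        email:                # hypothetical column
          name: neosync_email
          parameters:
            preserve_length: true
            preserve_domain: true
            # Leave values that are not valid email addresses untouched.
            invalid_email_action: passthrough
```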
neosync_firstname
Description: Generates anonymized first names while optionally preserving length.

| Supported PostgreSQL types |
|---|
| text, varchar, char, bpchar |
| Parameter | Type | Default | Required |
|---|---|---|---|
| preserve_length | bool | false | No |
| max_length | int | 100 | No |
| seed | int | Rand | No |
neosync_lastname
Description: Generates anonymized last names while optionally preserving length.

| Supported PostgreSQL types |
|---|
| text, varchar, char, bpchar |
| Parameter | Type | Default | Required |
|---|---|---|---|
| preserve_length | bool | false | No |
| max_length | int | 100 | No |
| seed | int | Rand | No |
neosync_fullname
Description: Generates anonymized full names while optionally preserving length.

| Supported PostgreSQL types |
|---|
| text, varchar, char, bpchar |
| Parameter | Type | Default | Required |
|---|---|---|---|
| preserve_length | bool | false | No |
| max_length | int | 100 | No |
| seed | int | Rand | No |
neosync_string
Description: Generates anonymized strings with customizable length.

| Supported PostgreSQL types |
|---|
| text, varchar, char, bpchar |
| Parameter | Type | Default | Required |
|---|---|---|---|
| preserve_length | bool | false | No |
| min_length | int | 1 | No |
| max_length | int | 100 | No |
| seed | int | Rand | No |
Xata
template
Description: Transforms the data using Go templates.

| Supported PostgreSQL types |
|---|
| All types with a string representation |
| Parameter | Type | Default | Required |
|---|---|---|---|
| template | string | N/A | Yes |
Use .GetValue to refer to the value to be transformed. Use .GetDynamicValue "<column_name>" to refer to some other column. Beyond the standard Go template functions, the template transformer supports many helper functions from greenmask's large set of core functions, including the masking function backed by go-masker and various random data generators powered by the open source library faker. It also supports the open source library sprig, which provides many more helper functions.
With the below example config, pgstream masks values in the column email of the table users, using go-masker's email masking function. The template first checks whether the column email holds a non-empty value. If not, it falls back to another column named secondary_email. It then checks whether the value is a @xata email. The value is masked only if it is not a @xata email; otherwise it is passed through unmasked.
Example Configuration:
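
A sketch of the logic described above. It assumes the rules-file layout used elsewhere in this document, sprig's `empty`, `contains` and `toString` helpers, and a `masking "<type>" <value>` call signature for the go-masker function; that signature in particular should be checked against the greenmask function reference.

```yaml
transformations:
  table_transformers:
    - schema: public
      table: users
      column_transformers:
        email:
          name: template
          parameters:
            template: |
              {{- /* Start from the column's own value. */ -}}
              {{- $email := .GetValue -}}
              {{- /* Fall back to secondary_email when email is empty. */ -}}
              {{- if empty $email -}}
                {{- $email = .GetDynamicValue "secondary_email" -}}
              {{- end -}}
              {{- /* Pass @xata addresses through, mask everything else. */ -}}
              {{- if contains "@xata" (toString $email) -}}
                {{- $email -}}
              {{- else -}}
                {{- masking "email" (toString $email) -}}
              {{- end -}}
```

The trim markers (`{{-` and `-}}`) keep the indentation and line breaks of the block scalar out of the rendered value.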
masking
Description: Masks string values using the provided masking function.

| Supported PostgreSQL types |
|---|
| text, varchar, char, bpchar |
| Parameter | Type | Default | Required | Values |
|---|---|---|---|---|
| type | string | default | No | custom, password, name, address, email, mobile, tel, id, credit_card, url, default |
| Input Value | Configuration Parameters | Output Value |
|---|---|---|
| aVeryStrongPassword123 | type: password | ************ |
| john.doe@example.com | type: email | joh****e@example.com |
| Sensitive Data | type: default | ************** |

With the custom type, the masking function is defined by the user, by providing beginning and end indexes for masking. If the input is shorter than the end index, the rest of the string is masked entirely. See the third example below.
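
A sketch of a custom masking entry. The parameter names `mask_begin` and `mask_end` do not appear in the parameter table above and are assumptions, as are the rules-file layout and the `card_number` column.

```yaml
transformations:
  table_transformers:
    - schema: public          # hypothetical schema
      table: payments         # hypothetical table
      column_transformers:
        card_number:          # hypothetical column
          name: masking
          parameters:
            type: custom
            # Assumed parameter names: mask characters from index 4 up to index 12.
            mask_begin: "4"
            mask_end: "12"
```

Under that assumption, a 16-character input keeps its first four and last four characters, and a 9-character input keeps only its first four.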
| Input Value | Output Value |
|---|---|
| 1234567812345678 | 1234********5678 |
| sensitive@example.com | sens********ample.com |
| sensitive | sens***** |

| Input Value | Output Value |
|---|---|
| 1234567812345678 | *****67812345678 |
| sensitive@example.com | *****tive@example.com |
| sensitive | *****tive |

| Input Value | Output Value |
|---|---|
| 1234567812345678 | 12***********678 |
| sensitive@example.com | sen***************com |
| sensitive | s******ve |

| Input Value | Output Value |
|---|---|
| 1234567812345678 | 123************* |
| sensitive@example.com | sen****************** |
| sensitive | sen****** |
json
Description: Transforms json data with set and delete operations.

| Supported PostgreSQL types |
|---|
| json, jsonb |
| Parameter | Type | Default | Required |
|---|---|---|---|
| operations | array | N/A | Yes |
| Parameter | Type | Default | Required | Values |
|---|---|---|---|---|
| operation | string | N/A | Yes | set, delete |
| path | string | N/A | Yes | sjson syntax* |
| skip_not_exist | boolean | true | No | true, false |
| error_not_exist | boolean | false | No | true, false |
| value | string | N/A | Yes** | Any valid JSON representation |
| value_template | string | N/A | Yes** | Any template with valid syntax |
**Either value or value_template must be provided if the operation is set. If both are provided, value_template takes precedence.
JSON transformer can be used for Postgres types json and jsonb. This transformer executes a list of given operations on the json data to be transformed.
All operations must be either set or delete.
set operations support literal values as well as templates, making use of sprig and greenmask’s function sets. See template transformer section for more details. Also, like the template transformer, .GetValue and .GetDynamicValue functions are supported. Unlike template transformer, here .GetValue refers to the value at given path, rather than the entire JSON object; whereas .GetDynamicValue is again used for referring to other columns.
delete operations simply delete the object at the given path.
Execution of an operation is skipped if the given path does not exist and the parameter skip_not_exist is set to true, which is also the default behavior.
Execution of an operation fails with an error if the given path does not exist and the parameter error_not_exist (false by default) is set to true, unless the operation has already been skipped.
The JSON transformer uses the sjson library to execute the operations. Operation paths should follow the syntax rules of sjson.
With the below config pgstream transforms the json values in the column user_info_json of the table users by:
- First, traversing all the items in the array named purchases and, for each element, setting the value to "-" for the key "item".
- Then, deleting the object named "country" under the top-level object "address".
- Completely masking the "city" value under the object "address", using go-masker's default masking function supported by pgstream's templating.
- Finally, setting the user's lastname after fetching it from another column named lastname, using dynamic values support. This assumes such a column exists and holds the lastname of each user.
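
The steps above can be sketched as follows. Several details are assumptions: the rules-file layout, the `#` array traversal in the first path, the target path `lastname` in the last operation, the `masking` call signature, and the extra double quotes that turn each value into a JSON string.

```yaml
transformations:
  table_transformers:
    - schema: public
      table: users
      column_transformers:
        user_info_json:
          name: json
          parameters:
            operations:
              # Set "item" to "-" for every element of purchases (assumed path syntax).
              - operation: set
                path: "purchases.#.item"
                value: '"-"'
              # Remove the country object under address.
              - operation: delete
                path: "address.country"
              # Here .GetValue is the value at address.city, not the whole document.
              - operation: set
                path: "address.city"
                value_template: '"{{ masking "default" .GetValue }}"'
              # Copy the value of the lastname column into the JSON document.
              - operation: set
                path: "lastname"   # hypothetical target path
                value_template: '"{{ .GetDynamicValue "lastname" }}"'
```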
hstore
Description: Transforms hstore data with set and delete operations.

| Supported PostgreSQL types |
|---|
| hstore |
| Parameter | Type | Default | Required |
|---|---|---|---|
| operations | array | N/A | Yes |
| Parameter | Type | Default | Required | Values |
|---|---|---|---|---|
| operation | string | N/A | Yes | set, delete |
| key | string | N/A | Yes | |
| skip_not_exist | boolean | true | No | true, false |
| error_not_exist | boolean | false | No | true, false |
| value | string, null | N/A | Yes* | Any string or null |
| value_template | string | N/A | Yes* | Any template with valid syntax |
*Either value or value_template must be provided if the operation is set. If both are provided, value_template takes precedence.
Hstore transformer can be used for Postgres type hstore. This transformer executes a list of given operations on the hstore data to be transformed.
All operations must be either set or delete.
set operations support literal values as well as templates, making use of sprig and greenmask’s function sets. See template transformer section for more details. Also, like the template transformer, .GetValue and .GetDynamicValue functions are supported. Unlike template transformer, here .GetValue refers to the value for the given key, rather than the entire Hstore object; whereas .GetDynamicValue is again used for referring to other columns.
delete operations simply delete the pair with the given key.
Execution of an operation is skipped if the given key does not exist and the parameter skip_not_exist is set to true, which is also the default behavior.
Execution of an operation fails with an error if the given key does not exist and the parameter error_not_exist (false by default) is set to true, unless the operation has already been skipped.
A limitation to be aware of: When using hstore transformer templates, you cannot set a value to the string literal “<no value>”. This is because Go templates produce “<no value>” as output when the result is nil, creating an ambiguity. In such cases, pgstream will interpret it as nil and set the hstore value to NULL rather than storing the actual string “<no value>”.
With the below config pgstream transforms the hstore values in the column attributes of the table users by:
- First, updating the value for key "email" to its masked version, using the email masking function. If the key "email" is not found, the operation is simply ignored, since error_not_exist is not explicitly set to true and is false by default.
- Then, deleting the pair where the key is "public_key". If there is no such key, the operation errors out, because the parameter error_not_exist is set to true.
- Completely masking the value for key "private_key", using go-masker's default masking function supported by pgstream's templating.
- Finally, updating the value for key "newKey" to "newValue". Since error_not_exist is false by default and there is no such key in the example below, this operation adds a new key-value pair.
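
The steps above can be sketched as follows. The rules-file layout and the `masking` call signature are assumptions. The explicit `skip_not_exist: false` settings are also an inference from the rule that a skipped operation can neither fail nor add a key.

```yaml
transformations:
  table_transformers:
    - schema: public
      table: users
      column_transformers:
        attributes:
          name: hstore
          parameters:
            operations:
              # Mask the email value; silently skipped when the key is missing.
              - operation: set
                key: "email"
                value_template: '{{ masking "email" .GetValue }}'
              # Fail when public_key is missing instead of skipping.
              - operation: delete
                key: "public_key"
                skip_not_exist: false
                error_not_exist: true
              # Here .GetValue is the value stored under private_key.
              - operation: set
                key: "private_key"
                value_template: '{{ masking "default" .GetValue }}'
              # Add the pair when newKey is missing (assumed to need skip_not_exist: false).
              - operation: set
                key: "newKey"
                value: "newValue"
                skip_not_exist: false
```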
literal_string
Description: Transforms all values into the given constant value.

| Supported PostgreSQL types |
|---|
| All types with a string representation |
| Parameter | Type | Default | Required |
|---|---|---|---|
| literal | string | N/A | Yes |
In the example configuration, all values in the column log_message become {'error': null}.
This transformer can be used for any Postgres type as long as the given string literal has the correct syntax for that type, e.g. it can be "5-10-2021" for a date column, or "3.14159265" for a double precision one.
Example Configuration:
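
A minimal sketch, assuming the rules-file layout used elsewhere in this document and a hypothetical `logs` table that holds the log_message column:

```yaml
transformations:
  table_transformers:
    - schema: public          # hypothetical schema
      table: logs             # hypothetical table
      column_transformers:
        log_message:
          name: literal_string
          parameters:
            # Every row receives this exact literal.
            literal: "{'error': null}"
```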
phone_number
Description: Generates anonymized phone numbers with customizable length.

| Supported PostgreSQL types |
|---|
| text, varchar, char, bpchar |
| Parameter | Type | Default | Required | Values | Dynamic |
|---|---|---|---|---|---|
| prefix | string | "" | No | N/A | Yes |
| min_length | int | 6 | No | N/A | No |
| max_length | int | 10 | No | N/A | No |
| generator | string | random | No | random, deterministic | No |
prefix can also be a dynamic parameter, referring to some other column. Please see the below example config.
Example Configuration:
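
A minimal sketch of a dynamic prefix. The `dynamic_parameters` block with a `column` key, the rules-file layout, and the `country_code` column are assumptions, not confirmed by this section.

```yaml
transformations:
  table_transformers:
    - schema: public          # hypothetical schema
      table: users            # hypothetical table
      column_transformers:
        phone:                # hypothetical column
          name: phone_number
          parameters:
            min_length: 9
            max_length: 12
            generator: deterministic
          # Assumed syntax: take the prefix from another column of the same row.
          dynamic_parameters:
            prefix:
              column: country_code   # hypothetical column, e.g. holding "+44"
```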
| Supported PostgreSQL types |
|---|
| text, varchar, char, bpchar, citext |
| Parameter | Type | Default | Required | Values |
|---|---|---|---|---|
| replacement_domain | string | "@example.com" | No | |
| exclude_domain | string | "" | No | |
| salt | string | "defaultsalt" | No | |
| Input Email | Configuration Parameters | Output Email |
|---|---|---|
| john.doe@company.org | exclude_domain: "company.org", salt: "helloworld" | john.doe@company.org |
| jane.doe@company.org | exclude_domain: "exclude.com", salt: "helloworld" | T79P9zlFWzmT0yCUDMEE7S@example.com |
| jane.doe@company.org | exclude_domain: "exclude.com", salt: "helloworld", replacement_domain: "@random.com" | 6EIWw5lEa8nsY9JDOm5@random.com |
| invalid-email | exclude_domain: "exclude.com", salt: "helloworld" | 1fk5VLgTeoRQCCvqXFoToC1@example.com |
| invalid-email | exclude_domain: "exclude.com", salt: "helloworld", replacement_domain: "@random.com" | 6EIWw5lEa8nsY9JDOm5@random.com |
Transformation rules
The rules for the transformers are defined in a dedicated yaml file with the following format:

If the infer_from_security_labels option is enabled, the table transformers are parsed from the source Postgres SECURITY LABELS for the anon extension. If the option is not enabled, the table transformers need to be explicitly provided.
Below is a complete example of a transformation rules YAML file:
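
A sketch of such a file, combining transformers documented above. The key names (`transformations`, `table_transformers`, `column_transformers`, `name`, `parameters`) are assumptions about the rules format, and the tables and columns are hypothetical.

```yaml
transformations:
  validation_mode: relaxed    # columns without a transformer are skipped
  table_transformers:
    - schema: public
      table: users            # hypothetical table
      column_transformers:
        email:
          name: neosync_email
          parameters:
            preserve_length: true
            preserve_domain: true
        firstname:
          name: greenmask_firstname
          parameters:
            generator: deterministic
        password:
          name: masking
          parameters:
            type: password
    - schema: public
      table: orders           # hypothetical table
      column_transformers:
        status:
          name: greenmask_choice
          parameters:
            generator: random
            choices: ["pending", "shipped", "delivered", "cancelled"]
```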
The validation mode can be set to strict or relaxed for all tables at once, or it can be determined for each table individually by setting the higher level validation_mode parameter to table_level. When it is set to strict, pgstream throws an error if any column in the table does not have a transformer defined. When set to relaxed, pgstream skips any columns that do not have a transformer defined. Also, in strict mode, all snapshot tables must be provided in the transformation config.
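
A sketch of per-table validation. The per-table `validation_mode` key and the surrounding layout are assumptions, and the tables and columns are hypothetical.

```yaml
transformations:
  validation_mode: table_level    # defer the choice to each table
  table_transformers:
    - schema: public
      table: users
      validation_mode: strict     # every column of users needs a transformer entry
      column_transformers:
        email:
          name: neosync_email
        firstname:
          name: neosync_firstname
    - schema: public
      table: audit_log
      validation_mode: relaxed    # columns not listed here are left as they are
      column_transformers:
        actor:
          name: neosync_string
```

In the strict table, the two entries are enough only if users has no other columns.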
For details on how to use and configure the transformer, check the transformer tutorial.