Data obfuscation
Technical data masking — obfuscation — may be necessary to comply with the privacy requirements when using Tengri.
In this example, we will show how to obfuscate personal data so that anonymised (obfuscated) data is stored in a separate schema (which can be accessed by an unlimited number of employees), while the original data is stored in a schema with restricted access. Direct access to personal data can thus be completely isolated.
| Protection of personal data is necessary to fulfil the requirements of Federal Law No. 152-FZ "On Personal Data" in the Russian Federation and the General Data Protection Regulation (GDPR) in the European Union. |
We will use three schemas:
-
demo_raw— schema with raw data (not obfuscated)
Access: restricted.
Can be deleted after obfuscation. -
demo_secret— schema with matching keys
Access: restricted. -
demo_obfuscated— schema with obfuscated data
Access: unrestricted.
Data preparation
For demonstration purposes, let’s create an artificial data set of online shop users with the following fields:
-
User_ID— user identifier -
Full_Name— user’s full name -
Phone_Number— user’s phone number -
Address— user’s address -
Birth_Date— user’s date of birth -
Orders_Count— user’s number of orders
To do this, let’s use the library Python Faker:
import pandas as pd
from faker import Faker
# Инициализируем Faker для генерации русскоязычных данных
fake = Faker("ru_RU")
# Задаем количество пользователей
num_users = 1000
users_data = []
for _ in range(num_users):
# Генерируем профили пользователей
profile = fake.simple_profile()
user = {
"User_ID": fake.unique.random_int(min=10000, max=99999),
"Full_Name": profile["name"],
"Phone_Number": fake.phone_number(),
"Address": fake.address().replace("\n", ", "),
"Birth_Date": fake.date_of_birth(minimum_age=18, maximum_age=70).strftime(
"%Y-%m-%d"
),
"Orders_Count": fake.random_int(min=0, max=100),
}
users_data.append(user)
# Записываем в DataFrame
df = pd.DataFrame(users_data)
df.head()
+---+---------+----------------------------+-------------------+---------------------------------------------------+------------+--------------+
| | User_ID | Full_Name | Phone_Number | Address | Birth_Date | Orders_Count |
+---+---------+----------------------------+-------------------+---------------------------------------------------+------------+--------------+
| 0 | 33165 | Ульяна Яковлевна Филиппова | 8 (222) 153-7935 | д. Курчатов, ул. Волкова, д. 4/1 стр. 212, 601586 | 1988-03-05 | 79 |
+---+---------+----------------------------+-------------------+---------------------------------------------------+------------+--------------+
| 1 | 62431 | Воронова Оксана Натановна | 8 (390) 229-9364 | ст. Дербент, наб. Осипенко, д. 8 стр. 920, 010532 | 1962-04-02 | 75 |
+---+---------+----------------------------+-------------------+---------------------------------------------------+------------+--------------+
| 2 | 63202 | Яковлев Евсей Евстигнеевич | 8 (022) 393-80-71 | п. Сорочинск, ш. Ушакова, д. 92, 309643 | 1997-01-12 | 53 |
+---+---------+----------------------------+-------------------+---------------------------------------------------+------------+--------------+
| 3 | 59677 | Селиверст Власович Кулагин | +73939353959 | ст. Ессентуки, бул. Дорожный, д. 8/3, 828001 | 2004-07-09 | 14 |
+---+---------+----------------------------+-------------------+---------------------------------------------------+------------+--------------+
| 4 | 39532 | Баранов Ираклий Фролович | 8 (987) 772-04-72 | к. Объячево, ул. Пролетарская, д. 3/7 к. 1/8, ... | 1992-08-24 | 83 |
+---+---------+----------------------------+-------------------+---------------------------------------------------+------------+--------------+
Let’s write the obtained table into the demo_raw schema. We use the functions tngri.sql and tngri.create_table for this purpose:
import tngri
table_name = 'demo_raw.users'
tngri.sql(f'drop table if exists {table_name}')
tngri.create_table(df, table_name)
done at 2026-05-20 12:26:44.803990
Now in a cell of type SQL we will display the written table with unobfuscated data:
SELECT *
FROM demo_raw.users
LIMIT 5;
+---------+----------------------------+-------------------+------------------------------------------------------+------------+--------------+
| User_ID | Full_Name | Phone_Number | Address | Birth_Date | Orders_Count |
+---------+----------------------------+-------------------+------------------------------------------------------+------------+--------------+
| 33165 | Ульяна Яковлевна Филиппова | 8 (222) 153-7935 | д. Курчатов, ул. Волкова, д. 4/1 стр. 212, 601586 | 1988-03-05 | 79 |
+---------+----------------------------+-------------------+------------------------------------------------------+------------+--------------+
| 62431 | Воронова Оксана Натановна | 8 (390) 229-9364 | ст. Дербент, наб. Осипенко, д. 8 стр. 920, 010532 | 1962-04-02 | 75 |
+---------+----------------------------+-------------------+------------------------------------------------------+------------+--------------+
| 63202 | Яковлев Евсей Евстигнеевич | 8 (022) 393-80-71 | п. Сорочинск, ш. Ушакова, д. 92, 309643 | 1997-01-12 | 53 |
+---------+----------------------------+-------------------+------------------------------------------------------+------------+--------------+
| 59677 | Селиверст Власович Кулагин | +73939353959 | ст. Ессентуки, бул. Дорожный, д. 8/3, 828001 | 2004-07-09 | 14 |
+---------+----------------------------+-------------------+------------------------------------------------------+------------+--------------+
| 39532 | Баранов Ираклий Фролович | 8 (987) 772-04-72 | к. Объячево, ул. Пролетарская, д. 3/7 к. 1/8, 027088 | 1992-08-24 | 83 |
+---------+----------------------------+-------------------+------------------------------------------------------+------------+--------------+
Data obfuscation
Suppose we need to obfuscate three columns from the demo_raw.users table — Full_Name, Phone_Number and Address. For each of these columns, in the demo_secret schema, we will create a table of matches between the real data and the random identifier UUID v4. When creating the tables, we will generate the identifier using the uuid function. We will use the original user identifier — User_ID — to identify the data in these tables.
CREATE OR REPLACE TABLE demo_secret.user_full_name_resolver AS
SELECT
User_ID,
Full_Name,
uuid() AS name_uuid
FROM demo_raw.users;
CREATE OR REPLACE TABLE demo_secret.user_address_resolver AS
SELECT
User_ID,
Address,
uuid() AS address_uuid
FROM demo_raw.users;
CREATE OR REPLACE TABLE demo_secret.user_phone_number_resolver AS
SELECT
User_ID,
Phone_Number,
uuid() AS phone_uuid
FROM demo_raw.users;
+--------+
| status |
+--------+
| CREATE |
+--------+
The same can be done through a loop in a cell of type Python using the function tngri.sql. Let’s specify a list of columns to obfuscate and table names and iterate over the list of columns. This will allow us to write the code SQL once and not to rewrite it for all columns:
import tngri
source_schema = 'demo_raw'
source_table = 'users'
columns_to_obfuscate = ['Full_Name', 'Address', 'Phone_Number']
secret_schema = 'demo_secret'
for column in columns_to_obfuscate:
cur_table = f'{secret_schema}.{source_table}_{column}_resolver'
tngri.sql(f'''
CREATE OR REPLACE TABLE {cur_table} AS
SELECT
User_ID,
{column},
uuidv4() AS {column}_uuid
FROM {source_schema}.{source_table};
''')
print(f'Column "{column}" obfuscated')
print(f'Created table: {cur_table}')
Column "Full_Name" obfuscated
Created table: demo_secret.users_Full_Name_resolver
Column "Address" obfuscated
Created table: demo_secret.users_Address_resolver
Column "Phone_Number" obfuscated
Created table: demo_secret.users_Phone_Number_resolver
Let’s check the obtained tables.
SELECT *
FROM demo_secret.user_full_name_resolver
LIMIT 5;
+---------+----------------------------+--------------------------------------+
| User_ID | Full_Name | name_uuid |
+---------+----------------------------+--------------------------------------+
| 33165 | Ульяна Яковлевна Филиппова | c6ead1c1-7135-4efc-9b8f-cc84ae43422d |
+---------+----------------------------+--------------------------------------+
| 62431 | Воронова Оксана Натановна | a3bcd3f5-38b9-4b75-925d-23ee248355f8 |
+---------+----------------------------+--------------------------------------+
| 63202 | Яковлев Евсей Евстигнеевич | 82c201e0-2b05-4ea9-8a31-dfcb932efe50 |
+---------+----------------------------+--------------------------------------+
| 59677 | Селиверст Власович Кулагин | 398fc6fd-c040-4049-8cbd-6db877fdf9f9 |
+---------+----------------------------+--------------------------------------+
| 39532 | Баранов Ираклий Фролович | e6a226d6-294e-4ca2-8392-c31c56ef9a38 |
+---------+----------------------------+--------------------------------------+
SELECT *
FROM demo_secret.user_phone_number_resolver
LIMIT 5;
+---------+-------------------+--------------------------------------+
| User_ID | Phone_Number | phone_uuid |
+---------+-------------------+--------------------------------------+
| 33165 | 8 (222) 153-7935 | c66fe50a-d9c5-4537-9a71-c6684ea6f72b |
+---------+-------------------+--------------------------------------+
| 62431 | 8 (390) 229-9364 | 4a8e6c45-75fa-4ad6-9da5-dca1433b778d |
+---------+-------------------+--------------------------------------+
| 63202 | 8 (022) 393-80-71 | 1873bd3d-9788-41f4-a7bf-7a7826e8af7a |
+---------+-------------------+--------------------------------------+
| 59677 | +73939353959 | ee464308-54df-4634-89ee-24053b738651 |
+---------+-------------------+--------------------------------------+
| 39532 | 8 (987) 772-04-72 | 2dc785b8-ddeb-49e2-ae3a-387540edb40d |
+---------+-------------------+--------------------------------------+
SELECT *
FROM demo_secret.user_address_resolver
LIMIT 5;
+---------+------------------------------------------------------+--------------------------------------+
| User_ID | Address | address_uuid |
+---------+------------------------------------------------------+--------------------------------------+
| 33165 | д. Курчатов, ул. Волкова, д. 4/1 стр. 212, 601586 | 0c5ff6f9-d55e-4cbb-80d3-452e40134475 |
+---------+------------------------------------------------------+--------------------------------------+
| 62431 | ст. Дербент, наб. Осипенко, д. 8 стр. 920, 010532 | 20704512-0e5e-41ec-bd5c-3383c53a2f53 |
+---------+------------------------------------------------------+--------------------------------------+
| 63202 | п. Сорочинск, ш. Ушакова, д. 92, 309643 | 8108f398-dc2c-41f2-b8b3-26c584fd69f8 |
+---------+------------------------------------------------------+--------------------------------------+
| 59677 | ст. Ессентуки, бул. Дорожный, д. 8/3, 828001 | 3db054ee-4aa4-4fe8-bf39-4aae9e1f5637 |
+---------+------------------------------------------------------+--------------------------------------+
| 39532 | к. Объячево, ул. Пролетарская, д. 3/7 к. 1/8, 027088 | c19d2a24-7ebf-44af-a2d3-e4df2f3b4f84 |
+---------+------------------------------------------------------+--------------------------------------+
We have obtained key correspondence tables in a separate demo_secret schema. Now we can restrict privileges to this schema so that only a limited number of users can view the real data. If necessary, we can grant privileges to individual tables of this schema as well.
-
More information about granting privileges: Operations with privileges.
Now let’s create a fully obfuscated table in the demo_obfuscated schema. This table will contain all data from the original table demo_raw.users, but the selected columns will be obfuscated.
CREATE OR REPLACE TABLE demo_obfuscated.users AS
SELECT
u.User_ID,
n.name_uuid AS Full_Name,
p.phone_uuid AS Phone_Number,
a.address_uuid AS Address,
u.Birth_Date,
u.Orders_Count
FROM demo_raw.users u
JOIN demo_secret.user_full_name_resolver n ON u.User_ID = n.User_ID
JOIN demo_secret.user_phone_number_resolver p ON u.User_ID = p.User_ID
JOIN demo_secret.user_address_resolver a ON u.User_ID = a.User_ID;
+--------+
| status |
+--------+
| CREATE |
+--------+
Let’s check the created table:
SELECT *
FROM demo_obfuscated.users
LIMIT 5;
+---------+--------------------------------------+--------------------------------------+--------------------------------------+------------+--------------+
| User_ID | Full_Name | Phone_Number | Address | Birth_Date | Orders_Count |
+---------+--------------------------------------+--------------------------------------+--------------------------------------+------------+--------------+
| 33165 | c6ead1c1-7135-4efc-9b8f-cc84ae43422d | c66fe50a-d9c5-4537-9a71-c6684ea6f72b | 0c5ff6f9-d55e-4cbb-80d3-452e40134475 | 1988-03-05 | 79 |
+---------+--------------------------------------+--------------------------------------+--------------------------------------+------------+--------------+
| 62431 | a3bcd3f5-38b9-4b75-925d-23ee248355f8 | 4a8e6c45-75fa-4ad6-9da5-dca1433b778d | 20704512-0e5e-41ec-bd5c-3383c53a2f53 | 1962-04-02 | 75 |
+---------+--------------------------------------+--------------------------------------+--------------------------------------+------------+--------------+
| 63202 | 82c201e0-2b05-4ea9-8a31-dfcb932efe50 | 1873bd3d-9788-41f4-a7bf-7a7826e8af7a | 8108f398-dc2c-41f2-b8b3-26c584fd69f8 | 1997-01-12 | 53 |
+---------+--------------------------------------+--------------------------------------+--------------------------------------+------------+--------------+
| 59677 | 398fc6fd-c040-4049-8cbd-6db877fdf9f9 | ee464308-54df-4634-89ee-24053b738651 | 3db054ee-4aa4-4fe8-bf39-4aae9e1f5637 | 2004-07-09 | 14 |
+---------+--------------------------------------+--------------------------------------+--------------------------------------+------------+--------------+
| 39532 | e6a226d6-294e-4ca2-8392-c31c56ef9a38 | 2dc785b8-ddeb-49e2-ae3a-387540edb40d | c19d2a24-7ebf-44af-a2d3-e4df2f3b4f84 | 1992-08-24 | 83 |
+---------+--------------------------------------+--------------------------------------+--------------------------------------+------------+--------------+
Strictly speaking, it is not necessary to have obfuscated columns in this table, as this data is already stored in the demo_secret schema’s correspondence tables. In this example, we save all columns from the original table here for clarity.
|
The demo_obfuscated schema can be accessed in the widest possible way, as it no longer has personal data.
The original table in the demo_raw schema can then be deleted, as all data from it has now been saved and distributed to two other schemas with different access levels.
|
Using obfuscated data
Suppose an analyst with limited access to personal data is working in the demo_obfuscated schema and is looking for the top 5 users by number of purchases aged 18-25.
SELECT
User_ID,
Orders_Count,
date_diff('year', Birth_Date::DATE, current_date) AS Age
FROM demo_obfuscated.users
WHERE date_diff('year', Birth_Date::DATE, current_date) BETWEEN 18 AND 25
ORDER BY Orders_Count DESC
LIMIT 5;
+---------+--------------+-----+
| User_ID | Orders_Count | Age |
+---------+--------------+-----+
| 43184 | 100 | 23 |
+---------+--------------+-----+
| 20635 | 100 | 25 |
+---------+--------------+-----+
| 37231 | 99 | 20 |
+---------+--------------+-----+
| 71821 | 99 | 23 |
+---------+--------------+-----+
| 38810 | 96 | 20 |
+---------+--------------+-----+
It gets an anonymized list of 5 User_ID as the result.
Suppose he passes this list to another employee (e.g. a manager) who has access to personal data. The manager, in turn, can run a query using the list of user IDs passed to him and the tables from the demo_secret schema, but without using the source table from the demo_raw schema (!), to get the real data of the users selected by the analyst:
SELECT
u.User_ID,
n.Full_Name AS Real_Name,
p.Phone_Number AS Real_Phone,
a.Address AS Real_Address,
u.Orders_Count,
u.Birth_Date
FROM demo_obfuscated.users u
JOIN demo_secret.user_full_name_resolver n ON u.User_ID = n.User_ID
JOIN demo_secret.user_phone_number_resolver p ON u.User_ID = p.User_ID
JOIN demo_secret.user_address_resolver a ON u.User_ID = a.User_ID
WHERE u.User_ID IN (43184,
20635,
37231,
71821,
38810);
+---------+---------------------------------+--------------------+----------------------------------------------------+--------------+------------+
| User_ID | Real_Name | Real_Phone | Real_Address | Orders_Count | Birth_Date |
+---------+---------------------------------+--------------------+----------------------------------------------------+--------------+------------+
| 43184 | Павлова Ульяна Дмитриевна | +7 (606) 457-73-87 | к. Троицк (Моск.), пер. Ручейный, д. 738, 509804 | 100 | 2003-05-02 |
+---------+---------------------------------+--------------------+----------------------------------------------------+--------------+------------+
| 20635 | Федосеева Александра Наумовна | 8 (937) 491-6658 | п. Каменск-Шахтинский, ул. Большая, д. 362, 653554 | 100 | 2001-11-15 |
+---------+---------------------------------+--------------------+----------------------------------------------------+--------------+------------+
| 38810 | Лыткин Пахом Демидович | +78903266054 | к. Чегем, пер. Дорожников, д. 7/6 стр. 2, 408961 | 96 | 2006-05-28 |
+---------+---------------------------------+--------------------+----------------------------------------------------+--------------+------------+
| 37231 | Ангелина Оскаровна Лобанова | 8 (012) 843-50-16 | к. Мелеуз, ш. Заовражное, д. 518 стр. 6, 512707 | 99 | 2006-08-26 |
+---------+---------------------------------+--------------------+----------------------------------------------------+--------------+------------+
| 71821 | Аксенов Максимильян Викентьевич | 8 (802) 245-72-10 | с. Яшалта, бул. Щербакова, д. 46 к. 847, 306033 | 99 | 2003-01-02 |
+---------+---------------------------------+--------------------+----------------------------------------------------+--------------+------------+
In this way, data handling is restricted to different access rights for different employees, and personal data remains securely protected.