Data loading
You can upload data to Tengri in the following ways:
- Using the upload wizard (the Upload file button).
- Via the Python tngri module.
Uploading data from a local file using the upload wizard
Supported file extensions for uploading:
- .csv
- .json
- .parquet
- .xlsx
- .zip
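Before opening the wizard, it can be handy to check a local file name against the list above. The following is a minimal sketch using only the Python standard library. The helper name is_supported_upload and the constant SUPPORTED_EXTENSIONS are hypothetical and are not part of Tengri or the tngri module.

```python
from pathlib import Path

# Extensions listed above as supported by the upload wizard.
SUPPORTED_EXTENSIONS = {".csv", ".json", ".parquet", ".xlsx", ".zip"}


def is_supported_upload(file_name: str) -> bool:
    """Return True if the file's extension is in the supported list.

    Hypothetical helper. The comparison is case-insensitive, which is an
    assumption: how the wizard treats upper-case extensions is not stated.
    """
    return Path(file_name).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, is_supported_upload("iris.csv") is true, while a file without an extension is rejected.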
To load data from a file:
- Click Upload file in the Tengri interface.
- Select a file or drag it into the upload area that opens.
- In the Parse file window that opens, check that the data was parsed correctly. If necessary, adjust the recognition settings, then press Next.
- A box appears with the code for reading the file, ready to be inserted into the notebook. For the file file_name.csv and the user user_name, it looks like this:
  select * from read_csv( "user_name/<id>_file_name.csv" )
  Press Add cell to add this cell to your notebook.
Once the cell is added to the notebook, the data from the file will be available for work.
Loading data from a file via the Python tngri module
To load data from non-local files, you can use the Python tngri module.
Loading data from a file via URL
Sample Python code that loads data from the iris.csv file at the specified URL into Tengri:
import polars  # (1)
import tngri

df = polars.read_csv(
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"  # (2)
)
tngri.upload_df(df)  # (3)
| 1 | Import the required Python modules. |
| 2 | Read the file at the specified URL into the df variable using the polars module. |
| 3 | Upload the data from df to Tengri using the tngri.upload_df function. |
In addition to .csv files, this method can be used for other extensions — .json, .xlsx and others (see Polars data upload documentation for details).
After that, a message like this will appear in the output cell:
UploadedFile(s3_path='s3://<path>/<file_id>.parquet')
The data from the file is now available for work. To query it, use the read_parquet function and specify the ID of the uploaded .parquet file from the previous step:
SELECT * FROM read_parquet('<file_id>.parquet');
A file uploaded this way always has the .parquet extension, regardless of the extension of the original file.
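The file name to pass to read_parquet is the last segment of the s3_path shown in the output message. As a minimal sketch, the name can be extracted and the query built with the standard library alone. It assumes you already have the s3_path value as a string; whether the object returned by tngri.upload_df exposes it as an attribute is not confirmed here. The function names are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def parquet_file_name(s3_path: str) -> str:
    """Return the last path segment of an s3:// URI."""
    parsed = urlparse(s3_path)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// path: {s3_path!r}")
    return posixpath.basename(parsed.path)


def read_parquet_query(s3_path: str) -> str:
    """Build the SELECT statement shown above for an uploaded file."""
    # Single quotes inside the name are doubled, the standard SQL escape.
    name = parquet_file_name(s3_path).replace("'", "''")
    return f"SELECT * FROM read_parquet('{name}');"
```

The resulting string can be pasted into a SQL cell of the notebook.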
Loading data from a file saved in S3
Example Python code that loads data into Tengri from the my_file.parquet file located in your S3 bucket:
import tngri  # (1)

tngri.upload_s3(
    object="s3://my_folders/my_file.parquet",  # (2)
    access_key="***",
    secret_key="***"
)
| 1 | Import the Python tngri module. |
| 2 | Set the parameters of your S3 bucket (file path and access keys). |
The function tngri.upload_s3 uploads the file from your S3 bucket to Tengri.
The data from the file will then be available for work.
The file can have any extension; the uploaded file keeps the extension of the original file. Different extensions require different read functions.
To work with data from a file in our example, use the read_parquet function:
SELECT * FROM read_parquet('my_file.parquet');
If necessary, you can specify a path and name for the uploaded file inside Tengri via the optional filename parameter of the tngri.upload_s3 function:
filename="new_path/new_name.parquet".
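To keep access keys out of the notebook text, the arguments for tngri.upload_s3 can be collected in a dictionary first. The sketch below is illustrative only: the parameter names object, access_key, secret_key and filename come from the example above, while the helper name upload_s3_arguments and the environment variable names S3_ACCESS_KEY and S3_SECRET_KEY are hypothetical. The call to tngri.upload_s3 is left as a comment so that the block runs without the module installed.

```python
import os
from typing import Optional


def upload_s3_arguments(s3_object: str, filename: Optional[str] = None) -> dict:
    """Collect keyword arguments for tngri.upload_s3.

    Assumption: credentials are stored in the environment variables
    S3_ACCESS_KEY and S3_SECRET_KEY (hypothetical names).
    """
    arguments = {
        "object": s3_object,
        "access_key": os.environ["S3_ACCESS_KEY"],
        "secret_key": os.environ["S3_SECRET_KEY"],
    }
    if filename is not None:
        # Optional path and name for the uploaded file inside Tengri.
        arguments["filename"] = filename
    return arguments


# In a Tengri notebook the result would be passed on like this:
# import tngri
# tngri.upload_s3(**upload_s3_arguments(
#     "s3://my_folders/my_file.parquet",
#     filename="new_path/new_name.parquet",
# ))
```

When filename is omitted, the key is left out of the dictionary and only the three parameters from the original example are passed.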
Working with data from uploaded files
The basic ways of working with data from uploaded .csv files use the read_csv function.
- Check that the uploaded .csv file is available:
  SELECT * FROM read_csv('customer_country.csv');
- Create a table with data from the uploaded .csv file:
  CREATE OR REPLACE TABLE customer_country AS SELECT * FROM read_csv('customer_country.csv');
- Load data from the .csv file into an existing table:
  INSERT INTO customer_country SELECT * FROM read_csv('customer_country.csv');
Working with data from uploaded files of different extensions
To work with loaded files of other extensions, you need to use the corresponding functions.
- Create a table with data from the uploaded .parquet file using the read_parquet function:
  CREATE OR REPLACE TABLE customer_country AS SELECT * FROM read_parquet('customer_country.parquet');
- Create a table with data from the uploaded .json file using the read_json function:
  CREATE OR REPLACE TABLE customer_country AS SELECT * FROM read_json('customer_country.json');
- Create a table with data from the uploaded .xlsx file using the read_xlsx function:
  CREATE OR REPLACE TABLE customer_country AS SELECT * FROM read_xlsx('customer_country.xlsx');
An example of working with data from an uploaded .zip file can be seen here.
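The pairing of extensions and read functions used in this section can be sketched as a small lookup that generates the CREATE OR REPLACE TABLE statement. This is an illustration only: the names READERS and create_table_sql are hypothetical, and .zip is left out because it is handled separately.

```python
from pathlib import PurePosixPath

# Read functions named in this section, keyed by file extension.
READERS = {
    ".csv": "read_csv",
    ".parquet": "read_parquet",
    ".json": "read_json",
    ".xlsx": "read_xlsx",
}


def create_table_sql(table_name: str, file_name: str) -> str:
    """Build a CREATE OR REPLACE TABLE statement for an uploaded file.

    The table name is inserted as given, so it should come from a
    trusted source.
    """
    suffix = PurePosixPath(file_name).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"no read function known for {suffix!r}")
    # Single quotes inside the file name are doubled, the standard SQL escape.
    escaped = file_name.replace("'", "''")
    return (
        f"CREATE OR REPLACE TABLE {table_name} AS "
        f"SELECT * FROM {READERS[suffix]}('{escaped}');"
    )
```

Calling create_table_sql("customer_country", "customer_country.json") produces the same read_json statement as in the list above.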