At re:Invent 2024, we launched Amazon S3 Tables, the first cloud object store with built-in Apache Iceberg support to streamline storing tabular data at scale, and Amazon SageMaker Lakehouse to simplify analytics and AI with a unified, open, and secure data lakehouse. We also previewed S3 Tables integration with Amazon Web Services (AWS) analytics services so that you can stream, query, and visualize S3 Tables data using Amazon Athena, Amazon Data Firehose, Amazon EMR, AWS Glue, Amazon Redshift, and Amazon QuickSight.
Our customers wanted to simplify the management and optimization of their Apache Iceberg storage, which led to the development of S3 Tables. At the same time, they were working to break down the data silos that impede analytics collaboration and insight generation, using SageMaker Lakehouse. When S3 Tables and SageMaker Lakehouse are paired, in addition to the built-in integration with AWS analytics services, customers get a comprehensive platform that unifies access to multiple data sources and enables both analytics and machine learning (ML) workflows.
Today, we're announcing the general availability of Amazon S3 Tables integration with Amazon SageMaker Lakehouse, which provides unified access to S3 Tables data across various analytics engines and tools. You can access SageMaker Lakehouse from Amazon SageMaker Unified Studio, a single data and AI development environment that brings together AWS analytics and AI/ML services. All S3 Tables data integrated with SageMaker Lakehouse can be queried from SageMaker Unified Studio and from engines such as Amazon Athena, Amazon EMR, Amazon Redshift, and Apache Iceberg-compatible engines such as Apache Spark or PyIceberg.
With this integration, you can simplify building secure analytics workflows in which you read and write to S3 Tables and join that data with data in Amazon Redshift data warehouses and in third-party and federated data sources, such as Amazon DynamoDB or PostgreSQL.
You can also centrally set up and manage fine-grained access permissions on the data in S3 Tables, along with other data in SageMaker Lakehouse, and apply them consistently across all analytics and query engines.
S3 Tables integration with SageMaker Lakehouse in action
To get started, go to the Amazon S3 console, choose Table buckets from the navigation pane, and select Enable integration to access table buckets from AWS analytics services.
Now you can create your table bucket to integrate with SageMaker Lakehouse. To learn more, visit Getting started with S3 Tables in the AWS documentation.
1. Create a table with Amazon Athena in the Amazon S3 console
You can create a table, populate it with data, and query it directly from the Amazon S3 console using Amazon Athena in just a few steps. Select a table bucket and select Create table with Athena, or select an existing table and select Query table with Athena.
When you want to create a table with Athena, you should first specify a namespace for your table. The namespace in an S3 table bucket is equivalent to a database in AWS Glue, and you use the table namespace as the database in your Athena queries.
Choose a namespace and select Create table with Athena. This takes you to the Query editor in the Athena console, where you can create a table in your S3 table bucket or query data in the table.
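As a rough illustration of the kind of statement you might run in the Query editor, here is a minimal Python sketch that assembles an Athena CREATE TABLE statement for an Iceberg table inside a namespace. The helper name `create_table_ddl` and the example columns are hypothetical, and the statement the console actually generates for you may differ.

```python
def create_table_ddl(namespace, table, columns):
    """Return a CREATE TABLE statement for an Iceberg table.

    namespace: the table bucket namespace (used as the database in Athena)
    table:     the table name
    columns:   list of (column_name, column_type) pairs
    """
    cols = ",\n  ".join(f"{name} {col_type}" for name, col_type in columns)
    return (
        f"CREATE TABLE `{namespace}`.{table} (\n  {cols}\n)\n"
        # Assumption: the table is declared as Iceberg through this property.
        "TBLPROPERTIES ('table_type' = 'iceberg')"
    )


if __name__ == "__main__":
    # Hypothetical namespace, table, and columns for illustration only.
    print(create_table_ddl("proddb", "customer",
                           [("cust_id", "int"), ("name", "string")]))
```

The resulting string can be pasted into the Query editor after you adjust the namespace, table name, and columns to your own data.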
2. Query with SageMaker Lakehouse in SageMaker Unified Studio
You can now access unified data across S3 data lakes, Redshift data warehouses, and third-party and federated data sources in SageMaker Lakehouse directly from SageMaker Unified Studio.
To get started, go to the SageMaker console and create a SageMaker Unified Studio domain and project using a sample project profile: Data analytics and AI-ML model development. To learn more, visit Create an Amazon SageMaker Unified Studio domain in the AWS documentation.
After the project is created, go to the project overview, scroll down to the project details, and note the project role Amazon Resource Name (ARN).
Go to the AWS Lake Formation console and grant permissions to AWS Identity and Access Management (IAM) users and roles. In the Principals section, select the <project role ARN> noted in the previous paragraph. Choose Named Data Catalog resources in the LF-Tags or catalog resources section and, for Catalogs, select the name of the table bucket you created. To learn more, visit Overview of Lake Formation permissions in the AWS documentation.
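A similar grant can also be expressed programmatically. The sketch below only builds the request body for Lake Formation's GrantPermissions API and does not call AWS. It assumes that the catalog ID for a table bucket has the form `<account-id>:s3tablescatalog/<table-bucket-name>`. The helper name, account ID, role ARN, and chosen permissions are placeholders for illustration. Unlike the console steps above, which grant on the catalog, this example scopes the grant to a single table.

```python
import json


def build_grant_request(account_id, table_bucket, namespace, table,
                        principal_arn, permissions=("SELECT", "DESCRIBE")):
    """Build keyword arguments for a Lake Formation GrantPermissions call."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "Table": {
                # Assumed catalog ID format for an S3 table bucket.
                "CatalogId": f"{account_id}:s3tablescatalog/{table_bucket}",
                "DatabaseName": namespace,  # the table bucket namespace
                "Name": table,
            }
        },
        "Permissions": list(permissions),
    }


if __name__ == "__main__":
    # Placeholder account ID and role ARN; replace with the project role ARN.
    request = build_grant_request(
        account_id="111122223333",
        table_bucket="s3tables-integblog-bucket",
        namespace="proddb",
        table="customer",
        principal_arn="arn:aws:iam::111122223333:role/example-project-role",
    )
    print(json.dumps(request, indent=2))
    # With boto3 installed and credentials configured, this could be sent as:
    #   boto3.client("lakeformation").grant_permissions(**request)
```

Keeping the request construction separate from the API call makes it easy to review exactly what will be granted before anything is applied.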
When you return to SageMaker Unified Studio, you can see your table bucket under Lakehouse in the Data menu in the left navigation pane of the project page. When you choose Actions, you can select how to query your table bucket data: in Amazon Athena, Amazon Redshift, or a JupyterLab notebook.
When you choose Query with Athena, you are taken automatically to the Query Editor, where you can run data query language (DQL) and data manipulation language (DML) queries on S3 tables using Athena.
Here is a sample query using Athena:
select * from "s3tablescatalog/s3tables-integblog-bucket"."proddb"."customer" limit 10;
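The three-part name in the sample query follows the pattern catalog, namespace, table, where the catalog is `s3tablescatalog/` followed by the table bucket name. As a small sketch of that pattern, the hypothetical helpers below assemble the quoted reference and the sample query in Python. The function names are illustrative and not part of any AWS SDK.

```python
def quote_ident(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def s3_table_ref(table_bucket, namespace, table):
    """Return "s3tablescatalog/<bucket>"."<namespace>"."<table>"."""
    catalog = f"s3tablescatalog/{table_bucket}"
    return ".".join(quote_ident(part) for part in (catalog, namespace, table))


def sample_query(table_bucket, namespace, table, limit=10):
    """Return a select statement over the referenced S3 table."""
    ref = s3_table_ref(table_bucket, namespace, table)
    return f"select * from {ref} limit {int(limit)};"


if __name__ == "__main__":
    print(sample_query("s3tables-integblog-bucket", "proddb", "customer"))
```

Building the reference in one place helps avoid mismatched or curly quotation marks, which are easy to introduce when copying queries between tools.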
To query with Amazon Redshift, you should first set up Amazon Redshift Serverless compute resources for data query analysis. Then choose Query with Redshift and run SQL in the Query Editor. If you want to use a JupyterLab notebook, you should create a new JupyterLab space in Amazon EMR Serverless.
3. Join data from other sources with S3 Tables data
With S3 Tables data now available in SageMaker Lakehouse, you can join it with data from data warehouses, online transaction processing (OLTP) sources such as relational or non-relational databases, Iceberg tables, and other third-party sources to gain more comprehensive and deeper insights.
For example, you can add connections to data sources such as Amazon DocumentDB, Amazon DynamoDB, Amazon Redshift, PostgreSQL, MySQL, Google BigQuery, or Snowflake, and combine data using SQL without extract, transform, and load (ETL) scripts.
Now you can run a SQL query in the Query editor to join the data in S3 Tables with the data in DynamoDB.
Here is a sample query that joins data between Athena and DynamoDB:
select * from "s3tablescatalog/s3tables-integblog-bucket"."blogdb"."customer",
"dynamodb1"."default"."customer_ddb" where cust_id=pid limit 10;
To learn more about this integration, visit Amazon S3 Tables integration with Amazon SageMaker Lakehouse in the AWS documentation.
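The sample join query above lists two tables separated by a comma and matches rows in the where clause, which is equivalent to an inner join. The following local sketch uses Python's built-in sqlite3 module as a stand-in for the two sources to show that equivalence on toy data. The rows and the extra columns are invented for illustration, and it is an assumption that `cust_id` belongs to the S3 table and `pid` to the DynamoDB table.

```python
import sqlite3


def run_join_demo():
    """Run the comma-style and explicit forms of the join on toy local data."""
    conn = sqlite3.connect(":memory:")
    # "customer" stands in for the S3 table, "customer_ddb" for the DynamoDB table.
    conn.executescript(
        """
        create table customer (cust_id integer, name text);
        create table customer_ddb (pid integer, tier text);
        insert into customer values (1, 'Ana'), (2, 'Bo'), (3, 'Cy');
        insert into customer_ddb values (2, 'gold'), (3, 'silver'), (4, 'bronze');
        """
    )
    # Same shape as the sample query: comma-separated tables plus a where clause.
    implicit = conn.execute(
        "select * from customer, customer_ddb "
        "where cust_id = pid order by cust_id limit 10"
    ).fetchall()
    # Equivalent explicit inner join.
    explicit = conn.execute(
        "select * from customer join customer_ddb "
        "on cust_id = pid order by cust_id limit 10"
    ).fetchall()
    conn.close()
    return implicit, explicit


if __name__ == "__main__":
    implicit_rows, explicit_rows = run_join_demo()
    print(implicit_rows)
    print(implicit_rows == explicit_rows)
```

Only customers whose `cust_id` has a matching `pid` appear in the result, so rows that exist in just one of the two sources are dropped.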
Now available
S3 Tables integration with SageMaker Lakehouse is now generally available in all AWS Regions where S3 Tables are available. To learn more, visit the S3 Tables product page and the SageMaker Lakehouse page.
Give S3 Tables a try in SageMaker Unified Studio today and send feedback to AWS re:Post for Amazon S3 and AWS re:Post for Amazon SageMaker, or through your usual AWS Support contacts.
In the annual celebration of the launch of Amazon S3, we will introduce more awesome launches for Amazon S3 and Amazon SageMaker. To learn more, join the AWS Pi Day event on March 14.
– Channy