Get insights from multimodal content with Amazon Bedrock Data Automation, now generally available

Many applications need to interact with content available through different modalities. Some of these applications process complex documents, such as insurance claims and medical bills. Mobile apps need to analyze user-generated media. Organizations need to build a semantic index on top of their digital assets that include documents, images, audio, and video files. However, getting insights from unstructured multimodal content is not easy to set up: you have to implement processing pipelines for the different data formats and go through multiple steps to get the information you need. That usually means having multiple models in production, for which you have to handle cost optimizations (through fine-tuning and prompt engineering), safeguards (for example, against hallucinations), integrations with the target applications (including data formats), and model updates.

To make this process easier, we introduced Amazon Bedrock Data Automation in preview at AWS re:Invent, a capability of Amazon Bedrock that streamlines the generation of valuable insights from unstructured, multimodal content such as documents, images, audio, and videos. With Bedrock Data Automation, you can reduce the development time and effort required to build intelligent document processing, media analysis, and other multimodal data-centric automation solutions.

You can use Bedrock Data Automation as a standalone capability or as a parser for Amazon Bedrock Knowledge Bases to index insights from multimodal content and provide more relevant responses for Retrieval-Augmented Generation (RAG).

Today, Bedrock Data Automation is generally available with support for cross-Region inference endpoints, so it is available in more AWS Regions and can seamlessly use compute across different locations. Based on your feedback during the preview, we also improved accuracy and added support for logo recognition in images and videos.

Let’s take a look at how it works in practice.

Using Amazon Bedrock Data Automation with cross-Region inference endpoints
The blog post published for the Bedrock Data Automation preview shows how to use the visual demo in the Amazon Bedrock console to extract information from documents and videos. I recommend going through that console demo experience to understand how this capability works and what you can do to customize it. For this post, I focus more on how Bedrock Data Automation works in your applications, starting with a few steps in the console and following up with code samples.

The Data Automation section of the Amazon Bedrock console now asks for confirmation to enable cross-Region support the first time you access it. For example:

Console screenshot.

From an API perspective, the InvokeDataAutomationAsync operation now requires an additional parameter (dataAutomationProfileArn) to specify the data automation profile to use. The value of this parameter depends on the Region and your AWS account ID:

arn:aws:bedrock:<REGION>:<ACCOUNT_ID>:data-automation-profile/us.data-automation-v1

Also, the dataAutomationArn parameter has been renamed to dataAutomationProjectArn to better reflect that it contains the project Amazon Resource Name (ARN). When invoking Bedrock Data Automation, you now need to specify a project or blueprint to use. If you pass in blueprints, you will get custom output. To keep getting the standard default output, configure the dataAutomationProjectArn parameter to use arn:aws:bedrock:<REGION>:aws:data-automation-project/public-default.

As the name suggests, the InvokeDataAutomationAsync operation is asynchronous. You pass the input and output configuration, and when the results are ready, they are written to an Amazon Simple Storage Service (Amazon S3) bucket as specified in the output configuration. You can receive an Amazon EventBridge notification from Bedrock Data Automation using the notificationConfiguration parameter.
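
To make these pieces concrete, here is a minimal sketch of an invocation using the AWS SDK for Python (Boto3). The bucket, key, and Region values are placeholders, and the shape of the notificationConfiguration value (an eventBridgeConfiguration flag) is my assumption, so check the InvokeDataAutomationAsync API reference; the complete script later in this post shows the same call as part of a full example.

import boto3

REGION = '<REGION>'
ACCOUNT_ID = '<ACCOUNT_ID>'

bda = boto3.client('bedrock-data-automation-runtime', region_name=REGION)

response = bda.invoke_data_automation_async(
    inputConfiguration={'s3Uri': 's3://<BUCKET>/BDA/Input/document.pdf'},
    outputConfiguration={'s3Uri': 's3://<BUCKET>/BDA/Output'},
    # Standard default output: use the public default project
    dataAutomationConfiguration={
        'dataAutomationProjectArn': f'arn:aws:bedrock:{REGION}:aws:data-automation-project/public-default'
    },
    # Cross-Region inference profile, built from your Region and account ID
    dataAutomationProfileArn=f'arn:aws:bedrock:{REGION}:{ACCOUNT_ID}:data-automation-profile/us.data-automation-v1',
    # Optional: emit job status events to Amazon EventBridge (assumed parameter shape)
    notificationConfiguration={
        'eventBridgeConfiguration': {'eventBridgeEnabled': True}
    }
)

# The call returns immediately; poll GetDataAutomationStatus (or react to the
# EventBridge event) and then read the results from the S3 output location.
print(response['invocationArn'])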

With Bedrock Data Automation, you can configure the output in two ways:

  • Standard output delivers predefined insights relevant to a data type, such as document semantics, video chapter summaries, and audio transcripts. With standard outputs, you can set up your desired insights in just a few steps.
  • Custom output lets you specify your extraction needs using blueprints for more tailored insights.

To see the new capabilities in action, I create a project and customize the standard output settings. For documents, I choose plain text instead of markdown. Note that you can automate these configuration steps using the Bedrock Data Automation API.
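
For example, creating a project that sets plain text output for documents could look something like the following sketch with the Boto3 bedrock-data-automation client. The create_data_automation_project operation is part of the control-plane API, but the nested keys I show under standardOutputConfiguration are an assumption on my part, so treat this as an illustration and check the API reference for the exact schema.

import boto3

# Control-plane client used to manage Bedrock Data Automation projects and blueprints
bda_client = boto3.client('bedrock-data-automation', region_name='<REGION>')

response = bda_client.create_data_automation_project(
    projectName='my-bda-project',
    projectDescription='Standard output with plain text for documents',
    standardOutputConfiguration={
        'document': {
            'outputFormat': {
                'textFormat': {'types': ['PLAIN_TEXT']}   # plain text instead of markdown (assumed schema)
            }
        }
    }
)
print(response['projectArn'])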

Console screenshot.

For videos, I want the full audio transcript and a summary of the entire video. I also ask for a summary of each chapter.

Console screenshot.

To configure a blueprint, I choose Custom output setup in the Data Automation section of the Amazon Bedrock console navigation pane. There, I search for the US-Driver-License sample blueprint. You can browse the other sample blueprints for more examples and ideas.

Sample blueprints can't be edited, so I use the Actions menu to duplicate the blueprint and add it to my project. There, I can fine-tune the data to be extracted by modifying the blueprint and adding custom fields that can use generative AI to extract or compute data in the format I need.

Console screenshot.

I upload the image of a US driver's license to the S3 bucket. Then, I use this sample Python script, which uses Bedrock Data Automation through the AWS SDK for Python (Boto3), to extract text information from the image:

import json
import sys
import time

import boto3

DEBUG = False

AWS_REGION = '<REGION>'
BUCKET_NAME = '<BUCKET>'
INPUT_PATH = 'BDA/Input'
OUTPUT_PATH = 'BDA/Output'

PROJECT_ID = '<PROJECT_ID>'
BLUEPRINT_NAME = 'US-Driver-License-demo'

# Fields to display
BLUEPRINT_FIELDS = (
    'NAME_DETAILS/FIRST_NAME',
    'NAME_DETAILS/MIDDLE_NAME',
    'NAME_DETAILS/LAST_NAME',
    'DATE_OF_BIRTH',
    'DATE_OF_ISSUE',
    'EXPIRATION_DATE'
)

# AWS SDK for Python (Boto3) clients
bda = boto3.client('bedrock-data-automation-runtime', region_name=AWS_REGION)
s3 = boto3.client('s3', region_name=AWS_REGION)
sts = boto3.client('sts')


def log(data):
    if DEBUG:
        if type(data) is dict:
            text = json.dumps(data, indent=4)
        else:
            text = str(data)
        print(text)

def get_aws_account_id() -> str:
    return sts.get_caller_identity().get('Account')


def get_json_object_from_s3_uri(s3_uri) -> dict:
    s3_uri_split = s3_uri.split('/')
    bucket = s3_uri_split[2]
    key = '/'.join(s3_uri_split[3:])
    object_content = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    return json.loads(object_content)


def invoke_data_automation(input_s3_uri, output_s3_uri, data_automation_arn, aws_account_id) -> dict:
    params = {
        'inputConfiguration': {
            's3Uri': input_s3_uri
        },
        'outputConfiguration': {
            's3Uri': output_s3_uri
        },
        'dataAutomationConfiguration': {
            'dataAutomationProjectArn': data_automation_arn
        },
        'dataAutomationProfileArn': f"arn:aws:bedrock:{AWS_REGION}:{aws_account_id}:data-automation-profile/us.data-automation-v1"
    }

    response = bda.invoke_data_automation_async(**params)
    log(response)

    return response

def wait_for_data_automation_to_complete(invocation_arn, loop_time_in_seconds=1) -> dict:
    while True:
        response = bda.get_data_automation_status(
            invocationArn=invocation_arn
        )
        status = response['status']
        if status not in ('Created', 'InProgress'):
            print(f" {status}")
            return response
        print(".", end='', flush=True)
        time.sleep(loop_time_in_seconds)


def print_document_results(standard_output_result):
    print(f"Number of pages: {standard_output_result['metadata']['number_of_pages']}")
    for page in standard_output_result['pages']:
        print(f"- Page {page['page_index']}")
        if 'text' in page['representation']:
            print(f"{page['representation']['text']}")
        if 'markdown' in page['representation']:
            print(f"{page['representation']['markdown']}")


def print_video_results(standard_output_result):
    print(f"Duration: {standard_output_result['metadata']['duration_millis']} ms")
    print(f"Summary: {standard_output_result['video']['summary']}")
    statistics = standard_output_result['statistics']
    print("Statistics:")
    print(f"- Speaker count: {statistics['speaker_count']}")
    print(f"- Chapter count: {statistics['chapter_count']}")
    print(f"- Shot count: {statistics['shot_count']}")
    for chapter in standard_output_result['chapters']:
        print(f"Chapter {chapter['chapter_index']} {chapter['start_timecode_smpte']}-{chapter['end_timecode_smpte']} ({chapter['duration_millis']} ms)")
        if 'summary' in chapter:
            print(f"- Chapter summary: {chapter['summary']}")


def print_custom_results(custom_output_result):
    matched_blueprint_name = custom_output_result['matched_blueprint']['name']
    log(custom_output_result)
    print('\n- Custom output')
    print(f"Matched blueprint: {matched_blueprint_name}  Confidence: {custom_output_result['matched_blueprint']['confidence']}")
    print(f"Document class: {custom_output_result['document_class']['type']}")
    if matched_blueprint_name == BLUEPRINT_NAME:
        print('\n- Fields')
        for field_with_group in BLUEPRINT_FIELDS:
            print_field(field_with_group, custom_output_result)


def print_results(job_metadata_s3_uri) -> None:
    job_metadata = get_json_object_from_s3_uri(job_metadata_s3_uri)
    log(job_metadata)

    for segment in job_metadata['output_metadata']:
        asset_id = segment['asset_id']
        print(f'\nAsset ID: {asset_id}')

        for segment_metadata in segment['segment_metadata']:
            # Standard output
            standard_output_path = segment_metadata['standard_output_path']
            standard_output_result = get_json_object_from_s3_uri(standard_output_path)
            log(standard_output_result)
            print('\n- Standard output')
            semantic_modality = standard_output_result['metadata']['semantic_modality']
            print(f"Semantic modality: {semantic_modality}")
            match semantic_modality:
                case 'DOCUMENT':
                    print_document_results(standard_output_result)
                case 'VIDEO':
                    print_video_results(standard_output_result)
            # Custom output
            if 'custom_output_status' in segment_metadata and segment_metadata['custom_output_status'] == 'MATCH':
                custom_output_path = segment_metadata['custom_output_path']
                custom_output_result = get_json_object_from_s3_uri(custom_output_path)
                print_custom_results(custom_output_result)


def print_field(field_with_group, custom_output_result) -> None:
    inference_result = custom_output_result['inference_result']
    explainability_info = custom_output_result['explainability_info'][0]
    if '/' in field_with_group:
        # For fields part of a group
        (group, field) = field_with_group.split('/')
        inference_result = inference_result[group]
        explainability_info = explainability_info[group]
    else:
        field = field_with_group
    value = inference_result[field]
    confidence = explainability_info[field]['confidence']
    print(f"{field}: {value or '<EMPTY>'}  Confidence: {confidence}")


def main() -> None:
    if len(sys.argv) < 2:
        print("Please provide a filename as command line argument")
        sys.exit(1)
      
    file_name = sys.argv[1]
    
    aws_account_id = get_aws_account_id()
    input_s3_uri = f"s3://{BUCKET_NAME}/{INPUT_PATH}/{file_name}" # File
    output_s3_uri = f"s3://{BUCKET_NAME}/{OUTPUT_PATH}" # Folder
    data_automation_arn = f"arn:aws:bedrock:{AWS_REGION}:{aws_account_id}:data-automation-project/{PROJECT_ID}"

    print(f"Invoking Bedrock Data Automation for '{file_name}'", end='', flush=True)

    data_automation_response = invoke_data_automation(input_s3_uri, output_s3_uri, data_automation_arn, aws_account_id)
    data_automation_status = wait_for_data_automation_to_complete(data_automation_response['invocationArn'])

    if data_automation_status['status'] == 'Success':
        job_metadata_s3_uri = data_automation_status['outputConfiguration']['s3Uri']
        print_results(job_metadata_s3_uri)


if __name__ == "__main__":
    main()

The initial configuration in the script includes the name of the S3 bucket to use for input and output, the location of the input file in the bucket, the output path for the results, the ID of the project to use for custom output with Bedrock Data Automation, and the blueprint fields to show in the output.

I run the script, passing the name of the input file. In the output, I see the information extracted by Bedrock Data Automation. The US-Driver-License blueprint is matched, and the name and dates from the driver's license are printed in the output.

python bda-ga.py bda-drivers-license.jpeg

Invoking Bedrock Data Automation for 'bda-drivers-license.jpeg'................ Success

Asset ID: 0

- Standard output
Semantic modality: DOCUMENT
Number of pages: 1
- Page 0
NEW JERSEY

Motor Vehicle
 Commission

AUTO DRIVER LICENSE

Could DL M6454 64774 51685                      CLASS D
        DOB 01-01-1968
ISS 03-19-2019          EXP     01-01-2023
        MONTOYA RENEE MARIA 321 GOTHAM AVENUE TRENTON, NJ 08666 OF
        END NONE
        RESTR NONE
        SEX F HGT 5'-08" EYES HZL               ORGAN DONOR
        CM ST201907800000019 CHG                11.00

(SIGNATURE)



- Custom output
Matched blueprint: US-Driver-License-copy  Confidence: 1
Document class: US-drivers-licenses

- Fields
FIRST_NAME: RENEE  Confidence: 0.859375
MIDDLE_NAME: MARIA  Confidence: 0.83203125
LAST_NAME: MONTOYA  Confidence: 0.875
DATE_OF_BIRTH: 1968-01-01  Confidence: 0.890625
DATE_OF_ISSUE: 2019-03-19  Confidence: 0.79296875
EXPIRATION_DATE: 2023-01-01  Confidence: 0.93359375

As expected, I see in the output the information I selected in the blueprint associated with the Bedrock Data Automation project.

Similarly, I run the same script on a video file from my colleague Mike Chambers. To keep the output small, I don't print the full audio transcript or the text detected in the video.

python bda.py mike-video.mp4
Invoking Bedrock Data Automation for 'mike-video.mp4'.......................................................................................................................................................................................................................................................................... Success

Asset ID: 0

- Standard output
Semantic modality: VIDEO
Duration: 810476 ms
Summary: In this comprehensive demonstration, a technical expert explores the capabilities and limitations of Large Language Models (LLMs) while showcasing a practical application using AWS services. He begins by addressing a common misconception about LLMs, explaining that while they possess general world knowledge from their training data, they lack current, real-time information unless connected to external data sources.

To illustrate this concept, he demonstrates an "Outfit Planner" application that provides clothing recommendations based on location and weather conditions. Using Brisbane, Australia as an example, the application combines LLM capabilities with real-time weather data to suggest appropriate attire like lightweight linen shirts, shorts, and hats for the tropical climate.

The demonstration then shifts to the Amazon Bedrock platform, which enables users to build and scale generative AI applications using foundation models. The speaker showcases the "OutfitAssistantAgent," explaining how it accesses real-time weather data to make informed clothing recommendations. Through the platform's "Show Trace" feature, he reveals the agent's decision-making process and how it retrieves and processes location and weather information.

The technical implementation details are explored as the speaker configures the OutfitAssistant using Amazon Bedrock. The agent's workflow is designed to be fully serverless and managed within the Amazon Bedrock service.

Further diving into the technical aspects, the presentation covers the AWS Lambda console integration, showing how to create action group functions that connect to external services like the OpenWeatherMap API. The speaker emphasizes that LLMs become truly useful when connected to tools providing relevant data sources, whether databases, text files, or external APIs.

The presentation concludes with the speaker encouraging viewers to explore more AWS developer content and engage with the channel through likes and subscriptions, reinforcing the practical value of combining LLMs with external data sources for creating powerful, context-aware applications.
Statistics:
- Speaker count: 1
- Chapter count: 6
- Shot count: 48
Chapter 0 00:00:00:00-00:01:32:01 (92025 ms)
- Chapter summary: A man with a beard and glasses, wearing a gray hooded sweatshirt with various logos and text, is sitting at a desk in front of a colorful background. He discusses the frequent release of new large language models (LLMs) and how people often test these models by asking questions like "Who won the World Series?" The man explains that LLMs are trained on general data from the internet, so they may have information about past events but not current ones. He then poses the question of what he wants from an LLM, stating that he desires general world knowledge, such as understanding basic concepts like "up is up" and "down is down," but does not need specific factual knowledge. The man suggests that he can attach other systems to the LLM to access current factual data relevant to his needs. He emphasizes the importance of having general world knowledge and the ability to use tools and be linked into agentic workflows, which he refers to as "agentic workflows." The man encourages the audience to add this term to their spell checkers, as it will likely become commonly used.
Chapter 1 00:01:32:01-00:03:38:18 (126560 ms)
- Chapter summary: The video showcases a man with a beard and glasses demonstrating an "Outfit Planner" application on his laptop. The application allows users to input their location, such as Brisbane, Australia, and receive recommendations for appropriate outfits based on the weather conditions. The man explains that the application generates these recommendations using large language models, which can sometimes provide inaccurate or hallucinated information since they lack direct access to real-world data sources.

The man walks through the process of using the Outfit Planner, entering Brisbane as the location and receiving weather details like temperature, humidity, and cloud cover. He then shows how the application suggests outfit options, including a lightweight linen shirt, shorts, sandals, and a hat, along with an image of a woman wearing a similar outfit in a tropical setting.

Throughout the demonstration, the man points out the limitations of current language models in providing accurate and up-to-date information without external data connections. He also highlights the need to edit prompts and adjust settings within the application to refine the output and improve the accuracy of the generated recommendations.
Chapter 2 00:03:38:18-00:07:19:06 (220620 ms)
- Chapter summary: The video demonstrates the Amazon Bedrock platform, which allows users to build and scale generative AI applications using foundation models (FMs). (speaker_0) introduces the platform's overview, highlighting its key features like managing FMs from AWS, integrating with custom models, and providing access to leading AI startups. The video showcases the Amazon Bedrock console interface, where (speaker_0) navigates to the "Agents" section and selects the "OutfitAssistantAgent" agent. (speaker_0) tests the OutfitAssistantAgent by asking it for outfit recommendations in Brisbane, Australia. The agent provides a suggestion of wearing a light jacket or sweater due to cool, misty weather conditions. To verify the accuracy of the recommendation, (speaker_0) clicks on the "Show Trace" button, which reveals the agent's workflow and the steps it took to retrieve the current location details and weather information for Brisbane. The video explains that the agent uses an orchestration and knowledge base system to determine the appropriate response based on the user's query and the retrieved data. It highlights the agent's ability to access real-time information like location and weather data, which is crucial for generating accurate and relevant responses.
Chapter 3 00:07:19:06-00:11:26:13 (247214 ms)
- Chapter summary: The video demonstrates the process of configuring an AI assistant agent called "OutfitAssistant" using Amazon Bedrock. (speaker_0) introduces the agent's purpose, which is to provide outfit recommendations based on the current time and weather conditions. The configuration interface allows selecting a language model from Anthropic, in this case the Claud 3 Haiku model, and defining natural language instructions for the agent's behavior. (speaker_0) explains that action groups are groups of tools or actions that will interact with the outside world. The OutfitAssistant agent uses Lambda functions as its tools, making it fully serverless and managed within the Amazon Bedrock service. (speaker_0) defines two action groups: "get coordinates" to retrieve latitude and longitude coordinates from a place name, and "get current time" to determine the current time based on the location. The "get current weather" action requires calling the "get coordinates" action first to obtain the location coordinates, then using those coordinates to retrieve the current weather information. This demonstrates the agent's workflow and how it utilizes the defined actions to generate outfit recommendations. Throughout the video, (speaker_0) provides details on the agent's configuration, including its name, description, model selection, instructions, and action groups. The interface displays various options and settings related to these aspects, allowing (speaker_0) to customize the agent's behavior and functionality.
Chapter 4 00:11:26:13-00:13:00:17 (94160 ms)
- Chapter summary: The video showcases a presentation by (speaker_0) on the AWS Lambda console and its integration with machine learning models for building powerful agents. (speaker_0) demonstrates how to create an action group function using AWS Lambda, which can be used to generate text responses based on input parameters like location, time, and weather data. The Lambda function code is shown, utilizing external services like OpenWeatherMap API for fetching weather information. (speaker_0) explains that for a large language model to be useful, it needs to connect to tools providing relevant data sources, such as databases, text files, or external APIs. The presentation covers the process of defining actions, setting up Lambda functions, and leveraging various tools within the AWS environment to build intelligent agents capable of generating context-aware responses.
Chapter 5 00:13:00:17-00:13:28:10 (27761 ms)
- Chapter summary: A man with a beard and glasses, wearing a gray hoodie with various logos and text, is sitting at a desk in front of a colorful background. He is using a laptop computer that has stickers and logos on it, including the AWS logo. The man appears to be presenting or speaking about AWS (Amazon Web Services) and its services, such as Lambda functions and large language models. He mentions that if a Lambda function can do something, then it can be used to augment a large language model. The man concludes by expressing hope that the viewer found the video useful and insightful, and encourages them to check out other videos on the AWS developers channel. He also asks viewers to like the video, subscribe to the channel, and watch other videos.

Things to know
Amazon Bedrock Data Automation is now generally available via cross-Region inference in the following two AWS Regions: US East (N. Virginia) and US West (Oregon). When using Bedrock Data Automation from those Regions, data can be processed using cross-Region inference in any of these four Regions: US East (Ohio, N. Virginia) and US West (N. California, Oregon). All these Regions are in the US, so data is processed within the same geography. We're working to add support for more Regions in Europe and Asia later in 2025.

Pricing is unchanged compared to the preview, including when using cross-Region inference. For more information, visit Amazon Bedrock pricing.

Bedrock Data Automation now also includes a number of security, governance, and manageability capabilities, such as support for AWS Key Management Service (AWS KMS) customer managed keys for granular encryption control, AWS PrivateLink to connect directly to the Bedrock Data Automation APIs from your virtual private cloud (VPC) instead of over the internet, and tagging of Bedrock Data Automation resources and jobs to track costs and enforce tag-based access policies in AWS Identity and Access Management (IAM).

I used Python in this blog post, but Bedrock Data Automation is available with any AWS SDK. For example, you can use Java, .NET, or Rust for a backend document processing application; JavaScript for a web app that processes images, videos, or audio files; and Swift for a native mobile app that processes content provided by end users. It's never been so easy to get insights from multimodal data.

Here are a few suggested reads to learn more (including code samples):

