Building Multi-Agent AI with the Azure AI Agent Service SDK

At Microsoft Ignite 2024, Microsoft announced a capability for building multi-agent AI on Azure.

This article explains how to use the Azure AI Agent Service SDK from Python, with concrete code along the way.

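Overview
Prerequisites
Steps
Conclusion

Overview

Azure AI Agent Service is a service for building multi-agent AI on top of Azure AI Service. You send requests to a single endpoint, and the service selects the most suitable tool internally to generate the answer.

For example, with just one endpoint, the AI can decide on its own whether to fetch information from an external source or to use Code Interpreter.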

learn.microsoft.com

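This feature is built on the Assistants API.

With the Assistants API, creating an Assistant issues an ID, and the Assistant can then use Code Interpreter, file search, and so on. Azure AI Agent Service is best pictured as that model with agent features added. If you have worked with the Assistants API, concepts such as thread IDs and managing execution through runs will make this easier to follow.

The rest of this article walks through the details with code.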

Prerequisites

Local PC

Install the following packages in the environment where you run Python.

Note: this article uses a Python 3.11 environment.

pip install azure-ai-projects
pip install azure-identity

Azure Portal

In the Azure Portal, search for Azure AI Foundry and deploy a new Project.

Note: deploying an Azure AI Foundry Project also creates a new AI Hub. Through this association, the AI Foundry Project can use Azure OpenAI Service, a storage account, and Application Insights.

Open the project and copy the project connection string from [Overview] in the menu.

Next, open [Models + endpoints]. This is where you deploy models such as ChatGPT. For this article, deploy gpt-4o-mini in advance.

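Entra ID authentication

The credential used in this article assumes that an application (an app registration) connects to the project. For testing, your personal identity (DefaultAzureCredential) works as well.

In the Azure Portal, open the Entra ID settings of your tenant and create an app registration under "App registrations".

From the app you created, note down the Directory (tenant) ID and the Application (client) ID.

Under [Manage] - [Certificates & secrets], create a client secret with an expiration period of your choice.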

Steps

Connection

To use the AI Foundry project from the SDK, set up the connection as follows.

from azure.ai.projects import AIProjectClient
from azure.identity import ClientSecretCredential, DefaultAzureCredential

credential = ClientSecretCredential(
    tenant_id="", # Directory (tenant) ID
    client_id="", # Application (client) ID
    client_secret="" # Secret of the app you created
)

# Note: if you can authenticate as yourself, credential = DefaultAzureCredential() also works

project_client = AIProjectClient.from_connection_string(
    credential=credential,
    conn_str="" # Project connection string
)

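Defining the tools used by the Agent

An Agent is given roles such as running Bing searches, searching an AI Search index, or fetching information from an external API. That requires integrating with external APIs, and each integration is defined as a "tool".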

from azure.ai.projects.models import (
    ToolSet,
    ToolOutput,
    MessageTextContent,
    SubmitToolOutputsAction,
    FunctionTool,
    RequiredFunctionToolCall,
    CodeInterpreterTool,
    BingGroundingTool
)

toolset = ToolSet() # The tools defined below are added to this toolset

FunctionTool

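FunctionTool registers a function you wrote yourself as a tool for the Agent.

Note that the docstring at the top of the function is not only documentation for developers: the AI actually uses it.

The docstring text, such as the description (the part that reads "Sample that connects to an external API."), is used to decide which tool to call for a given user request, so write it carefully with the AI as the reader.

The following is a minimal FunctionTool definition.

Note: the API URL and key are empty-string placeholders here; loading them from environment variables or similar is a better approach.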

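import requests  # pip install requests


def search_outside_api(keyword):
    """
    Sample that connects to an external API.

    :param keyword: Define each input like this with :param [variable name] and add a description of it.
    """

    api_url = ""  # URL of the external API (placeholder)
    api_key = ""  # API key (placeholder)
    params = {
        "api_key": api_key,
        "keyword": keyword,  # hypothetical query parameter name; adjust to the actual API
    }
    response = requests.get(api_url, params=params)
    return response.json()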
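As the note above suggests, the URL and key are better read from the environment than hard-coded. The following is a minimal sketch of that idea using only the standard library. The variable names EXTERNAL_API_URL and EXTERNAL_API_KEY and the helper load_api_settings are assumptions made for this illustration, not part of the SDK.

```python
import os


def load_api_settings(env=os.environ):
    """Read the external API URL and key from environment variables.

    EXTERNAL_API_URL and EXTERNAL_API_KEY are hypothetical names chosen for
    this sketch; use whatever names fit your deployment.
    """
    required = ("EXTERNAL_API_URL", "EXTERNAL_API_KEY")
    # Treat unset and empty variables the same way: both count as missing.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["EXTERNAL_API_URL"], env["EXTERNAL_API_KEY"]
```

Inside search_outside_api, the two placeholder assignments could then become api_url, api_key = load_api_settings(). The same pattern would also fit the tenant ID, client ID, client secret, and project connection string used in the connection step.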

Add the function to the toolset. A toolset can hold multiple tools, and the Agent ultimately uses all of them.

from typing import Any, Callable, Set

user_functions: Set[Callable[..., Any]] = {
   search_outside_api
}
functions = FunctionTool(user_functions)
toolset.add(functions)

You can confirm that the information written in the docstring is stored as structured data, as follows.

print(functions.definitions)

Example output

[{'type': 'function', 'function': {'name': 'search_outside_api', 'description': 'Sample that connects to an external API.', 'parameters': {'type': 'object', 'properties': {'keyword': {'type': 'string', 'description': 'Define inputs with :param [variable name]'}}, 'required': ['keyword']}}}]
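To make the link between the docstring and the definition above more concrete, here is a rough standard-library sketch that derives a similarly shaped dictionary from a function. It is not the SDK's implementation: describe_function and lookup_weather are hypothetical names, and the sketch simplifies by typing every parameter as a string and marking all of them as required.

```python
import inspect
import re


def describe_function(func):
    """Build a function-tool style definition from a function's docstring."""
    doc = inspect.getdoc(func) or ""
    # Everything before the first ":param" entry is treated as the description.
    description = doc.split(":param", 1)[0].strip()
    # Collect ":param name: text" entries into a name -> text mapping.
    param_docs = dict(re.findall(r":param\s+(\w+):\s*(.*)", doc))
    names = list(inspect.signature(func).parameters)
    properties = {
        name: {"type": "string", "description": param_docs.get(name, "")}
        for name in names
    }
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": description,
            "parameters": {"type": "object", "properties": properties, "required": names},
        },
    }


def lookup_weather(city):
    """
    Returns a canned weather report for a city.

    :param city: Name of the city to look up.
    """
    return f"Sunny in {city}"


definition = describe_function(lookup_weather)
print(definition["function"]["description"])
print(definition["function"]["parameters"]["properties"]["city"])
```

The point of the sketch is that whatever is written in the docstring ends up verbatim in the text the model reads when choosing a tool, so a vague docstring means a vague tool description.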

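azure.ai.projects.models also provides classes for connecting to Azure Functions, Azure AI Search, and other services, but FunctionTool is highly versatile, so customizing it is an effective approach.

For details on how to define user functions and write their docstrings, refer to the sample source in the official repository. github.com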

CodeInterpreterTool

The Code Interpreter available in the Assistants API can also be used as a tool for the Agent.

code_interpreter = CodeInterpreterTool()
toolset.add(code_interpreter)

BingGroundingTool

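This tool performs Bing searches. It requires an API key, so create a Bing resource in Azure beforehand.

The URL is the fixed value "https://api.bing.microsoft.com/", and the key is the one shown under [Resource Management] - [Keys] of the deployed Bing resource.

Connect Bing to the project from the Management center. Under [+ New connection], select API key and enter the URL and key above. (The connection name is up to you.)

Use the following code. The value of connection_name is the connection name you chose above.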

bing_connection = project_client.connections.get(
    connection_name=""
)
conn_id = bing_connection.id
bing = BingGroundingTool(connection_id=conn_id)
toolset.add(bing)

Note that Grounding with Bing Search does not support some models, such as GPT-4o-mini. As of the end of December 2024, it can be used with gpt-3.5-turbo-0125, gpt-4-0125-preview, gpt-4-turbo-2024-04-09, and gpt-4o-0513.

For the latest information, see the reference below.

learn.microsoft.com
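Because an unsupported model is easy to pick by accident, a small guard can check the model before BingGroundingTool is added to the toolset. The sketch below hard-codes the list given in this article (as of the end of December 2024), so it will go stale; supports_bing_grounding is a hypothetical helper, and it assumes you pass the model identifier rather than an arbitrary deployment name.

```python
# Models listed in this article as supporting Grounding with Bing Search
# (as of the end of December 2024; check the official reference for updates).
BING_GROUNDING_MODELS = frozenset({
    "gpt-3.5-turbo-0125",
    "gpt-4-0125-preview",
    "gpt-4-turbo-2024-04-09",
    "gpt-4o-0513",
})


def supports_bing_grounding(model_id):
    """Return True if the model identifier appears in the list above."""
    return model_id.strip().lower() in BING_GROUNDING_MODELS


for candidate in ("gpt-4o-0513", "gpt-4o-mini"):
    print(candidate, supports_bing_grounding(candidate))
```

One way to use it would be to add the Bing tool only when the check passes and fall back to the other tools otherwise.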

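Creating the Agent

You can create an Agent using the toolset defined above.

Define the Agent name, the model to use, and the prompt describing the Agent's role, then create the Agent.

Passing the toolset that holds the tools added in the previous steps lets the Agent select the most suitable tool when answering a user's question.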

agent_name = "Original Assistant"
model_name="gpt-4o-mini"
instructions="""
You are an AI assistant.
"""

agent = project_client.agents.create_agent(
    model=model_name,
    name=agent_name,
    instructions=instructions,
    toolset=toolset,
    headers={"x-ms-enable-preview": "true"},
)

print(f"Created agent, agent ID: {agent.id}")

The issued Agent ID corresponds to an Assistant ID.

assistant_id = agent.id

The following code lists the Agents that have already been created.

agents = project_client.agents.list_agents()
for agt in agents.data:
    print(agt.name, 'ID:', agt.id)

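When an Agent is no longer needed, delete it by ID with the following code. Here, agt refers to the last Agent from the loop above.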

project_client.agents.delete_agent(agt.id)

Using the Agent

Use the following code to run the Agent.

import json
import time


def search_aiagent_api(
    assistant_id,
    query,
    thread_id=None
):
    # Start a new thread unless the caller wants to continue an existing one
    if thread_id is None:
        print("create new thread")
        thread = project_client.agents.create_thread()
    else:
        thread = project_client.agents.get_thread(thread_id=thread_id)

    message = project_client.agents.create_message(
        thread_id=thread.id,
        role="user",
        content=query,
    )
    print(f"Created message, ID: {message.id}")

    run = project_client.agents.create_run(thread_id=thread.id, assistant_id=assistant_id)
    print(f"Created run, ID: {run.id}")

    while run.status in ["queued", "in_progress", "requires_action"]:
        time.sleep(1)
        run = project_client.agents.get_run(thread_id=thread.id, run_id=run.id)

        if run.status == "requires_action" and isinstance(run.required_action, SubmitToolOutputsAction):
            tool_calls = run.required_action.submit_tool_outputs.tool_calls
            if not tool_calls:
                print("No tool calls provided - cancelling run")
                project_client.agents.cancel_run(thread_id=thread.id, run_id=run.id)
                break

            tool_outputs = []
            for tool_call in tool_calls:
                if isinstance(tool_call, RequiredFunctionToolCall):
                    try:
                        print(f"Executing tool call: {tool_call}")
                        output = functions.execute(tool_call)
                        response = {"answer": output, "success": True}
                        tool_outputs.append(
                            ToolOutput(
                                tool_call_id=tool_call.id,
                                output=json.dumps(response, ensure_ascii=False),
                            )
                        )
                    except Exception as e:
                        print(f"Error executing tool_call {tool_call.id}: {e}")

            print(f"Tool outputs: {tool_outputs}")
            if tool_outputs:
                project_client.agents.submit_tool_outputs_to_run(
                    thread_id=thread.id, run_id=run.id, tool_outputs=tool_outputs
                )

        print(f"Current run status: {run.status}")

    print(f"Run completed with status: {run.status}")
    print(f"Run error logs: {run.last_error}")
    print(f"Run consumption tokens: {run.usage}")
    return run, thread.id



import urllib.parse


def get_aiagent_api_message(run, thread_id):
    run_steps = project_client.agents.list_run_steps(run_id=run.id, thread_id=thread_id)
    run_steps_data = run_steps['data']
    final_text = None
    if len(run_steps_data)>1:
        if "bing_grounding" not in list(run_steps_data[1]["step_details"]["tool_calls"][0].keys()):
            print("Bing search was not used.")
        else:
            # Re-read the request URL as UTF-8, then undo its percent-encoding
            encoded_text = run_steps_data[1]["step_details"]["tool_calls"][0]["bing_grounding"]["requesturl"]
            decoded_text = bytes(encoded_text, 'latin1').decode('utf-8')
            final_text = urllib.parse.unquote(decoded_text)
    
    message_texts = []
    messages = project_client.agents.list_messages(thread_id=thread_id)
    for data_point in reversed(messages.data):
        last_message_content = data_point.content[-1]
        if isinstance(last_message_content, MessageTextContent):
            message_texts.append(f"{data_point.role}: {last_message_content.text.value}")
    return message_texts, final_text
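The while loop in search_aiagent_api keeps polling for as long as the run stays active, with no upper bound on the wait. As a minimal sketch of adding one, the helper below polls a status source until it leaves the active states or a timeout elapses. wait_for_run and fetch_status are hypothetical names for this illustration; with the SDK, fetch_status could wrap the get_run call. The sketch covers only the waiting: a run in requires_action still needs its tool outputs submitted, as in the loop above, or it will simply time out.

```python
import time

# Statuses the loop above treats as "still running".
ACTIVE_STATUSES = {"queued", "in_progress", "requires_action"}


def wait_for_run(fetch_status, interval=1.0, timeout=60.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it leaves ACTIVE_STATUSES or timeout elapses.

    fetch_status is a zero-argument callable returning the current status
    string. sleep and clock are injectable so the helper is easy to test.
    """
    deadline = clock() + timeout
    status = fetch_status()
    while status in ACTIVE_STATUSES:
        if clock() >= deadline:
            raise TimeoutError(f"Run still '{status}' after {timeout} seconds")
        sleep(interval)
        status = fetch_status()
    return status


# Usage with a fake status source, so the sketch runs without Azure.
statuses = iter(["queued", "in_progress", "completed"])
print(wait_for_run(lambda: next(statuses), interval=0.01))  # prints: completed
```

Using time.monotonic for the deadline avoids surprises if the system clock is adjusted while the run is in progress.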

Pass the ID obtained from any Agent (the Assistant ID) together with a chat message to get a response.

assistant_id = ""
run, thread_id = search_aiagent_api(assistant_id,"Write the user's chat message here.")
messages, final_txt = get_aiagent_api_message(run, thread_id)
for message in messages:
    print(message)

# If a Bing search was performed, final_txt holds the decoded Bing request URL
print(final_txt)
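The two decoding steps applied to the Bing request URL in get_aiagent_api_message are easier to follow in isolation. The sketch below reproduces them with the standard library on made-up input: decode_request_url is a hypothetical helper, and the example.invalid URL is a placeholder, not what the service actually returns. Unlike the original code, the sketch falls back to the unmodified text when the Latin-1 round trip fails.

```python
import urllib.parse


def decode_request_url(encoded_text):
    """Undo Latin-1 mojibake (if any), then percent-decode the URL."""
    try:
        # Step 1: text whose UTF-8 bytes were misread as Latin-1 is turned
        # back into bytes and decoded as UTF-8.
        repaired = encoded_text.encode("latin1").decode("utf-8")
    except UnicodeError:
        # The text was not mojibake (or not valid UTF-8); keep it unchanged.
        repaired = encoded_text
    # Step 2: turn %XX escapes back into characters.
    return urllib.parse.unquote(repaired)


base = "https://example.invalid/search?q="
percent_encoded = base + urllib.parse.quote("東京")
mojibake = base + "東京".encode("utf-8").decode("latin1")

print(decode_request_url(percent_encoded))
print(decode_request_url(mojibake))
```

Both calls print the same readable URL ending in q=東京, which is the kind of value final_txt is meant to carry.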