# Tool Success Evaluator

### Getting Started
This sample demonstrates how to use the Tool Success evaluator on agent data. The supported input formats include:
- simple data such as strings and `dict` objects describing agent responses;
- user-agent conversations in the form of a list of agent messages.

Before you begin:

```bash
pip install azure-ai-evaluation
```
Set these environment variables with your own values:
1) **MODEL_DEPLOYMENT_NAME** - The deployment name of the model for this AI-assisted evaluator, as found under the "Name" column in the "Models + endpoints" tab in your Azure AI Foundry project.
2) **AZURE_OPENAI_ENDPOINT** - The Azure OpenAI endpoint to be used for evaluation.
3) **AZURE_OPENAI_API_KEY** - The Azure OpenAI key to be used for evaluation.
4) **AZURE_OPENAI_API_VERSION** - The Azure OpenAI API version to be used for evaluation.
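
Since the initialization cell reads all four variables with `os.environ[...]`, a missing one surfaces as a `KeyError`. The following is a minimal sketch of an optional pre-check that reports which variables are unset; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names introduced for illustration and are not part of `azure-ai-evaluation`.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = (
    "MODEL_DEPLOYMENT_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
)


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for this sample only; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```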

The Tool Success evaluator determines whether any of the tool calls made by an AI agent failed.

This evaluator considers only tool call results and tool definitions; it disregards the user's query to the agent, the conversation history, and the agent's final response. Tool definitions are optional, but providing them can help the evaluator better understand the context of the tool calls made by the agent. Note that this evaluator checks tool calls for technical failures such as errors, exceptions, timeouts, and empty results (only in cases where an empty result could indicate a failure). It does not assess the correctness of the tool result itself, such as mathematical errors or unrealistic field values like name="668656".

Scoring is binary:
- TRUE: All tool calls were successful
- FALSE: At least one tool call failed

This evaluation focuses on measuring the technical success of tool execution, not the semantic correctness of the results.
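
To make the distinction between technical and semantic failure concrete, here is a rough local heuristic. It is not how the evaluator works: the evaluator is AI-assisted and uses a model to judge the tool results. The names `FAILURE_STATUSES`, `looks_like_failure`, and `all_calls_succeeded` are hypothetical, and the rule set (an `error` field or a failure-like `status`) is an assumption chosen only to illustrate the binary aggregation.

```python
# Assumed set of status strings that signal a technical failure (illustrative only).
FAILURE_STATUSES = {"failed", "error", "timeout"}


def looks_like_failure(tool_result):
    """Heuristic: does a tool result dict carry an obvious technical failure signal?"""
    if not isinstance(tool_result, dict):
        return False
    if tool_result.get("error"):
        return True
    return str(tool_result.get("status", "")).lower() in FAILURE_STATUSES


def all_calls_succeeded(tool_results):
    """Binary aggregate: True only if no tool result looks like a failure."""
    return not any(looks_like_failure(result) for result in tool_results)


ok = {"temperature": "15°C", "condition": "partly cloudy"}
odd_but_ok = {"name": "668656"}  # semantically suspicious, technically successful
bad = {"error": "Location not found", "status": "failed", "code": 404}

print(all_calls_succeeded([ok, odd_but_ok]))  # True
print(all_calls_succeeded([ok, bad]))         # False
```

A single failing call flips the overall outcome, while an implausible but well-formed value does not.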

Tool Success takes the following inputs:
- Response - The response from the agent (or any GenAI app). This can be a single text response or a list of messages generated as part of the agent response. The evaluator examines tool call results within the response.
- Tool Definitions - (Optional) Definitions of the tools used by the agent. Providing tool definitions helps the evaluator better understand the context of the tool calls made by the agent.
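
When the response is a list of messages, each tool call and its result are linked by `tool_call_id`. The sketch below pairs them up so you can inspect what the evaluator will be looking at before running it. It assumes the message layout used in this sample's responses (`tool_call` items inside assistant messages, `tool_result` items inside tool messages); `pair_tool_calls` is a hypothetical helper, not an SDK function.

```python
def pair_tool_calls(messages):
    """Return (tool_call_id, tool_name, tool_result) tuples for a message list.

    A call with no matching tool message gets None as its result.
    """
    calls = {}    # tool_call_id -> tool name
    results = {}  # tool_call_id -> tool result payload
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue  # skip plain-string content
        for item in content:
            if item.get("type") == "tool_call":
                calls[item["tool_call_id"]] = item["name"]
            elif item.get("type") == "tool_result":
                # In this layout the id lives on the tool message, not the item.
                results[message.get("tool_call_id")] = item.get("tool_result")
    return [(call_id, name, results.get(call_id)) for call_id, name in calls.items()]


example_messages = [
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "call_weather_123",
                "name": "fetch_weather",
                "arguments": {"location": "Seattle"},
            }
        ],
    },
    {
        "tool_call_id": "call_weather_123",
        "role": "tool",
        "content": [{"type": "tool_result", "tool_result": {"temperature": "15°C"}}],
    },
]

for call_id, name, result in pair_tool_calls(example_messages):
    print(call_id, name, result)
```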


### Initialize Tool Success Evaluator


In [None]:
import os
from azure.ai.evaluation._evaluators._tool_success import _ToolSuccessEvaluator
from azure.ai.evaluation import AzureOpenAIModelConfiguration
from pprint import pprint

model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment=os.environ["MODEL_DEPLOYMENT_NAME"],
)


tool_call_success = _ToolSuccessEvaluator(model_config=model_config)

### Samples

#### Evaluating Successful Tool Calls

In [None]:
# Successful tool execution
successful_response = [
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "call_weather_123",
                "name": "fetch_weather",
                "arguments": {"location": "Seattle"},
            }
        ],
    },
    {
        "tool_call_id": "call_weather_123",
        "role": "tool",
        "content": [{"type": "tool_result", "tool_result": {"temperature": "15°C", "condition": "partly cloudy", "humidity": "68%", "wind": "8 mph NW"}}],
    },
    {
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "The current weather in Seattle is partly cloudy with a temperature of 15°C.",
            }
        ],
    }
]

tool_definitions = [
    {
        "name": "fetch_weather",
        "description": "Fetches the weather information for the specified location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}},
        },
    }
]

result = tool_call_success(response=successful_response, tool_definitions=tool_definitions)
pprint(result)

#### Response as String (str)

In [None]:
# Response as a simple string containing tool call information
# This format is less common but still valid for the evaluator
response_str = """Tool call executed: fetch_weather(location="Paris")
Tool result: {"temperature": "18°C", "condition": "clear sky", "humidity": "55%"}
The weather in Paris is currently clear with a temperature of 18°C."""

tool_definition = {
    "name": "fetch_weather",
    "description": "Fetches the weather information for the specified location.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}},
    },
}

result = tool_call_success(response=response_str, tool_definitions=tool_definition)
pprint(result)

#### Tool Definition as Single Dict

In [None]:
# Successful tool execution with single tool definition dict
successful_response = [
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "call_nyc_weather",
                "name": "fetch_weather",
                "arguments": {"location": "New York"},
            }
        ],
    },
    {
        "tool_call_id": "call_nyc_weather",
        "role": "tool",
        "content": [{"type": "tool_result", "tool_result": {"temperature": "22°C", "condition": "sunny", "humidity": "45%"}}],
    },
    {
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "The weather in New York is currently sunny with a temperature of 22°C.",
            }
        ],
    }
]

tool_definition_dict = {
    "name": "fetch_weather",
    "description": "Fetches the weather information for the specified location.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}},
    },
}

result = tool_call_success(response=successful_response, tool_definitions=tool_definition_dict)
pprint(result)

#### Multiple Successful Tool Calls

In [None]:
# Multiple successful tool executions
multiple_successful_response = [
    {
        "createdAt": "2025-03-26T17:27:35Z",
        "run_id": "run_zblZyGCNyx6aOYTadmaqM4QN",
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "call_CUdbkBfvVBla2YP3p24uhElJ",
                "name": "fetch_weather",
                "arguments": {"location": "Seattle"},
            }
        ],
    },
    {
        "createdAt": "2025-03-26T17:27:37Z",
        "run_id": "run_zblZyGCNyx6aOYTadmaqM4QN",
        "tool_call_id": "call_CUdbkBfvVBla2YP3p24uhElJ",
        "role": "tool",
        "content": [{"type": "tool_result", "tool_result": {"temperature": "14°C", "condition": "rainy", "precipitation": "light rain"}}],
    },
    {
        "createdAt": "2025-03-26T17:27:38Z",
        "run_id": "run_zblZyGCNyx6aOYTadmaqM4QN",
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "call_iq9RuPxqzykebvACgX8pqRW2",
                "name": "send_email",
                "arguments": {
                    "recipient": "your_email@example.com",
                    "subject": "Weather Information for Seattle",
                    "body": "Current weather: 14°C and rainy with light rain.",
                },
            }
        ],
    },
    {
        "createdAt": "2025-03-26T17:27:41Z",
        "run_id": "run_zblZyGCNyx6aOYTadmaqM4QN",
        "tool_call_id": "call_iq9RuPxqzykebvACgX8pqRW2",
        "role": "tool",
        "content": [
            {"type": "tool_result", "tool_result": {"message": "Email successfully sent to your_email@example.com.", "status": "delivered"}}
        ],
    },
    {
        "createdAt": "2025-03-26T17:27:42Z",
        "run_id": "run_zblZyGCNyx6aOYTadmaqM4QN",
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "I have successfully sent you an email with the weather information for Seattle.",
            }
        ],
    },
]

tool_definitions = [
    {
        "name": "fetch_weather",
        "description": "Fetches the weather information for the specified location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}},
        },
    },
    {
        "name": "send_email",
        "description": "Sends an email with the specified subject and body to the recipient.",
        "parameters": {
            "type": "object",
            "properties": {
                "recipient": {"type": "string", "description": "Email address of the recipient."},
                "subject": {"type": "string", "description": "Subject of the email."},
                "body": {"type": "string", "description": "Body content of the email."},
            },
        },
    },
]

result = tool_call_success(response=multiple_successful_response, tool_definitions=tool_definitions)
pprint(result)

#### Tool Call Success without Tool Definitions

In [None]:
# Successful execution without providing tool definitions
simple_successful_response = [
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "call_calendar_123",
                "name": "get_calendar",
                "arguments": {"start_date": "2025-01-01", "end_date": "2025-01-31"},
            }
        ],
    },
    {
        "tool_call_id": "call_calendar_123",
        "role": "tool",
        "content": [{"type": "tool_result", "tool_result": {"events": [{"date": "2025-01-15", "title": "Team Meeting"}, {"date": "2025-01-20", "title": "Project Review"}]}}],
    },
    {
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "I found 2 events in your calendar for January: a Team Meeting on the 15th and a Project Review on the 20th.",
            }
        ],
    }
]

# Evaluation without tool definitions
result = tool_call_success(response=simple_successful_response)
pprint(result)

#### Example of Failed Tool Call

In [None]:
# Failed tool execution with error
failed_response = [
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "call_weather_456",
                "name": "fetch_weather",
                "arguments": {"location": "InvalidCity"},
            }
        ],
    },
    {
        "tool_call_id": "call_weather_456",
        "role": "tool",
        "content": [{"type": "tool_result", "tool_result": {"error": "Location not found", "status": "failed", "code": 404}}],
    },
    {
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "I'm sorry, I couldn't retrieve the weather information due to an error.",
            }
        ],
    }
]

tool_definitions = [
    {
        "name": "fetch_weather",
        "description": "Fetches the weather information for the specified location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}},
        },
    }
]

# This should score FALSE due to tool failure
result = tool_call_success(response=failed_response, tool_definitions=tool_definitions)
pprint(result)