# Tool Input Accuracy Evaluator

### Getting Started
This sample demonstrates how to use the Tool Input Accuracy evaluator on agent data. The supported input formats include:
- simple data such as strings and `dict` objects describing tool calls;
- user-agent conversations in the form of a list of agent messages.

Before you begin:
```bash
pip install azure-ai-evaluation
```
Set these environment variables with your own values:
1) **MODEL_DEPLOYMENT_NAME** - The deployment name of the model for this AI-assisted evaluator, as found under the "Name" column in the "Models + endpoints" tab in your Azure AI Foundry project.
2) **AZURE_OPENAI_ENDPOINT** - The Azure OpenAI endpoint to be used for evaluation.
3) **AZURE_OPENAI_API_KEY** - The Azure OpenAI API key to be used for evaluation.
4) **AZURE_OPENAI_API_VERSION** - The Azure OpenAI API version to be used for evaluation.
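
Since the samples read these four variables with `os.environ[...]`, a missing one surfaces as a `KeyError` at initialization. A minimal sketch of an up-front check follows; `missing_env_vars` is a hypothetical helper written for this sample, not part of `azure-ai-evaluation`.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = (
    "MODEL_DEPLOYMENT_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
)


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these variables before running the samples:", ", ".join(missing))
```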

The Tool Input Accuracy evaluator performs a strict binary evaluation (PASS/FAIL) of parameters passed to tool calls. It ensures that ALL parameters meet ALL criteria:

- Parameter grounding: All parameters must be derived from conversation history/query
- Type compliance: All parameters must match exact types specified in tool definitions
- Format compliance: All parameters must follow exact format and structure requirements
- Completeness: All required parameters must be provided
- No unexpected parameters: Only defined parameters are allowed

The evaluator uses strict binary evaluation:

- 1: Only when ALL criteria are satisfied perfectly for ALL parameters
- 0: When ANY criterion fails for ANY parameter

This evaluation focuses on ensuring tool call parameters are completely correct without any tolerance for partial correctness.
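
To make the all-or-nothing rule concrete, here is a small deterministic sketch of three of the criteria (completeness, no unexpected parameters, type compliance). It is an illustration only: the evaluator itself is AI-assisted, and grounding and format compliance are not covered here. The names `check_arguments` and `binary_score` are hypothetical and not part of the library.

```python
# JSON Schema type names mapped to Python types (a deliberately small subset).
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check_arguments(arguments, tool_definition):
    """Return a list of problems; an empty list means these checks pass."""
    schema = tool_definition.get("parameters", {})
    properties = schema.get("properties", {})
    problems = []
    # Completeness: every required parameter must be present.
    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required parameter: {name}")
    for name, value in arguments.items():
        # No unexpected parameters: only defined names are allowed.
        if name not in properties:
            problems.append(f"unexpected parameter: {name}")
            continue
        # Type compliance. bool is a subclass of int in Python, so a bool
        # is rejected explicitly unless the schema asks for a boolean.
        expected = JSON_TYPES.get(properties[name].get("type"))
        if expected is not None and (
            not isinstance(value, expected)
            or (isinstance(value, bool) and expected is not bool)
        ):
            problems.append(f"wrong type for parameter: {name}")
    return problems


def binary_score(arguments, tool_definition):
    """1 only when no check fails, otherwise 0 (no partial credit)."""
    return 0 if check_arguments(arguments, tool_definition) else 1


weather_tool = {
    "name": "fetch_weather",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

print(binary_score({"location": "Seattle"}, weather_tool))  # 1
print(check_arguments({"city": "Seattle"}, weather_tool))
# ['missing required parameter: location', 'unexpected parameter: city']
```

A single failed check is enough to drop the score to 0, which mirrors the strict behavior described above.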

Tool Input Accuracy requires the following inputs:
- Query - The original task request from the user. This can be a single query or a list of messages (the conversation history with the agent).
- Response - The response from the agent (or any GenAI app). This can be a single text response or a list of messages generated as part of the agent response. The evaluator extracts tool calls from the response.
- Tool Definitions - The definition(s) of the tool(s) the agent uses to answer the query. Required to validate parameter types and structures.
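
For readers who want to see which tool calls a message-list response contains, the sketch below walks the message format used in the samples in this notebook. It is an assumption about what "extracting tool calls" amounts to for that format, not the evaluator's internal code; `extract_tool_calls` is a hypothetical helper.

```python
def extract_tool_calls(response):
    """Collect tool_call items from a list of agent messages (hypothetical helper)."""
    if isinstance(response, str):
        return []  # a plain-text response carries no structured tool calls
    calls = []
    for message in response:
        if message.get("role") != "assistant":
            continue
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-text message content
        for item in content:
            if isinstance(item, dict) and item.get("type") == "tool_call":
                calls.append(
                    {"name": item.get("name"), "arguments": item.get("arguments", {})}
                )
    return calls


sample_response = [
    {"role": "assistant", "content": "Let me check that for you."},
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "call_weather_123",
                "name": "fetch_weather",
                "arguments": {"location": "Seattle"},
            }
        ],
    },
]

print(extract_tool_calls(sample_response))
# [{'name': 'fetch_weather', 'arguments': {'location': 'Seattle'}}]
```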


### Initialize Tool Input Accuracy Evaluator


In [None]:
import os
from azure.ai.evaluation._evaluators._tool_input_accuracy import _ToolInputAccuracyEvaluator
from azure.ai.evaluation import AzureOpenAIModelConfiguration
from pprint import pprint

model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment=os.environ["MODEL_DEPLOYMENT_NAME"],
)


tool_input_accuracy = _ToolInputAccuracyEvaluator(model_config=model_config)

### Samples

#### Evaluating Correct Tool Input Parameters

In [None]:
query = "How is the weather in Seattle?"
response = [
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "call_weather_123",
                "name": "fetch_weather",
                "arguments": {"location": "Seattle"},
            }
        ],
    }
]

tool_definitions = [
    {
        "name": "fetch_weather",
        "description": "Fetches the weather information for the specified location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}},
            "required": ["location"]
        },
    }
]

result = tool_input_accuracy(query=query, response=response, tool_definitions=tool_definitions)
pprint(result)

#### Response as String (str)

In [None]:
query = "Check the weather in Miami"

# Response as a simple string containing tool call information
response_str = """I'll check the weather for you. Calling fetch_weather with location="Miami"."""

tool_definition = {
    "name": "fetch_weather",
    "description": "Fetches the weather information for the specified location.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}},
        "required": ["location"]
    },
}

result = tool_input_accuracy(query=query, response=response_str, tool_definitions=tool_definition)
pprint(result)

#### Response as a List and Tool Definition as Single Dict

In [None]:
query = "What's the temperature in Boston?"
response = [
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "call_boston_weather",
                "name": "fetch_weather",
                "arguments": {"location": "Boston"},
            }
        ],
    }
]

tool_definition_dict = {
    "name": "fetch_weather",
    "description": "Fetches the weather information for the specified location.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}},
        "required": ["location"]
    },
}

result = tool_input_accuracy(query=query, response=response, tool_definitions=tool_definition_dict)
pprint(result)
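
When preparing your own data, it can be convenient to accept either shape of `tool_definitions` (a single `dict` or a list) and look definitions up by tool name. The sketch below shows one way to do that; `index_tool_definitions` is a hypothetical helper and says nothing about how the evaluator handles the two shapes internally.

```python
def index_tool_definitions(tool_definitions):
    """Return {tool name: definition} for a single dict or a list of dicts."""
    if isinstance(tool_definitions, dict):
        tool_definitions = [tool_definitions]
    return {definition["name"]: definition for definition in tool_definitions}


single_definition = {
    "name": "fetch_weather",
    "description": "Fetches the weather information for the specified location.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

by_name = index_tool_definitions(single_definition)
print(sorted(by_name))  # ['fetch_weather']
```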

#### Complex Tool Parameters with Multiple Fields

In [None]:
query = "Can you send an email to john@example.com with the subject 'Weather Update' and tell him the weather in Seattle is 15°C and partly cloudy?"
response = [
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "call_email_456",
                "name": "send_email",
                "arguments": {
                    "recipient": "john@example.com",
                    "subject": "Weather Update",
                    "body": "The weather in Seattle is 15°C and partly cloudy."
                },
            }
        ],
    }
]

tool_definitions = [
    {
        "name": "send_email",
        "description": "Sends an email with the specified subject and body to the recipient.",
        "parameters": {
            "type": "object",
            "properties": {
                "recipient": {"type": "string", "description": "Email address of the recipient."},
                "subject": {"type": "string", "description": "Subject of the email."},
                "body": {"type": "string", "description": "Body content of the email."},
            },
            "required": ["recipient", "subject", "body"]
        },
    }
]

result = tool_input_accuracy(query=query, response=response, tool_definitions=tool_definitions)
pprint(result)

#### Query as Conversation History (List of Messages)
The evaluator also accepts the query as a list of messages representing the conversation history. This helps validate that parameters are grounded in the conversation context.

In [None]:
# Query as conversation history instead of a single string
query_as_conversation = [
    {
        "role": "system",
        "content": "You are a helpful assistant that can fetch weather information and send emails."
    },
    {
        "role": "user",
        "content": "Hi, can you check the weather in Seattle for me?"
    },
    {
        "role": "user",
        "content": "Please send the results to admin@company.com with the subject 'Daily Weather Report'."
    }
]

response = [
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "call_weather_123",
                "name": "fetch_weather",
                "arguments": {"location": "Seattle"},
            }
        ],
    },
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "call_email_456",
                "name": "send_email",
                "arguments": {
                    "recipient": "admin@company.com",
                    "subject": "Daily Weather Report",
                    "body": "Weather information for Seattle as requested."
                },
            }
        ],
    }
]

tool_definitions = [
    {
        "name": "fetch_weather",
        "description": "Fetches the weather information for the specified location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}},
            "required": ["location"]
        },
    },
    {
        "name": "send_email",
        "description": "Sends an email with the specified subject and body to the recipient.",
        "parameters": {
            "type": "object",
            "properties": {
                "recipient": {"type": "string", "description": "Email address of the recipient."},
                "subject": {"type": "string", "description": "Subject of the email."},
                "body": {"type": "string", "description": "Body content of the email."},
            },
            "required": ["recipient", "subject", "body"]
        },
    },
]

result = tool_input_accuracy(query=query_as_conversation, response=response, tool_definitions=tool_definitions)
pprint(result)
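
As a rough intuition for what "grounded in the conversation" means, the sketch below flags string arguments that do not appear verbatim in the conversation text. This is a deliberately naive stand-in: a paraphrased value (such as a composed email body) can be perfectly well grounded without matching verbatim, which is the kind of judgment the AI-assisted evaluator is meant to make. Both function names are hypothetical.

```python
def conversation_text(query):
    """Join message contents into one lowercase string; accepts a str or a list of messages."""
    if isinstance(query, str):
        return query.lower()
    return "\n".join(str(message.get("content", "")) for message in query).lower()


def verbatim_grounding(arguments, query):
    """Map each string argument to True if it appears verbatim in the conversation."""
    text = conversation_text(query)
    return {
        name: value.lower() in text
        for name, value in arguments.items()
        if isinstance(value, str)
    }


conversation = [
    {"role": "user", "content": "Hi, can you check the weather in Seattle for me?"},
    {
        "role": "user",
        "content": "Please send the results to admin@company.com with the subject 'Daily Weather Report'.",
    },
]

email_arguments = {
    "recipient": "admin@company.com",
    "subject": "Daily Weather Report",
    "body": "Weather information for Seattle as requested.",
}

print(verbatim_grounding(email_arguments, conversation))
# {'recipient': True, 'subject': True, 'body': False}
```

Here `body` is flagged only because it is paraphrased rather than quoted, so a `False` value marks an argument for closer review, not a definite failure.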

#### Example of Incorrect Tool Parameters

In [None]:
query = "How is the weather in Seattle?"
# The call passes an undefined parameter name ("city"), so the required "location" is also missing
incorrect_response = [
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "call_weather_123",
                "name": "fetch_weather",
                "arguments": {"city": "Seattle"},  # Wrong parameter name (should be "location")
            }
        ],
    }
]

tool_definitions = [
    {
        "name": "fetch_weather",
        "description": "Fetches the weather information for the specified location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}},
            "required": ["location"]
        },
    }
]

# This should score 0 due to incorrect parameter name
result = tool_input_accuracy(query=query, response=incorrect_response, tool_definitions=tool_definitions)
pprint(result)