# Intent Resolution Evaluator

### Getting Started

This sample demonstrates how to use the Intent Resolution Evaluator.
Before running the sample, install the required packages:
```bash
pip install azure-ai-projects azure-identity azure-ai-evaluation
```
Set these environment variables with your own values:
1) **PROJECT_CONNECTION_STRING** - The project connection string, as found in the overview page of your Azure AI Foundry project.
2) **MODEL_DEPLOYMENT_NAME** - The deployment name of the AI model, as found under the "Name" column in the "Models + endpoints" tab in your Azure AI Foundry project.
3) **AZURE_OPENAI_ENDPOINT** - The Azure OpenAI endpoint to be used for evaluation.
4) **AZURE_OPENAI_API_KEY** - The Azure OpenAI key to be used for evaluation.
5) **AZURE_OPENAI_API_VERSION** - The Azure OpenAI API version to be used for evaluation.
6) **AZURE_SUBSCRIPTION_ID** - The Azure subscription ID of the Azure AI project.
7) **PROJECT_NAME** - The Azure AI project name.
8) **RESOURCE_GROUP_NAME** - The resource group name of the Azure AI project.
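
Before running the cells below, it can help to confirm that all of these variables are set. The following is a minimal sketch, not part of the sample itself; the helper name `missing_env_vars` is hypothetical, and the variable names are copied from the list above.

```python
import os

# Variable names copied from the list above.
REQUIRED_VARS = [
    "PROJECT_CONNECTION_STRING",
    "MODEL_DEPLOYMENT_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_SUBSCRIPTION_ID",
    "PROJECT_NAME",
    "RESOURCE_GROUP_NAME",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```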


The Intent Resolution evaluator measures how well an agent has identified and resolved the user intent.
The scoring is on a 1-5 integer scale and is as follows:

  - Score 1: Response completely unrelated to user intent
  - Score 2: Response minimally relates to user intent
  - Score 3: Response partially addresses the user intent but lacks complete details
  - Score 4: Response addresses the user intent with moderate accuracy but has minor inaccuracies or omissions
  - Score 5: Response directly addresses the user intent and fully resolves it
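
When reading results, it may be convenient to turn a numeric score back into its rubric text. This is a small sketch under the assumption that the score is available as a number from 1 to 5; `SCORE_DESCRIPTIONS` and `describe_score` are hypothetical names, and the descriptions are copied from the scale above.

```python
# Rubric text copied from the scoring scale above.
SCORE_DESCRIPTIONS = {
    1: "Response completely unrelated to user intent",
    2: "Response minimally relates to user intent",
    3: "Response partially addresses the user intent but lacks complete details",
    4: "Response addresses the user intent with moderate accuracy but has minor inaccuracies or omissions",
    5: "Response directly addresses the user intent and fully resolves it",
}


def describe_score(score):
    """Return the rubric description for a score, or raise ValueError if it is off the scale."""
    if score not in SCORE_DESCRIPTIONS:
        raise ValueError(f"Score must be an integer from 1 to 5, got {score!r}")
    return SCORE_DESCRIPTIONS[score]


print(describe_score(3))
```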

The evaluation requires the following inputs:

  - Query    : The user query. Either a string with a user request or a list of messages with previous requests from the user and responses from the assistant, potentially including a system message.
  - Response : The response to be evaluated. Either a string or a message with the response from the agent to the last user query.

There is a third optional parameter:
  - ToolDefinitions : The list of tool definitions the agent can call. This may be useful for the evaluator to better assess if the right tool was called to resolve a given intent.
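
The input shapes described above can be illustrated with a quick pre-check. This is only a sketch of the shapes as described in this sample (a string, or a non-empty list of message dictionaries with `role` and `content`); it is not the library's own validation, and `looks_like_messages` and `check_inputs` are hypothetical helpers.

```python
def looks_like_messages(value):
    """True if value is a non-empty list of dicts that each carry 'role' and 'content'."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(
            isinstance(message, dict) and "role" in message and "content" in message
            for message in value
        )
    )


def check_inputs(query, response):
    """Return a list of human-readable problems; an empty list means both shapes look fine."""
    problems = []
    if not (isinstance(query, str) or looks_like_messages(query)):
        problems.append("query must be a string or a list of messages")
    if not (isinstance(response, str) or looks_like_messages(response)):
        problems.append("response must be a string or a list of messages")
    return problems


print(check_inputs("What are the opening hours?", "9:00 AM to 11:00 PM."))
```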

### Initialize Intent Resolution Evaluator


In [None]:
import os
from azure.ai.evaluation import AzureOpenAIModelConfiguration
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation import IntentResolutionEvaluator
from pprint import pprint

model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment=os.environ["MODEL_DEPLOYMENT_NAME"],
)

intent_resolution_evaluator = IntentResolutionEvaluator(model_config)

### Samples

#### Evaluating query and response as string

In [None]:
#Success example. Intent is identified and understood and the response correctly resolves user intent
result = intent_resolution_evaluator(query="What are the opening hours of the Eiffel Tower?",
                                     response="Opening hours of the Eiffel Tower are 9:00 AM to 11:00 PM.",
                                    )
pprint(result)

In [None]:
#Failure example. Even though intent is correctly identified, the response does not resolve the user intent
result = intent_resolution_evaluator(query="What are the opening hours of the Eiffel Tower?",
                                     response="Please check the official website for the up-to-date information on Eiffel Tower opening hours.",
                                    )
pprint(result)

#### Evaluating query and response as list of messages

In [None]:
query = [
    {
        "role": "system",
        "content": "You are a friendly and helpful customer service agent."
    },
    {
        "createdAt": "2025-03-14T06:14:20Z",
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Hi, I need help with the last 2 orders on my account #888. Could you please update me on their status?"
            }
        ]
    }
]

response = [
    {
        "createdAt": "2025-03-14T06:14:30Z",
        "run_id": "0",
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "Hello! Let me quickly look up your account details."
            }
        ]
    },
    {
        "createdAt": "2025-03-14T06:14:35Z",
        "run_id": "0",
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "tool_call_20250310_001",
                "name": "get_orders",
                "arguments": {
                    "account_number": "888"
                }
            }
        ]
    },
    {
        "createdAt": "2025-03-14T06:14:40Z",
        "run_id": "0",
        "tool_call_id": "tool_call_20250310_001",
        "role": "tool",
        "content": [
            {
                "type": "tool_result",
                "tool_result": "[{ \"order_id\": \"123\" }, { \"order_id\": \"124\" }]"
            }
        ]
    },
    {
        "createdAt": "2025-03-14T06:14:45Z",
        "run_id": "0",
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "Thanks for your patience. I see two orders on your account. Let me fetch the details for both."
            }
        ]
    },
    {
        "createdAt": "2025-03-14T06:14:50Z",
        "run_id": "0",
        "role": "assistant",
        "content": [
            {
                "type": "tool_call",
                "tool_call_id": "tool_call_20250310_002",
                "name": "get_order",
                "arguments": {
                    "order_id": "123"
                }
            },
            {
                "type": "tool_call",
                "tool_call_id": "tool_call_20250310_003",
                "name": "get_order",
                "arguments": {
                    "order_id": "124"
                }
            }
        ]
    },
    {
        "createdAt": "2025-03-14T06:14:55Z",
        "run_id": "0",
        "tool_call_id": "tool_call_20250310_002",
        "role": "tool",
        "content": [
            {
                "type": "tool_result",
                "tool_result": "{ \"order\": { \"id\": \"123\", \"status\": \"shipped\", \"delivery_date\": \"2025-03-15\" } }"
            }
        ]
    },
    {
        "createdAt": "2025-03-14T06:15:00Z",
        "run_id": "0",
        "tool_call_id": "tool_call_20250310_003",
        "role": "tool",
        "content": [
            {
                "type": "tool_result",
                "tool_result": "{ \"order\": { \"id\": \"124\", \"status\": \"delayed\", \"expected_delivery\": \"2025-03-20\" } }"
            }
        ]
    },
    {
        "createdAt": "2025-03-14T06:15:05Z",
        "run_id": "0",
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "The order with ID 123 has been shipped and is expected to be delivered on March 15, 2025. However, the order with ID 124 is delayed and should now arrive by March 20, 2025. Is there anything else I can help you with?"
            }
        ]
    }
]

# Note that the tool definitions are not strictly required. Some of the tools below are not used in the example above, and that is fine.
# If context length is a concern, you can remove the unused tool definitions, or even all tool definitions, as the impact on the intent resolution evaluation is usually minimal.
tool_definitions = [
    {
        "name": "get_orders",
        "description": "Get the list of orders for a given account number.",
        "parameters": {
            "type": "object",
            "properties": {
                "account_number": {
                    "type": "string",
                    "description": "The account number to get the orders for."
                }
            }
        }
    },
    {
        "name": "get_order",
        "description": "Get the details of a specific order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "The order ID to get the details for."
                }
            }
        }
    },
    {
        "name": "initiate_return",
        "description": "Initiate the return process for an order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "The order ID for the return process."
                }
            }
        }
    },
    {
        "name": "update_shipping_address",
        "description": "Update the shipping address for a given account.",
        "parameters": {
            "type": "object",
            "properties": {
                "account_number": {
                    "type": "string",
                    "description": "The account number to update."
                },
                "new_address": {
                    "type": "string",
                    "description": "The new shipping address."
                }
            }
        }
    }
]

result = intent_resolution_evaluator(query            = query,
                                     response         = response,
                                     tool_definitions = tool_definitions,
                                    )
pprint(result)
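
If context length is a concern, the unused tool definitions can be dropped before calling the evaluator. The sketch below assumes the message layout used in the example above (content parts with `"type": "tool_call"` and a `"name"` field); `used_tool_names` and `keep_used_tools` are hypothetical helpers, and the tiny data set is made up for illustration.

```python
def used_tool_names(messages):
    """Collect the names of all tools called in a list of messages."""
    names = set()
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain string content carries no tool calls
        for part in content:
            if isinstance(part, dict) and part.get("type") == "tool_call":
                names.add(part["name"])
    return names


def keep_used_tools(tool_definitions, messages):
    """Return only the tool definitions whose name appears in a tool call."""
    used = used_tool_names(messages)
    return [tool for tool in tool_definitions if tool["name"] in used]


# Illustrative data in the same shape as the example above.
example_response = [
    {"role": "assistant", "content": [{"type": "text", "text": "Looking that up."}]},
    {"role": "assistant", "content": [
        {"type": "tool_call", "tool_call_id": "call_1", "name": "get_orders",
         "arguments": {"account_number": "888"}},
    ]},
]
example_tools = [
    {"name": "get_orders", "description": "Get the list of orders."},
    {"name": "initiate_return", "description": "Initiate a return."},
]

trimmed = keep_used_tools(example_tools, example_response)
print([tool["name"] for tool in trimmed])
```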

### Evaluating an agent conversation loaded from disk

In [None]:
import json
from azure.ai.evaluation import AIAgentConverter

def load_conversations(filename):
    # Each non-empty line of the JSONL file holds one conversation as a JSON object.
    with open(filename, 'r', encoding='utf-8') as file:
        parsed_conversations = [json.loads(line) for line in file if line.strip()]
    print(f"Loaded {len(parsed_conversations)} conversations from {filename}.")
    return parsed_conversations

conversations_filename = r'sample_synthetic_conversations.jsonl'

#this loads 90 conversations from the file sample_synthetic_conversations.jsonl
sample_conversations = load_conversations(conversations_filename)

#pick one conversation (index 10) from the loaded conversations
conversation = sample_conversations[10]

run_ids = AIAgentConverter.run_ids_from_conversation(conversation)
print(f"Run IDs in conversation: {run_ids}")
run_id = str(run_ids[0]) # convert run_id to string in case it is some other type, e.g. an int
converted_conv = AIAgentConverter.convert_from_conversation(conversation, run_id)
# Extract the query and response from the conversation
query = converted_conv['query']
response = converted_conv['response']
tool_definitions = converted_conv['tool_definitions']

print(f"Run ID: {run_id}")
print(f"Query: {query}")
print(f"Response: {response}")
print(f"Tool Definitions: {tool_definitions}")

result = intent_resolution_evaluator(query = query, response = response, tool_definitions = tool_definitions)
print("Evaluation result:")
pprint(result)

### Putting it all together: evaluating an entire conversation run by run

In [None]:
def evaluate_conversation_run(conversation : dict, run_id : str, verbose=False):
    converted_conv = AIAgentConverter.convert_from_conversation(conversation, str(run_id))
    # Extract the query and response from the conversation
    query = converted_conv['query']
    response = converted_conv['response']
    tool_definitions = converted_conv['tool_definitions']
    
    if verbose:
        print("*********************************************")
        print(f"Evaluating conversation run with ID: {run_id}")
        print(f"Query: {query}")
        print(f"Response: {response}")
        print(f"Tool Definitions: {tool_definitions}")

    # Evaluate the query and response using the intent resolution evaluator
    evaluation_result = intent_resolution_evaluator(query = query, response = response, tool_definitions = tool_definitions)
    if verbose:
        print(f"Evaluation Result:")
        pprint(evaluation_result)

    return evaluation_result

def evaluate_conversation(conversation, verbose=False):
    run_ids = AIAgentConverter.run_ids_from_conversation(conversation)
    print(f"Runs available in conversation: {run_ids}")
    results = []
    for run_id in run_ids:
        result = evaluate_conversation_run(conversation, str(run_id), verbose)
        results.append(result)
    return results

sample_conversation = sample_conversations[20]
evaluate_conversation(sample_conversation, verbose=True)
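
The list returned by `evaluate_conversation` can then be summarized across runs. This is a minimal sketch that assumes each result dictionary stores a numeric score under a key such as `intent_resolution`; that key name is an assumption here, so check the printed results and pass the actual key via `score_key`. The helper `summarize_scores` is hypothetical.

```python
from statistics import mean


def summarize_scores(results, score_key="intent_resolution"):
    """Summarize numeric scores across runs; score_key is an assumed key name."""
    scores = [
        result[score_key]
        for result in results
        if isinstance(result.get(score_key), (int, float))
    ]
    if not scores:
        return {"count": 0, "mean": None, "min": None}
    return {"count": len(scores), "mean": mean(scores), "min": min(scores)}


# Made-up results for illustration only.
example_results = [{"intent_resolution": 5}, {"intent_resolution": 3}, {"other": 1}]
print(summarize_scores(example_results))
```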