# Task Completion Evaluator

### Getting Started
This sample demonstrates how to use the Task Completion evaluator on agent data. The supported input formats are:
- simple data such as strings and `dict` objects describing task responses;
- user-agent conversations in the form of a list of agent messages.

Before you begin:
```bash
pip install azure-ai-evaluation
```
Set these environment variables with your own values:
1) **MODEL_DEPLOYMENT_NAME** - The deployment name of the model for this AI-assisted evaluator, as found under the "Name" column in the "Models + endpoints" tab in your Azure AI Foundry project.
2) **AZURE_OPENAI_ENDPOINT** - The Azure OpenAI endpoint to be used for evaluation.
3) **AZURE_OPENAI_API_KEY** - The Azure OpenAI key to be used for evaluation.
4) **AZURE_OPENAI_API_VERSION** - The Azure OpenAI API version to be used for evaluation.
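
Because the initialization cell reads these four variables with `os.environ[...]`, an unset variable surfaces as a `KeyError`. The sketch below is one possible pre-flight check; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names introduced here for illustration and are not part of `azure-ai-evaluation`.

```python
import os

# The four variables this sample reads (names taken from the list above).
REQUIRED_VARS = (
    "MODEL_DEPLOYMENT_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper: `env` defaults to os.environ, but any mapping works.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```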

The Task Completion evaluator assesses whether an AI agent successfully completes the requested task by examining:
- Whether the task was fully completed
- Quality of task execution
- Appropriateness of the response to the original request

The evaluator uses a binary scoring system (0 or 1):

- Score 0: The task was not completed or only partially completed
- Score 1: The task was successfully and fully completed

This evaluation focuses on measuring whether the agent's response indicates successful completion of the user's request, regardless of the specific methods or tools used to achieve the task.
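
Since each evaluation yields either 0 or 1, a set of evaluations can be summarized as a completion rate. The following is a minimal sketch under that assumption; `completion_rate` is a hypothetical helper, and how the score is read out of the evaluator's result is left to the caller because the result's field names are not described in this sample.

```python
def completion_rate(scores):
    """Return the fraction of evaluations scored 1 under the binary 0/1 scheme.

    Hypothetical helper: `scores` is an iterable of 0/1 values collected by the caller.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("no scores to aggregate")
    if any(score not in (0, 1) for score in scores):
        raise ValueError("scores must be 0 or 1")
    return sum(scores) / len(scores)


# Three completed tasks out of four evaluations.
print(completion_rate([1, 0, 1, 1]))  # 0.75
```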

Task Completion takes the following inputs:
- Query - The original task request from the user. This can be a single query or a list of messages (the conversation history with the agent).
- Response - The response from the agent (or any GenAI app). This can be a single text response or a list of messages generated as part of the agent's response.
- Tool Definitions - (Optional) The definition(s) of the tool(s) available to the agent for answering the query. Providing tool definitions helps the evaluator better understand the context and capabilities available to the agent.
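
The input shapes described above can be sketched as plain Python checks. `input_kind` and `as_tool_list` are hypothetical helpers written for illustration only; they are not part of the evaluator, and wrapping a single tool definition in a list is simply one way to handle both forms used in the samples uniformly in your own code.

```python
def input_kind(value):
    """Classify a query or response as 'text' or 'messages' (hypothetical helper)."""
    if isinstance(value, str):
        return "text"
    if isinstance(value, list) and all(isinstance(message, dict) for message in value):
        return "messages"
    raise TypeError("expected a string or a list of message dicts")


def as_tool_list(tool_definitions):
    """Return tool definitions as a list, whether given None, one dict, or a list."""
    if tool_definitions is None:
        return []
    if isinstance(tool_definitions, dict):
        return [tool_definitions]
    return list(tool_definitions)


print(input_kind("How is the weather in Seattle?"))
print(input_kind([{"role": "user", "content": "How is the weather in Seattle?"}]))
print(len(as_tool_list({"name": "fetch_weather"})))
```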


### Initialize Task Completion Evaluator


In [None]:
import os
from azure.ai.evaluation._evaluators._task_completion import _TaskCompletionEvaluator
from azure.ai.evaluation import AzureOpenAIModelConfiguration
from pprint import pprint

model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_deployment=os.environ["MODEL_DEPLOYMENT_NAME"],
)


task_completion = _TaskCompletionEvaluator(model_config=model_config)

### Samples

#### Evaluating Simple Task Completion

In [None]:
query = "How is the weather in Seattle?"
response = "The current weather in Seattle is partly cloudy with a temperature of 15°C (59°F). There's a light breeze from the northwest at 8 mph, and the humidity is at 68%. No precipitation is expected for the rest of the day."

# Basic evaluation without tool definitions
result = task_completion(query=query, response=response)
pprint(result)

#### Task Completion with Tool Context

In [None]:
query = "How is the weather in Seattle?"
response = "I've checked the weather for Seattle and found that it's currently partly cloudy with a temperature of 15°C. There's a light breeze and no rain expected today."

tool_definitions = [
    {
        "name": "fetch_weather",
        "description": "Fetches the weather information for the specified location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}},
        },
    }
]

result = task_completion(query=query, response=response, tool_definitions=tool_definitions)
pprint(result)

#### Task Completion with Tool Definition as Single Dict

In [None]:
query = "What's the current temperature in Boston?"
response = "The current temperature in Boston is 22°C (72°F) with clear skies. It's a beautiful day with low humidity and a gentle breeze."

tool_definition_dict = {
    "name": "fetch_weather",
    "description": "Fetches the weather information for the specified location.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}},
    },
}

result = task_completion(query=query, response=response, tool_definitions=tool_definition_dict)
pprint(result)

#### Complex Task with Multiple Steps

In [None]:
query = "Can you send me an email with weather information for Seattle?"
response = [
    {
        "createdAt": "2025-03-26T17:27:35Z",
        "run_id": "run_zblZyGCNyx6aOYTadmaqM4QN",
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "I'll get the current weather information for Seattle and then send you an email with the details.",
            }
        ],
    },
    {
        "createdAt": "2025-03-26T17:27:42Z",
        "run_id": "run_zblZyGCNyx6aOYTadmaqM4QN",
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": "I have successfully sent you an email with the weather information for Seattle. The current weather is partly cloudy with a temperature of 15°C. You should receive the email shortly at your registered email address.",
            }
        ],
    },
]

tool_definitions = [
    {
        "name": "fetch_weather",
        "description": "Fetches the weather information for the specified location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}},
        },
    },
    {
        "name": "send_email",
        "description": "Sends an email with the specified subject and body to the recipient.",
        "parameters": {
            "type": "object",
            "properties": {
                "recipient": {"type": "string", "description": "Email address of the recipient."},
                "subject": {"type": "string", "description": "Subject of the email."},
                "body": {"type": "string", "description": "Body content of the email."},
            },
        },
    },
]

result = task_completion(query=query, response=response, tool_definitions=tool_definitions)
pprint(result)

#### Query as Conversation History (List of Messages)
The evaluator also accepts the query as a list of messages representing the conversation history. This helps evaluate task completion in the context of a full conversation.

In [None]:
# Query as conversation history instead of a single string
query_as_conversation = [
    {
        "role": "system",
        "content": "You are a helpful assistant that can fetch weather information and send emails."
    },
    {
        "role": "user",
        "content": "Hi, I need to plan my day. Can you check the weather in Seattle for me?"
    },
    {
        "role": "user",
        "content": "Also, please send me an email summary of the weather so I can reference it later."
    }
]

response = "I've checked the weather in Seattle for you. It's currently 15°C and partly cloudy with light winds. I've also sent you an email summary with these details so you can reference them throughout your day. The task has been completed successfully."

tool_definitions = [
    {
        "name": "fetch_weather",
        "description": "Fetches the weather information for the specified location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string", "description": "The location to fetch weather for."}},
        },
    },
    {
        "name": "send_email",
        "description": "Sends an email with the specified subject and body to the recipient.",
        "parameters": {
            "type": "object",
            "properties": {
                "recipient": {"type": "string", "description": "Email address of the recipient."},
                "subject": {"type": "string", "description": "Subject of the email."},
                "body": {"type": "string", "description": "Body content of the email."},
            },
        },
    },
]

result = task_completion(query=query_as_conversation, response=response, tool_definitions=tool_definitions)
pprint(result)

#### Example of Incomplete Task

In [None]:
query = "Can you send me an email with weather information for Seattle?"
incomplete_response = "I can see that you want weather information for Seattle. The weather there is usually quite nice this time of year, with temperatures ranging from mild to warm depending on the season."

# This response doesn't complete the email-sending task.
# tool_definitions is reused from the previous cell (fetch_weather and send_email).
result = task_completion(query=query, response=incomplete_response, tool_definitions=tool_definitions)
pprint(result)