Expense Tracker Bot - Upsonic AI

A Telegram bot built with Upsonic’s AutonomousAgent that reads receipt photos via OCR, extracts structured data, and logs expenses to a CSV file in its workspace. The agent’s behavior (how to parse receipts, what CSV columns to use, how to handle duplicates) is defined entirely in AGENTS.md, not in code.

Overview

The setup has three parts:

- AutonomousAgent with a workspace directory and one custom tool (`ocr_extract_text`)
- TelegramInterface in CHAT mode for conversational context
- Workspace files (AGENTS.md, SOUL.md) that define the agent’s behavior and identity

The agent handles CSV creation, writing, duplicate checking, and monthly summaries on its own through workspace filesystem access. The only custom tool is OCR, because the agent can’t read images natively.

Project Structure

```
expense_tracker_bot/
├── main.py              # AutonomousAgent + TelegramInterface
├── tools.py             # OCR extraction tool
├── requirements.txt     # upsonic[ocr], anthropic, etc.
└── workspace/
    ├── AGENTS.md        # Behavior: receipt workflow, CSV schema, rules
    ├── SOUL.md          # Identity and personality
    ├── expenses.csv     # Created by agent at runtime
    └── memory/          # Daily session logs
```

Environment Variables

```
ANTHROPIC_API_KEY=your-api-key
TELEGRAM_BOT_TOKEN=your-bot-token
TELEGRAM_WEBHOOK_URL=https://xxxx.ngrok-free.app
```
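main.py passes the `os.getenv(...)` results straight into `TelegramInterface`, so a missing variable quietly becomes `None`. A minimal fail-fast check, assuming you want startup to stop when any of the three variables is absent; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names, not part of the example:

```python
import os
from collections.abc import Mapping

# Hypothetical list: the three variables this example reads.
REQUIRED_VARS = ("ANTHROPIC_API_KEY", "TELEGRAM_BOT_TOKEN", "TELEGRAM_WEBHOOK_URL")


def missing_env_vars(env: Mapping[str, str] = os.environ, required=REQUIRED_VARS) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]
```

Call it right after `load_dotenv()` and exit with a message if the returned list is non-empty.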

Installation

```bash
cd examples/autonomous_agents/expense_tracker_bot
uv venv && source .venv/bin/activate
uv pip install -r requirements.txt
```

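Create a Telegram bot via @BotFather and put the token it gives you in TELEGRAM_BOT_TOKEN. Then start ngrok to expose port 8000 publicly, and set TELEGRAM_WEBHOOK_URL to the HTTPS forwarding URL it prints: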

```bash
ngrok http 8000
```

Usage

```bash
python main.py
```

The server starts on 0.0.0.0:8000 and registers the Telegram webhook.
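In the Telegram Bot API, registering a webhook means calling the `setWebhook` method with a public HTTPS URL. Upsonic does this for you at startup; the sketch below only shows what such a request looks like, in case you want to inspect or redo the registration by hand. Which path, if any, Upsonic appends to TELEGRAM_WEBHOOK_URL is not stated here, so `webhook_url` is passed through unchanged, and `set_webhook_request_url` is a hypothetical helper:

```python
from urllib.parse import urlencode


def set_webhook_request_url(bot_token: str, webhook_url: str) -> str:
    """Build a Telegram Bot API setWebhook request URL (GET form)."""
    if not webhook_url.startswith("https://"):
        # Telegram only delivers webhook updates to HTTPS URLs.
        raise ValueError("Telegram webhooks must use HTTPS")
    query = urlencode({"url": webhook_url})
    return f"https://api.telegram.org/bot{bot_token}/setWebhook?{query}"
```

The `getWebhookInfo` method on the same `https://api.telegram.org/bot<token>/` base reports the currently registered URL and the last delivery error, which helps when messages are not reaching the bot.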

| Message | What happens |
|---|---|
| Photo of a receipt | OCR reads the text; the agent parses it and saves it to `expenses.csv` |
| “summary” or “this month” | The agent reads the CSV and returns a category breakdown |
| `/reset` | Clears the conversation context |
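The “summary” request is answered by the agent itself, following the summary logic in AGENTS.md (group by category, compute percentages, show totals). For reference, that computation in plain Python might look like this; it assumes `expenses.csv` has `date` (YYYY-MM-DD), `category`, and `amount` columns, which is a guess since the real schema is defined in AGENTS.md, and `monthly_summary` is a hypothetical name:

```python
import csv
import io
from collections import defaultdict


def monthly_summary(csv_text: str, month: str) -> dict[str, tuple[float, float]]:
    """Return {category: (total, percent_of_month)} for one month given as "YYYY-MM".

    Assumes hypothetical columns: date (YYYY-MM-DD), category, amount.
    """
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["date"].startswith(month + "-"):
            totals[row["category"]] += float(row["amount"])
    grand_total = sum(totals.values())
    # Largest categories first; guard against a month whose amounts sum to zero.
    return {
        cat: (round(total, 2), round(100 * total / grand_total, 1) if grand_total else 0.0)
        for cat, total in sorted(totals.items(), key=lambda kv: -kv[1])
    }
```

Running this on a few test rows and comparing it with the bot's reply is one way to check that the summary instructions in AGENTS.md are being followed.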

How It Works

| Component | Role |
|---|---|
| AutonomousAgent | Reads workspace files, manages the CSV, handles all logic |
| `ocr_extract_text` | The only custom tool: reads receipt images with EasyOCR |
| AGENTS.md | Defines the receipt workflow, CSV format, duplicate rules, and summary logic |
| SOUL.md | Agent identity and personality |
| TelegramInterface | Webhook-based chat with conversation memory |

Flow

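1. The user sends a receipt photo in Telegram.
2. The agent calls `ocr_extract_text`. If it does not pass a valid image path, the tool falls back to the most recently modified `telegram_media_*` file in the system temp directory.
3. The agent parses the OCR output following the rules in AGENTS.md: it converts dates, normalizes amounts, and picks a category.
4. The agent reads `expenses.csv` to check for duplicates, then appends the new row.
5. The agent replies with a short confirmation and the monthly running total.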
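In this example the duplicate check and the append are carried out by the agent through its workspace file access, not by Python code. What those two steps amount to can be sketched with the standard `csv` module; the column list and the rule that the same date, merchant, and amount means a duplicate are assumptions, since the real rules live in AGENTS.md, and all names here are hypothetical:

```python
import csv
import os

# Hypothetical schema: the real column list is defined in AGENTS.md.
FIELDS = ["date", "merchant", "category", "amount"]


def _key(row: dict) -> tuple:
    """Identity used for duplicate detection (assumed rule: date + merchant + amount)."""
    return (row["date"], row["merchant"].strip().lower(), round(float(row["amount"]), 2))


def is_duplicate(existing_rows: list[dict], new_row: dict) -> bool:
    """True if any existing row has the same date, merchant, and amount."""
    new_key = _key(new_row)
    return any(_key(r) == new_key for r in existing_rows)


def append_expense(path: str, row: dict) -> bool:
    """Append row to the CSV at path unless it is a duplicate. Returns True if written."""
    has_data = os.path.isfile(path) and os.path.getsize(path) > 0
    existing: list[dict] = []
    if has_data:
        with open(path, newline="", encoding="utf-8") as f:
            existing = list(csv.DictReader(f))
    if is_duplicate(existing, row):
        return False
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if not has_data:
            writer.writeheader()  # the first receipt creates the file with a header row
        writer.writerow({k: row[k] for k in FIELDS})
    return True
```

Normalizing the merchant name and rounding the amount before comparing keeps small OCR differences (extra spaces, `123.5` vs `123.50`) from slipping past the duplicate check.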

Complete Implementation

main.py

```python
import os

from dotenv import load_dotenv
from upsonic import AutonomousAgent
from upsonic.interfaces import InterfaceManager, TelegramInterface, InterfaceMode

from tools import ocr_extract_text

# Load ANTHROPIC_API_KEY and the Telegram settings from a .env file
load_dotenv()

# The agent takes its instructions from workspace/AGENTS.md and workspace/SOUL.md
agent = AutonomousAgent(
    model="anthropic/claude-sonnet-4-5",
    tools=[ocr_extract_text],
    workspace=os.path.join(os.path.dirname(__file__), "workspace"),
)

# CHAT mode keeps conversation context; /reset clears it
telegram = TelegramInterface(
    agent=agent,
    bot_token=os.getenv("TELEGRAM_BOT_TOKEN"),
    webhook_url=os.getenv("TELEGRAM_WEBHOOK_URL"),
    mode=InterfaceMode.CHAT,
    reset_command="/reset",
    parse_mode="Markdown",
)

# Serve the webhook endpoint on port 8000 (the port ngrok forwards to)
manager = InterfaceManager(interfaces=[telegram])
manager.serve(host="0.0.0.0", port=8000)
```

No system prompt, no hardcoded behavior. The agent reads everything from its workspace.
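Since the agent's behavior comes entirely from the workspace, starting it without AGENTS.md would likely leave it with no receipt workflow to follow. A small pre-flight check you could run before `manager.serve(...)`; `check_workspace` is a hypothetical helper, and treating AGENTS.md and SOUL.md as required is an assumption based on the project layout (`expenses.csv` is created at runtime, so it is not checked):

```python
import os

# Files this example expects in the workspace (assumed required).
REQUIRED_FILES = ("AGENTS.md", "SOUL.md")


def check_workspace(workspace: str) -> list[str]:
    """Return the required workspace files that are missing or empty."""
    problems = []
    for name in REQUIRED_FILES:
        path = os.path.join(workspace, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            problems.append(name)
    return problems
```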

tools.py

```python
import glob
import os
import tempfile

from upsonic.ocr import OCR
from upsonic.ocr.layer_1.engines import EasyOCREngine
from upsonic.tools.config import tool


def _find_latest_telegram_image() -> str | None:
    """Find the most recently modified telegram_media temp file."""
    tmp_dir = tempfile.gettempdir()
    candidates = glob.glob(os.path.join(tmp_dir, "telegram_media_*"))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


@tool
def ocr_extract_text(image_path: str = "") -> str:
    """Extracts text from receipt/invoice photos sent by the user."""
    # Fall back to the newest Telegram download if no valid path was given
    if not image_path or not os.path.isfile(image_path):
        discovered = _find_latest_telegram_image()
        if discovered:
            image_path = discovered
        else:
            return "ERROR: No image found to process. Please send a photo again."

    try:
        # Turkish-language EasyOCR on CPU, with rotation_fix enabled
        engine = EasyOCREngine(languages=["tr"], gpu=False, rotation_fix=True)
        ocr = OCR(layer_1_ocr_engine=engine)
        result = ocr.process_file(image_path)
    except Exception as e:
        return f"ERROR: OCR operation failed: {e}"

    lines = []
    total_confidence = 0.0
    block_count = 0

    # Keep non-empty text blocks, each prefixed with its confidence score
    for block in result.blocks:
        text = block.text.strip()
        if not text:
            continue
        conf = block.confidence
        total_confidence += conf
        block_count += 1
        lines.append(f"[{conf:.0%}] {text}")

    if block_count == 0:
        return "OCR could not detect any text. The image may not be clear."

    avg_confidence = total_confidence / block_count
    output = f"=== OCR Result (Average Confidence: {avg_confidence:.0%}) ===\n"
    output += "\n".join(lines)

    # Tell the agent to ask the user for confirmation when confidence is low
    if avg_confidence < 0.6:
        output += (
            "\n\nWARNING: OCR confidence score is low (<60%). "
            "Results may be incorrect, ask the user for confirmation."
        )

    return output
```

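One tool. If the agent passes no path (or an invalid one), it falls back to the newest `telegram_media_*` file in the system temp directory, runs EasyOCR on it, and returns the text with a confidence score per line, plus a warning when the average confidence is below 60%.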
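The confidence handling in `ocr_extract_text` does not depend on EasyOCR, so it can be pulled into a pure function and unit-tested without loading an OCR model. A sketch, assuming each OCR block exposes `text` and `confidence` (the two attributes the tool already reads); the `Block` dataclass and `format_ocr_blocks` are hypothetical stand-ins:

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical stand-in for an entry in result.blocks."""
    text: str
    confidence: float  # 0.0 to 1.0


def format_ocr_blocks(blocks: list[Block], warn_below: float = 0.6) -> str:
    """Reproduce the tool's output: per-line confidence, average, low-confidence warning."""
    kept = [(b.text.strip(), b.confidence) for b in blocks if b.text.strip()]
    if not kept:
        return "OCR could not detect any text. The image may not be clear."
    avg = sum(c for _, c in kept) / len(kept)
    lines = [f"[{c:.0%}] {t}" for t, c in kept]
    output = f"=== OCR Result (Average Confidence: {avg:.0%}) ===\n" + "\n".join(lines)
    if avg < warn_below:
        output += (
            f"\n\nWARNING: OCR confidence score is low (<{warn_below:.0%}). "
            "Results may be incorrect, ask the user for confirmation."
        )
    return output
```

The tool could then end with `return format_ocr_blocks(result.blocks)`, provided the real blocks carry the same two attributes.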

Workspace: AGENTS.md

This file is the core of the example. Instead of hardcoding CSV logic in Python, the agent reads its instructions from AGENTS.md:

- Receipt workflow: call OCR, parse the output, check for duplicates, save, confirm
- CSV schema: columns, types, and format rules (dates as YYYY-MM-DD, amounts as floats)
- Summary logic: group by category, compute percentages, show totals
- Rules: always use `ocr_extract_text` for images, never delete data files
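AGENTS.md asks for dates as YYYY-MM-DD and amounts as floats, while Turkish receipts (the OCR language used here) usually print dates as DD.MM.YYYY and amounts with a comma as the decimal separator, e.g. `1.234,56`. A sketch of that normalization, assuming those input formats; `normalize_date` and `normalize_amount` are hypothetical, and in the example the agent performs this step by following AGENTS.md rather than by calling code:

```python
import re
from datetime import datetime


def normalize_date(raw: str) -> str:
    """Convert DD.MM.YYYY, DD/MM/YYYY, or DD-MM-YYYY (assumed formats) to YYYY-MM-DD."""
    for fmt in ("%d.%m.%Y", "%d/%m/%Y", "%d-%m-%Y"):
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date: {raw!r}")


def normalize_amount(raw: str) -> float:
    """Parse amounts like '1.234,56 TL' or '*123,50' into a float."""
    cleaned = re.sub(r"[^0-9.,]", "", raw)  # drop currency text, spaces, asterisks
    if "," in cleaned:
        # Comma present: '.' groups thousands and ',' marks the decimals
        cleaned = cleaned.replace(".", "").replace(",", ".")
    # Without a comma, '.' is read as the decimal point (ambiguous for '1.234')
    return float(cleaned)
```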

To change the CSV schema or add new categories, edit AGENTS.md; the agent adapts with no code changes.

Notes

- OCR language is set to Turkish (`tr`). For other languages, change the `languages` parameter of `EasyOCREngine` in tools.py.
- Install with `upsonic[ocr]` to get EasyOCR and its dependencies.
- The agent has full filesystem access within the workspace but no shell access (`enable_shell` is not passed in main.py, so it stays at its default, disabled).

Repository

View the full example: Expense Tracker Bot
