Build an AI Bookkeeping Agent for Estonian OÜs (2026)

TL;DR: An AI bookkeeping agent for an Estonian OÜ should be a reasoning layer (Claude) on top of the Merit Aktiva API as the system of record, with tool calls for read and write and a human-approval gate before anything is filed or paid. Let it categorize purchase invoices, match bank transactions, draft sales invoices, pre-fill the KMD, and flag anomalies. Do not let it auto-file the VAT return, auto-pay, or post anything to the ledger without a human approving the diff. Build it read-only first, add a single write path at a time, and log every tool call.

What an agent can actually own

Bookkeeping for a small OÜ is mostly classification and matching, which is exactly what current LLMs are good at. The work that an agent can own, end to end, with a human only reviewing the output:

Categorizing purchase invoices. Given a supplier, line items, and amount, the agent picks the expense account and VAT rate (24% standard, reduced 13% or 9%, 0%, or exempt) and proposes the entry. Recurring suppliers become near-deterministic after a few examples.
Matching bank transactions to invoices. The agent reconciles a bank statement line against an open invoice by amount, date, and reference. Most lines match cleanly; the agent surfaces only the ambiguous ones.
Drafting sales invoices. From a deal record, a contract, or a short instruction, it produces a draft invoice with the right customer, VAT treatment, and payment terms, ready for a human to send.
Pre-filling the KMD (käibedeklaratsioon). It assembles the period's VAT figures from booked transactions and produces a draft return for review. This is a pre-fill, never an auto-submit.
Flagging anomalies. Duplicate invoices, a VAT rate that does not match the supplier's usual rate, a round-number transaction with no matching document, an expense that looks personal. The agent is good at "this is unusual, look here."
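The matching task is mostly deterministic, which is why it is a good first candidate. A minimal sketch of the amount, date, and reference rule described above, assuming hypothetical OpenInvoice and BankLine records and a 30-day window that you would tune to your own payment terms:

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class OpenInvoice:
    number: str
    amount: Decimal
    due_date: date


@dataclass
class BankLine:
    amount: Decimal
    booked_on: date
    reference: str


def match_bank_line(line, invoices, max_days=30):
    """Return ("matched", invoice), ("ambiguous", candidates) or ("unmatched", [])."""
    # Candidates: same amount, booked within max_days of the due date.
    candidates = [
        inv for inv in invoices
        if inv.amount == line.amount
        and abs((line.booked_on - inv.due_date).days) <= max_days
    ]
    # An invoice number inside the payment reference is the strongest signal.
    # Naive substring check; a real matcher would normalize references first.
    by_reference = [inv for inv in candidates if inv.number in line.reference]
    if len(by_reference) == 1:
        return "matched", by_reference[0]
    if len(candidates) == 1:
        return "matched", candidates[0]
    if candidates:
        return "ambiguous", candidates  # surface these to the model or a human
    return "unmatched", []
```

Only the "ambiguous" and "unmatched" results need to go to the model or a person; clean matches can be proposed directly.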

What stays human, every time: approving the KMD before it goes to EMTA, releasing any payment, and posting anything irreversible to the ledger. The agent prepares; a person commits. That line is the whole design.

Architecture: reasoning layer over a system of record

Four parts, in order of importance.

System of record: Merit Aktiva. The accounting truth lives in Aktiva, not in the agent and not in a spreadsheet. The agent never holds state that the books do not. Everything it does is a proposed change to Aktiva via the API. We covered the API surface in depth in the Merit Aktiva API guide for AI agents; this post assumes that as the data layer.

Reasoning layer: Claude. The model classifies, drafts, and reconciles. It does not have direct database access. It only sees what you put in the prompt and it can only act through tools you define. Claude's tool-use (function calling) is the mechanism: you describe each tool as a JSON schema, the model decides which to call and with what arguments, your code executes the call, and the result goes back into the conversation.
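The "your code executes the call" half of that loop can be sketched as below. It takes the content blocks of a model response (tool_use blocks with id, name, and input) and builds the tool_result blocks you send back in the next user message. The handlers mapping is a hypothetical registry of your own functions, not part of any SDK.

```python
import json


def run_tool_calls(content_blocks, handlers):
    """Execute each tool_use block and build the tool_result blocks to return."""
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # text blocks carry no action
        handler = handlers.get(block["name"])
        if handler is None:
            # Report unknown tools back to the model instead of crashing.
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": f"Unknown tool: {block['name']}",
                "is_error": True,
            })
            continue
        output = handler(**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": json.dumps(output),
        })
    return results
```

The model never touches Aktiva itself; everything it can do is whatever you register in handlers.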

Orchestration: n8n or an MCP server. Something has to trigger the agent (new invoice in the inbox, nightly bank sync), pass data in, execute the tool calls, and route proposals to a human. For most OÜ-scale setups n8n is the pragmatic choice, because it already speaks HTTP, has Merit and email nodes, and gives you a visual audit of each run. The Merit Aktiva automation comparison walks through Make vs n8n vs a direct-API agent. An MCP server is the cleaner option if you want the same Aktiva tools available to multiple agents and to Claude directly.

Approval gate. The non-negotiable component. Any write that affects the ledger, a filing, or money pauses and waits for a human yes/no, with the proposed change shown as a readable diff. No gate, no production.
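One way to make the gate hard to bypass is to enforce it at the single point where tools are executed, rather than trusting each workflow to remember it. A minimal sketch, assuming hypothetical write tool names and a handlers registry of your own functions:

```python
# Hypothetical names for write tools; use whatever your own tool list defines.
GATED_TOOLS = {"create_purchase_invoice", "send_sales_invoice"}


class ApprovalRequired(Exception):
    """Raised when a gated tool is called without a recorded human approval."""


def execute_tool(name, args, handlers, approved_by=None):
    # Reads pass straight through; anything in GATED_TOOLS needs a named approver.
    if name in GATED_TOOLS and not approved_by:
        raise ApprovalRequired(f"{name} needs a human approval before it runs")
    return handlers[name](**args)
```

Requiring a named approver, not just a boolean, also gives the audit trail a "who" for every write.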

Tool design: read-only first

Define narrow, single-purpose tools, not one "do_accounting" tool. Each tool maps to one Aktiva operation and has its own schema. Start with read-only tools and ship those before you build a single write path.

A minimal read toolset (Aktiva API; verify exact paths and field names against current docs):

list_purchase_invoices(start_date, end_date) -> read-only query against the purchase-invoice list endpoint
get_bank_transactions(account, start_date, end_date)
list_open_sales_invoices()
get_vat_rates()
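Written out as tool definitions, the read toolset could look like the sketch below. The tool and parameter names come from the list above; the descriptions and the ISO date format are assumptions, and none of this says anything about Aktiva's own field names.

```python
# Shared parameter definition; the date format is an assumption.
DATE = {"type": "string", "description": "ISO date, YYYY-MM-DD"}

READ_TOOLS = [
    {
        "name": "list_purchase_invoices",
        "description": "List purchase invoices in a date range. Read-only.",
        "input_schema": {
            "type": "object",
            "properties": {"start_date": DATE, "end_date": DATE},
            "required": ["start_date", "end_date"],
        },
    },
    {
        "name": "get_bank_transactions",
        "description": "List bank transactions for one account in a date range. Read-only.",
        "input_schema": {
            "type": "object",
            "properties": {
                "account": {"type": "string"},
                "start_date": DATE,
                "end_date": DATE,
            },
            "required": ["account", "start_date", "end_date"],
        },
    },
    {
        "name": "list_open_sales_invoices",
        "description": "List unpaid sales invoices. Read-only.",
        "input_schema": {"type": "object", "properties": {}},
    },
    {
        "name": "get_vat_rates",
        "description": "List the VAT rates configured in the books. Read-only.",
        "input_schema": {"type": "object", "properties": {}},
    },
]
```

Saying "Read-only" in each description is cheap and tells the model, and anyone reading the log, what the tool cannot do.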

A tool schema as Claude sees it:

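{
  "name": "categorize_purchase_invoice",
  "description": "Propose an expense account and VAT rate for a purchase invoice. Returns a proposal only; does not post anything to Merit Aktiva.",
  "input_schema": {
    "type": "object",
    "properties": {
      "supplier_name": { "type": "string" },
      "supplier_reg_no": { "type": "string" },
      "total_amount": { "type": "number" },
      "currency": { "type": "string", "default": "EUR" },
      "line_descriptions": {
        "type": "array",
        "items": { "type": "string" }
      },
      "account_code": { "type": "string", "description": "Expense account, chosen from the chart of accounts in the prompt" },
      "vat_rate": { "type": "string", "description": "VAT rate, chosen from the list returned by get_vat_rates" },
      "rationale": { "type": "string", "description": "One-line reason for the choice" }
    },
    "required": ["supplier_name", "total_amount", "line_descriptions", "account_code", "vat_rate", "rationale"]
  }
}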

Note the description says "Returns a proposal only." That is deliberate. In the read-only phase, even the "categorize" tool just returns structured JSON your orchestrator stores. Nothing reaches Aktiva until you add a separate, gated write tool.

Merit Aktiva uses an API key per company plus a signed request. Treat the key as you would a payment credential: one key per environment, scoped, rotated, never in the prompt. Authentication mechanics and the full endpoint list are in the API guide.
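To make "signed request" concrete, here is a generic HMAC-SHA256 signing sketch. Treat every detail as an assumption to check against the Merit documentation: the concatenation order (API ID, timestamp, body), the timestamp format, the base64 encoding, and the parameter names are illustrative, not confirmed by this post.

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timezone


def sign_request(api_id, api_key, payload, now=None):
    """Return (query_params, body) for a signed call. Scheme details are assumed."""
    now = now or datetime.now(timezone.utc)
    timestamp = now.strftime("%Y%m%d%H%M%S")
    # Sign the exact bytes you will send, so serialize the body once and reuse it.
    body = json.dumps(payload, separators=(",", ":"))
    message = f"{api_id}{timestamp}{body}".encode("utf-8")
    digest = hmac.new(api_key.encode("utf-8"), message, hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return {"ApiId": api_id, "timestamp": timestamp, "signature": signature}, body
```

The key is an argument here on purpose: load it from an environment variable or a secret store in the orchestrator, and never pass it through the model's context.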

A worked build: invoice in, proposal out

The first useful loop is purchase-invoice categorization with a human approval at the end.

1. n8n triggers on a new invoice (email attachment, e-invoice, or a Merit webhook).
2. An OCR or document-parsing step extracts supplier, amounts, and line items into structured fields.
3. n8n calls Claude with the invoice data and the tool definitions. The system prompt states the rules: Estonian VAT rates, the OÜ's chart of accounts, and "you propose, you never post."
4. Claude returns a tool call, for example categorize_purchase_invoice with the account and VAT rate it chose, plus a one-line rationale.
5. n8n formats the proposal and sends it to a human (Slack, email, or an n8n form) with Approve / Edit / Reject.
6. On Approve, and only then, n8n calls the Aktiva write endpoint to create the purchase invoice. On Reject, it logs the reason, which becomes a future few-shot example.
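The last two steps come down to a small decision handler. The sketch below assumes a hypothetical write_to_aktiva callable that wraps the real write call, and it assumes that an Edit writes the corrected version and keeps the correction as an example; the steps above only specify what Approve and Reject do.

```python
def handle_decision(proposal, decision, write_to_aktiva, few_shot_log,
                    edited=None, reason=""):
    """Apply a human decision to one proposal. Only approve and edit ever write."""
    if decision == "approve":
        return write_to_aktiva(proposal)
    if decision == "edit":
        # Assumption: the human's corrected version is what gets written.
        few_shot_log.append({"proposal": proposal, "correction": edited})
        return write_to_aktiva(edited)
    if decision == "reject":
        # The reason is kept so it can be fed back as a few-shot example.
        few_shot_log.append({"proposal": proposal, "rejected_because": reason})
        return None
    raise ValueError(f"Unknown decision: {decision}")
```

Anything other than the three known decisions raises, so a malformed Slack or form response can never fall through to a write.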

The Claude API call, simplified:

curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-opus-4-8",
    "max_tokens": 1024,
    "system": "You are a bookkeeping assistant for an Estonian OU. Estonian standard VAT is 24%. You propose entries; you never post to Merit Aktiva. Always call a tool, never free-text an account number.",
    "tools": [ /* schemas above */ ],
    "messages": [
      { "role": "user", "content": "Categorize this invoice: { ... extracted fields ... }" }
    ]
  }'

The write step lives entirely in your orchestrator, after the human approves. Claude never gets a write tool that fires without the gate in between.
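If you build the request in code rather than in a shell string, the body stays valid JSON and the chart of accounts can be injected per company. A minimal sketch of the same request body as the curl call, with two additions that are assumptions on top of it: the chart of accounts appended to the system prompt, and tool_choice set to "any" so the model has to answer with a tool call. The function and parameter names are hypothetical.

```python
import json


def build_request(invoice, tools, chart_of_accounts, model):
    """Build the JSON body for a Messages API call; sending it is left to the caller."""
    accounts = "\n".join(
        f"{code} {name}" for code, name in sorted(chart_of_accounts.items())
    )
    system = (
        "You are a bookkeeping assistant for an Estonian OU. "
        "Estonian standard VAT is 24%. You propose entries; you never post to "
        "Merit Aktiva. Always call a tool, never free-text an account number.\n"
        "Chart of accounts:\n" + accounts
    )
    return {
        "model": model,
        "max_tokens": 1024,
        "system": system,
        "tools": tools,
        # Forces a tool call instead of a free-text answer.
        "tool_choice": {"type": "any"},
        "messages": [
            {
                "role": "user",
                "content": "Categorize this invoice: " + json.dumps(invoice),
            }
        ],
    }
```

The returned dict can be posted with any HTTP client, or from an n8n HTTP Request node after serializing it.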

Guardrails that make this safe

These are not optional hardening for later. They are the reason an OÜ owner can sleep with this running.

Read-only first, one write path at a time. Run the agent for a few weeks proposing entries that a human enters manually. Compare its proposals to what the bookkeeper actually did. Only when the match rate is boring do you wire up the first write tool, and only that one.
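Measuring "boring" needs nothing more than a comparison of what the agent proposed with what the bookkeeper booked. A minimal sketch, assuming both sides are available as hypothetical dicts keyed by invoice ID with an (account, VAT rate) tuple as the value:

```python
def match_rate(proposals, actuals):
    """Return (share of agreeing invoices, sorted IDs of the disagreements)."""
    # Only invoices present on both sides can be compared.
    shared = proposals.keys() & actuals.keys()
    if not shared:
        return 0.0, []
    mismatches = sorted(i for i in shared if proposals[i] != actuals[i])
    return 1 - len(mismatches) / len(shared), mismatches
```

The list of mismatches is the more useful output: reading them weekly shows whether the agent is wrong, the input was bad, or the bookkeeper knew something the prompt did not contain.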

An approval gate on every irreversible action. Posting a ledger entry, sending an invoice, submitting the KMD, and releasing a payment all stop for a human. Show the diff in plain language: "Create purchase invoice, supplier X, 1 200 EUR, account 4010, VAT 24%." A human approving a clear diff in five seconds is the whole safety model.
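Producing that one-line diff is a formatting function, not a model task. A minimal sketch that reproduces the example sentence; the function and parameter names are hypothetical:

```python
from decimal import Decimal


def render_diff(action, supplier, amount, account, vat_rate, currency="EUR"):
    """Render a proposed write as one plain-language line for the approver."""
    # Decimal supports the "," grouping format; swap it for a space separator.
    amount_text = f"{amount:,}".replace(",", " ")
    return (
        f"{action}, supplier {supplier}, {amount_text} {currency}, "
        f"account {account}, VAT {vat_rate}."
    )


print(render_diff("Create purchase invoice", "X", Decimal("1200"), "4010", "24%"))
# prints: Create purchase invoice, supplier X, 1 200 EUR, account 4010, VAT 24%.
```

Generating the text from the same structured proposal that will be written means the approver sees exactly what will be posted, not the model's paraphrase of it.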

Never auto-file the KMD. The agent pre-fills the return and explains its numbers. A human submits it to EMTA. The deadline is the 20th of the following month, and the person, not the agent, owns that submission. The KMD automation guide covers how to automate the assembly without crossing into auto-submission.
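The mechanical part of the pre-fill can be sketched as two small functions: one computes the 20th-of-the-following-month deadline, the other totals net and VAT amounts per rate from booked entries. The entry shape (vat_rate, net, vat keys) is a hypothetical one, the sketch deliberately does not map totals to fields of the KMD form, and it does not adjust the deadline for weekends or public holidays.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def kmd_deadline(year, month):
    """Deadline for the period (year, month): the 20th of the following month."""
    if month == 12:
        return date(year + 1, 1, 20)
    return date(year, month + 1, 20)


def vat_totals_by_rate(entries):
    """Sum net and VAT amounts per VAT rate for a human to review."""
    totals = defaultdict(lambda: {"net": Decimal("0"), "vat": Decimal("0")})
    for entry in entries:
        totals[entry["vat_rate"]]["net"] += entry["net"]
        totals[entry["vat_rate"]]["vat"] += entry["vat"]
    return dict(totals)
```

Both outputs belong in the draft that goes to the human; neither function has any path to EMTA.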

Never auto-pay. Categorizing and even booking a purchase invoice is fine. Paying it is a separate, human-only decision. Keep payment out of the agent's tool list entirely.

Audit log everything. Every tool call, input, output, model version, and the approve/reject decision goes to an append-only log. When an accountant or the tax authority asks why an entry exists, you can show the exact proposal, who approved it, and when. This is also what lets you debug the agent: most "the AI got it wrong" cases are bad input data, visible only in the log.
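A JSON Lines file is enough to start with. The sketch below appends one record per tool call; the field names are hypothetical, and opening a file in append mode is only a convention, so real append-only guarantees have to come from the storage layer (restricted permissions, write-once storage, or a database with no update rights).

```python
import json
from datetime import datetime, timezone


def append_audit_record(path, tool_name, tool_input, tool_output, model,
                        decision=None, decided_by=None):
    """Append one audit record as a JSON line and return it."""
    record = {
        "at": datetime.now(timezone.utc).isoformat(),
        "tool": tool_name,
        "input": tool_input,
        "output": tool_output,
        "model": model,
        "decision": decision,      # "approve", "edit", "reject", or None for reads
        "decided_by": decided_by,  # the human who decided, if any
    }
    # Mode "a" only ever adds to the end of the file.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

Logging the raw tool input is what makes bad OCR visible later, so resist the urge to log only the final proposal.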

Constrain, do not trust. The agent should never free-text an account number; it picks from your actual chart of accounts passed in the prompt. It should never invent a VAT rate; it picks from the real list. Validation in your orchestrator rejects any tool call with an account or rate that does not exist, before it ever reaches a human.
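That validation step is a few lines in the orchestrator. A minimal sketch, assuming the proposal carries hypothetical account_code and vat_rate fields and that the chart of accounts and rate list have been read from the books:

```python
def validate_proposal(proposal, chart_of_accounts, vat_rates):
    """Return a list of problems; an empty list means the proposal may go to a human."""
    errors = []
    account = proposal.get("account_code")
    rate = proposal.get("vat_rate")
    if account not in chart_of_accounts:
        errors.append(f"Unknown account: {account!r}")
    if rate not in vat_rates:
        errors.append(f"Unknown VAT rate: {rate!r}")
    return errors
```

A non-empty result can be returned to the model as a tool error so it retries with a valid value, and it never reaches the approver's inbox.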

Honest limits in 2026

This is genuinely useful and genuinely not autonomous. The agent will misclassify unusual invoices, especially one-off suppliers and mixed-purpose expenses where the right account depends on intent it cannot see. Reverse-charge and cross-border EU VAT remain the hardest cases and need human judgment more often than domestic ones. OCR on bad scans still produces garbage that the model will confidently categorize, which is why the audit log and the approval gate matter more than the model.

The right framing is not "fire your accountant." It is: the agent removes the mechanical 80% (clean recurring invoices, obvious matches, first-draft VAT returns) so the human spends their time on the 20% that actually needs a brain, plus the five-second approvals. For a small OÜ that is the difference between bookkeeping being a weekend chore and a few minutes a day.

FAQ

Can an AI agent fully replace a bookkeeper for an Estonian OÜ? No, and you should not build it to. It can own categorization, matching, drafting, and VAT pre-fill, but a human must approve filings and payments and handle the judgment-heavy edge cases. The agent reduces the work; it does not remove the accountable person.

Which LLM should the reasoning layer use? Claude (Anthropic) is a strong fit because of reliable tool/function calling, which is the entire mechanism here. The model's job is to choose the right tool with the right structured arguments, not to write prose, so calling discipline matters more than raw eloquence.

Is it safe to give an AI agent write access to Merit Aktiva? Only behind an approval gate and only one write path at a time, after a read-only trial period. Scope the API key, validate every tool call against your real chart of accounts and VAT rates, and never put payment or KMD submission in the agent's reach.

Can the agent submit the KMD VAT return automatically? No. It should pre-fill the return and explain its figures, but a human submits it to EMTA by the 20th. Auto-submission removes the accountable human from a legal filing, which is exactly the line this design refuses to cross.

Do I need n8n, or can I use MCP? Either works. n8n is the pragmatic choice for a single OÜ because of its built-in HTTP, Merit, and email nodes and visual run history. An MCP server is cleaner if you want the same Aktiva tools shared across multiple agents or available to Claude directly.

What is the first thing to build? Read-only purchase-invoice categorization with a human approval step. It is the highest-volume, lowest-risk task, it produces immediate value, and the proposals you approve or reject become the training examples that make the rest of the agent better.

If you want this built and wired into your Merit Aktiva account with the guardrails in place, that is exactly the kind of system Aluslabs builds for Estonian companies.