Accounting & LLMs?

Original post:

I’ve tried everything from Excel with the ‘new’ paid Copilot through MS AutoGen, which I’d hoped would generate Python to do the quantitative work I wouldn’t trust the model itself to do. For context, I’m not an accountant, but a self-employed real estate broker and entrepreneur in Canada with a finance/MBA background.

“In Canada” has at least two implications, but the more significant one from a tax perspective is our requirement to claim depreciation as CCA (Capital Cost Allowance) for all capital assets. That requires identifying them in the ledger and not counting them as expenses, as well as depreciating them strictly according to the CRA’s published depreciation rates.

This implies that if you ever need the same information for your own purposes (that is, the underlying expenses and depreciated assets, which may or may not even exist as depreciable assets on your books from your point of view), you have to take on the entire headache and do it all over again.

The main stumbling block, then, is clearly what this requires of the A.I. agent you are employing. Really, the only way I can see it being feasible is if you empower it with agentic abilities to research each and every entry in your general ledger. It has to determine whether the entry is an expense or a capital asset and which CRA (our version of the US IRS) category it falls under, which is not so straightforward, even for humans. Further, it has to consider, within your stated strategic goals, how you’d like to look at the entry yourself: are you treating it as an expense or as a depreciable asset? If an asset, which depreciation method are you using, and what other assumptions will it need to research, such as useful life and salvage value? If an expense, which category should it fall under, which may differ from the CRA-allowable ones (which for entrepreneurs, for example, are likely the Form T2125 categories)?
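To show why the tax view and your own view of the same asset end up as two separate calculations, here is a minimal sketch. It assumes the tax side applies a fixed rate to the remaining balance each year and ignores any first-year or other special rules, while the management side uses straight-line depreciation with a useful life and salvage value. The rate, cost, useful life and salvage value are all made-up placeholders, not actual CRA figures.

```python
def declining_balance(cost: float, rate: float, years: int) -> list[float]:
    """Yearly deduction when a fixed rate is applied to the remaining balance."""
    balance = cost
    schedule = []
    for _ in range(years):
        deduction = round(balance * rate, 2)
        schedule.append(deduction)
        balance -= deduction
    return schedule


def straight_line(cost: float, salvage: float, useful_life: int) -> list[float]:
    """Equal yearly deduction from cost down to salvage value over the useful life."""
    yearly = round((cost - salvage) / useful_life, 2)
    return [yearly] * useful_life


PLACEHOLDER_RATE = 0.30  # illustrative only, NOT a real CRA rate
asset_cost = 2000.00     # hypothetical purchase

tax_view = declining_balance(asset_cost, PLACEHOLDER_RATE, years=4)
management_view = straight_line(asset_cost, salvage=200.00, useful_life=4)

print("tax view:       ", tax_view)
print("management view:", management_view)
```

The two schedules never line up year by year, which is why every asset has to be tracked twice once your own assumptions differ from the prescribed ones.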

This, at every step, is not a straightforward ask, even of a human bookkeeping assistant. If you’re not a one-person operation, it gets even more contentious when a manager responsible for that portion of the business wants to consider it in a wholly different way.

And we haven’t even considered the issue of revenue recognition yet, right?

So, I’ve found the extant set of LLM “ecosystems” to be useful only up to the point of giving general advice on an approach, based on a general understanding of your situation. I only even looked into automating with AutoGen/TaskWeaver because of a CRA request that involved going back to actual scanned/emailed invoices. I used an AppSumo-purchased tool called “Wellybox” to extract those from over 10 email addresses and other sources, which it did flawlessly, giving them to me individually linked by URL and in a spreadsheet that was easily pivot-tabled by vendor. But every LLM I’ve tried to involve thus far has failed at something even as simple as this: for every Amazon invoice, look up the order number and, from the list of items, break it down into best-guess categories. I’ve been foiled by everything from the seemingly increasing laziness of closed-source models to my not being able to put the task into a meaningful prompt, and I have not been able to get any LLM to take a stab at the actual categorization.
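To pin down what I mean by the categorization task, here is a rough sketch of one way to frame it: build a prompt per order with a fixed category list, ask for JSON back, and treat anything missing or off-list as needing manual review. The category names, the order number and the function names are all hypothetical, and the sketch deliberately leaves out the call to the model itself; this is not a prompt I can claim works.

```python
import json

# Hypothetical categories; substitute whatever your own books use.
CATEGORIES = ["office supplies", "computer equipment", "furniture", "other"]


def build_prompt(order_number: str, items: list[str]) -> str:
    """Build one categorization prompt for a single order's item list."""
    lines = [
        f"Order {order_number} contains the items listed below.",
        "Assign each item exactly one category from this list: "
        + ", ".join(CATEGORIES) + ".",
        'Reply with JSON only: a list of objects like '
        '{"item": "...", "category": "..."}.',
        "Items:",
    ]
    lines += [f"- {item}" for item in items]
    return "\n".join(lines)


def parse_reply(reply: str, items: list[str]) -> dict[str, str]:
    """Validate a reply; missing, malformed or off-list answers need review."""
    try:
        rows = json.loads(reply)
    except json.JSONDecodeError:
        rows = []
    result = {item: "needs review" for item in items}
    if not isinstance(rows, list):
        return result
    for row in rows:
        if not isinstance(row, dict):
            continue
        item, category = row.get("item"), row.get("category")
        if isinstance(item, str) and item in result and category in CATEGORIES:
            result[item] = category
    return result


items = ["USB-C cable", "standing desk"]
prompt = build_prompt("ORDER-0001", items)  # made-up order number

# A made-up reply, standing in for whatever the model returns.
sample_reply = (
    '[{"item": "USB-C cable", "category": "computer equipment"},'
    ' {"item": "standing desk", "category": "garden"}]'
)
print(parse_reply(sample_reply, items))
```

The validation step matters more than the prompt wording: a reply that invents a category or skips an item gets flagged instead of silently landing in the ledger.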

If anyone out there has managed to get models this far, and to make even an attempt at categorizing, I’d be interested to find out where my prompts were lacking.

The other issue I’ve run into has been the reconciliation of actual vendor names to invoices and then to credit card/banking statements. It has essentially negated the benefit of handing any of the work to a model, because I have to intervene manually so many times that it is actually more expeditious to do all of the thousands of transactions by hand in the first place. The same vendor often shows up under multiple (i.e., more than three) different apparent names, and the models seem to have no way to reconcile these meaningfully, even if I tell them: ‘on a given day, if you find no match by vendor name, then look at the approximate value of the USD-CAD-converted amount for that day and compare it to all invoices dated on that day, as there is generally only one transaction that will be a match.’

The complication is that CRA requires the expense to be stated in CAD, calculated for the nearest banking day from the posted Bank of Canada USD-CAD exchange rate tables. Considering credit card foreign transaction fees plus different exchange rates, that figure will always deviate from the credit card’s transaction amount by more than 2%. It will also always deviate from the invoice itself by at least the foreign exchange surcharge, and almost certainly by the exchange rate too, since the Bank of Canada is not a retail consumer bank. Yet we are required to calculate expenses according to their exchange rate on the closest banking day. A VLOOKUP works for the ‘nearest banking day,’ but cannot compensate in any meaningful way for the difference in exchange rate and/or credit card surcharge for a given card on a given date. Even if you are American, you will run into this issue with non-US vendors and suppliers, but perhaps 75-80% or more of my relevant transactions are subject to this layer of obfuscation, which none of the models has successfully navigated to date.
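The fallback rule described above (nearest banking day rate, then an approximate amount match among that day’s invoices) can be sketched deterministically, without a model. This is a minimal sketch under stated assumptions: the rates are made-up numbers rather than real Bank of Canada figures, the 5% tolerance is my own guess at how much room the surcharge and rate differences need, and the record layout and function names are hypothetical.

```python
from bisect import bisect_left
from datetime import date


def nearest_banking_day_rate(rates: dict[date, float], day: date) -> float:
    """Return the rate posted for `day`, or for the closest date that has one.

    Assumes `rates` is non-empty; on a tie the earlier date wins.
    """
    days = sorted(rates)
    i = bisect_left(days, day)
    candidates = days[max(i - 1, 0): i + 1]
    closest = min(candidates, key=lambda d: abs((d - day).days))
    return rates[closest]


def match_by_amount(card_cad, card_day, invoices, rates, tolerance=0.05):
    """Return the one invoice dated `card_day` whose converted USD amount is
    within `tolerance` of the card amount, or None if zero or several fit."""
    rate = nearest_banking_day_rate(rates, card_day)
    hits = [
        inv for inv in invoices
        if inv["date"] == card_day
        and abs(inv["usd"] * rate - card_cad) <= tolerance * card_cad
    ]
    return hits[0] if len(hits) == 1 else None


# Made-up rates for a Friday and the following Monday (no weekend postings).
rates = {date(2024, 1, 5): 1.34, date(2024, 1, 8): 1.35}

# Hypothetical invoices, both dated on the Saturday in between.
invoices = [
    {"id": "A", "date": date(2024, 1, 6), "usd": 100.00},
    {"id": "B", "date": date(2024, 1, 6), "usd": 40.00},
]

# Card statement shows 138.10 CAD on the Saturday, surcharge included.
match = match_by_amount(138.10, date(2024, 1, 6), invoices, rates)
print(match["id"] if match else "manual review")
```

Returning `None` when zero or several invoices fit is deliberate: an ambiguous match is better sent to manual review than guessed, and the tolerance would need tuning per card.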

It WOULD be useful if textgen webui had an interface cognizant of, or potentially usable with, MS AutoGen/TaskWeaver or that type of agentic system. I feel that any real-world applications in the LLM direction will increasingly call for agentic involvement, even at the smallest requirement of taking external information into account. In the kinds of complex, dynamic systems we live in, more and more of the relevant information is ad hoc and can’t be anticipated, which will make it imperative to perform research and, based on that research, perform analyses that may then suggest further research before taking any final action or committing to a decision.