Invoice Processing Automation: challenge, options and solution.

Receiving invoices from suppliers by email and processing them manually is a burden (and a bottleneck) in many, if not most, companies.

The OfficeBots platform is designed to make it easy to build and maintain Bots that perform repetitive back-office tasks, with each Bot assigned its own email address to interact with.

As such, OfficeBots are perfect helpers to alleviate this mundane, resource- and time-consuming, yet critical task.

Before diving into what OfficeBots can do, though, let us first take a step back.

Automating the processing of inbound invoices was actually the first robotic process automation (RPA) project I developed in my previous startup, Convolus.

Before deciding to build our own solution, though, I first thought I could buy a cheap and quick off-the-shelf solution to do just that.

Well, it’s not that easy…

Managing the flow of incoming invoices to process can be a challenge

First things first, what’s the challenge?

In an ideal world, processing invoices for payment would be straight-through, ie invoices would be processed without any human intervention (assuming the system ensures no errors).

In reality, several steps are required to get from invoice to payment, each of which can represent a significant amount of time spent on manual, repetitive tasks:

  1. receiving the invoice
  2. extracting the data from the invoice
  3. checking that it is correct, based on the business’ own criteria
  4. getting it approved by the relevant stakeholder(s)
  5. keying the data into the accounting system
  6. paying the invoice

Each step is critical in its own way.

Ideally, in the first two steps, businesses would send each other invoices in a format that can be immediately ingested by the other party’s accounting system.

That actually exists.

Receiving invoices: Electronic Invoicing (or e-invoicing)

Proper electronic invoicing (e-invoicing) uses formats like EDI or XML.

Note: if acronyms and understanding underlying technologies trigger a blank stare in you, feel free to jump to the next section.

EDI: Electronic Data Interchange

EDI is the “computer to computer exchange, between two companies, of standard business documents in electronic format”. And because things would be boring if, for once, there were a single format, there are actually different ones, like ANSI X12 (US) and EDIFACT (EU).

It originated almost 50 years ago!

And since you might not have heard of it, it is fair to say that it never managed to become a mainstream solution.

EDI is widely used in high-volume industries though, such as automotive, retail, manufacturing, healthcare, utility and construction.

A lot of software supports EDI.

EDI was standardised by committees wanting to ensure that ANY type of information shared between two businesses followed exact specifications. Obviously, businesses are quite unique, and the format’s lack of flexibility forces people to try to fit square pegs into round holes.

Here is an example of what it looks like:


With EDI, data is not easily parsed (ie analysed into logical syntactic components) or written. Each piece of data is labelled, but in a rather cryptic way.

The full example can be seen here.
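To give a feel for what "labelled, but cryptic" means in practice, here is a minimal sketch of reading an X12-style string. The sample fragment is made up for illustration and is not a complete interchange. It assumes the common `~` segment terminator and `*` element separator, and it assumes the usual X12 invoice conventions where `BIG` holds the invoice date and number and `TDS` holds the total with two implied decimals.

```python
# Illustrative X12-style fragment (hypothetical, not a complete or valid interchange).
SAMPLE = "BIG*20190115*INV-0042~N1*ST*EXAMPLE AVIATION~TDS*66275~"


def parse_segments(raw, segment_terminator="~", element_separator="*"):
    """Split a raw EDI string into (segment_id, [elements]) tuples."""
    segments = []
    for chunk in raw.split(segment_terminator):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece after the last terminator
        segment_id, *elements = chunk.split(element_separator)
        segments.append((segment_id, elements))
    return segments


segments = parse_segments(SAMPLE)
by_id = dict(segments)

# Assumed meanings: BIG01 = invoice date, BIG02 = invoice number,
# TDS01 = total amount with two implied decimals (66275 -> 662.75).
invoice_date = by_id["BIG"][0]
invoice_number = by_id["BIG"][1]
total_cents = int(by_id["TDS"][0])

print(invoice_number, invoice_date, total_cents / 100)
```

Nothing in the data itself tells you that `TDS` is a total or that the decimals are implied. That knowledge lives in the specification, which is what makes the format hard to read and write by hand.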

EDI is a legacy format, not future-proofed.

UBL: Universal Business Language, an XML-based specification

XML is a widely used, generic standard for storing and sharing data. As such it is extremely flexible, meaning you can customise it to your needs.

Various specifications exist (with some inbreeding between some), designed specifically to help with exchanging business information, like OIO, UBL, cXML, ebXML, etc.

UBL is the XML-based set of specifications commonly accepted as the “standard”.

It is a royalty-free library of standard XML electronic business documents, first released in 2004.

The latest UBL specification was published in July 2018, and the standard continues to be updated regularly (next publication expected in December 2019).

Here is an example:


Data is cleanly structured, labelling every piece of information, so it can be recognised by the recipient’s system.

From contact details:


to amounts:


and bank details:


The full example can be seen here.
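As a minimal sketch of how a recipient's system could read such a file, the snippet below parses a tiny UBL-style invoice with Python's standard library. The sample document is made up and heavily trimmed (a real UBL invoice carries many more mandatory elements), and the supplier name and values are hypothetical. The element names and namespaces follow the UBL 2.x invoice schema.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# UBL 2.x namespaces: invoice root, aggregate components, basic components.
NS = {
    "inv": "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2",
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
}

# Hypothetical, trimmed-down invoice for illustration only.
SAMPLE_UBL = """\
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
         xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
         xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cbc:ID>INV-0042</cbc:ID>
  <cbc:IssueDate>2019-01-15</cbc:IssueDate>
  <cac:AccountingSupplierParty>
    <cac:Party>
      <cac:PartyName><cbc:Name>Example Fuel Ltd</cbc:Name></cac:PartyName>
    </cac:Party>
  </cac:AccountingSupplierParty>
  <cac:LegalMonetaryTotal>
    <cbc:PayableAmount currencyID="USD">662.75</cbc:PayableAmount>
  </cac:LegalMonetaryTotal>
</Invoice>
"""


def read_invoice(xml_text):
    """Pull a few labelled fields out of a UBL invoice."""
    root = ET.fromstring(xml_text)
    payable = root.find("cac:LegalMonetaryTotal/cbc:PayableAmount", NS)
    return {
        "id": root.findtext("cbc:ID", namespaces=NS),
        "issue_date": root.findtext("cbc:IssueDate", namespaces=NS),
        "supplier": root.findtext(
            "cac:AccountingSupplierParty/cac:Party/cac:PartyName/cbc:Name",
            namespaces=NS,
        ),
        "payable": Decimal(payable.text),  # Decimal avoids float rounding on money
        "currency": payable.get("currencyID"),
    }


invoice = read_invoice(SAMPLE_UBL)
print(invoice)
```

Because every value sits under an explicit, named element, there is nothing to guess: the amount is the amount, and the currency travels with it.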

UBL can be integrated into any software. A lot of accounting software already supports it (so you can export an invoice as UBL instead of PDF, for example).

UBL files can be sent as an attachment in an email, or over a secure network.

UBL is the format of choice for many European governments. The PEPPOL (Pan-European Public Procurement OnLine) network enables companies to send their UBL documents (eg invoices) securely.

With e-invoicing, no “one format to rule them all”

With EDI and XML-based solutions, both parties need to have compatible software to enable a seamless exchange of information (invoices in our case here).

The benefits of proper electronic invoicing are clear:

  • 100% accuracy
  • 0% human error
  • Immediacy
  • Faster payment (UBL invoices are paid 16 days faster on average)
  • No/less (wo)manpower required
  • Lower costs

Large enterprises are able to impose a specific format on their suppliers, as a sine qua non condition for working with them.

But that leaves the “long tail” of businesses, ie 90% of companies, in a jungle of solutions to pass invoices to each other.

While UBL seems to be the most future-proof solution for mainstream e-invoicing, I believe that blockchain-based solutions will also emerge to try to solve, once and for all, this issue of standardising the exchange of invoices between businesses.

But as EDI and XML-based solutions have shown over the last decades, bringing structure to the jungle is a mammoth task.

Receiving invoices: PDF, the mainstream way to share invoices

So, in the absence of mainstream e-invoicing, most invoices businesses receive are in the form of PDFs.


You might know that PDF (Portable Document Format) is a file format invented by Adobe (and now an open standard), with the goal of ensuring that a document has the same layout (“looks the same”), no matter what computer or device it is opened on.

So sending an invoice as a PDF file mainly replaces sending it by post (snail mail), while ensuring it “looks good”.

What’s less well known is that a PDF can be one of two things.

  • Computer-generated: meaning the PDF file has been created by software (like an accounting system, or even just Microsoft Word)
  • A scanned document: meaning a scanner was used to “photograph” a document.

We’ll dedicate a future, more detailed post to that topic, but at a high level, the same-looking invoice..

Invoice Sample

.. can either be a PDF file with the data embedded as text, which a computer can read with 100% accuracy (if computer-generated), like this:

Conceptual representation of a text layer embedded in a computer-generated PDF

or simply an image contained in the PDF, with nothing for a computer to “read”:

Conceptual representation of a scanned PDF with just an image included

.. in which case OCR (Optical Character Recognition) can be applied, whereby dedicated software will try to decipher the pixels in the image and extract letters and numbers:

Conceptual representation of a scanned PDF with OCR applied

This means, though, that 100% accuracy cannot be guaranteed.

So what should be $662.75 can be “recognised” by the software as $662750!

See the dot that merges with the 2? Not easy for a “dumb” computer to see it as a separator, rather than just the continuation of the number 2.

Depending on the technology used by the software, results are improving though, thanks to advances in OCR (quality of recognition) combined with Artificial Intelligence (eg if line items are in units and amounts in the hundreds, then the total should be in the hundreds too, not thousands).

So we’re inching closer to 100% accuracy with OCR, but it cannot be guaranteed.

Error handling becomes a critical component of an automated invoice process where OCR is involved.
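One simple form of error handling is a plausibility check on what the OCR returned. The sketch below is only an illustration, not a description of any particular product: it assumes the invoice total should equal the plain sum of its line amounts (no tax, discounts or rounding rules), and flags the invoice for manual review when it does not.

```python
from decimal import Decimal


def check_total(line_amounts, ocr_total, tolerance=Decimal("0.01")):
    """Compare the OCR'd total with the sum of the OCR'd line amounts.

    Returns (is_plausible, expected_total).
    """
    expected = sum(line_amounts, Decimal("0"))
    return abs(expected - ocr_total) <= tolerance, expected


# Hypothetical line items adding up to the $662.75 from the example above.
lines = [Decimal("400.00"), Decimal("262.75")]

# The decimal point was missed by the OCR: flagged for manual review.
misread_ok, expected = check_total(lines, Decimal("662750"))

# The total was read correctly: passes.
correct_ok, _ = check_total(lines, Decimal("662.75"))

print(misread_ok, correct_ok, expected)
```

A check like this does not make the OCR more accurate. It only makes a misread amount much less likely to slip through unnoticed.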

Invoice processing software can be complex, inadequate and expensive

So assuming you do not work with e-invoicing (EDI or UBL) but rather receive PDF invoices from your suppliers, how can you streamline the process of extracting the data from these documents, checking it, and inputting it into your system(s)?

There are plenty of tools for invoice processing that focus on outbound invoices (for you to generate invoices) and plenty of solutions to help with ingesting your inbound invoices.

To name but a few: Xerox, Basware, Kofax, Coupa, ProcessFlows, Corcentric, Medius, Redro and Conciliator.

There are also industry-specific solutions, like Fourth in hospitality, where automated invoice processing is part of a wider platform that tackles other automations relevant to businesses in that industry.

You will have to engage them directly to get a better understanding of what they offer.

Some are US-based, which can mean that data processing happens exclusively on US servers. If data privacy is a hot topic for you as an EU-based company, this can be a dealbreaker.

Pricing-wise, you will usually see a one-time setup fee (ca. £15-20k, in my experience) plus a monthly fee that depends on invoice types and volumes (ca. £1k-5k/month).
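As rough arithmetic on those indicative figures, the sketch below adds the one-time setup fee to twelve monthly fees. The 12-month horizon is my assumption for illustration, and the inputs are simply the low and high ends of the ranges quoted above.

```python
def first_year_cost(setup_fee, monthly_fee, months=12):
    """One-time setup fee plus the monthly fee over the given number of months."""
    return setup_fee + monthly_fee * months


# Low and high ends of the indicative ranges (in GBP).
low = first_year_cost(setup_fee=15_000, monthly_fee=1_000)
high = first_year_cost(setup_fee=20_000, monthly_fee=5_000)

print(f"First-year cost: £{low:,} to £{high:,}")
```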

And of course, general purpose RPA (Robotic Process Automation) tools, like UiPath, Automation Anywhere or Blue Prism, are also great solutions for automating invoice processing… if you have the volume and means though.

With an average cost of acquisition of £50k (the Total Cost of Ownership being much higher) and the need for dedicated internal resources to set up and operate an RPA solution, they remain the remit of large enterprises. I have put together an overview of the RPA landscape if you want to know more.

The challenge for all those software vendors is to build a toolbox that can fit the needs of anyone. While a lot of aspects (eg extracting data from files) can be similar for all, businesses can have unique needs and challenges.

This means paying for a full toolbox where you use only certain tools within it, while trying to fit square pegs in round holes, or building custom add-ons atop, to address your unique needs.

Thus those tools can be an inadequate solution, based on price, server location, capabilities, or requirement to dedicate valuable in-house resources to make the system work for you.

One alternative, and sometimes the best way, is to use a custom invoice processing solution that will perfectly fit your business needs.

Here is an example of automated invoice processing that we have implemented, and that can be achieved using a custom OfficeBot:

The Problem we needed to solve

The company operates in the aviation industry.

It receives hundreds of PDF invoices per month, for costs relating to the operation of an aircraft (eg airport services / FBO, fuel, etc.). The invoice processing checklist goes something like this:

  • copying and pasting the data from computer-generated PDF files, or manually typing in the scanned outliers
  • checking if the aircraft was indeed at that airport on that date
  • checking if the price charged is correct
  • engaging the supplier if an error is flagged during those checks
  • copying and pasting the data into the company’s database
  • importing the data into the accounting system

Checks are critical as mistakes from suppliers do happen. Paying for a service which was not rendered, or at the wrong price (weirdly enough most often higher than expected 🤔), directly impacts the bottom line, while assigning a cost to the wrong aircraft or date will generate significant headaches and costs (possibly lost revenue) down the line.

If an issue was encountered, it needed to be raised with the supplier, then monitored and followed up.

Finally, the data needed to be uploaded to a database (PostgreSQL), itself linked to an accounting system.

The task required several (wo)man-days of manual work per month and was hard to scale as new aircraft were onboarded.

All the solutions explored, including most of the ones cited above in this article, were inadequate, for at least one of these reasons:

  • too expensive
  • not able to handle the custom checks required (eg checking aircraft location, pricing from a database using a complex lookup formula)
  • not able to populate the data in a proprietary database
  • data processed in the US

Note that the problem included an invoice approval process based solely on data (as long as pricing was correct and the aircraft was at the airport, the invoice could be passed on to be imported into the company’s accounting system), rather than a requirement for someone in leadership to manually approve each invoice. Your own invoice approval checklist might be different.

The Solution: a custom invoice management system

We built a solution that answered all of the custom requirements, automatically:

  • extracting data from the PDF
  • checking with an external database via API if the aircraft was indeed at that airport on that date
  • checking with an internal database if the price is correct
  • creating an internal ticket and notifying the team if the invoice needs to be looked into manually and disputed with the provider
  • updating the company’s database
  • providing data for ingestion by the accounting system, with keying of each line item
  • renaming the original PDF (for consistency) and saving it for archive
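The two data-driven checks in that list can be sketched as follows. This is a simplified illustration rather than our production code: the in-memory `KNOWN_POSITIONS` and `AGREED_PRICES` stand in for the external API and the internal pricing database, and every name, registration and value here is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class InvoiceLine:
    aircraft: str
    airport: str
    service_date: date
    service: str
    amount: Decimal


# Stand-in for the external API: where each aircraft was on a given date.
KNOWN_POSITIONS = {("G-ABCD", "EGLF", date(2019, 1, 15))}

# Stand-in for the internal database: agreed price per airport and service.
AGREED_PRICES = {("EGLF", "handling"): Decimal("662.75")}


def check_line(line):
    """Return a list of problems found for one invoice line (empty if none)."""
    errors = []
    if (line.aircraft, line.airport, line.service_date) not in KNOWN_POSITIONS:
        errors.append("aircraft not at airport on that date")
    agreed = AGREED_PRICES.get((line.airport, line.service))
    if agreed is None:
        errors.append("no agreed price on file")
    elif line.amount != agreed:
        errors.append(f"price mismatch: charged {line.amount}, agreed {agreed}")
    return errors


good = InvoiceLine("G-ABCD", "EGLF", date(2019, 1, 15), "handling", Decimal("662.75"))
overcharged = InvoiceLine("G-ABCD", "EGLF", date(2019, 1, 15), "handling", Decimal("700.00"))
wrong_day = InvoiceLine("G-ABCD", "EGLF", date(2019, 1, 16), "handling", Decimal("662.75"))

print(check_line(good))
print(check_line(overcharged))
print(check_line(wrong_day))
```

Returning a list of problems, rather than a simple pass/fail, means the team gets the reason for the dispute along with the flag.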

Suppliers were given the bot’s email address to send their invoices to, moving forward, along the lines of:

Depending on the outcome of the checks:

if error: data was loaded to database with a flag, our team informed so that they could engage the vendor for dispute, and a Kanban-tool (Trello) updated for overview and follow-up.

if no error: data loaded to database, with notification triggered to Slack, and the data exported to the accounting system.
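The branching above can be sketched as a small routing function. It is only a sketch: the action names are hypothetical labels, and it returns them instead of calling the database, Slack or Trello, so no real API is assumed.

```python
def route_invoice(errors):
    """Return the ordered list of follow-up actions for a checked invoice.

    `errors` is the list of problems found by the checks (empty if none).
    The action names are illustrative labels, not real API calls.
    """
    if errors:
        return [
            "load_to_database_with_flag",
            "inform_team_for_dispute",
            "update_trello_card",
        ]
    return [
        "load_to_database",
        "notify_slack",
        "export_to_accounting",
    ]


print(route_invoice([]))
print(route_invoice(["price mismatch"]))
```

Keeping the decision separate from the side effects makes the workflow easy to test, and easy to extend with further branches later.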

The overall workflow looked something like this:

Invoice Processing Flowchart

Further automation, like the ability to identify whether an invoice could be a candidate for VAT reclaim, is possible.

Outsource or Automate? Choose both.

When trying to streamline the resource-intensive task of invoice processing, you can scout the market for an available “off-the-shelf” solution, outsource the task, or decide to have something custom-built for your needs.

The steps for processing invoices can vary from one company to the other.

If your needs are specific, and don’t fit in the fixed frame of existing solutions, a custom invoice processing platform can be the solution to free up your team from “busy work” and get everyone focused on higher value tasks.

Building custom bots around our clients’ requirements is our focus at OfficeBots. This enables us to match exactly the steps in accounts payable processing particular to your business.

Outsource the setup and maintenance of your bots – work with bots.

This approach tackles multiple challenges of delegating simple, non-value adding tasks – let expert people complete the automation learning curve and allow bots to follow through with the legwork.

  1. No need to learn software
  2. No disruption of your existing processes
  3. You get your own outsourced technical team

So you can have a solution that fits your own invoice processing procedures, without the headache and resources required to scout, try, buy and implement an off-the-shelf solution, to realise it does not fit your (sometimes evolving) needs.

If you are interested in exploring further, you can contact us here.