As our (business) world revolves now mainly around data, stored in various systems and files, that is being passed around (internally and externally), a lot of repetitive manual office tasks include copying, pasting, verifying, cleaning and preparing data.

For example:

  • transferring orders from one system to the other
  • receiving price updates from suppliers in Excel files, which need to be imported in your own system.
  • receiving invoices in PDF, which need to be processed into the accounting system
  • and many more

This not only makes for boring, repetitive office work, but also opens the door to errors (which can end up being costly), and creates bottlenecks in your processes, hurting the company’s growth and/or client experience.

If some of your tasks are making you feel like a robot, read further to understand how automating this task could save you – and your business – time (which can be spent on more valuable tasks), reduce errors, and increase efficiency of your processes.

What does ETL stand for?

ETL stands for “Extract, Transform and Load”. It is not new – just a fancy name for what has been done since the 1970s.

At high level, the goal of an ETL process is to find ways to take data from one system (source) and put it into another (destination), somehow transforming it along the way, so that the new system is able to make sense of it and work with it.

As the name suggests, ETL follows 3 steps:

1. Extract

The first step includes extracting data from the original system, which can be an external database, a CRM, ERP, marketing platform, etc..

Extraction can also be done from files. Excel and CSV files are the ideal candidates, as they hold the data in a structured format (with rows and columns).

It can also be done from “unstructured” files, like PDFs (think invoices), as long as the PDF has some structure to it (eg tables where the data is always presented in the same way). A big caveat though with PDF files, is that extraction can only be guaranteed accurate if the PDF has been generated by a computer (not scanned). We will dedicate a later post to this topic.

2. Transform

The next step includes transforming the data into a unique format, to match the structure and format of the new system in which you want to put it. This data transformation can be diverse, including but not limited to the following actions:

  • Removing unnecessary columns
  • Appending new ones (for example, with the name of the person that uploaded the data)
  • Cleaning the data (for example, removing records that do not fit certain criteria)
  • Formatting the data (for example, putting all dates in the proper format or ensuring that all fields are of the type expected by the target system)

3. Load

Step number three includes loading the data in the target (destination) system. In a manual process, this is done by importing a properly formatted file (CSV or Excel) within the system/software. In an automated way, this can be done by having the ETL tool connect directly to the system to push (or post) the data to it.

“Telling” the machines to communicate with each other and do the task themselves is by far the most effective way to transfer data between 2 systems.

This is what is meant by “integration”.

Integrating Systems with APIs

With the rise of APIs, integrating systems has become easier over the years.

We’ll discuss APIs in more detail in another blog post. But, to keep it simple, here is a short API explanation:

APIs are “doors” to a system – by system, we mean a computer, with data in it – which can be accessed by another system (computer).

Those doors act as:

  • Translators. The API’s documentation tells the programmer how their computer needs to be “talked” to, and as:
  • Gatekeepers. They give access only to authorised computers.

Therefore, if a system has an API, it has become relatively straightforward to integrate it into another or to merge two or more systems together, and thus to automate the ETL process.

However, we need to keep in mind the following points:

  1. Not every system has an API (a “door”). Legacy (read: “old” or conventional) systems might not have one yet, and some probably never will.
  2. Integrating systems, even with an API available, requires work to set up and to maintain. You or your team might be left to do mind-numbing daily/weekly manual ETL work because doing a full-blown system integration just to alleviate you from this one task (albeit boring and repetitive) is not a priority or financially relevant.

Popular ETL Tools & Integration Platforms

There is an explosive need to transfer data between multiple systems. Developer of ETL tools have been very active in the last decade, to create solutions that would address those needs.

The ETL market now provides a number of platforms that have done the work of integrating with as many systems (apps) as possible, to facilitate the automation for businesses.

Well-known examples include:

Zapier: Basic ETL for SMEs

Zapier is a leader in self-service workflow automation. It is an effective platform for anyone looking to integrate web applications without the need to code anything. Zapier has been around since 2012, is based in San Francisco and boasts integrations with 1,500+ apps.

The ETL platform provides free tiers for a basic workflow that runs every 15 minutes. Otherwise you’re looking at $20-$300+/month, depending on your needs.

Zapier is limited to passing data “as is” between systems. Consequently, you’ll have to look elsewhere if your ETL process is more complex (for example, if it requires data transformation).

https://zapier.com

Enterprise ETL Solutions

If you are on the search for an enterprise solution, with needs spanning beyond what is offered with Zapier’s more “consumer-oriented” solution, specific integration solutions exist. To name a couple:

Mulesoft

Founded in 2006 in San Francisco, Mulesoft is focusing on the needs of enterprise clients. The company was floated on the NYSE stock market in 2017, but it has been delisted since, in 2018.

MuleSoft provides a widely used integration platform for connecting SaaS and enterprise applications in the cloud and on-premises.

As it is often the case with enterprise solutions, the company doesn’t publish pricing, If you want to know how much will this ETL tools cost you, you will need to ask for a quote specifically for your business.

https://www.mulesoft.com

Tray.io

Tray.io was founded in 2012, also in San Francisco (obviously, no one got the “Hey, someone else is doing that already!” memo). The automation company adopted a similar approach to Zapier. but, in contrast, it was focused on enterprise clients.

This ETL tool (or platform) starts at $600/month for 2 workflows. It is more powerful than its rival as it enables connecting to any API-based system and transforming data, as well as a number of other ETL processes that can automate tasks and simplify boring office work for you.

https://tray.io

Elastic.io

Founded in 2013 in Germany (had to showcase some European success here!), Elastic.io provides an integration platform that helps connect multiple company systems together. It offers a cloud and an on-premise approach (a handy solution if you are a business that needs to protect data by keeping it in your network).

Elastic.io packages start at 2,388 EUR/year for system integrators and 17,988 EUR/year for enterprises.

https://www.elastic.io

Many more ETL tools exist in the market, some of them extremely specialised (for example between particular types of data warehouses).

OfficeBots: The Alternative ETL Solution

Don’t want to get your hands dirty diving into integrations yourself?

Or perhaps your systems are not included in the list of integrations supported by the platforms available on the market?

Or you are extracting data from a file – like Excel, CSV or PDF?

This is where OfficeBots can help.

We can set up a Bot for you that:

  • Works off the Excel, CSV or PDF file (generated by the source system or received from a 3rd party like client or supplier)
  • Extracts the data
  • Transforms it by formatting or cleaning
  • Loads it at your chosen location and/or
  • Provides you a new file, ready to be imported.

As OfficeBots get their own email address assigned, once they’re set up, you only need to send your file to its email for the ETL process to happen automatically.

Alternatively, we can program the Bot to fetch the data on its own, either via API (if available) or via a web-based system (whereby the Bot acts as if it was a person accessing the system and clicking where it needs to get the data).

Some legacy systems enable setting up a scheduled email with the data attached (Excel or CSV file). Simply set it up to send the email to your OfficeBot, and you have a fully automated ETL process.

Feel free to contact us for any ETL-related question. You can book a free audit to discuss your automation needs. We are happy to guide you in the right direction if OfficeBots are not the right solution for you.