Systems Don’t Create Data Silos, People Do!


Silos are the trolls of analytic projects. Although we hope to see information flowing and connecting seamlessly, to give us that 360° holistic view of our business, we find that the landscape often ends up with data that simply doesn’t make sense.

Our first reaction is to shrug it off and blame silos, as many generations did before us. And why not? Data is hosted in so many different systems, under so many different technologies, using so many different norms. How could it possibly come together easily?

While our assumption is technically sound, the reasoning is not, and this sort of thinking is becoming less and less valid.

Let’s explain why in 4 points!

  1. Data is data. As long as it is stored in a database, it can be retrieved with little to no friction, provided you have access rights and know the query language (most of the time SQL).
  2. The majority of the sources we complain about are transactional systems that run processes for different lines of business (production, logistics, marketing, sales, etc.). All of these systems store the data they process in databases, which can be accessed directly (ETL, API, etc.) or via flat file export.
  3. If data is extracted to a central place, no system hurdle should prevent this data from being connected. In database language this is done with “joins”, and it is basic knowledge that every business analyst should master. Even if data remains in place, in the ERP, connecting the dots remains possible by stretching the analytics tool.
  4. With today’s data integration technologies, extracting data, preparing it, and loading it to the analytics platform have never been easier.
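
To make point 3 concrete, here is a minimal sketch of a join between two sources once they sit in a central place. It uses Python's built-in sqlite3 module as a stand-in for that central database. The table names (`erp_products`, `sales_orders`), the columns, and the sample rows are all hypothetical and only serve the illustration; a real landscape would have its own schema.

```python
import sqlite3

# In-memory database standing in for the "central place" (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical extracts: a product list from an ERP and orders from a sales system.
conn.executescript("""
    CREATE TABLE erp_products (product_id TEXT PRIMARY KEY, product_name TEXT);
    CREATE TABLE sales_orders (order_id INTEGER, product_id TEXT, amount REAL);
    INSERT INTO erp_products VALUES ('P-001', 'Widget'), ('P-002', 'Gadget');
    INSERT INTO sales_orders VALUES (1, 'P-001', 100.0),
                                    (2, 'P-001', 50.0),
                                    (3, 'P-002', 75.0);
""")


def revenue_by_product(connection):
    """Join orders to products on the shared key and total the revenue."""
    query = """
        SELECT p.product_name, SUM(o.amount) AS revenue
        FROM sales_orders AS o
        JOIN erp_products AS p ON p.product_id = o.product_id
        GROUP BY p.product_name
        ORDER BY p.product_name
    """
    return connection.execute(query).fetchall()


print(revenue_by_product(conn))
# [('Gadget', 75.0), ('Widget', 150.0)]
```

The join only works because both tables carry the same `product_id` values. That shared key is a human decision, not a system feature.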

So what’s the root cause of the “silo-curse” if technology should have no problems handling this complexity?

The origin of the problem lies with us. We create problems by neglecting to treat data processing as a supply chain in which every piece of information has an origin and a destination. If we don’t take upstream and downstream usage into consideration, we end up setting norms, rules, and formats that don’t line up with the other data ours will need to be connected to.

Silos are created by all of us in many different ways:

  1. We don’t follow company standards when we code products, entities, etc. When we want to connect the dots, these identifiers can’t be matched between sources. Discipline and proper Master Data Management (MDM) solve this problem.
  2. We omit to insert keys in the data we input, keys that would allow easy matching in subsequent processes. Planning, and a vision of what the analytics questions will be, can tell us most of what we need to know about which keys to add to data entries for later joins.
  3. We get discouraged by the different granularity levels of our data sources and don’t dare to bring them to a common level. A simple data integration process would solve that problem and deliver evenly aggregated data.
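
Points 1 and 3 can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the company standard for product codes (a letter prefix, a dash, and three digits, such as `P-001`) is invented, as are the function names and the sample rows. It first maps inconsistently typed codes to that standard, then rolls daily figures up to a monthly level so they can be compared with a monthly source.

```python
from collections import defaultdict
from datetime import date


def normalize_product_code(raw):
    """Map variants such as ' p_001 ' or 'P-1' to the assumed standard 'P-001'."""
    # Keep only letters and digits, in upper case.
    compact = "".join(ch for ch in raw.upper() if ch.isalnum())
    # Split into a letter prefix and a trailing run of digits.
    prefix = compact.rstrip("0123456789")
    digits = compact[len(prefix):]
    if not prefix or not digits:
        raise ValueError(f"unrecognized product code: {raw!r}")
    return f"{prefix}-{int(digits):03d}"


def monthly_totals(daily_rows):
    """Aggregate (day, product code, amount) rows to (month, standard code) totals."""
    totals = defaultdict(float)
    for day, code, amount in daily_rows:
        key = (day.strftime("%Y-%m"), normalize_product_code(code))
        totals[key] += amount
    return dict(totals)


# Hypothetical daily rows with three spellings of the same product.
daily_sales = [
    (date(2024, 1, 3), " p_001 ", 10.0),
    (date(2024, 1, 20), "P-1", 5.0),
    (date(2024, 2, 1), "P001", 7.5),
]

print(monthly_totals(daily_sales))
# {('2024-01', 'P-001'): 15.0, ('2024-02', 'P-001'): 7.5}
```

Cleaning codes after the fact like this is a patch. Agreeing on the standard before the data is entered removes the need for it.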

For all these reasons, we need to stop blaming data silos for all our difficulties in building broad analytics views. If we expose our teams to the holistic view of a data process, if we commit ourselves to being good data players by caring about our upstream and downstream colleagues, and if we dare to learn about the simple solutions that can facilitate the “dot-connection”, silos will be a thing of the past.