Expanding on a random tweet of mine…
Ever since we gained the ability to run multiple applications/systems simultaneously, either on one machine or across machines, we’ve had the need to share data between applications. The general ledger system needs payroll data. The order processing system needs inventory data. The customer web site needs pricing data. And so it goes.
Over the decades various schemes and theories have been developed for solving this problem. Some of them were good long ago when no better solution was possible. Some of them are still good in limited scenarios. Some are the current state of the art. And as time goes on, I have no doubt someone will devise some method we haven’t considered or dreamed of yet.
One of the earlier techniques used was to have multiple applications use the same database. This allowed them to all see the same data, which made it possible for the general ledger system to get at the payroll data, and so forth. It saved costs, too, in an era when a server capable of hosting a database was expensive and difficult to maintain and administer, and when only a few dozen people in a couple of accounting-focused (or similar) departments needed to access the data. In short, it was a really good idea from a technical and business perspective.
But then machines got faster and cheaper. More and more people needed to have direct access to data. And we created more and more applications for different groups. Some of these groups had different ideas about what the different “entities” in the business domain meant. To some people, an order is a request to pull some item from inventory. To others, it’s a request to generate a bill. To still others, it’s historical information that can be used to determine future demand for marketing plans.
And then we figured out that if we needed the accounting folks to invoice a customer for an order, we could just put an InvoiceCustomer flag on the order table and set it to some value representing true. So we set it to “T” and hoped that they would then change it to something else when they generated the invoice. We never documented that we expected it to be “F”; we just assumed they would know.
Eventually, someone decided that we had added a lot of flags. Maybe we should find a way to make it easier, so we didn’t have to bother the DBAs with a new state every few weeks. So we created a new field. We called it “OrderState”. Now an order that needed to be invoiced would have a value of “NI” (for needs to be invoiced) in the “OrderState” field. But we couldn’t get rid of the InvoiceCustomer flag, because we weren’t sure whether any other application was using it.
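To make the drift concrete, here is a minimal sketch of how the two fields can end up disagreeing. It assumes a throwaway in-memory SQLite table; the table layout and the function names (place_order, legacy_invoice_run, orders_needing_invoice_by_state) are made up for illustration, and only the InvoiceCustomer and OrderState names come from the story above.

```python
import sqlite3

# Hypothetical shared schema: both columns try to say "this order needs an invoice".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, InvoiceCustomer TEXT, OrderState TEXT)"
)


def place_order(order_id):
    # Order processing sets both signals, because it cannot know who reads which.
    conn.execute("INSERT INTO orders VALUES (?, 'T', 'NI')", (order_id,))


def legacy_invoice_run():
    # An older invoicing job (assumed) that only knows about the original flag.
    conn.execute(
        "UPDATE orders SET InvoiceCustomer = 'F' WHERE InvoiceCustomer = 'T'"
    )


def orders_needing_invoice_by_state():
    # A newer report (assumed) that only knows about OrderState.
    rows = conn.execute(
        "SELECT id FROM orders WHERE OrderState = 'NI' ORDER BY id"
    )
    return [row[0] for row in rows]


place_order(1)
legacy_invoice_run()

# The old job believes order 1 has been invoiced; the new report still lists it.
stale = orders_needing_invoice_by_state()
flag = conn.execute(
    "SELECT InvoiceCustomer FROM orders WHERE id = 1"
).fetchone()[0]
```

After this runs, the flag says the order has been invoiced while the state says it has not, and nothing in the database can tell you which application is right.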
Before long, there is a massive trail of poorly documented and specified fields. Nobody has a clear picture of why (or even if) they are required. The usage is spread across a bunch of applications and reports, perhaps even some applications that aren’t being developed anymore. You don’t dare touch any of the items on this trail. They are like the breadcrumbs from an old folk tale: for various reasons they disappear or degrade over time until they are no longer able to show you the way home.
No matter what the data elements are, no matter how tightly you control the database, no matter how well you document it, some of these breadcrumbs will materialize. The best way to combat this problem is to have a separate data store for every application/service. That way, you can know for sure how the application makes use of the data, and be sure that no one else is using the data differently. Don’t even allow another application to have read access. Require other systems to interface with the data via your application/service using messaging, pub/sub, or web services.
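As a rough sketch of what that separation might look like, the example below has an order service that keeps its data private and announces changes over a toy in-process pub/sub bus, with an invoicing service that keeps its own copy of only what it needs. Every name here (EventBus, OrderService, InvoicingService, the "order.placed" topic, the message fields) is an assumption for illustration, and a real system would put an actual message broker or web service in place of the in-memory bus.

```python
from collections import defaultdict


class EventBus:
    """Toy in-process stand-in for a real message broker (assumption)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers[topic]:
            handler(message)


class OrderService:
    """Owns the order data; nothing else reads its store directly."""

    def __init__(self, bus):
        self._bus = bus
        self._orders = {}  # private data store, stands in for a database

    def place_order(self, order_id, customer, total):
        self._orders[order_id] = {"customer": customer, "total": total}
        # The published message is the documented contract, not the table layout.
        self._bus.publish(
            "order.placed",
            {"order_id": order_id, "customer": customer, "total": total},
        )


class InvoicingService:
    """Keeps its own store, built only from messages it receives."""

    def __init__(self, bus):
        self.invoices = {}
        bus.subscribe("order.placed", self._on_order_placed)

    def _on_order_placed(self, message):
        self.invoices[message["order_id"]] = {
            "customer": message["customer"],
            "amount_due": message["total"],
        }


bus = EventBus()
invoicing = InvoicingService(bus)
orders = OrderService(bus)
orders.place_order(42, "ACME", 199.0)
```

The point of the design is that invoicing never sees an InvoiceCustomer flag or an OrderState column at all. If the order service changes how it stores orders, only the message it publishes has to stay stable.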