I’ve been thinking a lot recently about remnants in language — speech patterns and phrases we still use long after we have forgotten their original purpose. (You guessed it: my studies on the formation and change of human language are still going strong.)
We repeat the same phrases without any real understanding of their original meaning. You might describe a cruel individual as "ruthless", even though the word "ruth" has fallen out of common parlance. Somehow these things live on despite their original use having been lost. Like junk DNA … or the appendix.
This made me think about historical remnants that live on today within data management.
Remnants in data management? How did that happen?
Remember those heady days when physical tape was the de facto standard medium for data transfer? What about when we had to use obtuse, condensed binary formats to make our data fit onto any available tape or file system.
The data was just too large to do anything else with it. All we ever did with that data was load it into software bought from a service company.
How did companies let this happen? Why was it okay to give away access to the data, to let it be controlled by service companies? If you see value in this data (and you certainly pay enough to acquire it), why not retain the ability to read it?
Parsing the problem
For the last few days I have been writing a parser for DLIS files — log files created by sensors used to monitor petroleum wells. Yes, seriously.
In the oil industry, we have a standard that we use for some really important — and really expensive — log data from wells. That standard is not very easy to read.
The DLIS format — stuck in 1991 as it is — is the digital embodiment of a hangover from a less technologically capable time. Technically, it is an open industry standard, officially owned by Energistics. The only documentation that seems to exist is the original RPC66 V1 documentation, which is far from easy to understand. As far as I can see, an open-source parser for this format is not readily available. In essence, you have to pay someone in order to read these files.