So what is a data structure scientist, I hear you ask? We have all heard of the “sexy” data scientist role, but perhaps never of the data structure scientist. You might imagine it to be a role demanding significant technical know-how, but nothing could be further from the truth. Let me share.
A data structure scientist is a business person who unlocks business value by having conversations with their business data in order to draw pictures about that data.
No, that’s not a typo: I do in fact say ‘having conversations with their data’! Being a data structure scientist is as much about being an artist as it is about being a scientist.
Some things are best explained with an example, so let me ‘show’ you what I mean. Imagine you are the HR manager of a business and you are interested in your employees and their roles. To get this picture, you query the data on the HR system. You know you currently have 100 employees, but your query returns 135. Something is wrong! You assume you must have asked the wrong ‘how many’ question of the data, so you make a quick call to your techie colleague and ask them to check it out. To your surprise, you get confirmation that “135 employees” is in fact the correct number. BUT your IT colleague clarifies, with perhaps a sense of “I thought you would already know this”, that while the business has only 100 employees, the HR system has 135 employee data records because some employees perform multiple roles in the business!
So you ask yourself, why is the HR data telling us we have 135 employees and not 100? Enter your need to be a data structure scientist.
In short, if some of the 100 employees perform multiple roles (as confirmed by your IT colleague) and you have 135 data records, it is most likely that you have redundancy in your data caused by an inappropriate data structure. Figure 1 below shows the most likely structure. The assumption built into this data structure is that an employee performs only one role. However, the business needs employees to perform multiple roles, so to work around this inappropriate assumption another employee data record is created, for an already existing employee, to accommodate each additional role. If that employee then takes on a further role, a third employee data record is created for the same person.
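To make the redundancy concrete, here is a minimal sketch of the kind of structure described for Figure 1, using Python's built-in sqlite3 module. The table name, column names, people and roles are all made up for illustration, and 3 employees with 5 records stand in for the 100 and 135 of the example. It also assumes that names are unique, which a real HR system could not rely on.

```python
import sqlite3

# Hypothetical flat structure: one record holds both the person and the role,
# so the design quietly assumes one role per employee.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "record_id INTEGER PRIMARY KEY, name TEXT NOT NULL, role TEXT NOT NULL)"
)

# Illustrative data only: a second role forces a second record for the same person.
records = [
    ("Aoife", "HR Manager"),
    ("Aoife", "Fire Warden"),
    ("Brian", "Analyst"),
    ("Ciara", "Developer"),
    ("Ciara", "Team Lead"),
]
conn.executemany("INSERT INTO employee (name, role) VALUES (?, ?)", records)

# The naive 'how many employees' question counts records, not people.
record_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

# Counting distinct names recovers the headcount, but only because the
# sample names happen to be unique.
people_count = conn.execute(
    "SELECT COUNT(DISTINCT name) FROM employee"
).fetchone()[0]

print(f"employee records: {record_count}, actual people: {people_count}")
```

The gap between the two counts is the redundancy: every extra role repeats everything the system already holds about that person.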
So how do you overcome this data structure problem and remove the undesirable data redundancy? You need a more dynamic data structure, one that fits the business model and the logic that an employee can perform one or many roles. The answer is captured in Figure 2, which introduces an associative entity (Employee Role). This associative entity provides the appropriate structure, matching each employee with their role or roles over time.
The business value of this data structure (Figure 2) is very simple. When you now ask the ‘how many employees’ question, you will get 100 employees, while 135 Employee Role data records will also exist to cater for employees performing one or many roles.
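The structure described for Figure 2 can be sketched the same way, again with Python's built-in sqlite3 module. This is an illustration under assumptions rather than the design of any real HR system: the table and column names are hypothetical, the data is invented, and 3 employees with 5 role assignments stand in for the 100 and 135 of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: each employee and each role is stored once, and the
# associative entity (employee_role) records who performs which role.
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE role (
    role_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
CREATE TABLE employee_role (
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    role_id     INTEGER NOT NULL REFERENCES role(role_id),
    PRIMARY KEY (employee_id, role_id)
);
""")

# Illustrative data only.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1, "Aoife"), (2, "Brian"), (3, "Ciara")],
)
conn.executemany(
    "INSERT INTO role VALUES (?, ?)",
    [(1, "HR Manager"), (2, "Fire Warden"), (3, "Analyst"),
     (4, "Developer"), (5, "Team Lead")],
)
conn.executemany(
    "INSERT INTO employee_role VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 3), (3, 4), (3, 5)],
)

# 'How many employees?' now counts people.
employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

# 'How many role assignments?' is a separate question with its own answer.
assignment_count = conn.execute(
    "SELECT COUNT(*) FROM employee_role"
).fetchone()[0]

# Roles per employee, read through the associative entity.
roles_per_employee = dict(
    conn.execute(
        "SELECT e.name, COUNT(*) FROM employee e "
        "JOIN employee_role er ON er.employee_id = e.employee_id "
        "GROUP BY e.name ORDER BY e.name"
    ).fetchall()
)

print(employee_count, assignment_count, roles_per_employee)
```

To capture roles ‘over time’, the employee_role table could also carry start and end dates; that is left out here to keep the sketch short.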
So the next time you spot an anomaly in your business data (for example, the wrong answer to a ‘how many’ question), have a conversation with your data in order to draw a picture of the data structure. This behaviour will present a challenge to the IT side of the house and start you on a journey of treating your business data as an asset!
Dr. David Sammon and Dr. Tadhg Nagle are programme directors on the IMI/UCC MSc in Data Business. Dave is a Senior Lecturer in Business Information Systems at University College Cork and Tadhg is a Lecturer in Business Information Systems at University College Cork.