Observing the progression of COVID-19 across the globe has been fascinating for multiple reasons. One of them has been how it has helped reinforce my thoughts on how we can represent vast sets of data!
COVID-19 – The “Corona Virus”
So, by now, unless you’ve literally been hiding out somewhere, with no access to internet, TV, radio, or your fellow humans, you’re likely already feeling the impact of COVID-19, also known as the Corona Virus.
What I’ve certainly witnessed has been a fascinating evolution of where in the world the emerging confirmed infections are. There are multiple layers to this data:
- Total confirmed infected
- Total confirmed recovered
- Number of deaths
- Location – i.e. “where” the individuals are that make up each of the above pieces of information
For each of the above sets of data, there are other demographics that I observe people being interested in:
- Health (e.g. are there any pre-existing conditions?)
- Recent travel habits
- and so much more
Devil is in the detail – or is it?
The devil is often in the detail – an inquisitive mind often wants to get to the detail to truly understand the message of what is being represented. There are lots of factors that I observe my friends, colleagues, and family members trying to understand, such as those listed above.
At the heart of the data is a complex set of attributes (such as those indicated above) – depending on the role of the person who is inquiring, they may need more specific information in order to get the correct answer to their question.
But at a macro-level, there really is only a small set of data that many people are interested in. And with this information, most of us in fact have our thirst for information quenched.
Data Residency and Privacy
One of the problems, when you get into the detail – right to the heart of the data sources – is that it is filled with Sensitive Personal Information (SPI) – names, addresses, phone numbers, date of birth, etc.
Such SPI is very important for certain roles, but is actually not needed for many use cases. This is an important aspect, because it allows us to keep the core, sensitive data in a set of localised data sources, which can implement their own controls over access.
When trying to represent the “macro-level” summary data, what is needed instead is the METADATA, to allow for the high-level visualisations that might be needed for executive decision making, knowing that when it’s needed, there is a way to get (controlled, and appropriate) access to the detail.
For example – to know the number of people infected with COVID-19, do we really need to know their name? Or their age, or their gender? Does such knowledge really empower those of us who are not healthcare or government employees?
The benefit of keeping to the METADATA as much as possible is that it allows you to create a centralised data lake that doesn’t fall foul of the myriad data residency and privacy laws that exist (GDPR being only one of many).
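To make the idea a little more concrete, here is a minimal Python sketch of how a local data source might reduce its detailed records to summary counts before anything is sent to a central store. Everything in it is hypothetical and for illustration only: the `CaseRecord` fields, the `summarise` function, the sample people and the `region-health-authority` source label are my own assumptions, not taken from any real system or dataset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRecord:
    """Hypothetical detailed record, held only in the local data source."""
    name: str            # SPI: stays in the local source
    date_of_birth: str   # SPI: stays in the local source
    location: str        # e.g. a country or region
    status: str          # "confirmed", "recovered" or "deceased"


def summarise(records, source):
    """Reduce detailed records to counts per (location, status).

    Only the counts and a pointer back to the source are returned;
    names and dates of birth are not copied into the output.
    """
    counts = Counter((r.location, r.status) for r in records)
    return [
        {"location": loc, "status": status, "count": n, "source": source}
        for (loc, status), n in sorted(counts.items())
    ]


# Made-up sample data, purely for illustration.
local_records = [
    CaseRecord("Person A", "1970-01-01", "Region X", "confirmed"),
    CaseRecord("Person B", "1985-06-15", "Region X", "confirmed"),
    CaseRecord("Person C", "1990-03-20", "Region X", "recovered"),
    CaseRecord("Person D", "1962-11-02", "Region Y", "deceased"),
]

# This summary is the only thing that would leave the local source.
summary = summarise(local_records, source="region-health-authority")
for row in summary:
    print(row)
```

Each row carries a location, a status, a count and a `source` label, so someone with the appropriate rights knows where to go for the detail, while the central store never holds a name or a date of birth.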
In case you are interested, the site below is an incredible resource for observing the evolving picture that is COVID-19 – it has both a mobile and a desktop version of the interface.
What I love about it is that it perfectly demonstrates the power of representing METADATA, while also pointing the viewer to the SOURCE of the information – somewhere that they can go to get more specific and detailed information, if they need (and have the appropriate rights) to access it.
Yes – the devil is in the detail
I want to emphasise that information is interesting (and for me, fascinating), but it is easy to make wrong interpretations. The site above is an incredible resource, but it is only one perspective of a rapidly evolving and complex situation. To derive more complex outcomes, you likely need a combination of more specific detail (beyond the metadata), as well as knowledge and experience in the area you are investigating.
But … I have to say – isn’t it just incredible what the team have done?!
At the heart of the message is the importance of following our (collective) local government guidelines, and practising common sense.
More than just helping in my own understanding of the evolving story that is COVID-19, the visualisation by CSSE has triggered further thoughts in me about how I can better help visualise data in a way that doesn’t cause problems with data residency legislation!
Keep washing those hands!
This post was originally authored on this blog; you can also see the corresponding LinkedIn Article here – https://www.linkedin.com/pulse/observing-covid-19-thoughts-representing-data-andrew-barnes