Interview with Steve Mac Feely, Chief Statistician, UNCTAD
By Maya Plentz, Editor in Chief | The UN Brief
Road to Bern
Preparatory Conference for the UN World Data Forum
Maya Plentz – I was reading your papers and watching a very interesting presentation you gave a couple of years ago at the European Network of Statistics Professionals. Today I would like to speak about the Road to Bern series of events, which was led by the Permanent Mission of Switzerland to the UN Office in Geneva. What is the main takeaway? Based on what we discussed and on your published papers, what do you see as the way forward? I think you participated in a couple of events before?
Steve Mac Feely – I was the keynote speaker in the previous one. There are four Road to Bern sessions in Geneva. Different organizations have hosted these events; the first was hosted jointly by the World Health Organisation and the World Meteorological Organisation, and they invited me to give a keynote about Big Data and data collection.
I spoke about the changing world that statisticians find themselves in with regard to big data, and the challenges and threats, but also the opportunities, that this new kind of ecosystem introduces.
Each session has a different underlying theme; it’s a very interesting process because usually the UN World Data Forum just happens, but the way the Swiss government have organized it is quite nice, they have these sessions in advance to whet your appetite and stir a bit of debate and get people ready for the WDF itself. It’s been a very good process.
MP – Yes, one can see what is happening and what the main trends being discussed are, particularly because the private sector now has so much power in terms of data collection on citizens all over the world, and these platforms and private sector actors are also working with international organizations. So how do International Organisations (IOs) collaborate with the private sector? You mentioned that the national statistics offices are also grappling with issues of using this open data. Of course, a great part of this data is private, it belongs to Google, to Facebook, to Amazon and so on, so there are challenges there too. So, does the WDF try to bring together all stakeholders, meaning the private sector is also going to be participating?
SM – The idea of the WDF is that you bring together all the actors, so not just international organizations and national statistics offices, but private sector, NGOs, citizen science, academics. It’s meant to be kind of a jamboree around data issues where you bring together all the players; the idea is really to try and avoid an echo chamber, because the danger is that statisticians chat to statisticians, and we generally have a similar perspective, a similar view. I’m sure this is equally the case in the private sector; so the idea is to try and get these groups to meet with each other and see where there are synergies and maybe where there’s possible conflict, and then how do we address those issues.
MP – So, there isn’t really a preset agenda or there was sort of a framework that the Swiss government, in collaboration with the United Nations, decided, on a few ideas and topics?
SM – There are themes announced, like any big conference. They invite speakers to organize sessions and so they post broad themes, everything from data collection, data stewardship, open data, big data. A range of themes, and then the idea is that people then submit ideas and say, well I think that would be an interesting discussion.
The WDF, like most conferences, has been massively oversubscribed. I’m not on the organizing committee, but from what I’ve heard, the number of proposals is five or six times the number that they could possibly accommodate, so the organizing committee has to sift through all the proposals and decide, first of all, do the proposals speak to the themes that were proposed?
Does the proposed session look reliable and well organized? Does it have credible speakers or speakers who will attract attention, or does it look like the idea hasn’t really been fully baked? Based on that they make their selection.
For example, this year I submitted a proposal discussing the idea of accreditation and the organizing committee came back and said “well this is an interesting idea but we also got a proposal from citizen science which is similar, so would you be willing to merge your proposals and work together?” So that’s what we’ve done. Some people get outright rejections, some people get accepted, and that’s how you get your conference.
MP – Can you explain a bit more?
SM – What I had put forward for consideration was to have a debate on the accreditation of unofficial statistics to be used as official statistics for the Sustainable Development Goals (SDGs). In some sessions you’ll have a discussion, somebody will present an idea. What I wanted to do was actually have a debate where I have two people proposing or supporting the idea and two people dead against the idea. I want the audience to hear the debate and then we would take a vote at the end and ask, who convinced you?
MP – You mean open data, like open data that you can scrape off the internet?
SM – The idea I presented is based on a paper that I published a couple of months ago with a former colleague of mine. What we were looking at is this: if you look at the SDG global indicator framework, it has quite a few gaps, meaning that there are a lot of new concepts for which we don’t yet have data in the formal statistical system, and yet we see universities, we see NGOs, we see private sector with lots of data that are addressing these types of issues. So our argument is simply, why would a national statistics office reinvent all of that data and conceptual work if somebody’s already done it? So why not look at what they’ve done, and if we think that it is valid then we would essentially put it through a quality assurance assessment, and if we say it passes that assessment, we give it a gold star, we certify that the statistic has sufficient quality for SDG purposes. That’s the argument in a nutshell.
MP – That’s something you brought up during your talk a few years ago at the European Network of Statistics Officers. That the private sector has so much data already, and perhaps you can use that data, as long as you maintain the standards because national statistics offices usually have very high standards on how to treat the data, how to collect data, how to evaluate the data, and how to draw insights from it. You mean the same international standards?
SM – You want to make sure the quality of the data themselves is good, but you also want to make sure that the data aren’t biased. For example, if the data come from an NGO, one of the risks that a national statistics office might perceive is the advocacy risk. Say the data come from “Save the Rare Mushroom”: then there’s a risk that they will want statistics that show a certain view, that they’ll have statistics that will support their argument.
A statistics office must be careful that the statistics are rigorous and that they’re not biased or that they don’t deliberately find the result they want. So one of the things you really have to be careful about is motive – why were the statistics compiled in the first place?
Are they compiled as a public good just to inform people, or are they deliberately constructed to win an argument? That’s not always an easy thing to judge, but it is an essential deliberation for a National Statistical Office if they’re going to use data from the outside. There are other challenges too. For example, take Big Data, which is all the buzz at the moment. If you’re using mobile phone data, for example, or social media data, there may be biases.
There could be gender biases in certain parts of the world because women don’t always have access to phones or, if they have access, it is controlled. Older people may not be able to use or may not be comfortable using a lot of the new technology, so you miss them. There are large digital divides right across Africa.
But in fact, recently I was reading Tools and Weapons, the book by Brad Smith, who is President and Chief Legal Advisor to Microsoft, and in his book, astonishingly, he says that there are massive broadband gaps right across the US.
If that’s the case then you could be missing important cohorts, which means your data may be skewed towards an urban view.
This would be important, for example, if you are constructing a sentiment index, so even though you’ve got lots of data and it looks fine, it may actually give you very skewed results. So, you have to be really, really careful when you’re using somebody else’s data that you understand how it was collected and what the strengths and the weaknesses of those data are.
MP – Yes, absolutely, this is one of the questions. I think that is really at the core of what we are talking about. Like we discussed at the Road to Bern last week, there is this big data sharing movement right now, which is sort of a big term, a big umbrella. What are ways we can look at it, when should we be sharing data, and in which ways?
SM – Data sharing is a small word but a very loaded term, as we’ve discussed before. For me, the first thing to clarify is what do we mean by ‘data’, and what do we mean by ‘sharing’?
Depending on how you answer those two questions my reaction will be very different. If we are talking about aggregate statistics then sharing may not be that complex.
For official statistics I would say that they should be shared, and by shared now I mean equal access to everybody, because they should be a public good. But if we’re talking about sensitive microdata that include names and addresses, information like that, then that shouldn’t be shared under any circumstances unless everybody in that dataset has given their agreement – and it’s highly unlikely they would. So we need to be very careful when we say ‘data’.
What type of data are we talking about? We must be careful about ‘sharing’ too. Does sharing mean equal access or does it mean that I’m going to give the data to you only? The next question is “do I have the right to do that”? Do I own the data or did I get the data from somebody?
For example, if I collected the data myself, say I went out and counted the number of trees in a field next to my house. I can share that data, I collected it myself; it’s not sensitive. But if I got tax data from the revenue authorities, which is highly unlikely, but even if they gave it to me for some purpose, it’s highly unlikely that they would agree that I could give it to you.
So, we need to understand the terms and conditions. We need to make sure that the terms and conditions on every data set are clear. In particular, we need to be careful that when the data were collected, if we gave promises or conditions to the subjects, then we have to respect those terms and conditions. So, if we told people these data will never be released to anybody then that’s the end of the story – we can’t share those data. If we said the data will be released after a hundred years, then we have to wait for a hundred years.
So depending on what we said, those conditions must be respected. There are further complications from a legal point of view because we can also get into the whole ownership debate. For example, suppose you give me a data set. Let’s say you are Google and you give me data and then I create new data using those data; this is called recursive data.
There’s a lot of ambiguity now about what I can do with these new data because you didn’t actually give them to me, I created them, but I created them using the data that you gave me. So, who owns them? Who has the IP on those data? Do I have the right to share them?
That issue is at the crux, at the moment, of an ongoing debate. Some people are arguing that every time I use my mobile I’m creating data. The phone companies have the data. But who owns that data? Is it mine because it was manufactured through my labor, or is it the data holder, the mobile phone company, that owns the data? This is going to be a huge issue in terms of who’s allowed to use data and who’s allowed to share data.
Moving data across international borders is another super complex issue, especially if you take, for example, the situation now between Europe, which has the GDPR, and the United States, which doesn’t and which takes a more laissez-faire view regarding personal data. In Europe now, with the GDPR, we are saying that the basic principle is that, as the data owner, I have rights to that data, whereas in the US, under the third-party doctrine, the basic assumption is that you have no reason to assume that your data are private.
These are two very different ideological perspectives. That makes data sharing complex, because if I’m giving my data to a company in Europe which then shares it with a parent company in the US, I have legitimate concerns. This is an important issue; this will be the geopolitics of the future because data fuels the digital economy, data fuels artificial intelligence, data fuels globalization and internationally traded services. So these are super, super, important issues and this is why we’re discussing them.
MP – Particularly in light of what has just been happening in the United States, with the riots and the attacks on the press, the violence towards the press, I think that this debate is going to find even more strength. Up till now some people would say “oh it is ok, I voluntarily give data to some social media platforms because I like to use their services and I agree to this exchange”, but people are realizing it’s not that simple. It’s not just that they gathered your data in exchange for a service; they may use this data, and they are already using it, they’re selling it. It has great value, biomedical data too, so there should be some sort of regulatory framework that guides the information that is being given away. An app created for health purposes, test and trace, now being used for something else is just one example of data misuse; there are many. Do we need to discuss creating a global regulatory framework beyond the GDPR and the California Consumer Privacy Act?
SM – I would agree completely. In fact, I have a paper coming out in the Statistical Journal of the International Association for Official Statistics, where I make this point. There needs to be some sort of a new global data deal that addresses these issues. Issues of ethics, issues around ownership, issues around principles – there’s a whole range of issues that really need to be agreed at the global level; it’s not enough to do it at a regional level. Data are a globalized product and they move so fast, and their power is incredible. We’ve seen now, both in the Brexit campaign and in the 2016 US campaign, the important role that data played.
It’s clear now that data have tremendous power and, you know, you can look at them as a tool or as a weapon. I mean, they can do tremendous good – and this is why when the UN Secretary-General says “leaving no one behind” it is a wholesome idea – using data so that everyone is counted. But on the flipside, they can also be used for terrible evil.
We’ve seen this throughout history, where data have been weaponized. Take that expression “leave no one behind” – when the Secretary-General of the UN says that, we understand it is aspirational, optimistic – it is a good thing. Now take the totalitarian dictator of your choice: when he or she says that same phrase, it can take on a completely different meaning. Instead of being optimistic and powerful, it is suddenly quite sinister, and that’s the danger.
Like with any tool, like with computers: they are a tremendous tool but can also be used for extreme evil, and with data it is the same. So we need to have some sort of a new global data deal that sets out what the basic standards are and that, hopefully, the world will sign up to.
Interview granted via Zoom, June 2020, to discuss the Road to Bern preparatory conference for the UN World Data Forum.
All rights reserved. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, mixing, blogging, recording, or other electronic or mechanical methods, without the prior written permission of the publisher, except in the case of brief quotations, and with credit to the author. © 2020-2025 Maya Plentz