Previously only available in hard copy, the data and documents of South Africa’s first all-race countrywide household survey are now available in accessible digital format. View the SALDRU publications in the OpenSALDRU Publication Repository.
Big Data is viewed as the next frontier for innovation, competition and productivity, and with the building and deployment of ‘Massive Data Repositories’, it may become as important to business – and society – as the Internet. From a social perspective, more data is likely to lead to more accurate analyses and more responsive evidence-based policy.
As a major open data activist, I like to remind people that data is one of the few things in the world that becomes more valuable the more you give it away. Governments around the world are accepting that, which is why there’s been such a huge drive toward open government data. If governments put their data out there, it’s got to get better, and if it gets better, it’s good for them. Plus, if they put it out there, more people will use it and feed back into their planning. Through their research, academics offer governments free quality control and free policy analysis.
DataFirst is a multi-faculty interdisciplinary data service unit at the University of Cape Town (UCT) based in the Commerce Faculty. The unit is dedicated to making South African and other African survey and administrative data available to researchers and policy analysts, and provides national census and survey microdata for research purposes to academic staff and students across several Faculties. We encourage data usage and data sharing. The availability of this data for secondary analysis has contributed significantly to the development of interdisciplinary quantitative social science research in South Africa.
DataFirst holds the early research output of SALDRU (the Southern Africa Labour & Development Research Unit), including the data and documents from the seminal 1993 Project for Statistics on Living Standards and Development (PSLSD), the first country-wide household survey in South Africa.
We’ve been doing censuses in South Africa since the colonial era. Stats offices were set up to collect data for government planning, but until fairly recently the data wasn’t readily available. What used to happen is that government would collect the census and survey data, then government statisticians would analyse the data and produce reports. These reports would be given to all government departments, and policy would be made based on them.
Planning was dependent on the initial analysis being accurate and there was no way to get the raw data to check the analysis. It’s possible that a lot of bad decisions were made in that way. In a lot of instances, the data wasn’t actually used, so government would make policy based on tradition, politics, intuition, guesswork – even though they had the data there at their fingertips.
But since the advent of the digital age, data agencies have been able to access raw data from government. Each country has its data archive and there are regional and international networks of social science data archives, which use data analysis software to check the data, put it into useable form and make it available to researchers. Now the raw data is not just available, but easy to move around, copy, download for purposes of quantitative research and policy development.
In 1993, SALDRU was commissioned to do the first complete household survey of all race groups in South Africa just before the democratic elections. That data would be the first accurate indication of what was really going on in the country. It was called the Project for Statistics on Living Standards and Development (PSLSD). I was working as SALDRU’s data manager at the time and we co-ordinated this huge and very well-run survey.
After we got the data at the beginning of 1995, we sent reports out to the various government departments, and that data was used for the ANC’s economic development plan. But South Africa has a dearth of quantitative skills, and in 1993, there were very few people using raw data. It was quite a new thing and just a handful of academics knew how to analyse raw data. But you can only tell so much from a report, so our director at the time, Francis Wilson, decided that people should be using all this raw data; it needed to be further interrogated. He wanted researchers and people in government to use it, so he set up these road shows and we went around encouraging people to use the data.
Following a chance meeting with a man who worked in the Netherlands data archive, Wilson realised the need to establish a data service here in South Africa that would take away the time-consuming effort of accessing data from government, so that researchers could spend their valuable time just doing research. He spoke to a lot of people in government and eventually they set up the South African Data Archive (SADA), which has been operational since the late 1990s.
Then, in 2001, with funding from the Mellon Foundation, he established DataFirst, which is more than a data archive. We’re a data service; we do preserve data, but we’re more pro-active – our main focus is to help researchers to use the data. We check the data, make sure there are documents to support it, publish descriptions of it, put it into a useable format and make it available through various interfaces. In short, we provide certified, updated copies of data, adhering to international standards of best practice. It’s an amazing service; many, many countries don’t have this at all, although we do have a project to advance data curation in other African countries.
At DataFirst, we have a project to source and distribute historical South African data. While the data from SALDRU’s early research output is in digital form, making it easy to share with the wider academic community, much of the supporting documentation and research findings were, until recently, only available in hard copy. The Humanitec grant enabled us to digitise the questionnaires and documents associated with these surveys, as well as all the publications based on the data. What is the point of having access to the data if nobody knows what people are writing about it? Most significant is the material related to, and the publications arising out of, the PSLSD – the famous integrated household survey of 1993, which is still influencing research and policy.
The Humanitec project has also allowed for the digitisation of the reports from earlier SALDRU conferences. These include the papers from the 1976 SALDRU Farm Labour conference and the Second Carnegie Conference held in 1984. These ground-breaking conferences set the stage for future quantitative research on poverty in South Africa.
The 1984 Carnegie Conference was the first time that people got a proper overview of the poverty and inequality in South Africa. Of course, the government lambasted SALDRU for it, but it was a watershed moment for social and economic research in South Africa. There were about 300 conference papers and numerous seminal working papers, as people continued to publish about the outcomes of the conference.
DataFirst continues to receive requests for these conference papers, as well as early titles from SALDRU’s Working Paper Series. Until recently, researchers had to visit the Research Data Centre located in the new School of Economics building on Middle Campus to use these publications for their research. This really limited their usage.
Digital copies of the SALDRU resources are available in UCT libraries’ repository and online via SALDRU’s website, as well as via DataFirst’s online data portal, where relevant. Metadata for the collection has been created by DataFirst staff, who have experience in metadata creation for digital objects. Digitisation has provided better long-term access to these resources for the research community. It has vastly improved the accessibility of these seminal pre- and post-apartheid poverty investigations.