Part I: Boundary and population data

I.1. Discussion of data sources

The African administrative boundaries and population database was compiled from a large number of heterogeneous sources. The objective was to compile, from existing sources and in a fairly short time, a comprehensive database that is suitable for regional or continental scale applications. The resources available did not allow for in-country data collection or collaboration with national census bureaus as was done, for example, in the WALTPS study. With few exceptions, the data sets do not originate from the countries themselves, and none of the input boundary data have been officially checked or endorsed by the national statistical or mapping agencies.

Boundaries

Due to the lack of high quality, published maps showing administrative boundaries for African countries, this project made use of any available data set. For many of the national boundary coverages, no information on the source map scale was available. If known, the cartographic scale of the source maps is indicated in the country documentation in Appendix A4. The scales are estimated to vary between 1:25,000 and 1:1 million.

In order to ensure a close match between different national coverages, and to obtain maximum compatibility with other standard medium resolution data sets, all national boundaries and coastlines were replaced with the political boundaries template (PONET) of the Digital Chart of the World (DCW). The DCW is a set of basic digital GIS data layers with a nominal scale of 1:1 million. The use of a very detailed international boundaries template for, in some cases, relatively coarse resolution data is somewhat misleading, but was required to ensure a close match between the national coverages. In any application the often smaller cartographic scale (i.e., coarser resolution) of the administrative boundary data in comparison to the international and coastlines template should be kept in mind.

For a few countries very detailed boundary data were available for which the spatial referencing information was not known. In the absence of better data, these were nevertheless incorporated in order to achieve maximum resolution. Yet, the ad hoc transformation, projection change and rubbersheeting required to make these data compatible with the DCW template have no doubt introduced positional error which may well reach a magnitude in the order of 1-2 km.

Population data

The population figures attached to the GIS database represent estimated totals for the standardized years 1960, 70, 80, 90 and 2000. The estimation method and a discussion of data accuracy are the subject of the following section. Data sources vary by country. In general we attempted to obtain population data from each census that has been carried out at the geographical level for which boundary data were available. Where official population estimates or projections were available from the national statistical office, these were used as well.

Copies of official census publications have been collected over the past five years from a number of university libraries as well as the United Nations Statistical Library and the U.S. Library of Congress. Additional material was available from the comprehensive holdings of the International Programs Center's library at the U.S. Census Bureau. Finally, any population figures published in yearbooks, gazetteers, area handbooks or other country studies have been used to ensure that population figures for as many time periods as possible could be used as the basis of the estimation.

I.2. Population projections and data quality

Estimation of population figures

In order to provide an indication of population dynamics and to maximize comparability across national boundaries, population estimates were produced for 1960, 70, 80, 90 and 2000. This follows the approach of the WALTPS study which used such figures as the basis of a detailed demographic and economic analysis of West Africa. Since population censuses are not synchronized and census taking has been irregular in many African countries, figures needed to be interpolated to provide these estimates. For this purpose, province or district specific intercensal growth rates were computed from published figures. These growth rates were then used to compute estimates for the standard years. The intercensal growth rate is calculated as

r = [ ln (P₂ / P₁) ] / t

where r is the average annual rate of growth, P₁ and P₂ are the population totals at the earlier and the later enumeration, respectively, and t is the number of years between the two enumerations (see, for example, Rogers 1985). The resulting growth rates were then used to derive estimates for the standard years. For example, based on enumerations in 1967 and 1977 and a corresponding rate r, the 1970 population would be calculated as:

P₁₉₇₀ = P₁₉₆₇ · e^(3r)

In cases where no data were available for a year before 1960 or after 2000, the trend between the two closest enumerations was used to extrapolate the earliest or latest available data. Similarly, simple trend forecasts beyond 2000 could be made using the average growth rates between 1990 and 2000 as reflected in the figures in the GIS database.
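The interpolation and extrapolation steps described above can be illustrated with a minimal Python sketch. It assumes the exponential growth model given in the text, with the rate taken as positive when the later enumeration is larger. The function names and the district figures are hypothetical and are not taken from the database.

```python
import math

def growth_rate(p_early, p_late, years):
    """Average annual intercensal growth rate: ln(p_late / p_early) / years."""
    return math.log(p_late / p_early) / years

def project(p_base, rate, years):
    """Population `years` after the base enumeration (negative values project backwards)."""
    return p_base * math.exp(rate * years)

# Hypothetical district enumerated at 100,000 in 1967 and 130,000 in 1977.
r = growth_rate(100_000, 130_000, 1977 - 1967)

p1970 = project(100_000, r, 1970 - 1967)  # interpolation between the two censuses
p1960 = project(100_000, r, 1960 - 1967)  # backward extrapolation from the earliest census
p1980 = project(130_000, r, 1980 - 1977)  # forward extrapolation from the latest census
```

Projecting the 1967 figure forward by the full ten years reproduces the 1977 enumeration, which is a quick consistency check on the computed rate.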

"The volume of papers and monographs on population projection methods in the demographic literature is very large. It is matched, however, by the number of publications that emphasize the continuing inability of these methods to accurately forecast population figures over more than very short time periods (O'Neill and Balk 2001, also see the interesting discussion in Cohen, 1995)."

Hyman et al. 2002

For predictions over only a few years, mathematical trend projections are usually fairly accurate, and the specific type of function used has little influence on the results (Cohen 1995). A more elaborate estimation approach such as the cohort survival method would result in more reliable estimates, but the data requirements for this technique (district level age and sex distribution as well as age specific birth, death and migration rates for several censuses) were far beyond the scope of this project. In fact, it is unlikely that such data could be obtained for many countries even with large available resources.

Given the limited amount and quality of the base population data, we checked the resulting total national population figures against a standard benchmark, the regularly published population estimates produced by the Population Division of the United Nations (2002, medium variant). In the summary table in Appendix A2, the UN figures for 1960 to 2000 are presented. Obviously, the UN data are by themselves associated with a considerable amount of uncertainty since the estimates are based on conditional forecasts that make a number of assumptions regarding the most recent and future fertility, mortality and migration rates. They are also based, for the most part, on official census figures which sometimes prove to be highly unreliable (Nigeria being a notorious example). In cases where the estimate was considerably different from the UN estimate, the intercensal growth rates were adjusted uniformly such that the resulting estimate was equal to or close to the UN estimate. Typically this is the case where data were available only for two time periods, or where a country experienced significant short-term changes in population numbers due to external circumstances. The adjustments are indicated in the specific country documentation below.
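The text does not spell out how the growth rates were "adjusted uniformly". One possible reading, shown here purely as an assumption, is that the same additive shift is applied to every district growth rate so that the projected national total matches the benchmark. Under the exponential model this shift has a closed form. All names and numbers in the sketch are hypothetical.

```python
import math

def uniform_rate_shift(base_pops, rates, years, benchmark_total):
    """Shift that, added to every rate, makes the projected total equal the benchmark."""
    projected = sum(p * math.exp(r * years) for p, r in zip(base_pops, rates))
    # Adding delta to every rate scales the projected total by exp(delta * years).
    return math.log(benchmark_total / projected) / years

# Hypothetical country with three districts, enumerated 8 years before the target year.
base_pops = [120_000, 80_000, 50_000]
rates = [0.030, 0.022, 0.015]
years = 8
benchmark_total = 330_000  # hypothetical benchmark, not an actual UN estimate

delta = uniform_rate_shift(base_pops, rates, years, benchmark_total)
adjusted = [p * math.exp((r + delta) * years) for p, r in zip(base_pops, rates)]
```

Under this assumption each district estimate is scaled by the same factor, so the adjustment changes the national total without changing the relative distribution implied by the census-based rates.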

UN population figures were used in two additional cases: (1) for countries for which no subnational boundaries or data were available (e.g. Reunion); (2) for countries for which census figures were available for only one point in time, resulting in a uniform adjustment of population figures across the nation.

The figures included in the database are directly taken from the estimation and thus show more significant digits than is justified by their accuracy. During data manipulation and processing one should preserve all significant digits, but for presentation purposes, the figures should be rounded to reflect the uncertainty of the data. Even the use of population numbers to the nearest thousand would imply a considerable degree of optimism about the quality of the data.
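As an illustration of this advice, the following sketch keeps the stored value at full precision and rounds only a copy for presentation. The helper name, the choice of two significant figures, and the example value are assumptions made for the example, not a recommendation from the database authors.

```python
import math

def round_significant(value, digits=2):
    """Round a number to `digits` significant figures for presentation."""
    if value == 0:
        return 0
    magnitude = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - magnitude)

estimate = 1_234_567.891  # hypothetical stored estimate, kept unrounded for processing
presented = round_significant(estimate, 2)  # rounded copy for tables and maps
```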

Census data

Given the method used for the population forecasting, the characteristics of the available source data obviously have a significant impact. It is clear that the accuracy is better for countries that have had several censuses at regular intervals over the last four decades. Unfortunately, not all countries in Africa have had more than two censuses since the 1950s. Nineteen African countries did not have a census before 1970, and four of these had their first census in or after the mid-eighties (U.S. Bureau of the Census, 1995).

The accuracy of censuses obviously varies by country. It was beyond the scope of this project to evaluate the accuracy of every census used, or of any of the official estimates. This would be possible since many censuses are followed by a post-census enumeration that provides an accuracy estimate. In countries with population registers, published population figures are accurate within a fraction of a percent. In the United States, census counts have been shown to have an accuracy of about 2 percent. With few exceptions, the accuracy of African censuses is likely to be considerably lower. Detailed discussions of population estimates in African countries are given in IDP (1988) and National Research Council (1993).

Additional sources of error

Population estimation is an uncertain science, particularly in countries that need to rely on often irregular census taking for population enumeration rather than on a civil registration system. Sources of error are numerous and include

census undercount,
intentional or unintentional misreporting during census activities,
decision by governments not to release census results,
the sometimes very long inter-censal period,
sudden population movements due to external shocks such as war, famine or forced migration schemes, and
rapid changes in fundamental demographic factors (i.e., fertility and mortality).

In countries where large and rapid migration movements occur, the timing of an enumeration will also have an impact on the magnitude of estimates for a particular time period. In other words, the population estimates can be sensitive to specific circumstances in the year of the enumeration. An example from the Asia database documentation that illustrates this point is reproduced in Appendix A1.

A comprehensive discussion of data quality issues in census taking is presented in the United Nations' Principles and Recommendations for Population and Housing Censuses revision 1 (United Nations 1998). However, no technical discussion could better highlight the uncertainty associated with most published population figures (and certainly those included in this database) than the following anecdote. Uwe Deichmann's story about the population of Lagos was the following:

The journal West Africa published a short news item that the population of Lagos was five million. I wrote to them for the source, suspecting that Bob Morgan and Ransome Kuti had completed their demographic survey of the city and had multiplied the inverse of the sampling fraction to obtain its population. But, West Africa wrote back that one of its correspondents had been told this figure by Peg Peil (i.e. Professor Margaret Peil) at the University of Birmingham's West African Centre. Thereupon I wrote to Peg who replied to say that she had been told that figure by Bob Morgan when he was visiting Britain. Thus, concluding that my first surmise was correct, I wrote to Bob for affirmation and congratulated him on the completion of the survey. He wrote back saying that the survey was not complete and forecast correctly that it would never be completed, but added the following:

"You remember, Jack, that I picked you and Pat up at Lagos airport nine months ago. Your flight path had come in over the full length of Lagos and you remarked to me that the city had grown greatly and now looked as if it might have five million inhabitants. I knew that you had flown over many cities and knew the populations of many of them, so I thought that this was the best estimate Nigeria was likely to have. I have subsequently employed it when people have asked me the question."

Uwe Deichmann

Given our limited knowledge about the accuracy of the input data, it is impossible to make an objective assessment of data quality. The development of a qualitative index of boundary and population data quality was considered. However, such an index would be associated with considerable subjective judgment. The question "how good are the data?" is incomplete unless we also ask "for what purpose?" Data that are clearly inappropriate for high resolution applications at the province or sub-province level are still sufficiently accurate to be used in regional or continental scale applications (the prime motivation for this project), or for the visualization of spatial patterns in a country. Thus, we only provide some informal summary measures in the table below, and refer to the individual country documentation that provides all known details about the lineage of the data (admittedly, this knowledge is too often very limited). The user can consider this information to decide whether the data are appropriate for the specific task. Special care should be taken when the population figures are used as the denominator in the computation of proportions or rates, in particular when the numerator is very small, as is often the case in epidemiological studies.

As in the previous databases, we included two useful summary measures of data resolution in the summary table in Appendix A2:

Mean resolution in km = (Country Area / Number of Units)^0.5

i.e. the length of a side of an administrative unit, if all units were square. And

Mean population per unit = Total National Population / Number of Units
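The two summary measures defined above can be computed as in the short sketch below. The function names and the country figures are hypothetical and serve only to show the arithmetic.

```python
import math

def mean_resolution_km(area_km2, n_units):
    """Side length in km of an average unit if all units were square."""
    return math.sqrt(area_km2 / n_units)

def mean_population_per_unit(total_population, n_units):
    """Average number of people per administrative unit."""
    return total_population / n_units

# Hypothetical country: 250,000 square km, 100 units, 12 million inhabitants.
resolution = mean_resolution_km(250_000, 100)
people_per_unit = mean_population_per_unit(12_000_000, 100)
```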

These two measures complement each other. In countries where large areas are uninhabitable, the mean resolution in km gives a biased impression of available detail. In such cases, the number of people per unit is a more meaningful indicator. The following table shows how these measures of resolution compare for Africa, Asia and Latin America.

Continent	Mean resolution in km	Mean population per administrative unit ('000)
Asia	117	1148
Latin America and the Caribbean	35	69
Africa	16 (32)¹	7 (28)¹

¹ Figures in parentheses indicate the values when South Africa (83,000 units) is excluded from the total of 109,268 units in Africa.

United Nations Environment Programme
Global Resource Information Database
Division of Early Warning & Assessment - North America