Sampling and the Census

 Improving the accuracy of the decennial count

 The actual enumeration shall be made within three years after the first meeting of the Congress of the United States, and within every subsequent term of ten years, in such manner as they shall by law direct.

-- Article I, section 2, Constitution of the United States of America

 It sounds so simple. Just count everyone, dwelling by dwelling, across the nation and add up the numbers to obtain the total population.

 Ever since the first census, in 1790, however, those charged with performing the decennial enumeration have faced a host of difficulties in accounting for every individual in the country -- and they have inevitably fallen short. In George Washington's time, census takers were likely to miss settlers in remote areas, itinerant laborers, and other elusive residents. In some cases, they simply made up the numbers.

 Nowadays, the U.S. Postal Service delivers census questionnaires. Many people fail to reply, perhaps because they are away from home, because the form went to the wrong address, or because they simply refuse to provide such information to the government. Follow-up door-to-door enumeration improves the count, but a significant fraction of the population still escapes the tally.

 In 1990, the Bureau of the Census recorded 248,709,873 people. It estimates that more than 8 million were not counted, most of them children, people from racial and ethnic minorities, and poor people in rural and urban areas. At the same time, more than 4 million people were counted twice or incorrectly included in the census.

 Because the areas where undercounts and overcounts occurred don't necessarily overlap, the accumulated, block-by-block error in the 1990 census could have been as high as 10 percent, says statistician Stephen E. Fienberg of Carnegie Mellon University in Pittsburgh.
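 The difference between errors that cancel in a national total and errors that pile up locally can be made concrete with the rounded 1990 figures quoted above. The sketch below is only an illustration: the function names are invented for this example, and because the article gives both figures as "more than," the results are rough illustrations rather than bureau estimates or Fienberg's block-level figure.

```python
# Rounded figures from the article. Both are stated as "more than",
# so the results are rough illustrations, not official estimates.
RECORDED_1990 = 248_709_873   # people recorded by the 1990 census
MISSED = 8_000_000            # people not counted (omissions)
ERRONEOUS = 4_000_000         # counted twice or incorrectly included


def net_undercount(missed: int, erroneous: int) -> int:
    """Omissions minus erroneous inclusions: what is left in a national total."""
    return missed - erroneous


def gross_error(missed: int, erroneous: int) -> int:
    """Omissions plus erroneous inclusions: errors that need not cancel locally."""
    return missed + erroneous


def as_percent_of_count(errors: int, recorded: int) -> float:
    """Express an error total as a percentage of the recorded count."""
    return 100.0 * errors / recorded


if __name__ == "__main__":
    net = net_undercount(MISSED, ERRONEOUS)
    gross = gross_error(MISSED, ERRONEOUS)
    print(f"net:   {net:,} ({as_percent_of_count(net, RECORDED_1990):.1f}% of the count)")
    print(f"gross: {gross:,} ({as_percent_of_count(gross, RECORDED_1990):.1f}% of the count)")
```

 Nationally, the two kinds of error partly offset each other, but a block that lost residents gains nothing from double counting somewhere else, which is why the gross figure matters for small-area accuracy.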

 Such inaccuracy does make a difference. Census data are used not only to reapportion seats in the House of Representatives but also to allocate funds for a variety of federal programs. Local governments often rely on census data to determine the need for new schools, hospitals, and other facilities.

 The statistics covering small areas such as counties and towns are particularly useful. "Accuracy at that level is very important," says Robert M. Bell, senior statistician at the RAND Corp. in Santa Monica, Calif.

 Last year, the Census Bureau officially announced its plan for the tally in the year 2000, declaring that its program had the "twin goals of reducing costs and increasing accuracy." Indeed, the new plan represents a complete redesign of the census, with an emphasis on collecting more accurate data on fewer people rather than spending time and money in an ultimately futile effort to get information from everyone.

 A key element of that plan is a significant increase in the use of statistical sampling. Instead of trying to visit every address from which there was no reply, enumerators would follow up only a carefully chosen sample of those addresses. The bureau would also conduct a separate nationwide survey of 750,000 dwellings to help measure the quality of the census data and estimate the extent of the undercount.

 Despite an overwhelming consensus among statisticians favoring the use of sampling to improve the accuracy of the census, the bureau's proposal has proved highly controversial in Congress. Exercising their right to review census plans, members of the House of Representatives have voiced a variety of concerns. Last week, they voted to prohibit the substitution of sampling for direct enumeration until the Supreme Court rules on the matter.

"The dilemma is that [enumeration], like any other human process, is flawed," says historian Margo Anderson of the University of Wisconsin-Milwaukee. "There have always been controversies over the quality and accuracy of the count, and they flare up in the context of complex political issues."

 The history of census methods has been one of continuous change. The 1790 census was taken by federal marshals, who were directed to visit every dwelling and count the people living there. As the population grew, professional enumerators gradually replaced the marshals.

 Issues about how to conduct the census arose early on, and the American Statistical Association, now the major organization in the statistical profession, was founded in 1839 to deal with questions related to census taking. A few decades later, faced with the enormous task of tallying by hand the data collected in the 1880 census, bureau employees invented the punch card machine to ease the tabulation of results.

 In the 1930s, the federal government greatly expanded its role in managing the nation's economy, increasing its appetite for statistics of all sorts. In its first major use of sampling, the Census Bureau in 1940 introduced the "short form" set of questions for the majority of the population. Only a portion of the population received the "long form" questionnaire, from which nationwide trends could be extrapolated.

 Census directors and others have always known that the census fails to count everybody. The first clear evidence that minority groups were disproportionately undercounted surfaced in 1940, when the draft pool turned up 3 percent more draft-age men, including 13 percent more black men, than the census had tallied. As one response to the discrepancy, the bureau began checking its data against such records as birth and death certificates.

 The Voting Rights Act of 1965 and a number of subsequent court decisions, along with the use of census data to allocate billions of dollars of federal aid annually to state and local governments, increased the political pressure to improve the accuracy of the count.

 In recent decades, census expenses have increased tremendously. Even accounting for inflation and population growth, the 1990 census cost twice as much as the 1970 census, which was the first to use forms sent by mail. At the same time, there were various signs that more people were missed in 1990 than in 1980, and the procedures used by the Census Bureau for collecting the 1990 data were challenged in court.

 Congress concluded that the 1990 census had cost too much and counted too few people, and it demanded improvement.

"The irony is that, despite the fact that the undercount is much smaller than it used to be [decades ago], politically it makes a much bigger difference now than it ever did before," says sociologist Harvey M. Choldin of the University of Illinois at Urbana-Champaign.

 Since 1990, the Census Bureau has conducted an extensive program of research, testing, and evaluation to develop methods for delivering census data of higher quality at lower cost. In part, the new plan was designed in response to expert advice from the National Research Council's Committee on National Statistics, which issued reports in 1994 and 1995.

"It is fruitless to continue trying to count every last person with traditional census methods of physical enumeration," the NRC panel concluded. "Simply providing additional funds to enable the Census Bureau to carry out the 2000 census using traditional methods, as it has in previous censuses, will not lead to improved coverage or data quality."

 The panel recommended that the bureau cut back on its traditional practice of trying to contact every last individual and instead rely more heavily on statistical estimates of the number and characteristics of those not directly enumerated.

 The inclusion of sampling would reduce the workload in the field, making possible the use of a smaller, better-trained, more highly qualified staff of enumerators. It would also allow more timely completion of the follow-up phase, increasing data quality because respondents would give information to enumerators closer to Census Day. Presumably, there would be fewer errors because of faulty recall and less use of information obtained from indirect sources to fill in gaps during the final stages of data collection.

 Several subsequent NRC committees and other groups have reaffirmed the panel's initial conclusion that the use of sampling techniques is "critical to the success of the year 2000 census."

 Actually, sampling isn't new to the Census Bureau. It has sampled a small portion of the population, asking such questions as the number of rooms in a person's residence. It has also used sampling to monitor census interviewers, estimate the number of vacant dwellings, adjust results for incomplete data, and evaluate the thoroughness of its coverage.

 Using sampling to follow up on people who fail to respond to the mailed questionnaire requires the development of new techniques to provide accurate data down to the level of a county, census tract, or even block. "There are all kinds of interesting questions about how best to do it," says David A. Binder of Statistics Canada in Ottawa.

 Researchers need to resolve many issues, including the size of the sampling unit, whether to look at a whole block or just a selection of households in a block, and when to start sampling. Experience from an extensive test in 1995 helped the bureau refine its procedures.

"Part of the problem is that no matter what you do, there are still going to be errors in the results," Binder says. The idea, then, is to come up with methods that minimize those errors.

 Aided by advisory committees, the Census Bureau has devised a plan that includes developing a complete, accurate list of mailing addresses, designing questionnaires that are easier to understand and fill out, and using sampling to follow up on people who don't respond.

 Follow-up is a crucial component of the census. In 1990, the mail response rate was only 65 percent, and trends suggest that the return rate could fall even lower in 2000. At present, the bureau's goal is to collect data from at least 90 percent of the housing units in each census tract. A tract typically encompasses about 1,700 dwellings and 4,000 people, for a total of about 60,000 census tracts across the country.

 Instead of visiting every address from which no response was received by mail or telephone, enumerators would go to a fraction of those dwellings, substantially reducing the cost of the follow-up phase while presumably improving the quality of the data. The bureau plans to use such techniques as computer-based random selection to ensure that the sample is distributed evenly among the nonresponding addresses.

 Moreover, to obtain information from 90 percent of the housing units in each census tract, the plan calls for sampling a higher share of units in those tracts that have lower mail-in rates. For example, if the initial response rate were 30 percent, six out of every seven nonresponding addresses would be in the sample. If the response rate were 80 percent, just one in two such addresses would be visited.
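 The two examples above follow from simple arithmetic: if a fraction r of a tract responds by mail and the goal is 90 percent coverage, enumerators must reach (0.90 - r) / (1 - r) of the nonresponding addresses. The sketch below reproduces both figures. The function names are hypothetical, and simple random sampling stands in for whatever selection procedure the bureau actually uses, which the plan as described here does not spell out.

```python
import math
import random
from fractions import Fraction

# Coverage goal per census tract, as stated in the article.
TARGET = Fraction(9, 10)


def followup_fraction(response_rate: Fraction, target: Fraction = TARGET) -> Fraction:
    """Share of nonresponding addresses to visit so total coverage reaches target."""
    if not 0 <= response_rate <= 1:
        raise ValueError("response_rate must lie between 0 and 1")
    if response_rate >= target:
        return Fraction(0)  # mail returns alone already meet the goal
    return (target - response_rate) / (1 - response_rate)


def choose_followup_sample(nonresponding, response_rate, seed=None):
    """Draw a simple random sample of nonresponding addresses (an assumed method)."""
    addresses = list(nonresponding)
    k = math.ceil(followup_fraction(response_rate) * len(addresses))
    rng = random.Random(seed)
    return rng.sample(addresses, k)


if __name__ == "__main__":
    for rate in (Fraction(3, 10), Fraction(8, 10)):
        print(f"response {float(rate):.0%}: visit {followup_fraction(rate)} of nonrespondents")
    # Hypothetical tract: 100 addresses, 30 replied, 70 did not.
    sample = choose_followup_sample(range(70), Fraction(3, 10), seed=2000)
    print(len(sample), "of 70 nonresponding addresses selected")
```

 Exact fractions are used instead of floating-point numbers so that the results come out as 6/7 and 1/2, matching the text.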

 The bureau's program addresses the undercount issue with an extensive post-enumeration survey, which means, in effect, making a second count using a sample of the entire population.
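 The article does not say how a second count is turned into an estimate of the people missed. One textbook approach for two overlapping counts is the Lincoln-Petersen (capture-recapture) estimator, sketched below purely as an illustration. It is an assumption here, not a description of the bureau's procedure, and every number in the example is hypothetical. The estimator also assumes that the two counts are independent and that people can be matched accurately between them.

```python
def lincoln_petersen(first_count: int, second_count: int, in_both: int) -> float:
    """Estimate a population from two counts and the number found in both."""
    if in_both <= 0:
        raise ValueError("need at least one person matched in both counts")
    if in_both > min(first_count, second_count):
        raise ValueError("matches cannot exceed either count")
    return first_count * second_count / in_both


if __name__ == "__main__":
    # Hypothetical sample area: all figures are invented for illustration.
    census_count = 900   # people found by the census
    survey_count = 800   # people found by the independent survey
    matched = 750        # people found by both

    estimate = lincoln_petersen(census_count, survey_count, matched)
    missed_share = 1 - census_count / estimate
    print(f"estimated population: {estimate:.0f}")
    print(f"estimated share missed by the census: {missed_share:.1%}")
```

 The intuition is that if the survey finds only 750 of its 800 people in the census records, the census probably missed a similar share of everyone else too.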

 A dress rehearsal scheduled for early 1998 will serve as a thorough test of the bureau's new procedures. It will focus on Sacramento, Calif.; Columbia, S.C., and 11 surrounding counties; and the Menominee Reservation in Wisconsin.

 Many members of Congress have questioned the Census Bureau's approach, expressing concern and, in some cases, dismay about the use of sampling methods in the initial enumeration phase of the census. Several critics have gone so far as to say they are willing to write a "blank check" to cover the additional cost of a traditional census plan to avoid the use of sampling.

 According to Census Bureau Director Martha F. Riche, however, such an approach would add at least $675 million to the $4 billion expenditure already slated for the 2000 census -- without guaranteeing an increase in accuracy.

 A few members of Congress have also voiced fears that the sampling process could be subverted or manipulated in some way to bias results. Statisticians argue that the precisely known, mathematical properties of scientific sampling would actually diminish the opportunity for political manipulation. In a sampling approach, attempts at manipulation would be more readily detectable because doctored data would no longer conform to expected statistical patterns.

 A joint House-Senate conference committee will next debate whether to prohibit the Census Bureau from using sampling for nonresponse follow-up. Most of the opposition to sampling comes from Republican members, some of whom fear that it could lead to the loss of their seats in the House.

"There is a grave concern that if [congressional] restrictions curtail certain census operations, the count in 2000 will be much worse than it was in 1990," Bell says.

"One should never forget that the census is about the apportionment of political power and resources -- 'moving power and money' -- among the various population constituencies that make up American society," Anderson and Fienberg write in the March-April Society. "The framers of the Constitution knew as much when they instituted it two centuries ago. It behooves us to remember their legacy to us today."

 How the current political controversy plays itself out may very well be reflected in what happens on Census Day, April 1, 2000.