Studies of Welfare Populations: Data Collection and Research Issues. Assessing the Quality of the Data and Cleaning the Data for Research Purposes

06/01/2002

In this section, we present strategies for determining whether a particular administrative data set can be used to answer a particular research question. Researchers seldom go directly to the online information system itself to assess its quality, although this may be one step in the process. Typically, government agencies give researchers, both inside and outside the agency, an extract of the information system of interest. This extract may be called a "pull" file. It contains a selection of data fields, never all of them, typically on all individuals in the information system during a specified period. The pull is designed for a particular purpose, which is usually not restated each time a request for data is made. Any one actual pull refers to a time period that corresponds to some administrative time period, for example, a month or a fiscal year. These cross-sectional pulls are very useful for agency purposes because they describe the point-in-time caseload for which an agency is responsible. As we will explain, however, this approach is not ideal for social research or evaluation.
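Because each pull describes only one administrative period, following families over time usually means stacking successive pulls. The sketch below illustrates that idea under stated assumptions: each monthly pull is available as a list of records labeled by month as 'YYYY-MM', and each record carries a hypothetical `case_id` field that stays the same across pulls. It builds each case's month-by-month receipt history and collapses it into continuous spells, which a single point-in-time pull cannot show.

```python
from collections import defaultdict


def month_index(label):
    """Convert a 'YYYY-MM' label into a running count of months."""
    year, month = label.split("-")
    return int(year) * 12 + int(month) - 1


def build_receipt_history(pulls):
    """Stack monthly cross-sectional pulls into a per-case history.

    pulls: dict mapping a month label ('YYYY-MM') to the records in that
    month's pull. Each record is assumed to carry a hypothetical 'case_id'
    field that is stable across pulls.
    Returns a dict mapping case_id to the sorted months in which it appears.
    """
    history = defaultdict(set)
    for month, records in pulls.items():
        for record in records:
            history[record["case_id"]].add(month)
    return {case: sorted(months, key=month_index) for case, months in history.items()}


def receipt_spells(months):
    """Collapse a sorted list of month labels into (first, last) spells."""
    if not months:
        return []
    spells = []
    start = prev = months[0]
    for month in months[1:]:
        # A gap of more than one month ends the current spell.
        if month_index(month) != month_index(prev) + 1:
            spells.append((start, prev))
            start = month
        prev = month
    spells.append((start, prev))
    return spells


pulls = {
    "2001-01": [{"case_id": "A"}, {"case_id": "B"}],
    "2001-02": [{"case_id": "A"}],
    "2001-03": [{"case_id": "A"}, {"case_id": "B"}],
}
history = build_receipt_history(pulls)
print(receipt_spells(history["B"]))  # [('2001-01', '2001-01'), ('2001-03', '2001-03')]
```

In this toy example, case B appears as two separate spells of receipt, something neither the January nor the March pull reveals on its own. A one-month gap may reflect a real exit or a data error, so such gaps are themselves candidates for the quality checks discussed here.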

The programming for a pull file is often a time-consuming task that is done as part of system design, based on the analytic needs at the time of the design. Even a small modification to the pull file may be costly or impossible given the capacity of the state or county agency's information systems division. The advantage of this practice is that multiple individuals usually have some knowledge of the quality of the pull file: they may know how some of the fields are collected and how accurate they are. The disadvantage is that the pull file will probably require additional cleaning before it can answer a particular set of research questions.

We cannot stress enough the importance of assessing data sets individually for each new research project. A particular data set may be ideal for one question and a disaster for another. Some fields in a database may be perfectly reliable because of how the agencies collect or audit them, while other fields may seem to contain values entered almost at random. Also, a particular programmatic database may have fields that are reliable at one point in time and not at others. Likewise, a field may be entered reliably in one jurisdiction and not in another.
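One simple way to check whether a field is usable in some places or periods but not others is to profile it group by group. The following is a minimal sketch, assuming records are plain dictionaries and using hypothetical field names (`county`, `educ`) and a hypothetical invalid code `"99"`. It reports, for each group, the share of records whose value is neither missing nor outside an optional set of valid codes. Passing a month field as `group_key` instead of a county field gives the same check over time.

```python
from collections import defaultdict

# Values treated as missing; real systems may use other placeholders.
MISSING = (None, "")


def field_quality_by_group(records, field, group_key, valid_codes=None):
    """Share of records with a usable value in `field`, by `group_key`.

    A value is usable if it is not missing and, when valid_codes is given,
    is one of the valid codes.
    """
    totals = defaultdict(int)
    usable = defaultdict(int)
    for record in records:
        group = record.get(group_key)
        totals[group] += 1
        value = record.get(field)
        if value in MISSING:
            continue
        if valid_codes is not None and value not in valid_codes:
            continue
        usable[group] += 1
    return {group: usable[group] / totals[group] for group in totals}


records = [
    {"county": "North", "educ": "HS"},
    {"county": "North", "educ": "BA"},
    {"county": "South", "educ": ""},
    {"county": "South", "educ": "99"},
]
print(field_quality_by_group(records, "educ", "county", valid_codes={"HS", "BA", "LTHS"}))
# {'North': 1.0, 'South': 0.0}
```

A high usable share only shows that a field is filled in with plausible codes; it does not show that the values are accurate, which is why comparison with an independent source remains important.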

For example, income maintenance program data are ideal for knowing the months in which families received Aid to Families with Dependent Children (AFDC) or Temporary Assistance for Needy Families (TANF) grants. However, these data rely on grantees to report their employment, and grantees often have incentives to report it inaccurately, so income maintenance program data are not ideal for addressing questions about the employment of TANF recipients. Furthermore, information about the grantee, such as marital status or education, may be collected only at case opening and is therefore more likely to be inaccurate the longer the time since the case was opened. Assessing data quality in this way is quite time consuming and resource intensive. The resource requirements are similar to those of cleaning large survey data sets; unlike with surveys, however, it is often unclear where to go for the information needed to do the cleaning. Documentation is often unavailable, and the original system architects have moved on to other projects. Therefore, cleaning administrative data is often a task that goes on for many years as more is learned about the source and maintenance of the particular database.
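The point about grantee attributes recorded only at case opening can be made operational by measuring how old each recorded value is. The sketch below is illustrative only: the field names `opened`, `marital_status`, and `education` are hypothetical, and the 12-month cutoff is an arbitrary assumption rather than an established standard. It computes calendar months since case opening and lists the opening-only fields whose values exceed the cutoff, so that analyses can flag or stratify on them.

```python
from datetime import date

# Hypothetical fields assumed to be recorded only when the case opens.
OPENING_ONLY_FIELDS = ("marital_status", "education")


def months_since(opened, as_of):
    """Calendar months between the case-opening date and as_of."""
    return (as_of.year - opened.year) * 12 + (as_of.month - opened.month)


def stale_fields(case, as_of, max_age_months=12):
    """List opening-only fields whose recorded value is older than the cutoff.

    max_age_months is an illustrative cutoff, not an established standard.
    """
    if months_since(case["opened"], as_of) <= max_age_months:
        return []
    return [f for f in OPENING_ONLY_FIELDS if case.get(f) not in (None, "")]


case = {"opened": date(1999, 3, 15), "marital_status": "married", "education": "HS"}
print(months_since(case["opened"], date(2000, 6, 1)))  # 15
print(stale_fields(case, date(2000, 6, 1)))  # ['marital_status', 'education']
```

An old value is not necessarily wrong; the flag only marks where the risk of inaccuracy is higher and where verification against another source is most worthwhile.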

In the following paragraphs, we provide some of the strategies and methods that we use to assess and address issues of data quality in the use of administrative data. The most basic, and perhaps best, of these is to compare the data with another source on the same event or individual. We will end with a discussion of that strategy.
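As a preview of that comparison strategy, the basic computation is a cross-tabulation of the same indicator from two sources on matched records. The sketch below assumes both sources have already been linked on a clean key, here a hypothetical (person ID, quarter) pair, and that the second source (for example, some independent record of earnings) is likewise hypothetical. It reports the agreement rate among matched records, the full two-by-two table, and how many records could not be matched.

```python
from collections import Counter


def compare_sources(program, external):
    """Compare a yes/no indicator recorded in two sources.

    program, external: dicts mapping a match key, such as a hypothetical
    (person_id, quarter) pair, to True/False. Only keys found in both
    sources are compared; the rest are counted as unmatched.
    """
    matched = program.keys() & external.keys()
    table = Counter((program[k], external[k]) for k in matched)
    agree = table[(True, True)] + table[(False, False)]
    return {
        "matched": len(matched),
        "table": table,
        "agreement": agree / len(matched) if matched else float("nan"),
        "only_program": len(program.keys() - external.keys()),
        "only_external": len(external.keys() - program.keys()),
    }


program = {("P1", "2000Q1"): False, ("P2", "2000Q1"): True,
           ("P3", "2000Q1"): False, ("P4", "2000Q1"): False}
external = {("P1", "2000Q1"): True, ("P2", "2000Q1"): True, ("P3", "2000Q1"): False}
result = compare_sources(program, external)
print(result["agreement"], result["table"][(False, True)])  # 0.6666666666666666 1
```

Disagreement shows only that the sources differ, not which one is wrong, and the linkage step itself can introduce errors, so unmatched counts deserve as much attention as the agreement rate.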
