Assessing the Feasibility of Creating and Maintaining a National Registry of Child Maltreatment Perpetrators: Research Report. APPENDIX C. Guidelines and Procedures for Preparing a Data File for the Prevalence Study

Form Approved

OMB No. 0990-0366

Exp. Date 01/31/2014

Feasibility Study for a

National Registry of Child Maltreatment Perpetrators

Guidelines and Procedures

for Preparing a Data File for the Prevalence Study

Prepared for:

The Office of Assistant Secretary for Planning and Evaluation

U.S. Department of Health & Human Services

200 Independence Avenue, S.W.

Washington, D.C. 20201

Prepared by:

Walter R. McDonald & Associates, Inc.

2720 Gateway Oaks Drive, Suite 250

Sacramento, CA 95833

January 2010

Table of Contents

 

1.0 BACKGROUND

2.0 DATA SUBMISSION OVERVIEW

3.0 DOWNLOADING THE PERP FILE AND THE PENCODER SOFTWARE APPLICATION

4.0 SPECIFICATION FOR CREATING THE DATA FILES

4.1 Perpetrator Extract File (PErp File)

4.2 Unencoded Perpetrator State Dataset (UPSD) File

4.3 Perpetrator Encoding (Pencoder) Software Application

4.4 Perpetrator State Dataset (PSD) File

5.0 USING THE PENCODER SOFTWARE APPLICATION

5.1 System Requirements

5.2 Installing the Pencoder on a Local Computer

5.4 Launching the Pencoder

5.5 Location of Input and Output Files

6.0 SUBMITTING FILES TO THE REGISTRY PREVALENCE STUDY

Appendix A. Locating perpetrators in the State system

 

1.0 BACKGROUND

The Office of the Assistant Secretary for Planning and Evaluation (ASPE), U.S. Department of Health and Human Services, is conducting a study to assess the feasibility of developing and maintaining a National Registry of Child Maltreatment Perpetrators, as mandated under the Adam Walsh Child Protection and Safety Act of 2006. Walter R. McDonald & Associates, Inc. (WRMA) has been contracted to conduct the study. The study has two parts: a Key Informants Survey and a Prevalence Study.

The purpose of the Prevalence Study is to estimate how frequently child maltreatment perpetrators have substantiated investigations in multiple States. To make these estimates possible, States are asked to provide the date of birth and the encoded (and therefore not identifiable) name of every person found to be a substantiated perpetrator during the previous 5 years (FFY 2005-2009). The Federal fiscal year (FFY) begins on October 1st and ends on September 30th of the next calendar year. The States are therefore asked to provide information on perpetrators substantiated between October 1st 2004 and September 30th 2009.
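The fiscal-year rule above can be expressed as a short check. The following Python sketch is illustrative only; the function names are hypothetical and are not part of the study materials.

```python
from datetime import date


def federal_fiscal_year(d: date) -> int:
    """Return the Federal fiscal year (FFY) containing date d.

    An FFY begins on October 1 and is named for the calendar year
    in which it ends on September 30.
    """
    return d.year + 1 if d.month >= 10 else d.year


def in_study_period(d: date) -> bool:
    """True if d falls in FFY 2005-2009 (Oct 1, 2004 through Sep 30, 2009)."""
    return 2005 <= federal_fiscal_year(d) <= 2009


print(federal_fiscal_year(date(2004, 10, 1)))  # first day of FFY 2005
print(in_study_period(date(2009, 9, 30)))      # last day of the study period
print(in_study_period(date(2009, 10, 1)))      # first day of FFY 2010
```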

The States will be assisted in this task in many ways. Each State will be assigned a prevalence study technical team liaison who will provide technical assistance throughout the data preparation and submission process.

 

This document provides a description of the process for preparing and submitting data for the prevalence study.

2.0 DATA SUBMISSION OVERVIEW

States participating in the study are requested to provide date of birth and encoded (and therefore not identifiable) names for all persons found to be substantiated perpetrators during the previous 5 years (FFY 2005-2009). Exhibit 2-1, Data Preparation and Submission Process, graphically depicts the activities that comprise the data collection processes. As can be seen in the diagram, States will be assisted in several ways to reduce the effort in participating.

States will receive from the study contractor, WRMA, the Perpetrator Extract File (PErp File). This file contains all perpetrator IDs (associated with substantiated maltreatments) for the State, for the last five years, which have been submitted to NCANDS. The perpetrator IDs are unduplicated within year. The data will be extracted from the NCANDS Child Files. Section 3.0, Downloading the PErp File and the Pencoder Application, provides step-by-step instructions for downloading the PErp file for your State.

The PErp file that the State receives (in TXT format) will consist of records with the State abbreviation, submission year, perpetrator ID, report date, and county of report. Refer to the specification in Exhibit 4.1.1, Data File Specification for the PErp File. The perpetrator IDs are unduplicated within each submission year, but not across the 5 years. If a perpetrator has abused more than one child or has more than one report associated with him/her, the latest report date during the reporting period is used for the PErp file.
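To make the unduplication rule concrete, the following Python sketch keeps one row per submission year and perpetrator ID, retaining the latest report date. It is an illustration under assumed inputs (tuples of submission year, perpetrator ID, and an mmddyyyy report date) and does not reproduce the actual NCANDS extraction; the function name and sample values are hypothetical.

```python
from datetime import datetime


def unduplicate_within_year(rows):
    """Keep one row per (submission year, perpetrator ID).

    rows: iterable of (subyr, perp_id, rptdt) tuples, with rptdt as an
    mmddyyyy string. The row with the latest report date is retained.
    """
    latest = {}
    for subyr, perp_id, rptdt in rows:
        key = (subyr, perp_id)
        when = datetime.strptime(rptdt, "%m%d%Y")
        if key not in latest or when > latest[key][0]:
            latest[key] = (when, rptdt)
    return sorted(
        (subyr, pid, rptdt) for (subyr, pid), (_, rptdt) in latest.items()
    )


sample = [
    ("2007", "00004356ABDF", "11202006"),
    ("2007", "00004356ABDF", "03152007"),  # later report in the same year
    ("2008", "00004356ABDF", "01102008"),  # same ID in another year is kept
]
for row in unduplicate_within_year(sample):
    print(row)
```

Note that the same perpetrator ID can still appear once in each submission year, which is why a later step unduplicates across the 5 years.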

Upon receiving the PErp File, the State decrypts each perpetrator ID and identifies the corresponding perpetrator in the State information system. Once the perpetrator is identified, the State encrypted perpetrator ID, first initial, last name, and date of birth are appended to the PErp file to create a new file. This file is called the Unencoded Perpetrator State Dataset (UPSD) file. Refer to the specifications in Exhibit 4.2.1, Data File Specification for the UPSD File. The last name of the perpetrator in the UPSD file is not encoded. The UPSD file is the input for the encoding software. Appendix A provides examples of how perpetrators can be located in the State system using the PErp file.

In order to provide a file without the true name of the perpetrator, the State will also receive an encoding software application called the Pencoder. The State uses the UPSD file in conjunction with the Pencoder. The Pencoder will encode the last name of each perpetrator in the UPSD file using the New York State Identification and Intelligence System (NYSIIS) algorithm. The Pencoder will also validate the input file to make sure that all fields conform to the specifications. Section 3.0, Downloading the PErp File and the Pencoder Application, provides step-by-step instructions for downloading the Pencoder software application for your State.

The Pencoder software application creates two output files: the Perpetrator State Dataset (PSD) file and the Results Report file. The PSD file contains all information in the UPSD file, but with the last names encoded. Refer to the specifications in Exhibit 4.4.1, Data File Specification for the PSD File. The Results Report file contains the results of the validation of the data for conformity with the PSD file specifications. Both files should be submitted to the study team through the secure web site, as described in section 6.0. The perpetrator information in the PSD file will be linked to the existing NCANDS Child Files to create the database for the prevalence study.

3.0 DOWNLOADING THE PERP FILE AND THE PENCODER SOFTWARE APPLICATION

The PErp file and the Pencoder software application will be available for States to download on a secure website called the Collaborator. Each State will have its own work area on the Collaborator and will not have access to other States' information. The State user can download the compressed (zip) PErp file and the Pencoder package from the Collaborator as follows:

  • Go to www.wrma.com

  • Click on "Extranet" on the top menu. A new page for the Collaborator will open.
  • Enter the login information provided to you. Contact your registry technical team liaison if you have not received this information.
  • Click on Registry_<State Name>
  • Click on "Documents"
  • Click on "Registry"
  • The "<State Name> PErp File.zip" is the PErp file. Select "Download Document" from the dropdown list on the right and save the file into the C:\Registry folder on the State computer.
  • The "Registry Pencoder <Date>.zip" is the entire Pencoder package. Select "Download Document" from the dropdown list on the right and save the file into the C:\Registry folder on the State computer.

 

4.0 SPECIFICATION FOR CREATING THE DATA FILES

This section provides greater detail as to how the State creates the Perpetrator State Dataset (PSD) file. This includes describing the structure of the file, the data records in the file, the data elements in the records, and the procedures used for constructing the data file. Each State receives the PErp file and submits the Perpetrator State Dataset file.

4.1 Perpetrator Extract File (PErp File)

This file contains all perpetrator IDs (associated with substantiated maltreatments) for a State for the last five years (FFY 2005 - 2009). The Federal fiscal year begins October 1st and ends on September 30th of the next calendar year. The data have been extracted from the NCANDS Child Files. The perpetrator IDs have been unduplicated within each submission year. A single record consists of the State abbreviation, submission year, perpetrator ID, most recent report date, and the most recent report county. This file is submitted to the State in text format (TXT). The file is compressed (zip) to enable faster download.

Exhibit 4.1.1 Data File Specification for the PErp File

Record Example: CT200700004356ABDF12052008040 12051990

FIELD # (POSITION) | LONG NAME (SHORT NAME)          | FIELD TYPE & CODES (Example)  | FIELD LENGTH
-------------------|---------------------------------|-------------------------------|-------------
1 (1-2)            | STATE/TERRITORY (STATEAB)       | ALPHABETIC (CT)               | 2
2 (3-6)            | SUBMISSION YEAR (SUBYR)         | NUMERIC (2007)                | 4
3 (7-18)           | NCANDS PERPETRATOR ID (NPERPID) | ALPHANUMERIC (00004356ABDF)   | 12
4 (19-26)          | REPORT DATE (RPTDT)             | NUMERIC [mmddyyyy] (12052008) | 8
5 (27-29)          | REPORT COUNTY (RPTCNTY)         | NUMERIC (040)                 | 3
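The fixed-width layout in Exhibit 4.1.1 can be read by slicing each record at the listed positions. The following Python sketch is a minimal illustration; the class and function names are hypothetical, and the sample record is assembled from the field examples in the exhibit.

```python
from typing import NamedTuple


class PerpRecord(NamedTuple):
    stateab: str
    subyr: str
    nperpid: str
    rptdt: str
    rptcnty: str


# (short name, first position, last position), 1-based and inclusive,
# as listed in Exhibit 4.1.1.
PERP_LAYOUT = [
    ("stateab", 1, 2),
    ("subyr", 3, 6),
    ("nperpid", 7, 18),
    ("rptdt", 19, 26),
    ("rptcnty", 27, 29),
]


def parse_perp_line(line: str) -> PerpRecord:
    """Split one PErp record into its five fixed-width fields."""
    line = line.rstrip("\r\n")
    if len(line) < 29:
        raise ValueError(f"record has {len(line)} characters, expected at least 29")
    fields = {name: line[start - 1:end] for name, start, end in PERP_LAYOUT}
    return PerpRecord(**fields)


record = parse_perp_line("CT" + "2007" + "00004356ABDF" + "12052008" + "040")
print(record)
```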

4.2 Unencoded Perpetrator State Dataset (UPSD) File

Upon receiving the PErp File, the State decrypts each perpetrator ID and identifies the corresponding perpetrator in the State information system. Once the perpetrator is identified, the State encrypted perpetrator ID, first initial, last name, and the date of birth are appended to the PErp file to create a new file. This file is called the Unencoded Perpetrator State Dataset (UPSD) file. The State encrypted perpetrator ID (STPERPID) is the encrypted version of the ID of the perpetrator in the State information system. In most cases this is the same as the perpetrator ID in the PErp file. The State encrypted perpetrator ID will be used to unduplicate perpetrators across all 5 years. The last name of the perpetrator in the UPSD file is not encoded. The UPSD file is the input for the encoding software application. This file is in text format (TXT).

 

Exhibit 4.2.1 Data File Specification for the UPSD File

Record Example: CT200700004356ABDF1220200704000004356ABDFDSRRIA 12051990

Special Instructions:

  • The State encrypted perpetrator ID should be left-filled with zeroes, as needed, to the 12 character length. For example, a perpetrator ID of "4356ABDF" is invalid. It should be reported as "00004356ABDF".
  • The report date and the perpetrator date of birth must be in month-day-year (mmddyyyy) format.
  • The unencoded first initial should be in caps. Ex: D for David.
  • The unencoded last name should be right-filled with spaces, as needed, to the 50 character length.
  • If data for a field are unavailable, the field should be filled with blank spaces in accordance with the field length.
  • Do not provide information on unknown, anonymous, and dummy perpetrators. In such cases the first initial (FIRSTINI), last name (LASTNM), and the date of birth (PERPDOB) fields should be blank spaces in accordance with the field length.
  • If a perpetrator cannot be located in the State system, the State perpetrator ID (STPERPID), first initial (FIRSTINI), last name (LASTNM), and the date of birth (PERPDOB) fields should be blank spaces in accordance with the field length. Example: The perpetrator substantiation may have been overturned.

FIELD # (POSITION) | LONG NAME (SHORT NAME)                    | FIELD TYPE & CODES (Example)      | FIELD LENGTH
-------------------|-------------------------------------------|-----------------------------------|-------------
1 (1-2)            | STATE/TERRITORY (STATEAB)                 | ALPHABETIC (CT)                   | 2
2 (3-6)            | SUBMISSION YEAR (SUBYR)                   | NUMERIC (2007)                    | 4
3 (7-18)           | NCANDS PERPETRATOR ID (NPERPID)           | ALPHANUMERIC (00004356ABDF)       | 12
4 (19-26)          | REPORT DATE (RPTDT)                       | NUMERIC [mmddyyyy] (12052008)     | 8
5 (27-29)          | REPORT COUNTY (RPTCNTY)                   | NUMERIC (040)                     | 3
6 (30-41)          | STATE ENCRYPTED PERPETRATOR ID (STPERPID) | ALPHANUMERIC (00004356ABDF)       | 12
7 (42)             | UNENCODED FIRST INITIAL (FIRSTINI)        | ALPHABETIC (D)                    | 1
8 (43-92)          | UNENCODED LAST NAME (LASTNM)              | ALPHABETIC (SMITH, smith, Smith)  | 50
9 (93-100)         | PERPETRATOR DATE OF BIRTH (PERPDOB)       | NUMERIC [mmddyyyy] (12051990)     | 8
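The special instructions and field positions in Exhibit 4.2.1 can be combined into a small helper that appends the State fields to a PErp record. The following Python sketch is illustrative only: the function name and the sample values are hypothetical, and the State's own extraction process may differ.

```python
def build_upsd_line(perp_line, stperpid="", firstini="", lastnm="", perpdob=""):
    """Append the State fields to a 29-character PErp record.

    Returns a 100-character UPSD record. Fields left empty are written as
    blank spaces, as the special instructions require for unavailable data
    and for perpetrators that cannot be located.
    """
    base = perp_line.rstrip("\r\n")
    if len(base) != 29:
        raise ValueError(f"expected a 29-character PErp record, got {len(base)}")
    if len(stperpid) > 12 or len(lastnm) > 50 or len(perpdob) not in (0, 8):
        raise ValueError("a field does not fit its fixed width")

    stperpid_field = stperpid.rjust(12, "0") if stperpid else " " * 12  # left-fill with zeroes
    firstini_field = firstini[:1].upper() or " "                        # first initial in caps
    lastnm_field = lastnm.ljust(50)                                     # right-fill with spaces
    perpdob_field = perpdob or " " * 8                                  # mmddyyyy or blank
    return base + stperpid_field + firstini_field + lastnm_field + perpdob_field


perp = "CT" + "2007" + "00004356ABDF" + "12052008" + "040"
located = build_upsd_line(perp, "4356ABDF", "d", "Smith", "12051990")
not_located = build_upsd_line(perp)
print(len(located), repr(located[29:42]))
print(len(not_located), repr(not_located[29:]))
```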

 

4.3 Perpetrator Encoding (Pencoder) Software Application

The last name information in the UPSD file is encoded by the Pencoder software application. The output file from the Pencoder will be similar in format to the input file. However, the last name information will be encoded using the NYSIIS algorithm. The Pencoder software is provided to the State along with the PErp file.

The Pencoder will also validate the input file to make sure that all fields conform to the specifications. The validation rules enforced in the Pencoder are as follows:

  • A valid State code should be entered in the State abbreviation (STATEAB) field. If invalid data are found, the entire record is removed.

  • The report date (RPTDT) field should have valid month, day and year values. If invalid data are found, the RPTDT field is blanked.
  • The report county (RPTCNTY) field should be three characters in length. Ex: 029, 001. If invalid data are found, the RPTCNTY field is blanked.
  • The NCANDS perpetrator ID (NPERPID) cannot be blank. If the field is blank, the entire record is removed.
  • The NCANDS perpetrator ID (NPERPID) field should be alphanumeric and 12 characters in length. If invalid data are found, the entire record is removed.
  • The State perpetrator ID (STPERPID) field should be alphanumeric and 12 characters in length. If invalid data are found, the entire record is removed. The STPERPID field can be blank.
  • The first initial (FIRSTINI) field should be alphabetic. If invalid data are found, the FIRSTINI field is blanked.
  • The last name (LASTNM) field should be alphabetic. The only special characters allowed are the hyphen and the apostrophe. If invalid data are found, the entire record is removed.
  • The perpetrator date of birth (PERPDOB) field should have valid month, day and year values. If invalid data are found, the PERPDOB field is blanked.
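States may find it useful to pre-check a UPSD file before running it through the Pencoder. The following Python sketch applies the rules listed above to a single record. It is not the Pencoder's implementation, which this document does not show. Several points are assumptions: the list of valid State codes (50 States, DC, and PR), the reading of the county rule as "three digits", the acceptance of a blank last name (as used for unknown perpetrators), and the rejection of embedded spaces in last names.

```python
import re
from datetime import datetime

# Assumption: valid codes are the 50 States plus DC and PR.
VALID_STATES = set(
    "AL AK AZ AR CA CO CT DE DC FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN "
    "MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA PR RI SC SD TN TX UT VT VA "
    "WA WV WI WY".split()
)
ID_PATTERN = r"[A-Za-z0-9]{12}"


def _valid_date(text):
    """True if text is an 8-digit mmddyyyy string naming a real date."""
    if not re.fullmatch(r"\d{8}", text):
        return False
    try:
        datetime.strptime(text, "%m%d%Y")
        return True
    except ValueError:
        return False


def validate_upsd_line(line):
    """Return the cleaned 100-character record, or None if it is removed."""
    line = line.rstrip("\r\n").ljust(100)[:100]
    stateab, subyr, nperpid = line[0:2], line[2:6], line[6:18]
    rptdt, rptcnty, stperpid = line[18:26], line[26:29], line[29:41]
    firstini, lastnm, perpdob = line[41:42], line[42:92], line[92:100]

    # Rules that remove the entire record.
    if stateab not in VALID_STATES:
        return None
    if not re.fullmatch(ID_PATTERN, nperpid):  # also rejects a blank ID
        return None
    if stperpid.strip() and not re.fullmatch(ID_PATTERN, stperpid):
        return None
    if not re.fullmatch(r"[A-Za-z'\-]*", lastnm.rstrip()):
        return None

    # Rules that blank a single field.
    if not _valid_date(rptdt):
        rptdt = " " * 8
    if not re.fullmatch(r"\d{3}", rptcnty):
        rptcnty = " " * 3
    if not re.fullmatch(r"[A-Za-z]", firstini):
        firstini = " "
    if not _valid_date(perpdob):
        perpdob = " " * 8
    return (stateab + subyr + nperpid + rptdt + rptcnty
            + stperpid + firstini + lastnm + perpdob)
```

A record that returns None here would be expected to appear as an error in the Results Report; a record returned with blanked fields indicates data that should be corrected in the UPSD file before resubmission.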

 

Any errors found during the validation and encoding process are reported in the Results <date time>.txt file, which is also an output of the Pencoder. States are requested to review this file, fix any errors in the UPSD file, and run the corrected file through the Pencoder again. The registry technical team liaison will be available to provide technical assistance.

More information about using the Pencoder software application is under section 5.0 Using the Pencoder Software Application.

4.4 Perpetrator State Dataset (PSD) File

This file is the output of the Pencoder software and is automatically generated after encoding the last names in the UPSD file. This file should be submitted to the prevalence study. This file is in text format (TXT).

Exhibit 4.4.1 Data File Specification for the PSD File

FIELD # (POSITION) | LONG NAME (SHORT NAME)                    | FIELD TYPE & CODES (Example)  | FIELD LENGTH
-------------------|-------------------------------------------|-------------------------------|-------------
1 (1-2)            | STATE/TERRITORY (STATEAB)                 | ALPHABETIC (CT)               | 2
2 (3-6)            | SUBMISSION YEAR (SUBYR)                   | NUMERIC (2007)                | 4
3 (7-18)           | NCANDS PERPETRATOR ID (NPERPID)           | ALPHANUMERIC (00004356ABDF)   | 12
4 (19-26)          | REPORT DATE (RPTDT)                       | NUMERIC [mmddyyyy] (12052008) | 8
5 (27-29)          | REPORT COUNTY (RPTCNTY)                   | NUMERIC (040)                 | 3
6 (30-41)          | STATE ENCRYPTED PERPETRATOR ID (STPERPID) | ALPHANUMERIC (00004356ABDF)   | 12
7 (42)             | UNENCODED FIRST INITIAL (FIRSTINI)        | ALPHABETIC (D)                | 1
8 (43-92)          | ENCODED LAST NAME (LASTNM)                | ALPHABETIC (SRRIA)            | 50
9 (93-100)         | PERPETRATOR DATE OF BIRTH (PERPDOB)       | NUMERIC [mmddyyyy] (12051990) | 8

5.0 USING THE PENCODER SOFTWARE APPLICATION

The State will receive an encoding software application called the Pencoder. The UPSD file is the input to the Pencoder. The Pencoder will encode the last names of the perpetrators in the UPSD file using the New York State Identification and Intelligence System (NYSIIS) algorithm. The Pencoder will also validate the input file to make sure that all fields conform to the specifications.

5.1 System Requirements

The Pencoder is implemented as a relational database application in Microsoft Access 2003. Users must have Access 2003 or later installed on their computer. The Pencoder application operates in Microsoft Windows XP, Vista, and 7 environments. A single Pencoder Access database file contains all of the system data, programming modules, tables, queries, forms, and reports necessary for the operation of the application. The input to the Pencoder is the Unencoded Perpetrator State Dataset (UPSD) File. The output from the Pencoder is the PSD File along with a report containing the results of the validation and encoding processing. All intermediate data sets created during the processing are contained within the Pencoder Access database. The Pencoder is distributed as a compiled Access file (MDE) along with essential documentation, including this Guidelines document. The distributed size of the Access file is about 8 MB.

The hardware needed to run Pencoder includes: processor speed of 2.0 GHz, 1-2 GB of RAM and sufficient hard drive space to dedicate a gigabyte to the Pencoder database file (smaller datasets would need less hard drive space).

5.2 Installing the Pencoder on a Local Computer
 

The Pencoder is installed by unzipping and extracting the compressed (zip) Pencoder package file:

C:\Registry\Registry Pencoder <Date>.zip

into the user’s C:\Registry folder. When the extraction occurs, a file structure will be created under the C:\Registry\ folder with the following folder structure and file contents being populated:

  • C:\Registry\Pencoder\Input\BMRegistry.txt (BM test file);
  • C:\Registry\Pencoder\Output\Log.txt (Test file);
  • C:\Registry\Pencoder\Registry Pencoder.mde

5.4 Launching the Pencoder

The Pencoder is launched by double clicking on the Access file located at:

C:\Registry\Pencoder\Registry Pencoder.mde

5.5 Location of Input and Output Files

The UPSD input file is typically stored in the C:\Registry\Pencoder\Input\ folder. The Pencoder places the encoded PSD File in the C:\Registry\Pencoder\Output\ folder. The filename of the output PSD File has the format <State>_yyyymmdd_hhmmss.txt, where the date and time in the file name are set to the time the file was opened for writing. The Results Report created by the Pencoder is placed in the same folder, with the filename format Results<State>_yyyymmdd_hhmmss.txt.
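As an illustration of the naming pattern, the following Python sketch builds the two expected filenames for a given State and timestamp. The function name is hypothetical, and the use of the two-letter State abbreviation for <State> is an assumption; the document does not state which form the Pencoder uses.

```python
from datetime import datetime


def output_filenames(state, opened_at):
    """Return (PSD filename, Results Report filename) for one Pencoder run."""
    stamp = opened_at.strftime("%Y%m%d_%H%M%S")  # yyyymmdd_hhmmss
    return f"{state}_{stamp}.txt", f"Results{state}_{stamp}.txt"


psd_name, results_name = output_filenames("CT", datetime(2010, 1, 15, 9, 30, 5))
print(psd_name)
print(results_name)
```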

6.0 SUBMITTING FILES TO THE REGISTRY PREVALENCE STUDY

A State submits the Perpetrator State Dataset (PSD) file and the Results Report file. Files are exchanged between the study team and the States through the Collaborator.
 

The State user can upload the files to the Collaborator as follows:

  • Go to www.wrma.com

  • Click on "Extranet" on the top menu. A new page for the Collaborator will open.
  • Enter the login information provided to you.
  • Click on Registry_<State Name>.
  • Click on "Documents".
  • Click on "Registry".
  • Click on "Add Document" link on the top of the page.
  • On the "Add Document" page, scroll to the bottom of the page and click on the "Add" button.
  • On the "Open Document" dialog screen, browse and select the PSD file. Click on "Open".
  • Click on "Add Document" link on the top of the page.
  • On the "Add Document" page, scroll to the bottom of the page and click on the "Add" button.
  • On the "Open Document" dialog screen, browse and select the Results Report file. Click on "Open".
  • Click on the "Submit" button to submit your file.
  • The registry technical team liaison for your State will review the files you submitted and will contact you with any questions or concerns.

 

Thank you for your participation in the Prevalence Study. If you have any questions please contact the prevalence study liaison assigned to your State. Please contact Brett Brown at bbrown@wrma.com or 301-881-2590 if you do not have the contact information of the liaison.

 

Appendix A. Locating perpetrators in the State system

The PErp file contains all perpetrator IDs (associated with substantiated maltreatments) for a State for the last five years. The data have been extracted from the NCANDS Child Files. The perpetrator IDs have been unduplicated within each submission year. A single record consists of the State abbreviation, year, perpetrator ID, most recent report date, and the most recent report county. This file is provided to the State in text format (TXT).

There are a number of ways to locate perpetrators in the State system using the information in the PErp file. The typical way is to load the data in the PErp file into a database table in the State system. Since the PErp file is available in TXT format and has only 5 fields, the extract, transform, and load process into the database should be straightforward. Once the data are loaded, the perpetrator ID can be used to match against the database table(s) that hold the perpetrator information.
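The load-and-match approach described above can be sketched with an in-memory SQLite database. This is an illustration only: the table names, column names, and sample rows are hypothetical, and a State would substitute its own DBMS and schema. A LEFT JOIN is used so that perpetrators who cannot be located remain in the result with empty State fields.

```python
import sqlite3


def load_and_match(perp_lines, state_perpetrators):
    """Load PErp records into a table and match them on perpetrator ID.

    perp_lines: iterable of 29-character PErp records.
    state_perpetrators: iterable of (perp_id, first_name, last_name, dob)
        rows standing in for the State's perpetrator table (hypothetical).
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE perp_extract "
        "(stateab TEXT, subyr TEXT, nperpid TEXT, rptdt TEXT, rptcnty TEXT)"
    )
    con.execute(
        "CREATE TABLE state_perpetrator "
        "(perp_id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT, dob TEXT)"
    )
    # Slice each record at the positions given in Exhibit 4.1.1.
    con.executemany(
        "INSERT INTO perp_extract VALUES (?, ?, ?, ?, ?)",
        [(l[0:2], l[2:6], l[6:18], l[18:26], l[26:29]) for l in perp_lines],
    )
    con.executemany(
        "INSERT INTO state_perpetrator VALUES (?, ?, ?, ?)", state_perpetrators
    )
    rows = con.execute(
        """
        SELECT e.stateab, e.subyr, e.nperpid, e.rptdt, e.rptcnty,
               s.perp_id, s.first_name, s.last_name, s.dob
        FROM perp_extract AS e
        LEFT JOIN state_perpetrator AS s ON s.perp_id = e.nperpid
        ORDER BY e.subyr, e.nperpid
        """
    ).fetchall()
    con.close()
    return rows


matches = load_and_match(
    ["CT200700004356ABDF12052008040", "CT200800009999ZZZZ01152008001"],
    [("00004356ABDF", "David", "Smith", "12051990")],
)
for row in matches:
    print(row)
```

Rows whose State columns come back empty correspond to perpetrators that could not be located; in the UPSD file those fields are written as blank spaces.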

If the encrypted perpetrator ID (submitted for NCANDS) is not stored in the State system, then the PErp file needs to be processed to obtain the unencrypted perpetrator ID. Once the unencrypted perpetrator ID is obtained, the entire PErp file with the unencrypted perpetrator ID can be loaded into a database table, and matches can be made on the perpetrator ID as described above.

Another way to locate perpetrators is to extract the required information for all perpetrators for the last five years from the State system. This extract file can be used to match against the PErp file. The UPSD file can be created by combining certain fields from both the files. The matching process might be inefficient as the perpetrator extract from the State system may be large. The use of a Database Management System (DBMS) is preferred.
