Northwest Tribes have faced significant barriers in obtaining public health data that are accurate, timely, and relevant to their communities. This lack of community-level health data has contributed to Tribal health disparities, as it has hampered Tribes’ efforts to identify public health priorities, seek resources for health promotion and disease prevention initiatives, and monitor changes in community health status over time.

Tribes are sovereign nations and have an inherent right to control the collection, ownership, and use of their own data. Data are an important tool used by sovereign nations to inform policies and decision-making and are a key to building tribal health systems that protect and promote health for current and future generations.

The IDEA-NW project works to address data challenges and improve tribal data sovereignty by increasing Northwest Tribes’ access to accurate public health data for their communities. We work to reduce the misclassification of AI/AN people in a wide range of public health data systems, including cancer registries, vital records, hospital discharge data, and communicable disease systems. We are also working to modernize NPAIHB’s data reporting systems by developing a Northwest Tribal Data Hub to provide Tribes with easy access to regional, state, and community-level public health data.

The IDEA-NW project is supported by grants from the Indian Health Service, Centers for Disease Control and Prevention, and other funders.


  • Work with Tribal, state, and federal partners to increase Tribes’ access to public health data.
  • Conduct record linkages with state and federal systems to improve the accuracy of data for Northwest Tribes.
  • Develop the Northwest Tribal Data Hub to provide Tribes with easy, on-demand access to reliable regional, state, and community-level public health data.

Linkage Resources


Tribal health leaders have long recognized that complete and accurate race data are a necessary first step in addressing health disparities experienced by American Indians/Alaska Natives (AI/AN). Numerous studies have shown a high prevalence of race misclassification for AI/AN in data sources such as vital statistics and cancer registries. This misclassification results in underestimated morbidity and mortality, hampering public health decision-making and the appropriate allocation of disease control resources.

Using the most complete listing of AI/AN currently available (a roster of individuals who have registered at tribal, Indian Health Service, and urban Indian clinics in the Northwest), we perform record linkages with health data systems in Idaho, Oregon, and Washington. The prevalence of misclassified and missing race data in this region can range from 30% to 60%, which, if left uncorrected, would lead to significant underestimates of the burden of health outcomes for this population. Our work directly benefits both state partners and tribes by improving the accuracy of race data in state surveillance data systems and by providing more accurate and complete health status data to Northwest tribal communities. To date, linkages have been conducted with state cancer registries, death records, hospital discharge data, STD surveillance systems, and several tribe-specific projects. This work is widely supported by tribal health leaders and our state partners.


What is record linkage?

Record linkage is the process of comparing records across data sets to identify individuals contained in both.  In Indian Country, one common example involves taking a data source with accurate information about American Indian/Alaska Native ancestry and linking it with a second dataset to improve the quality of race information in the second database.

Linkages can supplement or validate data across data sets, as well as identify duplicate records for the same individual within one data set. Common examples include merging death information from a vital statistics file with cancer information from a central cancer registry, or linking data from death certificates, inpatient hospitalizations, and law enforcement citations to generate crash and injury reports, as in NHTSA’s CODES Program. Likewise, the detection of duplicates is a fundamental requirement for the accuracy and validity of event counts in any disease registry.
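As a rough illustration of the two uses described above, the sketch below links a roster to a second data set on exact identifying fields and also flags duplicate records within one data set. All records, field names (`last`, `first`, `dob`, `race`, `id`), and helper names are hypothetical and made up for this example; real linkages use more fields and more forgiving comparison rules.

```python
from collections import defaultdict

def link_key(rec):
    """Build a comparison key from normalized name and date of birth."""
    return (rec["last"].strip().upper(), rec["first"].strip().upper(), rec["dob"])

# Hypothetical roster with reliable AI/AN information.
roster = [
    {"last": "Hoopes", "first": "Ann", "dob": "1970-01-15"},
    {"last": "Smith", "first": "Joe", "dob": "1985-06-02"},
]

# Hypothetical second data set whose race field may be wrong or missing.
registry = [
    {"id": 1, "last": "HOOPES", "first": "Ann ", "dob": "1970-01-15", "race": "White"},
    {"id": 2, "last": "Jones", "first": "Pat", "dob": "1990-03-09", "race": "Unknown"},
    {"id": 3, "last": "Hoopes", "first": "Ann", "dob": "1970-01-15", "race": "White"},
]

def correct_race(roster, registry):
    """Return copies of registry records, setting race to AI/AN when a record matches the roster."""
    roster_keys = {link_key(r) for r in roster}
    corrected = []
    for rec in registry:
        rec = dict(rec)  # copy so the original record is left unchanged
        if link_key(rec) in roster_keys:
            rec["race"] = "AI/AN"
        corrected.append(rec)
    return corrected

def find_duplicates(records):
    """Group record ids that share the same key within one data set."""
    groups = defaultdict(list)
    for rec in records:
        groups[link_key(rec)].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

corrected = correct_race(roster, registry)
duplicates = find_duplicates(registry)
```

Here records 1 and 3 match the roster and would have their race updated, and the same two records are reported as a possible duplicate pair. Record 2 is left as-is.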

Linkages fall into two main categories: deterministic and probabilistic. Deterministic linkage looks for exact matches on selected data fields of two records. This is a fairly straightforward process, but it may miss many true matches if there are coding errors or missing data. Probabilistic matching has several advantages over exact matching methods, such as the ability to:

  • Account for coding differences between the two files, such as the use of nicknames, middle initials vs. full middle names, and transposed digits in a social security number
  • Account for both the likelihood that two records represent the same person (sensitivity), and the likelihood that they do not (specificity)
  • Assign score weights depending on the frequency of a value (e.g., your dataset contains many “Smiths” but few “Hoopes” so a match on “Hoopes” would be weighted higher)
  • Allow for phonetic name matching (e.g., NYSIIS and Soundex)
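Two of the ideas in the list above, phonetic name matching and frequency-based weights, can be sketched in a few lines. The `soundex` function follows the standard American Soundex rules. The weight function uses log-ratio agreement and disagreement weights in the style of the Fellegi-Sunter model, which is a common basis for probabilistic linkage; this is an assumption for illustration, not a description of how Match*Pro or Link Plus compute scores. The surname counts and the value `m=0.95` are made up.

```python
import math
from collections import Counter

# Standard American Soundex letter-to-digit groups.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")]
    for ch in letters
}

def soundex(name):
    """Return the four-character Soundex code for a name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel is kept
    return (result + "000")[:4]

def surname_weight(a, b, freq, m=0.95):
    """Score a surname comparison; rarer agreeing values earn higher weights.

    m is the assumed probability that surnames agree when two records truly
    belong to the same person. u, the chance of agreement between unrelated
    records, is estimated from how common the values are in `freq`.
    """
    total = sum(freq.values())
    a, b = a.upper(), b.upper()
    if a == b:
        u = max(freq.get(a, 0), 1) / total  # frequency of this specific value
        return math.log2(m / u)
    u = sum((n / total) ** 2 for n in freq.values())  # chance two random records agree
    return math.log2((1 - m) / (1 - u))

# Hypothetical surname counts: many Smiths, one Hoopes.
FREQ = Counter({"SMITH": 40, "JONES": 9, "HOOPES": 1})

print(soundex("Smith"), soundex("Smyth"))  # S530 S530
print(surname_weight("Hoopes", "Hoopes", FREQ) > surname_weight("Smith", "Smith", FREQ))  # True
```

In a full linkage, weights like these would be computed for each compared field and summed, and the total compared against thresholds to sort record pairs into matches, non-matches, and pairs needing manual review.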

For more detailed information about linkage concepts, see the Linkage Concepts PowerPoint in the “Resources” section.




Getting Started

The process of pursuing record linkages varies across states, departments, and institutions, but here we offer some tools that may help you get started. First, contact the manager of the data source you wish to link with to discuss the project and identify any approval processes the organization requires. We generally develop a simple IRB protocol, which often qualifies for expedited review, and negotiate a data sharing agreement with the agency we’ll be linking with. Confidentiality pledges can be used to specify the data handling and disclosure protocols required of staff with access to confidential data. Examples of these documents are provided below.

Linkage Concepts

This PowerPoint presentation covers the basic concepts of deterministic and probabilistic linkage.

Software Options

We recommend using the Match*Pro software for conducting probabilistic linkages and de-duplications. The software is free, easy to use, and efficient. A free training on utilizing earlier versions of Match*Pro is available through the North American Association of Central Cancer Registries.

Another probabilistic linkage software option is CDC’s Link Plus.

Additional Resources

Here is an example of an IRB Protocol describing linkage methods using Link Plus Software.

This document contains a sample template for a Data Sharing Agreement (may also be called Data Use or Data Exchange Agreement) and Use and Disclosure of Client Information. Within the data sharing agreement there are important areas to consider for inclusion. At a minimum, the agreement should specify the following: parties involved, including contact information; the purpose or need for the data sharing agreement; nature of the data to be collected; access and confidentiality of data; how the data is to be used; how and in what situations the agreement can be severed by either party; and relevant legal authorities (tribal, state, national). For more support with developing data sharing partnerships and agreements, visit NativeDATA.

A Confidentiality Pledge may be used to outline the rules for internal access to a data set containing direct personal identifiers, such as a patient registration list or tribal enrollment list, which may be used for record linkages. Technical details of data exchange between multiple parties should be specified separately in a data sharing agreement.

Please contact us with questions or comments on this material, using the “Program Contacts” below.

Program Contacts

For questions about the IDEA-NW Project, contact the staff:

Sujata Joshi, MSPH, Project Director

Victoria Warren-Mears, PhD, RD, Principal Investigator; EpiCenter Director

