Editor’s Note: This is the last in a series of posts on health and health care disparities that Health Affairs Blog is publishing in conjunction with the new March/April issue of Health Affairs on Disparities: Expanding The Focus, published with support from the Robert Wood Johnson Foundation. Brian Smedley, Richard Epstein, Dora Hughes, and Tom Miller contributed earlier posts in the series.

Pay-for-performance (P4P) has gained prominence as a way to incentivize quality improvement, and more recently has come to be seen as a potential approach for reducing racial and ethnic disparities in health care. Casalino and colleagues and Chien and colleagues have each raised significant philosophical concerns about the potential impact of current P4P programs on racial and ethnic disparities. For example, P4P could result in minority patients being less likely to get care because they may be perceived as having the potential to lower quality scores. Casalino and coauthors suggest several strategies for ameliorating the potential negative effects of P4P on disparities, including rewarding both absolute quality and improvement over time and using risk adjustment techniques to ensure the most accurate measurement of quality.

The recent development of a program in Massachusetts has highlighted a set of additional practical considerations for designing P4P programs that specifically target disparities reduction. The program, mandated as part of the state’s 2006 comprehensive healthcare reform bill, focuses on improving the quality of inpatient care for the state’s Medicaid program, and includes a legislatively mandated provision requiring the use of measures to document the reduction of racial and ethnic disparities. To the best of our knowledge, this is the only P4P program with such a provision currently operating in the country.

One of the first major issues to arise in Massachusetts was determining what set of measures should be used for evaluating disparities reduction. Because there are few within-hospital disparities in the Hospital Quality Alliance measures – the most common measure set used by hospitals – these measures are impractical for use in a disparities reduction program. In contrast, while studies of patient experiences with care consistently show poorer experiences among minority patients, it is unclear whether this reflects true differences, perceived differences, or differences in the psychometric properties of the measurement scales being used.

As a result, structural measures related to care for minority patients will play a large role, at least initially, in the Massachusetts effort. Structural measures include indicators such as translating patient education and other materials into languages other than English, ensuring that interpreters meet minimum performance standards, and ensuring that patient satisfaction surveys assess cultural competence in care provision. Improvements in structural measures such as these may be a precondition for reducing disparities in care, but there is a need for the development of objective evaluation standards and for the clearer specification of such measures – for example, allowing hospitals to provide more than a simple dichotomous response, or to report the length of time for which the activity has been occurring.

The second issue is how to design a program that accounts for the size of the minority patient population in each hospital, in order to avoid penalizing hospitals that serve more minorities. In order to ensure similar disparities reductions, hospitals with more minority patients may need to conduct outreach or change services for more patients and at greater cost than hospitals with smaller minority patient populations. In contrast, hospitals serving small minority patient populations may have their quality measures disproportionately affected by what happens in the process of caring for very few patients. If P4P programs are designed to be budget neutral, with money being withheld from low performers and used to pay high performers, an appropriate adjustment for the size of each hospital’s minority patient population is crucial to ensure a level playing field.
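To make the small-numbers problem concrete, here is a minimal sketch in Python. The hospital sizes and the 80 percent pass rate are hypothetical figures chosen only for illustration; they are not drawn from the Massachusetts program, and the binomial standard error is a textbook approximation rather than the method the state uses.

```python
import math


def rate_swing(n_patients: int) -> float:
    """Percentage-point change in a pass rate when one patient's result flips."""
    if n_patients <= 0:
        raise ValueError("n_patients must be positive")
    return 100.0 / n_patients


def standard_error(rate: float, n_patients: int) -> float:
    """Binomial standard error of a pass rate, in percentage points."""
    if n_patients <= 0:
        raise ValueError("n_patients must be positive")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    return 100.0 * math.sqrt(rate * (1.0 - rate) / n_patients)


# Hypothetical hospitals: one with 20 minority patients, one with 2,000.
for n in (20, 2000):
    print(
        f"n={n}: one patient moves the rate by {rate_swing(n):.2f} points; "
        f"standard error at an 80% pass rate is {standard_error(0.8, n):.2f} points"
    )
```

Under these assumptions, a single patient shifts the small hospital's score by five percentage points, while the same event barely registers at the large hospital. Any size adjustment would need to account for this kind of volatility as well as for the higher outreach costs that hospitals with large minority populations face.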

Third, the importance of risk adjusting for patients’ health and socioeconomic status when assessing disparities in quality has been addressed by Casalino and colleagues. However, such strategies typically neglect other crucial factors such as differences in health literacy, attitudes and beliefs about health, and lifelong deprivation among certain patient populations. Factors such as these need to be taken into account when considering P4P programs to reduce disparities, as they affect patients’ ability and willingness to understand, accept, and adhere to treatment recommendations. This points to the need for a broad approach to considering risk adjustment strategies in order to avoid creating incentives that discourage caring for minority patients, without “adjusting away” issues of particular concern.

Fourth, Casalino and colleagues discuss the importance of rewarding both overall quality and reductions in disparities. We remain skeptical of the extent to which quality improvement activities will result in a “rising tide that lifts all boats”; it may be that improving quality for minority patients requires different, more complex, or more culturally competent efforts than do generalized quality improvement programs. Thus, P4P programs such as the one in Massachusetts must decide whether their primary goal is to reduce disparities between advantaged and disadvantaged groups, to improve quality for disadvantaged groups, or both.

Finally, a solid P4P program to reduce disparities is predicated on collecting accurate race and ethnicity data. In Massachusetts, where the P4P program affects only hospitals, there is a state regulation in place mandating hospital collection of patient race/ethnicity data. Nationally, however, recent research has shown that while more than three-quarters of hospitals collect data on patients’ race and ethnicity, fewer than one in five use the data to assess inequalities in quality of care, health outcomes, or patient satisfaction. In addition, one study found that half of hospitals that collect race/ethnicity data do so based on their admitting staff’s observations of the patient’s appearance or last name – a far from desirable approach. Without accurate, self-reported race and ethnicity data for patients, all measures used by P4P programs to compare the performance of different groups are subject to significant bias.

The new Massachusetts program is a small but bold experiment in a field in need of innovative approaches. The state Medicaid program received guidance from an expert consensus panel convened by the Massachusetts Medicaid Policy Institute to address many of the complex issues in designing the P4P program, and the initial financial risks and rewards are relatively small – as they should be, given many of the measurement and implementation concerns. It is too early to draw lessons from the Massachusetts effort, which was implemented in November 2007. However, results from this foray into the intersection between P4P and disparities reduction will provide information about the merits and pitfalls of such an approach, from which others can learn in order to build more effective programs.