In a recent Health Affairs Blog post, Mark Warshawsky raises a number of objections to our simulations of financing options for long-term services and supports (LTSS), described in our recent Health Affairs article. As is true for any complex model, our projections can be improved, and we welcome constructive feedback.
However, we take issue with most aspects of Warshawsky’s critique, which is based largely on misleading comparisons and a selective reading of the literature. We stand behind our original analysis and remain convinced that DYNASIM, the model used to generate our simulations, is a valuable tool for projecting LTSS spending and evaluating alternative financing options.
The need for LTSS is one of the greatest financial risks facing older adults — disability is common at older ages, paid help is expensive, and third-party payment is rare until people exhaust all of their resources. Developing alternative payment options for LTSS, through public or private initiatives, is a pressing policy issue.
Evaluating the costs and benefits of alternative financing options, however, is difficult. We spent several years extending DYNASIM—which already simulated income, wealth, and acute health care expenses—to include projections of disability, cognitive impairment, paid home care, residential care, nursing home care, and private long-term care insurance. As with any such projections, ours require many definitional and analytic choices and assumptions. Our model development and policy simulations were guided by an exhaustive review of the relevant literature, careful analysis of nationally representative data, and input from an impartial technical advisory panel that included many of the nation’s most respected economists, actuaries, and long-term care experts.
The goal of our modeling exercise was to measure the relative advantages of various LTSS financing solutions. Because the benefits for both private insurance and Medicaid are triggered only after someone has a high level of functional impairment, we focused on those with severe disabilities. As a result, our analysis differs from other work that attempted to look at those with any need for LTSS, even a low level that would fall far short of triggering LTSS benefits.
Warshawsky alleges that we systematically understated LTSS costs and thus the price tag for any new government LTSS program. On the contrary, the choices we made were credible and responsible, and fully supported by the data. Our estimates might appear low to Warshawsky because we considered only older adults with severe disabilities and we excluded those services that are not long term. Most important, many of the comparisons he made to evaluate our model are simply wrong. Our projections make sense when researchers make apples-to-apples comparisons, but not when they compare our apples to their oranges.
We respond to his major points below.
Is Our Simulated Population Too Healthy?
Warshawsky claims that the underlying population we used to simulate policy options is too healthy, causing us to understate LTSS costs. He reaches this conclusion by comparing life expectancy in DYNASIM with Social Security’s projected life expectancy, but he compares the wrong statistics.
Demographers use period life expectancy, a synthetic cohort measure, to summarize what mortality experience would be if people at a given age today experienced current-period mortality rates for the rest of their lives. The cohort life expectancy our model generates, in contrast, reflects time to death for those in a cohort based on the rates experienced by that actual cohort (i.e., a 65-year-old in 2015 will experience 2015 rates for 65-year-olds, 2016 rates for 66-year-olds, 2017 rates for 67-year-olds, and so forth), thus capturing expected mortality reductions over time.
Warshawsky contends that DYNASIM’s population is longer-lived than the “actual” population because the reported DYNASIM cohort life expectancy for those turning age 65 between 2015 and 2019 (the apple) is higher than the Social Security Trustees’ period life expectancy for those age 65 in 2015 (the orange). Not surprisingly, the two measures differ, by more than a year in this case. But this is consistent with information in the Social Security Trustees Report. Comparing Tables V.A4 and V.A5, for example, reveals the same pattern: Period and cohort life expectancies differ by about a year for a single cohort turning 65 now.
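To see why the two measures diverge, consider a minimal sketch. The mortality schedule below is entirely hypothetical (a Gompertz-style curve with an assumed 1 percent annual improvement in death rates); it is not drawn from DYNASIM or the Trustees Report, and the function names are illustrative only. The calculation counts whole years survived and ignores survival past the last listed age.

```python
START_AGE, MAX_AGE, BASE_YEAR = 65, 120, 2015

def death_prob(age, year, improvement=0.01):
    """Hypothetical annual death probability (assumed values, illustration only)."""
    base = min(1.0, 0.015 * 1.09 ** (age - START_AGE))      # rises with age
    return base * (1.0 - improvement) ** (year - BASE_YEAR)  # falls over calendar time

def remaining_life_expectancy(death_probs):
    """Expected whole years lived, given successive annual death probabilities."""
    alive, years = 1.0, 0.0
    for q in death_probs:
        alive *= 1.0 - q   # probability of surviving through this year
        years += alive
    return years

def period_life_expectancy():
    # Period measure: apply BASE_YEAR rates at every future age.
    return remaining_life_expectancy(
        [death_prob(age, BASE_YEAR) for age in range(START_AGE, MAX_AGE)]
    )

def cohort_life_expectancy(improvement=0.01):
    # Cohort measure: apply the rate for the year in which the cohort reaches each age.
    return remaining_life_expectancy(
        [death_prob(age, BASE_YEAR + (age - START_AGE), improvement)
         for age in range(START_AGE, MAX_AGE)]
    )

period_le = period_life_expectancy()
cohort_le = cohort_life_expectancy()
print(f"period: {period_le:.1f} years, cohort: {cohort_le:.1f} years")
```

Whenever mortality is assumed to improve over time, the cohort figure exceeds the period figure for the same starting age and year; with no improvement, the two coincide. The size of the gap depends on the assumed schedule, so the numbers printed here should not be read as estimates.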
How Many People Will Become Disabled and for How Long?
Warshawsky comments extensively on how our disability estimates differ from some others in the literature, but he again often compares apples with oranges. Our projections focused on people age 65 and older with severe disabilities who need long-term help, not people with short-term disabilities or those with only some difficulty performing everyday activities. We classified older adults as having a disability if they needed help with two or more activities of daily living (ADLs) for at least 90 days or had severe cognitive impairment, simulating the criteria in the Health Insurance Portability and Accountability Act (HIPAA) for benefit eligibility under tax-qualified long-term care policies.
To project disability, we extensively reviewed the disability literature, including all of the papers Warshawsky cited, comparing estimates of outcomes such as expected duration of disability at the HIPAA level. It was striking how challenging it was to make apples-to-apples comparisons using the literature, because of the many ways in which estimates can vary, including the following:
- inclusion or exclusion of the institutionalized population;
- types of ADLs included in the analysis;
- wording of the ADL question in the survey questionnaire and how closely it aligns with the language in the HIPAA statute, which requires that individuals be “unable to perform (without substantial assistance from another individual) at least two activities of daily living for a period of at least 90 days due to a loss of functional capacity,” a higher threshold than simply having difficulty with an ADL;
- measurement of severe cognitive impairment (for example, the Aging Demographics and Memory Study, the Telephone Interview for Cognitive Status, or self-report of a diagnosis), including how analysts treat respondents who refuse to participate or are incapable of participating in cognitive tests;
- treatment of those nursing home residents who do not appear to meet the HIPAA disability criteria;
- treatment of people in the year of death, including availability of proxy responses;
- treatment of attrition and introduction of new samples in longitudinal surveys;
- interval over which disability is measured, because disability often varies over time; and,
- whether and how disability duration factors into the analysis, because HIPAA requires that disabilities be expected to last at least 90 days.
Comparisons of disability estimates become even more problematic when data quality varies, the cohorts examined vary, and the statistics themselves differ. Some studies, for example, report conditional means (which exclude people who never have a disability), and others report unconditional means (which include everyone, even those without disabilities).
What about Warshawsky’s comparisons? We estimated that, on average, HIPAA-level LTSS needs last 2.0 years. Based on his own past research (Murtaugh, Spillman, and Warshawsky 2001), he claims that this is too low and that the average duration should be 2.2 years. We won’t quibble about whether a difference of 0.2 years is really meaningful. Instead, we’ll point out that our 2.0-year estimate is an unconditional mean estimated on a sample that included people who never become severely disabled, whereas his 2.2-year estimate is a conditional mean estimated on a sample of people who experienced some severe disability. An apples-to-apples comparison shows that our estimates of the duration of HIPAA-level disability are actually longer than his. That is reasonable because he and his colleagues examined the 1930 birth cohort, whereas we examined the 1950 to 1954 birth cohorts, which will experience much longer average lifespans. As table 1 in our related report for the Department of Health and Human Services (HHS) shows, we estimated a conditional average duration of 3.9 years, compared with 2.2 years for Murtaugh et al., and an unconditional duration of 2.0 years, compared with 1.5 years for Murtaugh et al.
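The link between the two kinds of means is simple arithmetic: the unconditional mean equals the conditional mean multiplied by the share of people who ever experience the disability. The short sketch below applies that identity to the four rounded durations quoted above. The implied shares it prints are back-of-the-envelope values derived from those rounded figures, not numbers reported in either study, and the variable names are illustrative.

```python
# Durations in years, as quoted above (rounded to one decimal place).
estimates = {
    "Our estimates": {"conditional": 3.9, "unconditional": 2.0},
    "Murtaugh et al.": {"conditional": 2.2, "unconditional": 1.5},
}

def implied_share_ever_disabled(conditional, unconditional):
    """Share ever disabled implied by: unconditional = share * conditional."""
    return unconditional / conditional

implied_shares = {
    label: implied_share_ever_disabled(**means)
    for label, means in estimates.items()
}

for label, share in implied_shares.items():
    means = estimates[label]
    print(
        f"{label}: conditional {means['conditional']} yrs, "
        f"unconditional {means['unconditional']} yrs, "
        f"implied share ever disabled about {share:.0%}"
    )
```

Placing the 2.0-year unconditional figure beside the 2.2-year conditional figure mixes the two rows; compared within the same column, our durations are longer on both measures.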
What Should LTSS Include?
We attempted, albeit imperfectly, to exclude services that encompass strictly post-acute care, such as a stay in a skilled nursing facility by a senior who has had a hip replaced and is expected to return to full function within three months. Like many analysts, we did not classify such services, which are usually paid by Medicare, as LTSS since they are not long term. We didn’t want our new simulated LTSS financing mechanisms to cover those post-acute care expenses. However, the Congressional Budget Office (CBO) included those costs in one table in a 2013 report that Warshawsky cited as evidence that we underestimated LTSS spending.
Another assumption that affects the composition of LTSS expenses by payer is the treatment of residential care. This is a thorny issue, as “assisted living” facilities vary quite widely and provide differing levels of personal care and other supportive services. We focused on those facilities that meet criteria for residential care communities set out in a recent report from the National Center for Health Statistics. Also, different sources treat the room and board component in such facilities differently when estimating costs. The CBO report, for example, clearly states that it includes only services in these settings, not room and board. We clearly state that we do include room and board, and, in the appendix to our related HHS report, we show how our cost estimates change when we exclude them.
Again, it is important to recognize when one is comparing fundamentally different concepts. Our baseline estimate does not include post-acute care but does include room and board costs in residential care. CBO’s estimates do include post-acute care but do not include room and board costs in residential care. As a result, our estimates differ from CBO’s; we would worry if they did not. The two statistics differ even further in that CBO’s table reports point-in-time spending for the full population age 65 and older in 2011, and our study reports expected lifetime expenses for those turning age 65 between 2015 and 2019.
Warshawsky also notes that our estimate that 8.6 percent of the older population is covered by private long-term care insurance is lower than CBO’s 13 percent estimate, based on data from the Health and Retirement Study (HRS). However, our estimate was for the cohort born between 1976 and 1990, whereas CBO’s estimate was for today’s older population. Moreover, CBO’s estimate is overstated because it fails to account for respondents who confuse long-term care insurance with medical insurance; a better estimate of private long-term care coverage for adults ages 65 and older in 2014 is 11 percent. Because recent erosion in private coverage is likely to continue, according to our technical advisory group, an age-65 coverage rate of 8.6 percent 30 to 40 years from now seems reasonable.
Do Our Disability Trends Make Sense?
Projecting future levels of disability is extremely difficult, especially given rapid changes in medical technology. Warshawsky evaluates the reasonableness of our disability trends by considering only changes in the age composition of the population, but other factors come into play. For example, disability rates have historically declined as educational attainment rises, and cohorts entering retirement now are better educated than earlier cohorts. Both in the US and worldwide, dementia incidence has recently declined (Langa et al. 2016; Matthews et al. 2013; Rocca et al. 2011; Satizabal et al. 2016; Stallard and Yashin 2016), and future LTSS spending will depend critically on how these trends continue to evolve. Our projections assume increased life expectancy will be shared between healthy and disabled life. Like Warshawsky, we recommend more sensitivity analysis.
Are We Projecting Too Much Unpaid Care?
Warshawsky claims that we are projecting too little paid care and too much unpaid care, thus understating LTSS costs. The professional literature provides overwhelming evidence that many older adults rely on substantial amounts of unpaid care from family and friends, even if they have a high level of disability (Freedman and Spillman 2014). Warshawsky holds up as a gold standard an estimate from Stallard (2011), based on data from the 1984, 1989, and 1994 National Long-Term Care Survey, that roughly two-thirds of care for people who are at the HIPAA standard is paid care. Although Stallard’s comprehensive study is an important one, his data predate the enactment and implementation of the Balanced Budget Act of 1997 and cover a time when Medicare coverage of home health was comparatively high and growing (McCall et al. 2001; Murtaugh et al. 2003). Also, the share of Medicare beneficiaries being held for observation in hospitals, rather than admitted, has been rising (Feng, Wright, and Mor 2012), which can limit Medicare payments for subsequent LTSS and raise out-of-pocket liability. As payment policies like these evolve, incentives are changing for families choosing how to balance unpaid family care and paid care.
Our own extensive empirical analyses of more recent data from the HRS, National Health and Aging Trends Study (NHATS), and National Long-Term Care Survey suggest that a smaller share of care is paid than Stallard estimated. In 2011 NHATS data, for example, roughly half of HIPAA-qualified elderly adults received only unpaid care.
Moreover, measurement difficulties in this field are daunting, making us reluctant to designate one study as clearly superior to others. To give two examples, caregiver prevalence and intensity estimates generally differ markedly between surveys of caregivers and surveys of care recipients, and they are sensitive to the reporting period used by a survey (last week, last month, last year).
Does Our Use of Two Models Bias Our Results?
We simulated both voluntary and mandatory variants of LTSS financing options, and Warshawsky suggests that our use of a second actuarial model outside of DYNASIM to evaluate some of these options may have biased our results against voluntary programs. A valuable feature of our modeling project was our ability to marry DYNASIM’s microsimulation tool with extensive actuarial information about the currently insured population. This not only allowed us to validate our own analysis but also enabled us to calculate premiums in various reform scenarios.
Our choice of which model to use for a particular analysis was based on straightforward sampling considerations, with no underlying agenda. Milliman Inc.’s rich Long-Term Care Guidelines data, which represent an insured population rather than the general population, were the best choice for developing cost estimates for voluntary programs, which would draw from a self-selecting population likely to resemble the insured population more closely than the general population. DYNASIM, representing the Social Security Area population, was the best choice for developing cost estimates for mandatory programs, which would cover the broader population. The key difference between the mandatory and voluntary analyses is that the voluntary analyses incorporated a model of participation and adverse selection, the assumptions for which were developed and documented by our Milliman colleagues (Giese and Schmitz 2015). The distributional analyses for both simulated programs used DYNASIM alone, differing only in the application of the participation/selection model.
How Reasonable Are Our Take-Up Assumptions?
Warshawsky questioned our assumptions about take-up of public benefits. We relied heavily on the literature on take-up broadly (Currie 2006), and on take-up of cash benefits like Supplemental Security Income (SSI) and food stamps (Elder and Powers 2006; Haider, Jacknowitz, and Schoeni 2003), cost-sharing relief as in Medicare savings programs (Rupp and Sears 2000; Sears 2001/2002), and in-kind benefits like Medicaid (Sommers et al. 2012). These studies generally conclude that take-up rates are well below 100 percent in public programs, even for cash or cash-equivalent benefits.
Again, we agree with Warshawsky that sensitivity analyses would be useful, but we do believe it is reasonable to assume that take-up could be below 100 percent in a public disability-related program that serves a severely disabled population, a large part of which is cognitively impaired and close to death.
Warshawsky’s ultimate concern is that we underestimated LTSS costs and the potential liability that new government programs could generate. We agree that models must not underestimate costs regardless of the financing source. However, we have seen no evidence that we systematically underestimated costs. In fact, it may be reassuring that our cost estimates for LTSS users line up reasonably with those by Kemper, Komisar, and Alecxih (2005/2006), which employed methods similar to our own, once we account for life expectancy growth and inflation. The main difference in findings is that we project some shift from nursing homes to home- and community-based services, a trend that has been well documented since the 1990s and early 2000s, the period used by Kemper and his colleagues to generate their estimates.
Moreover, there is substantial overlap between our qualitative findings and others from the literature. We conclude that:
- the risk of needing LTSS is significant;
- durations of LTSS needs and use are skewed, with women, lower-income adults, and adults with less education facing especially high risks;
- due to adverse selection, voluntary programs without underwriting are likely to cost more than mandatory programs that provide equivalent benefits; and,
- back-end programs tend to generate more Medicaid savings on a dollar-for-dollar basis than front-end programs.
We always welcome feedback on our modeling efforts and continue to learn from new data and perspectives. We appreciate Warshawsky’s suggestion to report more sensitivity analyses and more frequently show bands for projections. We often make these same arguments ourselves and intend to follow through in forthcoming work.
However, much of Warshawsky’s critique is based on apples-to-oranges comparisons and objection to a few parameter choices that we drew from the extensive peer-reviewed literature in the field. We respect Warshawsky’s preference for alternative definitions and assumptions in some cases, but our choices have been neither unreasonable nor irresponsible, especially given the inherent uncertainty of future trends and the difficulty of measuring several key outcomes.