# aldvmm

This post is written by Mark Pletscher, Institute of Health Economics and Health Policy, Bern University of Applied Sciences.

# Introduction

Health-related quality of life is a key outcome in health technology assessments because it is patient-relevant and is needed to calculate quality-adjusted life years. Because 100% quality of life (a utility of 1) represents perfect health, health state utilities are bounded above at 1. The lowest possible utility in a local value set defines a lower limit of health state utilities in a local population, and local value sets often show gaps between 1 and the next smaller utility value. Health state utilities are therefore limited dependent variables. In addition, they can arise from multiple latent classes, or they can exhibit multi-modal marginal densities .

The goal of the aldvmm package is to fit adjusted limited dependent variable mixture models of health state utilities using the likelihood and expected value functions proposed by . Adjusted limited dependent variable mixture models have been frequently used for mapping studies , but they can also improve assessments of incremental and average marginal effects of medical interventions or health problems.

# Methods

Adjusted limited dependent variable mixture models are finite mixtures of normal distributions in $$K$$ components $$c$$ with conditional expectations $$E[y|X, c] = X\beta^{c}$$ and standard deviations $$\sigma^{c}$$. The probabilities of component membership are modeled as a multinomial logit function $$P[c|X]=\exp(X\delta^{c})/\sum_{k=1}^{K}\exp(X\delta^{k})$$. The model accumulates the probability mass of the finite mixture at or below a minimum value $$\Psi_1$$ at the value $$\Psi_1$$, and the probability mass above a maximum value $$\Psi_{2}$$ at 1. If the maximum value $$\Psi_2$$ is smaller than 1, the model emulates a value set with a gap between 1 and the next smaller value.

$$
y_{i}|c =
\begin{cases}
1 & \text{if } y_{i}|c > \Psi_{2}\\
\Psi_{1} & \text{if } y_{i}|c \leq \Psi_{1}\\
y_{i}|c & \text{if } \Psi_{1} < y_{i}|c \leq \Psi_{2}
\end{cases}
$$
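The following minimal base-R sketch simulates utilities from a two-component model to make the data-generating process concrete. It draws component membership from the multinomial logit, then a normal value within that component, and then applies the censoring rule above. The helpers `membership_prob()`, `censor_utility()` and `simulate_aldvmm()` are hypothetical and are not part of the aldvmm package. The sketch also assumes that the reference component's $$\delta$$ is fixed at zero for identification, which is consistent with Table 1 reporting a membership intercept only for component 1. The parameter values are rounded from Table 1 and serve only as illustration.

```r
# Multinomial logit: P[c|X] = exp(X delta^c) / sum_k exp(X delta^k).
# X: n x q design matrix; delta: q x K matrix, one column per component.
membership_prob <- function(X, delta) {
  eta <- X %*% delta
  eta <- eta - apply(eta, 1, max)  # subtract row maxima for numerical stability
  p <- exp(eta)
  p / rowSums(p)
}

# Censoring rule: above psi[2] -> 1, at or below psi[1] -> psi[1], else unchanged.
censor_utility <- function(y, psi) {
  ifelse(y > psi[2], 1, ifelse(y <= psi[1], psi[1], y))
}

# Hypothetical simulator: draw a component per observation,
# then a normal value from that component, then censor it.
simulate_aldvmm <- function(Xmean, Xprob, beta, sigma, delta, psi) {
  n <- nrow(Xmean)
  p <- membership_prob(Xprob, delta)
  comp <- apply(p, 1, function(pr) sample.int(length(pr), 1, prob = pr))
  mu <- (Xmean %*% beta)[cbind(seq_len(n), comp)]  # X beta^c for the drawn c
  censor_utility(rnorm(n, mean = mu, sd = sigma[comp]), psi)
}

# Illustrative inputs: gender in the mean model, intercept-only membership model.
set.seed(42)
n <- 200
female <- rbinom(n, 1, 0.5)
beta  <- cbind(c(0.427, 0.184), c(0.008, -0.102))  # columns = components
sigma <- exp(c(-1.674, -2.048))
delta <- matrix(c(2.012, 0), nrow = 1)              # component 2 as reference
psi   <- c(-0.594, 0.883)

y <- simulate_aldvmm(cbind(1, female), matrix(1, n, 1), beta, sigma, delta, psi)
```

Every simulated value lies in $$[\Psi_1, 1]$$, and none falls strictly between $$\Psi_2$$ and 1. This is the gap that the model emulates.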

# Usage

The aldvmm() function fits an adjusted limited dependent variable mixture model. By default, aldvmm() estimates a mixture of two components, but the user can set the number of components with the argument ncmp. If ncmp is set to 1, the function fits a tobit-like single-component model with a gap between 1 and the next smaller utility value. The argument psi holds the two limits $$\Psi_1$$ and $$\Psi_2$$ (here -0.594 and 0.883). We fit a simple two-component model with gender as the only explanatory variable for component means and an intercept-only model for the probability of component membership.

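```r
library("aldvmm")
data("utility")

# Component means depend on gender (left of |);
# component membership is intercept-only (right of |).
fit <- aldvmm(eq5d ~ female | 1,
              data = utility,
              ncmp = 2,
              init = "zero",
              psi = c(-0.594, 0.883),  # lower limit and largest value below 1
              optim.method = "Nelder-Mead")

summary(fit)
```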

The model formula in aldvmm() is an object of class “formula” with two parts on the right-hand side of ~. The first part on the left of the | delimiter represents the model of expected values of normal distributions. The second part on the right of the | delimiter represents the model of probabilities of component membership.
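Conceptually, the two-part formula splits into two ordinary one-part formulas, one for each sub-model. The base-R sketch below shows that split under the assumption of a single top-level `|` on the right-hand side. It is only an illustration and not how aldvmm itself parses formulas. The toy data frame is made up.

```r
# Two-part formula: model of component means | model of membership probabilities
f <- eq5d ~ female | 1

rhs <- f[[3]]                                  # the call `female | 1`
stopifnot(identical(rhs[[1]], as.name("|")))   # top-level operator is |

# Rebuild one ordinary formula per part
mean_formula <- eval(call("~", f[[2]], rhs[[2]]))  # eq5d ~ female
prob_formula <- eval(call("~", rhs[[3]]))          # ~1

# Design matrices for a small hypothetical data set
toy <- data.frame(eq5d = c(0.7, 1, 0.5), female = c(1, 0, 1))
X_mean <- model.matrix(mean_formula, data = toy)  # (Intercept), female
X_prob <- model.matrix(prob_formula, data = toy)  # (Intercept) only
```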

The argument optim.method accepts all optimization methods available in the optimr package except for “nlm”, which requires a different implementation of the likelihood function.

The argument init accepts four options for the generation of starting values of the optimization algorithm.

1. “zero”: A vector of zeroes (default).

2. “random”: A vector of standard normal random values.

3. “constant”: Parameter estimates of a constant-only model as starting values for intercepts and standard deviations, and zeroes for all other parameters.

4. “sann”: Parameter estimates of a simulated annealing algorithm.
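The length of the starting-value vector follows from the model structure. The sketch below assumes one lnsigma per component and a reference component whose membership coefficients are fixed at zero. Under these assumptions, the example model has seven parameters, which matches the seven estimates in Table 1. The function `n_param()` is hypothetical. It shows what the “zero” and “random” options amount to, not how aldvmm builds or orders its parameter vector internally.

```r
# Hypothetical helper: parameters in a model with K components,
# p columns in the mean model and q columns in the membership model.
# K * p mean coefficients + K lnsigma + (K - 1) * q membership coefficients.
n_param <- function(K, p, q) K * p + K + (K - 1) * q

k <- n_param(K = 2, p = 2, q = 1)  # eq5d ~ female | 1 with ncmp = 2

init_zero <- rep(0, k)             # analogue of init = "zero"
set.seed(123)
init_random <- rnorm(k)            # analogue of init = "random"
```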

We obtain a summary table of regression results (Table 1) using the generic function summary(). The coefficients of the model of expected values of normal distributions $$E[y|X, c]$$ can be interpreted as marginal effects on component means. ‘lnsigma’ denotes the natural logarithm of the estimated standard deviation $$\sigma^{c}$$. The coefficients of covariates in the multinomial logit model of probabilities of component membership are log relative probabilities, i.e. logarithms of the probability of belonging to a component relative to the probability of belonging to the reference component. Because our model includes only two components, the multinomial logit model collapses to a binomial logit model with component 2 as the reference. As the membership model contains only an intercept, the intercept of 2.012 means that the probability of an observation belonging to component 1 is exp(2.012) ≈ 7.48 times the probability of belonging to component 2. This corresponds to a probability of about 0.88 of belonging to component 1.

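Table 1: Regression results with the “Nelder-Mead” optimization method and zero-only initial values

| Model | Component | Parameter | Estimate | Std. Err. | z | P>\|z\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|---|---|
| E[y\|X, c] | Comp1 | (Intercept) | 0.427 | 0.036 | 11.969 | 0.000 | 0.357 | 0.497 |
| | | female | 0.184 | 0.040 | 4.630 | 0.000 | 0.106 | 0.262 |
| | | lnsigma | -1.674 | 0.091 | -18.429 | 0.000 | -1.852 | -1.496 |
| | Comp2 | (Intercept) | 0.008 | 0.042 | 0.180 | 0.857 | -0.075 | 0.091 |
| | | female | -0.102 | 0.049 | -2.075 | 0.038 | -0.198 | -0.006 |
| | | lnsigma | -2.048 | 0.181 | -11.304 | 0.000 | -2.403 | -1.693 |
| P[c\|X] | Comp1 | (Intercept) | 2.012 | 0.353 | 5.699 | 0.000 | 1.320 | 2.704 |

N = 200, ll = -28.80, AIC = 71.61, BIC = 94.70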
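The transformations described above Table 1 can be reproduced in base R from the reported estimates. Because the inputs are copied from Table 1 and rounded to three decimals, the results are approximate: the component standard deviations are about 0.19 and 0.13, and the probability of belonging to component 1 is about 0.88.

```r
# Estimates copied from Table 1
lnsigma <- c(comp1 = -1.674, comp2 = -2.048)
delta1  <- 2.012                # membership intercept, component 1 vs. component 2

sigma <- exp(lnsigma)           # standard deviations of the normal components
odds  <- exp(delta1)            # P[c = 1] / P[c = 2]
p1    <- plogis(delta1)         # P[c = 1] = exp(delta1) / (1 + exp(delta1))
p2    <- 1 - p1                 # P[c = 2]
```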

The coefficients cannot be interpreted as effects on expected quality of life, but predictions can be used to calculate incremental effects. Standard errors of incremental effects can be calculated using the delta method (see the vignette for example code).

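```r
# Predictions for women as observed
tmpdf <- utility[utility$female == 1, ]
pred1 <- predict(fit, newdata = tmpdf, se.fit = TRUE)
# Counterfactual predictions for the same observations coded as men
tmpdf[, "female"] <- 0
pred0 <- predict(fit, newdata = tmpdf, se.fit = TRUE)
# Average treatment effect on the treated (ATET) of being female
atet <- mean(pred1$yhat - pred0$yhat)
```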

In this example, we calculate the average treatment effect on the treated for being female. The expected quality of life is 15.24 percentage points higher for women than for otherwise identical men.
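The gap between coefficients and predictions becomes clearer when the expected value is written out. The expression below is derived from the censoring rule in the Methods section and is not taken from the package source. Let $$\mu_c = X\beta^{c}$$, $$a_c = (\Psi_1 - \mu_c)/\sigma^{c}$$ and $$b_c = (\Psi_2 - \mu_c)/\sigma^{c}$$, with $$\Phi$$ and $$\phi$$ the standard normal distribution and density functions. Then

$$E[y|X] = \sum_{c=1}^{K} P[c|X]\left\{\Psi_1\Phi(a_c) + \left[1-\Phi(b_c)\right] + \mu_c\left[\Phi(b_c)-\Phi(a_c)\right] + \sigma^{c}\left[\phi(a_c)-\phi(b_c)\right]\right\}$$

In this model, gender is the only covariate and membership is intercept-only, so every woman has the same prediction. The ATET therefore reduces to the difference between two predictions. The base-R sketch below plugs in the rounded estimates from Table 1. The helper `expected_utility()` is hypothetical and may differ from aldvmm's internal implementation. The result is roughly 0.15, close to the 15.24 percentage points reported above; a small discrepancy is expected because of rounding.

```r
# Expected utility for a mixture of censored normals (derived sketch).
# mu, sigma, prob: vectors with one entry per component; psi = c(Psi1, Psi2).
expected_utility <- function(mu, sigma, prob, psi) {
  a <- (psi[1] - mu) / sigma
  b <- (psi[2] - mu) / sigma
  e_comp <- psi[1] * pnorm(a) +        # mass at or below Psi1 is set to Psi1
    (1 - pnorm(b)) +                   # mass above Psi2 is set to 1
    mu * (pnorm(b) - pnorm(a)) +       # uncensored part of the normal ...
    sigma * (dnorm(a) - dnorm(b))      # ... including its truncation correction
  sum(prob * e_comp)
}

psi   <- c(-0.594, 0.883)
sigma <- exp(c(-1.674, -2.048))
prob  <- c(plogis(2.012), 1 - plogis(2.012))

# Component means from Table 1 for women and, counterfactually, men
mu_female <- c(0.427 + 0.184, 0.008 - 0.102)
mu_male   <- c(0.427, 0.008)

atet_sketch <- expected_utility(mu_female, sigma, prob, psi) -
  expected_utility(mu_male, sigma, prob, psi)
```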

# Discussion

The aldvmm package makes adjusted limited dependent variable mixture models available to R users and offers a broad set of optimization algorithms and methods for generating initial values.

The comparison of different optimization methods using EQ-5D-3L utility data from English patients after hip replacement in 2011 and 2012 showed that the likelihood function can be challenging to maximize and can converge to extreme solutions (see vignette). Parameter estimates varied considerably across optimization methods and even across optima with the same log-likelihood. However, fitted values were very similar across optimization approaches, which suggests that the model is more robust for prediction tasks than for parameter identification.

Although coefficients of models of normal means can be interpreted as marginal effects within each component, they cannot be interpreted in terms of overall expected values. Thus, average marginal effects and average treatment effects need to be calculated from predictions using the generic function predict(). Standard errors of marginal effects or average treatment effects can be calculated using the delta method (see example code in the vignette).
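The delta method approximates the variance of a smooth function $$g(\theta)$$ of the estimates by $$\nabla g(\theta)^{\top} V \nabla g(\theta)$$, where $$V$$ is the covariance matrix of the estimates. The generic base-R sketch below uses a central-difference gradient. `delta_se()` is a hypothetical helper, not an aldvmm function, and the vignette's code remains the reference for fitted models. Applying it to an ATET would require the full covariance matrix of the fit, which Table 1 does not report. The example therefore applies it to exp(2.012), the relative probability of component 1, using only the intercept's standard error.

```r
# Delta-method standard error of g(theta): sqrt(grad' V grad),
# with the gradient approximated by central differences.
delta_se <- function(g, theta, vcov, h = 1e-6) {
  grad <- vapply(seq_along(theta), function(j) {
    step <- replace(numeric(length(theta)), j, h)
    (g(theta + step) - g(theta - step)) / (2 * h)
  }, numeric(1))
  sqrt(drop(t(grad) %*% vcov %*% grad))
}

# Standard error of exp(intercept), using the estimate and SE from Table 1
se_odds <- delta_se(function(th) exp(th[1]), theta = 2.012, vcov = matrix(0.353^2))
```

For this one-parameter case the result equals exp(2.012) × 0.353, because the derivative of exp() is exp() itself.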

In situations with repeated measures, the aldvmm package only supports fixed-effects estimation with group- and time-specific fixed effects, which can be an important limitation in the analysis of clinical data. However, time fixed effects can be an appropriate modeling strategy in the presence of general time trends and dynamic selection, e.g. when health state utilities decrease over time and treated individuals survive longer and are therefore over-represented in later measurements.