A Varying Coefficient Model to Measure the Effectiveness of Mass Media Anti-Smoking Campaigns in Generating Calls to a Quitline

Background: Anti-smoking advertisements are an effective population-based smoking reduction strategy. The Quitline telephone service provides a first point of contact for adults considering quitting. Because of data complexity, the relationship between anti-smoking advertising placement, intensity, and time trends in total call volume is poorly understood. In this study we use a recently developed semi-varying coefficient model to elucidate this relationship.

Methods: Semi-varying coefficient models comprise parametric and nonparametric components. The model is fitted to the daily number of calls to Quitline in Victoria, Australia, to estimate a nonparametric long-term trend and parametric terms for day-of-the-week effects, and to clarify the relationship with target audience rating points (TARPs) for the Quit and nicotine replacement advertising campaigns.

Results: The number of calls to Quitline increased with the TARP value of both the Quit and other smoking cessation advertisements; the TARP values associated with the Quit program were almost twice as effective. The varying coefficient term was statistically significant for peak periods with little or no advertising.

Conclusions: Semi-varying coefficient models are useful for modeling public health data when there is little or no information on other factors related to the at-risk population. These models are well suited to modeling call volume to Quitline, because the varying coefficient allowed the underlying time trend to depend on fixed covariates that also vary with time, thereby explaining more of the variation in the call model.


INTRODUCTION
Smoking is the single largest preventable cause of death and disease in Australia. It was estimated that over 19 000 people died from tobacco-related diseases in 1998. 1 The total social costs of tobacco use have been estimated at over 21 billion Australian dollars annually, including health care, hospitalizations, loss of productivity and earnings due to premature death, and other direct and indirect costs. 2 Thus, encouraging individuals to stop smoking is an important public health challenge.
In Australia the prevalence of smoking consistently declined from the early 1960s to the early 1990s, but stalled in the mid-1990s. 3,4 In an effort to reduce smoking prevalence, in June 1997, the Australian federal government collaborated with the Australian States and Territories to launch the National Tobacco Campaign (NTC). This is Australia's most intense and sustained mass media tobacco control campaign. The major aim of this initiative was to show television commercials with intense content, so that smokers would immediately attempt to quit. The target for the NTC campaigns was smokers between 18 and 40 years of age. The Quitline telephone helpline service is a population telephone-based program that was promoted as part of the NTC program and provides a first point of contact to assist smokers who wish to quit. It is a flexible and cost-effective campaign, and is easily accessible to a large population. 5 It has also been shown to be an effective aid in smoking cessation. [6][7][8]
Several studies have examined the association between the amount of mass media anti-smoking advertising, as measured by target audience rating points (TARPs), and the number of calls to a telephone-based quitline. [8][9][10] TARPs are a measure of television advertising weight, and are used to indicate the number of people in a particular demographic group exposed to an advertisement within a specified period of time. 17 Miller et al 8 used relatively simple regression analysis to show that there is a linear relationship between the number of calls and TARPs. Their method assumes that the relationship is fixed through time. However, this assumption may not be valid, as these relationships may vary over time in response to changes in the type, intensity, and placement of advertisements. In addition, the number of calls may not depend entirely on TARPs, but also on other, unknown factors.
For example, in Figure 1 we plot the daily number of calls (bottom) to Quitline in Victoria, Australia, from August 2000 until the end of July 2001, along with 2 anti-smoking advertising campaign variables: TARPs for the Quit campaign (middle) and the nicotine replacement therapy (NRT) campaign (top) in Victoria. The NRT campaigns were run by pharmaceutical companies promoting their products, eg, patches, gum, and inhalers. The figure suggests an increase in the number of calls when the TARPs for both campaigns increase, although this increase did not always correspond with the size of the TARPs, as would have been expected, particularly from mid-December 2000 to mid-January 2001, which is the Australian summer holiday period. Figure 1 also shows that, in 2001, the TARPs for both campaigns were 0 from March to May, and from mid-May to early June, and the TARPs were lowest from mid-June until approximately the end of August.
Figure 2 shows box-and-whisker plots of the daily number of calls and TARPs for both campaigns for each day of the week of the study period. The number of calls shows a weekly cyclic pattern, with a peak from Monday through Wednesday and then a gradual decline to a minimum on Sunday. However, the distribution of TARPs by day differs between the campaigns: TARPs for Quit are centered on the period from Monday through Wednesday, similar to the trend in call volume, whereas TARPs for the NRT campaign are more evenly spread throughout the week. Moreover, the number of calls is relatively high on Thursday and Friday, when the TARPs for Quit are low.
To model a complex process that evolves over time, as in the present case, standard regression models can be replaced by a number of models, such as generalized additive models 11 and semi-varying coefficient models. 12 In this study we sought to model the relationship between the number of calls to Quitline Victoria, TARPs for both the Quit and NRT campaigns, and day-of-the-week effects, using a semi-varying coefficient model comprising parametric and nonparametric components whose coefficients are allowed to change smoothly with time. By doing so, we were able to evaluate the usefulness of this method in modeling complex public health data.

Data
We use daily data on the total number of calls to Quitline Victoria, the total number of Quit anti-smoking advertisements on free-to-air media, and TARPs for both the Quit and NRT programs during the period from 6 August 2000 through 31 July 2001. The data used in this study were de-identified and collapsed into daily total counts for the purposes of the study. Therefore, the use of the data did not require ethical approval.

Statistical methods
Semiparametric models allow us to fit parametric models to known features of the data and nonparametric terms to the unknown component. This aids both in making inferences about the known features and in detecting unknown structures.
For the purpose of this study, there are 3 types of individuals: smokers who are not considering quitting, smokers who are considering calling Quitline and quitting, and smokers who have called Quitline. Suppose observations are taken over the time intervals $(t_{j-1}, t_j]$, $j = 1, \ldots, J$, where $0 = t_0 < \cdots < t_J = \tau$. In our context, each time interval is a day. At the start of day $j$ there are $N_j$ people "at risk" of giving up smoking. The probability that they call Quitline on that day is $p_j$. In general, $N_j$ and $p_j$ are not known. Let $Y_j$ be the number of these at-risk individuals who do call Quitline on day $j$; therefore, there are $N_j - Y_j$ individuals remaining, consisting of individuals who do not give up smoking and those who are thinking of calling Quitline. Let $N_{j+1} = N_j - Y_j$ be the number of individuals at risk the next day, and let $\mathcal{F}_j$ be the daily numbers of calls in the series up to and including day $j$, ie, $\mathcal{F}_j = \{Y_1, \ldots, Y_j\}$.
Consider a chain binomial model in which the conditional distribution is $Y_j \mid \mathcal{F}_{j-1} \sim \mathrm{bin}(N_j, p_j)$ and the number at risk at the start of the $j$th interval is $N_j = N_0 - \sum_{i=1}^{j-1} Y_i$, with $N_0$ being the number initially at risk. If the $N_j$ are so large that $N_j/N_{j-1} \approx 1$, then $E(Y_j \mid \mathcal{F}_{j-1}) = N_j p_j \approx N_{j-1} p_j$. If the $p_j$ are small, then $1 - p_j \approx 1$ and thus $\mathrm{Var}(Y_j \mid \mathcal{F}_{j-1}) = N_j p_j (1 - p_j) \approx N_j p_j$, so it is reasonable to assume $Y_j \mid \mathcal{F}_{j-1} \sim \mathrm{Poisson}(Y_{j-1}\alpha_{0j})$. This model is a type of first-order autoregressive conditional Poisson time series with varying coefficients $\alpha_{0j}$. We extend this model by parameterizing the coefficients as $\alpha_{0j} = \alpha_j + X_j^T\beta$, where $X_j^T = (X_{j1}, X_{j2}, \ldots, X_{jp})$ is a row vector of covariates of length $p$ and $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is the column vector of parameters of the same length. The term $X_j^T\beta$ allows us to examine the effect of covariates on the at-risk population. There is also an immigration component to the at-risk population: some smokers will call Quitline without entering an at-risk phase, and other individuals become at risk. The numbers of the latter are not observed.
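The Poisson approximation used in this derivation can be checked numerically. The following is a minimal sketch, not part of the original analysis: the values of `N_J` and `P_J` are hypothetical, chosen only so that the at-risk population is large and the daily calling probability is small. It compares the binomial and Poisson distributions with the same mean by their total variation distance and confirms that the binomial variance is close to its mean.

```python
import math

def binom_logpmf(k, n, p):
    """Log of the bin(n, p) probability mass at k, via lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def poisson_logpmf(k, mu):
    """Log of the Poisson(mu) probability mass at k."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def total_variation(n, p, k_max):
    """Total variation distance between bin(n, p) and Poisson(n * p),
    summed over 0..k_max (the mass beyond k_max is assumed negligible)."""
    mu = n * p
    return 0.5 * sum(
        abs(math.exp(binom_logpmf(k, n, p)) - math.exp(poisson_logpmf(k, mu)))
        for k in range(k_max + 1)
    )

# Hypothetical values: a large at-risk population and a small calling probability.
N_J = 200_000
P_J = 0.0005

binom_mean = N_J * P_J                 # N_j p_j
binom_var = N_J * P_J * (1.0 - P_J)    # N_j p_j (1 - p_j)
tv = total_variation(N_J, P_J, k_max=400)

print(f"mean={binom_mean:.2f} var={binom_var:.2f} TV distance={tv:.2e}")
```

With these illustrative values the variance-to-mean ratio is $1 - p_j$, and the two distributions are nearly indistinguishable, which is the sense in which the conditional Poisson assumption is reasonable.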
We assume that the number of individuals who were previously not at risk of quitting and who do call Quitline on day $j$ has a Poisson distribution with mean $Z_j^T\gamma$, where $Z_j^T = (Z_{j1}, Z_{j2}, \ldots, Z_{jq})$ is a row vector of covariates of length $q$ and $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_q)^T$ is the column vector of parameters of the same length. Thus, our model for the mean number of calls to Quitline on each day is $E(Y_j \mid \mathcal{F}_{j-1}) = Y_{j-1}(\alpha_j + X_j^T\beta) + Z_j^T\gamma$. The first term of the model is $Y_{j-1}\alpha_j$, where $\alpha_j$ represents the nonparametric part of the model. An examination of the estimated $\alpha_j$ allows us to detect previously unknown structures in the data. The term $Y_{j-1}X_j^T\beta$ reflects the effects of the covariates on the at-risk cohort. The remaining term $Z_j^T\gamma$ represents the mean number of individuals who immigrate into the at-risk population and then immediately call Quitline. Other immigrants who do not call Quitline are absorbed into the cohort at risk of calling the next day.
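To make the structure of the conditional mean concrete, the sketch below evaluates $Y_{j-1}(\alpha_j + X_j^T\beta) + Z_j^T\gamma$ for a single day. It is an illustration only: the function names, the ordering of the two TARP covariates within $Z_j$, and all parameter values are assumptions made for the example and are not the estimates reported in this study.

```python
DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat")  # Sunday is the reference day

def day_indicators(day):
    """Return X_j: six 0/1 indicators for Monday..Saturday (all 0 on Sunday)."""
    return [1.0 if day == d else 0.0 for d in DAYS]

def dot(u, v):
    """Inner product of two equal-length sequences."""
    if len(u) != len(v):
        raise ValueError("length mismatch")
    return sum(a * b for a, b in zip(u, v))

def conditional_mean(y_prev, alpha_j, beta, gamma, day, tarp_quit, tarp_nrt):
    """E(Y_j | F_{j-1}) = Y_{j-1} (alpha_j + X_j' beta) + Z_j' gamma.

    Z_j is taken to be X_j followed by the Quit and NRT TARPs; this ordering
    is an assumption of the sketch.
    """
    x = day_indicators(day)
    z = x + [tarp_quit, tarp_nrt]
    carry_over = y_prev * (alpha_j + dot(x, beta))   # at-risk cohort term
    immigration = dot(z, gamma)                      # new callers term
    return carry_over + immigration

# Hypothetical parameter values, for illustration only.
beta = [0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
gamma = [50.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.1]

monday = conditional_mean(100, 0.5, beta, gamma, "Mon", tarp_quit=10, tarp_nrt=20)
sunday = conditional_mean(100, 0.5, beta, gamma, "Sun", tarp_quit=10, tarp_nrt=20)
print(monday, sunday)
```

On a Sunday all indicators are 0, so only $\alpha_j$ and the TARP terms contribute, which is why the parametric estimates are read relative to Sunday.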
Clearly, more complex models are possible, but the present model can be fitted to the data, and the nonparametric component allows us to find structures that are not included in the parametric components.
We compared the performance of our model with 4 other models (see Table 1) using the mean square error (MSE), ie, $\mathrm{MSE} = \frac{1}{J}\sum_{j=1}^{J}(Y_j - \hat{Y}_j)^2$, where $\hat{Y}_j$ is the fitted value. Model 1 is a linear regression model used by Miller et al. 8 Model 2 is the same as model 1, but assumes a Poisson distribution for the outcome variable. Model 3 is similar to our model (Model 5), but the varying coefficient $\alpha_j$ is replaced by the constant coefficient $\alpha$. Model 4 is the model used by Huggins et al. 12

Standard errors and goodness of fit
To apply the model to the Quit data, we let $X_j^T = (X_{j\mathrm{Mon}}, X_{j\mathrm{Tues}}, \ldots, X_{j\mathrm{Sat}})$ be a vector of day-of-the-week covariates, where $X_{j\mathrm{Mon}}$ is the indicator function that equals 1 for Monday and 0 otherwise, and similarly for $X_{j\mathrm{Tues}}, \ldots, X_{j\mathrm{Sat}}$. Note that we set Sunday to 0. The vector $Z_j$ includes $X_j$ and the TARPs for both the Quit and NRT campaigns. That is, the model for the mean number of calls is $E(Y_j \mid \mathcal{F}_{j-1}) = Y_{j-1}(\alpha_j + X_j^T\beta) + Z_j^T\gamma$, with $\beta = (\beta_1, \ldots, \beta_6)^T$ and $\gamma = (\gamma_1, \ldots, \gamma_8)^T$. Parameters in the model were estimated by the method described in Huggins et al. 12
A closed-form expression for the standard errors of the parametric component of the model was not found, so they must be estimated by the bootstrap method. This can be carried out by considering $Y_1$ as fixed and generating $Y_j$ according to a $\mathrm{Poisson}\{Y_{j-1}(\hat{\alpha}_j + X_j^T\hat{\beta}) + Z_j^T\hat{\gamma}\}$ distribution. However, this approach could result in an incorrect bootstrap estimate of the standard errors of the parameters if the data are overdispersed. To adjust for overdispersion, we suggest the use of a nonparametric bootstrap, as follows. Let $\mathrm{Var}(Y_j \mid \mathcal{F}_{j-1}) = \phi\mu_j$ and let the modified Pearson residual be $r_j^* = (Y_j - \hat{\mu}_j)/\sqrt{\phi\hat{\mu}_j}$. We solve $\mathrm{Var}(r_j^*) - 1 = 0$ to obtain a value for $\phi$ and use this value to compute $r_j^*$. A bootstrap sample is generated by randomly sampling with replacement $J$ observations from the $r_j^*$, denoted by $r_j^{*b}$, and then calculating estimates of $\beta$ and $\gamma$ using $Y_j^b = \hat{\mu}_j + r_j^{*b}\sqrt{\phi\hat{\mu}_j}$. This procedure is repeated a number of times to obtain a set of bootstrap estimates of the parameters.
From this set of estimates, the standard error of each parameter estimate is estimated by the standard deviation of its bootstrap replicates.
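The residual-resampling steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are ours, and `fit` is a hypothetical user-supplied callable that stands in for the estimation method of Huggins et al, 12 which is not reproduced here. It must take a bootstrap series and return a list of parameter estimates. The dispersion is obtained from the sample variance of the unscaled Pearson residuals, which is the value that gives the scaled residuals unit variance.

```python
import math
import random
import statistics

def dispersion(y, mu):
    """Value of phi solving Var(r*) = 1 for r* = (y - mu) / sqrt(phi * mu)."""
    unscaled = [(yj - mj) / math.sqrt(mj) for yj, mj in zip(y, mu)]
    return statistics.variance(unscaled)

def bootstrap_se(y, mu, fit, n_boot=200, seed=0):
    """Standard errors from resampling modified Pearson residuals.

    y   : observed daily counts
    mu  : fitted conditional means (all > 0)
    fit : hypothetical callable mapping a bootstrap series to parameter estimates
    """
    rng = random.Random(seed)
    phi = dispersion(y, mu)
    resid = [(yj - mj) / math.sqrt(phi * mj) for yj, mj in zip(y, mu)]
    replicates = []
    for _ in range(n_boot):
        # Sample J residuals with replacement and rebuild a bootstrap series.
        drawn = rng.choices(resid, k=len(resid))
        y_boot = [mj + r * math.sqrt(phi * mj) for mj, r in zip(mu, drawn)]
        replicates.append(fit(y_boot))
    # Standard deviation of each parameter across the bootstrap replicates.
    return [statistics.stdev(values) for values in zip(*replicates)]

# Toy illustration with made-up numbers and a stand-in "estimator" (the mean).
mu_hat = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
y_obs = [25, 22, 48, 41, 70, 66]
se = bootstrap_se(y_obs, mu_hat, fit=lambda yb: [sum(yb) / len(yb)], n_boot=50, seed=1)
print(se)
```

Note that the rebuilt series need not be integer-valued or non-negative, so any estimator passed as `fit` would have to tolerate such values.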
To assess the goodness of fit of the model, we use the deviance R-square for Poisson regression models, 13 which is given by $R^2_D = 1 - D(Y, \hat{\mu})/D(Y, \bar{Y})$, where $D(Y, \hat{\mu}) = 2\sum_{j}\{Y_j\log(Y_j/\hat{\mu}_j) - (Y_j - \hat{\mu}_j)\}$ is the Poisson deviance of the fitted model and $D(Y, \bar{Y})$ is the deviance of the model with constant mean $\bar{Y}$.
The Poisson model assumes that the conditional mean is equal to the conditional variance. This assumption is not tenable if the data are overdispersed, that is, if the conditional variance exceeds the conditional mean, in which case the reported standard errors are no longer correct. An informal technique for assessing overdispersion is to compute Pearson residuals, ie, $r_j = (Y_j - \hat{\mu}_j)/\sqrt{\hat{\mu}_j}$, and check whether these residuals have mean 0 and variance 1. A more formal approach is to test the Poisson assumption against the negative binomial assumption, 14 where the conditional variance is of the form $\mathrm{Var}(Y_j \mid \mathcal{F}_{j-1}) = \phi\mu_j$ and $\phi$ is the dispersion parameter, by performing the regression $Y_j^* = \phi + u_j$, where $Y_j^* = [(Y_j - \hat{\mu}_j)^2 - Y_j]/\hat{\mu}_j$, $\hat{\mu}_j$ is the estimated conditional mean, and $u_j$ is an error term. The estimate of $\phi$ is asymptotically normal under the null hypothesis of no overdispersion, against the alternative of negative binomial overdispersion.
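These diagnostics are straightforward to compute once fitted means are available. The sketch below is illustrative only: the function names are ours, the data are made up, and the deviance R-square is written in its usual ratio-of-deviances form. Because the auxiliary regression contains only a constant, its least-squares estimate is simply the sample mean of $Y_j^*$, and the test statistic is that mean divided by its standard error.

```python
import math
import statistics

def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum{ y log(y/mu) - (y - mu) }, with 0 log 0 = 0."""
    total = 0.0
    for yj, mj in zip(y, mu):
        log_term = yj * math.log(yj / mj) if yj > 0 else 0.0
        total += log_term - (yj - mj)
    return 2.0 * total

def deviance_r2(y, mu):
    """Deviance R-square: 1 - D(y, mu) / D(y, ybar)."""
    ybar = sum(y) / len(y)
    return 1.0 - poisson_deviance(y, mu) / poisson_deviance(y, [ybar] * len(y))

def pearson_residuals(y, mu):
    """Pearson residuals (y - mu) / sqrt(mu) under the Poisson assumption."""
    return [(yj - mj) / math.sqrt(mj) for yj, mj in zip(y, mu)]

def overdispersion_test(y, mu):
    """Regress Y* = [(y - mu)^2 - y] / mu on a constant.

    Returns the estimated constant and its z statistic (estimate / standard error).
    """
    y_star = [((yj - mj) ** 2 - yj) / mj for yj, mj in zip(y, mu)]
    estimate = statistics.mean(y_star)
    std_err = statistics.stdev(y_star) / math.sqrt(len(y_star))
    return estimate, estimate / std_err

# Made-up counts and fitted means, for illustration only.
y_obs = [25, 22, 48, 41, 70, 66]
mu_hat = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0]

r2 = deviance_r2(y_obs, mu_hat)
resid = pearson_residuals(y_obs, mu_hat)
estimate, z_stat = overdispersion_test(y_obs, mu_hat)
print(r2, statistics.mean(resid), statistics.variance(resid), estimate, z_stat)
```

A sample variance of the Pearson residuals well above 1, or a large positive z statistic, would point toward overdispersion.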
All analyses were conducted using our own programs written in the R language (version 2.8.1; http://www.R-project.org).

RESULTS
We fit the varying coefficient models to the daily total number of calls to Quitline (outcome variable). The model fit the data reasonably well, with deviance $R^2$ = 0.839 and estimated conditional mean values closely agreeing with the observed values (Figure 3).
A plot of the estimated density of the Pearson residuals (not shown) showed that most of the residuals were distributed symmetrically around 0. A simple calculation, however, revealed that the sample mean and variance of the Pearson residuals were 0.116 and 6.562, respectively, indicating that the data were overdispersed. The formal regression test for overdispersion gave $\hat{\phi}$ = 5.518 with a P-value <0.0001, providing conclusive evidence of overdispersion. Therefore, we employed a nonparametric bootstrap procedure to estimate standard errors of fixed effect parameters. The estimated parametric effects are shown in Table 2. Note that these estimated parameters are all relative to Sunday. In this table, the parameters $\beta$, together with the varying coefficients $\alpha_j$, represent effects for the total number of calls to Quitline on the previous day, while the parameters $\gamma$ represent the day-of-the-week fixed effects ($\gamma_1$ to $\gamma_6$), in addition to the effects of the advertisements ($\gamma_7$ and $\gamma_8$) on that day. In Figure 4 we graph the estimated varying coefficients $\hat{\alpha}_j$ and the nominal 95% confidence interval. The varying coefficients represent the effects not explained by the TARPs for both the Quit and NRT programs.
The negative estimates of $\beta_2$ (Tuesday) to $\beta_6$ (Saturday) reflect the effect of a large number of calls on 1 day, leading to a decline in calls the next day. A large number of calls on 1 day reduces the population cohort that can potentially make a call the next day, and vice versa for a positive estimate of $\beta_1$ (Monday). These effects are shown in Figure 5, where we plot the combined effect of $\hat{\alpha}_j$ and the fixed parameters $\hat{\beta}$, ie, $\hat{\alpha}_j + X_j^T\hat{\beta}$. The figure shows a weekly cyclic pattern, with maximum values on Monday and minimum values on Saturday.
The daily effects represented by the γ's are perhaps more representative of the susceptibility of the at-risk population of smokers to advertisements, and this may be regarded as a combination of their TV viewing habits and psychological state. They do represent the day effect if there were no calls made on the previous day. For example, if there were no calls on a Sunday we would expect about 51 additional calls on a Monday, whereas if there were 100 calls on a Sunday we expect only 59 additional calls. Furthermore, as expected, the number of calls to Quitline increased with the TARP value of either type of advertisement, and the TARPs associated with the Quit program were almost twice as effective. Figure 6 shows the estimated number of calls from the previous day, the day-of-the-week effects, and the effect from advertising. If we compare the top line of Figure 6 with that of Figure 4, the 3 troughs observed in Figure 4 correspond to periods with little or no advertising. However, the converse was not true for the 2 peaks from late December 2000 to mid-January 2001 and the second half of May 2001, where we observe a sudden increase in daily call volume. This increase was modeled by the varying coefficients and therefore demonstrates the need to incorporate these terms into the models. Table 1 shows the mean square errors for the 5 different models. The present model had the smallest MSE and fitted best.

DISCUSSION
Quitline is a major public health intervention program for smoking cessation. It is a media tobacco cessation campaign that is free and available to all members of the general population who have access to a television and a telephone. Although there is convincing evidence for the effectiveness of this program as an initial point of contact for smokers who wish to quit, the underlying relationship between the number of calls to Quitline and the intensity and placement of anti-smoking advertisements is poorly understood. [8][9][10] This can be partly explained by the complexity of unraveling these relationships (as they vary through time) and the effects of other, unknown factors on call volume, as well as by limitations in the statistical methods used in practice.
Although the intensity and placement of anti-smoking advertisements are the primary factors that influence the number of calls to Quitline, a number of other factors may have an impact. These include public relations activity (such as World No Tobacco Day, New Year, and the launch of a new advertisement), residual effects from a previous campaign, and information obtained from other sources. With no available data on these unknowns, there are several advantages of semi-varying models. The nonparametric component of the semi-varying coefficient models incorporates these factors into the modeling by allowing call volume to vary smoothly with time. Incorporation of this term seems to permit detection of the underlying structure, given that there is no real evidence of a long-term trend. In this modeling framework, in addition to fixed day-of-the-week effects, we expect the day effects to vary and to depend on the number of calls received on the previous day.
Very few studies have focused on modeling the relationship between the intensity of anti-smoking advertising associated with the Quit campaign, placement of anti-smoking advertisements, and call volume to Quitline. In fact, to our knowledge, only 2 studies implemented a regression-type modeling strategy to monitor trends in call volume. 8,15 Our findings are in agreement with both of these previous studies, which found an association between the TARP values and calls to Quitline.
The fixed-effects regression approach implemented in Miller et al 8 does not adequately model the underlying nonlinear trend in call volume. Although the semi-parametric approach in Erbas et al 15 extends the classical regression modeling framework by modeling the time trend nonparametrically, the models do not reflect the stochastic nature of the trend. Semi-varying coefficient models, although new, are useful in modeling data, such as call volume, where there is little or no information on other factors related to the at-risk population (in this case, the "at-risk" population of smokers). The varying coefficient approach is a useful technique for modeling data with a strong underlying trend component. There are a number of strengths to this analytical approach. In contrast to the common fixed effects regression techniques used to model trends in call volume and its association with fixed covariates, varying coefficient models allow the underlying time trend to depend on fixed covariates that also vary with time, thereby explaining more of the variation in call volume attributed to other unknown or unstudied factors. We acknowledge some limitations that should be considered when using this method. Although useful for analyzing complex data with an underlying trend, the methods tend to "break down" when analyzing data with many zeros and outcomes of rare events. Nevertheless, the methods are very useful for analyzing large-scale population-based public health data.
Pooling the data into days, as in this study, allows us to establish a fixed relationship between daily call volumes and advertising, in addition to the day-of-the-week effects. However, to fully understand the impact of anti-smoking advertisements we must consider additional covariates, such as level of audience involvement in particular television shows, replacement of advertisements by time of day, and lag effects from exposure to the advertisements. To do this, we need to examine hourly data. Modeling hourly data is a challenge methodologically, as zero counts are frequent both in the outcome variable and potential covariates. A number of models are possible, for example, incorporating a nonparametric component to a zero-inflated Poisson regression model, 16 which will be considered in future research in this area.