The analysis of individual risks in insurance raises problems that occur in any statistical analysis of longitudinal data. In insurance data, the endogenous variables are severity variables (for instance: number and cost of claims, duration of compensation, and so on). The exogenous variables of the current period can first be used as rating factors in an a priori rating model. Allowing for the history of the policyholder in a rating model is more intricate, and can be done through two different approaches. They correspond to two interpretations of serial correlation for individual data, which can be summarized as follows.
Exogenous vs. Endogenous interpretations of serial correlation for longitudinal data
Actual bonus-malus systems throughout the world are described in Lemaire (1995). In most of them, a reported claim increases the maluses applied to subsequent claims. Thus, these systems induce a “hunger for bonus” and have a real incentive effect on policyholders. This negative contagion could be taken into account by an endogenous formulation, in which the history of the policyholder influences the distributions of the severity variables. However, what is observed for every guarantee in automobile insurance is “positive apparent contagion”: policyholders who reported claims in the past will report more in the future than those who did not. This positive apparent contagion is explained by the revelation over time of hidden information. Heterogeneous models, which allow for hidden information, are hence suited to prediction on insurance data.
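To see how hidden information alone can generate positive apparent contagion, consider the following sketch. The Poisson-gamma specification and the symbols $\lambda$ (base frequency), $a$ (gamma parameter), $N_1$, $N_2$ (claim counts in two periods) and $U$ (hidden heterogeneity) are illustrative assumptions, not notation taken from the text.

```latex
\begin{align*}
N_1, N_2 \mid U = u &\sim \text{Poisson}(\lambda u) \quad \text{independently}, \qquad
U \sim \text{Gamma}(a, a), \quad E[U] = 1, \\
U \mid N_1 = n &\sim \text{Gamma}(a + n,\; a + \lambda), \\
E[N_2 \mid N_1 = n] &= \lambda \, E[U \mid N_1 = n] = \lambda \, \frac{a + n}{a + \lambda}.
\end{align*}
```

Conditional on $u$, the two periods are independent, so past claims have no causal effect on future ones. The predicted frequency nevertheless increases with $n$, because each reported claim reveals information about $u$.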
Positive apparent contagion: empirical pieces of evidence for insurance data
Consider policyholders observed during two periods (a period is equal to or less than a year). We split the population between those who did not report claims of a certain type during the first period and those who did. We discard the policyholders who reported two or more claims during the first period, which makes the following results easier to interpret.
Since the frequency per period is far below one, these policyholders are much less numerous than those who reported one claim. For the population that reported $i$ claims ($i = 0, 1$) during the first period, we denote by $\bar{f}_i$ (resp. $\hat{f}_i$) the average frequency (resp. estimated frequency) of claims during the second period.
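The two-period comparison described above can be reproduced on simulated data. The sketch below is only an illustration under stated assumptions: claim counts are Poisson with a gamma-distributed hidden multiplier of mean one, and the function names and parameter values (`base_freq`, `shape`, `n_policies`) are hypothetical, not taken from the text. No actual insurance data is used.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def second_period_frequencies(n_policies=200_000, base_freq=0.1, shape=1.5, seed=0):
    """Average second-period claim frequency, by number of first-period claims (0 or 1).

    Assumption: each policyholder has a hidden multiplier u ~ Gamma(shape, scale=1/shape),
    so E[u] = 1, and both periods are Poisson(base_freq * u), independent given u.
    """
    rng = random.Random(seed)
    claims = {0: 0, 1: 0}
    sizes = {0: 0, 1: 0}
    for _ in range(n_policies):
        u = rng.gammavariate(shape, 1.0 / shape)
        first = poisson_draw(rng, base_freq * u)
        if first > 1:
            # Discard policyholders with two or more first-period claims.
            continue
        second = poisson_draw(rng, base_freq * u)
        sizes[first] += 1
        claims[first] += second
    return {i: claims[i] / sizes[i] for i in (0, 1)}


if __name__ == "__main__":
    freqs = second_period_frequencies()
    for i in (0, 1):
        print(f"first-period claims = {i}: second-period frequency = {freqs[i]:.4f}")
```

Although the simulated history has no causal effect on future claims, the group that reported one claim shows a higher second-period frequency than the claim-free group, which is the pattern called positive apparent contagion.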
Allowance for hidden information by heterogeneous models
The starting point is a model (subsequently called “basic model”) on the observable information. Its likelihood with respect to a dominating measure is parameterized by $\theta_1$ and denoted as $l^0(y_i \mid \theta_1, x_i)$ for the individual $i$. Besides $x_i$, the vector of observable exogenous variables, we suppose that there exist hidden variables relevant for the explanation of $y_i$. These variables are represented by $u_i$, a heterogeneity component for $i$.
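The likelihood conditional on $u_i$ is denoted as $l_*(y_i \mid \theta_1, x_i, u_i)$. These distributions, supposed to be the actual ones in the prediction, will be said to belong to a “fixed effects” model, where the individual heterogeneity component is the fixed effect. We suppose that there exists $u^0$ such that
$$l_*(y_i \mid \theta_1, x_i, u^0) = l^0(y_i \mid \theta_1, x_i).$$
The likelihood of the model with random effects is
$$l(y_i \mid \theta_1, \theta_2, x_i) = E_{\theta_2}\left[ l_*(y_i \mid \theta_1, x_i, U_i) \right],$$
where the expectation is taken with respect to $U_i$. The parameter $\theta = (\theta_1, \theta_2)$ is written as a list for convenience. Since data are longitudinal, $x_i$ and $y_i$ are sequences of variables. The $U_i$ are i.i.d., and we write.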