# Use of SynthETIC to Generate Individual Claims of Realistic Features

This vignette aims to illustrate how the SynthETIC package can be used to generate a general insurance claims history with realistic distributional assumptions consistent with the experience of a specific (but anonymous) Auto Liability portfolio. The simulator is composed of 8 modelling steps (or “modules”), each of which will build on (a selection of) the output from previous steps:

1. Claim occurrence: claim frequency, claim occurrence times
2. Claim size: claim size in constant dollar values, i.e. without inflation
3. Claim notification: notification delay (delay from occurrence to notification)
4. Claim closure: settlement delay (delay from notification to closure)
5. Claim payment count: number of partial payments
6. Claim payment size: sizes of partial payments in constant dollar values, i.e. without inflation
7. Claim payment time: inter-partial-payment delays and partial payment times in calendar periods
8. Claim inflation: sizes of inflated partial payments

In particular, with this demo we will output the following:

| Description | R object |
|---|---|
| N, claim frequency | n_vector = number of claims for each accident period |
| U, claim occurrence time | occurrence_times[[i]] = claim occurrence times for all claims that occurred in period i |
| S, claim size | claim_sizes[[i]] = claim sizes for all claims that occurred in period i |
| V, notification delay | notidel[[i]] = notification delays for all claims that occurred in period i |
| W, settlement delay | setldel[[i]] = settlement delays for all claims that occurred in period i |
| M, number of partial payments | no_payments[[i]] = number of partial payments for all claims that occurred in period i |
| Sizes of partial payments | payment_sizes[[i]][[j]] = partial payment amounts (in dollars) for claim j of occurrence period i |
| Inter-partial delays | payment_delays[[i]][[j]] = inter-partial delays for claim j of occurrence period i |
| Payment times (continuous time) | payment_times[[i]][[j]] = payment times (in continuous time) for claim j of occurrence period i |
| Payment times (period) | payment_periods[[i]][[j]] = payment times (in calendar periods) for claim j of occurrence period i |
| Actual payments (inflated) | payment_inflated[[i]][[j]] = inflated partial payment amounts (in dollars) for claim j of occurrence period i |

## Reference

For a full description of SynthETIC’s structure and test parameters, readers should refer to:

Avanzi, B, Taylor, G, Wang, M, Wong, B (2021). SynthETIC: An individual insurance claim simulator with feature control. Insurance: Mathematics and Economics 100, 296–308. https://doi.org/10.1016/j.insmatheco.2021.06.004

The work can also be accessed via arXiv:2008.05693.

To cite this package in publications, please use:

citation("SynthETIC")

## Set Up

library(SynthETIC)
set.seed(20200131)

## Package-wide Global Parameters

We introduce the reference value ref_claim partly as a measure of the monetary unit and/or overall claims experience. The default distributional assumptions were set up with a specific (but anonymous) Auto Liability portfolio in mind. ref_claim then allows users to easily simulate a synthetic portfolio with a similar claim pattern but in a different currency, for example.

We also remark that users can alternatively choose to interpret ref_claim as a monetary unit. For example, one can set ref_claim <- 1000 and think of all amounts in thousands of dollars. However, in this case the default functions (as listed below) will not work, and users will need to supply their own set of functions and set the values as multiples of ref_claim rather than fractions as in the default setting.

We also require the user to input a time_unit (which should be given as a fraction of a year), so that the default input parameters apply to contexts where the time units are no longer quarters. In the default setting we have a time_unit of 1/4. The default input parameters update automatically with the choice of the two global variables ref_claim and time_unit, which ensures that the simulator produces sensible results in contexts other than the default setting.

We remark that both ref_claim and time_unit only affect the default simulation functions, and users can also choose to set up their own modelling assumptions for any of the modules to match their experience even better. In the latter case, it is the responsibility of the user to ensure that their input parameters are compatible with their time units and claims experience. For example, if the time units are quarters, then claim occurrence rates must be quarterly.

set_parameters(ref_claim = 200000, time_unit = 1/4)
ref_claim <- return_parameters()[1]
time_unit <- return_parameters()[2]

The reference value ref_claim will be used throughout the simulation process (as listed in the table below).

| Module | Details |
|---|---|
| 2. Claim Size | At ref_claim = 200000, by default we simulate claim sizes from S^0.2 ~ Normal(9.5, sd = 3), left truncated at 30. When the reference value changes, we output the claim sizes scaled by a factor of ref_claim / 200000. |
| 3. Claim Notification | By default we set the mean notification delay (in quarters) to be $\min\left(3, \max\left(1, 2 - \frac{1}{3} \log\left(\frac{claim\_size}{0.5\,ref\_claim}\right)\right)\right)$ (which will be automatically converted to the relevant time_unit), i.e. the mean notification delay decreases logarithmically with claim size. It has maximum value 3, minimum value 1, and equals 2 for a claim of size exactly 0.5 * ref_claim. |
| 4. Claim Closure | The default value for the mean settlement delay involves a term that defines the benchmark for a claim to be considered “small”: 0.1 * ref_claim. The default mean settlement delay increases logarithmically with claim size and, before the occurrence-period adjustment factor is applied, equals 6 quarters exactly at this benchmark. Furthermore, there was a legislative change, captured in the default mean function, that impacted the settlement delays of those “small” claims. |
| 5. Claim Payment Count | For the default sampling distribution, we need two claim size benchmarks, as we sample from different distributions for claims of different sizes. In general, a small number of partial payments is required to settle small claims, and additional payments will be required to settle more extreme claims. It is assumed that claims below 0.0375 * ref_claim can be settled in 1 or 2 payments, claims between 0.0375 * ref_claim and 0.075 * ref_claim in 2 or 3 payments, and claims beyond 0.075 * ref_claim in no less than 4 payments. |
| 6. Claim Payment Size | We use the same proportion of ref_claim as in the Claim Closure module, namely 0.1 * ref_claim. This benchmark value is used when simulating the proportion of the last two payments in the default simulate_amt_pmt function. The mean proportion of the claim paid in the last two payments increases logarithmically with claim size, and equals 75% exactly at this benchmark. |
| 8. Claim Inflation | Two benchmark values are required in this module, one each for the default SI occurrence and SI payment functions. 1) A legislative change, captured by SI occurrence, reduced claim size by up to 40% for the smallest claims and impacted claims up to 0.25 * ref_claim in size. 2) The default SI payment is set to be 30% p.a. for the smallest claims and zero for claims exceeding ref_claim in size, and varies linearly for claims between 0 and ref_claim. |

The time_unit chosen will impact the time-related modules, specifically:

- Claim Notification;
- Claim Closure;
- Claim Payment Time;
- Claim Inflation.

## 1. Claim Occurrence

Unless otherwise specified, claim_frequency() assumes the claim frequency follows a Poisson distribution with mean equal to the product of the exposure E associated with period i and the expected claim frequency freq per unit exposure for that period. The exposure and expected frequency are allowed to vary across periods, but not within a period. Given the claim frequency, claim_occurrence() samples the occurrence times of each claim from a uniform distribution. Together, the two functions assume by default that the arrival of claims follows a Poisson process, with potentially varying rates across different periods (see Example 1.2). Alternative sampling processes are discussed in Examples 1.3 and 1.4.
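To make the default occurrence assumptions concrete, here is a minimal base-R sketch (not SynthETIC's own code) of a Poisson process whose rate is constant within each period: Poisson counts per period, then uniform occurrence times within each period. The helper simulate_occurrence_sketch is hypothetical. Scaling the Poisson mean by time_unit is an assumption based on E being an annual exposure rate (compare the lambda = time_unit * E_tmp * lambda_tmp call in Example 1.3).

```r
# Hypothetical helper: a base-R sketch of the default claim occurrence
# assumptions, not the SynthETIC implementation
simulate_occurrence_sketch <- function(E, freq, time_unit = 1/4) {
  I <- length(E)
  # Assumption: E is an annual exposure rate, so the Poisson mean for a
  # period of length time_unit (in years) is time_unit * E[i] * freq[i]
  n <- rpois(I, lambda = time_unit * E * freq)
  # Given n[i] claims in period i, occurrence times are iid uniform on (i - 1, i)
  times <- lapply(seq_len(I), function(i) runif(n[i], min = i - 1, max = i))
  list(n_vector = n, occurrence_times = times)
}

set.seed(1)
sim <- simulate_occurrence_sketch(E = rep(12000, 40), freq = rep(0.03, 40))
mean(sim$n_vector)                 # close to 0.25 * 12000 * 0.03 = 90
range(sim$occurrence_times[[2]])   # every time lies inside period 2, i.e. (1, 2)
```

Passing a non-constant E or freq gives the piecewise-constant rates described above.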
## Example 1.1: Constant exposure and frequency

### Input parameters

- years = number of years considered
- I = number of claims development periods considered (which equals the number of years divided by the time_unit)
- E[i] = exposure associated with each period i
- lambda[i] = expected claim frequency per unit exposure for period i

years <- 10
I <- years / time_unit
E <- c(rep(12000, I)) # effective annual exposure rates
lambda <- c(rep(0.03, I))

### Implementation and Output

# Number of claims occurring for each period i
# shorter equivalent code:
# n_vector <- claim_frequency()
n_vector <- claim_frequency(I = I, E = E, freq = lambda)
n_vector
#> [1] 90 79 102 78 86 88 116 84 93 104 80 87 86 104 81 84 101 96 96
#> [20] 86 102 103 82 83 80 80 82 87 103 79 79 100 94 99 88 101 91 95
#> [39] 91 84

# Occurrence time of each claim r, for each period i
occurrence_times <- claim_occurrence(frequency_vector = n_vector)
occurrence_times[[1]]
#> [1] 0.6238351404 0.1206679437 0.2220435985 0.4538308736 0.5910992266
#> [6] 0.9524491858 0.3660710892 0.1923275446 0.5391526092 0.7398599708
#> [11] 0.9761979643 0.6794459166 0.6491731463 0.0145699105 0.0117662018
#> [16] 0.0002802343 0.1229670814 0.2181776366 0.9188914341 0.3641183279
#> [21] 0.3599445471 0.3228054109 0.7384824581 0.0756409415 0.2406489884
#> [26] 0.0309497463 0.1994408462 0.0391640882 0.1830444403 0.5194172878
#> [31] 0.8934622605 0.2604308173 0.8512500757 0.1738214253 0.4129021554
#> [36] 0.0683904318 0.0944415457 0.5636684340 0.4130775523 0.6496588932
#> [41] 0.2293977202 0.2929870863 0.1346096094 0.3428012058 0.5930486526
#> [46] 0.7660660581 0.7112241383 0.9488298327 0.0046397008 0.7370544358
#> [51] 0.1497760331 0.0386742705 0.1717934967 0.8123882010 0.3574451937
#> [56] 0.7511094357 0.2453237963 0.8360645119 0.7225212962 0.5654766215
#> [61] 0.0858555159 0.2943205256 0.4229451967 0.3454886819 0.6273976711
#> [66] 0.4686531660 0.6168212816 0.2097416152 0.0703774171 0.5280987371
#> [71] 0.2788692161 0.3355113363 0.3388684399 0.2468694879 0.1210995505
#> [76] 0.4063767171 0.1075867382 0.7758433735 0.5431794343 0.9817624143
#> [81] 0.4714252711 0.3129043274 0.8519159236 0.2192278604 0.2754109078
#> [86] 0.9434416124 0.7397910126 0.2484398137 0.5336137633 0.7483879288

## Example 1.2: Increasing exposure, constant frequency per unit of exposure

Note that variables named with _tmp are for illustration purposes only and are not used in the later simulation modules of this demo.

## input parameters
years_tmp <- 10
I_tmp <- years_tmp / time_unit
# set linearly increasing exposure, ...
E_tmp <- c(rep(12000, I_tmp)) + seq(from = 0, by = 100, length = I_tmp)
# and constant frequency per unit of exposure
lambda_tmp <- c(rep(0.03, I_tmp))

## output
# Number of claims occurring for each period i
n_vector_tmp <- claim_frequency(I = I_tmp, E = E_tmp, freq = lambda_tmp)
n_vector_tmp
#> [1] 107 97 86 103 87 81 82 83 107 80 81 86 79 98 91 87 93 111 93
#> [20] 104 105 113 89 100 115 104 114 122 90 116 132 100 111 108 135 116 116 109
#> [39] 120 120

# Occurrence time of each claim r, for each period i
occurrence_times_tmp <- claim_occurrence(frequency_vector = n_vector_tmp)
occurrence_times_tmp[[1]]
#> [1] 0.952972013 0.878173776 0.684479050 0.915558977 0.496780866 0.784500798
#> [7] 0.446433841 0.102953206 0.612290796 0.680195534 0.556182698 0.045605700
#> [13] 0.371326311 0.220061345 0.195921842 0.083790625 0.075338539 0.342769926
#> [19] 0.336097335 0.379061881 0.634857761 0.711008352 0.910231843 0.609100422
#> [25] 0.645031730 0.859860029 0.786352659 0.286475987 0.189036040 0.595847647
#> [31] 0.354306386 0.940303840 0.018530716 0.151189459 0.745556375 0.155205039
#> [37] 0.070178678 0.426025548 0.447296439 0.755066258 0.643531907 0.832750566
#> [43] 0.613205539 0.397535617 0.870752500 0.220184653 0.226098091 0.466065862
#> [49] 0.881361386 0.647172325 0.549784031 0.927304841 0.595728125 0.921661125
#> [55] 0.560342632 0.759705019 0.820286798 0.330417019 0.333587312 0.540555824
#> [61] 0.054696505 0.558244388 0.807569014 0.628752004 0.042540230 0.176635575
#> [67] 0.283089697 0.660460350 0.892414873 0.058447282 0.937083544 0.099011265
#> [73] 0.880388323 0.620242061 0.648976628 0.412398872 0.033443779 0.967655757
#> [79] 0.605652047 0.309612707 0.583694900 0.387392525 0.403679390 0.763759864
#> [85] 0.768409867 0.493427805 0.884637634 0.022691348 0.016921406 0.546337125
#> [91] 0.282798626 0.291636830 0.210914176 0.140094880 0.106370681 0.703040822
#> [97] 0.011059992 0.910601367 0.117060644 0.783328586 0.491064691 0.005622066
#> [103] 0.828679769 0.214179660 0.241332419 0.079605893 0.341526252

## Example 1.3: Alternative claim frequency distribution

Users can choose to specify their own claim frequency distribution via simfun, which accepts either a random generation function (type = "r", the default) or a cumulative distribution function (type = "p"). For example, we can use the negative binomial distribution in base R, or the zero-truncated Poisson distribution from the actuar package.

# simulate claim frequencies from negative binomial
# 1. using type-"r" specification (default)
claim_frequency(I = I, simfun = rnbinom, size = 100, mu = 100)
#> [1] 94 103 73 123 131 73 113 101 95 91 120 84 106 112 88 72 94 88 105
#> [20] 95 115 75 90 85 93 92 123 107 109 92 93 105 116 103 100 84 93 102
#> [39] 81 93
# 2. using type-"p" specification, equivalent in distribution to the above
claim_frequency(I = I, simfun = pnbinom, type = "p", size = 100, mu = 100)
#> [1] 121 77 89 91 118 110 121 87 98 96 91 108 85 83 67 109 101 93 110
#> [20] 86 100 94 106 90 102 106 98 104 130 117 95 81 86 97 115 104 95 89
#> [39] 97 64

# simulate claim frequencies from zero-truncated Poisson
claim_frequency(I = I, simfun = actuar::rztpois, lambda = 90)
#> [1] 80 90 83 74 78 97 105 102 81 85 109 96 93 101 87 88 86 93 76
#> [20] 95 75 74 105 80 103 93 101 78 108 78 90 91 103 98 81 106 80 100
#> [39] 89 100
claim_frequency(I = I, simfun = actuar::pztpois, type = "p", lambda = 90)
#> [1] 89 89 82 89 83 92 96 106 96 105 104 97 85 99 104 86 88 100 76
#> [20] 114 79 90 100 98 89 99 87 83 68 88 88 73 99 111 75 75 86 89
#> [39] 94 94

Similar to Example 1.2, we can modify the frequency parameters to vary across periods:

claim_frequency(I = I, simfun = actuar::rztpois, lambda = time_unit * E_tmp * lambda_tmp)
#> [1] 98 83 92 114 88 105 97 98 99 82 106 94 92 92 87 97 97 101 96
#> [20] 103 105 91 108 125 120 134 108 128 94 95 114 122 119 116 117 105 115 103
#> [39] 120 121

If one wishes to code their own sampling function (either a direct random generating function, or a proper CDF), this can be achieved by:

# sampling from non-homogeneous Poisson process
rnhpp.count <- function(no_periods) {
  rate <- 3000
  intensity <- function(x) {
    # e.g. cyclical Poisson process
    0.03 * (sin(x * pi / 2) / 4 + 1)
  }
  claim_times <- poisson::nhpp.event.times(rate, no_periods * rate * 2, intensity)
  as.numeric(table(cut(claim_times, breaks = 0:no_periods)))
}
n_vector_tmp <- claim_frequency(I = I, simfun = rnhpp.count)
plot(x = 1:I, y = n_vector_tmp, type = "l",
     main = "Claim frequency simulated from a cyclical Poisson process",
     xlab = "Occurrence period", ylab = "# Claims")

## Example 1.4: Alternative specification of the claim arrival process

We note that the claim_occurrence() function, which simulates the claim times conditional on the claim frequencies, assumes a uniform distribution, and that this cannot be modified without changing the module. However, the modular structure of SynthETIC ensures that one can easily unplug any one module and replace it with a version modified for one's own purpose. For example, if one wishes to replace this uniform distribution assumption and/or the whole Claim Occurrence module, one can simply supply one's own vector of claim times and convert it to the list format consistent with the SynthETIC framework for smooth integration with the later modules.

Recall the example of a non-homogeneous Poisson process from Example 1.3:

rate_tmp <- 3000
intensity_tmp <- function(x) {
  # e.g. cyclical Poisson process
  0.03 * (sin(x * pi / 2) / 4 + 1)
}
x_tmp <- poisson::nhpp.event.times(rate_tmp, I * rate_tmp, intensity_tmp)
event_times_tmp <- x_tmp[x_tmp <= I]

# Number of claims occurring for each period i
# by counting the number of event times in each interval (i - 1, i]
n_vector_tmp <- as.numeric(table(cut(event_times_tmp, breaks = 0:I)))
n_vector_tmp
#> [1] 106 100 77 78 110 113 65 82 100 93 100 70 97 74 63 70 113 108 67
#> [20] 66 112 109 82 74 115 102 81 72 102 102 87 44 108 112 56 91 94 121
#> [39] 75 87

# Occurrence time of each claim r, for each period i
occurrence_times_tmp <- to_SynthETIC(x = event_times_tmp, frequency_vector = n_vector_tmp)
occurrence_times_tmp[[1]]
#> [1] 0.007728273 0.007945250 0.011723418 0.018534783 0.028319285 0.040252238
#> [7] 0.048106842 0.050856621 0.057240086 0.059381448 0.079972286 0.087408264
#> [13] 0.097266676 0.118078846 0.125170867 0.133381255 0.146589643 0.147090407
#> [19] 0.148041242 0.156275663 0.160733908 0.169585235 0.177193996 0.178100534
#> [25] 0.199377711 0.224454160 0.234607727 0.244816652 0.245845531 0.249132698
#> [31] 0.251743134 0.258342748 0.261091994 0.276257142 0.278985850 0.282803915
#> [37] 0.286553431 0.294774871 0.311442677 0.318563078 0.323961017 0.326671951
#> [43] 0.328234368 0.336979953 0.348133626 0.354743330 0.366844413 0.389373631
#> [49] 0.401162011 0.408418755 0.416610442 0.432105152 0.446045625 0.453974806
#> [55] 0.462735523 0.468141357 0.474674689 0.483620074 0.496221092 0.496316647
#> [61] 0.500560003 0.508516775 0.520222861 0.520677480 0.520784864 0.536489580
#> [67] 0.548359980 0.557365339 0.564630394 0.569655578 0.584048566 0.595048058
#> [73] 0.603414160 0.628469605 0.660625622 0.662136776 0.669929786 0.673350618
#> [79] 0.703629852 0.723444931 0.736488924 0.739906292 0.745396837 0.755282491
#> [85] 0.756610319 0.766758238 0.778293428 0.796807193 0.800105003 0.806856302
#> [91] 0.808539114 0.808711251 0.812124200 0.873272419 0.882532989 0.884940646
#> [97] 0.901151514 0.914971667 0.915009924 0.915256156 0.918208482 0.926779527
#> [103] 0.935431290 0.943375941 0.968904523 0.978526824

## 2. Claim Size

## Example 2.1: Default power normal

By default claim_size() assumes a left truncated power normal distribution: $$S^{0.2} \sim \mathcal{N}(\mu = 9.5, \sigma = 3)$$, left truncated at 30. There is no need to specify a sampling distribution if the user is happy with the default power normal. This example is mainly to demonstrate how the default function works.

### Input parameters

We can specify the CDF to generate claim sizes from. The default distribution function can be coded as follows:

# use a power normal S^0.2 ~ N(9.5, 3), left truncated at 30
# this is the default distribution driving the claim_size() function
S_df <- function(s) {
  # truncate and rescale
  if (s < 30) {
    return(0)
  } else {
    p_trun <- pnorm(s^0.2, 9.5, 3) - pnorm(30^0.2, 9.5, 3)
    p_rescaled <- p_trun/(1 - pnorm(30^0.2, 9.5, 3))
    return(p_rescaled)
  }
}

### Implementation and Output

# shorter equivalent: claim_sizes <- claim_size(frequency_vector = n_vector)
claim_sizes <- claim_size(frequency_vector = n_vector,
                          simfun = S_df, type = "p", range = c(0, 1e24))
claim_sizes[[1]]
#> [1] 93291.1281 1825.1852 4440.4396 32287.6849 237695.8689
#> [6] 1792.0400 32672.5857 4356.8415 490648.0501 165492.3940
#> [11] 388.8723 81276.5608 1233458.6942 133725.9577 378393.8095
#> [16] 2882.6736 6686.3289 63052.1262 574912.0378 21882.4758
#> [21] 46725.3221 98541.1789 332462.0144 287900.0286 93188.9701
#> [26] 15287.3497 463495.3618 52649.0723 74152.5254 130910.1997
#> [31] 119880.4127 109487.8475 7902.1319 143295.5038 8429.2808
#> [36] 106321.3745 74289.0139 231568.1581 5655.3802 114095.7604
#> [41] 3674.5182 17833.0693 138709.3259 1183.5463 31943.5395
#> [46] 31788.6430 649305.4815 129626.7759 953064.5181 176055.2458
#> [51] 9448.9767 132756.9068 790907.3914 323981.5791 35816.4343
#> [56] 18368.9373 76700.1874 13216.5515 259080.7079 105172.6430
#> [61] 11050.3279 192579.0140 469054.6720 69482.8341 334880.5561
#> [66] 412407.0126 2759.4386 9451.1585 46474.0664 199298.5997
#> [71] 145184.7006 272916.3488 118555.3651 12932.7432 235195.6266
#> [76] 29125.3375 13172.2650 13447.7966 50760.3215 34996.6500
#> [81] 17735.1021 75531.2246 6297.6255 143859.3648 172966.1540
#> [86] 21274.0752 491206.5312 114356.5266 528858.8966 3688.4671

## Example 2.2: Alternative claim size distribution

Users can also choose any other individual claim size distribution, e.g. Weibull from base R or inverse Gaussian from actuar:

## weibull
# choose the Weibull parameters so that the mean and CV match those of
# the built-in test claim dataset
claim_size_mean <- mean(test_claim_dataset$claim_size)
claim_size_cv <- cv(test_claim_dataset$claim_size)
weibull_shape <- get_Weibull_parameters(target_mean = claim_size_mean,
                                        target_cv = claim_size_cv)[1]
weibull_scale <- get_Weibull_parameters(target_mean = claim_size_mean,
                                        target_cv = claim_size_cv)[2]

# simulate claim sizes with the estimated parameters
claim_sizes_weibull <- claim_size(frequency_vector = n_vector,
                                  simfun = rweibull,
                                  shape = weibull_shape, scale = weibull_scale)

# plot empirical CDF
plot(ecdf(unlist(test_claim_dataset$claim_size)), xlim = c(0, 2000000),
     main = "Empirical distribution of simulated claim sizes",
     xlab = "Individual claim size")
plot(ecdf(unlist(claim_sizes_weibull)), add = TRUE, col = 2)

## inverse Gaussian
# modify actuar::rinvgauss: left truncate at 30 and right truncate at 5,000,000
# (out-of-range draws are resampled, so this is truncation rather than censoring)
rinvgauss_censored <- function(n) {
  s <- actuar::rinvgauss(n, mean = 180000, dispersion = 0.5e-5)
  while (any(s < 30 | s > 5000000)) {
    for (j in which(s < 30 | s > 5000000)) {
      # for rejected values, resample
      s[j] <- actuar::rinvgauss(1, mean = 180000, dispersion = 0.5e-5)
    }
  }
  s
}
# simulate from the modified inverse Gaussian distribution
claim_sizes_invgauss <- claim_size(frequency_vector = n_vector, simfun = rinvgauss_censored)

# plot empirical CDF
plot(ecdf(unlist(claim_sizes_invgauss)), add = TRUE, col = 3)
legend.text <- c("Power normal", "Weibull", "Inverse Gaussian")
legend("bottomright", legend.text, col = 1:3, lty = 1, bty = "n")
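Two base-R alternatives to the element-by-element loop in rinvgauss_censored, offered as a sketch rather than SynthETIC code. rtruncated_sketch (hypothetical) redraws only the out-of-range values in each pass, which gives a distribution truncated, not censored, to [lower, upper]. rpower_normal_sketch (hypothetical) samples the default power normal of Example 2.1 (at ref_claim = 200000) by inverse transform: because S^0.2 stays positive above the truncation point, the CDF coded in S_df can be inverted in closed form. Both take n as their first argument, which we assume is what claim_size() expects from simfun, as with rinvgauss_censored and rweibull above.

```r
# Hypothetical helper: vectorised rejection resampling onto [lower, upper].
# Out-of-range draws are redrawn (truncation), never capped (censoring).
rtruncated_sketch <- function(n, rfun, lower = -Inf, upper = Inf, ...) {
  s <- rfun(n, ...)
  bad <- s < lower | s > upper
  while (any(bad)) {
    s[bad] <- rfun(sum(bad), ...)   # redraw only the rejected positions
    bad <- s < lower | s > upper
  }
  s
}

# Hypothetical helper: inverse-transform sampling from the default power normal,
# S^0.2 ~ N(9.5, 3) left truncated at S = 30 (the distribution coded in S_df)
rpower_normal_sketch <- function(n, mu = 9.5, sigma = 3, lower = 30) {
  p0 <- pnorm(lower^0.2, mu, sigma)   # CDF mass removed by the truncation
  u <- runif(n, min = p0, max = 1)    # uniform over the retained CDF range
  qnorm(u, mu, sigma)^5               # undo the 0.2 power transform
}

set.seed(2)
x_weibull <- rtruncated_sketch(1000, rweibull, lower = 30, upper = 5e6,
                               shape = 0.6, scale = 1e5)
x_pn <- rpower_normal_sketch(1000)
summary(x_pn)
```

The inverse-transform version needs no rejection step at all, while the resampler works for any random generation function.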

## Example 2.3: Simulating claim sizes from covariates

The applications discussed above assume that the claim sizes are sampled from a single distribution for all policyholders (e.g. the default power normal, or a custom sampling distribution specified by simfun).

Suppose we instead want to simulate from a model which uses covariates to predict claim sizes. For example, consider a (theoretical) gamma GLM with log link:

\begin{align*} E(S_i) =\mu_i &=\exp(\boldsymbol{x}_i^\top \boldsymbol\beta)\\ &= \exp(\beta_0 + \beta_1 \times age_i + \beta_2 \times age_i^2)\\ &= \exp(27 - 0.768 \times age_i + 0.008 \times age_i^2) \end{align*}

# define the random generation function to simulate from the gamma GLM
sim_GLM <- function(n) {
  # simulate covariates
  age <- sample(20:70, size = n, replace = TRUE)
  mu <- exp(27 - 0.768 * age + 0.008 * age^2)
  rgamma(n, shape = 10, scale = mu / 10)
}

claim_sizes_GLM <- claim_size(frequency_vector = n_vector, simfun = sim_GLM)
plot(ecdf(unlist(claim_sizes_GLM)), xlim = c(0, 2000000),
     main = "Empirical distribution of claim sizes simulated from GLM",
     xlab = "Individual claim size")
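With shape = 10 and scale = mu / 10, each simulated claim has gamma mean mu_i and coefficient of variation 1/sqrt(10) (about 0.32), so sim_GLM is consistent with the log-link GLM above. Because the linear predictor is quadratic in age, the expected claim size is U-shaped with its minimum at age = 0.768 / (2 * 0.008) = 48. A quick base-R check (gamma_glm_mean is a hypothetical helper):

```r
# Mean claim size under the example gamma GLM (log link), as a function of age
gamma_glm_mean <- function(age) exp(27 - 0.768 * age + 0.008 * age^2)

ages <- 20:70
ages[which.min(gamma_glm_mean(ages))]   # 48, the vertex of the quadratic in age

# Monte Carlo check that rgamma(shape = 10, scale = mu / 10) has mean mu
# and coefficient of variation 1 / sqrt(10)
set.seed(3)
mu_48 <- gamma_glm_mean(48)
draws <- rgamma(1e5, shape = 10, scale = mu_48 / 10)
c(rel_mean_error = mean(draws) / mu_48 - 1,
  cv = sd(draws) / mean(draws))
```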

## Example 2.4: Bootstrapping from given loss data

Suppose we have an existing dataset of claim costs at hand that we wish to simulate from, e.g. ausautoBI8999 (an automobile bodily injury claim dataset in Australia) from CASdatasets. We can take a bootstrap resample of the dataset and then convert it to SynthETIC format with ease:

# install.packages("CASdatasets", repos = "http://cas.uqam.ca/pub/", type = "source")
library(CASdatasets)
data("ausautoBI8999")
boot <- sample(ausautoBI8999$AggClaim, size = sum(n_vector), replace = TRUE)
claim_sizes_bootstrap <- to_SynthETIC(boot, frequency_vector = n_vector)

Another way to code this would be to write a random generation function to perform bootstrapping, and then use claim_size as usual:

sim_boot <- function(n) {
  sample(ausautoBI8999$AggClaim, size = n, replace = TRUE)
}
claim_sizes_bootstrap <- claim_size(frequency_vector = n_vector, simfun = sim_boot)
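to_SynthETIC() turns a flat vector into the period-by-period list used throughout this demo. Our understanding, which is an assumption we have not checked against its edge-case handling, is that the first n_vector[1] values go to period 1, the next n_vector[2] to period 2, and so on. The hypothetical to_list_sketch below reproduces that grouping in base R, keeping periods with no claims as zero-length vectors:

```r
# Hypothetical helper: split a flat vector into a list with one element per
# period, where period i receives frequency_vector[i] consecutive values
to_list_sketch <- function(x, frequency_vector) {
  stopifnot(length(x) == sum(frequency_vector))
  period <- factor(rep(seq_along(frequency_vector), times = frequency_vector),
                   levels = seq_along(frequency_vector))
  unname(split(x, period))   # split() keeps empty levels as zero-length vectors
}

claims_by_period <- to_list_sketch(c(10, 20, 30, 40, 50), frequency_vector = c(2, 0, 3))
claims_by_period
```

Declaring the factor levels explicitly is what keeps an empty middle period in the output instead of silently dropping it.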

Alternatively, one can easily fit a parametric distribution to an existing dataset with the help of the fitdistrplus package and then simulate from the fitted parametric distribution (Example 2.2).
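A minimal sketch of that fit-then-simulate workflow. To stay self-contained it swaps in MASS::fitdistr (MASS ships with standard R installations) for fitdistrplus, a lognormal for whatever family suits the data, and synthetic stand-in data for ausautoBI8999; all three are illustrative choices, not recommendations. The resulting sim_fitted takes n as its first argument, so it could then be passed as claim_size(frequency_vector = n_vector, simfun = sim_fitted).

```r
# Stand-in "observed" claim costs (synthetic, for illustration only)
set.seed(4)
observed <- rlnorm(5000, meanlog = 10, sdlog = 1.5)

# Maximum likelihood fit of a lognormal distribution
fit <- MASS::fitdistr(observed, densfun = "lognormal")
fit$estimate

# Random generation function based on the fitted parameters
sim_fitted <- function(n) {
  rlnorm(n, meanlog = fit$estimate[["meanlog"]], sdlog = fit$estimate[["sdlog"]])
}
head(sim_fitted(5))
```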

## 3. Claim Notification

SynthETIC assumes the (removable) dependence of notification delay on claim size and occurrence period of the claim, and thus requires the user to specify a paramfun (parameter function) with arguments claim_size and occurrence_period (and possibly more, see Example 3.2). The dependencies can be removed if the arguments are not referenced inside the function; e.g. the default notification delay function (shown below) is independent of the individual claim’s occurrence_period.

Other than this pre-specified dependence structure, users are free to choose any distribution, whether it be a pre-defined distribution in R, or more advanced ones from packages, or a proper user-defined function, to better match their own claim experience.

In addition, although not recommended, users can add further dependencies to their simulation. This is illustrated in Example 4.2 of the settlement delay module.

## Example 3.1: Default Weibull

By default, SynthETIC samples notification delays from a Weibull distribution:

## input
# specify the Weibull parameters as a function of claim_size and occurrence_period
notidel_param <- function(claim_size, occurrence_period) {
  # NOTE: users may add to, but not remove these two arguments (claim_size,
  # occurrence_period) as they are part of SynthETIC's internal structure

  # specify the target mean and target coefficient of variation
  target_mean <- min(3, max(1, 2 - log(claim_size / (0.50 * ref_claim)) / 3)) / 4 / time_unit
  target_cv <- 0.70
  # convert to Weibull parameters
  shape <- get_Weibull_parameters(target_mean, target_cv)[1]
  scale <- get_Weibull_parameters(target_mean, target_cv)[2]

  c(shape = shape, scale = scale)
}

## output
notidel <- claim_notification(n_vector, claim_sizes,
                              rfun = rweibull, paramfun = notidel_param)
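The sketch below unpacks notidel_param in base R. notidel_mean_sketch restates the target mean formula (in quarters when time_unit = 1/4), and weibull_params_sketch is a hypothetical stand-in for get_Weibull_parameters: it solves numerically for the Weibull shape k that gives the target coefficient of variation, using CV^2 = gamma(1 + 2/k) / gamma(1 + 1/k)^2 - 1, and then sets scale = mean / gamma(1 + 1/k). We assume, without having checked its source, that get_Weibull_parameters targets the same mean and CV, possibly by a different numerical method.

```r
# Target mean notification delay from notidel_param (base-R restatement)
notidel_mean_sketch <- function(claim_size, ref_claim = 200000, time_unit = 1/4) {
  mean_quarter <- pmin(3, pmax(1, 2 - log(claim_size / (0.50 * ref_claim)) / 3))
  mean_quarter / 4 / time_unit
}

# Hypothetical stand-in for get_Weibull_parameters(): match a target mean and CV
weibull_params_sketch <- function(target_mean, target_cv) {
  weibull_cv <- function(k) sqrt(gamma(1 + 2 / k) / gamma(1 + 1 / k)^2 - 1)
  # The Weibull CV decreases in the shape k, so this interval brackets the root
  k <- uniroot(function(k) weibull_cv(k) - target_cv,
               interval = c(0.2, 50), tol = 1e-10)$root
  c(shape = k, scale = target_mean / gamma(1 + 1 / k))
}

notidel_mean_sketch(c(1e3, 1e5, 1e7))   # 3, 2 and 1: the caps bind at both ends
pars <- weibull_params_sketch(target_mean = notidel_mean_sketch(1e5), target_cv = 0.70)
pars
```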

## Example 3.2: Alternative distribution for notification delay

SynthETIC does not restrict the choice of the sampling distribution. For example, we can use a transformed gamma distribution:

## input
# specify the transformed gamma parameters as a function of claim_size and occurrence_period
trgamma_param <- function(claim_size, occurrence_period, rate) {
  c(shape1 = max(1, claim_size / ref_claim),
    shape2 = 1 - occurrence_period / 200,
    rate = rate)
}

## output
# simulate notification delays from the transformed gamma
notidel_trgamma <- claim_notification(n_vector, claim_sizes,
                                      rfun = actuar::rtrgamma,
                                      paramfun = trgamma_param, rate = 2)

# graphically compare the result with the default Weibull distribution
plot(ecdf(unlist(notidel)), xlim = c(0, 15),
     main = "Empirical distribution of simulated notification delays",
     xlab = "Notification delay (in quarters)")
plot(ecdf(unlist(notidel_trgamma)), add = TRUE, col = 2)
legend.text <- c("Weibull (default)", "Transformed gamma")
legend("bottomright", legend.text, col = 1:2, lty = 1, bty = "n")

Clearly the transformed gamma with the parameters specified above accelerates the reporting of the simulated claims.

## Example 3.3: User-defined sampling function for notification delay

One may wish to simulate from a more exotic sampling distribution that cannot be easily written as a nice pre-defined distribution function and its parameters. For example, consider a mixed distribution:

rmixed_notidel <- function(n, claim_size) {
  # consider a mixture distribution
  # equal probability of sampling from x (Weibull) or y (transformed gamma)
  x_selected <- sample(c(TRUE, FALSE), size = n, replace = TRUE)
  x <- rweibull(n, shape = 2, scale = 1)
  # pmin() keeps one shape1 value per claim if claim_size is a vector
  y <- actuar::rtrgamma(n, shape1 = pmin(1, claim_size / ref_claim), shape2 = 0.8, rate = 2)
  # preallocate one delay per claim
  result <- numeric(n)
  result[x_selected] <- x[x_selected]
  result[!x_selected] <- y[!x_selected]

  return(result)
}

In this case, we can consider claim_size as the “parameter” of the sampling distribution (in the same way that shape and scale are the parameters of the gamma distribution). We can then either define a parameter function as below:

rmixed_params <- function(claim_size, occurrence_period) {
  # claim_size is the only "parameter" required for rmixed_notidel
  c(claim_size = claim_size)
}

or simply run

notidel_mixed <- claim_notification(n_vector, claim_sizes, rfun = rmixed_notidel)

which would give the same result as

notidel_mixed <- claim_notification(n_vector, claim_sizes,
                                    rfun = rmixed_notidel, paramfun = rmixed_params)
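A base-R variant of the same mixture idea, sketched with hypothetical parameter choices so that it runs without actuar (a gamma component stands in for the transformed gamma). The point it illustrates is vectorisation. If rfun is called once with a vector of claim sizes, which is our assumption here (see ?claim_notification), then per-claim parameters need pmin()/pmax() rather than min()/max(), and the result should be preallocated with numeric(n).

```r
# Hypothetical base-R mixture sampler; not part of SynthETIC
rmixed_sketch <- function(n, claim_size, ref_claim = 200000) {
  # Per-claim shape: pmin() keeps one value per claim, whereas min() would
  # collapse the whole vector to a single number
  shape_y <- pmin(1, claim_size / ref_claim)
  from_x <- sample(c(TRUE, FALSE), size = n, replace = TRUE)  # 50/50 mixture
  x <- rweibull(n, shape = 2, scale = 1)
  y <- rgamma(n, shape = shape_y, rate = 2)
  result <- numeric(n)          # preallocate one delay per claim
  result[from_x] <- x[from_x]
  result[!from_x] <- y[!from_x]
  result
}

set.seed(5)
sizes <- c(5e3, 5e4, 5e5, 5e6)
rmixed_sketch(length(sizes), claim_size = sizes)
```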

## 4. Claim Closure

Claim settlement delay represents the delay from claim notification to closure. Like notification delay, SynthETIC assumes the (removable) dependence of settlement delay on claim size and occurrence period of the claim, and thus requires the user to specify a paramfun (parameter function) with arguments claim_size and occurrence_period (and possibly more, see Example 3.2).

Other than this pre-specified dependence structure, users are free to choose any distribution by specifying their own rfun and/or paramfun (see ?claim_closure).

In addition, although not recommended, users can add further dependencies to their simulation. This is illustrated in Example 4.2.

## Example 4.1: Default Weibull

Below we show the default implementation with a Weibull distribution.

## input
# specify the Weibull parameters as a function of claim_size and occurrence_period
setldel_param <- function(claim_size, occurrence_period) {
  # NOTE: users may add to, but not remove these two arguments (claim_size,
  # occurrence_period) as they are part of SynthETIC's internal structure

  # specify the target Weibull mean
  if (claim_size < (0.10 * ref_claim) & occurrence_period >= 21) {
    a <- min(0.85, 0.65 + 0.02 * (occurrence_period - 21))
  } else {
    a <- max(0.85, 1 - 0.0075 * occurrence_period)
  }
  mean_quarter <- a * min(25, max(1, 6 + 4 * log(claim_size / (0.10 * ref_claim))))
  target_mean <- mean_quarter / 4 / time_unit

  # specify the target Weibull coefficient of variation
  target_cv <- 0.60

  c(shape = get_Weibull_parameters(target_mean, target_cv)[1, ],
    scale = get_Weibull_parameters(target_mean, target_cv)[2, ])
}

## output
# simulate the settlement delays from the Weibull with parameters above
setldel <- claim_closure(n_vector, claim_sizes, rfun = rweibull, paramfun = setldel_param)
setldel[[1]]
#>  [1] 11.1171182  0.9915810  0.1826818  4.7878665  5.4618578  1.0064047
#>  [7]  1.6364392  1.1862547 21.9830526  8.3429796  0.5393147 16.4555725
#> [13]  9.2406402  1.3412914  8.8882334  0.2035008  0.5002912 20.5842449
#> [19] 13.8291207  5.0082492  3.1408487  6.1437615 22.7239822 29.0201396
#> [25]  4.9101573  7.8902569  7.1198205  8.3467641  5.7555371 10.4689594
#> [31] 16.2304449  9.6519939  2.7239285 31.0072406  2.2765702  1.6977778
#> [37]  5.9940011 25.9073247  0.6438297  2.2153375  1.0662963  1.3585586
#> [43] 23.4306557  1.3014886  4.7480768  5.3182398 25.1114165  3.8915542
#> [49] 14.0220317  8.9511932  5.5911252 14.7142304 17.0016193 40.1720387
#> [55] 12.2259944  6.9664398  5.5752670  3.3658310  9.8771442 23.1110775
#> [61]  2.1039141 25.1090208 58.2892416  6.0337357 23.1634540 17.0033148
#> [67]  1.0361984  2.5395749 20.3878280  3.2486178  8.2695258 22.6325874
#> [73]  5.6914801  4.0251805  8.0572108 12.0947715  4.8305101  2.5255476
#> [79] 30.3794386  8.1585016  3.3725944 15.8128384  2.1393232 21.5218345
#> [85]  6.7774983  3.3500768 12.7469291  6.3790721 42.1311726  0.2192005

There is no need to specify a sampling distribution if one is happy with the default Weibull specification. This example is just to demonstrate some of the behind-the-scenes work of the default implementation, and at the same time, to show how one may specify and input a random sampling distribution of their choosing.
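The target mean in setldel_param combines a claim-size term with an occurrence-period factor a. The vectorised base-R restatement below (setldel_mean_sketch is a hypothetical helper, assuming the default ref_claim = 200000 and time_unit = 1/4) makes the legislative change visible: for a “small” claim (below 0.1 * ref_claim), a equals 0.85 at occurrence period 20 but drops to 0.65 at period 21, shortening the mean settlement delay, and then recovers by 0.02 per period.

```r
# Hypothetical, vectorised restatement of the target mean in setldel_param
setldel_mean_sketch <- function(claim_size, occurrence_period,
                                ref_claim = 200000, time_unit = 1/4) {
  small_post_change <- claim_size < 0.10 * ref_claim & occurrence_period >= 21
  a <- ifelse(small_post_change,
              pmin(0.85, 0.65 + 0.02 * (occurrence_period - 21)),
              pmax(0.85, 1 - 0.0075 * occurrence_period))
  mean_quarter <- a * pmin(25, pmax(1, 6 + 4 * log(claim_size / (0.10 * ref_claim))))
  mean_quarter / 4 / time_unit
}

# A small claim just before and after the legislative change
setldel_mean_sketch(claim_size = 10000, occurrence_period = 20:22)
# A claim exactly at the benchmark: 6 quarters scaled by a = 0.9925
setldel_mean_sketch(claim_size = 20000, occurrence_period = 1)
```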

## Example 4.2: Additional dependencies

Suppose we would like to add a dependence of settlement delay on notification delay, which is not included in the SynthETIC default setting. For example, consider the following parameter function:

## input
# an extended parameter function for the simulation of settlement delays
setldel_param_extd <- function(claim_size, occurrence_period, notidel) {

# specify the target Weibull mean
if (claim_size < (0.10 * ref_claim) & occurrence_period >= 21) {
a <- min(0.85, 0.65 + 0.02 * (occurrence_period - 21))
} else {
a <- max(0.85, 1 - 0.0075 * occurrence_period)
}
mean_quarter <- a * min(25, max(1, 6 + 4*log(claim_size/(0.10 * ref_claim))))
# suppose the setldel mean is linearly related to the notidel of the claim
target_mean <- (mean_quarter + notidel) / 4 / time_unit

# specify the target Weibull coefficient of variation
target_cv <- 0.60

c(shape = get_Weibull_parameters(target_mean, target_cv)[1, ],
scale = get_Weibull_parameters(target_mean, target_cv)[2, ])
}

Because setldel_param_extd depends on notidel, we must pass the simulated notification delays to claim_closure. The argument name must match exactly (notidel in this example), and the input must be a vector of simulated quantities (not a list).

## output
# simulate the settlement delays from the Weibull with parameters above
notidel_vect <- unlist(notidel) # convert to a vector
setldel_extd <- claim_closure(n_vector, claim_sizes, rfun = rweibull,
paramfun = setldel_param_extd,
notidel = notidel_vect)
setldel_extd[[1]]
#>  [1] 14.6892032  9.9652517  3.6299539  4.4118651 25.6088185  0.8179742
#>  [7] 18.6616020  7.9238994 28.8911279 15.3810650  2.2529752 33.7775468
#> [13] 42.3389833  1.7759405  4.5073092  7.4776499  4.4497463 26.0141815
#> [19] 18.0192441  3.1060068 17.7476595  6.5391149 23.0580363 18.0881832
#> [25] 22.4513319  7.1284399 12.4748345  9.1175816  7.5754990 11.0818682
#> [31] 10.7193401 26.1311063  6.1921787 11.8489560  2.0428037 22.1470080
#> [37] 13.3925090 16.2728543  1.2630736  2.1189915  5.0667649  7.3133601
#> [43]  7.0238416  1.3570741  5.4988342  1.6089303 29.2986766 14.9138915
#> [49] 14.6849949  0.2915756  0.5653274  5.7288114  1.8952385 13.4975617
#> [55] 11.3844735  1.2345387 11.1303717  6.5988320 16.9218029 28.9542325
#> [61] 12.5252529  3.6869736 29.5083436  3.1620904 36.2292693 20.9114785
#> [67]  6.3516603  4.0006223 15.1775581 19.5001943 13.9311741  9.2651856
#> [73] 20.6266343  7.2239913  7.8972259 12.7737971  4.1574178  8.7913017
#> [79] 16.0277122  3.5685777 11.6426866 11.9206555  4.0544059 15.2399484
#> [85] 13.1029394  6.8485348 15.5917883 10.5412719 21.6355271  1.4583662
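The extra notidel argument must line up claim by claim with the claim sizes. This is why the list is flattened with unlist() first: unlist() concatenates the per-period vectors in period order, which matches the order of unlist(claim_sizes). The sketch below pictures that per-claim matching with base R's mapply(). It uses toy inputs and a hypothetical simplified parameter function, and it is an illustration of the idea, not SynthETIC's internal code.

```r
# Toy inputs: two occurrence periods with 2 and 3 claims (hypothetical values)
claim_sizes_toy <- list(c(5000, 120000), c(800, 30000, 250000))
notidel_toy     <- list(c(0.5, 2.0), c(1.0, 0.2, 3.5))
n_vector_toy    <- lengths(claim_sizes_toy)

# Flatten in period order: element k of each vector refers to the same claim
claim_size_vect        <- unlist(claim_sizes_toy)
notidel_vect           <- unlist(notidel_toy)
occurrence_period_vect <- rep(seq_along(n_vector_toy), times = n_vector_toy)

# Hypothetical parameter function whose target mean grows linearly with
# notidel (occurrence_period is unused, kept only to mirror the signature)
toy_param <- function(claim_size, occurrence_period, notidel) {
  c(mean = 2 + log(claim_size / 1000) + notidel)
}

# Evaluate the parameter function claim by claim
params <- mapply(toy_param, claim_size_vect, occurrence_period_vect, notidel_vect)
params
```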

## 5. Claim Partial Payment - Number of Partial Payments

claim_payment_no() generates the number of partial payments associated with a particular claim, from a user-defined random generation function which may depend on claim_size.

## Example 5.1: Default mixture distribution

Below we spell out the default function in SynthETIC that simulates the number of partial payments (from a mixture distribution):

## input
# the default random generating function
rmixed_payment_no <- function(n, claim_size, claim_size_benchmark_1, claim_size_benchmark_2) {
# construct the range indicators
test_1 <- (claim_size_benchmark_1 < claim_size & claim_size <= claim_size_benchmark_2)
test_2 <- (claim_size > claim_size_benchmark_2)

# if claim_size <= claim_size_benchmark_1
no_pmt <- sample(c(1, 2), size = n, replace = TRUE, prob = c(1/2, 1/2))
# if claim_size is between the two benchmark values
no_pmt[test_1] <- sample(c(2, 3), size = sum(test_1), replace = TRUE, prob = c(1/3, 2/3))
# if claim_size > claim_size_benchmark_2
no_pmt_mean <- pmin(8, 4 + log(claim_size/claim_size_benchmark_2))
prob <- 1 / (no_pmt_mean - 3)
no_pmt[test_2] <- stats::rgeom(n = sum(test_2), prob = prob[test_2]) + 4

no_pmt
}

Since the random function directly takes claim_size as an input, no additional parameterisation is required (unlike in Example 3.1, where we first need a paramfun that turns the claim_size into Weibull parameters). We can simply run claim_payment_no() without inputting a paramfun.

## output
no_payments <- claim_payment_no(n_vector, claim_sizes, rfun = rmixed_payment_no,
claim_size_benchmark_1 = 0.0375 * ref_claim,
claim_size_benchmark_2 = 0.075 * ref_claim)
no_payments[[1]]
#>  [1]  5  2  1  4  7  2  6  2  4  8  1  5  8  7 11  2  1  7  5  5  4  4  6  6  8
#> [26]  4  4  4  4  4  7  8  3 15  3  5  4  6  2  5  2  4  5  2  5  5 11  4  8  5
#> [51]  3  5  9 11  4  4  4  3 11  9  3  5  5  7  9  4  1  3  8  4  7 22 12  3  4
#> [76]  5  2  3  6  4  4  4  2  6  8  5  8  5  7  1

Note that claim_size_benchmark_1 and claim_size_benchmark_2 are passed on to rmixed_payment_no; they are not required if we choose an alternative sampling distribution.
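In the largest-claim branch, rmixed_payment_no() draws stats::rgeom(prob = 1 / (no_pmt_mean - 3)) + 4. Since rgeom() counts failures before the first success, its mean is (1 - p) / p. The shifted count therefore has mean 4 + (no_pmt_mean - 3) - 1 = no_pmt_mean, so the target mean is matched exactly. Here is a quick base R check for one illustrative target value:

```r
# Check that rgeom(prob = 1 / (mu - 3)) + 4 has mean mu, as in the
# claim_size > claim_size_benchmark_2 branch of rmixed_payment_no()
mu <- 6.5                     # illustrative target in (4, 8], the range allowed there
p  <- 1 / (mu - 3)

# Analytic mean: rgeom counts failures before the first success, mean (1 - p) / p
analytic_mean <- 4 + (1 - p) / p

set.seed(42)
simulated <- stats::rgeom(1e5, prob = p) + 4
c(analytic = analytic_mean, simulated = mean(simulated))
```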

This mixture sampling distribution has been included as the default. There is no need to reproduce the above code if the user is happy with this default distribution. A simple equivalent to the above code is just

no_payments <- claim_payment_no(n_vector, claim_sizes)

This example is here only to demonstrate how the default function operates. If one would like to keep the structure of this function but modify the benchmark values, they may do so via

no_payments_tmp <- claim_payment_no(n_vector, claim_sizes,
claim_size_benchmark_2 = 0.1 * ref_claim)

## Example 5.2: Alternative distribution for number of partial payments

Suppose we want to use a zero-truncated Poisson distribution instead, with the rate parameter as a function of claim_size:

## input
paymentNo_param <- function(claim_size) {
no_pmt_mean <- pmax(4, pmin(8, 4 + log(claim_size / 15000)))
c(lambda = no_pmt_mean - 3)
}

## output
no_payments_pois <- claim_payment_no(
n_vector, claim_sizes, rfun = actuar::rztpois, paramfun = paymentNo_param)
table(unlist(no_payments_pois))
#>
#>   1   2   3   4   5   6   7   8   9  10  11  12  13
#> 954 849 636 484 312 178  91  71  27  15   2   4   1
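If actuar is not available, a zero-truncated Poisson sampler is easy to write in base R. The hypothetical rztpois_base() below is a stand-in for actuar::rztpois(), not its implementation. It uses rejection, redrawing any zeros, which is exact because a Poisson conditioned on being positive is exactly the zero-truncated distribution. Note that the truncated mean is lambda / (1 - exp(-lambda)). With lambda = no_pmt_mean - 3, the expected number of payments is therefore a little above no_pmt_mean - 3.

```r
# Hypothetical base R stand-in for actuar::rztpois(): zero-truncated Poisson
# draws by rejection, redrawing zeros until every value is positive
rztpois_base <- function(n, lambda) {
  lambda <- rep_len(lambda, n)          # allow one rate per draw
  x <- stats::rpois(n, lambda)
  zero <- x == 0
  while (any(zero)) {
    x[zero] <- stats::rpois(sum(zero), lambda[zero])
    zero <- x == 0
  }
  x
}

set.seed(123)
lambda <- 2
draws <- rztpois_base(1e5, lambda)
c(simulated = mean(draws), theoretical = lambda / (1 - exp(-lambda)))
```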

## Interlude: Claims Dataset

We can use the following code to create a claims dataset containing all individual claims features that we have simulated so far:

claim_dataset <- generate_claim_dataset(
frequency_vector = n_vector,
occurrence_list = occurrence_times,
claim_size_list = claim_sizes,
notification_list = notidel,
settlement_list = setldel,
no_payments_list = no_payments
)
str(claim_dataset)
#> 'data.frame':    3624 obs. of  7 variables:
#>  $ claim_no         : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ occurrence_period: num  1 1 1 1 1 1 1 1 1 1 ...
#>  $ occurrence_time  : num  0.624 0.121 0.222 0.454 0.591 ...
#>  $ claim_size       : num  93291 1825 4440 32288 237696 ...
#>  $ notidel          : num  1.16 3.94 2.39 1.13 2.09 ...
#>  $ setldel          : num  11.117 0.992 0.183 4.788 5.462 ...
#>  $ no_payment       : num  5 2 1 4 7 2 6 2 4 8 ...

test_claim_dataset, included as part of the package, is an example dataset of individual claims features resulting from a specific run with the default assumptions.

str(test_claim_dataset)
#> 'data.frame':    3624 obs. of  7 variables:
#>  $ claim_no         : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ occurrence_period: num  1 1 1 1 1 1 1 1 1 1 ...
#>  $ occurrence_time  : num  0.624 0.121 0.222 0.454 0.591 ...
#>  $ claim_size       : num  785871 22562 215771 117654 31627 ...
#>  $ notidel          : num  0.0652 1.1772 2.5262 0.9262 1.6507 ...
#>  $ setldel          : num  18.23 2.33 34 11.98 11.81 ...
#>  $ no_payment       : num  6 4 11 6 4 12 1 9 2 5 ...
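Conceptually, the claim dataset is the per-period lists flattened into one row per claim, with occurrence period i repeated n_vector[i] times. The sketch below shows that flattening in base R for toy inputs. flatten_claims() is a hypothetical helper, and generate_claim_dataset() may differ in details such as column types.

```r
# Hypothetical helper: flatten per-period lists into one row per claim
flatten_claims <- function(n_vector, occurrence_list, claim_size_list) {
  data.frame(
    claim_no          = seq_len(sum(n_vector)),
    occurrence_period = rep(seq_along(n_vector), times = n_vector),
    occurrence_time   = unlist(occurrence_list),
    claim_size        = unlist(claim_size_list)
  )
}

# Toy inputs: 2 claims in period 1, 1 claim in period 2 (made-up values)
toy <- flatten_claims(
  n_vector        = c(2, 1),
  occurrence_list = list(c(0.62, 0.12), c(1.45)),
  claim_size_list = list(c(50000, 2000), c(7500))
)
str(toy)
```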

## 6. Claim Partial Payment - Size of Partial Payments

## Example 6.1: Default Distribution

The default function samples the sizes of partial payments conditional on the number of partial payments and the size of the claim:

## input
rmixed_payment_size <- function(n, claim_size) {
# n = number of simulations, here n should be the number of partial payments
if (n >= 4) {
# 1) Simulate the "complement" of the proportion of total claim size
#    represented by the last two payments
p_mean <- 1 - min(0.95, 0.75 + 0.04*log(claim_size/(0.10 * ref_claim)))
p_CV <- 0.20
p_parameters <- get_Beta_parameters(target_mean = p_mean, target_cv = p_CV)
last_two_pmts_complement <- stats::rbeta(
1, shape1 = p_parameters[1], shape2 = p_parameters[2])
last_two_pmts <- 1 - last_two_pmts_complement

# 2) Simulate the proportion of last_two_pmts paid in the second last payment
q_mean <- 0.9
q_CV <- 0.03
q_parameters <- get_Beta_parameters(target_mean = q_mean, target_cv = q_CV)
q <- stats::rbeta(1, shape1 = q_parameters[1], shape2 = q_parameters[2])

# 3) Calculate the respective proportions of claim amount paid in the
#    last 2 payments
p_second_last <- q * last_two_pmts
p_last <- (1-q) * last_two_pmts

# 4) Simulate the "unnormalised" proportions of claim amount paid
#    in the first (n - 2) payments
p_unnorm_mean <- last_two_pmts_complement/(n - 2)
p_unnorm_CV <- 0.10
p_unnorm_parameters <- get_Beta_parameters(
target_mean = p_unnorm_mean, target_cv = p_unnorm_CV)
amt <- stats::rbeta(
n - 2, shape1 = p_unnorm_parameters[1], shape2 = p_unnorm_parameters[2])

# 5) Normalise the proportions simulated in step 4
amt <- last_two_pmts_complement * (amt/sum(amt))
# 6) Attach the last 2 proportions, p_second_last and p_last
amt <- append(amt, c(p_second_last, p_last))
# 7) Multiply by claim_size to obtain the actual payment amounts
amt <- claim_size * amt

} else if (n == 2 || n == 3) {
p_unnorm_mean <- 1/n
p_unnorm_CV <- 0.10
p_unnorm_parameters <- get_Beta_parameters(
target_mean = p_unnorm_mean, target_cv = p_unnorm_CV)
amt <- stats::rbeta(
n, shape1 = p_unnorm_parameters[1], shape2 = p_unnorm_parameters[2])
# Normalise the proportions and multiply by claim_size to obtain the actual payment amounts
amt <- claim_size * amt/sum(amt)

} else {
# when there is a single payment
amt <- claim_size
}
return(amt)
}

## output
payment_sizes <- claim_payment_size(n_vector, claim_sizes, no_payments,
rfun = rmixed_payment_size)
payment_sizes[[1]][[1]]
#> [1]  6092.875  6494.906  6321.400 65709.970  8671.977

As this is the default random generation function that SynthETIC adopts, a shorter equivalent command is to call claim_payment_size without specifying an rfun.

payment_sizes <- claim_payment_size(n_vector, claim_sizes, no_payments)
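rmixed_payment_size() relies on get_Beta_parameters() to turn a target mean and CV into Beta shape parameters. For a Beta(a, b) with mean m, the variance is m(1 - m) / (a + b + 1). Fixing CV = c therefore gives a + b = (1 - m) / (m c^2) - 1, then a = m(a + b) and b = (1 - m)(a + b). The hypothetical beta_from_mean_cv() below is a method-of-moments sketch of that conversion, not SynthETIC's code. It needs (1 - m) / (m c^2) > 1 to produce valid parameters.

```r
# Hypothetical helper: Beta shape parameters from a target mean and CV
# (method of moments; not SynthETIC's get_Beta_parameters())
beta_from_mean_cv <- function(target_mean, target_cv) {
  m <- target_mean
  # Var = m * (1 - m) / (shape1 + shape2 + 1) must equal (m * cv)^2
  total <- (1 - m) / (m * target_cv^2) - 1
  if (total <= 0) stop("target_cv is too large for a Beta with this mean")
  c(shape1 = m * total, shape2 = (1 - m) * total)
}

# The q parameters used in rmixed_payment_size(): mean 0.9, CV 0.03
q_par <- beta_from_mean_cv(0.9, 0.03)
q_par

# Check the implied mean and CV analytically
a <- q_par[["shape1"]]; b <- q_par[["shape2"]]
implied_mean <- a / (a + b)
implied_cv   <- sqrt(a * b / ((a + b)^2 * (a + b + 1))) / implied_mean
c(mean = implied_mean, cv = implied_cv)
```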

## Example 6.2: Alternative payment size distribution

Let’s consider a simplistic example where we assume the partial payment sizes are (stochastically) equal. This will result in the following simulation function:

## input
unif_payment_size <- function(n, claim_size) {
prop <- runif(n)
prop.normalised <- prop / sum(prop)

# scale the normalised proportions so the partial payments sum to claim_size
return(claim_size * prop.normalised)
}

## output
# note that we don't need to specify a paramfun as rfun is directly a function
# of claim_size
payment_sizes_unif <- claim_payment_size(n_vector, claim_sizes, no_payments,
rfun = unif_payment_size)
payment_sizes_unif[[1]][[1]]
# the partial payments of each claim now add up to its claim size
all.equal(sum(payment_sizes_unif[[1]][[1]]), claim_sizes[[1]][1])

## 7. Claim Payment Time

The simulation of the inter-partial delays is almost identical to that of the partial payment sizes, except that it also depends on the claim settlement delay: the inter-partial delays must add up to the settlement delay.

Otherwise, claim_payment_delay() is implemented in much the same way as claim_payment_size(), but with a different default simulation function:

## input
r_pmtdel <- function(n, claim_size, setldel, setldel_mean) {
result <- rep(NA_real_, n)

# First simulate the unnormalised values of d, sampled from a Weibull distribution
if (n >= 4) {
# 1) Simulate the last payment delay
unnorm_d_mean <- (1 / 4) / time_unit
unnorm_d_cv <- 0.20
parameters <- get_Weibull_parameters(target_mean = unnorm_d_mean, target_cv = unnorm_d_cv)
result[n] <- stats::rweibull(1, shape = parameters[1], scale = parameters[2])

# 2) Simulate all the other payment delays
for (i in 1:(n - 1)) {
unnorm_d_mean <- setldel_mean / n
unnorm_d_cv <- 0.35
parameters <- get_Weibull_parameters(target_mean = unnorm_d_mean, target_cv = unnorm_d_cv)
result[i] <- stats::rweibull(1, shape = parameters[1], scale = parameters[2])
}

} else {
for (i in 1:n) {
unnorm_d_mean <- setldel_mean / n
unnorm_d_cv <- 0.35
parameters <- get_Weibull_parameters(target_mean = unnorm_d_mean, target_cv = unnorm_d_cv)
result[i] <- stats::rweibull(1, shape = parameters[1], scale = parameters[2])
}
}

# Normalise d such that sum(inter-partial delays) = settlement delay
# To make sure that the pmtdels add up exactly to setldel, we treat the last one separately
# (seq_len(n - 1) is empty when n = 1, so a single payment simply equals setldel)
result[seq_len(n - 1)] <- (setldel / sum(result)) * result[seq_len(n - 1)]
result[n] <- setldel - sum(result[seq_len(n - 1)])

return(result)
}

param_pmtdel <- function(claim_size, setldel, occurrence_period) {
# mean settlement delay
if (claim_size < (0.10 * ref_claim) & occurrence_period >= 21) {
a <- min(0.85, 0.65 + 0.02 * (occurrence_period - 21))
} else {
a <- max(0.85, 1 - 0.0075 * occurrence_period)
}
mean_quarter <- a * min(25, max(1, 6 + 4*log(claim_size/(0.10 * ref_claim))))
target_mean <- mean_quarter / 4 / time_unit

c(claim_size = claim_size,
setldel = setldel,
setldel_mean = target_mean)
}

## output
payment_delays <- claim_payment_delay(
n_vector, claim_sizes, no_payments, setldel,
rfun = r_pmtdel, paramfun = param_pmtdel,
occurrence_period = rep(1:I, times = n_vector))

# payment times on a continuous time scale
payment_times <- claim_payment_time(n_vector, occurrence_times, notidel, payment_delays)
# payment times in periods
payment_periods <- claim_payment_time(n_vector, occurrence_times, notidel, payment_delays,
discrete = TRUE)
cbind(payment_delays[[1]][[1]], payment_times[[1]][[1]], payment_periods[[1]][[1]])
#>           [,1]      [,2] [,3]
#> [1,] 2.5234067  4.305932    5
#> [2,] 1.8768774  6.182809    7
#> [3,] 3.0022450  9.185054   10
#> [4,] 2.7343215 11.919376   12
#> [5,] 0.9802676 12.899643   13
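The last two lines of r_pmtdel() rescale the raw delays so that they sum to the settlement delay. The final delay is computed as a remainder, which keeps the sum exact despite floating-point rounding. The table above also appears consistent with payment_time = occurrence_time + notidel + cumsum(payment_delays), with the payment period being the ceiling of the continuous time (e.g. 4.31 falls in period 5). This relationship is inferred from the printed output, not taken from the claim_payment_time() source. The toy sketch below uses hypothetical values to reproduce both steps.

```r
# Rescale all but the last delay, then take the last as the remainder so
# that the delays add up to the settlement delay exactly
normalise_delays <- function(d, setldel) {
  n <- length(d)
  idx <- seq_len(n - 1)                 # empty when n == 1
  d[idx] <- (setldel / sum(d)) * d[idx]
  d[n] <- setldel - sum(d[idx])
  d
}

# Toy raw (unnormalised) inter-partial delays and settlement delay
raw_delays  <- c(2.0, 1.5, 2.5, 0.4)
setldel_toy <- 11.1
delays <- normalise_delays(raw_delays, setldel_toy)

# Continuous payment times: occurrence time + notification delay + running sum
occurrence_time_toy <- 0.62
notidel_toy         <- 1.16
payment_time_toy <- occurrence_time_toy + notidel_toy + cumsum(delays)

# Payment period, assuming period p covers the interval (p - 1, p]
payment_period_toy <- ceiling(payment_time_toy)
cbind(delays, payment_time_toy, payment_period_toy)
```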

## 8. Claim Inflation

### Input parameters

• Base Inflation: base_inflation_past = vector of historic quarterly inflation rates for the past $I$ periods, base_inflation_future = vector of expected quarterly base inflation rates for the next $I$ periods (users may also choose to simulate the future inflation rates); the lengths of the vectors may differ from $I$ when a time_unit other than a calendar quarter is used
• By default we assume nil base inflation (see documentation for claim_payment_inflation)
• Superimposed Inflation with respect to occurrence time: SI_occurrence = function of occurrence_time and claim_size that outputs the superimposed inflation index with respect to the occurrence time of the claim
• Superimposed Inflation with respect to payment time: SI_payment = function of payment_time and claim_size that outputs the superimposed inflation index with respect to payment time of the claim

# Base inflation: a vector of quarterly rates
# In this demo we set base inflation to be at 2% p.a. constant for both past and future
# Users can choose to randomise the future rates if they wish
demo_rate <- (1 + 0.02)^(1/4) - 1
base_inflation_past <- rep(demo_rate, times = 40)
base_inflation_future <- rep(demo_rate, times = 40)
base_inflation_vector <- c(base_inflation_past, base_inflation_future)

# Superimposed inflation:
# 1) With respect to occurrence "time" (continuous scale)
SI_occurrence <- function(occurrence_time, claim_size) {
if (occurrence_time <= 20 / 4 / time_unit) {1}
else {1 - 0.4*max(0, 1 - claim_size/(0.25 * ref_claim))}
}
# 2) With respect to payment "time" (continuous scale)
# -> compounding by user-defined time unit
SI_payment <- function(payment_time, claim_size) {
period_rate <- (1 + 0.30)^(time_unit) - 1
beta <- period_rate * max(0, 1 - claim_size/ref_claim)
(1 + beta)^payment_time
}
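Here is a quick base R check of the rate conversion and the superimposed-inflation functions above. The values ref_claim = 200000 and time_unit = 1/4 (quarterly periods) are assumptions made so that the snippet runs on its own. Substitute the values used earlier in your session.

```r
# Assumed values so the snippet is self-contained (replace with your own)
ref_claim <- 200000
time_unit <- 1 / 4

# Quarterly rate equivalent to 2% p.a.: compounding it 4 times recovers 2%
demo_rate <- (1 + 0.02)^(1 / 4) - 1
(1 + demo_rate)^4 - 1

# Superimposed inflation functions as defined above
SI_occurrence <- function(occurrence_time, claim_size) {
  if (occurrence_time <= 20 / 4 / time_unit) {
    1
  } else {
    1 - 0.4 * max(0, 1 - claim_size / (0.25 * ref_claim))
  }
}
SI_payment <- function(payment_time, claim_size) {
  period_rate <- (1 + 0.30)^(time_unit) - 1
  beta <- period_rate * max(0, 1 - claim_size / ref_claim)
  (1 + beta)^payment_time
}

# Occurrence SI: index 1 up to time 20 / 4 / time_unit; afterwards small claims
# are scaled down by up to 40%, claims >= 0.25 * ref_claim are unaffected
sapply(c(10, 25), SI_occurrence, claim_size = 0)
SI_occurrence(25, claim_size = 0.25 * ref_claim)

# Payment SI: a claim of size 0 accrues 30% p.a., claims >= ref_claim accrue none
SI_payment(4, claim_size = 0)          # four quarters, i.e. one year
SI_payment(4, claim_size = ref_claim)
```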

### Implementation and Output

# shorter equivalent code:
# payment_inflated <- claim_payment_inflation(
#   n_vector, payment_sizes, payment_times, occurrence_times, claim_sizes,
#   base_inflation_vector)
payment_inflated <- claim_payment_inflation(
n_vector,
payment_sizes,
payment_times,
occurrence_times,
claim_sizes,
base_inflation_vector,
SI_occurrence,
SI_payment
)
cbind(payment_sizes[[1]][[1]], payment_inflated[[1]][[1]])
#>           [,1]       [,2]
#> [1,]  6092.875   7253.092
#> [2,]  6494.906   8342.006
#> [3,]  6321.400   9168.372
#> [4,] 65709.970 106458.731
#> [5,]  8671.977  14618.468

## Interlude: Transaction Dataset

Use the following code to create a transactions dataset containing full information of all the partial payments made.

# construct a "claims" object to store all the simulated quantities
all_claims <- claims(
frequency_vector = n_vector,
occurrence_list = occurrence_times,
claim_size_list = claim_sizes,
notification_list = notidel,
settlement_list = setldel,
no_payments_list = no_payments,
payment_size_list = payment_sizes,
payment_delay_list = payment_delays,
payment_time_list = payment_times,
payment_inflated_list = payment_inflated
)
transaction_dataset <- generate_transaction_dataset(
all_claims,
adjust = FALSE # to keep the original (potentially out-of-bound) simulated payment times
)
str(transaction_dataset)
#> 'data.frame':    19530 obs. of  12 variables:
#>  $ claim_no         : int  1 1 1 1 1 2 2 3 4 4 ...
#>  $ pmt_no           : num  1 2 3 4 5 1 2 1 1 2 ...
#>  $ occurrence_period: num  1 1 1 1 1 1 1 1 1 1 ...
#>  $ occurrence_time  : num  0.624 0.624 0.624 0.624 0.624 ...
#>  $ claim_size       : num  93291 93291 93291 93291 93291 ...
#>  $ notidel          : num  1.16 1.16 1.16 1.16 1.16 ...
#>  $ setldel          : num  11.1 11.1 11.1 11.1 11.1 ...
#>  $ payment_time     : num  4.31 6.18 9.19 11.92 12.9 ...
#>  $ payment_period   : num  5 7 10 12 13 5 6 3 4 5 ...
#>  $ payment_size     : num  6093 6495 6321 65710 8672 ...
#>  $ payment_inflated : num  7253 8342 9168 106459 14618 ...
#>  $ payment_delay    : num  2.52 1.88 3 2.73 0.98 ...
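The transaction dataset repeats each claim's features once per partial payment. This is why claim_no reads 1 1 1 1 1 2 2 3 4 4 above: claims 1 to 4 have 5, 2, 1 and 4 payments. In base R, this expansion amounts to rep() for the claim-level columns and sequence() for the payment number. The snippet below is an illustrative sketch using the first four claims' payment counts, not the generate_transaction_dataset() source.

```r
# Payment counts of the first four claims (from no_payment above)
no_payment <- c(5, 2, 1, 4)
claim_no   <- seq_along(no_payment)

# One row per partial payment
transactions <- data.frame(
  claim_no = rep(claim_no, times = no_payment),
  pmt_no   = sequence(no_payment)   # 1..m within each claim
)
head(transactions, 10)
```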

test_transaction_dataset, included as part of the package, is an example dataset showing full information of the claims features at a transaction/payment level, generated by a specific SynthETIC run with the default assumptions.

str(test_transaction_dataset)
#> 'data.frame':    18983 obs. of  12 variables:
#>  $ claim_no         : int  1 1 1 1 1 1 2 2 2 2 ...
#>  $ pmt_no           : num  1 2 3 4 5 6 1 2 3 4 ...
#>  $ occurrence_period: num  1 1 1 1 1 1 1 1 1 1 ...
#>  $ occurrence_time  : num  0.624 0.624 0.624 0.624 0.624 ...
#>  $ claim_size       : num  785871 785871 785871 785871 785871 ...
#>  $ notidel          : num  0.0652 0.0652 0.0652 0.0652 0.0652 ...
#>  $ setldel          : num  18.2 18.2 18.2 18.2 18.2 ...
#>  $ payment_time     : num  4.2 7.1 11.2 14.4 18.5 ...
#>  $ payment_period   : num  5 8 12 15 19 19 3 3 4 4 ...
#>  $ payment_size     : num  25105 26177 26333 26341 592457 ...
#>  $ payment_inflated : num  25632 27113 27829 28294 649128 ...
#>  $ payment_delay    : num  3.51 2.9 4.06 3.29 4.01 ...

## Output

SynthETIC includes an output function which summarises the claim payments by occurrence and development periods. The usage of the function takes the form

claim_output(
frequency_vector = ,
payment_time_list = ,
payment_size_list = ,
aggregate_level = 1,
incremental = TRUE,
future = TRUE,
adjust = TRUE
)

Note that by default, we aggregate all out-of-bound transactions into the maximum development period. If we set adjust = FALSE, the function instead produces a separate “tail” column representing all payments beyond the maximum development period (see the function documentation, ?claim_output).

Examples:

# 1. Constant dollar value INCREMENTAL triangle
output <- claim_output(n_vector, payment_times, payment_sizes,
incremental = TRUE)

# 2. Constant dollar value CUMULATIVE triangle
output_cum <- claim_output(n_vector, payment_times, payment_sizes,
incremental = FALSE)

# 3. Actual (i.e. inflated) INCREMENTAL triangle
output_actual <- claim_output(n_vector, payment_times, payment_inflated,
incremental = TRUE)

# 4. Actual (i.e. inflated) CUMULATIVE triangle
output_actual_cum <- claim_output(n_vector, payment_times, payment_inflated,
incremental = FALSE)

# Aggregate at a yearly level
claim_output(n_vector, payment_times, payment_sizes, aggregate_level = 4)
#>            DP1      DP2      DP3      DP4      DP5     DP6     DP7     DP8
#> AP1   792897.9  7703556 13398320  8496214  8550638 5902104 4980147 3293743
#> AP2   656662.0  6378950 13209783 17557894  7083117 5040601 4554371 4014801
#> AP3  1071847.6 10066061 12834250 11454990  6937617 8616116 4492922 3300173
#> AP4   556048.2 10060486 13688978 11277335  8053791 4231411 7721810 1731063
#> AP5  1775284.1 14840018 13780412 12638620  9922496 8517462 7897407 3709178
#> AP6  1668731.7 11597632 13267082 11234347  6573692 7261552 4057221 1117304
#> AP7   908162.9  8818871 15814701 11508374  7535432 4381094 4654829 2606762
#> AP8  1728070.3  9410898 11835411 11730882 10130446 3786122 2715643 1740342
#> AP9  1557396.8 12562977 13585206 10794725  9910423 6400001 4383061 1041704
#> AP10 1273688.5  6315236  8825411 11759917 12122095 8178040 4229625 1242914
#>            DP9       DP10
#> AP1  4426320.1 4483511.62
#> AP2  6246294.7 4295226.81
#> AP3  2197992.3 5058315.98
#> AP4  4798279.7 3416133.67
#> AP5  1226731.8 1712299.93
#> AP6  2535350.6   44439.34
#> AP7  2409058.7 1880768.33
#> AP8  3896563.0 3887402.93
#> AP9   670312.1 3354093.42
#> AP10  752631.2 2864835.16
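The incremental and future options correspond to two generic triangle operations. The first cumulates along development periods. The second keeps only the past cells, those with occurrence period i and development period j such that i + j <= J + 1. The base R sketch below applies both to a small made-up 3 x 3 incremental matrix. It illustrates the operations and is not the claim_output() implementation.

```r
# Made-up 3 x 3 incremental triangle (rows: occurrence, cols: development)
incr <- matrix(c(10, 20, 5,
                 12, 25, 6,
                 15, 30, 8),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("AP", 1:3), paste0("DP", 1:3)))

# Cumulative triangle: running sum along each row
cum <- t(apply(incr, 1, cumsum))

# Past cells satisfy i + j <= J + 1; mask the future cells with NA
J <- nrow(cum)
past <- row(cum) + col(cum) <= J + 1
cum_past <- cum
cum_past[!past] <- NA
cum_past
```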

Note that by setting future = FALSE we can obtain the upper left part of the triangle (i.e. only the past claim payments). The past data can then be used in conjunction with the ChainLadder package to perform chain-ladder reserving analysis:

# output the past cumulative triangle
cumtri <- claim_output(n_vector, payment_times, payment_sizes,
aggregate_level = 4, incremental = FALSE, future = FALSE)
# calculate the age to age factors
selected <- attr(ChainLadder::ata(cumtri), "vwtd")
# complete the triangle
CL_prediction <- cumtri
J <- nrow(cumtri)
for (i in 2:J) {
for (j in (J - i + 2):J) {
CL_prediction[i, j] <- CL_prediction[i, j - 1] * selected[j - 1]
}
}

CL_prediction
#>            DP1      DP2      DP3      DP4      DP5      DP6      DP7      DP8
#> AP1   792897.9  8496454 21894774 30390987 38941626 44843730 49823877 53117620
#> AP2   656662.0  7035612 20245395 37803289 44886406 49927007 54481378 58496179
#> AP3  1071847.6 11137909 23972159 35427149 42364765 50980881 55473803 58773976
#> AP4   556048.2 10616534 24305512 35582847 43636639 47868050 55589859 59280813
#> AP5  1775284.1 16615302 30395714 43034335 52956831 61474292 68379685 72919834
#> AP6  1668731.7 13266363 26533446 37767792 44341484 50771732 56474909 60224626
#> AP7   908162.9  9727034 25541734 37050108 44985564 51509215 57295232 61099416
#> AP8  1728070.3 11138968 22974380 34159044 41475287 47489889 52824417 56331756
#> AP9  1557396.8 14120374 31415758 46709956 56714375 64938898 72233468 77029494
#> AP10 1273688.5 12142962 27016308 40168712 48772117 55844880 62117922 66242314
#>           DP9     DP10
#> AP1  57543940 62027452
#> AP2  64742474 69786856
#> AP3  64393998 69411229
#> AP4  64949299 70009796
#> AP5  79892496 86117285
#> AP6  65983361 71124426
#> AP7  66941799 72157541
#> AP8  61718251 66527002
#> AP9  84395125 90970736
#> AP10 72576466 78231230
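The "vwtd" attribute of ChainLadder::ata() holds the volume-weighted age-to-age factors f_j = sum_i C[i, j + 1] / sum_i C[i, j], where the sums run over the occurrence periods observed at both development periods. The base R sketch below computes these factors directly and projects a made-up 3 x 3 cumulative triangle with the same loop as above. It is meant to make the calculation transparent, not to replace ChainLadder.

```r
# Made-up cumulative triangle with NA for the future cells
cumtri_toy <- matrix(c(10, 30, 35,
                       12, 37, NA,
                       15, NA, NA),
                     nrow = 3, byrow = TRUE)

# Volume-weighted age-to-age factors: only rows observed at both j and j + 1
J <- nrow(cumtri_toy)
vwtd <- sapply(seq_len(J - 1), function(j) {
  both <- !is.na(cumtri_toy[, j + 1])
  sum(cumtri_toy[both, j + 1]) / sum(cumtri_toy[both, j])
})

# Complete the lower triangle with the same loop as above
CL_toy <- cumtri_toy
for (i in 2:J) {
  for (j in (J - i + 2):J) {
    CL_toy[i, j] <- CL_toy[i, j - 1] * vwtd[j - 1]
  }
}
vwtd
CL_toy
```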

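We observe that the chain-ladder analysis performs very poorly on this simulated claim dataset. For example, at DP10 it projects about 91.0 million for AP9 and 78.2 million for AP10, whereas the rows of the full simulated triangle above total about 64.3 million and 57.6 million respectively. This is perhaps unsurprising in view of the data features and the extent to which they breach chain-ladder assumptions. Data sets such as this are useful for testing models that endeavour to represent data outside the scope of the chain ladder.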

## Plot of Cumulative Claims Payments

Note that by default, similar to the case of claim_output and claim_payment_inflation, we truncate the claims development so that payments projected to fall beyond the maximum development period are forced to be paid at the exact end of the maximum development period allowed. This convention causes some concentration of transactions at the end of development period $I$ (shown as a surge in claims in the $I$th period).

Users can set adjust = FALSE to see the “true” picture of claims development without such artificial adjustment. If the two plots look significantly different, this indicates that the user’s selection of lag parameters (notification and/or settlement delays) is not well matched to the maximum number of development periods allowed, and consideration might be given to changing one or the other.

plot(test_claims_object)

# compare with the "full complete picture"
plot(test_claims_object, adjust = FALSE)
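The truncation convention described above can be pictured as capping each payment's development period at I, the maximum development period. The sketch below uses made-up occurrence and payment periods. It assumes that development period = payment period - occurrence period + 1 and caps that value with pmin(). It only illustrates why a spike appears in the last period and is not the package's internal code.

```r
I <- 10                                   # maximum development period (assumed)
occurrence_period <- c(1, 3, 8, 9)        # hypothetical claims
payment_period    <- c(4, 15, 16, 20)

dev_period   <- payment_period - occurrence_period + 1
dev_adjusted <- pmin(dev_period, I)       # out-of-bound payments land in period I

rbind(dev_period, dev_adjusted)
table(factor(dev_adjusted, levels = 1:I))  # note the pile-up at period I
```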