Pattern discovery is a general technique to derive patterns from the data itself, without using explicit knowledge about the data. Previously, I described how to discover patterns in Solvency 2 data, for example that values of column x are most of the time or (almost) always equal to the values of column y, or higher, or lower, etc. I also showed how to find columns whose values are the sum of values of other columns, which often occurs in financial reports.

Here I will take pattern discovery one step further and discover constant ratios between values of columns. That is, patterns where the quotient between values of two columns equals a constant.

An example will clarify this. We take the own funds data of Dutch insurance undertakings (sheet 20 of the Excel file) and select the last year of data.

df = get_sheet(20) df = df.xs(datetime(2017,12,31), axis = 0, level = 1, drop_level = False)

Now we run the generate-function from the insurlib package on Github. The generate-function allows the parameter pattern = “ratio”. Make sure the minimum support is higher than one otherwise you end up with a lot of ratio-patterns. Patterns are generated with the following code, with minimum support 3 and minimum confidence 10%.

rules = patterns.generate(dataframe = df, pattern = "ratio", parameters = {"min_confidence": 0.1, "min_support": 3})) rules = list(rules)

The first two patterns in the list are

['ratio', '9/20', ['mcr , total', 'scr , total']], 19, 120, 0.1367]

and

['ratio', '1/4', ['mcr , total', 'scr , total']], 36, 103, 0.259]

The two patterns state that the quotient between the values of the columns ‘mcr , total’ and ‘scr , total’ is 9/20 in 19 cases with confidence of almost 14% and is 1/4 (with 120 exception to this rule) is 36 cases with confidence of almost 26% (with 103 exceptions). In Solvency 2, the acronym MCR stands for Minimum Capital Requirement, and SCR stand for Solvency Capital Requirement (the solvency 2 risk-based capital requirement with VaR 99,5%). So the identified ratios are 9/20 (i.e. 45%) and 1/4 (i.e. 25%).

It is not desirable that the MCR is too low or too high in relation to the SCR. Solvency 2 legislation therefore prescribes that the MCR is bounded between 25% and 45% of the SCR. We see that these legal boundaries are found in the data itself. We have 36 insurance undertakings where the ratio is 25%, 19 undertaking where the ratio is 45%. The rest (should) have a ratio somewhere between 25% and 45%.

The algorithm applies a brute-force search for ratios between columns. It uses the fraction module of Python’s standard library to convert every quotient to a fraction, which enables the use of rational number arithmetic. For example

Fraction(2.25) >>> Fraction(9,4)

The fraction expression you see in the pattern definition is the string representation of the fraction object.

A common problem with dealing with quotients is that often the original data set contains quotients like 0.2499999 and 0.2500001 because values are rounded off to euros. In the fraction module, this can be overcome by putting a limit on the denominator (default is 1e7, but you can pass this as a parameter). The result is that the nearest fraction is found given the limit on the denominator.

The other rules that were found in the own funds data are

['ratio', '9/100', ['available and eligible own funds|total eligible own funds to meet the mcr , tier 2', 'scr , total']], 3, 12, 0.2] ['ratio', '1/5', ['available and eligible own funds|total eligible own funds to meet the mcr , tier 2', 'mcr , total']], 12, 3, 0.8] ['ratio', '3/20', ['available and eligible own funds|total eligible own funds to meet the scr , tier 3', 'scr , total']], 6, 15, 0.2857] ['ratio', '1/3', ['available and eligible own funds|total eligible own funds to meet the scr , tier 3', 'mcr , total']], 3, 18, 0.1429]

These ratio-patterns are boundaries on the available and eligible own funds and are prescribed by Solvency 2 legislation. In the Netherlands, they have a relatively low support because a limited number of Dutch insurance undertakings hit the legal boundaries of tier two and tier three capital.

Here we looked at the own funds data. But many more patterns of this type can be found if you run the complete data set of Dutch insurance undertakings. It takes some time but then you will find in total 121 ratio-patterns (with minimum support of 3) covering loss-absorbing capacity of deferred taxes, impact of transitional measures, operational risk capital, etc. And of course other data sets than Solvency 2 data are possible.

These ratio-patterns work well because the Solvency 2 legislation contains legal boundaries that are represented in the data. Also other nonlegal constant ratios can be found, such as entity-specific ratios. You can find ratio-patterns for each insurance undertaking separately and then signal when a pattern is violated. This would work with a relatively low limit on the denominator of the fractions, because we want to find constant ratios with a high confidence. But for this to work well you will need more than two years of data.

A brute-force approach works well for the Dutch public Solvency 2 data set (around 1270 dimensions). But for high dimensional data it will take some time because of the possible combinations. Perhaps smart ways exist to detect ratios more easily, for example via statistical correlations.

The ratio-patterns that were discovered could be related to the Solvency 2 legislation by automatically reading ratios and percents in the text. And in the same manner, entity-specific ratios could be related to the SFCR documents of that entity. But that is for future work.