Abstract:
Protecting information has become one of the most difficult tasks in today's world. Cyber security incidents and data breaches remain costly events that affect people and businesses around the world. A breach occurs when sensitive information is accessed without authorization. Moreover, cyber threats constantly evolve to take advantage of online behavior and trends, especially since teleworking became a necessity during the global spread of Coronavirus disease 2019 over the past two years. The need for cyber insurance, which covers the liability for a cyber breach, therefore becomes more evident as more business activities are automated and more computers are used to hold sensitive information.
Unfortunately, research on cyber risk modeling has so far been fragmented and uncoordinated, both because the lack of historical data on cyber incidents prevents insurance premiums from being priced accurately and because the constantly changing nature of cyber risk quickly makes the available data out of date. The aim of this thesis was therefore the ratemaking of aggregate cyber loss. The VERIS dataset, one of the most extensive publicly available datasets of global breach incidents, was used in this study. The main variables in the VERIS dataset are: type of breach, amount of the breach, timeline of the breach, Actors, Motive, Country, Variety, Assets, and Attributes.
Because loss amounts, unlike loss frequencies, are available, we modeled in this research only the severity of cyber risk, as a first step toward pricing cyber insurance policies, which requires both the severity and the frequency distributions of cyber losses. The analysis was carried out in the R programming language (R Studio 4.0.3). First, the severity distribution was estimated using the loss distribution approach. Second, the Random Forest machine learning algorithm was applied to the data to select the variables with the greatest impact on cyber risk losses. Next, we fitted a Generalized Linear Model, using the variables selected by the Random Forest and the fitted distribution, in order to estimate future loss amounts. Last, we used classical credibility theory to estimate the minimum number of observations required to reach a 95% level of accuracy in modeling cyber risk.
Keywords: Cyber risk, Cyber security, Cyber insurance, Ratemaking, Loss Distribution Approach, Machine Learning, Random Forest, Generalized Linear Model, Classical credibility theory, R Studio.
Description:
"A thesis submitted to the Faculty of Natural and Applied Sciences in partial fulfillment of the requirements for the degree of Master of Science in Actuarial Sciences"; M.S. -- Faculty of Natural and Applied Sciences, Notre Dame University, Louaize, 2021; Includes bibliographical references (pages 85-86).