Cyber security
06/05/2021

Sizing your SIEM solution

Do you know how to calculate the size of a SIEM solution that meets your needs?
Avoid any financial surprises: Get the basic formulas for calculating the correct sizing of your SIEM installation here – before you make your purchase.

In Denmark, we are currently in a phase where many customers are switching from one SIEM supplier to another because they have been disappointed with the cost and the service provided. The financial surprises are often due to a lack of overview, both on the customer's side and on the supplier's side. As a result, they have ended up in a situation where the solution has suddenly become twice as expensive as first assumed.

Before you acquire a SIEM solution, it is therefore crucial that you have control over the scope of the solution, i.e. the number of software licenses, the server investment, the operating costs, etc. You must also have control of the sizing of the solution, i.e. how much equipment you want to monitor and how many logs it generates, so that you can ultimately calculate the cost of the entire solution.

The factor many use in the SIEM world is EPS (events per second), which is the number of log events a given piece of IT equipment generates per second, whether it is a server, business system, client, or network device.

Devices do not all generate the same number of EPS (for example, Windows servers tend to generate much larger logs than Linux and Unix servers). So for the sake of convenience, this article takes an average value as its starting point.

After calculating the number of EPS that each unit generates, the next step is to calculate the EPD value (events per day).

That is, add up the EPS of all units and multiply by 86400 (24 hours x 60 minutes x 60 seconds):

EPD = (∑ EPS) * 86400
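
As a minimal sketch of this step, the Python snippet below sums per-device EPS values and converts the total to EPD. The device names and EPS figures are hypothetical and only illustrate the arithmetic; they are not measurements from this article.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def events_per_day(eps_by_device):
    """Sum the EPS of all devices and convert the total to events per day."""
    return sum(eps_by_device.values()) * SECONDS_PER_DAY


# Hypothetical average EPS per device (illustrative values only).
example_eps = {
    "windows-server-01": 50,
    "linux-server-01": 10,
    "firewall-01": 40,
}

epd = events_per_day(example_eps)
print(epd)  # 100 EPS * 86400 = 8640000
```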

Once the EPD value has been calculated, the size of an average log message must be found so that you get an overview of the daily need for storage space. IT equipment generates log files. Individual log messages range from about 200 bytes on network and infrastructure devices up to 10 kilobytes or more on the application and database side. The Syslog standard (RFC 5424) recommends that receivers be able to accept messages of up to 2 kilobytes (2048 bytes). Based on this information, it is fair to assume an average raw log message size of 500 bytes. With that assumption, the daily raw log volume in GB is calculated as follows:

Daily raw log size (GB) = EPD * 500 / 1024^3
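
The conversion from events per day to gigabytes can be sketched in Python as follows. The 500-byte average comes from the assumption above; the EPD value of 8,640,000 (100 EPS) is a hypothetical input chosen for illustration.

```python
BYTES_PER_GB = 1024 ** 3


def daily_raw_log_size_gb(epd, avg_message_bytes=500):
    """Daily raw log volume in GB for a given EPD and average message size."""
    return epd * avg_message_bytes / BYTES_PER_GB


# Hypothetical input: 100 EPS sustained over a full day.
raw_gb = daily_raw_log_size_gb(8_640_000)
print(round(raw_gb, 2))  # 4.02
```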

The SIEM system makes some changes to the log messages to make them understandable and meaningful in the SIEM system itself. This operation is called “normalization”, and it increases the log size by an amount that depends on the solution used. My personal experience is that the log size increases by approx. 90 to 100% after normalization. Some have seen an increase of up to 200%. Assuming a 100% increase, the daily normalized log size is calculated according to the following formula:

Daily normalized log size = Daily raw log size * 2
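
Because the increase depends on the SIEM product, it can help to treat it as a parameter rather than a fixed factor of 2. The sketch below does that; the 4.0 GB raw volume is a hypothetical input, and the three percentages are the ones mentioned in the text.

```python
def daily_normalized_log_size_gb(raw_gb, increase_pct=100):
    """Apply the normalization overhead, given as a percentage increase."""
    return raw_gb * (1 + increase_pct / 100)


# Hypothetical raw volume of 4.0 GB per day, with the increases from the text.
for pct in (90, 100, 200):
    size_gb = daily_normalized_log_size_gb(4.0, pct)
    print(f"{pct}% increase: {size_gb:.1f} GB per day")
```

With the default of 100%, the function reduces to the formula above (raw size times 2).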

The calculated value does not yet represent the actual daily amount of data stored by a SIEM system, because the logs are also compressed.

SIEM manufacturers offer different compression solutions, and some claim that they compress logs by a factor of 10 (10:1), which is quite optimistic. It is my assessment that a compression ratio of 8:1 is a reasonable assumption for calculations.

Some suppliers choose to compress most of the data, so you only have quick access to data that was created recently. This method is good, useful, and economical for the solution.

So the formula will be:

Daily storage needs = Daily normalized log size / 8
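
A small sketch of the compression step, with the ratio as a parameter so that the optimistic 10:1 claim and the more cautious 8:1 assumption can be compared. The 8.0 GB normalized volume is a hypothetical input.

```python
def daily_storage_gb(normalized_gb, compression_ratio=8):
    """Daily on-disk storage need in GB after compression (ratio N means N:1)."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return normalized_gb / compression_ratio


# Hypothetical normalized volume of 8.0 GB per day.
print(daily_storage_gb(8.0))      # 1.0 (8:1, the assumption used in this article)
print(daily_storage_gb(8.0, 10))  # 0.8 (10:1, the optimistic vendor claim)
```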

You must also decide how long (in months or years) you want to store your logs. This is called the retention period.

Here you must be aware that the longer you want to store data, the larger and more expensive the SIEM solution will be.

In this example, I assume that the retention period is 1 year, i.e. 365 days. If you want to be on the safe side, the annual storage requirement is then roughly 365 times the daily storage requirement. Be aware, however, that EPS numbers fall drastically on weekends and holidays, so the EPS number you measure right now is not the average EPS number for the entire period. We often meet customers who come with a very high EPS number, and when we ask about their storage consumption, it does not match that EPS number. It is therefore highly recommended to base the calculation on the average EPS number for the entire year, or to use this model in both directions, from EPS numbers to storage size and from storage size to EPS numbers:

Annual storage needs = Daily storage needs * 365
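
To illustrate why the average EPS matters, the sketch below chains the formulas from this article and compares a peak-only estimate with a weekday/weekend weighted average. The EPS figures (1000 on weekdays, 300 on weekends) and the simple 5:2 weighting are hypothetical assumptions; a real calculation should also account for holidays.

```python
def average_eps(weekday_eps, weekend_eps, weekdays=5, weekend_days=2):
    """Weighted average EPS over a week of weekdays and weekend days."""
    total_days = weekdays + weekend_days
    return (weekday_eps * weekdays + weekend_eps * weekend_days) / total_days


def annual_storage_gb(eps, retention_days=365, avg_message_bytes=500,
                      normalization_factor=2, compression_ratio=8):
    """Chain the article's formulas: EPS -> EPD -> raw -> normalized -> stored."""
    epd = eps * 86400
    raw_gb = epd * avg_message_bytes / 1024 ** 3
    normalized_gb = raw_gb * normalization_factor
    daily_gb = normalized_gb / compression_ratio
    return daily_gb * retention_days


# Hypothetical measurements: 1000 EPS on weekdays, 300 EPS on weekends.
avg = average_eps(1000, 300)
print(avg)  # 800.0
print(round(annual_storage_gb(1000)))  # estimate based on the peak value only
print(round(annual_storage_gb(avg)))   # estimate based on the weighted average
```

In this made-up case, sizing on the weekday figure alone overstates the annual storage need by 25%.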

As for the retention period, I have seen many decision makers try to stay on the very safe side and choose retention periods that are unnecessarily long.

According to the company Mandiant Solutions, the average number of days a hacker was present on a victim’s network before being discovered was 243 days in 2012, 229 days in 2013, 205 days in 2014, and 197 days in 2019. This brings me to the conclusion that the retention period for security alarms and surveillance alarms should be at least 1 year and no longer than 3 years. The same study by Mandiant Solutions states that “the longest time a hacker was present before he was discovered was six years and three months”. Last but not least, the storage period naturally depends on the sensitivity of the data you have and on how strict the requirements for the retention period are. For example, the requirements in the financial sector will be high, just as there are higher requirements when it comes to citizens’ personal data.

I hope you can use this model to calculate your EPS and how much storage is needed.

It can be a really good idea to also calculate the opposite way: if you know your storage consumption and your retention period, you can calculate your EPS. Customers often calculate the average EPS per year incorrectly, because they take measurements in a peak period and do not take into account weekends, holidays, public holidays, etc.
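
The reverse calculation can be sketched by inverting the formulas from this article step by step. The default parameters (500-byte messages, a doubling from normalization, 8:1 compression) are the article's assumptions; the 2937 GB figure is a hypothetical storage consumption used only as an example.

```python
def eps_from_storage(stored_gb, retention_days=365, avg_message_bytes=500,
                     normalization_factor=2, compression_ratio=8):
    """Estimate the average EPS from stored volume by inverting the model."""
    daily_gb = stored_gb / retention_days
    normalized_gb = daily_gb * compression_ratio   # undo compression
    raw_gb = normalized_gb / normalization_factor  # undo normalization
    epd = raw_gb * 1024 ** 3 / avg_message_bytes   # GB back to events per day
    return epd / 86400                             # events per day back to EPS


# Hypothetical: 2937 GB stored over a 365-day retention period.
print(round(eps_from_storage(2937)))  # 800
```

If the result is far below the EPS number you measured, the measurement was probably taken during a peak period.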

At CapMon, we have developed models that can quickly and accurately calculate your storage. If you want to go further, you can contact me for an informal chat about how your SIEM solution should be dimensioned.

Karsten Højer
+45 4079 0385

kh@capmon.dk