Performance Benchmark

Cortex XSOAR Administrator Guide

Product: Cortex XSOAR
Version: 6.5
Creation date: 2022-09-28
Last date published: 2024-02-11
End of Life: EoL
Category: Administrator Guide

Abstract: Details the Cortex XSOAR hardware specifications and requirements, and the benchmarking performance tests conducted in Cortex XSOAR labs.

Cortex XSOAR is designed for high performance and scalability. A benchmarking process is conducted annually to measure and maintain performance levels.

Cortex XSOAR performance is determined by compute (CPU), memory, and disk performance. Because each component affects a different part of the system, deploy Cortex XSOAR on infrastructure that meets all of the requirements.

The amount of data each incident holds can significantly affect performance and disk usage. To achieve optimal performance and disk usage, we recommend that each incident be no larger than 0.5 MB.
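To sanity-check payloads against this recommendation before ingestion, you can measure their serialized size. The sketch below is a minimal illustration that assumes the UTF-8 JSON encoding of an incident is a reasonable proxy for its stored size and that 0.5 MB means 512 KiB; neither assumption is stated in this guide, and the function names and sample payloads are hypothetical.

```python
import json

# Assumption: "0.5 MB" is read as 512 KiB (524,288 bytes); use 500_000 for decimal MB.
MAX_INCIDENT_BYTES = 512 * 1024


def incident_size_bytes(incident: dict) -> int:
    """Approximate an incident's size as the length of its UTF-8 JSON encoding."""
    return len(json.dumps(incident, ensure_ascii=False).encode("utf-8"))


def within_recommended_size(incident: dict, limit: int = MAX_INCIDENT_BYTES) -> bool:
    """Return True if the serialized incident is at or below the recommended limit."""
    return incident_size_bytes(incident) <= limit


# Hypothetical incident payloads, for illustration only.
small = {"name": "Suspicious email", "labels": [{"type": "Email/from", "value": "a@example.com"}]}
large = {"name": "Huge email", "details": "x" * 600_000}

print(within_recommended_size(small))  # True
print(within_recommended_size(large))  # False
```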

Testing was performed with Cortex XSOAR v6.5, build 2102531.

Testing was performed for deployments with the Bolt database, with Elasticsearch, and with High Availability (Elasticsearch with multiple app servers). For all three environments, 60 events were ingested per minute from a SIEM.
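At a constant 60 events per minute, the event volume over the test windows follows directly. This sketch assumes a steady rate with no gaps, which the guide does not state explicitly, and it counts generated events only; the guide does not say how events map to the incident counts reported in the tables.

```python
EVENTS_PER_MINUTE = 60  # benchmark ingestion rate stated in this guide


def expected_events(minutes: int, rate_per_minute: int = EVENTS_PER_MINUTE) -> int:
    """Number of events generated at a constant rate over the given number of minutes."""
    return minutes * rate_per_minute


per_hour = expected_events(60)            # 3,600
per_day = expected_events(60 * 24)        # 86,400
per_week = expected_events(60 * 24 * 7)   # 604,800
print(per_hour, per_day, per_week)
```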

Benchmark Process

The benchmarking process uses an integration that creates phishing events at a rate of 60 events per minute. The events are ingested into Cortex XSOAR through classification and mapping, which creates phishing incidents in the system. Each incident automatically triggers the Phishing Investigation - Generic v2 playbook.

The Phishing Investigation - Generic v2 playbook performs the following actions:

  • Parse and process the email

  • Auto-run IOC extraction and reputation checks for all indicators

  • Extract attachments

  • Calculate incident severity based on IOCs

  • Notify users (administrators and the email sender) about the progress of the incident

  • Close the incident
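The severity step can be pictured as "take the worst reputation among the extracted indicators and map it to a severity". The sketch below is a hypothetical illustration of that idea, not the playbook's actual logic. It assumes the DBot score scale (0 unknown, 1 good, 2 suspicious, 3 bad) and an invented mapping from score to severity label.

```python
from enum import IntEnum


class Reputation(IntEnum):
    """Indicator reputation, following the DBot score scale (0-3)."""
    UNKNOWN = 0
    GOOD = 1
    SUSPICIOUS = 2
    BAD = 3


# Hypothetical mapping from the worst indicator reputation to an incident severity label.
SEVERITY_BY_REPUTATION = {
    Reputation.UNKNOWN: "Low",
    Reputation.GOOD: "Low",
    Reputation.SUSPICIOUS: "Medium",
    Reputation.BAD: "High",
}


def severity_from_iocs(reputations: list[Reputation]) -> str:
    """Derive severity from the most malicious indicator found in the email."""
    worst = max(reputations, default=Reputation.UNKNOWN)
    return SEVERITY_BY_REPUTATION[worst]


print(severity_from_iocs([Reputation.GOOD, Reputation.SUSPICIOUS]))  # Medium
print(severity_from_iocs([Reputation.BAD, Reputation.GOOD]))         # High
print(severity_from_iocs([]))                                        # Low
```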

Specifications

| Database | AWS Instance Type | OS Kernel Release | CPU Cores | RAM | Disk |
|---|---|---|---|---|---|
| Bolt DB | c5.4xlarge | 4.14.256-197.484.amzn2.x86_64 | 16 | 30.63 GB | 9.77 TB, maximum IOPS (16 KiB I/O) 20,000 |
| Elasticsearch | c5.4xlarge | 4.14.231-173.361.amzn2.x86_64 | 16 | 30.63 GB | 0.98 TB, maximum IOPS (16 KiB I/O) 20,000 |
| High Availability Elasticsearch (4 app servers with identical specifications) | c5.9xlarge | 4.14.246-187.474.amzn2.x86_64 | 36 | 68.69 GB | 0.49 TB, maximum IOPS (16 KiB I/O) 80,000; NFS shared file system |
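The IOPS figures translate into an approximate throughput ceiling by multiplying by the I/O size. This is an idealized bound: actual throughput also depends on the volume type's own throughput limit and the instance's storage bandwidth, which the table does not list.

```python
KIB = 1024
MIB = 1024 * KIB


def throughput_mib_per_s(iops: int, io_size_bytes: int) -> float:
    """Upper bound on disk throughput implied by an IOPS figure at a fixed I/O size."""
    return iops * io_size_bytes / MIB


io_size = 16 * KIB  # the I/O size quoted in the specifications table
print(throughput_mib_per_s(20_000, io_size))  # 312.5  (Bolt DB and Elasticsearch servers)
print(throughput_mib_per_s(80_000, io_size))  # 1250.0 (High Availability app servers)
```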

Elasticsearch Cluster Architecture

The following is the standalone Elasticsearch cluster architecture used for performance benchmark testing:

| Node | ES Roles | CPU | Memory | JVM | AWS Instance Type | Disk Size and Type | ES Version |
|---|---|---|---|---|---|---|---|
| Master 1 | master-eligible, remote cluster client node | 4 vCPUs | 8.0 GiB | 4 GB | c5.xlarge | 1000 GiB gp2 | 7.9.0 |
| Master 2 | master-eligible, remote cluster client node | 4 vCPUs | 8.0 GiB | 4 GB | c5.xlarge | 1000 GiB gp2 | 7.9.0 |
| Master 3 | master-eligible, remote cluster client node | 4 vCPUs | 8.0 GiB | 4 GB | c5.xlarge | 1000 GiB gp2 | 7.9.0 |
| Data 1 | data, ingest, remote cluster client, transform | 8 vCPUs | 64.0 GiB | 30 GB | r5.2xlarge | 4000 GiB io1 | 7.9.0 |
| Data 2 | data, ingest, remote cluster client, transform | 8 vCPUs | 64.0 GiB | 30 GB | r5.2xlarge | 4000 GiB io1 | 7.9.0 |
| Data 3 | data, ingest, remote cluster client, transform | 8 vCPUs | 64.0 GiB | 30 GB | r5.2xlarge | 4000 GiB io1 | 7.9.0 |
| Client 1 | remote cluster client node | 8 vCPUs | 16.0 GiB | 8 GB | c5.2xlarge | 1000 GiB gp2 | 7.9.0 |
| Client 2 | remote cluster client node | 8 vCPUs | 16.0 GiB | 8 GB | c5.2xlarge | 1000 GiB gp2 | 7.9.0 |
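The node sizes above can be checked against Elasticsearch's general heap guidance: keep the JVM heap at no more than about half of the node's RAM, and below the compressed-oops cutoff (roughly 32 GB; the exact value depends on the JVM). This sketch assumes the JVM column is the configured heap size, and treats GiB and GB as equivalent for a rough check; the `Node` type and `heap_ok` helper are hypothetical.

```python
from dataclasses import dataclass

# Rules of thumb from Elasticsearch sizing guidance (approximate, not hard limits).
MAX_HEAP_FRACTION = 0.5
COMPRESSED_OOPS_LIMIT_GB = 32


@dataclass
class Node:
    name: str
    memory_gib: float
    heap_gb: float


def heap_ok(node: Node) -> bool:
    """Check a node's heap against the two sizing rules of thumb above."""
    return (node.heap_gb <= node.memory_gib * MAX_HEAP_FRACTION
            and node.heap_gb < COMPRESSED_OOPS_LIMIT_GB)


# One node of each role, with values from the cluster table above.
nodes = [
    Node("Master", 8.0, 4),
    Node("Data", 64.0, 30),
    Node("Client", 16.0, 8),
]
for n in nodes:
    print(n.name, heap_ok(n))  # all True
```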

High Availability Elasticsearch Cluster Architecture

The following is the high availability Elasticsearch cluster architecture used for performance benchmark testing:

| Node | ES Roles | CPU | Memory | JVM | AWS Instance Type | Disk Size and Type | ES Version |
|---|---|---|---|---|---|---|---|
| Master 1 | master-eligible, remote cluster client node | 4 vCPUs | 8.0 GiB | 4 GB | c5.xlarge | 1000 GiB gp2 | 7.11.0 |
| Master 2 | master-eligible, remote cluster client node | 4 vCPUs | 8.0 GiB | 4 GB | c5.xlarge | 1000 GiB gp2 | 7.9.0 |
| Master 3 | master-eligible, remote cluster client node | 4 vCPUs | 8.0 GiB | 4 GB | c5.xlarge | 1000 GiB gp2 | 7.9.0 |
| Data 1 | data, ingest, remote cluster client, transform | 8 vCPUs | 64.0 GiB | 30 GB | r5.2xlarge | 4000 GiB io1 | 7.9.0 |
| Data 2 | data, ingest, remote cluster client, transform | 8 vCPUs | 64.0 GiB | 30 GB | r5.2xlarge | 4000 GiB io1 | 7.9.0 |
| Data 3 | data, ingest, remote cluster client, transform | 8 vCPUs | 64.0 GiB | 30 GB | r5.2xlarge | 4000 GiB io1 | 7.9.0 |
| Client 1 | remote cluster client node | 8 vCPUs | 16.0 GiB | 8 GB | c5.2xlarge | 1000 GiB gp2 | 7.9.0 |
| Client 2 | remote cluster client node | 8 vCPUs | 16.0 GiB | 8 GB | c5.2xlarge | 1000 GiB gp2 | 7.9.0 |

Utilization

Average over 7 days.

| Database | CPU | RAM | RAM XSOAR | RAM Docker | RAM Python | HDD |
|---|---|---|---|---|---|---|
| Bolt DB | 37.1% | 9.27 GB | 6.30 GB | 1.65 GB | 1.72 GB | 7.46 TB |
| Elasticsearch | 70.7% | 20.83 GB | 3.05 GB | 3.79 GB | 3.99 GB | 0.09 TB |
| High Availability - App Server 1 | 20.3% | 42.08 GB | 2.81 GB | 1.46 GB | 1.78 GB | 0.03 TB |
| High Availability - App Server 2 | 20.3% | 39.95 GB | 2.81 GB | 1.63 GB | 1.89 GB | 0.03 TB |
| High Availability - App Server 3 | 19.5% | 47.37 GB | 2.24 GB | 1.52 GB | 1.80 GB | 0.03 TB |
| High Availability - App Server 4 | 20.1% | 46.72 GB | 2.49 GB | 1.53 GB | 1.88 GB | 0.03 TB |
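To relate the RAM column to each server's capacity, divide it by the total RAM from the specifications table. This sketch assumes the RAM column is the total memory in use on the host, which the guide does not define explicitly; the helper name is hypothetical.

```python
# Total RAM per server from the specifications table (GB).
TOTAL_RAM_GB = {"Bolt DB": 30.63, "Elasticsearch": 30.63, "HA app server": 68.69}


def ram_percent(used_gb: float, total_gb: float) -> float:
    """Share of host memory in use, rounded to one decimal place."""
    return round(100 * used_gb / total_gb, 1)


print(ram_percent(9.27, TOTAL_RAM_GB["Bolt DB"]))          # 30.3
print(ram_percent(20.83, TOTAL_RAM_GB["Elasticsearch"]))   # 68.0
print(ram_percent(47.37, TOTAL_RAM_GB["HA app server"]))   # 69.0 (App Server 3)
```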

Workers

| Database | Total Workers | Busy Workers (average over 7-day period) |
|---|---|---|
| Bolt DB | 600 | 271 |
| Elasticsearch | 1,000 | 166 |
| High Availability Elasticsearch | 1,000 | 63 |
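The total worker counts match the `workers.count.tasks` values in the server configuration tables, so the busy count indicates how much of the configured pool was in use on average. A minimal sketch of that ratio, with a hypothetical helper name:

```python
WORKERS = {
    "Bolt DB": (600, 271),
    "Elasticsearch": (1_000, 166),
    "High Availability Elasticsearch": (1_000, 63),
}


def busy_percent(total: int, busy: int) -> float:
    """Average share of the worker pool that was busy, as a percentage."""
    return round(100 * busy / total, 1)


for db, (total, busy) in WORKERS.items():
    print(f"{db}: {busy_percent(total, busy)}% busy")
# Bolt DB: 45.2% busy
# Elasticsearch: 16.6% busy
# High Availability Elasticsearch: 6.3% busy
```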

Incidents

| Database | Closed Incidents (last 7 days) | Incidents per Hour |
|---|---|---|
| Bolt DB | 239,713 | 1,384 |
| Elasticsearch | 552,287 | 1,679 |
| High Availability Elasticsearch | 964,668 | 5,686 |
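A flat average of the 7-day totals over 168 hours gives a rough hourly figure for comparison. It need not match the published per-hour rate, which may be measured over a different window or with a sampling method that the guide does not specify. The helper name is hypothetical.

```python
HOURS_IN_7_DAYS = 7 * 24  # 168

CLOSED_LAST_7_DAYS = {
    "Bolt DB": 239_713,
    "Elasticsearch": 552_287,
    "High Availability Elasticsearch": 964_668,
}


def mean_hourly_rate(closed: int, hours: int = HOURS_IN_7_DAYS) -> int:
    """Closed incidents per hour, assuming an even spread across the window."""
    return round(closed / hours)


for db, closed in CLOSED_LAST_7_DAYS.items():
    print(db, mean_hourly_rate(closed))
# Bolt DB 1427
# Elasticsearch 3287
# High Availability Elasticsearch 5742
```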

Phishing Use Case

Based on the `Phishing Investigation - Generic v2` playbook. Average over 7 days.

| Database | Fetch Duration (time for the integration to fetch data) | Ingestion Duration (classification and mapping) | Playbook Duration (time to run the playbook on an incident) |
|---|---|---|---|
| Bolt DB | 349ms | 1m 59s 277ms | 2m 47s 237ms |
| Elasticsearch | 303ms | 20s 441ms | 41s 917ms |
| High Availability Elasticsearch | 479ms | 41s 9ms | 41s 14ms |
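Summing the three stage averages gives a rough end-to-end time per incident. This assumes the stages run back to back and that their averages can be added, which is only an approximation; `to_ms` and `end_to_end_ms` are hypothetical helpers for the duration format used in these tables.

```python
import re

# Milliseconds per unit in strings such as "1m 59s 277ms".
_UNIT_MS = {"m": 60_000, "s": 1_000, "ms": 1}
# "ms" is listed before "m" so that "277ms" is not read as minutes.
_PART = re.compile(r"(\d+)(ms|m|s)")


def to_ms(duration: str) -> int:
    """Convert a duration such as '1m 59s 277ms' to milliseconds."""
    return sum(int(value) * _UNIT_MS[unit] for value, unit in _PART.findall(duration))


def end_to_end_ms(fetch: str, ingest: str, playbook: str) -> int:
    """Sum the three stage averages, assuming they run sequentially."""
    return to_ms(fetch) + to_ms(ingest) + to_ms(playbook)


print(end_to_end_ms("349ms", "1m 59s 277ms", "2m 47s 237ms"))  # 286863 (Bolt DB)
print(end_to_end_ms("303ms", "20s 441ms", "41s 917ms"))        # 62661 (Elasticsearch)
print(end_to_end_ms("479ms", "41s 9ms", "41s 14ms"))           # 82502 (High Availability)
```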

Searches

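Average incident search duration by incident status, over all time.

| Database | All Time Active | All Time Pending | All Time Closed |
|---|---|---|---|
| Bolt DB | 2s 57ms | 2s 2ms | 3s 896ms |
| Elasticsearch | 904ms | 899ms | 1s 940ms |
| High Availability Elasticsearch | 353ms | 332ms | 1s 220ms |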

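Average incident search duration by incident status, over the last 7 days.

| Database | Last 7 Days Active | Last 7 Days Pending | Last 7 Days Closed |
|---|---|---|---|
| Bolt DB | 1s 491ms | 1s 481ms | 2s 57ms |
| Elasticsearch | 919ms | 890ms | 1s 17ms |
| High Availability Elasticsearch | 370ms | 332ms | 708ms |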

Bolt Database Details

The following server configurations were used for the Bolt DB testing environment:

| Key | Value |
|---|---|
| content.unlock.scripts | CommonServerPython |
| crashed.inv.playbooks.rerun.disable | True |
| create.related.indicators.entry | True |
| custom.transformer.override.convertkeystotablefieldformat | True |
| execution.demisto rest api.demisto-api-post | False |
| feedintegrationscript.timeout | 100 |
| indicators.update.expiration.scheduled.job.enabled | False |
| instance.execute.external | True |
| investigation.task.partial.index | 15 |
| job.monitor.log | True |
| monitor.long.running.enabled | False |
| monitoring.pprof | True |
| instance.execute.external.edl_instance | True |
| playbook.willnotexecute.old.eval | False |
| relationships.enabled | False |
| tim.features.enabled | True |
| workers.count.tasks | 600 |

Bolt DB - Web Client - JS Heap

| Page | Total Size | Used Size |
|---|---|---|
| Automation | 92.64 MB | 73.13 MB |
| Incidents | 104.46 MB | 76.91 MB |
| Indicators | 107.49 MB | 69.17 MB |
| Integrations | 93.08 MB | 74.68 MB |
| Jobs | 95.09 MB | 75.37 MB |
| Login_page_load | 52.58 MB | 36.36 MB |
| Playbooks | 104.05 MB | 71.46 MB |
| Reports | 67.78 MB | 56.64 MB |
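Ranking pages by the share of their allocated JS heap that is in use can highlight where the web client runs closest to its current allocation. This sketch assumes Total Size and Used Size correspond to the browser's total and used JS heap for each page; the helper name is hypothetical.

```python
# (total MB, used MB) per page, from the JS heap table above.
JS_HEAP = {
    "Automation": (92.64, 73.13),
    "Incidents": (104.46, 76.91),
    "Indicators": (107.49, 69.17),
    "Integrations": (93.08, 74.68),
    "Jobs": (95.09, 75.37),
    "Login_page_load": (52.58, 36.36),
    "Playbooks": (104.05, 71.46),
    "Reports": (67.78, 56.64),
}


def used_ratio(total_mb: float, used_mb: float) -> float:
    """Fraction of the allocated JS heap that is in use."""
    return used_mb / total_mb


ranked = sorted(JS_HEAP, key=lambda page: used_ratio(*JS_HEAP[page]), reverse=True)
print(ranked[0])   # Reports (about 84% of its heap in use)
print(ranked[-1])  # Indicators (about 64%)
```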

Bolt DB - Visual

| Page | First Visual Change | Fully Loaded | Largest Image | Last Visual Change | Speed Index | Time to First Byte |
|---|---|---|---|---|---|---|
| Automation | 553ms | 547ms | 486ms | 1s 112ms | 774ms | 20ms |
| Incidents | 795ms | 3s 990ms | 824ms | 8s 123ms | 1s 200ms | 20ms |
| Indicators | 1s 39ms | 145ms | 1s 328ms | 1s 466ms | 1s 309ms | 20ms |
| Integrations | 978ms | 636ms | 1s 319ms | 1s 462ms | 1s 275ms | 20ms |
| Jobs | 477ms | 22ms | 484ms | 922ms | 493ms | 20ms |
| Playbooks | 759ms | 730ms | 1s 56ms | 3s 727ms | 1s 44ms | 20ms |
| Reports | 535ms | 636ms | 539ms | 1s 71ms | 571ms | 20ms |

Login_page_load lists only five of the six values (4s 359ms, 7s 413ms, 4s 361ms, 4s 359ms, 20ms), so the column for each value cannot be determined.

Figures (Bolt DB): busy workers, incident fetch duration, incident ingestion duration, incidents per hour, incident search duration, playbook execution time.

Elasticsearch Details

The following server configurations were used for the Elasticsearch testing environment:

| Key | Value |
|---|---|
| containers.low.water.mark.demisto/python:1.3-alpine | 40 |
| create.related.indicators.entry | True |
| custom.transformer.override.convertkeystotablefieldformat | True |
| execution.demisto rest api.demisto-api-post | False |
| feedintegrationscript.timeout | 60 |
| instance.execute.external.edl_instance | True |
| investigation.task.partial.index | 15 |
| job.monitor.log | True |
| log.rolling.backups | 20 |
| monitor.long.running.enabled | False |
| monitoring.pprof | True |
| playbook.willnotexecute.old.eval | False |
| tim.features.enabled | True |
| workers.count.tasks | 1,000 |

Figures (Elasticsearch): busy workers, incident fetch duration, incident ingestion duration, incidents per hour, incident search duration, playbook execution time.

High Availability Details

The following server configurations were used for the high availability testing environment:

| Key | Value |
|---|---|
| containers.low.water.mark.demisto/python:1.3-alpine | 40 |
| create.related.indicators.entry | True |
| custom.transformer.override.convertkeystotablefieldformat | True |
| disable.msgs.sending | True |
| execution.demisto rest api.demisto-api-post | False |
| feedintegrationscript.timeout | 80 |
| fetch.parallel.create.incidents.enable | False |
| fetch.parallel.create.incidents.size | 2 |
| instance.execute.external.edl_instance | True |
| investigation.task.partial.index | 7 |
| job.monitor.log | True |
| monitor.long.running.enabled | False |
| monitoring.pprof | True |
| playbook.willnotexecute.old.eval | False |
| reputation.calc.algorithm | 1 |
| reputation.calc.algorithm.tasks | 1 |
| tim.features.enabled | True |
| workers.count.tasks | 1,000 |
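Comparing the three server configuration tables by hand is error-prone. The sketch below diffs a small subset of keys copied from those tables; it runs offline on plain dictionaries (it does not query Cortex XSOAR), treats all values as strings, and uses a hypothetical helper name.

```python
from __future__ import annotations

# A subset of keys from the Bolt DB, Elasticsearch, and High Availability tables.
# workers.count.tasks is written without the thousands separator shown in the tables.
CONFIGS = {
    "Bolt DB": {
        "feedintegrationscript.timeout": "100",
        "investigation.task.partial.index": "15",
        "workers.count.tasks": "600",
        "tim.features.enabled": "True",
    },
    "Elasticsearch": {
        "feedintegrationscript.timeout": "60",
        "investigation.task.partial.index": "15",
        "workers.count.tasks": "1000",
        "tim.features.enabled": "True",
    },
    "High Availability": {
        "feedintegrationscript.timeout": "80",
        "investigation.task.partial.index": "7",
        "workers.count.tasks": "1000",
        "tim.features.enabled": "True",
        "disable.msgs.sending": "True",
    },
}


def differing_keys(configs: dict[str, dict[str, str]]) -> dict[str, dict[str, str | None]]:
    """Return keys whose value differs, or is missing, in at least one environment."""
    all_keys = sorted({key for cfg in configs.values() for key in cfg})
    diff = {}
    for key in all_keys:
        values = {env: cfg.get(key) for env, cfg in configs.items()}
        if len(set(values.values())) > 1:
            diff[key] = values
    return diff


for key, values in differing_keys(CONFIGS).items():
    print(key, values)
```

Keys with identical values everywhere (here, tim.features.enabled) are omitted, so the output lists only the settings that distinguish the environments.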

Figures (High Availability): busy workers, incident fetch duration, incident ingestion duration, incidents per hour, incident search duration, playbook execution time.