Performance Tuning of Cortex XSOAR Server

Identify common causes of slow system performance and implement system improvements in Cortex XSOAR.
Performance tuning may include general troubleshooting for memory and CPU usage, or troubleshooting of more specific UI issues, for example playbook execution. The information in this article will help you identify common causes of slow system performance and implement system improvements.
Memory Issues
Memory issues are a common cause of slow system performance. To verify if memory issues are causing performance issues, check for memory spikes in the system health dashboard and search the journalctl log for the following entry:
kernel: Out of memory: Kill process X (server) score X or sacrifice child
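As a rough illustration of the log check above, the following Python sketch filters exported kernel log text for out-of-memory kills of the server process. It assumes the journal has already been dumped to text (for example with journalctl -k). The function name find_oom_kills is hypothetical, and the pattern is modeled only on the entry quoted above, with an optional "Killed" spelling added as an assumption.

```python
import re

# Modeled on the kernel entry quoted above; real entries carry a PID where the
# article shows "X". The optional "ed" ("Killed process") is an assumption to
# cover kernels that word the message differently.
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text, process_name="server"):
    """Return a list of (pid, line) for each OOM kill of process_name."""
    hits = []
    for line in log_text.splitlines():
        match = OOM_PATTERN.search(line)
        if match and match.group(2) == process_name:
            hits.append((int(match.group(1)), line.strip()))
    return hits
```

An empty result suggests the kernel did not kill the server for memory reasons in the exported window; it does not rule out memory pressure in general.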
Issue
Verification
Solution
Storing a large amount of historical data (more than 1 year)
Check the file sizes in the partition folder to verify that no file is larger than 10GB. The default directory is /var/lib/demisto/data/partitionsData.
Archive old data. For more information, learn how to Free up Disk Space with Data Archiving.
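The file-size check in this row can be sketched in Python as follows. This is a minimal example and not an official tool: the helper name files_larger_than is hypothetical, and 10GB is taken to mean 10 * 1024^3 bytes, which is an assumption.

```python
import os

TEN_GB = 10 * 1024 ** 3  # assumes "10GB" means 10 * 1024^3 bytes


def files_larger_than(directory, limit_bytes):
    """Walk directory and return (path, size) for files above limit_bytes, largest first."""
    oversized = []
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file disappeared or is unreadable; skip it
            if size > limit_bytes:
                oversized.append((path, size))
    return sorted(oversized, key=lambda item: item[1], reverse=True)


# Example call against the default directory named in this row:
# files_larger_than("/var/lib/demisto/data/partitionsData", TEN_GB)
```

The same helper could be pointed at the index folders listed later in this table, using the 3GB limit given for those rows.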
Inefficient Playbooks storing too much data
Validate with the Largest Incidents by Storage Size widget. This widget is part of the Common Widgets content pack.
Archive old data. For more information, learn how to Free up Disk Space with Data Archiving.
Playbooks performing many loops or creating many entries
Verify with the dbstats route (https://localhost:8443/dbstats) that no investigation has more than 500 entries.
Reduce the number of loops and/or entries.
Server machine does not meet the minimum memory requirements
Open the docker.log file in the log bundle and check the memory and CPU of the machine.
Verify that your server meets the minimum memory requirements.
Storage does not meet minimum requirements
N/A
Verify that the disk is SSD based, with 3k dedicated IOPS.
Cortex XSOAR may show incorrect memory usage when using Golang v1.12 and later
See this KB article.
Extreme user searches - search parameters
Check dashboard and widget logs, as well as the debug log, for queries that take more than 30 seconds. Examples:
2018-12-03 13:28:43:5918 info [POST] "/statistics/widgets/query" 200 25.835815953s
2018-12-03 13:58:07.031 debug Getting stats for widget active-incidents-by-type Active
Dashboards/Widgets:
  • Check your dashboards and widgets to see if they are using the All times time range and modify the time range as needed.
  • Confirm that manual searches are not being run with the All times time range.
  • Select the Hide the Panel option on the incidents page.
Playbooks:
Check for tasks that execute a query but do not have a time range argument specified, or where the time range is too broad.
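The log check for slow queries in this row can be approximated with a short Python sketch. It is based only on the request line shown in the first example (method, route, status, duration). The function names are hypothetical, and support for duration units other than seconds is an assumption about how the server prints longer or shorter timings.

```python
import re

# Shape taken from the example: ... info [POST] "<route>" <status> <duration>
REQUEST_LINE = re.compile(
    r'\[(?P<method>[A-Z]+)\]\s+"(?P<route>[^"]+)"\s+(?P<status>\d{3})\s+(?P<duration>\S+)\s*$'
)
# Duration components such as "25.8s", "350ms" or "1m5.2s" (assumed formats).
DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(h|ms|m|s|us|ns)")
UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}


def duration_seconds(text):
    """Convert a duration string to seconds, or return None if it is not recognized."""
    parts = DURATION_PART.findall(text)
    if not parts or "".join(number + unit for number, unit in parts) != text:
        return None
    return sum(float(number) * UNIT_SECONDS[unit] for number, unit in parts)


def slow_requests(lines, threshold_seconds=30.0):
    """Return (route, seconds) for request lines slower than threshold_seconds."""
    slow = []
    for line in lines:
        match = REQUEST_LINE.search(line)
        if not match:
            continue
        seconds = duration_seconds(match.group("duration"))
        if seconds is not None and seconds > threshold_seconds:
            slow.append((match.group("route"), seconds))
    return slow
```

Note that the sample request in this row took about 25.8 seconds, so it would only be reported if the threshold were lowered below the 30 seconds mentioned in the text.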
Too many parallel searches
Contact Cortex XSOAR customer support.
Add the following server configuration and value to limit the number of parallel searches:
workers.count.search = 10
Too much data indexing of investigation tasks or entries
Check folders with the prefix /var/lib/demisto/data/demistoidx/invTaskIdx_ for files larger than 3GB.
Add the following server configuration and value to limit indexing:
investigation.task.partial.index = 7
Docker containers
Check the docker.log file. Verify the number of running containers, the machine CPU/memory, and the Docker stats that indicate whether a container is consuming too many resources.
Audits index folder
Check the file sizes in the audits index folder to verify that no file is larger than 3GB. The default directory is /var/lib/demisto/data/demistoidx/audits.
Messages index
Check the file size of the messages index to verify it isn’t larger than 3GB. The default directory is /var/lib/demisto/data/demistoidx/messages.
Drop messages and delete the index. Add configuration setting:
disable.msgs.sending
CPU Spikes
Check for CPU spikes by viewing the system health dashboard or by using system tools such as the Linux top command.
Issue
Verification
Solution
Too many threads
Check threads in the go_stats.log file. More than 3000 threads (referred to as goroutines in Golang) indicates a possible thread leak or too many processes/tasks running in parallel.
Contact Cortex XSOAR customer support. Restart service.
Workers overloaded
Check the workers.log file. Available or Buffer Space == 0 indicates the system is overloaded. If Total == Busy, the system has all workers busy and you need to increase users.
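The two conditions in this row can be written out as a small Python check. This sketch assumes the four counters have been read from workers.log by hand, since the log format is not shown here. The function and parameter names are hypothetical.

```python
def assess_workers(total, busy, available, buffer_space):
    """Apply the two workers.log rules from this row to counters read from the log."""
    findings = []
    # Rule 1: Available or Buffer Space equal to 0 indicates an overloaded system.
    if available == 0 or buffer_space == 0:
        findings.append("system overloaded")
    # Rule 2: Total equal to Busy means every worker is occupied.
    if total == busy:
        findings.append("all workers busy")
    return findings
```

An empty list means neither condition from the row applies to the values supplied.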
Docker containers overloaded
Check the docker.log file. Verify the number of running containers, the machine CPU/memory, and the Docker stats that indicate whether a container is consuming too many resources.
Change the limit for the pool of running Docker images (default is 20) with the server configuration containers.high.water.mark or, for a specific Docker image, containers.high.water.mark.${image_name}.
Slow Playbooks
Issue
Verification
Solution
Indicator Extraction Enabled
N/A
Check Indicator Extraction settings and turn off Indicator Extraction where not needed. See Indicator Extraction.
Enrichment integrations that fail or timeout
Verify you don’t have enrichment integrations that fail frequently or experience timeouts. This might occur with free enrichment services that quickly exceed quotas.
Depending on the integration, you might need to increase the quota or modify integration settings.
Playbook storing a large amount of data
Check if Playbooks are storing more than 0.5 MB per incident. Confirm by running !PrintContext in the War Room, downloading the output entry to a file, and checking the file size. If the file size is not over 0.5 MB, run !Print value=${incident} to view incident data.
View Playbook metadata to understand which tasks are generating large amounts of data and then optimize tasks to reduce data storage.
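To see which parts of an incident's context account for most of the stored data, a sketch like the following may help. It assumes the downloaded context has been loaded into a Python dictionary (for example from a JSON export), which is an assumption about the export format. The helper names are hypothetical, and 0.5 MB is taken as 512 * 1024 bytes.

```python
import json

HALF_MB = 512 * 1024  # assumes 0.5 MB means 0.5 * 1024 * 1024 bytes


def serialized_size(value):
    """Size in bytes of value when serialized as UTF-8 JSON."""
    return len(json.dumps(value).encode("utf-8"))


def largest_context_keys(context, top=5):
    """Return up to `top` (key, size_in_bytes) pairs for the largest top-level context keys."""
    sizes = [(key, serialized_size(value)) for key, value in context.items()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


def exceeds_limit(context, limit_bytes=HALF_MB):
    """True if the whole context serializes to more than limit_bytes."""
    return serialized_size(context) > limit_bytes
```

The largest keys usually point to the integration or task output worth trimming first, although the mapping from context key to playbook task still has to be checked manually.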
Complex Playbooks with many tasks
Check for Playbooks with a large number of tasks or sub playbooks.
Add the following server configuration and value to prevent repeated task checks in Playbooks:
playbook.willnotexecute.old.eval = false
Other Possible Performance Issues
Issue
Verification
Solution
Insufficient Disk Space
Check the filesystem.log in the log bundle for large files and folders.
Archive old data. For more information, learn how to free up disk space with data archiving.
Indicators Page
Check for noticeable lags on the Indicators page.
Exclude indicators that appear in every incident.
Overall slow UI
Check network latency and ping other XSOAR components, such as engines from the server.
Check with IT regarding latency between client and server.
WebSockets
If the Cortex XSOAR server is responding slowly and does not receive data updates on certain pages and actions, the WebSocket might be disconnecting.
  1. Confirm that this issue persists across different browsers (Chrome, Firefox, etc.)
  2. Check WebSocket messages on the server.
  3. Check the server.log file for messages to confirm that the WebSocket scenario is working, e.g., WebSocket req arrived and HTTP connection upgraded to WebSocket.
  4. Check for WebSocket errors such as: Closing WebSocket ReadPump with err: websocket: close 1005 (no status).
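Steps 3 and 4 above can be sketched as a simple scan of server.log lines. The marker strings are copied from the messages quoted in those steps. The function name summarize_websocket_log is hypothetical, and matching by plain substring is an assumption about how the messages appear in the log.

```python
# Messages quoted in step 3 (connection working) and step 4 (error).
HANDSHAKE_MARKERS = ("WebSocket req arrived", "HTTP connection upgraded to WebSocket")
ERROR_MARKER = "Closing WebSocket ReadPump with err"


def summarize_websocket_log(lines):
    """Count handshake messages and collect WebSocket error lines."""
    summary = {"handshakes": 0, "errors": []}
    for line in lines:
        if any(marker in line for marker in HANDSHAKE_MARKERS):
            summary["handshakes"] += 1
        elif ERROR_MARKER in line:
            summary["errors"].append(line.strip())
    return summary
```

Many error lines relative to handshake messages would be consistent with the disconnecting WebSocket described in this section.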
Solution
Configure the WebSocket buffer size.
