Disaster Recovery and Live Backup Overview - Administrator Guide - 6.5 - Cortex XSOAR - Cortex

Cortex XSOAR Administrator Guide

Product: Cortex XSOAR
Version: 6.5
Creation date: 2022-09-28
Last date published: 2024-03-21
End of Life: EoL

Note

This chapter contains instructions for configuring Live Backup for a single-server deployment. For multi-tenant deployments, follow the instructions in Configure Live Backup in the multi-tenant guide.

Server actions are mirrored to the standby server in real time, although some actions might remain pending because of high server load, connectivity issues, and so on. Consider the following:

Live Backup uses a single main server and a single standby server. Beyond these, additional servers are not currently supported.
Active/Active configuration is not currently supported.
Each host retains its own distinct IP address and host name.
Neither host is aware of which node is truly active. Failover is therefore not dynamic: an administrator must make a node active manually.

In the event of a server failover, engines dynamically reconnect to the active host.

Note

When using Cortex XSOAR with Elasticsearch, Live Backup is not available. To back up or restore the contents of your Elasticsearch database, see Disaster Recovery for Elasticsearch. You can also implement a full high availability solution.

Warning

Because making a Cortex XSOAR server active is a manual process, two servers could be active at the same time. You must avoid this scenario, because both hosts would collect and work on potentially the same security incidents, which could lead to the following:

Incident duplication
A higher load on your integration endpoints
Potentially significant database inconsistencies, because internal identifiers duplicated across the nodes can cause existing incidents to be overwritten.

Tip

If you are ever unsure whether a host that is currently down or stopped was active before it went offline, put the currently active host into standby before you start the Cortex XSOAR service on the other host. After you confirm whether the host you started is in active mode, you can make the original host active again.
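The ordering in this tip can be sketched as a small model. This is only an illustration of the sequence, not a Cortex XSOAR API: the `Host` class, the mode strings, and the `start_uncertain_host` function are hypothetical names, and the step that demotes the restarted host when it comes up active is an assumption about what the administrator does after the confirmation step.

```python
from dataclasses import dataclass

ACTIVE = "active"
STANDBY = "standby"


@dataclass
class Host:
    name: str
    mode: str      # ACTIVE or STANDBY, as last configured
    running: bool  # whether the Cortex XSOAR service is up


def running_active(hosts):
    """Return the names of hosts that are both running and in active mode."""
    return [h.name for h in hosts if h.running and h.mode == ACTIVE]


def start_uncertain_host(current, returning):
    """Bring `returning` online without two active hosts running together."""
    steps = []
    # 1. Put the presently active host into standby first.
    current.mode = STANDBY
    steps.append(f"set {current.name} to standby")
    # 2. Start the service on the other host; it comes up in its prior mode.
    returning.running = True
    steps.append(f"start service on {returning.name}")
    assert len(running_active([current, returning])) <= 1
    # 3. Confirm the mode of the host that was just started.
    if returning.mode == ACTIVE:
        # Assumption: the administrator demotes it before re-activating the other.
        returning.mode = STANDBY
        steps.append(f"set {returning.name} to standby")
    # 4. Make the original host active again.
    current.mode = ACTIVE
    steps.append(f"set {current.name} to active")
    assert running_active([current, returning]) == [current.name]
    return steps


if __name__ == "__main__":
    main = Host("host-a", ACTIVE, running=True)
    other = Host("host-b", ACTIVE, running=False)  # prior mode unknown to the admin
    for step in start_uncertain_host(main, other):
        print(step)
```

The point of the ordering is that the check on the restarted host happens while no other running host is active, so the two-active-servers scenario described in the warning cannot occur at any step.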

To configure the live backup environment, see Configure the Live Backup Environment.

The following scenarios describe how to test and deal with active server failures:

When you first install the Cortex XSOAR server and it starts for the first time, you can use a configuration file to transition between DR states, as described in Transition Between DR States Through the Configuration File.

If you need to upgrade your live backup environment, see Upgrade the Live Backup Environment.

For details about the relationship between engines and disaster recovery, see Engines and Disaster Recovery. For information about host names, DNS, and disaster recovery, see Host Names, DNS, and Disaster Recovery.

Troubleshoot Live Backup

If you receive an out-of-memory error when Live Backup is enabled, consider changing the server configurations for disaster recovery.

Select Settings → ABOUT → Troubleshooting → Add Server Configuration.

Add the following configurations:

Key	Description	Value
`dr.batch.size`	Controls the number of actions sent to the disaster recovery server in one request. A very high value can cause memory issues. A very low value can cause performance issues, which make the backup server sync more slowly (not in real time). It is recommended to start low (25-50) and increase according to memory usage.	Default is `300`
`dr.memory.limit.mb`	Limits the memory size (in MB) of the action items, which should prevent out-of-memory errors. `dr.batch.size` and `dr.memory.limit.mb` work together, so the threshold is reached when the limit of either configuration is met. If you receive an out-of-memory error, consider reducing this value to 100.	Default is `300`
`dr.queue.size`	The total number of actions to keep in memory before entering recovery mode. It is recommended to keep the default number, as it is relative to the size of the `dr.batch.size` configuration.	Default is `10*dr.batch.size`
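To make the interaction between these keys concrete, the following sketch models a sender that closes a batch as soon as either the action-count limit or the memory limit is met, and computes the default queue size from the batch size. It illustrates the rule stated in the table and does not describe how the Cortex XSOAR server implements batching; the function names and the per-action sizes are hypothetical.

```python
def default_queue_size(batch_size=300):
    """Default dr.queue.size as described in the table: 10 * dr.batch.size."""
    return 10 * batch_size


def split_into_batches(action_sizes_mb, batch_size=300, memory_limit_mb=300):
    """Group action sizes (in MB) into batches, closing a batch when either
    the count limit or the memory limit would otherwise be exceeded."""
    batches = []
    current = []
    current_mb = 0
    for size in action_sizes_mb:
        count_full = len(current) >= batch_size
        memory_full = current_mb + size > memory_limit_mb
        if current and (count_full or memory_full):
            batches.append(current)
            current, current_mb = [], 0
        current.append(size)
        current_mb += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    actions = [5] * 100  # hypothetical: 100 pending actions of 5 MB each

    # The memory limit is reached first: 20 actions of 5 MB fill 100 MB.
    by_memory = split_into_batches(actions, batch_size=50, memory_limit_mb=100)
    print(len(by_memory), len(by_memory[0]))

    # The count limit is reached first: 10 actions per batch.
    by_count = split_into_batches(actions, batch_size=10, memory_limit_mb=100)
    print(len(by_count), len(by_count[0]))

    print(default_queue_size(50))
```

In this model, lowering either value shrinks each request, which is why the table suggests starting with a low `dr.batch.size` and reducing `dr.memory.limit.mb` when out-of-memory errors occur.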
