Create a Machine Learning Model

Create a machine learning (ML) model in Cortex XSOAR to predict the classification of phishing incidents.
A machine learning model enables Cortex XSOAR to predict the classification of phishing incidents, for example, whether an incident should be classified as legitimate, malicious, or spam. You can use these models in conjunction with your default investigation playbooks, or run commands separately in the War Room. The main goal of the machine learning model is to leverage past phishing incidents to assist with the investigation of future incidents.
  1. Select Settings > Advanced > ML Models > New Model.
  2. Define the Incidents Training Set Scope.
    1. In the Model name field, type the name of the model that you want to create.
    2. (Optional) In the Description field, type a meaningful description for the model.
    3. To choose which incidents are used for training the model, in the Incident type field, select the type of incident from the drop-down list, such as Phishing.
    4. Select the date range from which incidents are used for the training set. The more incidents, the better the expected results, so a longer period is recommended.
    5. In the Maximum number of incidents to test field, type the number of incidents that will be used to train the model.
      Reduce the number only if the number of incidents is too large and causes performance problems. Use a higher number if you have more samples in your environment. The default is 3000.
  3. Select the field that you want the model to learn to predict.
    1. In the Incident field drop-down list, select the relevant field.
      The Incident field (classification field) stores the classification of the incident. This is a single-select field in which the classification or the close reason of incidents is stored. The out-of-the-box fields are “Email Classification” and “Close Reason”, but you can use any other custom field.
      After you select the Incident field, the Field Values field shows the different classification values and the number of occurrences of each value across the selected incident scope.
  4. Set the final classification values.
    1. In the Verdict columns, define the names of the verdicts to which you will map your existing classification values.
      This stage allows you to control which incidents’ classifications are used in the training, and also to merge multiple classifications into a single category. A verdict is a group of classification values; each verdict includes one or more classifications. The trained model predicts each new incident as one of those verdicts.
    2. Map your data by dragging and dropping the Field Values into the respective Verdict fields.
      Incidents whose values remain in the Field Values column are not involved in the training. You may want to leave out classifications such as Undetermined, Internal Phishing Test, or any other classifications that you do not want to participate in the training.
      You can drag multiple classification values into a single verdict. If you do, the model treats all the classification values under the same verdict as if they had the same classification. This allows you to better define the prediction task of the model and to merge some smaller groups into a single group.
      This might be helpful if you have different subtypes of classifications. For example, if you have classification values of Spear Phishing, Malware, and Ransomware, you may want to map them all into a single verdict called Phishing. If you want a model that distinguishes between one classification and the rest, for instance between phishing and all other classifications, you can map all classifications other than phishing into a single verdict called “Non-Phishing”. This gives two verdicts: one holds phishing and the other holds everything other than phishing.
      You can have 2-3 different verdicts, and each verdict needs a minimum of 50 incidents. For an example, see Machine Learning Model Example.
  5. (Optional) Change the fields where the email body and email subject are stored in the incident.
    1. In Argument Mapping, select the equivalent fields for Email body, Email HTML, and Email subject.
      By default, training is based on the Email body, Email HTML, and Email subject.
  6. Train the model by clicking Start Training.
    You are redirected to the Machine Learning Models page. The training process takes several minutes, and you can close the page while it runs.
    If training completes successfully, percentage scores appear that reflect the precision of the model for the different verdicts.
  7. (Optional) View detailed performance information of the model.
    1. Expand the results information by clicking + next to the model name.
    2. View a detailed evaluation by clicking Evaluation of model performance.
      A window opens showing a detailed evaluation of the model, which enables you to decide whether and how to use the trained model. The detailed breakdown shows the expected performance of the model for each class, with metrics such as precision and coverage, and suggestions for applying a confidence threshold.
      If you are using the phishing incident type, you can now use the model in the machine learning or War Room window, or in a playbook. For more information, see Machine Learning Models Overview.
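
The verdict mapping described in step 4 can be sketched outside the product to check a planned mapping before training. The following Python is an illustrative sketch only and is not part of Cortex XSOAR: the function names, the sample mapping, and the incident counts are all hypothetical. It assumes that classification values left unmapped are excluded from training, and it applies the limits stated in step 4 (2-3 verdicts, at least 50 incidents per verdict).

```python
from collections import Counter

MIN_INCIDENTS_PER_VERDICT = 50   # minimum per verdict, as stated in step 4
ALLOWED_VERDICT_COUNTS = (2, 3)  # step 4 allows 2-3 verdicts


def map_to_verdicts(classifications, verdict_map):
    """Count incidents per verdict; unmapped classifications are skipped."""
    counts = Counter()
    for value in classifications:
        verdict = verdict_map.get(value)
        if verdict is not None:  # unmapped values stay out of training
            counts[verdict] += 1
    return counts


def check_verdicts(counts):
    """Return a list of problems; an empty list means the mapping is usable."""
    problems = []
    if len(counts) not in ALLOWED_VERDICT_COUNTS:
        problems.append(f"expected 2-3 verdicts, found {len(counts)}")
    for verdict, n in sorted(counts.items()):
        if n < MIN_INCIDENTS_PER_VERDICT:
            problems.append(
                f"verdict {verdict!r} has {n} incidents, "
                f"needs at least {MIN_INCIDENTS_PER_VERDICT}"
            )
    return problems


# Hypothetical mapping: several phishing subtypes merged into one verdict,
# everything else that should be trained on merged into "Non-Phishing".
verdict_map = {
    "Spear Phishing": "Phishing",
    "Malware": "Phishing",
    "Ransomware": "Phishing",
    "Legitimate": "Non-Phishing",
    "Spam": "Non-Phishing",
}

# Hypothetical closed incidents; "Undetermined" is deliberately left unmapped.
classifications = (
    ["Spear Phishing"] * 30 + ["Malware"] * 15 + ["Ransomware"] * 10
    + ["Legitimate"] * 40 + ["Spam"] * 5 + ["Undetermined"] * 20
)

counts = map_to_verdicts(classifications, verdict_map)
problems = check_verdicts(counts)
print(dict(counts))
print(problems)
```

With this sample data, the Phishing verdict collects 55 incidents and passes, while Non-Phishing collects only 45 and is flagged. In practice that would suggest widening the date range or mapping more classification values into that verdict before training.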
