Automatic De-Duplication Using Scripts

Automate de-duplication of incidents using scripts. Identify and close duplicate incidents in Cortex XSOAR.
There are various scripts you can use in automations and playbooks to identify and close duplicate incidents:
  • FindSimilarIncidentsByText
  • FindSimilarIncidents
  • GetDuplicatesMl

FindSimilarIncidentsByText

  • Identifies similar incidents based on text similarity. For this script you specify incident keys, labels, or custom fields.
  • The comparison is based on the TF-IDF method.
  • A score between 0 and 1 is calculated for each candidate, and a candidate is considered a duplicate when its score exceeds the threshold. The default threshold is 0.98 (98%).
!FindSimilarIncidentsByText textFields=name,details maximumNumberOfIncidents=1000 threshold=0.95 timeFrameHours=24 ignoreClosedIncidents=no
This command example checks for duplicate incidents using the following methodology:
  1. Query for duplicate candidates:
    • Incidents created in the previous 24 hours [timeFrameHours=24].
    • Includes closed incidents [ignoreClosedIncidents=no].
    • Maximum number of incidents to check is 1,000 [maximumNumberOfIncidents=1000].
  2. For each candidate, concatenate the name and details incident fields [textFields=name,details] into a text document.
  3. Compare the current incident text with all candidates using the TF-IDF method.
  4. Check if there is at least one similar candidate:
    • A candidate is similar when its TF-IDF score exceeds 95% [threshold=0.95]. If there is at least one such candidate, the incident is reported as a duplicate.
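
The steps above can be illustrated with a minimal Python sketch. It is not the script's actual implementation: the tokenizer, the smoothed IDF formula, the use of cosine similarity, and the helper names (tokenize, tfidf_vectors, cosine, find_similar) are all assumptions made for illustration, and incidents are represented as plain dictionaries.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Assumed smoothed IDF, so terms found in every document keep a positive weight.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_similar(current, candidates, text_fields=("name", "details"), threshold=0.95):
    """Return (candidate id, score) pairs whose score exceeds the threshold."""
    def to_text(incident):
        # Step 2: concatenate the chosen fields into one text document.
        return " ".join(str(incident.get(field, "")) for field in text_fields)

    vectors = tfidf_vectors([to_text(current)] + [to_text(c) for c in candidates])
    # Step 3: compare the current incident with every candidate.
    scores = [(c["id"], cosine(vectors[0], v)) for c, v in zip(candidates, vectors[1:])]
    # Step 4: keep only candidates above the threshold.
    return [(cid, score) for cid, score in scores if score > threshold]
```

If find_similar returns at least one pair, the current incident would be treated as a duplicate in this sketch.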

FindSimilarIncidents

  • Rule-based script that identifies similar incidents based on common incident keys, labels, custom fields, or context keys.
  • We recommend using incident keys, for example, "type" to match incidents of the same type.
  • For performance reasons, we recommend not using context keys when the same value is available elsewhere, for example, in a label. Each duplicate candidate creates an additional server query.
!FindSimilarIncidents similarIncidentKeys="type,severity" similarLabelsKeys="Email/from,Email/subject:*,Email/text:5" ignoreClosedIncidents="yes" maxNumberOfIncidents="1000" hoursBack="48" timeField="created" maxResults="10"
This command example checks for duplicate incidents using the following methodology:
  1. Query for duplicate candidates:
    • Incidents created in the 48 hours [hoursBack="48", timeField=created] before the original incident
    • Excludes closed incidents [ignoreClosedIncidents=yes]
    • Maximum number of incidents to check is 1,000 [maxNumberOfIncidents=1000]
    • Filters by the same incident type and severity [similarIncidentKeys=type,severity]
  2. Check for candidates whose labels match the original incident [similarLabelsKeys="Email/from,Email/subject:*,Email/text:5"]:
    • Email/from label is the same as in the original incident [Email/from]
    • Email/subject label contains, or is contained in, the original incident Email/subject label [Email/subject:*]
    • Email/text label is equal to, or differs by a maximum of 5 words from, the original incident Email/text label [Email/text:5]
  3. If duplicate incidents are found, store the results in the context:
    • Maximum of 10 [maxResults="10"]
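
The label rules in similarLabelsKeys can be pictured with the following Python sketch. It is an illustration only, not the script's code: the helper names are hypothetical, the word difference is assumed to be the number of words that appear in only one of the two texts, and the sketch assumes that every listed label must match.

```python
def parse_label_rules(spec):
    """Parse 'Email/from,Email/subject:*,Email/text:5' into {label: rule}."""
    rules = {}
    for item in spec.split(","):
        name, _, rule = item.strip().partition(":")
        # No suffix means the label values must be identical.
        rules[name] = rule or "exact"
    return rules


def word_difference(a, b):
    # Assumed metric: count of words present in only one of the two texts.
    return len(set(a.split()) ^ set(b.split()))


def label_matches(rule, original, candidate):
    if rule == "exact":
        return original == candidate
    if rule == "*":
        # Either value contains the other.
        return original in candidate or candidate in original
    # A numeric rule is the maximum allowed word difference.
    return word_difference(original, candidate) <= int(rule)


def is_similar(original_labels, candidate_labels, spec):
    # Assumption: all listed labels must be present and must match.
    return all(
        name in original_labels
        and name in candidate_labels
        and label_matches(rule, original_labels[name], candidate_labels[name])
        for name, rule in parse_label_rules(spec).items()
    )
```

For example, a candidate with the same sender, the subject "RE: Invoice overdue" against "Invoice overdue", and an email body that differs by two words would be considered similar under these assumptions.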

GetDuplicatesMl

  • Identifies duplicate incidents using a machine learning (ML) model. The model is built from predefined data; alternatively, you can use data from the local environment.
  • This script takes several features into consideration: labels comparison, email labels (relevant for phishing scenarios), incident time difference, and shared indicators (which you can customize with arguments).
!GetDuplicatesMl maxNumberOfIncidents="1000" timeFrameDays="7" ignoreClosedIncidents="yes" threshold="0.5" compareIndicators="Email, IP, Domain, File SHA256, File MD5, URL" compareEmailLabels="Email/headers/From, Email/headers/Subject, Email/text, Email/html, Email/attachments" compareOtherLabels="yes" compareIncidentTimeDiff="yes" UseLocalEnvDuplicatesInLastDays="0" ipComparisonSubnetMask="32" maxCandidates="10"
This command example checks for duplicate incidents using the following methodology:
  1. Query for duplicate candidates:
    • Incidents created in the 7 days [timeFrameDays="7"] before the original incident
    • Excludes closed incidents [ignoreClosedIncidents=yes]
    • Maximum number of incidents to check is 1,000 [maxNumberOfIncidents=1000]
  2. For each candidate, calculate features based on:
    • Email labels and other labels [compareEmailLabels, compareOtherLabels]
    • Indicators [compareIndicators, ipComparisonSubnetMask]
    • Time difference between the incidents [compareIncidentTimeDiff]
  3. Build the machine learning model from the predefined data set:
    • Does not use the local environment data set (linked and duplicate incidents in the system) [UseLocalEnvDuplicatesInLastDays=0]
  4. Predict whether each candidate is a duplicate:
    • Prediction is based on a score (probability) between 0 and 1, which is compared against the threshold [threshold=0.5]
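
Two of the features from step 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the script's feature code: the function names are hypothetical, IP addresses are assumed to count as shared when they fall in the same network under the mask given by ipComparisonSubnetMask, and the timestamp format is assumed.

```python
import ipaddress
from datetime import datetime


def same_subnet(ip_a, ip_b, mask_bits=32):
    """True when both IPv4 addresses fall in the same network under the mask."""
    net_a = ipaddress.ip_network(f"{ip_a}/{mask_bits}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask_bits}", strict=False)
    return net_a == net_b


def shared_ip_count(ips_a, ips_b, mask_bits=32):
    """Count addresses in ips_a that share a subnet with any address in ips_b."""
    return sum(1 for a in ips_a if any(same_subnet(a, b, mask_bits) for b in ips_b))


def time_diff_hours(created_a, created_b, fmt="%Y-%m-%dT%H:%M:%S"):
    """Absolute difference between two creation times, in hours (assumed format)."""
    delta = datetime.strptime(created_a, fmt) - datetime.strptime(created_b, fmt)
    return abs(delta.total_seconds()) / 3600
```

With a mask of 32, as in the command example, only identical addresses match in this sketch; a smaller mask such as 24 would also match neighboring addresses in the same /24 network.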
