Train a Classifier on Languages with Adjusted Tokenization
Cortex XSOAR allows you to customize automations and playbooks to support phishing classifiers for languages other than English. Cortex XSOAR offers adjusted tokenization for the following languages:
German
French
Spanish
Portuguese
Italian
Dutch
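When scripting around this setup, it can help to check a language choice against the list above before changing any automation. The following is a minimal sketch: the function name `is_supported_language` is hypothetical, and the exact spelling that the `language` argument accepts is an assumption (display names are used here).

```python
# Languages listed above as having adjusted tokenization.
# Assumption: names are compared case-insensitively by display name;
# the exact values accepted by the `language` argument may differ.
SUPPORTED_LANGUAGES = ("German", "French", "Spanish", "Portuguese", "Italian", "Dutch")


def is_supported_language(name: str) -> bool:
    """Return True if `name` matches one of the listed languages."""
    normalized = name.strip().casefold()
    return any(normalized == lang.casefold() for lang in SUPPORTED_LANGUAGES)


if __name__ == "__main__":
    for candidate in ("german", "Dutch ", "Swedish"):
        print(candidate.strip(), is_supported_language(candidate))
```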
You need to configure the following automations and playbooks:
DBotPreProcessTextData
WordTokenizerNLP
DBotPredictPhishingWords
DBot Create Phishing Classifier V2
Go to Automation.
Configure the language for DBotPreProcessTextData.
Copy the DBotPreProcessTextData automation by selecting Duplicate Automation.
(Optional) Change the name of the duplicated automation to make it distinguishable.
From the Advanced section, in the Docker image name field, type demisto/dl:languages1.0.
In the Arguments section, expand the language argument.
In the Initial value field, change the language to the one you want to train the classifier on.
Click Save Version.
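The settings changed in the steps above can be summarized as a small record, which is handy for noting what each duplicated automation should look like. This is only an illustrative sketch: `DuplicatedAutomation` and `plan_duplicate` are hypothetical names, the language-suffix naming convention is an assumption, and nothing here calls Cortex XSOAR.

```python
from dataclasses import dataclass

# Docker image named in the steps above.
LANGUAGE_DOCKER_IMAGE = "demisto/dl:languages1.0"


@dataclass(frozen=True)
class DuplicatedAutomation:
    """Settings to apply to a duplicated automation (illustrative only)."""
    name: str
    docker_image: str
    language: str


def plan_duplicate(original_name: str, language: str) -> DuplicatedAutomation:
    # Assumption: a language suffix is one way to keep the copy distinguishable.
    return DuplicatedAutomation(
        name=f"{original_name}_{language}",
        docker_image=LANGUAGE_DOCKER_IMAGE,
        language=language,
    )


if __name__ == "__main__":
    print(plan_duplicate("DBotPreProcessTextData", "German"))
```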
Configure the language for WordTokenizerNLP.
Copy the WordTokenizerNLP automation by selecting Duplicate Automation.
(Optional) Change the name of the duplicated automation to make it distinguishable.
From the Advanced section, in the Docker image name field, type demisto/dl:languages1.0.
In the Arguments section, expand the language argument.
In the Initial value field, change the language to the one you want to train the classifier on.
Click Save Version.
Configure the language for DBotPredictPhishingWords.
Copy the DBotPredictPhishingWords automation by selecting Duplicate Automation.
(Optional) Change the name of the duplicated automation to make it distinguishable.
From the Advanced section, in the Docker image name field, type demisto/dl:languages1.0.
In the Arguments section, expand the language argument.
In the Initial value field, change the language to the one you want to train the classifier on.
Click Save Version.
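Because the same Docker image and language are set on all three duplicated automations, a quick consistency check can catch a copy that was missed. The sketch below works on plain dictionaries that you fill in by hand; `find_mismatches`, the dictionary keys, and the copy names are hypothetical and do not correspond to any Cortex XSOAR API.

```python
EXPECTED_IMAGE = "demisto/dl:languages1.0"  # image named in the steps above


def find_mismatches(automations, language):
    """Return names of automations whose image or language differs from the target."""
    return [
        a["name"]
        for a in automations
        if a.get("docker_image") != EXPECTED_IMAGE or a.get("language") != language
    ]


if __name__ == "__main__":
    # Hand-entered example values; the copy names are hypothetical.
    copies = [
        {"name": "DBotPreProcessTextData_copy", "docker_image": EXPECTED_IMAGE, "language": "French"},
        {"name": "WordTokenizerNLP_copy", "docker_image": EXPECTED_IMAGE, "language": "French"},
        {"name": "DBotPredictPhishingWords_copy", "docker_image": EXPECTED_IMAGE, "language": "English"},
    ]
    print(find_mismatches(copies, "French"))
```

An empty list means every recorded copy uses the expected image and the target language.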
Go to Playbooks.
Search for DBot Create Phishing Classifier V2 to update the playbook.
Copy the playbook by selecting Duplicate Playbook.
Select the Pre-process file task.
From the drop-down menu, replace the automation with the duplicated version of