PAN-DB Categorization

URL filtering works by comparing each requested URL against a series of sources. When a user requests a URL, the firewall determines the URL category by checking the following components, in order, until it finds a match: block and allow lists, custom categories, the dataplane cache, the management plane cache, and finally PAN-DB.
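The first-match lookup order can be sketched in a few lines of Python. This is a minimal illustration, not PAN-OS code: each source is modeled as a hypothetical function that returns a category or None, the category labels are made up, and reporting unmatched URLs as "unknown" is an assumption.

```python
from typing import Callable, Optional

# Hypothetical model: each source maps a URL to a category, or None on a miss.
Lookup = Callable[[str], Optional[str]]


def categorize(url: str, sources: list[tuple[str, Lookup]]) -> tuple[str, str]:
    """Return (source name, category) from the first source that matches."""
    for name, lookup in sources:
        category = lookup(url)
        if category is not None:
            return name, category
    # Assumption: a URL that no source matches is reported as "unknown".
    return "none", "unknown"


# Example data; dict.get returns None for a miss, so it fits the Lookup shape.
sources: list[tuple[str, Lookup]] = [
    ("block/allow list", {"bad.example.com": "blocked-site"}.get),
    ("custom category", {"intranet.example.com": "corp-apps"}.get),
    ("DP cache", {}.get),
    ("MP cache", {"news.example.org": "news"}.get),
    ("PAN-DB", {"news.example.org": "news", "shop.example.net": "shopping"}.get),
]

print(categorize("news.example.org", sources))  # ('MP cache', 'news')
print(categorize("shop.example.net", sources))  # ('PAN-DB', 'shopping')
```

Because the loop stops at the first hit, a URL found in the MP cache never reaches PAN-DB, which is the point of checking the local sources first.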
If a requested URL matches an expired entry in the dataplane (DP) URL cache, the DP cache responds with the expired category but also sends a URL categorization query to the management plane (MP) cache. Because URL categories change infrequently, answering with the expired category avoids unnecessary delays in the DP. The MP URL cache behaves the same way: if a query from the DP cache matches an expired entry in the MP cache, the MP responds to the DP with the expired category and also sends a URL categorization request to PAN-DB, the URL category database. When PAN-DB responds, the firewall sends the updated category to the DP.
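The serve-stale-then-refresh behavior of the two caches can be sketched as follows. This is a simplified model under stated assumptions: the TTL values are invented (the text does not give them), the asynchronous queries are modeled as a refresh queue that is drained explicitly, and `Tier`, `process_refreshes`, and `downstream` are hypothetical names, not PAN-OS internals.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Entry:
    category: str
    expires_at: float  # after this time the entry is stale but still usable


@dataclass
class Tier:
    """One URL cache tier (DP or MP) that serves stale entries while refreshing."""
    ttl: float                          # assumed TTL; the real expiry periods are not given
    fetch: Callable[[str, float], str]  # query to the next tier up (MP cache or PAN-DB)
    entries: dict[str, Entry] = field(default_factory=dict)
    refresh_queue: list[str] = field(default_factory=list)
    downstream: Optional["Tier"] = None  # tier that receives pushed updates (MP -> DP)

    def lookup(self, url: str, now: float) -> str:
        entry = self.entries.get(url)
        if entry is None:
            # Miss: the caller waits for the upstream answer.
            return self._store(url, self.fetch(url, now), now)
        if entry.expires_at <= now:
            # Expired: answer with the old category now and refresh later.
            self.refresh_queue.append(url)
        return entry.category

    def process_refreshes(self, now: float) -> None:
        # Stands in for the asynchronous categorization queries.
        while self.refresh_queue:
            url = self.refresh_queue.pop(0)
            self._store(url, self.fetch(url, now), now)

    def _store(self, url: str, category: str, now: float) -> str:
        self.entries[url] = Entry(category, now + self.ttl)
        if self.downstream is not None and url in self.downstream.entries:
            # Push the fresh category to the lower tier.
            self.downstream._store(url, category, now)
        return category


pan_db = {"example.com": "news"}            # stand-in for the cloud database
mp = Tier(ttl=100.0, fetch=lambda url, now: pan_db[url])
dp = Tier(ttl=10.0, fetch=mp.lookup)        # DP queries go to the MP cache
mp.downstream = dp                          # MP pushes updated categories to the DP

print(dp.lookup("example.com", now=0.0))    # news (miss, filled via MP and PAN-DB)
pan_db["example.com"] = "shopping"          # the category changes in the cloud
print(dp.lookup("example.com", now=200.0))  # news (expired, served stale)
dp.process_refreshes(now=200.0)             # DP asks MP; the MP entry is also stale
mp.process_refreshes(now=200.0)             # MP asks PAN-DB and pushes to the DP
print(dp.lookup("example.com", now=201.0))  # shopping
```

In this run the DP keeps answering with the old category after its entry expires, and the new category reaches the DP only once the MP has refreshed from PAN-DB and pushed the result down.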
PAN-DB is updated as new URLs and categories are defined or when critical updates are needed. Each time the firewall queries PAN-DB for a URL lookup, or if no cloud lookups have occurred for 30 minutes, the firewall compares its local database version with the cloud version; if the versions do not match, it performs an incremental update.
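The two update triggers can be sketched as a small scheduler. Only the triggers themselves (a cloud lookup, or 30 minutes without one) come from the text; the `UpdateScheduler` name, the integer version numbers, the (from, to) return value, and restarting the timer after an idle check are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

IDLE_CHECK_SECONDS = 30 * 60  # check if no cloud lookups have occurred for 30 minutes


@dataclass
class UpdateScheduler:
    local_version: int            # hypothetical integer version of the local database
    last_contact: float = 0.0     # time of the last cloud lookup or idle check

    def check_due(self, now: float, cloud_lookup: bool) -> bool:
        """Return True when the firewall should compare versions with the cloud."""
        if cloud_lookup or now - self.last_contact >= IDLE_CHECK_SECONDS:
            # Assumption: an idle check also restarts the 30-minute timer.
            self.last_contact = now
            return True
        return False

    def incremental_update(self, cloud_version: int) -> Optional[tuple[int, int]]:
        """Return the (from, to) version range to fetch, or None if already current."""
        if cloud_version == self.local_version:
            return None
        update = (self.local_version, cloud_version)
        self.local_version = cloud_version
        return update


sched = UpdateScheduler(local_version=5)
print(sched.check_due(now=10.0, cloud_lookup=True))     # True: a cloud lookup happened
print(sched.check_due(now=100.0, cloud_lookup=False))   # False: idle for only 90 seconds
print(sched.check_due(now=1810.0, cloud_lookup=False))  # True: idle for 30 minutes
print(sched.incremental_update(cloud_version=7))        # (5, 7)
```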
The following table describes the PAN-DB components in detail.
URL Filtering Seed Database
The initial seed database downloaded to the firewall is a small subset of PAN-DB, because the full database contains millions of URLs, many of which your users may never access. When downloading the initial seed database, you select a region (North America, Europe, APAC, or Japan). Each regional seed contains the URLs most accessed in that region, which lets the firewall store a much smaller URL database and look up URLs faster. If a user accesses a website that is not in the local URL database, the firewall queries PAN-DB and then adds the new URL to the local database. In this way, the local database on the firewall is continually populated and customized based on actual user activity.
Re-downloading the PAN-DB seed database will clear the local database.
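The seed-then-learn behavior, including the effect of re-downloading the seed, can be sketched as a dictionary that grows on cache misses. This is an illustration only: `LocalUrlDb`, `query_cloud`, and the example seed contents are hypothetical.

```python
from typing import Callable


class LocalUrlDb:
    """Local URL database seeded with a regional subset and grown on demand."""

    def __init__(self, query_cloud: Callable[[str], str]) -> None:
        self.query_cloud = query_cloud   # hypothetical stand-in for a PAN-DB cloud query
        self.urls: dict[str, str] = {}

    def load_seed(self, seed: dict[str, str]) -> None:
        # (Re-)downloading the seed replaces everything learned locally.
        self.urls = dict(seed)

    def categorize(self, url: str) -> str:
        if url not in self.urls:
            # Miss: ask the cloud and keep the answer for future lookups.
            self.urls[url] = self.query_cloud(url)
        return self.urls[url]


cloud = {"news.example.org": "news", "shop.example.net": "shopping"}
db = LocalUrlDb(query_cloud=lambda url: cloud[url])
db.load_seed({"news.example.org": "news"})   # hypothetical regional seed
db.categorize("shop.example.net")            # not in the seed: learned from the cloud
print(sorted(db.urls))  # ['news.example.org', 'shop.example.net']
db.load_seed({"news.example.org": "news"})   # re-download clears learned entries
print(sorted(db.urls))  # ['news.example.org']
```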
Cloud Service
See Table 1 for information on the private cloud.
The PAN-DB cloud service is implemented using Amazon Web Services (AWS). AWS provides a distributed, high-performance, and stable environment for seed database downloads and URL lookups by Palo Alto Networks firewalls, and this communication is performed over SSL. The AWS cloud systems hold the entire PAN-DB and are updated as new URLs are identified. PAN-DB supports an automated mechanism to update the local URL database on the firewall if its version does not match the cloud version. Each time the firewall queries the cloud servers for URL lookups, it also checks for critical updates. If there have been no queries to the cloud servers for more than 30 minutes, the firewall checks the cloud systems for updates.
The cloud system also provides a mechanism for submitting URL category change requests. This is done through the Test A Site service, which is available directly from the firewall (in the URL filtering profile setup) and from the Palo Alto Networks Test A Site website. You can also submit a URL category change request directly from the log details section of the URL filtering log on the firewall.
Management Plane (MP) URL Cache
When you activate PAN-DB on the firewall, the firewall downloads a seed database from PAN-DB to initially populate the local cache for improved lookup performance. Each regional seed database contains the top URLs for the region, and the size of the seed database (number of URL entries) also depends on the platform. The URL MP cache is automatically written to the local drive on the firewall every eight hours, before the firewall is rebooted, or when the cloud upgrades the URL database version on the firewall. After the firewall reboots, the file saved to the local drive is loaded into the MP cache. The URL MP cache also implements a least recently used (LRU) mechanism: if the cache becomes full, the URLs that were accessed least recently are replaced by newer URLs.
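The MP cache behavior (LRU replacement plus persistence to the local drive) can be sketched with an `OrderedDict`. This is an illustration under assumptions: the `MpUrlCache` class, the JSON file format, the file location, and the capacity value are hypothetical, and the eight-hour, pre-reboot, and version-upgrade triggers are only noted in a comment rather than scheduled.

```python
import json
import tempfile
from collections import OrderedDict
from pathlib import Path
from typing import Optional


class MpUrlCache:
    """LRU URL cache that can be written to and reloaded from local disk."""

    def __init__(self, capacity: int, path: Path) -> None:
        self.capacity = capacity  # platform-dependent in practice; any value here is assumed
        self.path = path          # hypothetical on-disk location
        self.entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, url: str) -> Optional[str]:
        if url not in self.entries:
            return None
        self.entries.move_to_end(url)  # mark as most recently used
        return self.entries[url]

    def put(self, url: str, category: str) -> None:
        self.entries[url] = category
        self.entries.move_to_end(url)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used URL

    def save(self) -> None:
        # Per the text, this runs every eight hours, before a reboot, or after a
        # database version upgrade; the scheduling itself is not modeled here.
        self.path.write_text(json.dumps(list(self.entries.items())))

    def load(self) -> None:
        # Called after a reboot; the saved recency order is preserved.
        self.entries = OrderedDict(json.loads(self.path.read_text()))


cache_file = Path(tempfile.mkdtemp()) / "mp_url_cache.json"  # hypothetical location
mp = MpUrlCache(capacity=2, path=cache_file)
mp.put("a.example", "news")
mp.put("b.example", "shopping")
mp.get("a.example")                 # a.example becomes most recently used
mp.put("c.example", "games")        # cache full: evicts b.example
mp.save()                           # e.g. before a reboot
restored = MpUrlCache(capacity=2, path=cache_file)
restored.load()                     # after the reboot
print(list(restored.entries))       # ['a.example', 'c.example']
```

Note that LRU evicts by recency, not by how often a URL was used: b.example was requested as often as a.example but was evicted because it was touched less recently.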
Dataplane (DP) URL Cache
This is a subset of the MP cache: a customized, dynamic URL database that is stored in the DP and is used to improve URL lookup performance. The URL DP cache is cleared at each firewall reboot. The number of URLs stored in the URL DP cache varies by hardware platform and by the URLs currently stored in its trie data structure. The DP cache also implements an LRU mechanism: if the cache becomes full, the URLs that were accessed least recently are replaced by newer URLs. Entries in the URL DP cache expire after a fixed period of time; this expiration period is not configurable.
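The text does not describe how the DP trie is laid out. Purely as an assumption, the sketch below uses a character-level trie to show why the number of URLs that fit can depend on which URLs are stored: URLs that share a prefix share nodes. `UrlTrie` and `node_count` are hypothetical names, and LRU eviction and expiry are left out to keep the example short.

```python
from typing import Optional


class TrieNode:
    __slots__ = ("children", "category")

    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.category: Optional[str] = None  # set only where a stored URL ends


class UrlTrie:
    """Character-level trie of URLs; node_count is a rough proxy for memory use."""

    def __init__(self) -> None:
        self.clear()

    def clear(self) -> None:
        # Models the DP cache starting empty after a reboot.
        self.root = TrieNode()
        self.node_count = 1

    def insert(self, url: str, category: str) -> None:
        node = self.root
        for ch in url:
            if ch not in node.children:
                node.children[ch] = TrieNode()
                self.node_count += 1
            node = node.children[ch]
        node.category = category

    def lookup(self, url: str) -> Optional[str]:
        node = self.root
        for ch in url:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.category


shared = UrlTrie()
shared.insert("example.com/a", "news")
shared.insert("example.com/b", "news")
disjoint = UrlTrie()
disjoint.insert("alpha.org/a", "news")
disjoint.insert("beta.net/b", "news")
print(shared.node_count, disjoint.node_count)  # 15 22
```

Both tries hold two URLs, but the pair with a common prefix needs fewer nodes even though its URLs are longer, so under a fixed memory budget such a structure would fit more URLs when they overlap.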
