Cortex Data Lake Calculator – Estimate Your Storage & Ingestion Needs

Accurately estimate log ingestion rates and storage capacity for Palo Alto Networks Cortex Data Lake.


  • Device Count: Total number of NGFWs or Prisma Access instances logging to CDL.
  • Average LPS: Typical range from 50 (small branch) to 2,000+ (data center).
  • Retention Days: Number of days you need to store logs for compliance or operations (minimum 1 day).
  • Log Size: Complex environments with many fields generate larger log entries.

Sample Output

  • Total Storage Required: 2.42 TB
  • Daily Ingestion Rate: 82.39 GB/Day
  • Total Monthly Logs: 2.59 Billion
  • Avg. Bandwidth Required: 7.63 Mbps

Storage Growth Projection (30 Days)

Visual representation of cumulative storage consumption over the selected retention period.

Estimated Cortex Data Lake Capacity Benchmarks
Scenario               Device Count   Avg LPS   Retention   Total Storage
Small Business         2              50        30 Days     0.25 TB
Mid-Sized Enterprise   25             250       90 Days     48.54 TB
Large Data Center      50             1,500     30 Days     181.89 TB

What is the Cortex Data Lake Calculator?

The Cortex Data Lake Calculator is a critical tool for network security engineers and IT architects tasked with sizing Palo Alto Networks’ cloud-native logging solution. As organizations move away from on-premises Panorama Log Collectors, understanding how much data is ingested and how much storage is consumed becomes vital for both budgetary planning and compliance adherence.

Using the calculator, you input specific metrics, such as the number of Next-Generation Firewalls (NGFWs), average Logs Per Second (LPS), and retention requirements, to determine the terabyte (TB) license capacity required. This prevents over-provisioning (wasting money) and under-provisioning (losing critical historical log data).

Cortex Data Lake Calculator Formula and Mathematical Explanation

The math behind log estimation involves converting events into data volumes over time. Here is the step-by-step derivation used by this calculator:

  1. Logs Per Day: Calculate the total number of logs generated daily by all devices.

    Total Logs = Device Count × Average LPS × 86,400 (seconds in a day)
  2. Daily Ingestion (Bytes): Multiply the logs by the average log size.

    Daily Bytes = Total Logs × Log Size (in Bytes)
  3. Convert to Gigabytes:

    Daily GB = Daily Bytes / (1024^3)
  4. Total Retention Storage (Terabytes):

    Total TB = (Daily GB × Retention Days) / 1024
Variable    Meaning                        Unit        Typical Range
LPS         Logs Per Second                Events/sec  10 – 10,000
Retention   Time data is stored            Days        30 – 365
Log Size    Average size of one log entry  Bytes       500 – 1,500
Ingestion   Data flow rate                 GB/Day      1 – 5,000
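The four steps above can be sketched as a short Python function (a minimal sketch; the sample inputs at the bottom are illustrative, not calculator defaults):

```python
SECONDS_PER_DAY = 86_400

def cdl_storage_tb(device_count: int, avg_lps: float,
                   retention_days: int, log_size_bytes: int) -> float:
    """Estimate total Cortex Data Lake storage in TB (binary units)."""
    logs_per_day = device_count * avg_lps * SECONDS_PER_DAY   # Step 1
    daily_bytes = logs_per_day * log_size_bytes               # Step 2
    daily_gb = daily_bytes / 1024**3                          # Step 3
    return (daily_gb * retention_days) / 1024                 # Step 4

# Illustrative input: 50 devices at 20 LPS, 30-day retention, 1,024-byte logs
print(round(cdl_storage_tb(50, 20, 30, 1024), 2))  # → 2.41
```

Note that the function uses binary units (1 GB = 1024³ bytes) throughout, matching the derivation above; a vendor quote using decimal units will come out slightly higher.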

Practical Examples (Real-World Use Cases)

Example 1: Regional Retail Chain

A retail chain has 50 small branch firewalls, each averaging 20 LPS. They require 90 days of retention for PCI-DSS compliance. Using the calculator, we find:

  • Total LPS: 1,000
  • Daily Ingestion: ~82.4 GB
  • Total Storage Required: 7.24 TB

Example 2: High-Traffic Financial Institution

A bank uses 10 large data center firewalls, each pushing 5,000 LPS due to high transaction volume. They need only 30 days of active retention in the lake. The calculator returns:

  • Total LPS: 50,000
  • Daily Ingestion: ~4,119 GB
  • Total Storage Required: 120.69 TB
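Both examples can be reproduced in a few lines of Python (a sketch that assumes an average log size of 1,024 bytes, which is the value that makes the figures above line up):

```python
SECONDS_PER_DAY = 86_400
LOG_SIZE_BYTES = 1024  # assumed average log size for both examples

def daily_gb(total_lps: float) -> float:
    """Daily ingestion in GB for a given aggregate LPS."""
    return total_lps * SECONDS_PER_DAY * LOG_SIZE_BYTES / 1024**3

def total_tb(total_lps: float, retention_days: int) -> float:
    """Total storage in TB over the retention window."""
    return daily_gb(total_lps) * retention_days / 1024

# Example 1: 50 firewalls x 20 LPS = 1,000 LPS, 90-day retention
print(round(daily_gb(1_000), 1))       # → 82.4 (GB/day)
print(round(total_tb(1_000, 90), 2))   # → 7.24 (TB)

# Example 2: 10 firewalls x 5,000 LPS = 50,000 LPS, 30-day retention
print(round(daily_gb(50_000)))         # → 4120 (GB/day)
print(round(total_tb(50_000, 30), 2))  # → 120.7 (TB)
```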

How to Use This Cortex Data Lake Calculator

Following these steps ensures the most accurate results from the calculator:

  • Step 1: Inventory Your Devices: Count all firewalls, Prisma Access nodes, and Cortex XDR agents that will send logs.
  • Step 2: Determine Average LPS: Look at your current Panorama or local firewall statistics during peak hours to get a realistic Logs Per Second average.
  • Step 3: Define Retention: Check your internal data retention policies or industry regulations (e.g., HIPAA, GDPR).
  • Step 4: Select Log Size: If you use heavy decryption and User-ID features, select “Detailed” for better accuracy.
  • Step 5: Review Results: Check the “Bandwidth Required” to ensure your internet circuits can handle the outbound log traffic.
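For Step 5, sustained log bandwidth can be approximated as LPS × log size × 8 bits (a rough sketch; the 1,000 LPS input below is hypothetical, and the result ignores bursts, retries, and transport overhead):

```python
def log_bandwidth_mbps(total_lps: float, log_size_bytes: int) -> float:
    """Sustained outbound bandwidth consumed by log forwarding, in decimal Mbps."""
    return total_lps * log_size_bytes * 8 / 1_000_000

# Hypothetical: 1,000 LPS of 1,024-byte logs
print(round(log_bandwidth_mbps(1_000, 1024), 2))  # → 8.19
```

Comparing this figure against your uplink capacity shows whether log forwarding will compete with production traffic.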

Key Factors That Affect Cortex Data Lake Calculator Results

  1. Traffic Volume: High-bandwidth connections usually generate more session logs.
  2. Security Profiles: Enabling more threat inspections (Antivirus, Vulnerability, URL Filtering) increases the number of “Threat” logs per session.
  3. SSL Decryption: Decrypting traffic allows the firewall to see more details, often leading to more verbose logging.
  4. Log Suppression: Configuring firewalls to suppress redundant logs (like frequent DNS queries) can significantly reduce ingestion rates.
  5. GlobalProtect Usage: High remote user counts generate significant Connection and Host Information Profile (HIP) logs.
  6. IoT Security: Enabling IoT security subscriptions adds specific metadata logs that can increase daily storage needs.

Frequently Asked Questions (FAQ)

1. Is Cortex Data Lake the same as Panorama logging?

No. Cortex Data Lake is a cloud-based service, while Panorama logging is typically on-premises. This calculator specifically estimates cloud storage needs.

2. What happens if I exceed my calculated storage?

When storage limits are reached, the Cortex Data Lake typically overwrites the oldest logs first (First-In, First-Out). Accurate sizing ensures you meet your compliance window.

3. Does this calculator account for Cortex XDR data?

Yes, but you must include XDR agent log rates in your LPS input. XDR logs are often more frequent than firewall logs.

4. How does log compression affect the result?

Cortex Data Lake automatically compresses data. The “Standard” log size in this calculator already accounts for typical cloud compression ratios.

5. Can I use this for Prisma Access sizing?

Absolutely. Prisma Access uses CDL for all logging, making this tool essential for Prisma deployments.

6. Why is bandwidth consumption important?

Log traffic is overhead. If your daily ingestion is high, you must ensure your uplink bandwidth can support the constant stream of logs without impacting production traffic.

7. Does the 1TB license mean 1TB of logs per day or total?

In CDL licensing, it usually refers to total storage capacity, though some older models used ingestion-based licensing. This tool calculates total capacity.

8. Are logs encrypted in the Data Lake?

Yes, all data in Cortex Data Lake is encrypted at rest and in transit, which does not affect the storage calculation significantly.

© 2023 Security Infrastructure Tools. All calculations are estimates based on standard industry metrics.
