The Log Retention Trap: Why Your Security Data is Costing Too Much

Long-term log storage is critical for incident response and compliance, but traditional indexing tools like Elasticsearch make it cost-prohibitive at scale.

“You don’t need the logs until you really need the logs. By then, if they’re gone or unsearchable, the game is already over.”

The Long Tail of Security Incidents

Dwell time for a sophisticated breach (the gap between initial compromise and detection) is often measured in months, not days. When an incident is finally detected, the first question is always: “How far back does this go?”

If your retention is capped at 30 or 90 days because of storage costs, you are flying blind. To perform proper root-cause analysis and understand the full scope of lateral movement or data exfiltration, you need access to audit logs and telemetry from 6, 12, or even 24 months ago.

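Beyond incident response, long-term retention is required or expected by compliance frameworks such as SOC 2, HIPAA, and PCI-DSS. PCI-DSS, for example, requires at least one year of audit log history, with the most recent three months immediately available for analysis. But there’s a problem: storing that much data using traditional methods is expensive.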

The “Elastic Tax”: Why Indexing Breaks at Scale

For years, the industry standard for log storage has been the ELK stack (Elasticsearch, Logstash, Kibana) or OpenSearch. While these are great for searching text, they are fundamentally not optimized for long-term security telemetry at scale.

The Problem with Inverted Indexes

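By default, Elasticsearch builds search structures for every field it ingests: an inverted index for text and keyword fields, and BKD trees for numeric and date fields. This means that for every log line you ingest, the system does a large amount of “pre-work” up front to make every single word searchable.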
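To make the “pre-work” concrete, here is a toy inverted index in Python. It is a simplified sketch, not Elasticsearch’s implementation: the tokenizer and the function names are hypothetical, and real engines add configurable analyzers, compressed posting lists, and per-segment files. The principle carries over, though: every token of every ingested line becomes an entry that must be written and stored.

```python
import re
from collections import defaultdict


def tokenize(line: str) -> list[str]:
    """Lowercase a log line and split it into tokens (a toy analyzer)."""
    return re.findall(r"[a-z0-9_.]+", line.lower())


def build_inverted_index(lines: list[str]) -> dict[str, set[int]]:
    """Map every token to the set of line numbers that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for line_id, line in enumerate(lines):
        for token in tokenize(line):
            index[token].add(line_id)  # one posting per token per line
    return dict(index)


def search(index: dict[str, set[int]], term: str) -> list[int]:
    """Look up a single term; the expensive work already happened at ingest."""
    return sorted(index.get(term.lower(), set()))


logs = [
    "user=alice action=login src=10.0.0.5",
    "user=bob action=login src=10.0.0.7",
    "user=alice action=download file=report.pdf",
]
index = build_inverted_index(logs)
print(search(index, "alice"))  # [0, 2]
print(len(index))              # 11 distinct tokens for 3 short lines
```

Lookups are cheap because the cost was paid at ingest time: even three short lines produce 11 distinct terms, each with its own posting list to store.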

This leads to three major issues:

  1. Storage Bloat: The index structures can take up as much space as the raw data, or more. Add replica shards on top, and you can end up paying for 2-3x the storage the raw logs actually need.
  2. Heavy RAM Requirements: Elasticsearch needs a lot of memory to keep those indexes performant. In a cloud environment, high-RAM instances are among the most expensive.
  3. The Re-indexing Nightmare: Changing a field’s mapping means re-indexing old data, and as your data grows you end up managing oversized “shards” that slow down the entire cluster.

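When you’re ingesting gigabytes or terabytes a day, these overheads compound. Every raw gigabyte is multiplied by the index overhead, then by the number of replicas, then by the number of days you retain it, so the cost of running an OpenSearch cluster doesn’t just track your data volume; it multiplies it.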
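A back-of-the-envelope model shows where those multipliers come from. The function names and every input below (100 GB/day, one year of retention, an index overhead of 1.0 meaning the index is as large as the raw data, one replica, and a 10x columnar compression ratio) are illustrative assumptions, not measurements from any real cluster. `index_overhead` is a single net ratio that lumps together index size and any compression the search engine applies to stored fields.

```python
def indexed_footprint_tb(daily_raw_gb: float, retention_days: int,
                         index_overhead: float, replicas: int) -> float:
    """Disk needed by an index-everything store (hypothetical model)."""
    # Raw data plus index structures, sized as a fraction of the raw data
    per_copy_gb = daily_raw_gb * retention_days * (1 + index_overhead)
    # Every replica is a full extra copy of the primary data
    return per_copy_gb * (1 + replicas) / 1000


def columnar_footprint_tb(daily_raw_gb: float, retention_days: int,
                          compression_ratio: float, replicas: int) -> float:
    """Disk needed by a compressed columnar store (hypothetical model)."""
    per_copy_gb = daily_raw_gb * retention_days / compression_ratio
    return per_copy_gb * (1 + replicas) / 1000


# Illustrative inputs only: 100 GB/day kept for one year, one replica each
indexed = indexed_footprint_tb(100, 365, index_overhead=1.0, replicas=1)
columnar = columnar_footprint_tb(100, 365, compression_ratio=10, replicas=1)
print(f"indexed:  {indexed:.1f} TB")   # indexed:  146.0 TB
print(f"columnar: {columnar:.1f} TB")  # columnar: 7.3 TB
```

Because each factor multiplies the others, doubling retention or adding a replica scales the entire footprint, so the absolute gap between the two models widens as daily volume grows.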

Optimized Storage: The Columnar Advantage

To store logs for a long time without breaking the bank, you need a different architecture. Modern security platforms (including Xpernix) have moved toward columnar storage (like ClickHouse).

Instead of indexing every word, a columnar database stores each field (column) separately on disk. If you want to search for a specific user_id across a billion rows, the system reads only the user_id column from disk.
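The difference is easiest to see by counting the values a query touches. This in-memory Python sketch (all names and data are hypothetical) runs the same user_id lookup against a row layout and a column layout. Real columnar engines such as ClickHouse keep each column in separate compressed blocks on disk and read only the blocks they need, but the accounting follows the same idea.

```python
# Four log events with the same schema (a columnar layout assumes a fixed schema)
rows = [
    {"ts": 1700000000, "user_id": "u1", "action": "login", "src_ip": "10.0.0.5"},
    {"ts": 1700000060, "user_id": "u2", "action": "login", "src_ip": "10.0.0.7"},
    {"ts": 1700000120, "user_id": "u1", "action": "download", "src_ip": "10.0.0.5"},
    {"ts": 1700000180, "user_id": "u3", "action": "logout", "src_ip": "10.0.0.9"},
]


def to_columns(rows: list[dict]) -> dict[str, list]:
    """Pivot row-oriented records into one list per field (a column)."""
    fields = rows[0].keys()
    return {field: [row[field] for row in rows] for field in fields}


def row_scan(rows: list[dict], user_id: str) -> tuple[list[int], int]:
    """Row store: every field of every record is read to test one field."""
    hits, values_read = [], 0
    for i, row in enumerate(rows):
        values_read += len(row)
        if row["user_id"] == user_id:
            hits.append(i)
    return hits, values_read


def column_scan(columns: dict[str, list], user_id: str) -> tuple[list[int], int]:
    """Column store: only the user_id column is read."""
    col = columns["user_id"]
    hits = [i for i, value in enumerate(col) if value == user_id]
    return hits, len(col)


columns = to_columns(rows)
print(row_scan(rows, "u1"))        # ([0, 2], 16)
print(column_scan(columns, "u1"))  # ([0, 2], 4)
```

With four fields the row scan reads four times as many values; with wider log schemas the ratio grows accordingly. Returning full records for the matches still requires reading the other columns, but only at the matching positions.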

| Feature | Elasticsearch / OpenSearch | Columnar (e.g., ClickHouse) |
|---|---|---|
| Storage efficiency | Low (heavy indexing) | High (compression often ~90%, data-dependent) |
| Hardware cost | High (high RAM/CPU) | Low (disk-heavy, low RAM) |
| Search speed | Fast for full-text search | Fast for structured queries |
| Scaling | Complex and expensive | Simple and linear |
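Much of the compression advantage comes from layout: a single column holds values of one type, often with few distinct values and long repeats, especially when the table is stored sorted (ClickHouse MergeTree tables are sorted by their ORDER BY key). Run-length encoding is used below purely as the simplest stand-in; ClickHouse’s own default is LZ4, with optional codecs such as ZSTD, Delta, and DoubleDelta. The helper names are hypothetical.

```python
from itertools import groupby


def run_length_encode(values: list) -> list[tuple]:
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def run_length_decode(runs: list[tuple]) -> list:
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


# A low-cardinality "action" column, sorted as a columnar table might store it
actions = ["login"] * 600 + ["logout"] * 400
print(run_length_encode(actions))  # [('login', 600), ('logout', 400)]

# The same kind of values laid out row by row: fields interleave, so runs are short
rows = [("u1", "login", "10.0.0.5")] * 3
row_layout = [value for row in rows for value in row]
print(len(run_length_encode(row_layout)))            # 9 runs for 9 values
print(len(run_length_encode([r[1] for r in rows])))  # 1 run for the action column
```

In the column layout, 1,000 values collapse to two pairs, while the interleaved row layout gets no benefit at all from the same encoding.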

Final Thought

Security data is only useful if it’s accessible. If you’re deleting logs to save money, or if your “warm” storage costs are eating your entire security budget, your SIEM is failing you. By moving away from the “index-everything” model to optimized columnar storage, you can keep the data you need for as long as you need it, without paying the “Elastic tax.”

Contact us if you’re ready to stop overpaying for your security logs.