What is LiquidCache?

A caching layer to unify compute and storage.
Published

November 24, 2025

Modified

November 24, 2025

Acknowledgments

This work is supported by funding from:
1.  InfluxData, Bauplan, and SpiralDB.
2.  The taxpayers of the State of Wisconsin and the federal government.

Your support for science is greatly appreciated!

LiquidCache is a caching layer that unifies the design goals of compute and storage1.

1 Check out our research paper (VLDB 2025) for more technical details.

It accelerates queries without requiring you to leave Parquet.

It addresses a fundamental tension between stable storage formats and aggressive query performance: instead of squeezing the last bits of performance out of Parquet, or trying to create future-proof file formats, LiquidCache introduces a new abstraction, the caching layer.

LiquidCache overview. It caches different object store sources and serves different analytical applications.

It is built on open standards: Parquet for data storage, DataFusion as the query engine, and Arrow Flight for data transfer. This makes LiquidCache highly composable – you can easily integrate it into your existing analytics stack.

Why LiquidCache?

We like S3

  1. Simple durability: 11 nines of durability—you never have to worry about data loss.
  2. Simple scalability: virtually unlimited space and throughput.

But S3 is slow and expensive

  1. ≈100 ms first‑byte latency plus transfer latency; this quickly adds up when multiple round‑trips are needed to fetch data.
  2. Storage, request, and data‑transfer/egress costs; prices have remained largely unchanged for a decade even as underlying hardware has become ~20× cheaper.

S3 prices have barely changed for a decade, despite ~20× reductions in underlying hardware costs (credit: Andrew Lamb).
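To make the latency point concrete, here is a back-of-the-envelope sketch. Only the ≈100 ms first-byte figure comes from above; the round-trip count is an assumption for illustration:

```python
# Back-of-the-envelope: reading a Parquet file from S3 usually takes
# several dependent requests (footer, then metadata, then column chunks),
# each paying time-to-first-byte before any data flows.
FIRST_BYTE_MS = 100      # ~100 ms S3 first-byte latency (from the text)
ROUND_TRIPS = 4          # footer + metadata + 2 range reads (assumed)

latency_ms = ROUND_TRIPS * FIRST_BYTE_MS
print(latency_ms)        # 400 ms of pure latency, before any transfer time
```

Even with a modest number of round trips, the fixed latency alone dwarfs the time spent actually moving bytes for small reads.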

LiquidCache: foundation of diskless architectures

  1. Caches are everywhere: compute‑local caches (e.g., Snowflake/Databricks local NVMe, Spark host caches), shared‑nothing caches, and cache services.
  2. The DLC trilemma: among durability, low latency, and low cost, a storage system can pick only two. A cache sidesteps this: durability stays with S3, so the cache can focus on low latency and low cost.

How LiquidCache Works

We like Parquet

  1. All major query engines support it (DataFusion, Spark, Trino, DuckDB, Snowflake, BigQuery, and more).
  2. It is battle‑tested and keeps evolving (e.g., page indexes, new encodings).
  3. It is under open, stable governance (Apache Software Foundation), so your data is in good hands.

But sometimes we want more aggressive performance

  1. There are better encodings and compression schemes out there.
  2. Parquet is critical data infrastructure: it evolves cautiously to keep your data safe and stable—it can’t try new research today and abandon your data tomorrow.

LiquidCache: cache-only, pushdown-optimized data representation

  1. LiquidCache uses state‑of‑the‑art encodings and compression schemes, chosen to fit the workload.9
  2. Liquid data is invisible to the rest of the ecosystem: it is cache‑only. This means it can freely change its layout, adding or removing encodings without breaking any user code.
  3. LiquidCache transparently, progressively, and selectively transcodes Parquet data to the liquid format.
  4. Liquid data is designed for efficient pushdown to save both compute and network resources.

9 The liquid format is heavily inspired by Vortex. We plan to support a Vortex backend in the future.

Without any changes to Parquet, LiquidCache takes care of the performance optimizations.
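As an illustration only (a toy sketch, not LiquidCache's actual code or API), two of the ideas in the list above can be shown in a few lines: a column is transcoded only when it is first accessed (progressive, selective, cache-only), and a predicate is then evaluated directly on the encoded form (pushdown):

```python
# Toy sketch of cache-only transcoding plus predicate pushdown.
# The names and the dictionary encoding here are illustrative assumptions.

class ToyLiquidCache:
    def __init__(self, parquet_columns):
        self.parquet = parquet_columns   # column name -> list of raw values
        self.liquid = {}                 # cache-only encoded columns

    def _transcode(self, name):
        # Dictionary-encode on first access; later reads hit the cache.
        values = self.parquet[name]
        dictionary = sorted(set(values))
        codes = [dictionary.index(v) for v in values]
        self.liquid[name] = (dictionary, codes)

    def filter_eq(self, name, target):
        # Pushdown: compare small integer codes, not decoded strings.
        if name not in self.liquid:
            self._transcode(name)
        dictionary, codes = self.liquid[name]
        if target not in dictionary:
            return []                    # prune without scanning at all
        code = dictionary.index(target)
        return [i for i, c in enumerate(codes) if c == code]

cache = ToyLiquidCache({"city": ["NYC", "SF", "NYC", "LA"]})
print(cache.filter_eq("city", "NYC"))   # [0, 2]
```

Because the encoded form lives only in the cache, its layout can change freely without affecting the Parquet files or any reader of them.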

Conclusions

LiquidCache is the one‑stop shop for diskless, serverless, and pushdown‑native analytics.

It is built on open standards (Parquet, Arrow Flight, DataFusion) for easy integration and stable governance.

LiquidCache caches Parquet as liquid data, which is ultra-optimized for compute pushdown, compressed execution, modern storage, and network‑efficient data transfer.

Who are we?

  • LiquidCache started as a research project led by Xiangpeng Hao at UW‑Madison ADSL.
  • It was made possible by a research gift from InfluxData. One year later, SpiralDB and Bauplan also joined the journey.10
  • LiquidCache will remain a public‑benefit project in appreciation of the support from taxpayers, research gifts, and the open‑source community.

10 Support our research here!