Estuary

estuary.dev

Last updated: April 2026

Estuary Flow is an open source real-time data integration platform for building low-latency CDC pipelines between databases, APIs, and data warehouses.

About

Estuary Flow is a real-time data integration platform designed for building low-latency data pipelines that continuously capture changes from source systems and deliver them to destinations in near real time. Built on open source foundations and available as both a self-hosted tool and a managed cloud service, Estuary addresses the gap between batch ETL tools and complex custom streaming infrastructure.

The core technology in Estuary is change data capture (CDC), which tracks changes to source databases at the transaction log level rather than scanning tables periodically. By reading the PostgreSQL WAL, MySQL binlog, or other database change streams, Estuary captures every insert, update, and delete as it happens, enabling downstream systems to stay synchronized with minimal latency. This CDC approach is far more efficient and timely than batch polling and avoids the table-locking overhead of bulk exports.
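
The mechanics of CDC can be illustrated with a minimal sketch: a stream of insert/update/delete events, keyed by primary key, is applied in order to keep a downstream replica synchronized. The event format below is hypothetical, for illustration only, not Estuary's actual wire protocol.

```python
# Conceptual sketch: applying a stream of CDC events (hypothetical format)
# to keep a downstream copy of a table in sync with its source.

def apply_cdc_event(table: dict, event: dict) -> None:
    """Apply one change event, keyed by primary key."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        table[key] = event["row"]   # upsert the new row image
    elif op == "delete":
        table.pop(key, None)        # remove the deleted row

# A replica built purely from the change stream:
replica = {}
events = [
    {"op": "insert", "key": 1, "row": {"id": 1, "status": "new"}},
    {"op": "update", "key": 1, "row": {"id": 1, "status": "paid"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "status": "new"}},
    {"op": "delete", "key": 2},
]
for e in events:
    apply_cdc_event(replica, e)

print(replica)  # {1: {'id': 1, 'status': 'paid'}}
```

Because every change is observed rather than inferred from periodic scans, deletes are captured too, something batch polling cannot see without full-table comparisons.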

Estuary Collections are the central concept for data storage and routing. A collection is a persistent, append-only log of data that captures all historical changes from a source. Collections are stored in cloud object storage (S3, GCS) in a structured format, making them both durable and queryable by multiple consumers. This architecture separates the capture of data from its delivery, enabling multiple destinations to consume from the same collection independently at their own pace.
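
The decoupling this enables can be sketched as follows: one append-only log, with each consumer tracking its own read offset. The class and method names here are illustrative, not Estuary's API.

```python
# Sketch of the collection model: an append-only log that multiple
# destinations read independently, each at its own pace.

class Collection:
    def __init__(self):
        self._log = []  # append-only document log

    def append(self, doc) -> None:
        self._log.append(doc)

    def read_from(self, offset: int):
        """Return (new_docs, next_offset) for a consumer at `offset`."""
        return self._log[offset:], len(self._log)

c = Collection()
c.append({"id": 1})
c.append({"id": 2})

# A warehouse destination reads everything captured so far:
warehouse_docs, warehouse_off = c.read_from(0)   # both documents
c.append({"id": 3})
# A second destination starting later sees only what it hasn't consumed:
search_docs, search_off = c.read_from(warehouse_off)  # just {"id": 3}
```

A slow or newly added destination simply starts from an earlier offset; it never blocks, and never forces a re-capture from the source database.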

The connector ecosystem covers a wide range of data sources and destinations. Source connectors include PostgreSQL, MySQL, SQL Server, MongoDB, Salesforce, HubSpot, Stripe, Google Analytics, and others. Destination connectors include Snowflake, BigQuery, Redshift, Databricks, Elasticsearch, and others. New connectors are added regularly based on community and customer demand.

Transformations in Estuary use a declarative data flow language for remapping, filtering, and reshaping data between collections. For more complex transformations, TypeScript or SQL transformations can be applied within the pipeline. This transformation capability allows data to be prepared for each destination's specific schema requirements.
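
As a rough illustration of what such a derivation does, the sketch below filters and reshapes source documents into a destination-ready form. The field names and logic are hypothetical, not taken from Estuary's transformation language.

```python
# Sketch of a derivation: filter out unwanted documents and remap the
# rest to match a destination schema. Field names are illustrative.

def derive(docs):
    for doc in docs:
        if doc.get("status") != "cancelled":     # filter
            yield {                              # remap / reshape
                "order_id": doc["id"],
                "total_cents": round(doc["total"] * 100),
            }

source = [
    {"id": "a1", "status": "paid", "total": 19.99},
    {"id": "a2", "status": "cancelled", "total": 5.00},
]
print(list(derive(source)))  # [{'order_id': 'a1', 'total_cents': 1999}]
```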

The managed Estuary Cloud service provides a fully hosted pipeline infrastructure with a web console for managing connectors, monitoring pipeline health, and configuring transformations. Pricing is based on data volume, making it accessible for organizations with varying data throughput requirements.

Positioning

Estuary is a real-time data integration platform that unifies ETL and ELT into a single system capable of moving data with millisecond latency. While traditional data integration tools like Fivetran and Airbyte operate in batch mode — syncing data every hour or every few minutes at best — Estuary Flow captures changes as they happen using change data capture and streams them continuously to destinations. This makes it possible to build real-time analytics, operational dashboards, and event-driven architectures without separate streaming infrastructure.

Built on Gazette, an open-source distributed streaming framework, Estuary Flow provides the reliability of batch ETL with the speed of event streaming. The platform handles schema evolution, exactly-once delivery, and automatic backfills — problems that typically require significant engineering effort when building streaming pipelines from scratch with tools like Kafka Connect or Debezium. For data teams that need fresher data without the complexity of managing streaming infrastructure, Estuary bridges the gap.

What You Get

  • Real-Time CDC
    Change Data Capture from PostgreSQL, MySQL, MongoDB, SQL Server, and more with millisecond latency and minimal impact on source database performance.
  • 200+ Connectors
    Pre-built connectors for databases, SaaS APIs, cloud storage, data warehouses, and streaming platforms with automatic schema detection and evolution handling.
  • Flow Runtime
    Distributed streaming engine that processes data in real time with exactly-once semantics, automatic scaling, and built-in schema management.
  • Transformations
    In-stream data transformations using SQL or TypeScript, supporting filtering, mapping, joining, and aggregating data before it reaches the destination.
  • Automatic Backfills
    Seamless historical data loading that automatically backfills existing data before switching to real-time streaming, ensuring complete datasets in destinations.
  • Materialized Views
    Continuously updated views that maintain real-time aggregations, joins, and transformations — combining streaming processing with materialized query results.
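
The backfill-then-stream behavior described above can be sketched as a two-phase sync: load a snapshot of existing rows, then apply only the change events that occurred after the snapshot position, so the destination ends up complete with no duplicates. This is an illustrative model under assumed event and LSN fields, not Estuary's actual mechanism.

```python
# Sketch of backfill-then-stream: historical snapshot first, then the
# real-time change tail, deduplicated by log position (LSN).

def sync(snapshot_rows, snapshot_lsn, change_events):
    dest = {row["id"]: row for row in snapshot_rows}  # 1) backfill
    for ev in change_events:                          # 2) stream tail
        if ev["lsn"] <= snapshot_lsn:
            continue                                  # already in snapshot
        if ev["op"] == "delete":
            dest.pop(ev["id"], None)
        else:
            dest[ev["id"]] = ev["row"]
    return dest

dest = sync(
    snapshot_rows=[{"id": 1, "v": "old"}],
    snapshot_lsn=100,
    change_events=[
        {"lsn": 90,  "op": "update", "id": 1, "row": {"id": 1, "v": "dup"}},
        {"lsn": 110, "op": "update", "id": 1, "row": {"id": 1, "v": "new"}},
        {"lsn": 120, "op": "insert", "id": 2, "row": {"id": 2, "v": "x"}},
    ],
)
print(dest)  # {1: {'id': 1, 'v': 'new'}, 2: {'id': 2, 'v': 'x'}}
```

The LSN check is what makes the handoff seamless: events already reflected in the snapshot are skipped, and everything after it is applied exactly once.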

Core Areas

Real-Time Data Integration

Continuous data movement from operational databases and SaaS applications to warehouses, lakes, and streaming platforms with millisecond latency.

Streaming ETL/ELT

In-stream transformations that clean, enrich, and reshape data in real time, eliminating the latency of traditional batch-then-transform pipelines.

Operational Analytics

Enable real-time dashboards, alerting, and decision-making by keeping analytical destinations continuously synchronized with operational data sources.

Why It Matters

The data industry has a latency problem. Most organizations move data in hourly or daily batches, meaning their analytics are always stale and their automated systems react to yesterday's reality. Building real-time pipelines with Kafka, Debezium, and custom code requires specialized streaming engineering skills that most data teams don't have. Estuary makes real-time data integration accessible without the infrastructure complexity.

For data teams, adopting Estuary means improving analytics freshness from hours to seconds without rearchitecting the entire data stack. The same connectors and transformations that feed a data warehouse can simultaneously feed a real-time operational system, eliminating the need to maintain parallel batch and streaming infrastructure.
