OpenXData 2025 Highlights

2025 Speakers

Amit Dutta
Software Engineer
Meta
Amaresh Bingumalla
Senior Data Platform Engineer
Peloton Interactive Inc.
Joseph Machado
Senior Data Engineer
Netflix
Shelby Heinecke
Senior AI Research Manager
Salesforce
Rajwardhan Singh
Engineering Manager
ZOOM (ZM)
Aditi Pandit
Principal Engineer
IBM
Aakash Pradeep
Principal Software Engineer
Twilio
Vinoth Chandar
CEO
Onehouse
Jason Liu
Senior Software Engineer
Amazon
Abhishek Srinivasa Raju Padmavathi
Senior Software Development Engineer
Amazon
Victor Agababov
Software Engineer
Google
Nisha Paliwal
Author, Advisor to C-level
AI Value Secret
Koti Darla
Tech Lead Data Engineer
Southwest Airlines
Arturas Tutkus
Engineering Manager
KAYAK
Nikhil Simha
CTO
Zipline AI
Dipti Borkar
VP & GM
Microsoft
Josh Caplan
Director of Product
Microsoft
Binwei Yang
Principal Software Engineer
IBM
Jing Li
Senior Staff Software Engineer
Uber
Matthew Topol
Co-Founder
Columnar
Kyle Weller
Head of Product
Onehouse
Jonathan Rau
VP/Distinguished Engineer
Query
Aishwarya Ramasethu
AI Engineer
Prediction Guard
Sida Shen
Product Manager
CelerData
Ciro Greco
Co-founder and CEO
Bauplan
Kasun Indrasiri Gamage
Senior Product Manager
Confluent
Yashwanth Dasari
Senior Manager, Product Marketing & GTM Strategy
Confluent
Jonathan Brito
Staff Product Manager
Databricks
Amy Chen
Staff Product Manager
dbt Labs
Jaikumar Ganesh
Head of Engineering
Anyscale
Pushkar Garg
Staff Machine Learning Engineer
Clari
Chandra Krishnan
Solutions Engineer
Onehouse

Select Keynotes

9:30AM – 9:50AM PST
Adopting a 'horses for courses' approach to building your data platform

Today's data platforms too often start with an engine-first mindset, picking a compute engine and force-fitting data strategies around it. This approach seems like the right short-term decision, but given the gravity data possesses, it ends up locking organizations into rigid architectures, inflating costs, and ultimately slowing innovation. Instead, we must flip the model: put open, interoperable data at the heart of the data platform and select specialized engines as needed, e.g., Apache Flink for stream processing and Ray for machine learning. A 'horses for courses' approach acknowledges that no single engine is best for every workload, and embraces a modular, future-ready architecture from the ground up.

This talk will make the case for a radical but proven idea: treat your data as a first-class citizen, and treat compute engines as interchangeable tools. We'll explore real-world examples where decoupled data strategies have allowed companies like LinkedIn, Uber and Netflix to evolve quickly across generations of technologies, and discuss practical strategies to avoid the endless migration treadmill. We will illustrate this using real-world comparisons of compute engines across key workloads, such as analytics, data science, machine learning, and stream processing.

Vinoth Chandar
CEO
Onehouse
Watch now
10:35AM – 10:55AM PST
Unity Catalog and Open Table Format Unification

Open table formats, including Delta Lake, Apache Hudi™, and Apache Iceberg™, have become the leading industry standards for Lakehouse storage. As these formats have grown in popularity, so has the importance of catalogs, which are responsible for managing reads and writes to tables. In this session, we will cover how new data silos have emerged along these two foundational components of a lakehouse. We will show you how Unity Catalog breaks data silos and how new features in OSS are unifying the Lakehouse ecosystem.

Jonathan Brito
Staff Product Manager
Databricks
Watch now
11:45AM – 12:05PM PST
Not Just Lettuce: How Apache Iceberg™ and dbt Are Reshaping the Data Aisle

The recent explosion of open table formats like Iceberg, Delta Lake, and Hudi has unlocked new levels of interoperability, allowing data to be stored and accessed across a growing range of engines and environments. However, this flexibility also introduces complexity, making it more challenging to maintain consistency, quality, and governance across teams and platforms.

dbt is the data control plane, bringing order to this fragmentation by centralizing business logic and enforcing best practices around data quality, documentation, and governance. Through support for the Analytics Development Lifecycle (ADLC) and deep integration with open formats, dbt empowers teams to standardize development workflows while choosing the right compute and storage for each use case — enabling a scalable, future-proof foundation for modern data platforms.

In this short talk, Amy will dig into how open table formats, with a focus on Iceberg, have changed the way the industry scales data. Open table formats give you the head of lettuce — dbt gives you the recipe to make something useful out of it.

Amy Chen
Staff Product Manager
dbt Labs
Watch now
1:35PM – 1:55PM PST
From Kafka to Open Tables: Simplify Data Streaming Integrations with Confluent Tableflow

Modern data platforms demand real-time data—but integrating streaming pipelines with open table formats like Apache Iceberg™, Delta Lake, and Apache Hudi™ has traditionally been complex, expensive, and risky. In this session, you’ll learn how Confluent’s data streaming platform—with unified Apache Kafka® and Apache Flink®—makes it simple to stream all your data into Iceberg tables and Onehouse with Tableflow. Built for open lakehouse architectures, Tableflow lets you represent Kafka topics and their associated schemas as open table formats in just a few clicks, eliminating the need for custom, brittle integrations and batch jobs. See how Confluent enables faster delivery of real-time data products ready for use across open data systems.

Kasun Indrasiri Gamage
Senior Product Manager
Confluent
Yashwanth Dasari
Senior Manager, Product Marketing & GTM Strategy
Confluent
Watch now
Register for 2026
Track 1

9:55 AM – 10:05 AM PST
An Intro to Trajectory Data for AI Agents

AI agents need more than just language: they need to act. This talk introduces trajectory data, an emerging class of data used to train LLM agents. These are sequences of observations, actions, and outcomes that drive agent learning. Whether you're training agents or building the data pipelines behind them, this is your guide to the data powering the next generation of AI agents.
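Concretely, a trajectory is just an ordered log of observation/action/outcome steps plus an end result. A minimal sketch in plain Python (the field names below are illustrative, not a standard schema):

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    observation: str  # what the agent perceived (tool output, page text, ...)
    action: str       # what the agent did (tool call, message, ...)
    outcome: str      # what happened as a result

@dataclass
class Trajectory:
    goal: str                          # the task the agent was asked to do
    steps: list = field(default_factory=list)
    success: bool = False              # final outcome label for training

# One toy episode an agent-training pipeline might log:
traj = Trajectory(goal="find the cheapest SFO->JFK flight")
traj.steps.append(Step("search page loaded", "search('SFO->JFK')", "10 results"))
traj.steps.append(Step("results listed", "sort_by('price')", "cheapest fare found"))
traj.success = True
```

Pipelines typically serialize records like these (e.g., as JSON lines) and filter on the outcome labels when assembling fine-tuning datasets.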

Shelby Heinecke
Senior AI Research Manager
Salesforce
Watch now
10:10 AM – 10:30 AM PST
OneLake: The OneDrive for data

OneLake eliminates pervasive and chaotic data silos created by developers configuring their own isolated storage. OneLake provides a single, unified storage system for all developers. Unifying data across an organization and clouds becomes trivial. With the OneLake data hub, users can easily explore and discover data to reuse, manage or gain insights. With business domains, different business units can work independently in a data mesh pattern, without the overhead of maintaining separate data stores.

Josh Caplan
Director of Product
Microsoft
Watch now
11:05 AM – 11:25 AM PST
To Build or Buy: Key Considerations for a Production-grade Data Lakehouse Platform

The data lakehouse architecture has made big waves in recent years. But there are so many considerations. Which table formats should you start with? What file formats are the most performant? With which data catalogs and query engines do you need to integrate? To be honest, it can become a bit overwhelming.

But what data engineer doesn't like a good technical challenge? This is where it sometimes becomes a philosophical decision of build vs buy.

In this presentation, Onehouse VP of Product Kyle Weller will break down the pros and cons he has seen over nearly a decade of helping organizations implement their own data lakehouses and building the Universal Data Lakehouse at Onehouse. You'll learn about:

  • The strengths of open table formats such as Apache Hudi™, Apache Iceberg™ and Delta Lake
  • Interoperability via abstraction layers such as Apache XTable™ (incubating)
  • Lakehouse optimizations for cost and performance via Apache Spark™-based runtimes
Kyle Weller
VP of Product
Onehouse
Watch now
11:30 AM – 11:40 AM PST
Building a Data Lake for the Enterprise

In this talk, I will go over some of the implementation details of how we built a data lake for Clari using a federated query engine built on Trino, orchestrated with Airflow, with Apache Iceberg™ as the storage format.
Clari is an Enterprise Revenue Orchestration Platform that helps customers run their revenue cadences and helps sales teams close deals more efficiently. As an enterprise company, we have strict legal requirements around data governance.
I will cover how to design a scalable architecture for data ingestion pipelines that bring together data from various sources for AI and ML.
I will also cover some of the use cases this data lake has unlocked in the company around building agentic frameworks.

Pushkar Garg
Staff Machine Learning Engineer
Clari
Watch now
12:20 PM – 12:40 PM PST
Powering Amazon Unit Economics at Scale Using Apache Hudi™

Understanding and improving unit-level profitability at Amazon's scale is a massive challenge, one that requires flexibility, precision, and operational efficiency. It's not only about the massive amount of data we ingest and produce, but also the need to support our ever-growing businesses within Amazon. In this talk, we'll walk through how we built a scalable, configuration-driven platform called Nexus, and how Apache Hudi™ became the cornerstone of its data lake architecture.

Jason Liu
Senior Software Engineer
Amazon
Abhishek Srinivasa Raju Padmavathi
Senior Software Development Engineer
Amazon
Watch now
12:45 PM – 12:55 PM PST
Scaling Multi-modal Data using Ray Data

In the coming years, the use of unstructured and multi-modal data for AI workloads will grow exponentially. This talk will focus on how Ray Data effectively scales data processing for these modalities across heterogeneous architectures and is positioned to become a key component of future AI platforms.

Jaikumar Ganesh
Head of Engineering
Anyscale
Watch now
1:00 PM – 1:20 PM PST
Apache Gluten: Revolutionizing Big Data Processing Efficiency

Apache Gluten (incubating) is an emerging open-source project in the Apache software ecosystem. It's designed to enhance the performance and scalability of data processing frameworks such as Apache Spark. By leveraging cutting-edge technologies such as vectorized execution, columnar data formats, and advanced memory management techniques, Apache Gluten aims to deliver significant improvements in data processing speed and efficiency.

The primary goal of Apache Gluten is to address the ever-growing demand for real-time data analytics and large-scale data processing. It achieves this by optimizing the execution of complex data processing tasks and reducing the overall resource consumption. As a result, organizations can process massive datasets more quickly and cost-effectively, enabling them to gain valuable insights and make data-driven decisions faster than ever before.

Binwei Yang
Principal Software Engineer
IBM
Watch now
2:00 PM – 2:10 PM PST
System level security for enterprise AI pipelines

As the adoption of LLMs continues to expand, awareness of the risks associated with them is also increasing. It is essential to manage these risks effectively amidst the ongoing hype, technological optimism, and fear-driven narratives. This presentation will explore how to address vulnerabilities that may emerge. Our focus will extend beyond simply securing interactions with the models, emphasizing the critical role of surrounding infrastructure and monitoring practices.

The talk will introduce a structured framework for developing "system-level secure" AI deployments from the ground up. This framework covers pre-deployment risks (such as poisoned models), deployment risks (including model deserialization), and online attack vectors (such as prompt injection). Drawing on two years of experience deploying AI systems in sensitive environments with strict privacy and security requirements, the talk will provide actionable strategies to help organizations build secure, resilient applications using open-source LLMs. Attendees will gain practical insights into strengthening both AI models and the supporting infrastructure, equipping them to develop robust AI solutions in an increasingly complex threat environment.

Aishwarya Ramasethu
AI Engineer
Prediction Guard
Watch now
2:15 PM – 2:25 PM PST
AI Agents for ETL/ELT Code Generation: Multiply Productivity with Generative AI

This talk delves into the revolutionary potential of AI agents, powered by generative AI and large language models (LLMs), in transforming ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) processes.
With an emphasis on automating code generation, streamlining workflows, and enhancing data quality, this session explores how AI-driven solutions are reshaping the landscape of data engineering. Attendees will learn how these intelligent agents can reduce manual coding, eliminate errors, improve operational efficiency, and meet compliance requirements, all while accelerating development timelines.
We will also cover key use cases where AI agents facilitate real-time data transformation, ensure data governance, and promote seamless deployment across cloud environments, giving businesses a competitive edge in today's data-driven world.

Koti Darla
Tech Lead Data Engineer
Southwest Airlines
Watch now
2:40 PM – 3:00 PM PST
The Trifecta of Tech: Why Software, Data, and AI Must Work Together to Create Real Value

In today’s rush to adopt AI, many organizations overlook a critical truth: value doesn’t come from AI alone—it comes from the powerful combination of software engineering, data engineering, and AI/ML engineering. In this fast-paced, 15-minute talk, Nisha Paliwal draws on 25+ years of experience in banking and tech to unpack the "Trifecta" that fuels real transformation.

From legacy systems to self-learning platforms, she’ll share stories, stats, and insights on how this triad—when integrated—enables banks to move faster, deliver smarter experiences, and generate measurable impact.

Whether you're a technologist, leader, or change agent, you’ll walk away with a fresh lens on cross-functional collaboration, and practical ways to break silos, build trust, and unlock innovation at scale.

Nisha Paliwal
Author, Advisor to C-level
Self Employed
Watch now
3:05 PM – 3:25 PM PST
Data as Software and the programmable lakehouse

Two very big things happened in recent years that completely transformed the data landscape:

  • Pre-trained AI models: These models have democratized AI, enabling software engineers to integrate advanced capabilities into applications with simple API calls and without extensive machine learning expertise. ​
  • Lakehouses: Open formats over object storage bring together the flexibility of data lakes and the data management strengths of data warehouses, offering a more streamlined and scalable approach to data management.

And yet, most data platforms remain difficult to digest for traditional software developers.

This talk introduces "Data as Software," a practical approach to data engineering and AI that leverages the lakehouse architecture to simplify data platforms for developers. By using serverless functions as a runtime and Git-based workflows in the data catalog, we can build systems that make it far simpler for data developers to apply familiar software engineering concepts to data, such as modular, reusable code, automated testing (TDD), continuous integration (CI/CD), and version control.
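The "familiar software engineering concepts" piece can be made concrete with a tiny example: a pipeline step written as a pure function, with the kind of assertion a TDD/CI workflow would run before promoting data code (the function and data here are invented for illustration):

```python
# A pipeline step as a plain, pure function: versionable, reusable, testable.
def dedupe_orders(rows):
    """Keep only the latest record per order_id (rows arrive oldest-first)."""
    latest = {}
    for row in rows:
        latest[row["order_id"]] = row  # later rows overwrite earlier ones
    return list(latest.values())

# A unit test on a small fixture, runnable on every commit.
rows = [
    {"order_id": 1, "status": "pending"},
    {"order_id": 1, "status": "shipped"},   # newer record for order 1
    {"order_id": 2, "status": "pending"},
]
deduped = dedupe_orders(rows)
assert len(deduped) == 2
assert deduped[0]["status"] == "shipped"  # latest version won
```

In a Git-backed catalog, the same review loop applies to the data the function produces: run the tests against a branch of the data, then merge.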

Ciro Greco
Co-founder and CEO
Bauplan
Watch now
Track 2

9:55 AM – 10:05 AM PST
A Flexible, Efficient Lakehouse Architecture for Streaming Ingestion

Zoom went from a meeting platform to a household name during the COVID-19 pandemic. That kind of attention and usage required significant storage and processing to keep up. In fact, Zoom had to scale their data lakehouse to 100TB/day while meeting GDPR requirements.

Join this session to learn how Zoom built its lakehouse around Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon EMR clusters running Apache Spark™ Structured Streaming jobs (for optimized parallel processing of 150 million Kafka messages every 5 minutes), and Apache Hudi™ on Amazon S3 (for flexible, cost-efficient storage). Raj will talk through the lakehouse architecture decisions, data modeling and data layering, the medallion architecture for data engineering, and how Zoom leverages various open table formats, including Apache Hudi™, Apache Iceberg™ and Delta Lake.

Rajwardhan Singh
Engineering Manager
ZOOM (ZM)
Watch now
10:10 AM – 10:30 AM PST
Bringing the Power of Google’s Infrastructure to your Apache Iceberg™ Lakehouse with BigQuery

Apache Iceberg has become a popular table format for building data lakehouses, enabling multi-engine interoperability. This presentation explains how Google BigQuery leverages Google's planet-scale infrastructure to enhance Iceberg, delivering unparalleled performance, scalability, and resilience.

Victor Agababov
Software Engineer
Google, Inc
Watch now
11:05 AM – 11:25 AM PST
Open Source Query Performance - Inside the next-gen Presto C++ engine

Presto (https://prestodb.io/) is a popular open source SQL query engine for high performance analytics in the Open Data Lakehouse. Originally developed at Meta, Presto has been adopted by some of the largest data-driven companies in the world including Uber, ByteDance, Alibaba and Bolt. Today it’s available to run on your own or through managed services such as IBM watsonx.data and AWS Athena.

Presto is fast, reliable and efficient at scale. The latest innovation in Presto is a state-of-the-art C++ native query execution engine that replaces the old Java execution engine. Presto C++ is built using Velox, another Meta open source project that provides common runtime primitives across query engines. Deployments with the new Presto native engine show massive price-performance improvements, with fleet sizes shrinking to almost one-third of their Java cluster counterparts, leading to enormous cost savings.

The Presto Native engine project began in 2020 and since then, it has matured into production use at Meta, Uber, and in IBM watsonx.data. This talk gives an in-depth look at this journey, covering:

  • Introduction to Prestissimo/Velox architecture
  • Production experiences and learnings from Meta
  • Benchmarking results from TPC-DS workloads
  • New lakehouse capabilities enabled by the native engine

Beyond the product features, we will highlight how the open source community shaped this innovation and the benefits of building technology like this openly across many companies.

Aditi Pandit
Principal Engineer
IBM
Amit Dutta
Software Engineer
Meta
Watch now
11:30 AM – 11:40 AM PST
Moving fast and not causing chaos

Data engineering teams often struggle to balance speed with stability, creating friction between innovation and reliability. This talk explores how to strategically adapt software engineering best practices specifically for data environments, addressing unique challenges like unpredictable data quality and complex dependencies. Through practical examples and a detailed case study, we'll demonstrate how properly implemented testing, versioning, observability, and incremental deployment patterns enable data teams to move quickly without sacrificing stability. Attendees will leave with a concrete roadmap for implementing these practices in their organizations, allowing their teams to build and ship with both speed and confidence.

Joseph Machado
Senior Data Engineer
Netflix
Watch now
12:20 PM – 12:40 PM PST
How Our Team Used Open Table Formats to Boost Querying and Reduce Latency for Faster Data Access

In this talk, Amaresh Bingumalla shares how his team utilized Apache Hudi™ to enhance data querying on object storage such as S3. He describes how Hudi helped them cut ETL time and costs, enabling efficient querying without taxing production RDS instances. He also talks about how they leveraged open-source tools such as Apache Hudi™, Apache Spark™, and Apache Kafka® to build near real-time data pipelines. He explains how they improved their ML and analytics workflows using key Hudi features such as ACID compliance, time travel queries, and incremental reads, and how, paired with DataHub, they boosted data discoverability for downstream systems.

Amaresh Bingumalla
Senior Data Platform Engineer
Peloton Interactive Inc.
Watch now
12:45 PM – 12:55 PM PST
ODBC Takes an Arrow to the Knee

For decades, ODBC/JDBC have been the standard for row-oriented database access. However, modern OLAP systems tend instead to be column-oriented for performance, leading to significant conversion costs when requesting data from database systems. This is where Arrow Database Connectivity (ADBC) comes in!

ADBC is similar to ODBC/JDBC in that it defines a single API which is implemented by drivers to provide access to different databases. However, ADBC's API is defined in terms of the Apache Arrow in-memory columnar format. Applications can code to this standard API much like they would for ODBC or JDBC, but fetch result sets in the Arrow format, avoiding transposition and conversion costs where possible.

This talk will cover goals, use cases, and examples of using ADBC to communicate with different data APIs (such as Snowflake, Flight SQL or Postgres) with Arrow-native in-memory data.

Matthew Topol
Co-Founder
Columnar
Watch now
1:00 PM – 1:25 PM PST
Panel: The Rise of Open Data Platforms

The rise of open data platforms is reshaping the future of data architectures. In this panel, we will explore the evolution of modern data ecosystems, with a focus on lakehouses, open query engines, and open table formats. We will examine how these open-source technologies are breaking down traditional data silos, enabling scalable, flexible, and cost-effective solutions. Panelists will discuss the impact of open standards on data accessibility, performance, and interoperability, while offering insights into the growing importance of community-driven development in shaping the future of data platforms. Join us for an engaging conversation about the convergence of open technologies and the next wave of data architecture evolution.

Arturas Tutkus
Engineering Manager
KAYAK
Dipti Borkar
VP & GM
Microsoft
Jing Li
Senior Staff Software Engineer
Uber
Jonathan Rau
VP/Distinguished Engineer
Query
Watch now
2:00 PM – 2:10 PM PST
Cross paradigm compute engine for AI/ML data

AI/ML systems require real-time information derived from many data sources. This context is needed to create prompts and features. Most successful AI models require rich context from a vast number of sources.

To power this, engineers need to manually split their logic across various data processing “paradigms”: stream processing, batch processing, embedding generation and inference services.

Today, practitioners spend tremendous effort stitching together disparate technologies to power *each* piece of context.

While at Airbnb, we created a system to automate the data and systems engineering required to power AI models both for training / fine-tuning and for online inference.

It is deployed in critical ML pathways and actively developed by Stripe, Uber, OpenAI and Roku (in addition to Airbnb).

In this talk I will go over use cases, an overview of the Chronon project, and future directions.

Nikhil Simha
CTO
Zipline AI
Watch now
2:15 PM – 2:25 PM PST
Scale Without Silos: Customer-Facing Analytics on Open Data

Customer-facing analytics is your competitive advantage, but ensuring high performance and scalability often comes at the cost of data governance and increased data silos. The open data lakehouse offers a solution—but how do you power low-latency, high-concurrency queries at scale while maintaining an open architecture?

In this talk, we’ll dive into the core query engine innovations that make customer-facing analytics on an open lakehouse possible. We’ll cover:

  • Key challenges of customer-facing analytics at scale
  • Query engine essentials for achieving fast, concurrent queries without sacrificing governance
  • Real-world case studies, including how industry leaders like TRM Labs are moving their customer-facing workloads to the open lakehouse

Join us to explore how you can unlock the full potential of customer-facing analytics—without compromising on governance, flexibility, or cost efficiency.

Sida Shen
Product Manager
CelerData
Watch now
2:40 PM – 3:00 PM PST
Data Mesh and Governance at Twilio

At Twilio, our data mesh enables data democratization by allowing domains to share and access data through a central analytics platform without duplicating datasets—and vice versa. Using AWS Glue and Lake Formation, only metadata is shared across AWS accounts, making the implementation efficient with low overhead while ensuring data remains consistent, secure, and always up to date. This approach supports scalable, governed, and seamless data collaboration across the organization.

Aakash Pradeep
Principal Software Engineer
Twilio
Watch now
Register for 2026

Workshop

3:35PM – 4:00PM PST
Open Data using Onehouse Cloud

If you've ever tried to build a data lakehouse, you know it's no small task. You've got to tie together file formats, table formats, storage platforms, catalogs, compute, and more. But what if there was an easy button?

Join this session to see how Onehouse delivers the Universal Data Lakehouse that is:

Fast - Ingest and incrementally process data from stream, operational databases, and cloud storage with minute-level data freshness.

Efficient - Innovative optimizations ensure that you squeeze every bit of performance out of your resources with a runtime optimized for lakehouse workloads.

Simple - Onehouse is delivered as a fully managed cloud service, so you can spin up a production-ready lakehouse in days or less.

The session will include a live demo. Attendees will be eligible for up to $1,000 in free credits to try Onehouse for their organization.

Chandra Krishnan
Solutions Engineer
Onehouse
Watch now

Full agenda coming soon!

Register for 2026

Secure your spot at the premier data practitioner event! Don’t miss out on expert insights, hands-on workshops, and networking opportunities.