
QRY Lakeflow

Status: Production Ready

Overview

Remember when building data pipelines meant spending weeks wrestling with Airflow DAGs, debugging cryptic YAML errors at 2 AM, and explaining to your manager why "it works on my machine" doesn't apply to production? Those days are over.

Lakeflow is QRY's data orchestration engine for building, scheduling, and monitoring data pipelines and jobs. Think Databricks Lakeflow meets a UX designer who actually uses the product. Define complex data workflows in simple YAML or with drag-and-drop visual editing. Your choice, and yes, the two stay in sync automatically.

Key Features

Professional IDE Experience

Lakeflow features a full IDE interface inspired by VS Code and modern development tools. Work with multiple pipelines and jobs simultaneously using tabs, keyboard shortcuts, and powerful navigation.

Dual Editing Modes

Switch between YAML and visual drag-and-drop editing whenever you want. Changes in one mode automatically sync to the other. It's like having your cake and eating it too, except the cake is a perfectly configured ETL pipeline.

Interactive DAG Canvas

Design task workflows with drag-and-drop connections. No more drawing diagrams on whiteboards that nobody updates. Your DAG is always accurate because it is the configuration.

Real-Time Monitoring

Watch execution progress live with animated task nodes. Blue pulse means running, green means success, red means "time to investigate." It's like watching your data do a choreographed dance, except occasionally one dancer trips.

AI-Assisted Authoring

Tell Lakeflow what you want in plain English, and it generates the YAML for you. "Create a pipeline that aggregates daily sales by region" becomes a working configuration in seconds. Your keyboard thanks you.

Stage-Based Pipeline Design

A visual Source → Transform → Expectations → Sink flow makes data lineage obvious. Even your manager can follow along during demos.

Git Integration

Version control your pipelines and jobs with Git Folders. Track changes, collaborate with teammates, and integrate with CI/CD workflows.

Two Core Abstractions

Lakeflow gives you two powerful building blocks:

| Type | Purpose | Think of it as... |
|---|---|---|
| Pipelines | Single-purpose data transformations (ETL/ELT) | A well-trained specialist |
| Jobs | Multi-task orchestrations with dependencies | A conductor leading an orchestra |

Getting Started

  1. Open QRY
  2. Click Lakeflow in the left navigation rail
  3. Marvel at the dashboard showing all your pipelines and jobs

The IDE Interface

Lakeflow uses a three-panel IDE layout similar to VS Code:

```
┌─────────────────────────────────────────────────────────────┐
│ Tab Bar (open pipelines/jobs, drag to reorder)              │
├──────────┬────────────────────────────────┬─────────────────┤
│          │                                │                 │
│ File     │          Editor Area           │   Right Rail    │
│ Explorer │        (YAML or Visual)        │  (panel icons)  │
│          │                                │                 │
│ Folders  │                                ├─────────────────┤
│ & Items  │                                │  Panel Content  │
│          │                                │   (contextual)  │
└──────────┴────────────────────────────────┴─────────────────┘
```

Components:

  • Tab Bar: Open multiple pipelines/jobs simultaneously, drag tabs to reorder
  • File Explorer: Browse folders, search items, filter by type/status
  • Editor Area: Main workspace for YAML editing or visual canvas
  • Right Rail: Quick access to Run History, Settings, Validation
  • Panel Area: Contextual panels that slide in from the right

Keyboard Shortcuts

Master these shortcuts to work efficiently:

| Shortcut | Action |
|---|---|
| Cmd+S | Save current item |
| Cmd+Enter | Validate configuration |
| Cmd+Shift+Enter | Deploy/Activate |
| Cmd+P | Quick open (search items) |
| Cmd+B | Toggle sidebar |
| Cmd+\ | Toggle split view |

The Interface at a Glance

| Area | What it does |
|---|---|
| Toolbar | Create new items, filter by status/workspace, bulk actions |
| Folder Tree | Organize pipelines and jobs (yes, folders actually work here) |
| Item List | Table or card view, your choice |
| Status Indicators | Visual badges so you know what's running, broken, or waiting |

Pipelines: Your Data Transformation Workhorses

Pipelines are single-purpose data transformation workflows that move and transform data between sources and targets. One pipeline, one purpose, done well.

Pipeline Lifecycle

DRAFT → DEPLOYED → DEPRECATED
  • Draft: Work in progress, edit freely
  • Deployed: Production-ready, running in the wild
  • Deprecated: Retirement home for pipelines you can't quite delete yet

Creating a Pipeline

  1. Click + New Pipeline in the toolbar
  2. Choose your preferred editing mode (YAML or Visual)
  3. Define your pipeline:
```yaml
name: sales_daily_summary
description: "Aggregate daily sales by region"

source:
  datasource: bigquery
  catalog: my_project
  schema: raw_data

target:
  catalog: my_project
  schema: analytics

tables:
  - name: daily_sales
    type: live_table
    query: |
      SELECT
        DATE(transaction_date) as sale_date,
        region,
        SUM(amount) as total_amount,
        COUNT(*) as transaction_count
      FROM source.transactions
      WHERE transaction_date >= CURRENT_DATE - 30
      GROUP BY 1, 2
```
  4. Click Validate to catch errors before they catch you
  5. Click Save to preserve your work
  6. Click Deploy when you're ready for prime time

Visual Pipeline Editor

The Pipeline Editor shows a linear stage-based flow:

Source → Transform → Expectations → Sink

Each stage is a clickable card:

| Stage | Color | What you configure |
|---|---|---|
| Source | Cyan | Where data comes from, query, incremental settings |
| Transform | Purple | SQL or Python transformations |
| Expectations | Amber | Data quality checks (because garbage in = garbage out) |
| Sink | Green | Where data lands, write mode, merge keys |
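The Expectations stage sits between Transform and Sink. A minimal Python sketch of the idea follows, assuming each expectation is a named row predicate and that failing rows are dropped and counted. Lakeflow's actual expectation syntax and failure policy (drop, warn, or fail the run) are not described on this page, so `apply_expectations` and its behavior are illustrative only.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]
Expectation = Callable[[Row], bool]


def apply_expectations(
    rows: Iterable[Row], expectations: Dict[str, Expectation]
) -> Tuple[List[Row], Dict[str, int]]:
    """Split rows into those passing every expectation plus per-check failure counts.

    Hypothetical policy: a row that fails any check is dropped; the counts let
    you alert when a check fails too often.
    """
    passed: List[Row] = []
    failures = {name: 0 for name in expectations}
    for row in rows:
        failed = [name for name, check in expectations.items() if not check(row)]
        for name in failed:
            failures[name] += 1
        if not failed:
            passed.append(row)
    return passed, failures


if __name__ == "__main__":
    sample = [
        {"region": "EU", "amount": 120.0},
        {"region": None, "amount": 40.0},  # fails region_not_null
        {"region": "US", "amount": -5.0},  # fails amount_non_negative
    ]
    checks = {
        "region_not_null": lambda r: r["region"] is not None,
        "amount_non_negative": lambda r: r["amount"] >= 0,
    }
    kept, counts = apply_expectations(sample, checks)
    print(len(kept), counts)
```

Keeping the counts separate from the surviving rows makes it easy to choose a different policy later, such as failing the run when a count exceeds a threshold.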

Pipeline Table Types

| Type | When to use |
|---|---|
| live_table | Real-time aggregations, computed on demand |
| materialized_view | Performance optimization, pre-computed results |
| streaming | Continuous ingestion, real-time data feeds |
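Clicking Validate is the authoritative check. As a rough local pre-check, the sketch below inspects a pipeline definition that has already been loaded into a Python dict (for example with a YAML parser). The required keys are inferred from the sales_daily_summary example and the allowed types from the table-type list above; Lakeflow's real validation rules are not documented here, and `validate_pipeline` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Assumed from the sales_daily_summary example and the table-type list.
REQUIRED_KEYS = ("name", "source", "target", "tables")
VALID_TABLE_TYPES = {"live_table", "materialized_view", "streaming"}


def validate_pipeline(definition: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    errors = [f"missing top-level key: {key}" for key in REQUIRED_KEYS if key not in definition]
    tables = definition.get("tables")
    if tables is None:
        return errors  # already reported as a missing key
    if not isinstance(tables, list) or not tables:
        errors.append("tables must be a non-empty list")
        return errors
    for index, table in enumerate(tables):
        label = table.get("name") or f"tables[{index}]"
        if table.get("type") not in VALID_TABLE_TYPES:
            errors.append(f"{label}: unknown table type {table.get('type')!r}")
        if not str(table.get("query", "")).strip():
            errors.append(f"{label}: query is empty")
    return errors
```

Returning a list of messages rather than raising on the first problem mirrors how an editor surfaces every issue at once.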

Jobs: Orchestrating the Orchestra

Jobs combine multiple tasks into complex workflows. Tasks can depend on each other, forming a directed acyclic graph (DAG): fancy words for "things happen in the right order."

Job Lifecycle

DRAFT → ACTIVE ⟷ PAUSED → ARCHIVED
  • Draft: Build and test without affecting production
  • Active: Scheduled and running
  • Paused: Taking a break, preserves schedule
  • Archived: Soft deleted (in case you change your mind)
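Read literally, the lifecycle diagram allows only a handful of transitions. The small state-machine sketch below encodes exactly those arrows, so whether Lakeflow also permits, say, archiving an Active job directly or restoring an Archived one is not modeled. `JobState`, `transition`, and `runs_on_schedule` are illustrative names, not a QRY API.

```python
from enum import Enum


class JobState(Enum):
    DRAFT = "DRAFT"
    ACTIVE = "ACTIVE"
    PAUSED = "PAUSED"
    ARCHIVED = "ARCHIVED"


# Edges read off the diagram: DRAFT -> ACTIVE <-> PAUSED -> ARCHIVED.
ALLOWED_TRANSITIONS = {
    JobState.DRAFT: {JobState.ACTIVE},
    JobState.ACTIVE: {JobState.PAUSED},
    JobState.PAUSED: {JobState.ACTIVE, JobState.ARCHIVED},
    JobState.ARCHIVED: set(),  # restoring an archived job is not modeled here
}


def transition(current: JobState, target: JobState) -> JobState:
    """Return the new state, or raise if the diagram has no such edge."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move a job from {current.value} to {target.value}")
    return target


def runs_on_schedule(state: JobState) -> bool:
    """Only Active jobs run; Paused keeps the schedule but does not execute it."""
    return state is JobState.ACTIVE
```

`runs_on_schedule` is the same check as the first Troubleshooting step for a job that is not running on schedule.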

Creating a Job

  1. Click + New Job in the toolbar
  2. Define your orchestration:
```yaml
name: daily_analytics_workflow
description: "End-to-end daily analytics processing"

schedule:
  cron: "0 6 * * *"  # 6 AM UTC daily
  timezone: "UTC"

tasks:
  - name: extract_data
    type: pipeline
    pipeline_name: raw_data_ingestion
    timeout_seconds: 1800

  - name: transform_data
    type: pipeline
    pipeline_name: sales_daily_summary
    depends_on:
      - extract_data
    timeout_seconds: 3600

  - name: generate_report
    type: prompt
    prompt_config:
      prompt: "Analyze today's sales data and summarize key insights"
      model: "gemini-2.0-flash"
      context:
        include_upstream_results: true
    depends_on:
      - transform_data

  - name: notify_team
    type: notification
    notification_config:
      channels:
        - type: email
          recipients:
            - analytics@company.com
      message: "Daily analytics job completed successfully"
    depends_on:
      - generate_report
```
  3. Click Validate, then Save, then Activate
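The depends_on lists in the job above define its DAG. The sketch below uses Python's standard `graphlib` module (3.9+) to group tasks into waves whose dependencies are all satisfied by earlier waves, raising `CycleError` if depends_on loops back on itself. It illustrates ordering only: whether Lakeflow runs independent tasks in the same wave concurrently is not stated here, and `execution_waves` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def execution_waves(tasks: List[Dict]) -> List[List[str]]:
    """Group task names into waves; each wave depends only on earlier waves.

    Raises ValueError for depends_on entries naming unknown tasks and
    graphlib.CycleError if the dependencies contain a cycle.
    """
    graph = {task["name"]: set(task.get("depends_on", [])) for task in tasks}
    unknown = set().union(*graph.values()) - set(graph)
    if unknown:
        raise ValueError(f"depends_on references unknown tasks: {sorted(unknown)}")
    sorter = TopologicalSorter(graph)  # maps each node to its predecessors
    sorter.prepare()  # detects cycles up front
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For the daily_analytics_workflow job this yields one task per wave, because each task depends on the previous one; a diamond-shaped job would put its two middle tasks in the same wave.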

Task Types

Lakeflow supports five task types for maximum flexibility:

Pipeline Task

Run a Lakeflow pipeline as part of your job.

```yaml
- name: run_etl
  type: pipeline
  pipeline_name: my_pipeline
  timeout_seconds: 3600
```

Prompt Task

Execute an AI prompt. Yes, you can have AI analyze your data as part of the workflow.

```yaml
- name: analyze_data
  type: prompt
  prompt_config:
    prompt: "Analyze the data and provide insights"
    system: "You are a data analyst"
    model: "gemini-2.0-flash"
    tools:
      - DatabaseTool
      - PythonTool
    context:
      include_upstream_results: true
    output:
      format: "markdown"
      max_tokens: 4000
```

Python Task

Run custom Python code in a sandboxed environment.

```yaml
- name: custom_processing
  type: python
  python_config:
    script: |
      import pandas as pd

      # Access upstream results. `context` is supplied by the task
      # runtime; the script itself does not define it.
      upstream_data = context.get('upstream_results', {})

      # Your custom logic
      result = {"processed": True}
      print(f"Processed data: {result}")
    requirements:
      - pandas>=2.0.0
  timeout_seconds: 600
```
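The script above reads upstream results through a `context` object it never defines, so the runtime must supply it. To try a script locally before deploying, one option is to run it with a stand-in `context`. This sketch assumes `context` is a plain global dict and that `upstream_results` is a mapping, neither of which this page specifies; `run_task_script` is hypothetical, and `exec` provides no isolation, unlike Lakeflow's sandbox.

```python
import contextlib
import io
from typing import Any, Dict


def run_task_script(script: str, context: Dict[str, Any]) -> str:
    """Run a Python task script locally and return everything it printed.

    Assumption: the script reads a global named `context`, as the example
    task does. For quick local checks only; exec() is NOT a sandbox.
    """
    output = io.StringIO()
    with contextlib.redirect_stdout(output):
        exec(script, {"__name__": "__lakeflow_task__", "context": context})
    return output.getvalue()
```

Passing a hand-built `upstream_results` lets you exercise the script's branches without running the upstream tasks.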

Notification Task

Send alerts when things happen (or don't).

```yaml
- name: send_alert
  type: notification
  notification_config:
    channels:
      - type: email
        recipients:
          - team@company.com
    message: "Job completed with status: {{ job.status }}"
    subject: "Daily Job Update"
```
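The message above uses a `{{ job.status }}` placeholder. The templating engine Lakeflow uses, and which variables besides `job.status` exist, are not documented on this page. The sketch below only shows the general idea of substituting dotted placeholders from nested data; `render_message` and the `SUCCEEDED` status value used when testing it are hypothetical.

```python
import re
from typing import Any, Mapping

# Matches {{ name }} or {{ dotted.path }}, tolerating spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render_message(template: str, variables: Mapping[str, Any]) -> str:
    """Replace each {{ dotted.path }} with the value found in nested mappings.

    Raises KeyError for an unknown path so typos surface during testing.
    """

    def substitute(match) -> str:
        value: Any = variables
        for key in match.group(1).split("."):
            value = value[key]  # walk one level per dotted segment
        return str(value)

    return PLACEHOLDER.sub(substitute, template)
```

Failing loudly on an unknown placeholder is a deliberate choice here; silently rendering an empty string would hide typos in alert messages.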

Condition Task

Control flow based on upstream results - because sometimes you need if/else in your pipelines.

```yaml
- name: check_quality
  type: condition
  condition_config:
    expression: "upstream.data_quality.score > 0.95"
    on_true: continue
    on_false: skip_downstream
```
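How `on_false: skip_downstream` chooses which tasks to skip is not spelled out above. One plausible reading, sketched here as an assumption, is that every task depending on the condition directly or transitively is skipped while unrelated branches keep running. Evaluating the `expression` string itself is out of scope; `tasks_to_skip` (a hypothetical helper) takes the condition's boolean result as input.

```python
from collections import deque
from typing import Dict, List, Set


def tasks_to_skip(tasks: List[Dict], condition: Dict, result: bool) -> Set[str]:
    """Names of tasks a condition task would skip under the reading above."""
    config = condition["condition_config"]
    action = config["on_true"] if result else config["on_false"]
    if action != "skip_downstream":
        return set()
    # Invert depends_on into parent -> children edges.
    dependents: Dict[str, List[str]] = {}
    for task in tasks:
        for parent in task.get("depends_on", []):
            dependents.setdefault(parent, []).append(task["name"])
    skipped: Set[str] = set()
    queue = deque(dependents.get(condition["name"], []))
    while queue:  # breadth-first walk over everything downstream
        name = queue.popleft()
        if name not in skipped:
            skipped.add(name)
            queue.extend(dependents.get(name, []))
    return skipped
```

Under this reading, skipped tasks would show up as the translucent gray Skipped nodes in the run view.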

Visual Job Editor

The Job Editor provides an interactive DAG canvas:

Task Palette (left sidebar):

| Task Type | Icon | Color |
|---|---|---|
| Pipeline | Workflow | Indigo |
| Prompt | Message | Purple |
| Python | Code | Green |
| Notification | Bell | Orange |
| Condition | Git branch | Slate |

Creating Tasks:

  • Drag and Drop: Grab a task type from the palette, drop it on the canvas
  • YAML Editing: Switch to YAML mode, add your task, watch the visual DAG update

Connecting Tasks:

  1. Hover over a task node
  2. Drag from the bottom handle
  3. Connect to another task's top handle
  4. The depends_on relationship is created automatically
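Dragging from one node's bottom handle to another node's top handle adds an entry to the downstream task's depends_on. The sketch below mimics that edit on a list of task dicts and refuses any connection that would close a loop, since a DAG cannot contain cycles. `connect` is a hypothetical helper; it assumes every name in depends_on refers to an existing task.

```python
from typing import Dict, List


def depends_on_transitively(by_name: Dict[str, Dict], start: str, target: str) -> bool:
    """True if `start` is `target` or depends on it through any chain of depends_on."""
    stack, seen = [start], set()
    while stack:  # iterative depth-first search over dependencies
        name = stack.pop()
        if name == target:
            return True
        if name not in seen:
            seen.add(name)
            stack.extend(by_name[name].get("depends_on", []))
    return False


def connect(tasks: List[Dict], upstream: str, downstream: str) -> None:
    """Make `downstream` depend on `upstream`, rejecting edges that create a cycle."""
    by_name = {task["name"]: task for task in tasks}
    # If upstream already (indirectly) waits on downstream, the new edge closes a loop.
    if depends_on_transitively(by_name, upstream, downstream):
        raise ValueError(f"{upstream} -> {downstream} would create a cycle")
    deps = by_name[downstream].setdefault("depends_on", [])
    if upstream not in deps:  # dragging the same edge twice is a no-op
        deps.append(upstream)
```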

Interactive DAG Features

| Feature | How |
|---|---|
| Pan | Click and drag on empty canvas |
| Zoom | Mouse wheel or pinch |
| MiniMap | Overview navigation in corner |
| Select | Click nodes, Shift+click for multi-select |

Scheduling

Use standard cron expressions with timezone support:

```yaml
schedule:
  cron: "0 6 * * *"  # Daily at 6 AM New York time
  timezone: "America/New_York"
```

Common Patterns:

| Pattern | When it runs |
|---|---|
| `0 * * * *` | Every hour |
| `0 6 * * *` | Daily at 6 AM |
| `0 6 * * 1` | Every Monday at 6 AM |
| `0 6 1 * *` | First day of month at 6 AM |
| `*/15 * * * *` | Every 15 minutes |

Format: minute hour day-of-month month day-of-week
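To see how the five fields and the timezone interact, here is a brute-force sketch that finds the next matching minute for a cron expression. It is not Lakeflow's scheduler: it supports `*`, single values, ranges, lists, and `/` steps, follows the classic cron rule that a day matches if either day field matches when both are restricted, and does not handle daylight-saving gaps or overlaps. `next_run` and its helpers are hypothetical names.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import List, Set, Tuple

# (low, high) bounds for: minute, hour, day-of-month, month, day-of-week
FIELD_BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def parse_field(field: str, low: int, high: int) -> Set[int]:
    """Expand one field (*, n, a-b, */s, a-b/s, comma lists) into a set of ints."""
    values: Set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            end = high if step_text else start  # "5/15" means 5-max/15
        if step < 1 or not low <= start <= end <= high:
            raise ValueError(f"invalid cron field {field!r}")
        values.update(range(start, end + 1, step))
    return values


def parse_cron(expr: str) -> Tuple[List[Set[int]], bool, bool]:
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day-of-month month day-of-week")
    sets = [parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_BOUNDS)]
    if 7 in sets[4]:
        sets[4] = (sets[4] - {7}) | {0}  # 7 is an alias for Sunday
    # A day field starting with "*" counts as unrestricted for the OR rule.
    return sets, not fields[2].startswith("*"), not fields[4].startswith("*")


def matches(parsed: Tuple[List[Set[int]], bool, bool], moment: datetime) -> bool:
    (minutes, hours, days, months, weekdays), dom_restricted, dow_restricted = parsed
    cron_weekday = (moment.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    dom_ok, dow_ok = moment.day in days, cron_weekday in weekdays
    day_ok = (dom_ok or dow_ok) if (dom_restricted and dow_restricted) else (dom_ok and dow_ok)
    return moment.minute in minutes and moment.hour in hours and moment.month in months and day_ok


def next_run(expr: str, after: datetime, tz: tzinfo) -> datetime:
    """First whole minute strictly after timezone-aware `after` that matches, in `tz`."""
    parsed = parse_cron(expr)
    moment = after.astimezone(tz).replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):  # scan at most one year, minute by minute
        if matches(parsed, moment):
            return moment
        moment += timedelta(minutes=1)
    raise ValueError(f"{expr!r} has no run time within a year")
```

To mirror a `timezone` setting such as America/New_York, pass `zoneinfo.ZoneInfo("America/New_York")` (Python 3.9+) as `tz`; some platforms need the `tzdata` package for that to work.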

Real-Time Execution Monitoring

When a job runs, the DAG comes alive:

| Status | Appearance |
|---|---|
| Pending | Gray nodes |
| Running | Blue nodes with pulse animation |
| Completed | Green nodes |
| Failed | Red nodes |
| Skipped | Gray nodes with reduced opacity |

Edge animations show data flow direction. It's oddly satisfying to watch.

Folder Organization

Keep your pipelines and jobs organized:

  1. Click + next to "Folders" in the sidebar
  2. Configure:
    • Name: URL-safe slug (sales-etl)
    • Display Name: Human-readable (Sales ETL Pipelines)
    • Color: Visual identifier
    • Icon: Choose from Lucide icons

Move items via drag-and-drop, context menu, or bulk actions.

Git Folders

Folders can be Git-enabled for version control:

  1. Create a folder or select an existing one
  2. Click Enable Git in folder settings
  3. Optionally connect to a remote repository

Git Folders provide:

  • Version history for all pipelines and jobs inside
  • Branch management for safe experimentation
  • Remote sync with GitHub, GitLab, Bitbucket
  • CI/CD integration for automated deployments

See Git Folders documentation for complete details.

Workspace Integration

Lakeflow integrates with QRY Workspaces for team collaboration:

  • Personal: Visible only to you
  • Workspace: Shared with team members

Permissions follow the usual pattern: View, Execute, Edit, Admin.
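The four permission levels read naturally as a ladder. Assuming they are cumulative (each level includes the ones below it), an access check reduces to an integer comparison. The action-to-level mapping below is hypothetical, except for deploy, which the Troubleshooting section ties to edit permissions.

```python
from enum import IntEnum


class Permission(IntEnum):
    """Assumed cumulative: each level includes everything below it."""

    VIEW = 1
    EXECUTE = 2
    EDIT = 3
    ADMIN = 4


# Hypothetical mapping from UI actions to the minimum level they need.
REQUIRED_LEVEL = {
    "open": Permission.VIEW,
    "run": Permission.EXECUTE,
    "save": Permission.EDIT,
    "deploy": Permission.EDIT,  # Troubleshooting: deploying needs edit permissions
}


def is_allowed(granted: Permission, action: str) -> bool:
    """True if the granted level meets the action's minimum level."""
    return granted >= REQUIRED_LEVEL[action]
```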

AI-Assisted YAML Generation

Open the AI Assistant from the toolbar and describe what you want:

"Create a pipeline that aggregates daily sales by product category
from the transactions table in BigQuery and stores results in
the analytics schema"

The AI generates the YAML. Click Apply. Done.

Example Prompts:

Pipeline:

Create a pipeline to deduplicate customer records from the raw_customers
table based on email, keeping the most recent entry

Job:

Build a job that runs every Monday at 9 AM to:
1. Refresh the weekly sales pipeline
2. Generate an AI summary of sales trends
3. Email the report to the sales team

Best Practices

Pipeline Design

  1. Single Responsibility: One pipeline, one purpose
  2. Idempotency: Design to be safely re-runnable
  3. Data Quality Checks: Use expectations to catch issues early
  4. Documentation: Future you will thank present you

Job Orchestration

  1. Modular Tasks: Break complex workflows into discrete steps
  2. Realistic Timeouts: Don't guess, measure
  3. Retry Configuration: Handle transient failures gracefully
  4. Notifications: Alert on failures before your users do
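Lakeflow's own retry settings are not shown on this page. For code you control, such as a Python task calling a flaky external service, a common pattern is capped exponential backoff with jitter. `call_with_retries` is a self-contained sketch of that pattern, not a Lakeflow API; the default delays and retried exception types are assumptions to tune.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient errors with capped exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts):
        try:
            return fn()
        except retry_on:
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(delay / 2, delay))  # jitter avoids synchronized retries
    return fn()  # final attempt: any error propagates to the caller
```

Only exception types listed in `retry_on` are retried, so a genuine bug such as a `KeyError` fails fast instead of being retried into a timeout.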

Naming Conventions

Pipelines: {domain}_{action}_{frequency}
e.g., sales_aggregate_daily

Jobs: {domain}_{workflow}_{frequency}
e.g., analytics_reporting_weekly

Folders: {domain}-{category}
e.g., sales-etl, finance-reports
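These conventions are regular enough to check automatically, for example in a CI step. A sketch, assuming names are lowercase ASCII and that the frequency segment comes from a fixed vocabulary (`FREQUENCIES` is an assumption; adjust it to your team's list). Note that the sales_daily_summary example earlier on this page puts the frequency in the middle, so this stricter check would flag it.

```python
import re
from typing import List

# Assumed frequency vocabulary; extend to match your team's conventions.
FREQUENCIES = {"hourly", "daily", "weekly", "monthly"}
ITEM_NAME = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+){2,}")  # {domain}_{action}_{frequency}
FOLDER_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)+")  # {domain}-{category}


def check_item_name(name: str) -> List[str]:
    """Problems with a pipeline or job name; an empty list means it follows the convention."""
    if not ITEM_NAME.fullmatch(name):
        return ["use lowercase snake_case with at least three parts"]
    frequency = name.rsplit("_", 1)[1]
    if frequency not in FREQUENCIES:
        return [f"end with a frequency such as {', '.join(sorted(FREQUENCIES))}"]
    return []


def check_folder_name(name: str) -> List[str]:
    """Problems with a folder slug such as sales-etl."""
    return [] if FOLDER_NAME.fullmatch(name) else ["use lowercase {domain}-{category}"]
```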

Scheduling Strategy

  • Off-peak hours: Schedule heavy jobs during low-usage times
  • Dependency chains: Stagger dependent jobs appropriately
  • Timezone awareness: Consider your team's working hours
  • Buffer time: Allow gaps between scheduled jobs

Troubleshooting

Pipeline Won't Deploy

  • Validate the YAML configuration
  • Check for syntax errors in SQL queries
  • Verify datasource connections exist
  • Ensure you have edit permissions

Job Not Running on Schedule

  • Verify job status is Active (not Draft or Paused)
  • Check cron expression syntax
  • Verify timezone setting
  • Review scheduler service logs

Task Stuck in Running

  • Check task timeout settings
  • Review underlying query/script performance
  • Cancel the run and investigate
  • Check resource limits (memory, CPU)

API Reference

For programmatic access:

```
# List pipelines
GET /api/lakeflow/pipelines

# Create pipeline
POST /api/lakeflow/pipelines
Content-Type: application/json

{
  "definition": "<yaml>",
  "format": "yaml"
}

# Run job
POST /api/lakeflow/jobs/{id}/run

# Stream run progress (SSE)
GET /api/lakeflow/job-runs/{run_id}/stream
```

See the API Reference for complete documentation.
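Here is a standard-library Python sketch of two of the calls above. The host `https://qry.example.com`, the bearer-token header, and the shape of the streamed events are assumptions, so consult the API Reference for the real authentication scheme and payloads. `parse_sse` follows the Server-Sent Events framing rules (`field: value` lines, events terminated by a blank line, `:` comment lines) but ignores the `id` and `retry` fields.

```python
import json
import urllib.request
from typing import Dict, Iterable, Iterator, List

BASE_URL = "https://qry.example.com"  # hypothetical; replace with your QRY host


def create_pipeline_request(yaml_text: str, token: str) -> urllib.request.Request:
    """Build (without sending) POST /api/lakeflow/pipelines."""
    body = json.dumps({"definition": yaml_text, "format": "yaml"}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/lakeflow/pipelines",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield {'event': ..., 'data': ...} for each complete Server-Sent Event."""
    event_type, data_lines = "message", []  # type: str, List[str]
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line dispatches the buffered event
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value or "message"
    # An event not followed by a blank line is incomplete and is discarded.


def stream_run(run_id: str, token: str) -> Iterator[Dict[str, str]]:
    """Follow GET /api/lakeflow/job-runs/{run_id}/stream until the server closes it."""
    request = urllib.request.Request(
        f"{BASE_URL}/api/lakeflow/job-runs/{run_id}/stream",
        headers={"Accept": "text/event-stream", "Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        yield from parse_sse(line.decode("utf-8") for line in response)
```

Sending the built request is a matter of `urllib.request.urlopen(create_pipeline_request(...))`; keeping construction separate makes the request easy to inspect in tests.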

See Also

  • Git Folders - Version control for pipelines and jobs
  • DataFlow - AI-native ETL for simpler migrations
  • Forge - LLM-driven database migration platform (Teradata/Oracle/Cloudera → BigQuery)
  • Scheduled Tasks - For scheduling conversations and reports
  • Notebooks - Reusable analysis workflows
  • Workspaces - Team collaboration

Last updated: April 2026

QRY, a product of IXEN.