Skip to main content

DataFlow: AI-Native ETL/ELT

Ever tried to move data between databases and ended up writing more boilerplate than actual logic? Or watched a 3-hour ETL job crash at 97% with a cryptic error that basically says "good luck figuring this out"?

Meet DataFlow - where "copy this table from Oracle to BigQuery" is literally all you need to say.

The Magic: Data Pipeline as Conversation

Traditional ETL tools make you:

  1. Write complex configuration files
  2. Define schemas manually
  3. Handle errors with blind retries
  4. Pray nothing breaks at 3 AM

DataFlow flips the script:

  1. You describe what you want in plain English
  2. QRY figures out the optimal strategy based on your data
  3. If something breaks, the AI actually understands why and fixes it
  4. You sleep soundly while your data moves itself

How It Works

Just Ask

You: I need to copy the sales_transactions table from Oracle to BigQuery.
It's about 50GB.

QRY: I'll help you move the sales_transactions table. Let me analyze
the source and plan the optimal strategy.

Here's what I found:
- Source: oracle.sales.sales_transactions
- Estimated size: 47.3 GB (312 million rows)
- Strategy: Distributed processing (8 parallel workers)
- Estimated time: 25-35 minutes

Type mappings:
- TRANSACTION_ID (NUMBER) → INT64
- AMOUNT (NUMBER(18,2)) → NUMERIC
- TRANSACTION_DATE (DATE) → DATE
- CUSTOMER_NAME (VARCHAR2) → STRING

Should I proceed?

You: Yes, and add a column for when this data was loaded.

QRY: Got it. Adding LOAD_TIMESTAMP with the current UTC time.

Pipeline started!

Progress:
████████░░░░░░░░░░░░ 42% | 132M rows | 19.8 GB
Current partition: 2024-03 to 2024-04
ETA: 18 minutes

Smart Type Conversions

DataFlow automatically handles type mappings between databases. Oracle NUMBER becomes BigQuery INT64 or NUMERIC. VARCHAR2 becomes STRING. CLOBs, BLOBs, dates, timestamps - all handled automatically.

No more manual schema definitions. No more type mismatch errors at runtime.

Auto-Healing When Things Go Wrong

Traditional ETL:

Error: Connection timeout at row 5,234,891
Action: Retry from beginning (again)
Time wasted: 2 hours

DataFlow:

Issue detected: Connection timeout during write

Analyzing... The target database is experiencing high load.
Action: Reducing batch size and switching to off-peak connection pool.
Resuming from checkpoint at row 5,234,891.

Progress continues...

The AI doesn't just retry blindly - it understands the context, diagnoses the problem, and adapts the strategy.

Key Features

Any-to-Any Database Transfers

Move data between any connected datasources:

  • OracleBigQuery
  • PostgreSQLSnowflake
  • MySQLCloudera
  • Any combination you have connected

Automatic Scaling

DataFlow picks the right execution mode for your data:

Data SizeHow It Runs
Small (< 100 MB)Instant, inline execution
Medium (100 MB - 10 GB)Background job
Large (10 GB - 100 GB)Distributed worker pool
Massive (> 100 GB)Spark cluster

You don't need to configure anything. Just describe what you want.

Schedule Recurring Pipelines

You: Can you run this every night at 2 AM?

QRY: Created scheduled pipeline:
- Schedule: Daily at 02:00 UTC
- Mode: Incremental (only new transactions)
- Filter: WHERE transaction_date >= yesterday
- Notifications: Email on completion/failure

The nightly job will only transfer new transactions
since the last run, making it much faster.

Add Transformations On-The-Fly

You: Copy the customers table, but combine first_name and last_name
into a full_name column, and filter out inactive customers.

QRY: I'll copy the customers table with these modifications:
- New column: full_name = first_name + ' ' + last_name
- Filter: WHERE status = 'active'

Proceeding...

Real-Time Progress Tracking

Check on your pipelines anytime:

You: How's that Oracle migration going?

QRY: Pipeline Status: oracle_migration_20241027

Progress: 67% complete
Processed: 3.2 GB of 4.8 GB
Rows: 45.2M of 67.5M
Speed: 125K rows/second
ETA: 5 minutes

No errors encountered.

Use Cases

Data Warehouse Loading

Move operational data from your ERP (Oracle, SAP) to your analytics warehouse (BigQuery, Snowflake) on a schedule.

Database Migration

Moving to a new platform? DataFlow handles the heavy lifting with smart type conversions and progress tracking.

Cross-Platform Analytics

Combine data from multiple sources for unified reporting without manual exports and imports.

Backup and Archival

Automatically archive old data to cheaper storage tiers while maintaining queryability.

Dev/Test Data Refresh

Keep development and test environments updated with production-like data (with optional anonymization).

Security & Compliance

  • RBAC Enforced: You can only move data you have permission to read/write
  • Audit Logging: Every pipeline execution is logged with user, source, target, and row counts
  • Encrypted Transit: All database connections use TLS
  • Credential Isolation: Workers use service accounts, not your personal credentials
  • Temporary Data Cleanup: Intermediate files are encrypted and auto-deleted

Getting Started

  1. Connect your datasources - Make sure both source and target databases are connected in QRY
  2. Start a conversation - Just describe what data you want to move and where
  3. Review the plan - QRY shows you estimated size, time, and type mappings
  4. Confirm and go - Watch the progress or go grab coffee

That's it. No YAML files. No Airflow DAGs. No Python scripts. Just conversation.

When to Use DataFlow vs Forge

Use CaseBest Choice
Simple table copyDataFlow - just describe what you want
Ad-hoc data movementDataFlow - conversational interface
Complex multi-step pipelinesLakeflow - visual DAG and YAML
Scheduled multi-task workflowsLakeflow - job orchestration
Data quality expectationsLakeflow - built-in expectations
AI + ETL + notifications in one workflowLakeflow - job with mixed task types

Think of DataFlow as your friendly conversational assistant for data movement, and Lakeflow as your enterprise-grade orchestration platform when you need more control.


DataFlow: Because "SELECT * FROM source INSERT INTO destination" should actually be that simple.

QRYA product of IXEN.