
Git Folders

Unified version control for your notebooks, pipelines, and jobs. Because "final_v3_REALLY_FINAL.qrynb" isn't a version control strategy.

Status

Production Ready - Full Git integration with local history, remote sync, and collaboration features

Overview

Git Folders brings professional version control to QRY. Instead of managing assets in isolation, organize them into Git-enabled folders that track every change, enable collaboration, and integrate with your existing GitHub, GitLab, or Bitbucket workflows.

Think of it as Databricks Git Folders, but designed specifically for QRY's notebooks, Forge pipelines, and jobs. Your data assets deserve the same version control discipline as your application code.

What you get:

  • Version history for every notebook, pipeline, and job
  • Branch management for experimenting safely
  • Remote sync with GitHub, GitLab, Bitbucket, and Azure DevOps
  • Conflict resolution when teammates edit the same files
  • CI/CD integration for automated deployments

How It Works

The Architecture

Git Folders uses a hybrid approach: your database remains the source of truth for fast access, while Git provides versioning and collaboration.

┌────────────────────────────────────────────────────┐
│                   QRY Workspace                    │
├────────────────────────────────────────────────────┤
│    📓 Notebooks    📊 Pipelines       ⚙️ Jobs      │
│          │               │               │         │
│          └───────────────┼───────────────┘         │
│                          ▼                         │
│                   ┌─────────────┐                  │
│                   │  Git Sync   │                  │
│                   │   Service   │                  │
│                   └──────┬──────┘                  │
│          ┌───────────────┼───────────────┐         │
│          ▼               ▼               ▼         │
│     Shadow Git      Remote Sync      CI/CD API     │
│       (Local)       (Push/Pull)     (Webhooks)     │
│          │               │                         │
│          │         ┌─────▼─────┐                   │
│          │         │ GitHub    │                   │
│          │         │ GitLab    │                   │
│          │         │ Bitbucket │                   │
│          │         └───────────┘                   │
└──────────┼─────────────────────────────────────────┘
           ▼
 Local Git Repository

Key principle: Changes are always saved to the database first (for speed), then synced to Git (for history and collaboration). This means you never lose work, even if Git sync fails.

Folder Structure

Git Folders organize your assets in a clean hierarchy:

📁 analytics-team/
│   Remote: github.com/acme/analytics
│
├── 📓 notebooks/
│   ├── sales-analysis.qrynb
│   └── weekly-report.qrynb
│
├── 📊 pipelines/
│   └── customer-etl.yaml
│
└── ⚙️ jobs/
    └── daily-refresh.yaml

Each folder can be:

  • Local only: Version history without remote sync
  • Connected to remote: Full push/pull capabilities

Getting Started

Creating a Git Folder

From scratch:

  1. Navigate to Forge or Notebooks
  2. Click + New Folder in the sidebar
  3. Choose Git Folder
  4. Configure:
    • Name: URL-safe slug (analytics-team)
    • Display Name: Human-readable (Analytics Team)
    • Workspace: Where it belongs
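The exact slug rules aren't documented here. A minimal sketch of how a display name could be turned into a URL-safe folder name, assuming only lowercase ASCII letters, digits, and hyphens are allowed (slugify is a hypothetical helper, not a QRY API):

```python
import re
import unicodedata


def slugify(display_name: str) -> str:
    """Derive a URL-safe folder name from a human-readable display name.

    Hypothetical rules: ASCII-fold accented letters, lowercase, and collapse
    every run of other characters into a single hyphen.
    """
    # Decompose accented characters (é -> e + combining accent), then drop non-ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", display_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace each run of non-alphanumeric characters with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower())
    # Trim hyphens left over from leading/trailing spaces or punctuation.
    return slug.strip("-")


print(slugify("Analytics Team"))  # analytics-team
```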

From existing repository:

  1. Click Clone Repository
  2. Enter the remote URL
  3. Authenticate (PAT, SSH, or OAuth)
  4. Select branch to clone
  5. QRY imports all compatible assets
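Which files count as "compatible" isn't spelled out above. The sketch below assumes the layout from Folder Structure: .qrynb files are notebooks, and YAML files are pipelines or jobs depending on whether they sit under pipelines/ or jobs/. classify_asset is a hypothetical helper used only for illustration.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def classify_asset(path: str) -> Optional[str]:
    """Return the asset type for a repository path, or None to skip the file."""
    p = PurePosixPath(path)
    if p.suffix == ".qrynb":
        return "notebook"
    if p.suffix in (".yaml", ".yml"):
        # Pipelines and jobs share the YAML format, so the parent directory decides.
        if "pipelines" in p.parts[:-1]:
            return "pipeline"
        if "jobs" in p.parts[:-1]:
            return "job"
    return None  # README.md, CI config, data files, etc. are assumed to be skipped


repo_files = [
    "notebooks/sales-analysis.qrynb",
    "pipelines/customer-etl.yaml",
    "jobs/daily-refresh.yaml",
    ".github/workflows/ci.yml",
    "README.md",
]

imported: Dict[str, str] = {}
for f in repo_files:
    kind = classify_asset(f)
    if kind is not None:
        imported[f] = kind
```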

Connecting to Remote

  1. Open folder settings (⚙️)
  2. Click Connect to Remote
  3. Select provider:
    • GitHub - Personal or organization repos
    • GitLab - Cloud or self-hosted
    • Bitbucket - Cloud or Server
    • Azure DevOps - Microsoft's platform
  4. Authenticate:
    • Personal Access Token (recommended)
    • SSH Key (for advanced users)
    • OAuth (for seamless browser auth)
  5. Choose or create repository
  6. Select default branch

Core Operations

Committing Changes

When you save a notebook, pipeline, or job, Git Folders tracks the change locally. To create a permanent version:

  1. Click Git in the folder toolbar
  2. Review changed files:
    ✚ notebooks/new-analysis.qrynb     (new file)
    ✎ notebooks/sales-analysis.qrynb (modified)
  3. Enter commit message
  4. Click Commit
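Conceptually, the changed-files list compares the last commit with the state saved in the database. A minimal sketch, assuming both snapshots are available as path-to-content dictionaries (a hypothetical representation, not QRY's storage format):

```python
from typing import Dict, List, Tuple


def changed_files(
    committed: Dict[str, str], working: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return (status, path) pairs, sorted by path, for files that differ."""
    changes = []
    for path in sorted(committed.keys() | working.keys()):
        if path not in committed:
            changes.append(("new file", path))
        elif path not in working:
            changes.append(("deleted", path))
        elif committed[path] != working[path]:
            changes.append(("modified", path))
        # Identical content on both sides is unchanged and not listed.
    return changes
```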

Commit message tips:

  • Be descriptive: "Add regional breakdown to sales analysis"
  • Not helpful: "Updates" or "Fixed stuff"

Viewing History

Every asset has complete version history:

  1. Open any notebook, pipeline, or job
  2. Click History (🕒)
  3. Browse commits:
    ● abc1234  Update visualization chart type
    │ John Doe • 2 hours ago
    │ [View] [Diff] [Restore]

    ● def5678 Add regional breakdown
    │ Jane Smith • 1 day ago

    ● 789abcd Initial version
    John Doe • 1 week ago

Comparing Versions (Diff)

See exactly what changed between versions:

  1. Click Diff on any commit
  2. Choose versions to compare
  3. View changes:
    Cell 3 (Python) - Modified
    - df.plot(kind='bar', x='region', y='total')
    + df.plot(kind='pie', y='total', labels=df['region'])
    + plt.title('Regional Distribution')

    Cell 5 (Prompt) - Added
    + Provide executive summary of regional performance

What gets diffed:

  • Notebooks: Cell-by-cell comparison
  • Pipelines: YAML configuration changes
  • Jobs: Task and schedule changes
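Cell-by-cell comparison works best when cells are matched by their stable id (see the .qrynb format below) rather than by position, so inserting one cell doesn't make every later cell look modified. A minimal sketch using Python's standard difflib, assuming cells are dicts with id, type, and content keys; it illustrates the idea and is not QRY's diff engine:

```python
import difflib
from typing import Dict, List


def diff_cells(old_cells: List[Dict], new_cells: List[Dict]) -> List[str]:
    """Cell-by-cell diff keyed on each cell's stable id."""
    old_by_id = {c["id"]: c for c in old_cells}
    new_by_id = {c["id"]: c for c in new_cells}
    # Walk cells in new-notebook order, then any cells that were removed.
    ordered_ids = [c["id"] for c in new_cells] + [
        c["id"] for c in old_cells if c["id"] not in new_by_id
    ]
    report = []
    for cell_id in ordered_ids:
        old, new = old_by_id.get(cell_id), new_by_id.get(cell_id)
        if old is None:
            report.append(f"{cell_id} ({new['type']}) - Added")
            report += ["+ " + line for line in new["content"].splitlines()]
        elif new is None:
            report.append(f"{cell_id} ({old['type']}) - Removed")
            report += ["- " + line for line in old["content"].splitlines()]
        elif old["content"] != new["content"]:
            report.append(f"{cell_id} ({new['type']}) - Modified")
            # ndiff prefixes lines with "- ", "+ ", "  " (same) or "? " (hints);
            # keep only the removed and added lines.
            report += [
                line
                for line in difflib.ndiff(
                    old["content"].splitlines(), new["content"].splitlines()
                )
                if line.startswith(("- ", "+ "))
            ]
    return report
```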

Restoring Previous Versions

Made a mistake? Roll back easily:

  1. Find the commit you want in History
  2. Click Restore
  3. Choose:
    • Restore (with backup): Creates backup of current state first
    • Restore (direct): Immediate rollback
  4. Your asset returns to that version

Important: Restoring creates a new state; it doesn't rewrite history. Your previous versions remain available.
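This "new state, not rewritten history" behavior can be modeled as appending a commit whose content equals the old version, similar in spirit to running git restore --source=<commit> on a file and committing the result. A minimal in-memory sketch; the AssetHistory class and its sequential commit IDs are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Commit:
    sha: str
    message: str
    content: str


@dataclass
class AssetHistory:
    commits: List[Commit] = field(default_factory=list)

    def commit(self, content: str, message: str) -> Commit:
        # Sequential fake IDs stand in for real Git SHAs in this sketch.
        c = Commit(sha=f"c{len(self.commits) + 1}", message=message, content=content)
        self.commits.append(c)
        return c

    def restore(self, sha: str) -> Commit:
        """Roll back by committing the old content again; nothing is deleted."""
        matches = [c for c in self.commits if c.sha == sha]
        if not matches:
            raise KeyError(f"unknown commit {sha}")
        target = matches[0]
        return self.commit(target.content, f"Restore {sha}: {target.message}")
```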

Remote Sync

Pushing Changes

Send your local commits to the remote repository:

  1. Make sure you have commits to push (check status)
  2. Click Push in the Git panel
  3. Select branch (usually main)
  4. Confirm push

Status indicators:

  • Synced: Local and remote match
  • Ahead by N: You have commits to push
  • Behind by N: Remote has commits to pull
  • Diverged: Both have different commits
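These labels follow from two counts: commits on your branch that the remote lacks, and commits on the remote that you lack. With plain Git, git rev-list --left-right --count HEAD...@{upstream} prints exactly these two numbers (ahead, then behind). A minimal sketch of the mapping; how QRY obtains the counts internally is an assumption:

```python
from typing import Tuple


def parse_rev_list_counts(output: str) -> Tuple[int, int]:
    """Parse 'git rev-list --left-right --count HEAD...@{upstream}' output."""
    ahead, behind = (int(n) for n in output.split())
    return ahead, behind


def sync_status(ahead: int, behind: int) -> str:
    """Map ahead/behind commit counts to the status labels in the Git panel."""
    if ahead and behind:
        return "Diverged"
    if ahead:
        return f"Ahead by {ahead}"
    if behind:
        return f"Behind by {behind}"
    return "Synced"
```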

Pulling Changes

Get the latest changes from your team:

  1. Click Pull in the Git panel
  2. If no conflicts: Changes merge automatically
  3. If conflicts: Conflict resolution UI opens

Pro tip: Pull frequently to avoid large merges.

Branch Operations

Work on features without affecting the main branch:

Create branch:

  1. Click branch dropdown (⚡ main)
  2. Click "New Branch"
  3. Name it (feature/new-analysis)
  4. Start working

Switch branches:

  1. Click branch dropdown
  2. Select target branch
  3. Your workspace updates

Merge branches:

  1. Switch to target branch (e.g., main)
  2. Click "Merge"
  3. Select source branch
  4. Resolve any conflicts
  5. Complete merge

Conflict Resolution

When you and a teammate edit the same file, Git Folders provides a visual merge interface:

┌─────────────────────────────────────────────────────┐
│  Merge Conflicts (2 files)                          │
├─────────────────────────────────────────────────────┤
│                                                     │
│  📓 notebooks/sales-analysis.qrynb                  │
│     ├─ Conflict: Both modified                      │
│     ├─ Your changes: Modified cell 3 (Python)       │
│     └─ Their changes: Different chart type          │
│                                                     │
│     [Accept Mine] [Accept Theirs] [Open Editor]     │
│                                                     │
│  ─────────────────────────────────────────────────  │
│                                                     │
│  📊 pipelines/customer-etl.yaml                     │
│     ├─ Conflict: Deleted remotely                   │
│     └─ Your version: Has uncommitted changes        │
│                                                     │
│     [Keep Mine] [Accept Deletion] [Rename & Keep]   │
│                                                     │
├─────────────────────────────────────────────────────┤
│  [Cancel]                     [Resolve All & Pull]  │
└─────────────────────────────────────────────────────┘

Resolution options:

  • Accept Mine / Keep Mine: Keep your version
  • Accept Theirs: Use the remote version
  • Accept Deletion: Remove the file to match the remote deletion
  • Open Editor: Manually merge changes
  • Rename & Keep: Keep both with different names
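Whether a file needs this dialog at all comes from a three-way comparison against the common ancestor (the merge base). A minimal sketch, assuming each version is passed as its content string, or None when the file doesn't exist on that side; the labels are illustrative, not QRY's internal names:

```python
from typing import Optional


def classify_file(
    base: Optional[str], ours: Optional[str], theirs: Optional[str]
) -> str:
    """Three-way comparison of one file; None means the file is absent."""
    if ours == theirs:
        return "clean"  # both sides agree (including both deleted)
    if ours == base:
        return "take theirs"  # only the remote changed it
    if theirs == base:
        return "keep mine"  # only you changed it
    # Both sides changed the file in different ways: a real conflict.
    if theirs is None:
        return "conflict: deleted remotely"
    if ours is None:
        return "conflict: deleted locally"
    return "conflict: both modified"
```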

File Formats

Git Folders uses human-readable formats that work well with Git:

Notebooks (.qrynb)

version: "1.0"
kind: notebook
metadata:
  name: "Sales Analysis"
  description: "Weekly sales performance analysis"
  tags: ["sales", "weekly"]

settings:
  datasource_id: "uuid-here"
  catalog: "analytics"
  schema: "sales"

cells:
  - id: "cell-1"
    type: markdown
    content: |
      # Weekly Sales Analysis
      This notebook analyzes sales trends...

  - id: "cell-2"
    type: sql
    name: "sales_data"
    content: |
      SELECT region, SUM(revenue) as total
      FROM sales
      GROUP BY region

  - id: "cell-3"
    type: python
    content: |
      df = sql['sales_data']
      df.plot(kind='bar', x='region', y='total')

Why YAML?

  • Human-readable diffs
  • Easy to review in PRs
  • Git-friendly (no binary blobs)
  • Portable between QRY instances
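Because each cell carries a stable id, a parsed notebook can be sanity-checked before it is committed. A minimal sketch, assuming the file has already been parsed into Python dicts (for example by a YAML library) and that the allowed cell types are markdown, sql, python, and prompt; that list is inferred from the examples on this page, not from a published schema:

```python
from typing import Any, Dict, List

# Hypothetical rule set inferred from the examples above.
CELL_TYPES = {"markdown", "sql", "python", "prompt"}


def validate_notebook(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a parsed .qrynb document."""
    errors = []
    if doc.get("kind") != "notebook":
        errors.append("kind must be 'notebook'")
    if "version" not in doc:
        errors.append("missing version")
    seen = set()
    for i, cell in enumerate(doc.get("cells", [])):
        cell_id = cell.get("id")
        if not cell_id:
            errors.append(f"cell {i}: missing id")
        elif cell_id in seen:
            # Duplicate ids would make cell-by-cell diffs ambiguous.
            errors.append(f"cell {i}: duplicate id {cell_id!r}")
        else:
            seen.add(cell_id)
        if cell.get("type") not in CELL_TYPES:
            errors.append(f"cell {i}: unknown type {cell.get('type')!r}")
        if not isinstance(cell.get("content"), str):
            errors.append(f"cell {i}: content must be a string")
    return errors
```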

Pipelines & Jobs (.yaml)

Already YAML-based - no conversion needed. See Lakeflow documentation for format details.

Integration with Lakeflow

Git Folders integrates seamlessly with Lakeflow's folder system:

Unified hierarchy:

📁 data-platform/         (Git Folder)
├── 📊 pipelines/
│   ├── raw-ingestion.yaml
│   └── customer-etl.yaml
├── ⚙️ jobs/
│   └── daily-workflow.yaml
└── 📓 notebooks/
    └── data-quality-check.qrynb

Benefits:

  • Single folder for related assets
  • Shared version history
  • Atomic commits across asset types
  • One remote repository

Integration with Notebooks

Git Folders works with the Notebook IDE:

From Notebook Editor:

  • Save: Changes tracked automatically
  • History: View notebook version history
  • Diff: Compare cell changes
  • Restore: Roll back to previous versions

In File Explorer:

  • Navigate Git Folders
  • See sync status indicators
  • Quick commit from context menu

CI/CD Integration

Automate deployments with Git webhooks and APIs.

Webhook Endpoints

Configure your Git provider to notify QRY:

POST /api/git/webhooks/{provider}

Supported events:

  • push - Deploy on merge to main
  • pull_request - Preview deployments
  • tag - Version-based releases
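When a webhook secret is configured, GitHub signs each delivery with an HMAC-SHA256 of the raw request body and sends it as X-Hub-Signature-256: sha256=<hex digest>; GitLab instead echoes the secret in an X-Gitlab-Token header. The sketch below verifies the GitHub signature and maps events to actions. Whether QRY's endpoint performs this check, and the action names, are assumptions. On GitHub a pushed tag also arrives as a push event whose ref starts with refs/tags/, which is how the sketch detects the tag case.

```python
import hashlib
import hmac
from typing import Optional


def verify_github_signature(secret: bytes, body: bytes, header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header ("sha256=<hex digest>")."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, header)


def route_event(event: str, payload: dict, main_branch: str = "main") -> Optional[str]:
    """Map a provider event to a hypothetical QRY action (None = ignore)."""
    if event == "push":
        ref = payload.get("ref", "")
        if ref.startswith("refs/tags/"):
            return "release"  # version-based release
        if ref == f"refs/heads/{main_branch}":
            return "deploy"  # merge to main
    if event == "pull_request":
        return "preview"
    return None
```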

API Access

Programmatic control for automation:

# List folders
GET /api/git/folders

# Trigger sync
POST /api/git/folders/{id}/sync

# Get status
GET /api/git/folders/{id}/status
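A minimal client sketch using only Python's standard library. The host name, the bearer-token header, and folder ID 42 are placeholders: this page documents the endpoint paths but not authentication or response bodies.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical values: substitute your QRY host and an API token.
BASE_URL = "https://qry.example.com"
TOKEN = "YOUR_API_TOKEN"


def git_api_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request to a Git Folders endpoint."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    # Bearer-token auth is an assumption, not documented behavior.
    req.add_header("Authorization", f"Bearer {TOKEN}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(req: urllib.request.Request) -> dict:
    """Send a request and decode a JSON response body."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


# Trigger a sync for folder 42, then check its status:
# send(git_api_request("POST", "/api/git/folders/42/sync", body={}))
# send(git_api_request("GET", "/api/git/folders/42/status"))
```

Building the request separately from sending it keeps the sketch testable without a live server.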

Production Folders

Mark folders as production for extra protection:

  • Read-only in UI: Changes only via CI/CD
  • Deployment logs: Track what deployed when
  • Rollback support: Quick revert to previous deploy
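The read-only rule can be pictured as a single permission check. A minimal sketch, assuming writes carry the acting identity and whether they came from the UI; the Actor and Folder types and can_write are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Actor:
    name: str
    is_service_account: bool  # e.g. the identity used by a CI/CD pipeline


@dataclass
class Folder:
    name: str
    production: bool


def can_write(folder: Folder, actor: Actor, via_ui: bool) -> bool:
    """Production folders are read-only in the UI and accept CI/CD writes only."""
    if not folder.production:
        return True
    return actor.is_service_account and not via_ui
```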

Best Practices

Folder Organization

DO:

  • ✅ Group related assets (pipeline + job + monitoring notebook)
  • ✅ Use meaningful folder names
  • ✅ Keep folders focused (not everything in one mega-folder)
  • ✅ Match team/project boundaries

DON'T:

  • ❌ One folder per asset (defeats the purpose)
  • ❌ Mix unrelated projects
  • ❌ Deeply nested hierarchies

Commit Hygiene

Good commits:

Add customer segmentation to weekly analysis

- New SQL cell for segment calculation
- Updated visualization with segment breakdown
- Added markdown documentation

Bad commits:

stuff
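The difference between these two examples can be checked automatically before committing. A minimal sketch; the list of vague subjects is hypothetical, and the 72-character subject limit is a common Git convention rather than a QRY rule:

```python
from typing import List

# Hypothetical list of subjects too vague to be useful.
VAGUE_SUBJECTS = {"stuff", "updates", "update", "fix", "fixed stuff", "wip", "changes"}


def lint_commit_message(message: str) -> List[str]:
    """Return warnings for a commit message; an empty list means it looks fine."""
    lines = message.strip().splitlines()
    if not lines:
        return ["message is empty"]
    subject = lines[0].strip()
    warnings = []
    if subject.lower().rstrip(".") in VAGUE_SUBJECTS:
        warnings.append(f"subject {subject!r} does not say what changed")
    if len(subject) > 72:
        warnings.append("subject is longer than 72 characters")
    if len(lines) > 1 and lines[1].strip():
        warnings.append("separate the subject from the body with a blank line")
    return warnings
```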

Branching Strategy

For most teams:

  1. main - Production-ready assets
  2. feature/* - New development
  3. Merge via pull request

For solo work:

  • Commit to main directly
  • Use branches for experiments

Sync Frequency

  • Push: After completing a logical unit of work
  • Pull: Start of each work session
  • Don't: Push every single save (too noisy)

Troubleshooting

"Sync failed"

Check:

  1. Remote credentials still valid?
  2. Network connectivity?
  3. Repository still exists?
  4. You have push permissions?

Fix:

  • Re-authenticate in folder settings
  • Check remote repository status
  • Verify your access level

"Conflict detected"

This is normal! It means you and a teammate edited the same file.

Resolution:

  1. Open conflict resolution UI
  2. Review both versions
  3. Choose which to keep (or merge manually)
  4. Complete the pull

"Large file warning"

Git keeps every version of every file, so large files bloat the repository and slow down clones. If you see this:

  • Check for accidentally committed data files
  • Use .gitignore for outputs
  • Consider Git LFS for large assets
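To find the culprits, scan the folder for oversized files. GitHub, for example, warns about files larger than 50 MiB and blocks files larger than 100 MiB; the 50 MiB default below is an assumption borrowed from that limit, not a QRY setting. The scan is split from the filtering so the filter can be reused on any (path, size) list.

```python
import os
from typing import Iterable, Iterator, List, Tuple

WARN_BYTES = 50 * 1024 * 1024  # 50 MiB


def walk_sizes(root: str) -> Iterator[Tuple[str, int]]:
    """Yield (path relative to root, size in bytes), skipping the .git directory."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # prune in place
        for name in filenames:
            full = os.path.join(dirpath, name)
            yield os.path.relpath(full, root), os.path.getsize(full)


def large_files(
    entries: Iterable[Tuple[str, int]], limit: int = WARN_BYTES
) -> List[Tuple[str, int]]:
    """Return entries larger than limit, biggest first."""
    return sorted((e for e in entries if e[1] > limit), key=lambda e: e[1], reverse=True)


# Usage: large_files(walk_sizes("/path/to/analytics-team"))
```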

"History not showing"

Check:

  1. Is this a Git-enabled folder?
  2. Have commits been made?
  3. Are you looking at the right branch?

Security

Credential Storage

  • All credentials encrypted at rest (AES-256)
  • Per-user encryption keys
  • OAuth tokens auto-refresh
  • SSH keys stored securely

Access Control

  • Git Folders inherit workspace permissions
  • Commits include author identity
  • Full audit log of all operations
  • Service accounts for CI/CD (no user impersonation)

Data Protection

  • Sensitive variables excluded from commits
  • Credential references (not values) in configs
  • Option to exclude outputs
  • .gitignore support
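Writing references instead of values can be done by scrubbing configs before they are serialized. A minimal sketch; the set of sensitive key names and the ${secret:...} reference syntax are hypothetical, chosen only to illustrate the idea:

```python
from typing import Any, Dict

# Hypothetical: keys treated as sensitive.
SENSITIVE_KEYS = {"password", "token", "secret", "api_key"}


def scrub(config: Any, path: str = "") -> Any:
    """Return a copy of config with sensitive values replaced by references."""
    if isinstance(config, dict):
        out: Dict[str, Any] = {}
        for key, value in config.items():
            child = f"{path}.{key}" if path else key
            if key.lower() in SENSITIVE_KEYS:
                out[key] = f"${{secret:{child}}}"  # a reference, never the value
            else:
                out[key] = scrub(value, child)
        return out
    if isinstance(config, list):
        return [scrub(item, f"{path}[{i}]") for i, item in enumerate(config)]
    return config  # scalars are copied as-is
```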

Combine Features

Git Folders work great with Notebooks for version-controlled analysis, Lakeflow for pipeline versioning, and Workspaces for team collaboration.

Enterprise Features

For sparse checkout, Git LFS, and advanced CI/CD integrations, contact your QRY administrator.

QRY

A product of IXEN.