Git Folders
A Git Folder in QRY is a folder of notebooks (and Lakeflow pipelines and jobs) backed by a real Git repository. You can push your work to GitHub / GitLab / Bitbucket, pull updates from teammates, and see a real diff between versions — without copying YAML around or losing inline cell history.
The implementation uses pygit2 with a custom PostgreSQL ODB backend: there's no filesystem dependency, so it works in Kubernetes without persistent volumes. Notebooks serialise to .qrynb, pipelines and jobs to .yaml with literal-block style for readable SQL diffs.
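To see why literal-block style matters for diffs, the sketch below serialises the same SQL two ways and compares the resulting line diffs. It is an illustration only: `as_quoted_scalar`, `as_literal_block`, and the `sql` key are hypothetical names, and the hand-rolled emitter stands in for whatever YAML serialiser QRY actually uses.

```python
import difflib
import json


def as_quoted_scalar(key: str, text: str) -> list[str]:
    """Serialise text as one double-quoted scalar (newlines become \\n escapes)."""
    return [f"{key}: {json.dumps(text)}"]


def as_literal_block(key: str, text: str, indent: int = 2) -> list[str]:
    """Serialise text as a YAML literal block (`|`), one SQL line per file line.

    Trailing-newline chomping is ignored here; this is not a full YAML emitter.
    """
    pad = " " * indent
    return [f"{key}: |"] + [pad + line for line in text.splitlines()]


def changed_lines(before: list[str], after: list[str]) -> list[str]:
    """Return only the added/removed lines of a unified diff."""
    lines = list(difflib.unified_diff(before, after, lineterm="", n=0))
    # Skip the two file-header lines, then drop the @@ hunk markers.
    return [line for line in lines[2:] if not line.startswith("@@")]


old_sql = "SELECT id, amount\nFROM orders\nWHERE amount > 100"
new_sql = "SELECT id, amount\nFROM orders\nWHERE amount > 250"

quoted_diff = changed_lines(
    as_quoted_scalar("sql", old_sql), as_quoted_scalar("sql", new_sql)
)
literal_diff = changed_lines(
    as_literal_block("sql", old_sql), as_literal_block("sql", new_sql)
)

print("\n".join(literal_diff))
```

With the literal block, the diff contains only the edited `WHERE` line. With the quoted scalar, the whole statement shows up as one changed line, which is much harder to review.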
Goal
By the end of this page you will have a Git Folder linked to a real remote, your notebook pushed, and a clear picture of how the diff and conflict workflow looks.
Prerequisites
- A repository on GitHub, GitLab, or Bitbucket you can push to.
- An OAuth integration enabled for your tenant, or a personal access token (PAT) with repo scope.
- A notebook (or pipeline / job) you want to track.
Steps
1. Create a Git Folder
In the Notebooks IDE, open the Source Control panel (bottom-left of the Explorer). Click Connect repository and pick your provider.
- OAuth: opens the provider's auth flow in a popup; grant access to the repo you want.
- PAT: paste a personal access token. The token is stored Fernet-encrypted under your tenant's JWT_SECRET_KEY.
Pick the repo and the branch (or create a new one). QRY creates a Git Folder mirroring the repo's notebook directory.
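As a rough picture of what "Fernet-encrypted under JWT_SECRET_KEY" could look like, here is a minimal sketch. It assumes the Fernet key is derived by hashing the secret with SHA-256, which is a guess made for illustration and not QRY's documented derivation. The helper names are hypothetical. The encrypt and decrypt helpers need the third-party `cryptography` package, while the key derivation uses only the standard library.

```python
import base64
import hashlib


def derive_fernet_key(secret: str) -> bytes:
    """Turn an arbitrary secret string into a Fernet-compatible key.

    Fernet expects 32 bytes, URL-safe base64-encoded. Using SHA-256 for the
    derivation is an assumption made for this sketch.
    """
    digest = hashlib.sha256(secret.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest)


def encrypt_pat(pat: str, secret: str) -> bytes:
    """Encrypt a personal access token (requires the `cryptography` package)."""
    from cryptography.fernet import Fernet  # third-party, imported lazily

    return Fernet(derive_fernet_key(secret)).encrypt(pat.encode("utf-8"))


def decrypt_pat(token: bytes, secret: str) -> str:
    """Decrypt a token produced by encrypt_pat with the same secret."""
    from cryptography.fernet import Fernet  # third-party, imported lazily

    return Fernet(derive_fernet_key(secret)).decrypt(token).decode("utf-8")


# Placeholder secret for illustration; never hard-code a real one.
key = derive_fernet_key("example-jwt-secret")
```

One practical consequence of a scheme like this: if the secret is rotated, previously stored tokens can no longer be decrypted and would have to be entered again.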
2. Add notebooks to the folder
Move existing notebooks into the Git Folder via drag-drop in the Explorer, or create new ones inside it. New notebooks are immediately tracked.
3. Stage and commit
In the Source Control panel you'll see modified notebooks. Click the + next to each to stage, write a commit message, and click Commit.
The diff view shows changed cells with cell-level granularity — added cells in green, removed in red, modified showing a side-by-side cell diff.
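The cell-level classification can be sketched as follows. The cell shape used here (an `id` and a `source` per cell) is an assumption made for illustration; the real .qrynb schema may differ, and `diff_cells` is a hypothetical helper, not part of QRY.

```python
def diff_cells(old: list[dict], new: list[dict]) -> dict[str, list[str]]:
    """Classify cells as added, removed, or modified by comparing cell ids."""
    old_by_id = {cell["id"]: cell for cell in old}
    new_by_id = {cell["id"]: cell for cell in new}
    return {
        # Present only in the new version (shown in green).
        "added": [cid for cid in new_by_id if cid not in old_by_id],
        # Present only in the old version (shown in red).
        "removed": [cid for cid in old_by_id if cid not in new_by_id],
        # Present in both, but the source text differs (side-by-side diff).
        "modified": [
            cid
            for cid in new_by_id
            if cid in old_by_id
            and new_by_id[cid]["source"] != old_by_id[cid]["source"]
        ],
    }


old_cells = [
    {"id": "c1", "source": "SELECT 1"},
    {"id": "c2", "source": "SELECT 2"},
    {"id": "c3", "source": "SELECT 3"},
]
new_cells = [
    {"id": "c1", "source": "SELECT 1"},
    {"id": "c2", "source": "SELECT 20"},
    {"id": "c4", "source": "SELECT 4"},
]

changes = diff_cells(old_cells, new_cells)
print(changes)
```

Keying on a stable cell id rather than on position means that reordering cells does not show up as a wave of unrelated modifications.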
4. Push and pull
Push uploads your local commits to the remote. Pull fetches the remote branch and merges it into your local copy.
When a teammate pushes a different version of a notebook you also edited, pull surfaces a cell-level conflict resolver: you pick yours, theirs, or both per conflicting cell, instead of resolving line-by-line text conflicts in raw .qrynb JSON.
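The idea behind a cell-level resolver can be sketched as a three-way comparison against the common ancestor. This is an illustration under assumptions: cells are taken to have an `id` and a `source`, and `find_conflicts` and `resolve` are hypothetical helpers, not QRY's API.

```python
from typing import Optional


def sources_by_id(cells: list[dict]) -> dict[str, str]:
    """Index cell source text by cell id."""
    return {cell["id"]: cell["source"] for cell in cells}


def find_conflicts(
    base: list[dict], ours: list[dict], theirs: list[dict]
) -> list[str]:
    """Return ids of cells that both sides changed, in different ways."""
    b, o, t = sources_by_id(base), sources_by_id(ours), sources_by_id(theirs)
    conflicts = []
    for cell_id in sorted(set(b) | set(o) | set(t)):
        # A missing cell is None, so deletions count as changes too.
        base_src, our_src, their_src = b.get(cell_id), o.get(cell_id), t.get(cell_id)
        changed_by_us = our_src != base_src
        changed_by_them = their_src != base_src
        if changed_by_us and changed_by_them and our_src != their_src:
            conflicts.append(cell_id)
    return conflicts


def resolve(
    our_src: Optional[str], their_src: Optional[str], choice: str
) -> list[str]:
    """Apply a per-cell choice: 'yours', 'theirs', or 'both'."""
    picks = {
        "yours": [our_src],
        "theirs": [their_src],
        "both": [our_src, their_src],
    }
    if choice not in picks:
        raise ValueError(f"unknown choice: {choice!r}")
    # Drop None so that picking a deleted side yields no cell.
    return [src for src in picks[choice] if src is not None]


base = [{"id": "c1", "source": "SELECT 1"}, {"id": "c2", "source": "SELECT 2"}]
ours = [{"id": "c1", "source": "SELECT 1"}, {"id": "c2", "source": "SELECT 2 -- mine"}]
theirs = [{"id": "c1", "source": "SELECT 10"}, {"id": "c2", "source": "SELECT 2 -- theirs"}]

conflicts = find_conflicts(base, ours, theirs)
kept = resolve("SELECT 2 -- mine", "SELECT 2 -- theirs", "both")
print(conflicts, kept)
```

In this example only `c2` is a conflict. `c1` was changed on one side only, so it can be merged without asking.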
What gets serialised
| Asset | Format | Why |
|---|---|---|
| Notebooks | .qrynb (JSON) | Cell-aware structure, model-per-cell, captured outputs |
| Pipelines | .yaml | Lakeflow pipeline DSL, literal-block style for SQL readability |
| Jobs | .yaml | Lakeflow job DAG; same readability rules |
Captured outputs (chart PNGs, table samples) are included by default, so whoever pulls the notebook can view and diff them in the QRY UI. If you don't want outputs in the repo, exclude them per folder in the Git Folder settings.
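The Git Folder setting is the supported way to exclude outputs. For a one-off cleanup commit, stripping them from the JSON could look like the sketch below. It assumes a .qrynb file has a top-level `cells` list whose entries carry an `outputs` list; that schema is a guess made for illustration, and `strip_outputs` is a hypothetical helper.

```python
import copy
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of the notebook with every cell's outputs emptied."""
    cleaned = copy.deepcopy(notebook)  # leave the caller's object untouched
    for cell in cleaned.get("cells", []):
        if "outputs" in cell:
            cell["outputs"] = []
    return cleaned


# Hypothetical notebook content, standing in for a parsed .qrynb file.
notebook = {
    "cells": [
        {"id": "c1", "source": "SELECT 1", "outputs": [{"type": "table", "rows": [[1]]}]},
        {"id": "c2", "source": "# notes"},
    ]
}

cleaned = strip_outputs(notebook)
size_before = len(json.dumps(notebook))
size_after = len(json.dumps(cleaned))
print(size_before, size_after)
```

Source text and cell ids are untouched, so the cell-level diff still lines up after the cleanup.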
What does NOT go into Git
- Datasource bindings — these are environment-specific. A notebook pulled into a different tenant has to be re-bound.
- Workspace assignment — same reasoning.
- Run history of scheduled notebooks — execution logs live in the scheduled-tasks system, not Git.
- Memory entries and domain context — separate stores, not file-based.
Common issues
Push fails with 403 Forbidden.
Your token / OAuth scope doesn't include write access. Re-authorise with repo scope (GitHub) or equivalent.
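For GitHub classic tokens, one way to check scope before re-authorising is to read the `X-OAuth-Scopes` response header that the GitHub REST API returns. The sketch below separates the network call from the parsing; the helper names are hypothetical. Fine-grained tokens and other providers may not report scopes this way, in which case the header is absent and the check returns False.

```python
import urllib.request
from typing import Optional


def parse_scopes(header_value: Optional[str]) -> set[str]:
    """Split a comma-separated scopes header into a set of scope names."""
    if not header_value:
        return set()
    return {scope.strip() for scope in header_value.split(",") if scope.strip()}


def has_repo_scope(header_value: Optional[str]) -> bool:
    """True if the token reports the full `repo` scope."""
    return "repo" in parse_scopes(header_value)


def fetch_scopes_header(token: str) -> Optional[str]:
    """Ask the GitHub API which scopes a classic token carries (network call)."""
    request = urllib.request.Request(
        "https://api.github.com/user",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.headers.get("X-OAuth-Scopes")


# Offline examples of the parsing step:
print(has_repo_scope("read:user, repo"))
print(has_repo_scope("read:user"))
```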
Pull surfaces a cell conflict on every commit.
Two people are editing the same cell. The cell-level resolver helps, but the better fix is to split the notebook so that each person works in different cells.
A .qrynb file looks unreadable in GitHub's web view.
It's JSON. The QRY diff view is the clean way to look at notebook changes; the raw file is for the machine.
OAuth fails with redirect_uri_mismatch.
Your tenant's Git OAuth client isn't configured with this tenant's URL as a callback. Ask an admin to update the OAuth app on the provider side.
Cell outputs balloon the repo size.
Disable output serialisation in the Git Folder settings, or strip outputs in a periodic cleanup commit.
See also
- Creating a notebook: make the notebook before tracking it.
- Cell types and models: what's preserved in .qrynb.
- Git Folders reference: full feature reference, including the PostgreSQL ODB implementation.