Show HN: Mandala – Automatically save, query and version Python computations

`mandala` is a framework I wrote to automate tracking ML experiments for my research. It differs from other experiment tracking tools by making persistence, query and versioning logic a generic part of the programming language itself, as opposed to an external logging tool you must learn and adapt to. The goal is to be able to write expressive computational code without thinking about persistence (like in an interactive session), and still have the full benefits of a versioned, queriable storage afterwards.

Surprisingly, it turns out that this vision can pretty much be achieved with two generic tools:

1. a memoization+versioning decorator, `@op`, which tracks inputs, outputs, code and runtime dependencies (other functions called, or global variables accessed) every time a function is called. Essentially, this makes function calls replace logging: if you want something saved, you write a function that returns it. Using (a lot of) hashing, `@op` ensures that the same version of the function is never executed twice on the same inputs.

Importantly, the decorator encourages/enforces composition. Before a call, `@op` functions wrap their inputs in special objects, `Ref`s, and return `Ref`s in turn. Furthermore, data structures can be made transparent to `@op`s, so that an `@op` can be called on a list of outputs of other `@op`s, or on an element of the output of another `@op`. This creates an expressive "web" of `@op` calls over time.

2. a data structure, `ComputationFrame`, can automatically organize any such web of `@op` calls into a high-level view, by grouping calls with a similar role into "operations", and their inputs/outputs into "variables". It can detect "imperative" patterns - like feedback loops, branching/merging, and grouping multiple results in a single object - and surface them in the graph.

`ComputationFrame`s are a "synthesis" of computation graphs and relational databases, and can be automatically "exported" as dataframes, where columns are variables and operations in the graph, and rows contain values and calls for (possibly partial) executions of the graph. The upshot is that you can query the relationships between any variables in a project in one line, even in the presence of very heterogeneous patterns in the graph.

I'm very excited about this project - which is still in an alpha version being actively developed - and especially about the `ComputationFrame` data structure. I'd love to hear the feedback of the HN community.

Colab quickstart: https://colab.research.google.com/github/amakelov/mandala/bl...

Blog post introducing `ComputationFrame`s (can be opened in Colab too): https://amakelov.github.io/mandala/blog/01_cf/

Docs: https://amakelov.github.io/mandala/


Comments URL: https://news.ycombinator.com/item?id=40940181

Points: 34

# Comments: 13

https://github.com/amakelov/mandala

Vytvořeno 22d | 12. 7. 2024 2:50:24


Chcete-li přidat komentář, přihlaste se