GraphZero
Zero-copy C++ graph engine for training PyTorch GNNs on graphs that do not fit in RAM.
Summary: GraphZero trains large PyTorch Graph Neural Networks by memory-mapping graph datasets directly from SSD instead of loading them into RAM. A custom C++20 engine uses nanobind to expose the mapped data to PyTorch as zero-copy NumPy arrays, which makes graph datasets of up to 50GB usable on consumer hardware.
What it does
GraphZero compiles graph data into optimized binary formats and memory-maps them with POSIX mmap, so the operating system pages data in from SSD on demand during training. The raw C++ pointers into the mapped region are exposed to PyTorch as zero-copy NumPy arrays. Neighbor sampling is multi-threaded with OpenMP and releases the Python GIL, so worker threads can read from disk in parallel and maximize I/O throughput.
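The idea can be illustrated with a minimal sketch in pure Python, using only the standard library. This is not GraphZero's code: the real engine is C++ with nanobind and OpenMP, and the binary layout, `write_csr`, and `MappedGraph` below are hypothetical names chosen for illustration. The sketch writes a small graph in CSR form, memory-maps the file, and samples neighbors through views that never copy the mapped bytes.

```python
import mmap
import os
import random
import struct
import tempfile

# Hypothetical on-disk layout (an assumption, not GraphZero's actual format):
#   header : two native-endian int64 values, num_nodes and num_edges
#   indptr : num_nodes + 1 int64 offsets into `indices`
#   indices: num_edges int64 neighbor ids
HEADER = struct.Struct("=qq")


def write_csr(path, adjacency):
    """Serialize {node: [neighbors]} (nodes numbered 0..n-1) into the layout above."""
    indptr, indices = [0], []
    for node in range(len(adjacency)):
        indices.extend(adjacency[node])
        indptr.append(len(indices))
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(adjacency), len(indices)))
        f.write(struct.pack(f"={len(indptr)}q", *indptr))
        f.write(struct.pack(f"={len(indices)}q", *indices))


class MappedGraph:
    """Read-only CSR graph backed by a memory-mapped file."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self.num_nodes, self.num_edges = HEADER.unpack_from(self._mm, 0)
        # cast() reinterprets the mapped bytes as int64 without copying them
        self._view = memoryview(self._mm)[HEADER.size:].cast("q")
        self.indptr = self._view[: self.num_nodes + 1]
        self.indices = self._view[self.num_nodes + 1:]

    def neighbors(self, node):
        """Return a zero-copy view of the neighbor ids of `node`."""
        return self.indices[self.indptr[node]: self.indptr[node + 1]]

    def sample_neighbors(self, node, k, rng):
        """Uniformly sample up to k distinct neighbors of `node`."""
        start, end = self.indptr[node], self.indptr[node + 1]
        if end - start <= k:
            return list(self.indices[start:end])
        # Only the k sampled positions are read from the mapping
        return [self.indices[i] for i in rng.sample(range(start, end), k)]

    def close(self):
        # Every view must be released before the mapping can be closed
        for view in (self.indptr, self.indices, self._view):
            view.release()
        self._mm.close()
        self._file.close()


adjacency = {0: [1, 2, 3], 1: [0], 2: [0, 3], 3: [0, 2]}
path = os.path.join(tempfile.mkdtemp(), "graph.csr")
write_csr(path, adjacency)

graph = MappedGraph(path)
print(graph.num_nodes, graph.num_edges)
print(list(graph.neighbors(2)))
print(graph.sample_neighbors(0, 2, random.Random(0)))
graph.close()
```

Because the views sit directly on the mapping, only the pages that are actually touched get read from disk. The same mapped buffer could be wrapped with `numpy.frombuffer` and then `torch.from_numpy`, both of which share memory rather than copy.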
Who it's for
It targets developers and researchers working with large-scale graph neural networks who face PyTorch out-of-memory crashes on standard hardware.
Why it matters
Massive graph datasets no longer have to fit in RAM. Streaming data directly from SSD makes it possible to train GNNs on large graphs without exhausting system memory and crashing.