Introducing GraphStorm 0.3: Native Support for Multi-Task Learning, New APIs, and Benchmark Study
GraphStorm 0.3: Advancing Enterprise Graph Machine Learning
GraphStorm is a low-code enterprise graph machine learning (GML) framework for building, training, and deploying graph ML solutions on complex, enterprise-scale graphs in a fraction of the usual development time. It lets users exploit the relationships and interactions encoded in a graph to solve problems such as fraud detection, recommendation, community detection, and search/retrieval.
Today, we are announcing GraphStorm 0.3. The key highlight of this release is native support for multi-task learning on graphs: users can define multiple training targets on different nodes and edges and train them within a single training loop, which makes it possible to build more sophisticated and comprehensive graph ML solutions.
GraphStorm 0.3 also introduces new APIs for customizing GraphStorm pipelines and components. With these APIs, a custom node classification training loop takes only 12 lines of code. To help users get started, our documentation includes Jupyter notebook examples for node classification and link prediction tasks.
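The notebooks mentioned above use GraphStorm's own APIs; the sketch below deliberately does not. It is a hedged, library-free illustration of the steps a node classification training loop performs, assuming a tiny hand-built graph (all names and data here are hypothetical), one message-passing step that averages each node's features with its neighbors' (in the spirit of a single GCN-style layer), and a logistic-regression head trained by full-batch gradient descent on the labeled nodes.

```python
import math

# Hypothetical toy graph: 6 nodes forming two communities joined by edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
FEATS = [[1.0, 0.2], [0.9, 0.1], [0.8, 0.4], [0.1, 0.9], [0.2, 1.0], [0.3, 0.8]]
LABELS = [0, 0, 0, 1, 1, 1]
TRAIN = [0, 1, 3, 4]  # nodes whose labels are visible during training
TEST = [2, 5]         # held-out nodes


def aggregate(edges, feats):
    """One message-passing step: mean of a node's own features and its neighbors'."""
    n = len(feats)
    nbrs = [[i] for i in range(n)]  # start with a self-loop
    for u, v in edges:              # treat edges as undirected
        nbrs[u].append(v)
        nbrs[v].append(u)
    dim = len(feats[0])
    return [[sum(feats[j][d] for j in nbrs[i]) / len(nbrs[i]) for d in range(dim)]
            for i in range(n)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_node_classifier(h, labels, train_idx, epochs=500, lr=0.5):
    """Logistic-regression head fitted with full-batch gradient descent on training nodes."""
    w = [0.0] * len(h[0])
    b = 0.0
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for i in train_idx:
            p = sigmoid(sum(wd * xd for wd, xd in zip(w, h[i])) + b)
            err = p - labels[i]  # gradient of binary cross-entropy w.r.t. the logit
            for d in range(len(w)):
                gw[d] += err * h[i][d]
            gb += err
        w = [wd - lr * g / len(train_idx) for wd, g in zip(w, gw)]
        b -= lr * gb / len(train_idx)
    return w, b


def predict(h, w, b, idx):
    """Hard 0/1 predictions for the given node indices."""
    return [int(sigmoid(sum(wd * xd for wd, xd in zip(w, h[i])) + b) > 0.5) for i in idx]


h = aggregate(EDGES, FEATS)
w, b = train_node_classifier(h, LABELS, TRAIN)
print("test predictions:", predict(h, w, b, TEST))
```

Node 2 is never seen with a label, yet its aggregated features are pulled toward its own community, so the trained head assigns it to class 0. A framework such as GraphStorm packages steps like data loading, neighbor sampling, model construction, and evaluation so that users write far less of this by hand.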
We have also released a comprehensive study on co-training language models (LMs) and graph neural networks (GNNs) on large graphs with rich text features, using the Microsoft Academic Graph (MAG) dataset. The study demonstrates GraphStorm's performance and scalability on text-rich graphs and summarizes best practices for configuring GML training loops for both model performance and training efficiency.
Many enterprise applications involve graph data with several prediction tasks attached to different nodes and edges; a recommendation graph, for example, might need both link prediction between users and items and classification of item nodes. GraphStorm 0.3 supports multi-task learning across a range of common graph tasks: node classification, node regression, edge classification, edge regression, link prediction, and node feature reconstruction. Users declare the tasks and their training targets in a YAML configuration file, which keeps modeling such applications straightforward.
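Conceptually, a multi-task model shares one graph encoder across tasks and combines the per-task losses into a single training objective. The sketch below assumes the common choice of a weighted sum of losses; the `Task` fields (`name`, `target`, `weight`) and the example targets are hypothetical and only loosely mirror what a YAML task entry declares, not GraphStorm's actual configuration schema.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    """Hypothetical task spec, loosely mirroring one entry in a multi-task config."""
    name: str      # task identifier, e.g. "node_classification"
    target: str    # node type or edge type the task is attached to
    weight: float  # relative importance of this task in the combined loss


def cross_entropy(probs: List[float], label: int) -> float:
    """Loss for a classification task, given predicted class probabilities."""
    return -math.log(probs[label])


def mse(preds: List[float], targets: List[float]) -> float:
    """Mean squared error for a regression task."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def combined_loss(tasks: List[Task], task_losses: Dict[str, float]) -> float:
    """Weighted sum of per-task losses. In an autodiff framework, backpropagating
    this one scalar would update the shared encoder and every task head together."""
    return sum(t.weight * task_losses[t.name] for t in tasks)


tasks = [
    Task("node_classification", target="movie", weight=1.0),
    Task("edge_regression", target="user,rates,movie", weight=0.5),
]
losses = {
    "node_classification": cross_entropy([0.25, 0.5, 0.25], label=1),  # -ln(0.5)
    "edge_regression": mse([3.0, 4.0], [4.0, 4.0]),                    # 0.5
}
print(round(combined_loss(tasks, losses), 4))
```

The weights let a less important or noisier task contribute less to the shared encoder's updates; other combination schemes exist, and this sketch does not claim to reflect how GraphStorm balances tasks internally.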
To demonstrate GraphStorm's scalability, we benchmarked the framework on large synthetic graphs with billions of edges. The results show that GraphStorm handles both graph processing and model training efficiently across graphs of varying sizes, up to these massive datasets.
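A rough back-of-the-envelope calculation shows why graphs at this scale are demanding. The sketch assumes the graph is stored as a plain edge list (one source/destination ID pair per edge) with 64-bit node IDs, and it ignores node and edge features, reverse edges, and index structures, all of which would add to the total; the edge counts are illustrative and are not the benchmark's configurations.

```python
def edge_list_bytes(num_edges: int, id_bytes: int = 8) -> int:
    """Bytes needed for a COO edge list: one (src, dst) pair of node IDs per edge."""
    return num_edges * 2 * id_bytes


# Illustrative edge counts only; not the graph sizes used in the benchmark.
for num_edges in (10**9, 10**10, 10**11):
    gib = edge_list_bytes(num_edges) / 2**30
    print(f"{num_edges:>15,d} edges -> {gib:,.1f} GiB of raw edge IDs")
```

At one billion edges, the raw IDs alone take about 15 GiB, before any features are attached, which is why processing and training at this scale typically rely on partitioning the graph across machines.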
GraphStorm 0.3 adds native multi-task learning, new customization APIs, and guidance for training on text-rich graphs, giving users more flexibility to tackle complex graph ML tasks efficiently at enterprise scale. To learn more and get started with the framework, visit the GraphStorm GitHub repository and documentation.
About the Authors:
– Xiang Song: Senior Applied Scientist at AWS AI Research and Education (AIRE)
– Jian Zhang: Senior Applied Scientist specializing in machine learning techniques
– Florian Saupe: Principal Technical Product Manager at AWS AI/ML research
Stay tuned for more updates from the GraphStorm team as we continue to extend what is possible with enterprise-scale graph ML.