Feng Yao 姚烽
PhD student @ Northeastern University

I am a Ph.D. student in Computer Science at Northeastern University (China), supervised by Prof. Yanfeng Zhang, and a member of the iDC-NEU research group.

I’m interested in building distributed and parallel graph processing systems and in heterogeneous graph data management. I am also interested in vector databases.


Education
  • Northeastern University
    Ph.D. Student
    Sep. 2021 - present
  • Northeastern University
    M.S. in Computer Science
    Sep. 2018 - Jul. 2021
  • Changchun University of Science and Technology
    B.S. in Computer Science
    Sep. 2014 - Jul. 2018
Experience
  • Tongyi Lab
    Research Intern
    Mar. 2022 - Mar. 2024
Honors & Awards
  • Southern Manganese Industries Scholarship
    2024
  • Huawei Scholarship
    2023
Selected Publications
GeoLayer: Towards Low-Latency and Cost-Efficient Geo-Distributed Graph Stores with Layered Graph

Feng Yao, Xiaokang Yang, Shufeng Gong, Song Yu, Yanfeng Zhang, Ge Yu

IEEE International Conference on Data Engineering (ICDE) 2026

In this paper, we propose GeoLayer, a geo-distributed graph storage framework that jointly optimizes graph replica placement and pattern request routing. We first construct a latency-aware layered graph architecture that decomposes the graph topology into multiple layers to reduce the decision space and computational complexity of the optimization problem while mitigating the impact of network heterogeneity in geo-distributed environments. Building on the layered graph, we introduce an overlap-centric replica placement scheme that accommodates the diversity of graph pattern accesses, along with a directed heat diffusion model that captures heat conduction and superposition effects to guide data allocation. For request routing, we develop a stepwise layered routing strategy that progressively expands over the layered graph to retrieve the required data efficiently.
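As a rough illustration of "conduction" and "superposition", the sketch below implements a generic linear, diffusion-style propagation of heat on a directed graph; it is not GeoLayer's actual model. The function name `diffuse_heat`, the per-hop decay factor `alpha`, and the number of `steps` are hypothetical. Heat injected at accessed vertices is conducted along out-edges, and because every step is linear, heat arriving from several sources simply adds up.

```python
def diffuse_heat(
    out_edges: dict[str, list[str]],
    sources: dict[str, float],
    alpha: float = 0.5,
    steps: int = 2,
) -> dict[str, float]:
    """Hypothetical linear heat propagation on a directed graph.

    Each step, every vertex on the frontier conducts alpha times the heat it
    received in the previous step to its successors, split evenly. Heat that
    reaches a vertex from different sources or paths is summed (superposition).
    """
    total = dict(sources)     # accumulated heat per vertex
    frontier = dict(sources)  # heat that arrived in the previous step
    for _ in range(steps):
        arrived: dict[str, float] = {}
        for u, heat in frontier.items():
            succ = out_edges.get(u, [])
            if not succ:
                continue  # sink vertex: nothing to conduct further
            share = alpha * heat / len(succ)
            for v in succ:
                arrived[v] = arrived.get(v, 0.0) + share
        for v, heat in arrived.items():
            total[v] = total.get(v, 0.0) + heat
        frontier = arrived
    return total


graph = {"a": ["b", "c"], "b": ["c"], "c": []}
print(diffuse_heat(graph, {"a": 1.0}))  # {'a': 1.0, 'b': 0.25, 'c': 0.375}

# Superposition: diffusing two sources together equals summing separate runs.
print(diffuse_heat(graph, {"a": 1.0, "b": 1.0}))  # {'a': 1.0, 'b': 1.25, 'c': 0.875}
```

Under this assumption, vertices that end up hotter would be natural candidates for replication close to the regions that request them; the paper's own allocation rule may differ.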

GETL: An Extract-Transform-Load Framework Across Graph Models in Graph Warehouse

Feng Yao, Xiaokang Yang, Shufeng Gong, Qian Tao, Yanfeng Zhang, Wenyuan Yu, Ge Yu

IEEE Transactions on Knowledge and Data Engineering (TKDE) 2026

In this paper, we propose GETL, a generalized graph ETL framework that automatically identifies graph model schemas and performs seamless data conversion among RDF, RDF-star, the labeled property graph, and the relational model. This capability stems from GETL’s unified graph representation model, which is constructed from nested pairs and offers strong graph representation power and model compatibility. Additionally, we develop a unified programming interface, built on Gremlin syntax, that provides strong expressiveness for complex graph transformation tasks. Finally, our evaluation demonstrates that GETL outperforms state-of-the-art solutions in model conversion efficiency and data manipulation language (DML) intelligibility.
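One way to see why nested pairs can span these models: an RDF triple fits into two pairs, and because a pair may contain another pair, an RDF-star statement about a triple, or a property attached to a property-graph edge, nests naturally. The encoding below (`Pair`, `triple`, `untriple`, `lpg_edge_to_pairs`) is a hypothetical sketch, not GETL's actual representation.

```python
from dataclasses import dataclass
from typing import Union

# A term is either an atomic value (IRI, literal, vertex id) or another pair.
Term = Union[str, "Pair"]


@dataclass(frozen=True)
class Pair:
    left: Term
    right: Term


def triple(s: Term, p: str, o: Term) -> Pair:
    """Hypothetical encoding of (s, p, o) as the nested pair (s, (p, o))."""
    return Pair(s, Pair(p, o))


def untriple(t: Pair) -> tuple:
    """Inverse of triple(): recover (s, p, o)."""
    if not isinstance(t.right, Pair):
        raise ValueError("not a triple-shaped pair")
    return (t.left, t.right.left, t.right.right)


def lpg_edge_to_pairs(src: str, label: str, dst: str, props: dict[str, str]) -> list[Pair]:
    """Map a property-graph edge to one pair for the edge itself plus one pair
    per property whose subject is that edge (RDF-star style annotation)."""
    edge = triple(src, label, dst)
    return [edge] + [triple(edge, key, value) for key, value in sorted(props.items())]


pairs = lpg_edge_to_pairs("alice", "knows", "bob", {"since": "2020"})
print(untriple(pairs[1]))
# (Pair(left='alice', right=Pair(left='knows', right='bob')), 'since', '2020')
```

The frozen dataclass makes pairs hashable and comparable by value, so a quoted edge can be used as the subject of further statements and matched structurally.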

GoGraph: Accelerating Graph Processing through Incremental Reordering

Yijie Zhou, Shufeng Gong, Feng Yao, Hanzhang Chen, Song Yu, Pengxi Liu, Yanfeng Zhang, Ge Yu, Jeffrey Xu Yu

IEEE Transactions on Knowledge and Data Engineering (TKDE) 2025

In this work, we first establish a correlation between vertex processing order and the number of iterations, which opens an opportunity to reduce the number of iterations. We propose a metric function to evaluate how effectively a vertex processing order accelerates iterative computation. Leveraging this metric, we propose GoGraph, a novel graph reordering method that constructs an efficient vertex processing order. For evolving graphs, we further propose a metric function that evaluates the effectiveness of a vertex processing order in response to graph changes, and we provide three alternative methods for dynamically adjusting the vertex processing order.

GastCoCo: Graph Storage and Coroutine-Based Prefetch Co-Design for Dynamic Graph Processing

Hongfu Li, Qian Tao, Song Yu, Shufeng Gong, Yanfeng Zhang, Feng Yao, Wenyuan Yu, Ge Yu, Jingren Zhou

Proceedings of the International Conference on Very Large Data Bases (VLDB) 2025

Existing disaggregated databases couple concurrency control (CC) with either the execution layer or the storage layer, which limits the performance and elasticity of these systems. This paper proposes Concurrency Control as a Service (CCaaS), which decouples CC from the database to build a three-layer (execution, CC, storage) decoupled database that allows each layer to be scaled and upgraded independently, improving elasticity, resource utilization, and development agility.

RAGraph: A Region-Aware Framework for Geo-Distributed Graph Processing

Feng Yao, Qian Tao, Wenyuan Yu, Yanfeng Zhang, Shufeng Gong, Qiange Wang, Ge Yu, Jingren Zhou

Proceedings of the VLDB Endowment (PVLDB) 2024

In this paper, we propose RAGraph, a Region-Aware framework for geo-distributed graph processing. At the core of RAGraph is a region-aware graph processing model that lets otherwise inefficient global updates advance locally and enables sensible, coordination-free message interactions. RAGraph also includes an adaptive hierarchical message interaction engine that switches interaction modes based on network heterogeneity and fluctuation, and a discrepancy-aware message filtering strategy that filters for important messages.
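A common way to realize message filtering of this kind is a delta threshold: a sender forwards an update across regions only when it differs enough from the value it last sent for that vertex. The sketch below assumes that rule; `DiscrepancyFilter`, `should_send`, and `threshold` are hypothetical names, and RAGraph's actual discrepancy measure may differ.

```python
class DiscrepancyFilter:
    """Hypothetical delta-threshold filter for cross-region vertex updates."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.last_sent: dict[int, float] = {}  # vertex -> value last forwarded

    def should_send(self, vertex: int, value: float) -> bool:
        """Return True (and record the value) if this update should cross the WAN."""
        prev = self.last_sent.get(vertex)
        if prev is None or abs(value - prev) > self.threshold:
            self.last_sent[vertex] = value
            return True
        return False  # change is still small relative to what was sent; suppress it


f = DiscrepancyFilter(threshold=0.05)
updates = [(7, 1.00), (7, 1.02), (7, 1.10), (8, 0.50), (7, 1.12)]
sent = [(v, x) for v, x in updates if f.should_send(v, x)]
print(sent)  # [(7, 1.0), (7, 1.1), (8, 0.5)]
```

With this rule, the receiver's copy of each vertex value always stays within `threshold` of the sender's latest value, so the threshold trades WAN traffic against staleness.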

Towards Efficient Graph Processing in Geo-Distributed Data Centers

Feng Yao, Qian Tao, Shengyuan Lin, Yanfeng Zhang, Wenyuan Yu, Shufeng Gong, Qiange Wang, Ge Yu, Jingren Zhou

IEEE Transactions on Parallel and Distributed Systems (TPDS) 2024

This work investigates the problem of data placement for graph processing in geo-distributed data centers. The key idea is to migrate boundary vertices with relatively low contributions to algorithm convergence, thereby enabling the relocated boundary vertices to generate and propagate more influential messages and improving the utilization of scarce network resources. Specifically: (1) we introduce a vertex contribution metric that quantifies a vertex’s ability to generate and propagate influential messages, reflecting its contribution to algorithm convergence; (2) we propose a contribution-driven boundary migration algorithm that incorporates both contribution metrics and network heterogeneity to efficiently identify and migrate high-contribution vertices near boundaries; and (3) experimental results demonstrate that our algorithm achieves a 1.23× to 2.7× performance improvement and reduces WAN costs by 14.7% to 49.4% in geo-distributed graph processing systems.
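The sketch below makes the terms concrete under simplifying assumptions: a boundary vertex is one with at least one neighbor in another data center, and the number of cross-data-center edges serves as a proxy for WAN traffic. The migration step is a plain greedy heuristic (move a boundary vertex to the data center holding most of its neighbors when that strictly reduces its cross-DC edges), visited in order of a caller-supplied `contribution` score. It ignores network heterogeneity and is a stand-in for, not a reproduction of, the paper's contribution metric and algorithm; all function names are hypothetical.

```python
from collections import Counter

Edge = tuple[int, int]


def cross_dc_edges(edges: list[Edge], placement: dict[int, str]) -> int:
    """Edges whose endpoints sit in different data centers (a WAN-traffic proxy)."""
    return sum(1 for u, v in edges if placement[u] != placement[v])


def boundary_vertices(edges: list[Edge], placement: dict[int, str]) -> set[int]:
    """Vertices with at least one neighbor in another data center."""
    return {x for u, v in edges if placement[u] != placement[v] for x in (u, v)}


def greedy_boundary_migration(
    edges: list[Edge], placement: dict[int, str], contribution: dict[int, float]
) -> dict[int, str]:
    """Hypothetical heuristic: visit the initial boundary vertices by descending
    score and move each to the data center holding most of its neighbors, only
    if that strictly reduces the vertex's cross-DC edges."""
    placement = dict(placement)  # do not mutate the caller's placement
    neighbors: dict[int, list[int]] = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    order = sorted(boundary_vertices(edges, placement), key=lambda x: (-contribution[x], x))
    for x in order:
        counts = Counter(placement[n] for n in neighbors[x])
        best_dc, best_count = counts.most_common(1)[0]
        if best_count > counts[placement[x]]:
            placement[x] = best_dc
    return placement


edges = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3), (3, 4)]
placement = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}
score = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.9, 4: 0.5}
moved = greedy_boundary_migration(edges, placement, score)
print(cross_dc_edges(edges, placement), cross_dc_edges(edges, moved))  # 2 1
```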

Fast Iterative Graph Computing with Updated Neighbor States

Yijie Zhou, Shufeng Gong, Feng Yao, Hanzhang Chen, Song Yu, Pengxi Liu, Yanfeng Zhang, Ge Yu, Jeffrey Xu Yu

IEEE International Conference on Data Engineering (ICDE) 2024

In this paper, we propose GoGraph, a graph reordering method that constructs a well-formed vertex processing order, effectively reducing the number of iteration rounds and thereby accelerating iterative computation. Before presenting GoGraph, we introduce a metric function that quantifies how effectively a vertex processing order accelerates iterative computation. This metric reflects the quality of a processing order by counting the number of edges whose source precedes the destination. GoGraph adopts a divide-and-conquer strategy to build the vertex processing order by maximizing the value of this metric function.
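The metric described above is straightforward to state in code: given a processing order, count the edges (u, v) with u placed before v. In an in-place (Gauss-Seidel style) sweep, each such edge lets v read u's already-updated state within the same iteration. The brute-force search below only illustrates what "maximizing the metric" means on a toy graph; it is exponential in the number of vertices, unlike GoGraph's divide-and-conquer construction. The function names are hypothetical.

```python
from collections.abc import Hashable, Iterable, Sequence
from itertools import permutations

Edge = tuple[Hashable, Hashable]


def order_score(edges: Iterable[Edge], order: Sequence[Hashable]) -> int:
    """Count edges whose source precedes its destination in `order`."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(1 for u, v in edges if pos[u] < pos[v])


def best_order_bruteforce(edges: list[Edge], vertices: list[Hashable]) -> tuple:
    """Exhaustively find an order that maximizes order_score (toy graphs only)."""
    return max(permutations(vertices), key=lambda order: order_score(edges, order))


edges = [(0, 1), (1, 2), (2, 0), (0, 2)]
print(order_score(edges, [0, 1, 2]))           # 3
print(order_score(edges, [2, 1, 0]))           # 1
print(best_order_bruteforce(edges, [0, 1, 2]))  # (0, 1, 2)
```

Because the graph contains the cycle 0 → 1 → 2 → 0, no order can place every edge forward; the best order sacrifices exactly one cycle edge.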
