TY - GEN
T1 - Task scheduling using a block dependency DAG for block-oriented sparse Cholesky factorization
AU - Lee, Heejo
AU - Kim, Jong
AU - Hong, Sung Je
AU - Lee, Sunggu
N1 - Funding Information:
This research was supported in part by the Ministry of Education of Korea through its BK21 program toward the Electrical and Computer Engineering Division at POSTECH.
PY - 2000
Y1 - 2000
AB - Block-oriented sparse Cholesky factorization decomposes a sparse matrix into rectangular sub-blocks; each block can then be handled as a computational unit, which increases data reuse in a hierarchical memory system. The block-oriented method also increases the degree of concurrency while reducing communication volume, so it performs more efficiently on a distributed-memory multiprocessor system than the customary column-oriented factorization method. Until now, however, mappings of blocks to processors have been designed for load balance under restricted communication patterns. In this paper, we represent tasks using a block dependency DAG that captures the execution behavior of block sparse Cholesky factorization in a distributed-memory system. Because the characteristics of tasks in block Cholesky factorization differ from those of the conventional parallel task model, we propose a new task scheduling algorithm based on the block dependency DAG. The proposed algorithm consists of two stages: early-start clustering and affined cluster mapping. The early-start clustering stage clusters tasks while preserving the earliest start time of each task, without limiting parallelism. After task clustering, the affined cluster mapping stage allocates clusters to processors, considering both communication cost and load balance. Experimental results on the Fujitsu parallel system show that the proposed task scheduling approach outperforms other processor mapping methods.
KW - Block-oriented Cholesky factorization
KW - Directed acyclic graph
KW - Parallel sparse matrix factorization
KW - Task scheduling
UR - http://www.scopus.com/inward/record.url?scp=0012650251&partnerID=8YFLogxK
U2 - 10.1145/338407.338535
DO - 10.1145/338407.338535
M3 - Conference contribution
AN - SCOPUS:0012650251
SN - 1581132409
SN - 9781581132403
T3 - Proceedings of the ACM Symposium on Applied Computing
SP - 641
EP - 648
BT - Proceedings of the 2000 ACM Symposium on Applied Computing, SAC 2000
T2 - 2000 ACM Symposium on Applied Computing, SAC 2000
Y2 - 19 March 2000 through 21 March 2000
ER -