TY - GEN
T1 - Construction and performance characterization of parallel interior point solver on 4-way Intel Itanium 2 multiprocessor system
AU - Koka, P.
AU - Suh, T.
AU - Smelyanskiy, M.
AU - Grzeszczuk, R.
AU - Dulong, C.
PY - 2004
Y1 - 2004
N2 - In recent years the interior point method (IPM) has become a dominant choice for solving large convex optimization problems for many scientific, engineering and commercial applications. Two reasons for the success of the IPM are its good scalability on existing multiprocessor systems with a small number of processors and its potential to deliver scalable performance on systems with a large number of processors. The scalability of a parallel IPM is determined by several key issues such as exploiting parallelism due to sparsity of the problem, reducing communication overhead and proper load balancing. In this paper we present an implementation of a parallel linear programming IPM workload and characterize its scalability on a 4-way Itanium® 2 system. We show a speedup of up to 3 times for some of the datasets. We also present a detailed micro-architectural analysis of the workload using the VTune™ performance analyzer. Our results suggest that a good IPM implementation is latency-bound. Based on these findings, we make suggestions on how to improve the performance of the IPM workload in the future.
AB - In recent years the interior point method (IPM) has become a dominant choice for solving large convex optimization problems for many scientific, engineering and commercial applications. Two reasons for the success of the IPM are its good scalability on existing multiprocessor systems with a small number of processors and its potential to deliver scalable performance on systems with a large number of processors. The scalability of a parallel IPM is determined by several key issues such as exploiting parallelism due to sparsity of the problem, reducing communication overhead and proper load balancing. In this paper we present an implementation of a parallel linear programming IPM workload and characterize its scalability on a 4-way Itanium® 2 system. We show a speedup of up to 3 times for some of the datasets. We also present a detailed micro-architectural analysis of the workload using the VTune™ performance analyzer. Our results suggest that a good IPM implementation is latency-bound. Based on these findings, we make suggestions on how to improve the performance of the IPM workload in the future.
UR - http://www.scopus.com/inward/record.url?scp=19644379079&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=19644379079&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:19644379079
SN - 0780388283
SN - 9780780388284
T3 - Proceedings of the 2004 7th Annual IEEE International Workshop on Workload Characterization, WWC-7
SP - 73
EP - 80
BT - Proceedings of the 2004 7th Annual IEEE International Workshop on Workload Characterization, WWC-7
T2 - Proceedings of the 2004 7th Annual IEEE International Workshop on Workload Characterization, WWC-7
Y2 - 25 October 2004 through 25 October 2004
ER -