TY - JOUR
T1 - MARS
T2 - A multi-level array representation for simulation data
AU - Kim, Minsoo
AU - Suh, Ilhyun
AU - Chung, Yon Dohn
N1 - Funding Information:
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2017R1A2A2A05069318).
Minsoo Kim received his B.S. degree in Computer and Information Science from Korea University, Sejong, Korea, in 2015 and he is currently a Ph.D. candidate in the Department of Computer Science and Engineering at Korea University. His research interests include array database and distributed/parallel processing of large-scale data. Ilhyun Suh received his B.S. degree in Computer Science from Korea University, Seoul, Korea, in 2014, and he is currently a Ph.D. candidate in the Department of IT Convergence at Korea University. His research interests include array database, distributed/parallel processing of large-scale data and analytical processing. Yon Dohn Chung received his B.S. degree in Computer Science from Korea University, Seoul, Korea, in 1994. He received his M.S. and Ph.D. degrees in Computer Science from KAIST, Daejeon, Korea, in 1996 and 2000, respectively. He was an Assistant Professor in the Department of Computer Engineering at Dongguk University, Seoul, Korea, from 2003 to 2006. He joined the faculty of the Department of Computer Science and Engineering at Korea University, Seoul, Korea, in 2006, where he is currently a Professor. His research interests include array database, distributed/parallel processing of large-scale data, spatial databases, and data privacy.
Publisher Copyright:
© 2019 Elsevier B.V.
PY - 2020/10
Y1 - 2020/10
AB - In the numerical simulation domain, owing to the huge size of data and the complexity of implementing domain-specific applications, a database-centric approach for handling multidimensional simulation data is gaining considerable attention. Array databases provide an optimized set of features for managing multidimensional data; representing simulation data with an array can be an optimal choice. Generally, query performance on sparsely filled arrays, especially when empty cells are placed between adjacent elements, can be poor. In this context, previous studies focused on the compact representation of simulation data by reducing the number of empty cells between adjacent elements as much as possible. However, these methods inevitably lose the original spatial structure of elements (i.e., the relative distance and direction among elements), making it impossible to utilize the built-in multidimensional operators provided by array databases. In this paper, we propose MARS, a multi-level array representation for simulation data. MARS utilizes multiple level arrays with various resolutions to address both problems. In the MARS representation, elements tend to be concentrated into dense array regions, where each region is selectively stored in the level array that most reduces the empty cells between adjacent elements. Unlike existing methods, MARS retains the spatial structure of elements, and thus no additional effort to reorganize the original spatial structure for query processing is required. We built MARS on top of SciDB and implemented a specialized command-line tool for MARS. We present methods and optimized operators for query processing over MARS. We evaluate the performance of MARS using two real-world numerical simulation datasets.
KW - Array databases
KW - Query processing
KW - Scientific data
UR - http://www.scopus.com/inward/record.url?scp=85076227947&partnerID=8YFLogxK
U2 - 10.1016/j.future.2019.11.010
DO - 10.1016/j.future.2019.11.010
M3 - Article
AN - SCOPUS:85076227947
VL - 111
SP - 419
EP - 434
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
SN - 0167-739X
ER -