Efficient handling of lots of simulation data files

  • Roman Diviš,
  • Zdeněk Novotný
  • University of Pardubice, Studentská 95, Pardubice, 532 10, Czech Republic
Cite as
 Divis, R., and Novotny, Z. (2022). Efficient handling of lots of simulation data files. Proceedings of the 34th European Modeling & Simulation Symposium (EMSS 2022), 043. DOI: https://doi.org/10.46354/i3m.2022.emss.043

Abstract

Saving information to files is the simplest way to store data, so simple simulators and simulation tools often use it as the first choice for logging the simulation process and its results. Computer simulations often involve running a significant number of replications, which accumulates large numbers of files. Today's filesystems are still not capable of efficiently storing and processing millions of files. This paper presents alternatives that allow more efficient storage, transfer, and analysis of data, with an emphasis on easy migration or implementation starting from the initial data files. The approaches compared range from simple ones, such as TAR or ZIP archives, to more sophisticated ones, such as Parquet files and S3-like object storage (e.g., MinIO, OpenIO).
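
As an illustration of the simplest approach named in the abstract, the following minimal sketch packs many small per-replication result files into a single TAR archive and then reads them back without unpacking them to disk. It uses only the Python standard library. The function names, the `rep_NNN.csv` naming, and the CSV contents are assumptions made for this example and are not taken from the paper.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_results(src_dir: Path, archive_path: Path) -> int:
    """Pack every regular file under src_dir into one gzip-compressed TAR archive."""
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so the archive is relocatable.
                tar.add(path, arcname=path.relative_to(src_dir).as_posix())
                count += 1
    return count


def iter_results(archive_path: Path):
    """Yield (member name, content bytes) for each file, without extracting to disk."""
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if member.isfile():
                yield member.name, tar.extractfile(member).read()


def demo() -> dict:
    """Create a few hypothetical replication files, archive them, and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "replications"
        src.mkdir()
        for i in range(5):
            # Hypothetical per-replication output; real simulators will differ.
            (src / f"rep_{i:03d}.csv").write_bytes(f"time,value\n0,{i}\n".encode())
        archive = Path(tmp) / "replications.tar.gz"
        packed = pack_results(src, archive)
        contents = dict(iter_results(archive))
        return {
            "packed": packed,
            "names": sorted(contents),
            "first": contents["rep_000.csv"],
        }


if __name__ == "__main__":
    print(demo())
```

With this layout the filesystem holds one archive instead of one file per replication, which also simplifies transfer. The trade-off is that a gzip-compressed TAR has no index, so reaching a single member requires reading through the archive sequentially.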
