Usefulness of HIGH-LEVEL parallel compositions in genomics

  • Mario Rossainz-López 
  • Sarahí Zúñiga-Herrera 
  • Iván Olmos Pineda 
  • Ivo Pineda-Torres 
  • Manuel Capel-Tuñón 
  • a,b,c,d Faculty of Computer Science, Autonomous University of Puebla, San Claudio Avenue and South 14th Street, San Manuel, Puebla, Puebla, 72000, México
  • Software Engineering Department, College of Informatics and Telecommunications ETSIIT, University of Granada, Daniel Saucedo Aranda s/n, Granada 18071, Spain
Cite as
Rossainz-López M., Zúñiga-Herrera S., Olmos Pineda I., Pineda-Torres I., Capel-Tuñón M. (2020). Usefulness of HIGH-LEVEL parallel compositions in genomics. Proceedings of the 32nd European Modeling & Simulation Symposium (EMSS 2020), pp. 1-9. DOI: https://doi.org/10.46354/i3m.2020.emss.001

Abstract

This work shows the use of parallel objects to build High Level Parallel Compositions (HLPCs) and their usefulness in genomics through four case studies related to DNA sequencing. The first two case studies are combinatorial optimization problems: grouping fragments of DNA sequences, and the parallel exhaustive search (PES) of RNA strings, which supports the sequencing and assembly of DNA in the construction of genomes. The third case study implements a Convolutional Neural Network as a Parallel Object Composition to recognize DNA sequences from a database containing four types of hepatitis C virus (types 1, 2, 3 and 6). The classification results are reported as percentages of training precision and validation precision. The fourth and final case study addresses the sequence typing problem (STP) as a form of DNA sequence classification. It proposes a parallel solution for finding conserved regions of sequences that help discriminate between different types of hepatitis C virus, by building a decision tree with HLPCs. We present the algorithms that solve these problems using modeling and parallel simulation, their design and implementation as HLPCs, and the performance metrics of their parallel execution on multicore processors, a video accelerator card, and a CPU-SET, that is, processors with shared-distributed memory.
