Data-based herbs contamination prediction and harvest recommendation

  • Stefan Anlauf
  • Andreas Haghofer
  • Karl Dirnberger
  • Stephan Winkler
  • a,b,d FFoQSI GmbH, Technopark 1C, 3430, Tulln, Austria
  • a,b,d University of Applied Sciences Upper Austria, Bioinformatics, Softwarepark 11, 4232 Hagenberg, Austria
  • c Österreichische Bergkräutergenossenschaft, Thierberg 1, 4192 Hirschbach, Austria
  • d Johannes Kepler Universität, Computer Science, Altenberger Straße 69, 4040 Linz, Austria
Cite as
Anlauf S., Haghofer A., Affenzeller N., Winkler S. (2020). Data-based herbs contamination prediction and harvest recommendation. Proceedings of the 6th International Food Operations and Processing Simulation Workshop (FoodOPS 2020), pp. 1-6. DOI: https://doi.org/10.46354/i3m.2020.foodops.001

Abstract

The quality of freshly harvested herbs is heavily influenced by many factors, including weather conditions, harvesting, transport, drying, and storage. Our main goal is to identify models that predict spore contamination on different types of herbs from these factors, and to find optimal processing parameters that lead to lower contamination of herbs and lower costs for contamination prevention. The workflow presented here uses two approaches which, in combination, are intended to yield a reliable contamination prediction and prevention mechanism. For the prediction part, we learn ensembles of machine learning models that use the processing parameters as features to predict the risk of spore contamination before laboratory analysis data are available. Using tree-based modelling algorithms, we have already achieved a spore contamination prediction accuracy of 86.21% for nettle. In addition, we use descriptive statistics to identify the processing parameters that could be responsible for an observed contamination; for a few processing parameters, we already obtain p-values below 0.01. In future work, we want to extend this workflow by improving the modelling process with different modelling algorithms. Additionally, we are working on an online live system that combines these two methods, so that it not only tells a farmer whether contamination is likely, but also shows which processing parameters lead to contamination and how they should be adjusted to lower the risk.
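The prediction step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each batch is a list of numeric processing parameters with a label (1 = contaminated, 0 = not), and, as a simplified dependency-free stand-in for the tree-based ensembles, it bags depth-1 decision trees (stumps) trained on bootstrap resamples and combines them by majority vote. Accuracy is estimated with leave-one-out cross-validation, which is an assumed evaluation scheme here. All function names are hypothetical.

```python
import random
from collections import Counter


def majority(labels, default=0):
    """Most frequent label, or `default` if the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y):
    """Depth-1 decision tree: the feature/threshold split with the fewest training errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds lie halfway between consecutive distinct values.
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no feature varies in this sample: predict the majority label
        lab = majority(y)
        return (0, float("inf"), lab, lab)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_bagged_stumps(X, y, n_models=25, seed=0):
    """Bootstrap aggregation: each stump is trained on a resample drawn with replacement."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_ensemble(models, row):
    """Majority vote over all stumps (an odd n_models avoids ties for two classes)."""
    return Counter(predict_stump(m, row) for m in models).most_common(1)[0][0]


def loocv_accuracy(X, y, n_models=25):
    """Leave-one-out CV: train on n-1 batches, test on the held-out one, repeat n times."""
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        models = fit_bagged_stumps(X_train, y_train, n_models, seed=i)
        correct += predict_ensemble(models, X[i]) == y[i]
    return correct / len(X)
```

With scikit-learn available, `RandomForestClassifier` evaluated through `cross_val_score(..., cv=LeaveOneOut())` from `sklearn.model_selection` would be the more idiomatic choice; the hand-written stumps only keep this example free of third-party dependencies.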
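The statistical step, testing whether a single processing parameter differs between contaminated and uncontaminated batches, can be illustrated with a rank-based test. The abstract does not name the test, so this sketch assumes the two-sided Mann-Whitney U test, computed with the normal approximation, a tie correction, and no continuity correction. The function name is hypothetical.

```python
import math


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation with tie correction).

    Returns (U for sample `a`, two-sided p-value).
    """
    n1, n2 = len(a), len(b)
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0  # sum of t^3 - t over groups of t tied values
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share their average 1-based rank
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (min(u1, u2) - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p
```

For example, `mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])` gives U = 0 and a two-sided p-value of about 0.009. `scipy.stats.mannwhitneyu(a, b, alternative="two-sided")` implements the same test, but by default it may use an exact distribution or a continuity correction, so its p-values can differ slightly for small samples.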
