Identification of Similarities and Clusters of Bread Baking Recipes Based on Data of Ingredients

  • Stefan Anlauf
  • Melanie Lasslberger
  • Rudolf Grassmann
  • Johannes Himmelbauer
  • Stephan Winkler
  • a,b,e University of Applied Sciences Upper Austria, Bioinformatics, Softwarepark 11, 4232 Hagenberg, Austria
  • a,e Johannes Kepler Universität, Computer Science, Altenberger Straße 69, 4040 Linz, Austria
  • backaldrin International The Kornspitz Company GmbH, Kornspitzstraße 1, 4481 Asten, Austria
  • Software Competence Center Hagenberg, Softwarepark 32a, 4232 Hagenberg, Austria
Cite as
Anlauf S., Lasslberger M., Grassmann R., Himmelbauer J., and Winkler S. (2022). Identification of Similarities and Clusters of Bread Baking Recipes Based on Data of Ingredients. Proceedings of the 8th International Food Operations and Processing Simulation Workshop (FoodOPS 2022), 002. DOI: https://doi.org/10.46354/i3m.2022.foodops.002

Abstract

We define the similarity of bakery recipes and identify groups of similar recipes. Our analyses are based on the relative amounts of the ingredients in each recipe. We use three clustering algorithms, namely k-means, k-medoids, and hierarchical clustering, to find an optimal grouping of all recipes. In addition to standard similarity measures, we define a similarity measure based on the logarithm of the original data, which reduces the impact of raw materials that are used in large quantities.
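The abstract states the idea behind the log-based measure but not its exact formula. The following minimal sketch illustrates the intended effect under three assumptions: relative amounts are fractions of the total recipe mass, the logarithm is applied as log(1 + percentage) so that absent ingredients (amount 0) stay finite, and distances are Euclidean. `relative_amounts`, `log_distance`, and the example recipes are hypothetical and not taken from the paper.

```python
import math
from typing import Dict, List

def relative_amounts(recipe: Dict[str, float], ingredients: List[str]) -> List[float]:
    """Each ingredient as a fraction of the recipe's total mass (assumed normalization)."""
    total = sum(recipe.values())
    return [recipe.get(name, 0.0) / total for name in ingredients]

def euclidean(a: List[float], b: List[float]) -> float:
    """Plain Euclidean distance between two ingredient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def log_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance after a log(1 + percentage) transform (hypothetical form).

    The logarithm compresses large values, so differences in bulk ingredients
    such as flour or water count for less than in the raw data, while adding
    or omitting a minor ingredient counts for more.
    """
    log_a = [math.log1p(100.0 * x) for x in a]
    log_b = [math.log1p(100.0 * x) for x in b]
    return euclidean(log_a, log_b)

ingredients = ["flour", "water", "salt", "yeast", "seeds"]
base = relative_amounts({"flour": 1000, "water": 600, "salt": 20, "yeast": 20}, ingredients)
wetter = relative_amounts({"flour": 1000, "water": 700, "salt": 20, "yeast": 20}, ingredients)
seeded = relative_amounts(
    {"flour": 1000, "water": 600, "salt": 20, "yeast": 20, "seeds": 20}, ingredients
)

print(euclidean(base, wetter), euclidean(base, seeded))        # raw: base-wetter is larger
print(log_distance(base, wetter), log_distance(base, seeded))  # log: base-seeded is larger
```

On the raw fractions, 100 extra units of water move the recipe further than 20 units of added seeds. After the log transform the order flips, because introducing a new minor ingredient now weighs more than a shift in a bulk one.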
Clustering recipes by their ingredients can improve the search for similar recipes and thus support the time-consuming process of developing new recipes. Using the k-medoids method, we separate 1271 recipes into six clusters. We visualize our results with dendrograms that show the hierarchical separation of the recipes into groups and subgroups.
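The abstract gives no implementation details for the k-medoids or hierarchical steps. The following dependency-free sketch assumes a precomputed distance matrix (any measure, including a log-based one, can be passed as `dist`). It shows two components: a simple alternating k-medoids with the initialization of Park and Jun (2009), and agglomerative clustering with average linkage, whose merge sequence is what a dendrogram plots. The linkage criterion is an assumption. The function names and the 2-D toy points are hypothetical stand-ins for the ingredient vectors of the 1271 recipes.

```python
import math
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]

def distance_matrix(points: Sequence, dist: Callable) -> Matrix:
    """Pairwise distances between all recipe vectors."""
    n = len(points)
    return [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def k_medoids(d: Matrix, k: int, max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating k-medoids on a precomputed distance matrix.

    Returns the medoid indices and one cluster label per object.
    """
    n = len(d)
    # Park & Jun (2009) start: pick the k objects with the smallest
    # v_j = sum_i d_ij / sum_l d_il (the most "central" objects).
    row_sums = [sum(row) or 1.0 for row in d]
    scores = [sum(d[i][j] / row_sums[i] for i in range(n)) for j in range(n)]
    medoids = sorted(range(n), key=lambda j: scores[j])[:k]

    def assign(meds: List[int]) -> List[int]:
        # Each object joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: d[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # empty cluster (duplicate medoids): keep old medoid
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda m: sum(d[m][i] for i in members)))
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids, assign(medoids)

def average_linkage(d: Matrix) -> List[Tuple[Tuple[int, ...], Tuple[int, ...], float]]:
    """Agglomerative clustering with average linkage.

    Returns the merge sequence (cluster_a, cluster_b, height) that a dendrogram draws.
    """
    clusters = [(i,) for i in range(len(d))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                avg = sum(d[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        height, a, b = best
        merges.append((clusters[a], clusters[b], height))
        rest = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters = rest + [clusters[a] + clusters[b]]
    return merges

# Toy data: two well-separated groups of 2-D "recipes".
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
d = distance_matrix(points, math.dist)
medoids, labels = k_medoids(d, k=2)
merges = average_linkage(d)
for a, b, h in merges:
    print(a, "+", b, f"at height {h:.3f}")
```

Both functions are meant for illustration and scale poorly with the number of recipes. For a data set of this size, `scipy.cluster.hierarchy.linkage` with `method="average"` and `scipy.cluster.hierarchy.dendrogram` compute and plot the same kind of merge tree.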
