Machine Learning-Based Comparative Analysis of Dimensionality Reduction and Clustering Techniques: Evidence from MNIST and Rice Datasets

Authors

  • Imran Ullah, Department of Computer Science, Hazara University Mansehra, Dhodial, Pakistan
  • Maryam Javaid, Department of Computer Science, The University of Lahore, Sargodha Campus, Pakistan
  • Urooj Tariq, Department of Computer Science, Abbottabad University of Science and Technology, Abbottabad, Pakistan

Keywords:

Machine Learning, MNIST, Clustering, Expectation Maximization, Principal Component Analysis, Independent Component Analysis, Randomized Projections, Manifold Learning

Abstract

Unsupervised learning techniques play a vital role in discovering latent structures within high-dimensional data without relying on labelled information. This study presents a comparative evaluation of dimensionality reduction and clustering methods applied to the MNIST handwritten digit dataset and the Cammeo–Osmancik Rice dataset. Linear and non-linear dimensionality reduction techniques, including Principal Component Analysis (PCA), Independent Component Analysis (ICA), Randomized Projections (RP), and t-distributed Stochastic Neighbour Embedding (t-SNE), are analysed in combination with K-Means and Expectation Maximization (EM) clustering algorithms. Experimental results demonstrate that PCA and t-SNE preserve discriminative information more effectively than ICA and RP. On the MNIST dataset, classification accuracy improves from approximately 82–85% in the original feature space to 91–94% after applying PCA or t-SNE, while ICA and RP achieve accuracies in the range of 86–89%. For the Rice dataset, accuracy increases from around 84% to 92–95% using PCA- and t-SNE-based representations. K-Means clustering consistently outperforms EM, providing an additional 3–6% accuracy gain. Incorporating cluster labels as features in a Multi-Layer Perceptron classifier further improves accuracy by 2–4% and reduces training loss from approximately 0.45–0.50 to 0.25–0.30. These findings highlight the effectiveness of combining dimensionality reduction and clustering for enhanced unsupervised learning performance.
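The pipeline the abstract describes — reduce dimensionality, cluster the reduced representation, then feed the cluster assignments as extra features to an MLP — can be sketched with scikit-learn. This is an illustrative sketch only: it uses scikit-learn's small 8×8 digits set as a stand-in for MNIST, and the hyperparameters (30 principal components, 10 clusters, one hidden layer of 64 units) are assumptions, not the authors' reported configuration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

# Stand-in for MNIST: scikit-learn's 8x8 handwritten digits.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Step 1: linear dimensionality reduction (PCA), fitted on training data only.
pca = PCA(n_components=30, random_state=0).fit(X_tr)
Z_tr, Z_te = pca.transform(X_tr), pca.transform(X_te)

# Step 2: cluster the reduced representation (one cluster per digit class).
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(Z_tr)

# Step 3: append the cluster assignment as an additional input feature.
F_tr = np.hstack([Z_tr, km.labels_[:, None]])
F_te = np.hstack([Z_te, km.predict(Z_te)[:, None]])

# Step 4: train an MLP on the augmented features.
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)
mlp.fit(F_tr, y_tr)
print(f"test accuracy: {mlp.score(F_te, y_te):.3f}")
```

Swapping `PCA` for `FastICA`, `GaussianRandomProjection`, or `TSNE` (transductive, so it must be fit on all points at once), and `KMeans` for `GaussianMixture` (the EM variant), reproduces the other combinations compared in the study.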


Published

26-04-2026

How to Cite

Machine Learning-Based Comparative Analysis of Dimensionality Reduction and Clustering Techniques: Evidence from MNIST and Rice Datasets. (2026). Journal of Engineering and Computational Intelligence Review, 4(1), 12-24. https://jecir.com/index.php/jecir/article/view/39
