MACHINE LEARNING CLASSIFICATION OF STARS, GALAXIES, AND QUASARS

Authors

  • Yulun Winston Wu, 11th Grade, Northfield Mount Hermon School, Gill, United States

DOI:

https://doi.org/10.20319/mijst.2021.63.102122

Keywords:

Logistic Regression, Decision Tree, Stars, Galaxies, Quasars, Classification

Abstract

The objective of this study was to build a predictive model that classifies stars, galaxies, and quasars, and to compare classification models to identify the best-performing one. I hypothesized that a machine learning model could be trained to classify stars, galaxies, and quasars using astronomical data from the Sloan Digital Sky Survey (SDSS). A multinomial logistic regression model was trained and tested first. It achieved an accuracy of 0.87, a weighted-average precision, recall, and F1 score of 0.87, and a cross-validation accuracy score of 0.8664. The next model, a decision tree, achieved an accuracy of 0.99, a weighted-average precision, recall, and F1 score of 0.99, and a cross-validation accuracy score of 0.9858. The decision tree substantially outperformed the logistic regression model and was an accurate, well-fitting classifier for stars, galaxies, and quasars, supporting my hypothesis. The model from this study could serve as a reliable classification tool for a wide variety of astronomical purposes, helping to expand the sample sizes of stars, galaxies, and especially quasars more quickly.
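The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the study's actual code: the data are synthetic (generated with scikit-learn's `make_classification` as a stand-in for the SDSS table), and the feature count, split ratio, scaling step, fold count, and random seeds are all assumptions. The scores it prints will therefore not match the values reported above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic three-class data standing in for the SDSS table.
# Classes 0/1/2 play the role of star / galaxy / quasar.
X, y = make_classification(
    n_samples=1500, n_features=8, n_informative=5, n_redundant=1,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# ASSUMPTION: a stratified 70/30 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    # With more than two classes, the default lbfgs solver fits a
    # multinomial (softmax) logistic regression.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}


def evaluate(model):
    """Fit on the training split, then report the metrics named in the abstract."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="weighted", zero_division=0
    )
    # ASSUMPTION: 5-fold cross-validation over the full data set.
    cv_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    return {
        "accuracy": float(accuracy_score(y_test, y_pred)),
        "precision": float(precision),
        "recall": float(recall),
        "f1": float(f1),
        "cv_accuracy": float(cv_scores.mean()),
    }


results = {name: evaluate(model) for name, model in models.items()}
for name, scores in results.items():
    print(name, {metric: round(value, 4) for metric, value in scores.items()})
```

Weighted-average recall is mathematically equal to overall accuracy, which is consistent with the abstract reporting the same value for both. The cross-validation score is the more informative of the two accuracy figures, since it averages over several train/test partitions rather than relying on a single split.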

Published

2021-01-05

How to Cite

Wu, Y. W. (2021). MACHINE LEARNING CLASSIFICATION OF STARS, GALAXIES, AND QUASARS. MATTER: International Journal of Science and Technology, 6(3), 102–122. https://doi.org/10.20319/mijst.2021.63.102122