Manuscript received October 12, 2023; revised November 28, 2023; accepted December 10, 2023; published May 15, 2024
Abstract—Cancer is a common and severe disease with a high mortality rate. Diagnosis is therefore essential, because most patients die for lack of early diagnosis and treatment. Because cancer is a genomic disease, gene sequencing and microarray techniques are now used in place of traditional tumor morphology to improve diagnosis. As gene sequencing technology has come into wide use, the resulting gene data have been aggregated into specialized cancer gene databases, such as TCGA, which make these data easier to manage. Using these large-scale cancer genomics datasets, researchers have applied different Machine Learning (ML) algorithms to build cancer prediction models, which achieve higher accuracy and a lower error rate than other cancer classification methods. In short, gene detection techniques that measure cancer gene expression yield large amounts of genetic data; these data are analyzed and processed to extract valuable information and are then used to train ML algorithms to obtain the best predictive models.
Keywords—machine learning, TCGA, cancer genomics
Cite: Ruiyi Li, "Machine Learning Based Cancer Classification Using Gene Expression Data," International Journal of Machine Learning, vol. 14, no. 2, pp. 48-53, 2024.
Copyright © 2024 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).