Manuscript received January 10, 2023; revised February 20, 2023; accepted April 27, 2023.
Abstract—Hard disk drive manufacturing is complicated and
involves several assembly and testing steps. Poor yield in
one step can cause the whole lot to fail. Accurate
yield prediction is thus important to product monitoring and
management. This paper presents a novel approach to data
preparation and modeling for predicting yield in hard disk
drive production. A data balancing technique based on
clustering and re-sampling is introduced to make the proportions
of pass and fail products comparable. We then propose a
strategy for aggregating manufacturing data into groups of a
reasonable size that suit the subsequent step of building the
yield prediction model. Experimental results reveal that
grouping data into a constant size of 10,000 records leads to
more accurate yield prediction than the intuitive
approach of weekly grouping.
Index Terms—Data balancing, data aggregation, yield
prediction, hard disk drive manufacturing, machine learning
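To make the two aggregation schemes named in the abstract concrete, the following minimal sketch contrasts constant-size grouping with weekly grouping. It is an illustration under stated assumptions, not the authors' implementation: the record layout (a test date paired with a pass flag), the function names, the choice to drop a trailing partial block, and the synthetic data are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def yield_by_fixed_size(records, group_size=10_000):
    """Pass fraction for each consecutive block of `group_size` records.

    `records` is a chronologically ordered list of (test_date, passed)
    tuples. A trailing partial block is dropped (an assumption made here).
    """
    yields = []
    for start in range(0, len(records) - group_size + 1, group_size):
        block = records[start:start + group_size]
        yields.append(sum(passed for _, passed in block) / group_size)
    return yields


def yield_by_week(records):
    """Map each ISO (year, week) to (pass fraction, number of records)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [passes, total]
    for test_date, passed in records:
        iso = test_date.isocalendar()
        key = (iso[0], iso[1])
        counts[key][0] += passed
        counts[key][1] += 1
    return {key: (p / n, n) for key, (p, n) in counts.items()}


# Synthetic data, not from the paper: 2,500 records per day over
# 10 days starting Monday 2023-01-02, with every tenth unit failing.
records = [
    (date(2023, 1, 2) + timedelta(days=i // 2500), i % 10 != 0)
    for i in range(25_000)
]

fixed = yield_by_fixed_size(records)   # two blocks of 10,000 records each
weekly = yield_by_week(records)        # two weeks with unequal record counts
```

In this toy data the weekly groups hold 17,500 and 7,500 records, while every constant-size group holds exactly 10,000. Equalizing the number of records behind each yield value is one plausible reason a constant group size could help a downstream model, although the abstract itself reports only the empirical result.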
Nittaya Kerdprasop and Kittisak Kerdprasop are with the School of
Computer Engineering, Suranaree University of Technology, Thailand.
Anusara Hirunyawanakul is with the Data Science and Computation
School, King Mongkut's University of Technology North Bangkok, Rayong
Campus, Thailand.
Paradee Chuaybamroong is with the Department of Environmental
Science, Thammasat University, Thailand.
*Correspondence: nittaya@sut.ac.th (N.K.)
Cite: Nittaya Kerdprasop*, Anusara Hirunyawanakul, Paradee Chuaybamroong, and Kittisak Kerdprasop, "Data Balancing and Aggregation Strategy to Predict Yield in Hard Disk Drive Manufacturing," International Journal of Machine Learning, vol. 13, no. 4, pp. 181-184, 2023.
Copyright © 2023 by the authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).