Data cleaning
FIG: Data Cleaning

Data cleaning, also known as data cleansing, is the process of detecting and correcting or removing corrupt, inaccurate, or incomplete records from a dataset. It is usually treated as part of data pre-processing and is a crucial step in machine learning, because the accuracy and effectiveness of a model depend on the quality of the data it is trained on.

The data cleaning process typically involves several steps:

Handling missing data: Incomplete records can lead to biased or incorrect results. One way to handle missing data is to delete the records with missing values. Another is to fill in the missing values with reasonable estimates, such as the mean or median of the available data.

Handling outliers: Outliers are data points that deviate significantly from the rest of the data. They can distort results and should be handled carefully. One approach is to remove them from the dataset, but in some cases they carry important information and should be kept.

Handling duplicates: Du...
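
As a rough illustration of the missing-data and outlier steps described above, here is a minimal sketch using only the Python standard library. The function names, the sample `ages` list, and the choice of the 1.5 x IQR rule for flagging outliers are all assumptions made for this example and do not come from the text; in practice a data-frame library would often be used instead. The median is used as the fill value here because it is less sensitive to extreme values than the mean.

```python
from statistics import median, quantiles


def drop_missing(values):
    """Option 1: delete records whose value is missing (None)."""
    return [v for v in values if v is not None]


def fill_missing_with_median(values):
    """Option 2: replace missing values (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def flag_outliers_iqr(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule and k=1.5 are illustrative assumptions; other rules
    (for example z-scores) are equally valid choices.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Hypothetical sample: ages with two missing entries and one suspicious value.
ages = [23, 25, None, 27, 24, 26, 120, None, 25]

filled = fill_missing_with_median(ages)
flags = flag_outliers_iqr(filled)

# Flagging first, removing second, leaves room to inspect each outlier
# and keep it if it turns out to be a legitimate observation.
cleaned = [v for v, is_outlier in zip(filled, flags) if not is_outlier]
print(cleaned)
```

Returning flags rather than deleting values directly reflects the caution noted above: an outlier may be an error, or it may be an important observation that should stay in the dataset.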