Pre processing
-
Upload
vijay-kumar -
Category
Documents
-
view
42 -
download
4
Transcript of Pre processing
![Page 1: Pre processing](https://reader036.fdocument.pub/reader036/viewer/2022082606/557ee5c7d8b42a34628b47de/html5/thumbnails/1.jpg)
Data preprocessingData preprocessing
![Page 2: Pre processing](https://reader036.fdocument.pub/reader036/viewer/2022082606/557ee5c7d8b42a34628b47de/html5/thumbnails/2.jpg)
• Data preprocessing is a data mining technique that involves transforming raw data into an understandable format.
• Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors.
Data Preprocessing
![Page 3: Pre processing](https://reader036.fdocument.pub/reader036/viewer/2022082606/557ee5c7d8b42a34628b47de/html5/thumbnails/3.jpg)
Data Preprocessing
• Data preprocessing is a proven method of resolving such issues.
• Data preprocessing prepares raw data for further processing.
• Data preprocessing is used database-driven applications such as customer relationship management and rule-based applications (like neural networks).
![Page 4: Pre processing](https://reader036.fdocument.pub/reader036/viewer/2022082606/557ee5c7d8b42a34628b47de/html5/thumbnails/4.jpg)
Number of data preprocessing techniques
• Data cleaning• Data integration• Data transformation• Data reduction
![Page 5: Pre processing](https://reader036.fdocument.pub/reader036/viewer/2022082606/557ee5c7d8b42a34628b47de/html5/thumbnails/5.jpg)
Data Preprocessing Techniques
![Page 6: Pre processing](https://reader036.fdocument.pub/reader036/viewer/2022082606/557ee5c7d8b42a34628b47de/html5/thumbnails/6.jpg)
• Data cleaning : can be applied to remove noise and correct inconsistencies in the data.
• Data integration :merges data from multiple sources into a coherent data store, such as a data warehouse.
• Data transformations :such as normalization, may be applied.
• Data reduction : can reduce the data size by aggregating, eliminating redundant features, or clustering ,for instance.
Data Preprocessing Techniques
![Page 7: Pre processing](https://reader036.fdocument.pub/reader036/viewer/2022082606/557ee5c7d8b42a34628b47de/html5/thumbnails/7.jpg)
• routines work to “clean” the data by filling in missing values, smoothing noisy data, identifying or removing outliers, and resolving inconsistencies.
• If users believe the data are dirty, they are unlikely to trust the results of any data mining that has been applied to it.
• Although most mining routines have some procedures for dealing with incomplete or noisy data, they are not always robust.
Data cleaning
![Page 8: Pre processing](https://reader036.fdocument.pub/reader036/viewer/2022082606/557ee5c7d8b42a34628b47de/html5/thumbnails/8.jpg)
Data cleaning• Therefore, a useful preprocessing step is to
run your data through some data cleaning routines.
![Page 9: Pre processing](https://reader036.fdocument.pub/reader036/viewer/2022082606/557ee5c7d8b42a34628b47de/html5/thumbnails/9.jpg)
• Include data from multiple sources in your analysis.
• This would involve integrating multiple databases, data cubes, or files, that is, data integration.
• Yet some attributes representing a given concept may have different names in different databases, causing inconsistencies and redundancies.
Data integration
![Page 10: Pre processing](https://reader036.fdocument.pub/reader036/viewer/2022082606/557ee5c7d8b42a34628b47de/html5/thumbnails/10.jpg)
• Having a large amount of redundant data may slow down or confuse the knowledge discovery process.
• Clearly, in addition to data cleaning, steps must be taken to help avoid redundancies during data integration.
• Typically, data cleaning and data integration are performed as a preprocessing step when preparing the data for a data warehouse.
Data integration
![Page 11: Pre processing](https://reader036.fdocument.pub/reader036/viewer/2022082606/557ee5c7d8b42a34628b47de/html5/thumbnails/11.jpg)
Data integration• Additional data cleaning can be performed to
detect and remove redundancies that may have resulted from data integration.
![Page 12: Pre processing](https://reader036.fdocument.pub/reader036/viewer/2022082606/557ee5c7d8b42a34628b47de/html5/thumbnails/12.jpg)
• Getting back to your data, you have decided, say, that you would like to use a distance based mining algorithm for your analysis, such as neural networks, nearest-neighbor classifiers, or clustering.
• methods provide better results if the data to be analyzed have been normalized, that is, scaled to a specific range such as [0.0, 1.0].
Data Transformation
![Page 13: Pre processing](https://reader036.fdocument.pub/reader036/viewer/2022082606/557ee5c7d8b42a34628b47de/html5/thumbnails/13.jpg)
Data Transformation• You soon realize that data transformation
operations, such as normalization and aggregation, are additional data preprocessing procedures that would contribute toward the success of the mining process.
![Page 14: Pre processing](https://reader036.fdocument.pub/reader036/viewer/2022082606/557ee5c7d8b42a34628b47de/html5/thumbnails/14.jpg)
• Data reduction obtains a reduced representation of the data set that is much smaller in volume, yet produces the same (or almost the same) analytical results.
• There are a number of strategies for data reduction.
• These include data aggregation , attribute subset selection , dimensionality reduction and numerosity reduction.
DATA REDUCTION
![Page 15: Pre processing](https://reader036.fdocument.pub/reader036/viewer/2022082606/557ee5c7d8b42a34628b47de/html5/thumbnails/15.jpg)
• Data can also be “reduced” by generalization with the use of concept hierarchies, where low-level concepts, such as city for customer location, are replaced with higher-level concepts, such as region or province or state.
• A concept hierarchy organizes the concepts into varying levels of abstraction.
• Data discretization is a form of data reduction that is very useful for the automatic generation of concept hierarchies from numerical data.
DATA REDUCTION
![Page 16: Pre processing](https://reader036.fdocument.pub/reader036/viewer/2022082606/557ee5c7d8b42a34628b47de/html5/thumbnails/16.jpg)
Thank you