
DBMiner

- A System for Mining Knowledge from Large Data Sources

이진숙, Artificial Intelligence Lab, 3rd-semester master's student, [email protected]

Wednesday, December 8, 1999

Content

Introduction

Architecture

Functionalities

DMQL and Interactive Data Mining

Implementation of DBMiner

Demonstration

Conclusion

Future work

Reference

Introduction (1/3) - Who?

Data Mining Research Group

Intelligent Database Systems Research Lab.

Simon Fraser University

British Columbia, Canada

http://db.cs.sfu.ca/

- Version: DBMiner 2.0 (Enterprise)

- A mini-version: DBMiner E1.1 **

** Table size is limited to 1,000 rows (records) and cube size to 3 dimensions

Introduction (2/3) - Why? Background

-> Data warehousing

Integrating data from multiple sources into large warehouses

-> Data mining (knowledge discovery in databases)

Extraction of interesting knowledge

Motive : Data explosion problem

DBMiner: an OLAP mining system for relational databases and data warehouses

* Goal: developed for interactive mining of multiple-level knowledge in large relational databases

Introduction (3/3) - Overview

< Features of DBMiner >

• Integration of data mining and data warehousing technologies

• A generalization-based data mining tool for knowledge discovery from large relational data sources.

• Multiple data mining function modules

• A powerful DMQL

• Mining knowledge at multiple concept levels.

• Data warehousing and OLAP capabilities.

• Integrated

• Interactive - provides a GUI

Architecture

General architecture of DBMiner

Graphical User Interface

SQL server, Discovery Modules

Data, Concept Hierarchy

Modules of DBMiner

Knowledge Discovery Modules of DBMiner

Summarizer

Comparator

Classifier

Association Rule Miner

Time Series Analyzer

Predictor

Cluster Analyzer

Meta-Rule Guided

Future Modules

Major modules (in E 1.0)

Data warehouse construction module -> for automatic dimension generation and data cube creation

3-D cube view of the data warehouse

3-D boxplot (statistical) view of the data warehouse

OLAP-based data summarizer

Associator -> for mining association rules

Classifier -> for data classification and decision tree generation

Predictor -> for regression analysis and predictive modeling

Functionalities of modules (1/2)

Characterizer (generalize)

Summarizes the general characteristics of a user-specified data set

Comparator (mine)

Finds a set of discriminant rules

Classifier (analyze)

Analyzes a set of training data and builds a model for each class

Associator (discover)

Finds a set of association rules

Functionalities of modules (2/2)

Meta-pattern guided miner

A data mining mechanism that finds rules matching a specified meta-rule form

Predictor (predict)

Predicts possible values

Cluster analyzer (grouping)

Groups the selected data set

Time-series analyzer

Future modules

DMQL and Interactive Data Mining

Provides DMQL (Data Mining Query Language)

Provides a Graphical User Interface

Purpose - interactive mining of multiple-level knowledge

Query process

-> Collection of relevant data: relational query

-> Data generalization: attribute-oriented induction

-> Output: generalized relations, generalized feature tables, multiple forms of generalized rules, pie charts, bar charts, curves, etc.
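The query process above can be sketched in a few lines of Python. This is only an illustration of attribute-oriented induction, not DBMiner's implementation: the city-to-province hierarchy, the amount ranges, and the sample rows are assumptions made up for the example, and the relevant data would normally come from a relational query rather than a hand-written list.

```python
from collections import Counter

# Hypothetical concept hierarchy: maps a raw value to a higher-level concept.
CITY_TO_PROVINCE = {
    "Vancouver": "B.C.",
    "Victoria": "B.C.",
    "Toronto": "Ontario",
    "Ottawa": "Ontario",
}


def generalize_amount(amount):
    """Map a numeric amount to a range label (hypothetical ranges)."""
    if amount < 20_000:
        return "0-20K"
    if amount < 40_000:
        return "20-40K"
    if amount < 60_000:
        return "40-60K"
    return "60K-"


def attribute_oriented_induction(rows):
    """Generalize each tuple, then merge identical tuples and count them."""
    generalized = Counter()
    for city, amount in rows:
        key = (CITY_TO_PROVINCE.get(city, "Other"), generalize_amount(amount))
        generalized[key] += 1
    return dict(generalized)


# Step 1 (relevant data): stands in for the result of a relational query.
rows = [
    ("Vancouver", 15_000),
    ("Victoria", 18_000),
    ("Toronto", 45_000),
    ("Ottawa", 52_000),
    ("Toronto", 70_000),
]

# Step 2 (generalization) and step 3 (output as a generalized relation).
relation = attribute_oriented_induction(rows)
for (province, amount_range), count in sorted(relation.items()):
    print(province, amount_range, count)
```

Each output row is a generalized tuple plus a count, so five raw rows collapse into three rows at the province and amount-range level.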

Implementation of DBMiner (1/4)

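Data generalization: the core function of DBMiner (= Summarization, Characterization)

Two data structures considered for data generalization - the generalized relation structure vs. the multidimensional data cube structure

Purpose: for efficient implementation, reduce storage space, provide fast access, and lower cost

Multiple concept levels through roll-up, drill-down, etc. -> generalized data

* Generalized relation

- A relation consisting of a set of attributes and a set of aggregate attributes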

Implementation of DBMiner (2/4)

Multiple-level characterization

data characterization

- summarize and characterize

Purpose: to mine multi-level knowledge

Techniques applied - progressive deepening (drill-down)

- progressive generalization (roll-up)

Implementation of DBMiner (3/4)

Discovery of discriminant rules

Multiple-level association

-> inter-attribute association

-> intra-attribute association

Meta-rule guided mining

-> meta-rule (meta-pattern): a specified constraint.

-> Purpose: used to guide the mining of many kinds of rules.
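As a rough illustration of how a meta-rule constrains the search, the sketch below instantiates the template P(x, v) -> grade(x, c), where P ranges over the remaining attributes. The record fields, the thresholds, and the use of support and confidence as the filtering measures are assumptions for this example; the slides do not say how DBMiner scores candidate rules.

```python
from collections import Counter


def mine_with_meta_rule(records, consequent_attr, min_support, min_confidence):
    """Instantiate the meta-rule  P(x, v) -> consequent_attr(x, c).

    P ranges over every attribute other than consequent_attr. A candidate
    rule is kept when its support and confidence reach the thresholds.
    """
    n = len(records)
    rules = []
    antecedent_attrs = [a for a in records[0] if a != consequent_attr]
    for attr in antecedent_attrs:
        body_counts = Counter(r[attr] for r in records)
        pair_counts = Counter((r[attr], r[consequent_attr]) for r in records)
        for (value, outcome), both in pair_counts.items():
            support = both / n                      # fraction of all records
            confidence = both / body_counts[value]  # fraction of matching bodies
            if support >= min_support and confidence >= min_confidence:
                rules.append((attr, value, outcome, support, confidence))
    return rules


# Hypothetical records, invented for the example.
records = [
    {"major": "CS", "status": "grad", "grade": "A"},
    {"major": "CS", "status": "grad", "grade": "A"},
    {"major": "CS", "status": "undergrad", "grade": "B"},
    {"major": "Math", "status": "undergrad", "grade": "B"},
]

for attr, value, outcome, support, confidence in mine_with_meta_rule(
    records, "grade", min_support=0.5, min_confidence=0.9
):
    print(f"{attr}={value} -> grade={outcome} (support={support}, confidence={confidence})")
```

Only rules that fit the template are ever generated, which is the sense in which the meta-rule guides the mining.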

Implementation of DBMiner (4/4)

Classification

-> Purpose: to model or describe each class

-> Uses a decision-tree method (classification methods include ID3, C4.5, statistical methods, neural networks, and rough sets)

-> classifier
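To make the decision-tree step concrete, here is a minimal sketch of the information-gain criterion that ID3 uses to choose a split attribute. It is not DBMiner's classifier; the records and attribute names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(records, attribute, target):
    """Entropy reduction obtained by splitting records on attribute."""
    base = entropy([r[target] for r in records])
    partitions = {}
    for r in records:
        partitions.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(p) / len(records) * entropy(p) for p in partitions.values())
    return base - remainder


def best_split(records, attributes, target):
    """Pick the attribute with the highest information gain (the ID3 criterion)."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


# Hypothetical training data, invented for the example.
training = [
    {"major": "CS", "status": "grad", "grade": "A"},
    {"major": "CS", "status": "grad", "grade": "A"},
    {"major": "CS", "status": "undergrad", "grade": "B"},
    {"major": "Math", "status": "undergrad", "grade": "B"},
]

print(best_split(training, ["major", "status"], "grade"))
```

A full ID3 tree applies `best_split` recursively to each partition until the labels in a partition are all the same.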

Prediction

-> Data values, distribution of values

Clustering: each cluster shares common characteristics -> unsupervised learning

-> The process of partitioning a data set into a set of classes

The Major System Components of DBMiner

The Warehouse Workspace (Browser)

- Building a data warehouse

- Browsing a data cube

The Mining Wizard

The Data Mining Modules

The Warehouse Workspace

Importing data

Table browsing

Dimension creation

Dimension browsing

Cube building

Cube browsing

Dimension Creation

Create dimensions or measurements by selecting appropriate table columns

Context sensitive menus appear

Building a Data Cube

Adding dimensions to a cube

Adding measurements to a cube

Deleting/Modifying cube elements

The “Build” command

Browsing a Data Cube

Powerful visualization

OLAP capabilities

Interactive manipulation

Manipulating the View

Enlarging

Shrinking

OLAP Manipulations

Rolling up

Drilling down

Dicing

Dicing through to a “Subcube”

Double click on a particular cell, e.g., cell = Product (Environmental Line), Revenue (0-2000), Location (Far East)
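The dicing step can be pictured as a filter over cube cells. The sketch below is an assumption-laden illustration, not the DBMiner browser: the cell layout, the `dice` helper, and every value except the slide's example cell are hypothetical.

```python
# Hypothetical cube cells: one dict per cell, with three dimensions and a count.
cells = [
    {"product": "Environmental Line", "revenue": "0-2000", "location": "Far East", "count": 12},
    {"product": "Environmental Line", "revenue": "0-2000", "location": "Europe", "count": 7},
    {"product": "Environmental Line", "revenue": "2000-4000", "location": "Far East", "count": 5},
    {"product": "Outdoor Line", "revenue": "0-2000", "location": "Far East", "count": 9},
]


def dice(cells, **selection):
    """Keep only the cells whose dimension values match the whole selection."""
    return [
        cell
        for cell in cells
        if all(cell[dim] == value for dim, value in selection.items())
    ]


# Selecting the cell from the slide's example.
subcube = dice(cells, product="Environmental Line", revenue="0-2000", location="Far East")
print(subcube)

# Fixing fewer dimensions leaves a larger subcube.
far_east = dice(cells, location="Far East")
print(len(far_east))
```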

The Mining Wizard

Intelligent wizards to guide your exploration!

Data Cube Aggregation for Summarization

Figure: a three-dimensional data cube.

Amount: 0-20K, 20-40K, 40-60K, 60K-, sum

Province: B.C., Prairies, Ontario, sum

Discipline: Comp_Method, Database, ..., sum

Highlighted cell: All Amount, Comp_Method, B.C.

Each dimension contains a hierarchy of values for one attribute.

A cube cell stores aggregate values, e.g., count, sum, max, etc.

A “sum” cell stores dimension summation values.

Sparse-cube technology and MOLAP/ROLAP integration.
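The role of the “sum” cells can be shown with a small sketch that aggregates a count measure over every combination of dimension values and the summation marker. The facts and the `build_cube` helper are hypothetical, and this dense dictionary is only a teaching aid; it does not reflect DBMiner's sparse-cube technology.

```python
from collections import defaultdict
from itertools import product

ALL = "sum"  # label for the summation cell along a dimension


def build_cube(facts, dimensions):
    """Aggregate counts for every combination of dimension values and ALL."""
    cube = defaultdict(int)
    for fact in facts:
        # Each dimension contributes either its own value or the ALL marker.
        choices = [(fact[d], ALL) for d in dimensions]
        for cell in product(*choices):
            cube[cell] += fact["count"]
    return dict(cube)


# Hypothetical base facts, invented for the example.
facts = [
    {"amount": "0-20K", "province": "B.C.", "discipline": "Database", "count": 3},
    {"amount": "20-40K", "province": "B.C.", "discipline": "Comp_Method", "count": 2},
    {"amount": "40-60K", "province": "B.C.", "discipline": "Comp_Method", "count": 1},
    {"amount": "20-40K", "province": "Ontario", "discipline": "Comp_Method", "count": 4},
]

cube = build_cube(facts, ["amount", "province", "discipline"])

# The cell "All Amount, Comp_Method, B.C." sums over the Amount dimension.
print(cube[(ALL, "B.C.", "Comp_Method")])
# The apex cell sums over all three dimensions.
print(cube[(ALL, ALL, ALL)])
```

Rolling up a dimension to its top level then amounts to reading the cell that holds the `ALL` marker in that position.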

Demonstration

Configuration

- On Windows 95

- DBMiner Educational Demo version 1.1

- Uses the sample DB and a local warehouse

Conclusion (1/3)

< Key technologies of DBMiner >

• OLAP technology

• Multi-level and multiple mining modules

• Interactive OLAP-based mining and visual graphical display

• A data mining query language DMQL and mining in both relational databases and data warehouses.

Conclusion (2/3)

< Applications of DBMiner >

• To query, report on, and analyze relational databases or data warehouses

• Ideally suited for

-> profit and growth analysis

-> strategic management

-> customer relationship management

-> asset management

-> business management

-> decision support efforts (business process reengineering (BPR), total quality management (TQM))

Conclusion (3/3)

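< Key features of DBMiner >

OLAP

Attribute-oriented induction

Statistical analysis

Mines multiple-level knowledge

Combines several interesting data mining techniques

Provides a user-friendly interactive data mining environment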

Future Work

Mining new kinds of knowledge - evolution, deviation, pattern-matching

GeoMiner : Spatial Data Mining

Library Miner

Multimedia Miner

WebMiner : WWW Data Mining

Reference

J. Han, J. Chiang, S. Chee, J. Chen, Q. Chen, S. Cheng, W. Gong, M. Kamber, K. Koperski, G. Liu, Y. Lu, N. Stefanovic, L. Winstone, B. Xia, O. R. Zaiane, S. Zhang, H. Zhu, ``DBMiner: A System for Data Mining in Relational Databases and Data Warehouses'', Proc. CASCON'97: Meeting of Minds, Toronto, Canada, November 1997.

Jiawei Han, Yongjian Fu, Wei Wang, Jenny Chiang, Wan Gong, Krzysztof Koperski, Deyi Li, Yijun Lu, Amynmohamed Rajan, Nebojsa Stefanovic, Betty Xia, Osmar R. Zaiane, ``DBMiner: A System for Mining Knowledge in Large Relational Databases'', Proc. 1996 Int'l Conf. on Data Mining and Knowledge Discovery (KDD'96), Portland, Oregon, August 1996, pp. 250-255.

The Data Mining Research Group, ``Introduction to DBMiner and Data Mining and Warehousing Concepts'' (Microsoft PowerPoint version), Boeing Workshop, Seattle, Washington, December 1997.

School of Computing Science, Simon Fraser University: http://db.cs.sfu.ca/

DBMiner Information and Demo: http://db.cs.sfu.ca/DBMiner