Case Study: UNIDO
11.4.2008, METIS 2008, Luxembourg: Valentin Todorov
Valentin Todorov
UNIDO
METIS 2008 (Luxembourg, 9-11 April 2008)
Outline
• Introduction and Overview
• Statistical Metadata Systems and the Statistical Cycle
• Statistical Metadata in each phase of the Statistical Cycle
• Systems and Design Issues
• Organizational and Cultural Issues
About UNIDO
• UNIDO was set up in 1966
• Became a specialized agency of the UN in 1985
• Promotes industrialization throughout the developing world
• 172 Member States (as of 3 December 2007)
• Headquarters in Vienna
• Represented in 35 developing countries
About Statistics in UNIDO
• Service Module "Industrial Governance and Statistics":
  - monitor, benchmark and analyse their industrial performance and capabilities
  - formulate, implement and monitor strategies, policies and programmes to improve the contribution of industry to productivity growth and the achievement of the UN Millennium Development Goals (MDGs)
• Building capabilities in industrial statistics - providing technical assistance to:
  - Introduce best practice methodologies and software systems
  - Enhance the quality and consistency of the industrial statistics databases
About the Organisation
All statistical activities are carried out by the Research and Statistics Branch (PCF/RST)
Overall strategy and metadata management principles
• Conceptual development was initiated in 1999
• An integrated data and data documentation (metadata) framework
• A smooth migration policy - must not disrupt established UNIDO data services
• Stepwise development in the context of a migration project of the statistical databases from an IBM mainframe to a client/server platform
• Backed by the UNIDO Quality Assurance Framework
Overall strategy (cont.)
• Following the International Recommendations for Industrial Statistics (2008)
• Common formats and nomenclatures for exchange and sharing of statistical data and metadata - SDMX
• Availability of the metadata in three languages (English, French and Spanish)
• Based on a formal framework - the proposed information system architecture comprises two cubes, one for statistical data and another for the metadata, interrelated by a set of shared dimensions - see Froeschl et al. (2002), Froeschl and Yamada (2000)
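The two-cube idea can be illustrated with a minimal Python sketch. It is not the UNIDO implementation: the choice of shared dimensions (country, ISIC code, year), the sample cells and the helper `lookup` are assumptions made for illustration. Only the principle that the data cube and the metadata cube are addressed through the same dimensions comes from the slide.

```python
from typing import Dict, List, Optional, Tuple

# Shared dimensions (assumed for illustration): country, ISIC code, year.
Key = Tuple[str, str, int]

# Data cube: one numeric value per cell (invented sample values).
data_cube: Dict[Key, Optional[float]] = {
    ("AT", "1511", 2005): 1234.0,
    ("AT", "1512", 2005): None,  # reported together with 1511
}

# Metadata cube: notes addressed by exactly the same dimensions.
metadata_cube: Dict[Key, List[str]] = {
    ("AT", "1511", 2005): ["1511 includes 1512"],
    ("AT", "1512", 2005): ["included in 1511"],
}


def lookup(key: Key) -> Tuple[Optional[float], List[str]]:
    """Return the data value and the attached metadata for one cell."""
    return data_cube.get(key), metadata_cube.get(key, [])


value, notes = lookup(("AT", "1511", 2005))
```

Because both cubes share one key, any selection made on the data (a country, a year range) can be applied unchanged to the metadata.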
UNIDO Statistical Process
• Initialisation
  - Pre-filling of the outgoing UNIDO General Industrial Statistics Questionnaire with previously reported statistical data and metadata (non-OECD countries)
  - Excel format
  - In the appropriate language - English, French or Spanish
  - Automated using the available data and metadata
• Data Collection
  - NSO: the questionnaires completed and returned to UNIDO by the NSO (Excel format, rarely hard copy) are entered into the system and are ready for further validation and processing
  - OECD: data for OECD member countries (Excel format) are ready for further validation and processing
UNIDO Statistical Process
• Transformation/Processing
  - The data collected from the primary or secondary sources are further transformed into ready-to-use data sets
  - The data transformation is done in five stages, which not only constitute an operational framework for UNIDO statisticians but also provide additional description of the statistics (generated metadata which are attributed to each data item)
  - After undergoing the complete processing phase, the incoming and generated data and metadata are stored in the databases
• Dissemination
  - International Yearbook of Industrial Statistics
  - INDSTAT and IDSB CD products
  - Web Country Statistics (Country Brief)
  - Ad hoc requests by internal and external users
Mapping of the UNIDO cycle phases to those developed by the METIS group
METIS                 UNIDO
Need                  Need [optional]
Develop and design    Develop and design [optional]
Build                 Initialisation
Collect               Data Collection
Process               Transformation/Processing
Analyse               Analysis
Disseminate           Dissemination
Archive               -
Evaluate              Evaluation
ISDE Applications
• ADMIN - provides administrative services such as user and authorisation management, logging and auditing of the system, and backup and restore management
  - outside of the life cycle
• Nomenclature Explorer - maintenance of the core definitional metadata (not related to particular data sets or items)
  - outside of the life cycle
• Questionnaire - management of the pre-filling and distribution of the questionnaires
  - used in the Initialisation phase
ISDE Applications
• Data Wizard - the main data and metadata maintenance tool
  - Used in the Data Collection and Transformation phases
  - Provides services for
    • Reading in the data and metadata from the returned questionnaire (Excel)
    • Initial validation of the read-in data and storing them in the database (at stage 1)
    • Maintenance of the data and metadata
    • Screening
    • Aggregation and further data validations and transformations
ISDE: Publication applications
• Yearbook - a complex set of applications for production of the International Yearbook of Industrial Statistics
  - aggregation, layout, PDF file generation according to pre-defined templates and other tools
  - The final result is a publication-ready PDF file of about 700 pages
• INDSTAT CD - produces the INDSTAT type of CD products
• IDSB CD - produces the IDSB type of CD products
• WEB - generates the necessary data and metadata for updating the WEB dissemination database
  - This database is outside of the ISDE system
  - Managed by the computer section
ISDE Applications
• Presentation Wizard - mainly a visualization tool which can be used in the Dissemination phase for answering ad hoc requests; because of its versatile functionality it is also widely used in the Data Transformation phase
• Other applications - any other applications used in the process, such as SAS, R, and tools for compilation of production index numbers and National Accounts data (which are outside the scope of this document)
Implementation Strategy
• Developed in the context of migration from Mainframe to a Client/Server platform
• A stepwise approach was chosen for the following reasons:
  - The project was not urgent
  - The software testing and sustaining of the new system are done in-house
  - Only limited resources/funds were available
  - The staff was very willing to participate in the project
  - The goal was not only to migrate the system but rather to develop a completely new one, and the requirements were not yet completely specified
  - A key requirement was that the established UNIDO data services must not be disrupted
Implementation Steps
• Step 1: High-level architecture design, data model, physical C/S database, definitional metadata tool
  - Rigorous analysis of the existing system and development of a data model - as generic as possible in order to accommodate any subsequent changes
  - Based on the data model, a loader application was developed which made it possible to synchronize the data in the Mainframe and in the Sybase database at any moment
  - The development of the new metadata subsystem was initiated by implementing a tool for maintenance of the definitional metadata
  - Thus a kind of proof of concept was successfully completed
Implementation Steps
• Step 2: Reference metadata, dissemination applications
  - A capture/maintenance tool was developed
  - The descriptive/methodological metadata (Word, Excel) were entered into the system
  - The Mainframe footnote database (data-item level metadata) was imported
  - Thus the complete process of maintenance of the available metadata was migrated to the Client/Server platform
  - Data dissemination applications were developed which made it possible to produce the recurrent statistical publications/products from the Mainframe system and from the Client/Server platform in parallel - an ideal acceptance test for the new applications, simply by comparing the results
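The parallel-run acceptance test amounts to comparing two outputs cell by cell. The Python sketch below shows one way such a comparison could look. The key layout (country, ISIC code, year), the function name `compare_runs` and the `tolerance` parameter are hypothetical and are not taken from the UNIDO system.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str, int]  # hypothetical key: (country, ISIC code, year)


def compare_runs(mainframe: Dict[Key, float],
                 client_server: Dict[Key, float],
                 tolerance: float = 0.0) -> List[str]:
    """List every difference between two runs of the same publication."""
    differences = []
    for key in sorted(set(mainframe) | set(client_server)):
        if key not in mainframe:
            differences.append(f"{key}: missing in mainframe output")
        elif key not in client_server:
            differences.append(f"{key}: missing in client/server output")
        elif abs(mainframe[key] - client_server[key]) > tolerance:
            differences.append(
                f"{key}: {mainframe[key]} != {client_server[key]}")
    return differences
```

An empty result means the two platforms produced the same figures. Any entry points at a cell to investigate before the new application is accepted.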
Implementation Steps
• Example: International Yearbook of Industrial Statistics
  - From the Mainframe it was produced as camera-ready line printer output which was glued together with many MS Word and MS Excel documents
  - From the Client/Server system a page-numbered PDF file of about 700 pages is generated automatically
• Step 3: Pre-filled questionnaire, data capturing and maintenance
  - Pre-filling of the questionnaire - for the second time from the new Client/Server data- and metadata-base
  - Development of the data capturing/maintenance tools - now in the phase of final acceptance testing
  - From June 2008 only the Client/Server system will be used
  - Ultimate decoupling of the new system from the Mainframe
Metadata classification
No formal metadata classification is used, but according to their usage and their role in the statistical production process we distinguish roughly between:
• Structural or definitional metadata: metadata that act as identifiers and descriptors of the data (and metadata)
• Reference metadata: describe the properties and quality of the statistical data
• System metadata: used to drive automated processing throughout the phases of the life cycle
Metadata in the life cycle
• The structural/definitional metadata are used in each phase of the life cycle
• The structural metadata are created/updated relatively independently of the life cycle
  - Add a new country (e.g. Serbia and Montenegro recently)
  - Currency change (e.g. Slovenia, Malta and Cyprus recently)
  - Country groupings: two more countries joined the EU (Bulgaria and Romania)
• No metadata are created in the first and last phases (Initialisation and Dissemination), but corrections may be performed
Metadata in the life cycle: Initialisation
• Pre-filling of the outgoing UNIDO General Industrial Statistics Questionnaire with previously reported statistical data and metadata
• System metadata: drive the automated processing
  - Template for the questionnaire
  - Language
  - ISIC revision
  - Output format (unit exponent, digits)
• Operational metadata: stage 1 data used for pre-filling
• Descriptive, methodological and implicit metadata used for pre-filling into the questionnaire
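As an illustration of system metadata driving the output, the Python sketch below formats a value from an output-format record. The reading of "unit exponent" as a power of ten by which values are scaled, and of "digits" as the number of decimal places, is an assumption. The names `OutputFormat` and `format_value` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputFormat:
    """Hypothetical system-metadata record for one questionnaire variable."""
    unit_exponent: int  # assumed: value is shown in units of 10**unit_exponent
    digits: int         # assumed: number of decimal places


def format_value(value: float, fmt: OutputFormat) -> str:
    """Scale and round a stored value the way the metadata prescribes."""
    scaled = value / 10 ** fmt.unit_exponent
    return f"{scaled:.{fmt.digits}f}"


in_thousands = format_value(1234567.0, OutputFormat(unit_exponent=3, digits=1))
```

Keeping the format in metadata rather than in code means a change of units for a country needs a metadata update only.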
Metadata in the life cycle: Data Collection
• After the completed questionnaires are received back, they are entered (automatically) into the system for validation and further processing
• The received metadata are entered into the system together with the data
• The provided metadata are sometimes described not from the viewpoint of international comparability but from the viewpoint of national standards. In such cases the UNIDO statistical staff re-describes/rearranges the provided metadata into explicit information on the deviation from the international standard
Metadata in the life cycle: Data Collection (cont.)
• Metadata can be attached to each data item
  - "Missing because of confidentiality reasons" or
  - combinations of ISIC codes like "1511 includes 1512"
• Data for OECD member countries
  - collected through the joint OECD/UNIDO questionnaire and
  - transmitted to UNIDO (Excel format)
  - do not contain metadata (the metadata are extracted from other OECD publications)
Metadata in the life cycle: Transformation
• The metadata collected from the NSOs together with the data undergo the same transformation process as the data and are complemented by metadata generated by the transformation process
• The data transformation is done in five stages, which provide additional description of the data
• At the same time, Source and Method metadata are maintained for each data item
• If appropriate, the provided metadata are re-described from the viewpoint of international comparability
Metadata in the life cycle: Dissemination
• International Yearbook of Industrial Statistics
  - the main UNIDO statistical product
  - the latest yearbook, released in 2008, covers data for the period from 1995 to the latest year
  - The country data were updated for 74 countries and are compiled from Stage 1 and Stage 2
• CD products, which might include data from all stages described earlier - www.unido.org/statistics
• Country Brief - statistics by selected variables from the different UNIDO databases for each member state, posted on the UNIDO web site: http://www.unido.org/statistics
Systems and Design Issues
• Client/Server architecture built on .NET technology
• Centralised database:
  - Sybase ASE 12.5 on Linux
  - Test and production databases
• Client (desktop) applications developed in C# using MS Visual Studio
• Commonality through shareable component libraries - C#
• Other tools:
  - SAS, R, STATA
• Development tools
Organizational and Cultural Issues
• No specialised metadata roles are necessary
  - processing of metadata and data are tightly coupled
  - responsibilities are organized by country
• No special training for the staff was necessary
  - all statisticians participated actively in the specification and the development of the system
  - the system testing was performed by parallel runs on the Client/Server and Mainframe
• Nevertheless, a complete set of documentation and training materials is being prepared
  - unifying the terminology and the information about the system
  - induction training of new colleagues
  - operational and maintenance concept documents
Example: Data Wizard - View/Edit Questionnaire
Example: Data Wizard - View/Edit Metadata
Example: R Graphics
[Figure: plots of the iris data. Panels: Histogram (Sepal.Width, Density); Boxplot (by species); Bagplot (Sepal.Length, Sepal.Width); Normal Q-Q Plot (norm quantiles, Sepal.Width); Scatter Plot Matrix "Three Varieties of Iris" (SepalLength, SepalWidth, PetalLength for setosa, versicolor and virginica)]
Example: Implicit metadata
• For example, several industry categories can be combined and reported together by a given country for a given indicator and years
• In the questionnaire returned by the NSOs such a combination is expressed in the following way:
  ...
  1511 Processing/preserving of meat                  1234 a/
  1512 Processing/preserving of fish                  ...  a/
  1513 Processing/preserving of fruit & vegetables    ...  a/
  ...
  REMARKS: a/ 1511 includes 1512 and 1513
• "Exclude" for other country-specific classification discrepancies
• "Substitute" for synonyms
• Aggregations
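A remark such as "a/ 1511 includes 1512 and 1513" can be turned into machine-readable implicit metadata. The Python sketch below handles only this "includes" form as it appears in the example. The regular expression and the function name `parse_remark` are assumptions for illustration and do not describe how the Data Wizard actually reads questionnaires.

```python
import re
from typing import Dict, List

# Footnote label, a slash, the reporting 4-digit ISIC code, then "includes ...".
REMARK = re.compile(
    r"^(?P<label>\w+)/\s*(?P<target>\d{4})\s+includes\s+(?P<rest>.+)$")


def parse_remark(remark: str) -> Dict[str, List[str]]:
    """Map the reporting ISIC code to the codes whose data it includes."""
    match = REMARK.match(remark.strip())
    if match is None:
        raise ValueError(f"unrecognised remark: {remark!r}")
    included = re.findall(r"\d{4}", match.group("rest"))
    return {match.group("target"): included}


combination = parse_remark("a/ 1511 includes 1512 and 1513")
```

The resulting mapping records that the value reported under 1511 also covers 1512 and 1513, so the latter two cells can be flagged as combined rather than missing.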
Example: System metadata in the Initialisation phase - I
Example: System metadata in the Initialisation phase - II
Example: Descriptive and methodological metadata used in the Initialisation/Data Collection phase
Example: Metadata attached to each data item used or created in the Initialisation and Data Collection phase
Operational Framework: Stages
• Stage 1 - responses to national questionnaires. Detection and, if possible, correction of obvious reporting errors
  - Used for pre-filling the following edition of the questionnaire
  - Data are considered official
• Stage 2 - incorporation of published national data. Inconsistent data are corrected using supplementary information from national publications
  - Published in the International Yearbook of Industrial Statistics
  - Data are considered official
Operational Framework: Stages (cont.)
• Stage 3 - disaggregation of data. Data are adjusted to eliminate the departures from the level of ISIC aggregation
  - using national and international sources
  - using supplementary data
• Stage 4 - automatic disaggregation and interpolation. Missing data are estimated by applying related proportions or interpolation whenever applicable
  - For ISIC 3-digit only
• Stage 5 - estimation of provisional data for the latest years
  - Selected variables only
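The slides do not say which interpolation method Stage 4 uses. As a sketch under that caveat, the Python example below fills gaps in a yearly series by linear interpolation between the nearest reported years. The function name `interpolate_gaps` and the sample series are invented for illustration.

```python
from typing import Dict, Optional


def interpolate_gaps(series: Dict[int, Optional[float]]) -> Dict[int, float]:
    """Fill missing years that lie between two reported years.

    Linear interpolation is an assumption; years before the first or
    after the last reported value are left out of the result.
    """
    known = sorted(y for y, v in series.items() if v is not None)
    filled = {y: series[y] for y in known}
    for left, right in zip(known, known[1:]):
        for year in range(left + 1, right):
            if year in series:  # only fill years that belong to the series
                share = (year - left) / (right - left)
                filled[year] = (series[left]
                                + share * (series[right] - series[left]))
    return filled


estimated = interpolate_gaps({2000: 10.0, 2001: None, 2002: 20.0, 2003: None})
```

In this sample 2001 is filled from its neighbours, while 2003 stays open because no later reported value exists. Estimating the latest years is the task of Stage 5.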
Reference metadata
• Implicit metadata - a special class of metadata arising from the specific usage of other metadata. A typical example is the ISIC combinations
• Operational metadata - generated by the process of data transformation and attributed to the respective data items
  - a stage indicator reflecting the data item's credibility
  - "Source" and "Methods" metadata, describing the source of the data item and the methods applied for its generation
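The operational metadata attached to a data item can be pictured as a small record. The Python sketch below is one possible shape, not the actual database design. The member names of `Stage`, the class `DataItem` and the `is_official` property are hypothetical. The five stages, and the statement that Stage 1 and Stage 2 data are considered official, come from the Operational Framework slides.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    """Five processing stages; the member names are illustrative only."""
    REPORTED = 1       # responses to national questionnaires
    PUBLISHED = 2      # published national data incorporated
    DISAGGREGATED = 3  # adjusted using national and international sources
    ESTIMATED = 4      # automatic disaggregation and interpolation
    PROVISIONAL = 5    # provisional estimates for the latest years


@dataclass
class DataItem:
    value: float
    stage: Stage  # stage indicator reflecting the item's credibility
    source: str   # "Source" metadata
    method: str   # "Methods" metadata

    @property
    def is_official(self) -> bool:
        # The slides describe Stage 1 and Stage 2 data as official.
        return self.stage <= Stage.PUBLISHED


item = DataItem(1234.0, Stage.REPORTED, "NSO questionnaire", "as reported")
```

Carrying the stage with every value lets a publication select data by credibility, for example only official items for the Yearbook.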
Reference metadata (cont.)
• Descriptive and Methodological metadata - received from the primary data reporters and then further processed together with the data
  - During this processing additional metadata can be added
  - Can be attached at all possible levels, ranging from the complete data set down to individual data items
System metadata
• Used to drive automated processing throughout the phases of the life cycle
  - layout definitions for the yearbook (for each country, for each edition of the yearbook)
  - country lists, used in the automatic generation of the PDF
  - installation and packaging lists, directories, templates, etc. for creation of the CD product
  - specific to the application where they are used