Model for Asia Countries using Global Competitiveness Index (GCI) Data

Date of analysis: April 1, 2008

Table of Contents

  1. Problem statement
  2. Data processing
    2.1. Data sets
    2.2. Dependent variables and Independent variables
  3. Analysis
    3.1. Scatter plot diagram
    3.2. Boxplot: Data distribution
    3.3. Decision Tree
  4. Summary

Problem statement

The World Economic Forum annually publishes the Global Competitiveness Index (GCI) for countries around the world. I obtained GCI data for 43 countries over six years, from 2002 to 2007. If highly ranked countries are treated as advanced countries, the data can reveal what characterizes them, and those characteristics can serve as a model that suggests a direction for developing countries to become advanced countries. In this analysis, I focus on Asian countries.

Data processing

Data sets

GCI_data_with_ranking.xlsx
year2002_dataset.xlsx
year2003_dataset.xlsx
year2004_dataset.xlsx
year2005_dataset.xlsx
year2006_dataset.xlsx
year2007_dataset.xlsx

Dependent variables and Independent variables

I excluded China, Mongolia, and Luxembourg because their records contained missing values, which leaves 40 countries in the data for each year. The 13 Asian countries are: Bangladesh, India, Indonesia, Japan, Korea (South), Malaysia, Taiwan, Thailand, Vietnam, Singapore, Philippines, Sri Lanka, and Hong Kong.

The following variables were collected for a total of 78 instances (13 countries x 6 years). GCI values must be normalized before they can be compared across years, so each GCI value is normalized such that the average overall score for each year is 10.
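The normalization step can be sketched as follows. This is a minimal sketch, not the original processing script: it assumes one scale factor per year (10 divided by that year's mean overall score, taken over whatever rows are passed in) and applies that factor to every numeric attribute of the same year. The function name, the row layout, and the sample numbers are illustrative assumptions.

```python
from collections import defaultdict


def normalize_by_year(rows, target_mean=10.0):
    """Scale scores so that each year's mean overall_score equals target_mean.

    rows: list of dicts with 'country', 'year', 'overall_score' and any
    further numeric attribute scores (hypothetical layout).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["year"]] += row["overall_score"]
        counts[row["year"]] += 1

    # One factor per year: target mean divided by that year's raw mean.
    factors = {year: target_mean * counts[year] / totals[year] for year in totals}

    normalized = []
    for row in rows:
        factor = factors[row["year"]]
        normalized.append({
            key: value if key in ("country", "year") else value * factor
            for key, value in row.items()
        })
    return normalized


# Illustrative numbers only, not taken from the GCI files.
sample = [
    {"country": "A", "year": 2002, "overall_score": 4.0, "BR": 5.0},
    {"country": "B", "year": 2002, "overall_score": 6.0, "BR": 5.5},
    {"country": "A", "year": 2003, "overall_score": 3.0, "BR": 4.0},
    {"country": "B", "year": 2003, "overall_score": 5.0, "BR": 6.0},
]
result = normalize_by_year(sample)
```

After this step the mean overall score of every year is 10, so a value from 2002 and a value from 2007 are on the same scale.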

Dependent Variables (DVs)

  • overall_score
    • absolute value and normalized
  • ranking
    • relative value (1 ~ 40)

Independent Variables (IVs)
Each country instance has 11 structured attributes. Basic Requirements (BR), Efficiency Enhancers (EE), and Innovation and Sophistication Factors (IF) are the three higher-level attributes; the remaining eight are lower-level attributes.

Level 1                                       Level 2
Basic Requirements (BR)                       Institutions
                                              Infrastructure
                                              Macroeconomic Stability
Efficiency Enhancers (EE)                     Higher Education and Training
                                              Market Efficiency
                                              Technical Readiness
Innovation and Sophistication Factors (IF)    Business Sophistication
                                              Innovation

Analysis

Scatter plot diagram

All three higher-level independent variables (BR, EE, and IF) affect both dependent variables: the overall score rises and the ranking improves as each of them increases.

For the 13 Asian countries, I drew six scatter plots, with the overall score or the ranking on the Y-axis and BR, EE, or IF on the X-axis.

Scatter plot diagrams for Asia countries on overall_score
Each diagram shows an elliptical cloud of points in which the overall score (Y-axis) increases as BR, EE, or IF (X-axis) increases.

  • (X-axis: Basic Requirements (BR), Y-axis: overall_score) research_asiamodelplot1

  • (X-axis: Efficiency Enhancers (EE), Y-axis: overall_score) research_asiamodelplot2

  • (X-axis: Innovation and Sophistication Factors (IF), Y-axis: overall_score) research_asiamodelplot3

Scatter plot diagrams for Asia countries on ranking
Each diagram shows an elliptical cloud of points in which the ranking number (Y-axis) decreases as BR, EE, or IF (X-axis) increases. Note that a smaller ranking number indicates a more advanced country.

  • (X-axis: Basic Requirements (BR), Y-axis: ranking) research_asiamodelplot4

  • (X-axis: Efficiency Enhancers (EE), Y-axis: ranking) research_asiamodelplot5

  • (X-axis: Innovation and Sophistication Factors (IF), Y-axis: ranking) research_asiamodelplot6

Boxplot: Data distribution

Infrastructure, Technical Readiness, Institutions, and Innovation show the widest spread of values.

research_asiamodelboxplot
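One way to put a number on the spread that a boxplot shows is the interquartile range (IQR), which is the height of the box. The sketch below is an illustration under assumptions: the score lists are made-up values, not the GCI data, and the "inclusive" quantile method is just one of several common conventions.

```python
import statistics


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Illustrative values only, not taken from the GCI files.
scores = {
    "Infrastructure": [6.0, 8.0, 10.0, 12.0, 14.0],
    "Macroeconomic Stability": [9.0, 9.5, 10.0, 10.5, 11.0],
}

spread = {name: iqr(values) for name, values in scores.items()}
widest = max(spread, key=spread.get)  # attribute with the tallest box
```

Sorting the attributes by IQR gives the same ordering a reader would get by comparing box heights in the plot.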

Decision Tree

K-means clustering

  • Advanced countries
    Asia top 6 countries (K = 3, cluster 2): Taiwan, Hong Kong, Japan, Korea, Singapore, Malaysia
  • Middle-level countries
    Asia second-tier 5 countries (K = 3, cluster 3): Sri Lanka, India, Thailand, Vietnam, Indonesia
  • Lowest-level countries
    Asia low-ranked 2 countries (K = 3, cluster 1): Bangladesh, Philippines

I use K-means clustering to partition the data set and find the characteristics of each cluster. K starts at 2, and clustering is repeated with a larger K until the lift value no longer increases. I chose K = 3 because the lift value increased relative to K = 2 and the corresponding precision and recall values remained reasonable.
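For readers unfamiliar with K-means, the following is a minimal sketch of the standard assign-and-update loop (Lloyd's algorithm) in plain Python. It is not the tool used for this analysis. The deterministic start from the first k points, the function name, and the two-dimensional sample points are assumptions made for illustration, and the lift-based choice of K is not reproduced here.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Cluster points (tuples of floats) into k groups.

    Assumption: the first k points serve as initial centroids so that the
    result is reproducible; statistical packages usually pick them randomly.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = []
        for p in points:
            distances = [math.dist(p, c) for c in centroids]
            new_labels.append(distances.index(min(distances)))

        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]

        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


# Illustrative normalized (BR, IF) pairs, not taken from the GCI files.
points = [
    (8.0, 8.0), (10.0, 10.0), (12.0, 12.0),
    (8.2, 7.9), (10.1, 9.8), (12.1, 11.9),
]
labels, centroids = kmeans(points, k=3)
```

Running the same function with k = 2, 3, 4, and so on, and scoring each result, would correspond to the repeated clustering described above.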

  • Metrics
               Predicted yes   Predicted no
      True     a               b
      False    c               d

    • precision = a / (a + c)
    • recall = a / (a + b)
    • lift = [a / (a + c)] / [(a + b) / (a + b + c + d)]
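The three metrics can be computed directly from the four cells of the confusion matrix. The sketch below uses a hypothetical function name and assumes that none of the denominators is zero. The example counts are those of cluster 2 in the K = 2 result: 35 instances in the cluster, all from the six top countries (a = 35, c = 0), one instance of those countries placed elsewhere (b = 1), and d = 78 - 36 = 42 remaining instances.

```python
def cluster_metrics(a, b, c, d):
    """Return (precision, recall, lift) for one cluster.

    a: in the group and in the cluster      b: in the group, not in the cluster
    c: not in the group, in the cluster     d: neither
    """
    precision = a / (a + c)
    recall = a / (a + b)
    base_rate = (a + b) / (a + b + c + d)  # share of the group in all instances
    lift = precision / base_rate
    return precision, recall, lift


precision, recall, lift = cluster_metrics(a=35, b=1, c=0, d=42)
```

A lift above 1 means the group is more concentrated inside the cluster than in the data set as a whole.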
  • cluster K = 2
    • cluster 1: 43 instances
      Countries (7/13): Bangladesh, India, Thailand, Vietnam, Philippines, Sri Lanka, Indonesia
      lift: (42/43) / (42/78) = 1.81
      precision: (42/43) * 100 = 97.7 (%)
      recall: (42/42) * 100 = 100 (%)
    • cluster 2: 35 instances
      Countries (6/13): Japan, Korea, Malaysia, Singapore, Taiwan, Hong Kong
      lift: (35/35) / (36/78) = 2.17
      precision: (35/35) * 100 = 100 (%)
      recall: (35/36) * 100 = 97.2 (%)
  • cluster K = 3
    • cluster 1: 15 instances
      Countries (2/13): Bangladesh, Philippines
      lift: (10/15) / (12/78) = 4.33
      precision: (10/15) * 100 = 66.7 (%)
      recall: (10/12) * 100 = 83.3 (%)
    • cluster 2: 35 instances
      Countries (6/13): Japan, Korea, Malaysia, Singapore, Taiwan, Hong Kong
      lift: (35/35) / (36/78) = 2.17
      precision: (35/35) * 100 = 100 (%)
      recall: (35/36) * 100 = 97.2 (%)
    • cluster 3: 28 instances
      Countries (5/13): India, Thailand, Vietnam, Sri Lanka, Indonesia
      lift: (25/28) / (30/78) = 2.32
      precision: (25/28) * 100 = 89.3 (%)
      recall: (25/30) * 100 = 83.3 (%)

CRT (Classification and Regression Tree)

Infrastructure is the most important factor in becoming an advanced country. Innovation and Sophistication Factors (IF) determine the boundary between the lowest-level and middle-level countries.

  • CRT tree based on clusters 1, 2, and 3 (K = 3)
    Significance level for splitting nodes: 0.05, merging: 0.05
    Minimum number of cases: parent node 20, child node 10
    research_asiamodelCRT
    Node 2 represents the Asia top 6 countries (Infrastructure > 10.055), so infrastructure is critical to becoming an advanced country. The remaining countries fall into Node 1, where Innovation and Sophistication Factors (IF) determine the boundary between the lowest-level and middle-level countries.

  • Model summary
    research_asiamodelCRTsummary

  • Validation (10-Fold Cross Validation)
    research_asiamodelCRTvalidation
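The tree described above can be read as a two-step rule, sketched below. Only the Infrastructure cut-off (10.055) comes from the text. The IF cut-off is not reported there, so it is a required parameter, and the value 9.0 in the example calls is a made-up placeholder. The direction of the IF split (a higher IF meaning middle-level) is also an assumption.

```python
INFRASTRUCTURE_CUTOFF = 10.055  # reported for the first split of the tree


def classify_country(infrastructure, innovation_factors, if_threshold):
    """Assign a level from normalized Infrastructure and IF scores.

    if_threshold is hypothetical: the text does not give the IF split value.
    """
    if infrastructure > INFRASTRUCTURE_CUTOFF:
        return "advanced"  # Node 2 in the text
    # Node 1: IF separates the middle-level from the lowest-level countries.
    if innovation_factors > if_threshold:
        return "middle-level"
    return "lowest-level"


# Placeholder scores and a placeholder IF threshold, for illustration only.
high = classify_country(11.2, 10.5, if_threshold=9.0)
middle = classify_country(9.0, 9.5, if_threshold=9.0)
low = classify_country(8.0, 8.0, if_threshold=9.0)
```

Writing the rule out this way makes it clear that IF only matters for countries that have not already cleared the Infrastructure cut-off.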

Summary

From the Global Competitiveness Index (GCI) data set, I identified the key factors that determine national competitiveness and productivity. Infrastructure is the most important factor in becoming an advanced country. Innovation and Sophistication Factors (IF) are the other critical factor, separating the lowest-level countries from the middle-level countries. Based on this analysis, I suggest that developing countries strengthen their infrastructure, business sophistication, and innovation in order to become advanced countries.