Introduction to Data Analysis Using Stata

Course Overview:

This short course is designed for students and practitioners from industry and academia, and for staff of local and international non-governmental organisations (NGOs), who have little or no training in basic statistics and data analysis. Specifically, the course covers data management (including tabulation of data), descriptive statistics, inferential statistics, and regression analysis. It also addresses the problems that arise from dynamics and from combining the time and space dimensions in statistical data analysis. In particular, we will work first with aggregated time-series data and then with aggregated time-series cross-sectional data, e.g. geographic or administrative units observed over time (panel data).

This data structure has the advantage of allowing tests of highly general theories with a wide scope, but it makes data analysis more complicated because one has to consider the time-series aspects (dynamics) and the cross-sectional aspects (spatial correlation and unit heterogeneity) at the same time.
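
As a purely illustrative sketch (the notation is an assumption made here for exposition, not a model prescribed by the course), a pooled specification that carries both complications at once can be written as:

```latex
y_{it} = \alpha_i + \rho\, y_{i,t-1} + \mathbf{x}_{it}'\boldsymbol{\beta} + \varepsilon_{it},
\qquad i = 1,\dots,N, \quad t = 1,\dots,T
```

Here the unit-specific intercept $\alpha_i$ stands for unit heterogeneity, the lagged term $\rho\, y_{i,t-1}$ stands for the dynamics, and spatial correlation would appear as correlation of $\varepsilon_{it}$ across units $i$ within the same period $t$.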

The course confronts the problems arising from this complex data structure and provides techniques to control and account for specific complications. Finally, the course offers an introduction to survey data analysis, focusing on issues related to the analysis of poverty, inequality, and inclusive growth.

The course combines a theoretical introduction with practical analysis of diverse data sets using Stata. Participants are encouraged to bring their own data sets, and we are happy to schedule time to discuss any specific time-series or panel data problems you may have.

Course Objectives

The course does not require prior knowledge of inferential statistics and is designed to develop participants' understanding of the statistical problems arising from the complex structure of pooled data.

The course deals mostly with questions of specification and model choice. It is therefore a practical course that should enable students to link their empirical models more closely to their theoretical arguments and to make model choices that are appropriate for the data structure at hand.

The course materials are designed to help participants solve their own estimation problems and increase the reliability and efficiency of their statistical results. The course is targeted at social scientists and business academics with limited statistical skills and a strong interest in applied empirical research and data analysis.

Facilitator: Dr Delphin Kamanda Espoir

Course Schedule

Module 1: Introduction to Stata

  • Introduction to Stata interface
  • Opening data files in Stata
  • Importing data into Stata
  • Saving data files in Stata

Module 2: Data Management in Stata

  • Creating new variables
  • Labelling variables
  • Recoding variables
  • Merging datasets

Module 3: Descriptive Statistics

  • Measures of central tendency (mean, median, mode)
  • Measures of dispersion (variance, standard deviation, range)
  • Frequency distributions and histograms

Module 4: Inferential Statistics

  • Probability distributions (normal distribution, t-distribution, etc.)
  • Confidence intervals
  • Hypothesis testing (one-sample t-test, two-sample t-test, chi-squared test, etc.)

Module 5: Correlation Analysis in Stata

  • Correlation coefficient
  • Scatterplots
  • Significance testing for correlation coefficient

Module 6: Regression Analysis in Stata

  • Simple linear regression models
  • Fitting regression models in Stata
  • Interpreting regression output
  • Multiple regression models
  • Model selection techniques
  • Interpreting multiple regression output

Module 7: Binary Logistic Regression

  • Binary logistic regression models
  • Fitting binary logistic regression models in Stata
  • Interpreting logistic regression output

Module 8: Panel Data Analysis

  • Introduction to panel data
  • Fixed effects models
  • Random effects models
  • Hausman test
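
For orientation, the comparison between fixed effects and random effects in this module is commonly summarised by the Hausman statistic. The notation below is an assumption chosen for illustration: $\hat{\beta}_{FE}$ and $\hat{\beta}_{RE}$ are the two estimates of the $k$ coefficients on time-varying regressors, and $\hat{V}_{FE}$, $\hat{V}_{RE}$ are their estimated covariance matrices.

```latex
H = \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)'
    \left[\hat{V}_{FE} - \hat{V}_{RE}\right]^{-1}
    \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)
    \;\overset{a}{\sim}\; \chi^2_{k}
```

Under the null hypothesis that the unit effects are uncorrelated with the regressors, both estimators are consistent and $H$ should be small. A large value of $H$ is evidence against the random effects model and in favour of fixed effects.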

Module 9: Survey Data Analysis

  • Measures of poverty
  • Measures of inequality
  • Growth-poverty-inequality nexus
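
As an indication of what a poverty measure looks like, one widely used family is the Foster-Greer-Thorbecke (FGT) class. Whether the course uses exactly this family is an assumption; the symbols are introduced here only for illustration: $y_i$ is the income or consumption of individual $i$, $z$ is the poverty line, $N$ is the population size, and $\alpha \ge 0$ is a sensitivity parameter.

```latex
P_{\alpha} = \frac{1}{N} \sum_{i=1}^{N}
             \left( \frac{z - y_i}{z} \right)^{\alpha}
             \mathbf{1}\!\left( y_i < z \right)
```

Setting $\alpha = 0$ gives the headcount ratio (the share of the population below the poverty line), $\alpha = 1$ gives the poverty gap index, and $\alpha = 2$ gives the squared poverty gap, which places more weight on the poorest individuals.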

Module 10: Final Project

Participants will work on a data analysis project using real-world datasets, applying the concepts and techniques learned in the course. Project topics may be chosen from a list of options provided by the instructor or developed by the participants in consultation with the instructor. Participants will present their final projects to the class and receive feedback from their peers and the instructor.