1. Model description

LABSim is a rich dynamic microsimulation model of individual and household life course events: it simulates individual units over time and lets their characteristics change according to the processes specified within the model. One of the key innovations in LABSim is its linkage with UKMOD / EUROMOD, a static tax-benefit microsimulation model used to evaluate the immediate distributional impact of policy changes (the “morning after” effect). The static model allows ex-post evaluation of the policies put in place in the aftermath of Covid-19, as well as ex-ante evaluation of hypothetical policy changes. Combined with the dynamic model, as in LABSim, it allows policy changes to be applied to an evolving population.

Input and output data

LABSim uses three types of data as input:

  1. the initial population(s) to be evolved over time
  2. donor populations from the static tax-benefit model (UKMOD), which provide data on the effects of particular policy schedules. Each year, simulated individuals and households are statistically matched to individuals and households from the donor population.
  3. estimated parameters of the processes modelled in the simulation (further described in the “Processes modelled” section)

The initial population(s) and estimated parameters of processes are based on the Understanding Society survey, Waves 1 to 8.

The donor population is obtained from the UKMOD / EUROMOD models and consists of a single text file for each policy variant in each year: for example, set 1 of policies in year 2011, set 2 in year 2011, set 1 in year 2012, etc. Only one set of policies can be applied in any given year, although the same set can be applied to multiple years. The policy schedule can easily be modified between runs: for example, set 1 of policies can apply in 2011 in one run, and a modified set 2 can apply to the same year in another run.
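One way to picture a policy schedule is as a mapping from simulated year to a donor file, which guarantees that exactly one policy set applies in each year. The sketch below is illustrative only: the file names and the `with_policy` helper are hypothetical, and the actual configuration mechanism in LABSim may differ.

```python
def with_policy(schedule, year, donor_file):
    """Return a copy of the schedule with one year's donor file replaced."""
    updated = dict(schedule)
    updated[year] = donor_file
    return updated


# Hypothetical file names: the text does not give the real naming convention.
baseline = {
    2011: "policy_set_1_2011.txt",
    2012: "policy_set_1_2012.txt",
}

# A second run that applies a modified policy set in 2011 only.
reform = with_policy(baseline, 2011, "policy_set_2_2011.txt")
```

Because `with_policy` copies the mapping, the baseline schedule stays available for comparison between runs.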

Output produced by the model consists of SQL database tables and / or CSV files at the individual, benefit unit, and household level, which can be linked through unique identifiers. The output files contain the values of simulated variables for each individual unit in each year of the simulation, effectively producing a “synthetic” panel dataset.

Key modelling assumptions

Conditional independence assumption: all processes are modelled as independent of one another, although each is conditioned on lagged variables determined by other processes.

We use a partial equilibrium model of labour supply, which means that we model labour supply (worker side of the market) but not labour demand (firm side of the market).

The processes are ordered as in Figure 1. However, because both the simulation and the estimated processes are sampled at yearly frequency, the sequence of events within a year is arbitrary.

Processes modelled

Figure 1 shows the structure and order of processes modelled in LABSim. Each year, the simulation begins with the Demographic module (which models ageing, leaving parental home, and retirement decision) and ends with the calculation of disposable income / consumption. In this section each of these processes is described in more detail.

Stochasticity is incorporated in the following way: i) for linear models, a Gaussian random number multiplied by the residual standard deviation of the regression is added to the calculated score; ii) for logit and probit models, the occurrence of an event is determined by drawing a Boolean whose value is true with probability equal to the logit or probit transform of the linear regression score of the corresponding model; iii) for multinomial logit and probit models, which determine the outcome of random events with a finite set of possible outcomes, the logistic or probit transforms of the linear regression scores for N-1 outcomes are compared with the Nth outcome, which is deemed to have a score of 0. This yields relative probabilities of the outcomes, which are then sampled to determine which of the N outcomes occurs.
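The three cases can be sketched as follows. This is a minimal illustration rather than the LABSim implementation: all function names are hypothetical, and case iii) is shown for the multinomial logit only, with the base outcome given a score of 0.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, used here as the probit transform."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def logistic(x):
    """Logistic function, used here as the logit transform."""
    return 1.0 / (1.0 + math.exp(-x))


def draw_linear(score, residual_sd, rng):
    """i) Linear model: add Gaussian noise scaled by the residual SD."""
    return score + residual_sd * rng.gauss(0.0, 1.0)


def draw_binary(score, rng, link=normal_cdf):
    """ii) Binary model: the event occurs with probability link(score)."""
    return rng.random() < link(score)


def multinomial_logit_probs(scores):
    """iii) Probabilities of N outcomes from N-1 scores; the base scores 0."""
    exps = [math.exp(s) for s in scores] + [1.0]  # exp(0) = 1 for the base
    total = sum(exps)
    return [e / total for e in exps]


def draw_outcome(probs, rng):
    """Sample an outcome index from a list of probabilities."""
    u = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point rounding


rng = random.Random(42)
probs = multinomial_logit_probs([0.8, -0.3])  # three outcomes, last is base
outcome = draw_outcome(probs, rng)
```

Passing `link=logistic` to `draw_binary` gives the logit case instead of the probit one.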

To address parameter uncertainty, the regression coefficients of the model can be bootstrapped. Bootstrapping involves sampling the set of regression coefficients of a regression object from a multivariate normal distribution whose vector of expected values (means) is the set of regression coefficients estimated from the data, and whose covariance matrix is derived from the statistical error of the estimates. The following pages provide additional details: Regressions classes, Uncertainty analysis
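A draw of this kind can be sketched with a Cholesky factor of the covariance matrix. The code below uses only the standard library and hypothetical function names; it assumes the covariance matrix is symmetric and positive definite, and it is not taken from the LABSim regression classes.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular L with L * L^T = matrix (symmetric, positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def bootstrap_coefficients(means, covariance, rng):
    """Draw one coefficient vector from N(means, covariance)."""
    lower = cholesky(covariance)
    z = [rng.gauss(0.0, 1.0) for _ in means]  # independent standard normals
    return [
        mean + sum(lower[i][k] * z[k] for k in range(i + 1))
        for i, mean in enumerate(means)
    ]


# Illustrative numbers only, not estimates from the model.
estimated = [0.5, -1.2]
covariance = [[0.04, 0.01], [0.01, 0.09]]
draw = bootstrap_coefficients(estimated, covariance, random.Random(7))
```

Each simulation run would then use its own draw in place of the point estimates.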

Demographic module

Ageing

Every simulated year, the age of each individual increases by 1. Population alignment is then performed to adjust the number of individuals by age, gender, and region to historical data and projections from the ONS. Where there are too many individuals in a given age-gender-region cell, individuals are removed from the simulation at random. Where there are too few, new individuals are created by copying existing individuals from the same age-gender-region cell at random. If there are no individuals to copy from, the age requirement is gradually relaxed in steps of +/- 1 year until a match is found. Alignment is performed at the individual level; however, we try to recreate household characteristics based on the data of the cloned person, attempting to find partners for individuals who were partnered and assigning cloned children to cloned mothers.
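The alignment of a single age-gender-region cell might look like the sketch below. The record layout, the function names, and the `max_gap` cap on age relaxation are assumptions made for illustration; the reconstruction of households for cloned individuals is not shown, and neither is the treatment of a clone drawn from a neighbouring age.

```python
import random


def clone_pool(population, age, gender, region, max_gap=5):
    """Find individuals to copy, widening the age window by +/- 1 per step.

    max_gap is a hypothetical cap; the text does not say when relaxation stops.
    """
    for gap in range(max_gap + 1):
        pool = [
            p for p in population
            if p["gender"] == gender
            and p["region"] == region
            and abs(p["age"] - age) <= gap
        ]
        if pool:
            return pool
    return []


def align_cell(members, target, pool, rng):
    """Adjust one age-gender-region cell to its target count."""
    members = list(members)
    if len(members) > target:
        # Too many: keep a random subset, i.e. remove individuals at random.
        return rng.sample(members, target)
    shortfall = target - len(members)
    if shortfall and not pool:
        raise ValueError("no individuals available to copy")
    # Too few: add copies of randomly chosen individuals from the pool.
    return members + [dict(rng.choice(pool)) for _ in range(shortfall)]
```

For an empty cell, `clone_pool` supplies the individuals to copy from, and `align_cell` then draws the required number of clones from that pool.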

Leaving parental home

Individuals who turn 18 set up new benefit units and, if not in education, consider leaving their parental home. The probability of leaving home is based on a probit model conditional on sex, age, age squared, level of education, lagged employment status, lagged household income quintile, region, and year (reflecting the time trend observed in the data). Individuals who stay at home become adult children and can leave home in any subsequent year.

Retirement

Individuals above 50 years old consider retirement, with the probability based on a probit model estimated separately for couples and singles. The models condition on sex, age, age squared, level of education, a dummy indicating whether the individual is above state pension age (allowing for past and planned changes in the state pension age), lagged employment status, lagged household income quintile, lagged disability status, a dummy indicating whether the spouse is above state pension age, lagged employment status of the spouse, an interaction term between reaching pension age and lagged employment status, and year. (For single individuals, the variables relating to the status of the spouse are omitted from the model.) Retired individuals do not work any hours, and retirement is an absorbing state (retired individuals cannot return to work). Retirement before reaching state pension age is allowed.

Education module

Student status

Individuals aged between 16 and 29 who have always been in education consider leaving school with probability determined by a probit model conditional on sex, age, age squared, mother’s and father’s education, region, and year. Individuals who are 30 years old and still in education are forced to leave school.

Individuals aged 16 – 45 who are not students can re-enter education with probability determined by a probit model conditional on sex, age, age squared, lagged level of education, lagged employment status, lagged number of children in the household, lagged number of children aged 0 – 2 in the household, mother’s and father’s level of education, region, and year. Students are not allowed to work. Those who returned to education can leave in any subsequent year, with the possibility of increasing their level of education.

Educational achievement

Individuals leaving school (as determined by the student status process above) have their level of education set by a multinomial probit model conditional on sex, age, age squared, mother’s and father’s level of education, region, and year. We assume that the level of education cannot decrease.

Health module

Health status

The overall health status is based on the self-rated health (5 categories, from Poor to Excellent). Level of health status is determined according to the outcome of a weighted least squares regression, where the level of health is assumed to be linear. The prediction is conditional on sex, age, age squared, lagged household income quintile, lagged health status, region, and year for those in continuous education, and additionally on the level of education, lagged employment status, and lagged household composition for those not in education.

Disability

Any individual aged 16 and above who is not in continuous education can become disabled or long-term sick with probability given by a probit model conditional on the health status, sex, age, age squared, level of education, lagged household income quintile, lagged self-rated health status, lagged disability status, lagged household composition, region, and year.

Household composition and maternity module

Partnership formation

Individuals above 18 who do not have a partner decide whether to enter a partnership based on the outcome of a probit model conditional on: i) sex, age, age squared, lagged household income quintile, lagged number of children, lagged number of children aged 0 – 2, lagged self-rated health status, region, and year if they are in continuous education, or ii) level of education and lagged employment status in addition to the variables listed in i) if they are not students or are students who have returned to school.

Females who are partnered and not in education consider exiting partnership with a probability determined by a probit model conditional on age, age squared, level of education, lagged personal gross non-benefit income and its square, lagged number of children, lagged number of children aged 0 – 2, lagged self-rated health status, lagged level of education of the spouse, lagged self-rated health status of the spouse, lagged difference between own and spouse’s gross, non-benefit income, lagged duration of partnership in years, lagged difference between own and spouse’s age, lagged household composition, lagged own and spouse’s employment status, region, and year.

Individuals who decide to enter a partnership are matched using either a parametric or non-parametric process. In the (default) parametric process, males are matched with females with a probability corresponding to a matching score, calculated in the following way:

matching score = earningsScore^2 + ageScore^2, where

earningsScore = (male's potential earnings - female's potential earnings) - female's desired difference in potential earnings, and

ageScore = (male's age - female's age) - male's desired difference in age.
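As a worked illustration, the score can be computed as below, reading it as the sum of the squared earnings and age terms (an assumption about the notation). The dictionary keys are hypothetical, and the step that converts scores into matching probabilities is not described in the text, so it is not sketched.

```python
def matching_score(male, female):
    """Sum of squared deviations from the partners' desired differences."""
    earnings_score = (
        male["potential_earnings"] - female["potential_earnings"]
    ) - female["desired_earnings_diff"]
    age_score = (male["age"] - female["age"]) - male["desired_age_diff"]
    return earnings_score ** 2 + age_score ** 2


# Illustrative values only.
male = {"potential_earnings": 30.0, "age": 36, "desired_age_diff": 2}
female = {"potential_earnings": 22.0, "age": 30, "desired_earnings_diff": 5.0}

# earningsScore = (30 - 22) - 5 = 3, ageScore = (36 - 30) - 2 = 4
score = matching_score(male, female)
```

A score of 0 means that both desired differences are met exactly; larger values indicate a greater departure from them.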

The non-parametric process aims to replicate the distribution of matches observed in the data between different types of individuals, where a type is defined as a combination of sex, region, education level, and age. The distribution of matches between different types is adjusted using an iterative proportional fitting procedure constrained by the marginal frequencies of each type observed in the simulation.
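Iterative proportional fitting can be sketched for a two-way table in which rows are male types, columns are female types, the seed holds the match frequencies observed in the data, and the targets are the marginal frequencies observed in the simulation. This is a generic textbook version with hypothetical names and illustrative numbers, not the LABSim code.

```python
def ipf(seed, row_targets, col_targets, iterations=100, tol=1e-9):
    """Rescale a two-way table until its margins match the targets."""
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Scale each row so that it sums to its target.
        for i, row in enumerate(table):
            total = sum(row)
            if total > 0:
                factor = row_targets[i] / total
                table[i] = [value * factor for value in row]
        # Scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                factor = target / total
                for row in table:
                    row[j] *= factor
        # Columns are now exact; stop once the rows also agree.
        gap = max(abs(sum(row) - t) for row, t in zip(table, row_targets))
        if gap < tol:
            break
    return table


observed_matches = [[1.0, 2.0], [3.0, 4.0]]  # illustrative seed table
fitted = ipf(observed_matches, row_targets=[40.0, 60.0], col_targets=[50.0, 50.0])
```

The fitted table keeps the association pattern of the seed while matching both sets of marginal totals, which must sum to the same grand total.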

Fertility

Females aged 18 to 44 who have a partner can give birth to a child with a probability determined by a probit model conditional on: i) age, age squared, UK General Fertility Rate (in a given year, projected for future years), lagged household income quintile, lagged number of children, lagged number of children aged 0 – 2, lagged health status, and lagged partnership status for those aged 18 to 29 who were in continuous education, and ii) lagged employment status, level of education, and region for those who were not in continuous education. The inclusion of the UK’s fertility rate implies this is a model of differential fertility, where the overall change in fertility projected by the statistical authority is distributed across individuals according to their observable characteristics.

Non-labour income module

Capital income and pensions

Individuals above 16 years old receive non-employment non-benefit income (capital income) determined by the outcome of a weighted least squares regression, conditional on i) sex, age, age squared, lagged health status, lagged gross employment income, lagged capital income, region, and year if they are in continuous education, or ii) the variables listed in i) plus level of education, lagged employment status, and lagged household composition otherwise.

Retired individuals receive pension income determined by an outcome of a weighted least squares regression conditional on sex, age, age squared, level of education, lagged employment status, lagged household composition, lagged health status, lagged gross employment income, lagged capital income, region, and year.

Labour supply module

We use a random utility labour supply model. This is a static labour supply model, in the sense that labour demand and wage rates are assumed fixed, but the individual and household budget constraint is represented in detail. Individuals maximise their utility from income and leisure over a restricted number of alternatives (5 for single individuals, and 25 for couples, as utility is determined at the benefit unit level). Utility is defined over a single period (although inter-temporal considerations might be captured by time-varying characteristics such as age). The model is unitary, which means that decisions are made at the level of the benefit unit: for example, couples make a single choice covering the labour supply of both partners in order to maximise their joint utility. The utility of the household depends on disposable household income and on the number of hours worked by each household member. Labour market income is computed, for each alternative and each household member, by multiplying the number of hours by a predicted hourly wage, estimated by means of a Heckman-corrected wage equation. Individual labour market income is then transformed into disposable household income following the procedure described below, in the “Disposable income and consumption” subsection.
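The structure of the choice set can be illustrated as follows: five hours options per adult give 5 alternatives for a single person and 5 x 5 = 25 for a couple. The hours values, the utility function, and the function names are all hypothetical, and the random component of utility is omitted, so this sketch shows only the enumeration and the maximisation step.

```python
import itertools

# Hypothetical hours categories; the text only states that there are five.
HOURS_OPTIONS = [0, 10, 20, 30, 40]


def choice_set(n_adults):
    """All combinations of hours for the adults in a benefit unit."""
    return list(itertools.product(HOURS_OPTIONS, repeat=n_adults))


def choose_hours(hourly_wages, utility, to_disposable):
    """Pick the alternative with the highest benefit-unit utility."""
    best_value, best_hours = None, None
    for hours in choice_set(len(hourly_wages)):
        # Labour income: hours times predicted hourly wage, summed over adults.
        gross = sum(h * w for h, w in zip(hours, hourly_wages))
        value = utility(to_disposable(gross), hours)
        if best_value is None or value > best_value:
            best_value, best_hours = value, hours
    return best_hours


def toy_utility(income, hours):
    """Illustrative utility: income minus a linear cost of hours worked."""
    return income - 5.0 * sum(hours)


single_choice = choose_hours([10.0], toy_utility, lambda gross: gross)
couple_choice = choose_hours([10.0, 2.0], toy_utility, lambda gross: gross)
```

Here `to_disposable` stands in for the tax-benefit step that converts gross income into disposable income.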

Disposable income and consumption

Disposable income is calculated after the labour supply module by multiplying the benefit unit’s gross income (both labour income and non-labour, non-benefit income) by the ratio of disposable income to gross earnings of the closest matching donor. Donor households are obtained from the UKMOD population, and their ratio of disposable to gross income depends on the tax-benefit schedule in place in a given year (specified and adjustable at the beginning of the simulation). Simulated benefit units are matched to donor benefit units on a number of key characteristics: labour supply hours of adult members of the benefit unit, health status, number of dependent children, region of residence, and age of adult members. We first attempt to find an exact match on all characteristics; if there are no matching observations, we relax the requirement of an exact match one characteristic at a time (starting with age), replacing it with a minimum distance procedure. For example, if no exact match can be found on all key characteristics, we look for an exact match on labour supply, health, number of children, and region, and minimise the difference in age between the simulated and donor benefit unit.
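The first relaxation step can be sketched as below: an exact match is required on every characteristic except age, and the donor with the smallest age difference is chosen, so an exact match on age is selected whenever one exists. The field names are hypothetical, a benefit unit is simplified to a single age, and later relaxation steps are left out because the text does not give their order.

```python
EXACT_KEYS = ("hours", "health", "children", "region")


def find_donor(unit, donors):
    """Closest donor by age among those matching exactly on EXACT_KEYS."""
    candidates = [
        d for d in donors if all(d[key] == unit[key] for key in EXACT_KEYS)
    ]
    if not candidates:
        return None  # further relaxation steps are not sketched here
    return min(candidates, key=lambda d: abs(d["age"] - unit["age"]))


def disposable_income(unit, donor):
    """Apply the donor's disposable-to-gross ratio to the unit's gross income."""
    return unit["gross_income"] * donor["disposable"] / donor["gross"]


# Illustrative records only.
unit = {"hours": 40, "health": 4, "children": 1, "region": "UKC",
        "age": 40, "gross_income": 2000.0}
donors = [
    {"hours": 40, "health": 4, "children": 1, "region": "UKC",
     "age": 35, "gross": 1800.0, "disposable": 1500.0},
    {"hours": 40, "health": 4, "children": 1, "region": "UKC",
     "age": 42, "gross": 2000.0, "disposable": 1500.0},
    {"hours": 20, "health": 4, "children": 1, "region": "UKC",
     "age": 40, "gross": 900.0, "disposable": 850.0},
]
donor = find_donor(unit, donors)
income = disposable_income(unit, donor)
```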

Yearly equivalised disposable income is calculated by adjusting the sum of monthly disposable income for household composition using an equivalence weight based on the OECD-modified equivalence scale. Yearly equivalised consumption is equal to yearly equivalised disposable income for retired individuals, and to (1 – saving rate) * yearly equivalised disposable income otherwise.
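A small worked example follows. The OECD-modified scale assigns a weight of 1.0 to the first adult, 0.5 to each additional person aged 14 or over, and 0.3 to each child under 14. The sketch assumes that equivalisation divides income by this weight, which is the standard practice; the function names and the saving rate value are illustrative.

```python
def oecd_modified_weight(ages):
    """Equivalence weight for a household given its members' ages."""
    adults = sum(1 for age in ages if age >= 14)
    if adults == 0:
        raise ValueError("expected at least one member aged 14 or over")
    children = len(ages) - adults
    return 1.0 + 0.5 * (adults - 1) + 0.3 * children


def yearly_equivalised_income(monthly_incomes, ages):
    """Sum of monthly disposable income divided by the equivalence weight."""
    return sum(monthly_incomes) / oecd_modified_weight(ages)


def yearly_equivalised_consumption(income, retired, saving_rate):
    """Retired individuals consume all income; others save a fixed share."""
    if retired:
        return income
    return (1.0 - saving_rate) * income


# Two adults and two children under 14: weight = 1.0 + 0.5 + 2 * 0.3 = 2.1
ages = [40, 38, 10, 5]
income = yearly_equivalised_income([2100.0] * 12, ages)
consumption = yearly_equivalised_consumption(income, retired=False, saving_rate=0.1)
```

With these numbers the yearly total of 25,200 is divided by 2.1, and a tenth of the result is then saved.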

Model outputs

LABSim implements a Model-Collector-Observer structure. The model creates and manages objects and the relationships between them, and defines the order of events that take place in the simulation. The collector collects data from the simulated objects and computes statistics, both for use by the simulation and for analysis of the model outcomes after the simulation has completed. The observer allows the user to inspect the simulation in real time and monitor several pre-defined outcome variables.
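The division of responsibilities can be illustrated with a toy example in which the only event is ageing and the only statistic is mean age. The classes below are a hypothetical sketch of the pattern, not the classes used in LABSim.

```python
class Model:
    """Owns the simulated objects and defines the yearly order of events."""

    def __init__(self, ages):
        self.year = 0
        self.persons = [{"id": i, "age": age} for i, age in enumerate(ages)]

    def step(self):
        for person in self.persons:
            person["age"] += 1  # the only event in this toy model
        self.year += 1


class Collector:
    """Reads the simulated objects and computes statistics."""

    def __init__(self, model):
        self.model = model
        self.mean_age_by_year = {}

    def collect(self):
        ages = [p["age"] for p in self.model.persons]
        self.mean_age_by_year[self.model.year] = sum(ages) / len(ages)


class Observer:
    """Reports the collected statistics while the simulation runs."""

    def __init__(self, collector):
        self.collector = collector

    def report(self):
        year = max(self.collector.mean_age_by_year)
        mean_age = self.collector.mean_age_by_year[year]
        return f"year {year}: mean age {mean_age:.1f}"


def run(ages, years):
    """Advance the model, collecting and reporting after each year."""
    model = Model(ages)
    collector = Collector(model)
    observer = Observer(collector)
    reports = []
    for _ in range(years):
        model.step()
        collector.collect()
        reports.append(observer.report())
    return collector.mean_age_by_year, reports
```

Keeping the statistics in the collector means the observer can be switched off, for example in batch runs, without changing what the model computes.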

Simulated objects are persisted to an underlying database, which can be explored using a database explorer tool embedded in the simulation software. The simulated data can also be exported to a set of CSV files (individuals, benefit units, households, aggregate statistics) that together form a “synthetic”, forward-looking panel dataset that can be analysed in standard statistical software.