University of Michigan, bmodene@umich.edu. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. 1256260. Any opinions, findings, and conclusions or recommendations expressed in (2024)

Jamie Fogel and Bernardo Modenesi

Abstract

Recent advances in the literature on decomposition methods in economics have allowed for the identification and estimation of detailed wage gap decompositions. In this context, building reliable counterfactuals requires tighter controls to ensure that similar workers are correctly identified: important unobserved variables such as skills must be controlled for, and only workers with similar observable characteristics should be compared. This paper contributes to the wage decomposition literature in two main ways: (i) developing an economically principled, network-based approach to control for unobserved heterogeneity in worker skills in the presence of potential discrimination; and (ii) extending existing generic decomposition tools to accommodate a potential lack of overlapping supports in covariates between the groups being compared, which is likely to be the norm in more detailed decompositions. We illustrate the methodology by decomposing the gender wage gap in Brazil.

1 Introduction

Significant attention has been paid to the gap in wages between men and women. Researchers are interested in understanding how much of the gap is due to men and women performing different work using different skills, and how much is due to men and women being paid differently for similar work. A number of methods exist for trying to answer this question. These methods decompose gender wage gaps into a portion explained by differences in characteristics between men and women, and a portion explained by differences in the return to characteristics, or “discrimination”. However, all of these methods rely on three assumptions. First, they assume that unobserved determinants of earnings are independent of gender. To the extent that there exist unobserved worker characteristics that are important for determining wages and are correlated with gender, researchers will obtain biased estimates of the return to observable characteristics. As a result, decompositions of gender wage gaps into a component explained by covariates and a component explained by the return to covariates will be incorrect. Second, they assume a functional form in order to estimate the function that maps observable characteristics into wages and thus serves as the foundation for counterfactuals that ask what men would earn if they had the same characteristics except their gender were switched to female, and vice versa. Third, they assume that the covariates for male workers and female workers share a common support. While this is likely to hold when the number of covariates is small, as more covariates are added (possibly to satisfy the independence assumption) the common support assumption becomes more likely to be violated. [Footnote 2: As more covariates are added it becomes harder to find another worker who shares the same values of all covariates.]

In this paper, we (i) propose a new method for identifying unobserved determinants of workers’ earnings from the information revealed by detailed data on worker–job matching patterns, (ii) non-parametrically estimate counterfactual wage functions for male and female workers, (iii) allow for a relaxation of the common support assumption, and (iv) apply our methods by decomposing the gender wage gap in Brazil using improved counterfactuals based on (i), (ii) and (iii). We find that the Brazilian gender wage gap is almost entirely explained by male and female workers who possess similar skills and perform similar tasks being paid different wages, not women possessing skills or tasks that pay relatively lower wages.

To understand the problem created by unobserved determinants of productivity, suppose that there are three types of worker characteristics that are relevant for determining wages: gender, other characteristics observable to researchers, and characteristics that are observable to labor market participants, but not to researchers. A naive wage decomposition would simply compare male wages to female wages and attribute all differences to the effect of gender. A more common approach would condition on observable characteristics like age, experience, occupation, education, and union membership and would attribute all differences in wages, conditional on these characteristics, solely to being a woman as opposed to being a man. However, this would miss the fact that even workers with identical observable covariates may perform distinct labor. As Goldin (2014) shows, male lawyers significantly outearn female lawyers largely because males are more likely to work long, inflexible hours, which leads to high wages. Therefore, if we simply compared the wages of male lawyers to the wages of female lawyers, we might mistakenly conclude that male and female lawyers receive differential pay for the same work, when in fact male and female lawyers perform different types of legal work. In other words, male and female lawyers differ in terms of covariates that are observed by labor market participants but not by researchers.

The key to our approach is identifying information about worker characteristics observable to labor market participants, but not to researchers, directly from the behavior of labor market participants. If we can identify groups of workers and groups of jobs who are similar from the perspective of labor market participants, then we can be confident that any gender wage differentials within these groups are due to differential returns to labor market activities by gender, rather than differences in the work done by male and female workers.

We employ a revealed preference approach that relies on workers’ and jobs’ choices, rather than observable variables or expert judgments, to classify workers and jobs into groups. Our key insight is that linked employer-employee data contain a previously underutilized source of information: millions of worker–job matches, each of which reflects workers’ and jobs’ perceptions of the workers’ skills and the jobs’ tasks. Intuitively, if two workers are employed by the same job, they probably have similar skills, and if two jobs employ the same worker those jobs probably require workers to perform similar tasks. However, since discrimination may lead men and women with similar skills to sort into different jobs, our method includes a correction for gender-based sorting into jobs that normalizes workers’ job match probabilities by the match probabilities for their gender.

We formalize this intuition and apply it to large-scale data using a Roy (1951) model in which workers supply labor to jobs according to comparative advantage. Workers belong to a discrete set of latent worker types defined by having the same “skills” and jobs belong to a discrete set of latent markets defined by requiring employees to perform the same “tasks.” [Footnote 3: “Skills” and “tasks” should be interpreted broadly as any worker and job characteristics that determine which workers match with which jobs.] Workers match with jobs according to comparative advantage, which is determined by complementarities between skills and tasks at the worker type–market level. Workers who have similar vectors of match probabilities over markets are therefore revealed to have similar skills and belong to the same worker type, and jobs that have similar vectors of match probabilities over worker types are revealed to have similar tasks and belong to the same market. Our model extends the model in Fogel and Modenesi (2023) to allow firms to have labor market power, thereby rationalizing pay heterogeneity among workers with the same skills in jobs requiring the same tasks and microfounding the correction for gender-based sorting.

Once we have clustered workers with similar skills into worker types and jobs requiring similar tasks into markets, we turn to estimating counterfactual wage functions. Traditional decomposition methods estimate counterfactual female earnings by fitting wage regressions using observations for male workers only, and then generating predicted values by multiplying average female covariate values by the male regression coefficients. This approach suffers from three main issues: (i) it requires the researcher to impose a restrictive functional form on the regression; (ii) it does not necessarily allow for heterogeneous returns to covariates in predictions; and (iii) it has no built-in way to handle cases in which male and female workers do not share a common covariate support. Taken together, these issues can bias the counterfactual estimates that are the foundation of gender wage gap decompositions. To circumvent these issues, we make use of a flexible matching estimator for counterfactual earnings.

We implement a matching estimator in which we match male and female workers who belong to the same worker type and are employed by jobs in the same market. In doing so, we implicitly assume that worker types and markets fully account for all factors, other than gender, that affect workers’ wages, although we also estimate specifications in which we include other observable characteristics in addition to worker types and markets. Within these matched groups, we use the male workers’ mean wages as counterfactuals for what the female workers would have earned if they were male, and vice versa. We compare our matching estimator to a standard estimator and find similar results, although in some specifications the matching estimator is clearly preferable. However, there may be some worker type–market cells with no male workers or no female workers so we introduce a correction to account for this lack of common support.

We address the issue of a lack of common covariate support between male and female workers by decomposing the gender wage gap into four components: (i) differences due to different covariate distributions between groups, i.e. the composition factor, for observations that share the same support; (ii) differences related to differential returns to covariates between groups over a common support of the covariates, i.e. the structural factor, often associated with labor market discrimination; (iii) a part due to observations from male workers being out of the female workers’ support of the covariates; and (iv) the last portion related to observations of female workers being out of the male workers’ support of the covariates. This decomposition allows us to perform counterfactuals similar to existing methods for the part of the distribution of the covariates for which male and female workers have common support, yet it still allows us to quantify how much of the gender wage gap occurs outside the region of common support and would therefore be ignored by standard decomposition methods.
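The four-way split described above can be written compactly. The following is a sketch in the spirit of Ñopo (2008), not necessarily the exact notation used later in the paper. The symbols are introduced here for illustration only: $G=1$ denotes male workers, $Y_{1}(x)$ is the male wage function evaluated at covariates $x$, $S$ is the region of common support, $\mu_{M}=P(x\notin S\mid G=1)$, and $\mu_{F}=P(x\notin S\mid G=0)$.

```latex
\begin{align*}
\Delta   &= \Delta_{M} + \Delta_{X} + \Delta_{0} + \Delta_{F},\\
\Delta_{M} &= \mu_{M}\bigl(E[Y \mid G=1,\, x\notin S] - E[Y \mid G=1,\, x\in S]\bigr),\\
\Delta_{X} &= E[Y \mid G=1,\, x\in S] - E[Y_{1}(x) \mid G=0,\, x\in S],\\
\Delta_{0} &= E[Y_{1}(x) \mid G=0,\, x\in S] - E[Y \mid G=0,\, x\in S],\\
\Delta_{F} &= \mu_{F}\bigl(E[Y \mid G=0,\, x\in S] - E[Y \mid G=0,\, x\notin S]\bigr).
\end{align*}
```

The terms sum to the overall gap because $E[Y\mid G=1]=E[Y\mid G=1,x\in S]+\Delta_{M}$ and $E[Y\mid G=0]=E[Y\mid G=0,x\in S]-\Delta_{F}$. When all observations lie in the common support, $\mu_{M}=\mu_{F}=0$ and the expression collapses to the usual composition and structural components.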

We estimate our model and conduct empirical analyses using Brazilian administrative records from the Annual Social Information Survey (RAIS) that is managed by the Brazilian labor ministry. The RAIS data contain detailed information about every formal sector employment contract, including worker demographic information, occupation, sector, and earnings. Critically, these data represent a network of worker–job matches in which workers are connected to every job they have ever held, allowing us to identify job histories of workers, their coworkers, their coworkers’ coworkers, and so on. We restrict our analysis to the Rio de Janeiro metropolitan area both for computational reasons and because restricting to a single metropolitan area enables us to focus on skills and tasks dimensions of worker and job heterogeneity rather than geographic heterogeneity.

In our data, the average male worker earns a wage 16.7% higher than the average female worker. Our primary result is that almost the entire gender wage gap is attributable to male and female workers who possess similar skills and perform similar tasks being paid differently, or what is often referred to as “discrimination.” This is true at the aggregate level, and remains true when we perform wage decompositions within each worker type–market cell, indicating that this is a widespread phenomenon, not one driven by large wage differentials in small subsets of the labor market. We find that wage decompositions based on standard observable variables suffer from omitted variable bias, emphasizing the need for detailed worker and job characteristics in the form of worker types and markets. We also find that wage decompositions based on linear regressions yield similar findings to those based on matching when a lack of common support is not an issue. However, when male and female workers’ characteristics do not share a common support, the matching estimator with corrections for a lack of common support outperforms the alternatives.

Literature: The literature on decomposition methods in economics can be classified into two main branches. The first decomposes average differences in a variable of interest $Y$ (often wages) between two groups of workers. The most widespread method in this class was developed by Oaxaca (1973) and Blinder (1973). The second branch decomposes functionals of the variable of interest $Y$, e.g. its distribution or quantile function. Given that functionals of a variable often provide more information than its average, the second group of decompositions is referred to as “detailed decompositions” (Fortin et al. 2011). A seminal paper in this group is DiNardo et al. (1996) [Footnote 4: Barsky et al. (2002) develop a methodology similar to DiNardo et al. (1996), focusing on issues that arise from lack of common covariate support between the groups in the decomposition. Modenesi (2022) discusses their approach in light of alternatives to handle the lack of common support.], and their methodology and inference were later generalized and improved by Chernozhukov et al. (2013). [Footnote 5: Later in this literature, Firpo et al. (2018) use influence functions to propose a detailed decomposition that is invariant to the order of the decomposition.] We follow the first branch of the literature in focusing on average differences, largely because our rich set of controls introduces a curse of dimensionality that renders detailed decompositions infeasible.

Our method for handling a lack of common covariate support follows Ñopo (2008) and Garcia et al. (2009). [Footnote 6: Garcia et al. (2009) and Morello and Anjolim (2021) both study the evolution of the Brazilian gender gap. Garcia et al. (2009) use the same approach we use to handle the problem of lack of overlapping supports, and Morello and Anjolim (2021) use a similar matching methodology to decompose the gender gap. In addition to using similar methods for the decomposition, we add the skills and tasks controls derived from the labor market network, and we derive a distribution of gender gaps for different clusters of similar workers performing similar tasks.] In concurrent work we extend Ñopo (2008) to generic “detailed decompositions” (Modenesi 2022).

Our model of labor market power builds on Card et al. (2015), Card et al. (2018) and Gerard et al. (2018) but allows for significantly more granular worker and job heterogeneity. The way we model multidimensional worker–job heterogeneity relates to papers that use a skills-tasks framework in the worker-job matching literature (Autor et al. 2003; Acemoglu and Autor 2011; Autor 2013; Lindenlaub 2017; Tan 2018; Kantenga 2018). Our method for clustering workers and jobs fits into the relatively recent literature in labor economics that extracts latent information from the network structure of the labor market (Sorkin 2018; Nimczik 2018; Jarosch et al. 2019) and directly extends Fogel and Modenesi (2023) by allowing for labor market power. Methodologically, we draw from the community detection branch of network theory (Larremore et al. 2014; Peixoto 2018; 2019). [Footnote 7: More precisely, we employ a variant of the stochastic block model (SBM) which makes use of network edge weights (Peixoto 2018), which are key for us to model the presence of potential discrimination in the labor market.] Our paper connects to this literature by formalizing a theoretical link between monopsonistic labor market models and the stochastic block model, providing microfoundations and economic interpretability for unsupervised learning tools from network theory when they are used to solve economic problems.

By controlling for skills and tasks, our paper shares common ground with Goldin (2014) and Hurst et al. (2021). Goldin (2014) indicates that the potential residual discrimination in the gender wage gap is due to the nature of the tasks in some occupations, using a linear regression approach with occupation dummies interacted with a gender dummy. We add to her approach by proposing an economic model for discrimination, which provides us with both worker and job heterogeneity controls, and by performing the gender gap decomposition while taking into account potential violations of conventional decomposition assumptions. Hurst et al. (2021), on the other hand, assess the black-white wage gap over time as a function of changes in taste-based versus statistical discrimination, as well as the result of workers sorting after these changes.

Roadmap: The paper proceeds as follows. Section 2 introduces a simple framework for decomposition methods. Section 3 presents our model of worker–job matching and derives from it our algorithm for clustering workers into worker types and jobs into markets. Section 4 provides greater detail on the wage gap decomposition methods we employ. Section 5 describes our data. Section 6 presents results. Finally, Section 7 concludes.

2 A framework for decomposition methods

We introduce a simple framework for decomposition methods to guide the analysis in this paper. Define the actual wage of worker $i$ employed by job $j$ as $Y_{ij}$, and let $G_{i}$ be a dummy denoting whether worker $i$ is male. The difference between the average wage for male workers and the average wage for female workers, which we call the “overall wage gap,” can be expressed as:

\Delta:=E[Y_{ij}|G_{i}=1]-E[Y_{ij}|G_{i}=0]

(1)

The overall wage gap above can be decomposed into two factors: differences in productivity between male and female workers, usually referred to as the composition factor; and differences in pay between equally productive male and female workers, known as the structural factor. We use the potential outcomes framework in order to formally decompose the overall wage gap into these two factors. Denote by $Y_{0ij}$ the potential wage of worker $i$ employed by job $j$ when the worker is female, and $Y_{1ij}$ the potential wage of worker $i$ employed by job $j$ when the worker is male. Let $x$ be the vector of all variables that determine workers’ productivity. We assume that the worker’s gender may affect their pay, but does not directly affect their productivity. We represent the potential outcomes as functions of $x$ as follows: $Y_{gij}:=Y_{g}(x_{ij}),\ g\in\{0,1\}$. Notice that $x$ has both $i$ and $j$ subscripts, as the marginal product of worker $i$ at their current job $j$ depends on both the worker’s skills and the job’s tasks. The fact that there is a different earnings function for men and women reflects the possibility that male and female workers with identical productivities may be paid differently. Furthermore, it is possible to use the dummy for gender to represent observed wages as a function of potential outcomes using a switching regression model: $Y_{ij}=G_{i}Y_{1}(x_{ij})+(1-G_{i})Y_{0}(x_{ij})$.

At this point we are able to decompose the overall wage gap, $\Delta$, into the composition and structural components mentioned above by adding and subtracting the quantity [Footnote 8: Analogously, the overall decomposition can be performed by adding the male counterfactual quantity $E[Y_{0}(x_{ij})|G_{i}=1]$ to $\Delta$ and subtracting it again. The main results in this paper use the female counterfactual approach.]

E[Y_{1}(x_{ij})|G_{i}=0]:=\int Y_{1}(x)\,dF_{G=0}(x)

from the overall wage gap $\Delta$, where $F_{G=0}(x)$ is the productivity distribution for female workers. Intuitively, $E[Y_{1}(x_{ij})|G_{i}=0]$ is the mean earnings for a counterfactual set of workers possessing the female productivity distribution, but who are paid like men. [Footnote 9: Alternatively, this counterfactual term can be interpreted as the mean earnings of male workers whose productivity distribution was adjusted to match the female productivity distribution.]

\Delta:=\underset{\Delta_{X}:=\text{Composition}}{\underbrace{E[Y_{ij}|G_{i}=1]-E[Y_{1}(x_{ij})|G_{i}=0]}}+\underset{\Delta_{0}:=\text{Structural}}{\underbrace{E[Y_{1}(x_{ij})|G_{i}=0]-E[Y_{ij}|G_{i}=0]}}

(2)

The composition portion can be rewritten as $E[Y_{1}(x_{ij})|G_{i}=1]-E[Y_{1}(x_{ij})|G_{i}=0]$. [Footnote 10: We use the representation of the observed $Y$ in terms of potential outcomes to write $E[Y_{ij}|G_{i}=1]=E[G_{i}Y_{1}(x_{ij})+(1-G_{i})Y_{0}(x_{ij})|G_{i}=1]=E[Y_{1}(x_{ij})|G_{i}=1]$ and substitute it in $\Delta_{X}$.] It represents the difference between what male workers actually earn and what male workers would have earned in a counterfactual scenario in which their productivity distribution was equivalent to the female productivity distribution. This quantity captures the portion of the overall wage gap attributable to differences in the composition, or distribution of productivity, between male and female workers. The structural portion is equivalent to $E[Y_{1}(x_{ij})-Y_{0}(x_{ij})|G_{i}=0]$. [Footnote 11: Analogously to the previous term, using the map from the potential outcomes to the observed $Y$, we can write $E[Y_{ij}|G_{i}=0]=E[G_{i}Y_{1}(x_{ij})+(1-G_{i})Y_{0}(x_{ij})|G_{i}=0]=E[Y_{0}(x_{ij})|G_{i}=0]$ and substitute it in $\Delta_{0}$.] This is the difference between female earnings in a counterfactual state in which females were paid equivalently to what equally productive male workers are paid and actual average female earnings. This portion of the overall wage gap is due to structural differences in how the two genders are paid, holding productivity constant, which is why this term is often associated with a form of discrimination.
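To make the two-component decomposition concrete, the following minimal sketch computes the overall gap, the composition term, and the structural term on toy data. It assumes that productivity $x$ is summarized by a discrete cell label and that every cell contains both male and female workers (full common support). The function name `decompose_gap`, the cell labels, and the wages are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def decompose_gap(records):
    """Split the mean male-female wage gap into composition and structural parts.

    records: iterable of (cell, is_male, wage) tuples, where `cell` stands in
    for the productivity vector x. Assumes every cell has both genders.
    """
    male_by_cell = defaultdict(list)
    female_by_cell = defaultdict(list)
    for cell, is_male, wage in records:
        (male_by_cell if is_male else female_by_cell)[cell].append(wage)

    if set(male_by_cell) != set(female_by_cell):
        raise ValueError("some cells lack common support")

    male_wages = [w for ws in male_by_cell.values() for w in ws]
    female_wages = [w for ws in female_by_cell.values() for w in ws]

    # E[Y_1(x) | G = 0]: male cell means weighted by the female cell distribution.
    counterfactual = sum(
        len(ws) * mean(male_by_cell[cell]) for cell, ws in female_by_cell.items()
    ) / len(female_wages)

    gap = mean(male_wages) - mean(female_wages)
    composition = mean(male_wages) - counterfactual
    structural = counterfactual - mean(female_wages)
    return gap, composition, structural


# Toy data: men are concentrated in the higher-paying cell "b".
records = [
    ("a", True, 10.0), ("a", False, 8.0), ("a", False, 8.0),
    ("b", True, 20.0), ("b", True, 20.0), ("b", False, 16.0),
]
gap, composition, structural = decompose_gap(records)
```

In this toy example part of the gap arises because men are over-represented in the better-paid cell (composition) and part because men earn more than women within each cell (structural). The two parts sum to the overall gap by construction.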

What we define as the structural component might reasonably be thought of as discrimination, where labor market discrimination is defined as workers with similar productivity, performing similar tasks, being paid differently based on observables that do not influence productivity. Other forms of discrimination may exist (including mistreatment or harassment, differential pre-job human capital accumulation opportunities, or discriminatory hiring practices) but we do not consider those in this paper. In our setup, individual discrimination occurs when the wage for worker $i$ at job $j$ is different if the individual’s gender changes, ceteris paribus, i.e. $Y_{1}(x_{ij})-Y_{0}(x_{ij})\neq 0$. The problem is that, in order to measure this quantity, we run into the fundamental problem of causal inference: it is impossible to observe the potential wages in both states for the same individual. Therefore we must make assumptions in order to construct counterfactual values, i.e. the value of $Y_{1}$ for a female worker, or the value of $Y_{0}$ for a male worker. In this paper, we break the assumptions needed for the counterfactual estimation into two parts and show how our approach helps address the limitations of each.

The first assumption is that workers with the same values of $x$ are equally productive and would be paid equal wages if gender played no role in wage determination, conditional on productivity. This is equivalent to assuming that $x$ contains all factors that affect productivity and are correlated with gender. This “conditional independence/ignorability” assumption is the basis of all decomposition methods in economics (Fortin et al. 2011), as it is required for consistent estimation of the components of the gap decomposition. However, not all factors that theoretically should be included in $x$ are observable.

A problem would arise if certain factors that contribute to worker $i$ ’s productivity in job $j$ are both unobserved by the econometrician and correlated with gender. If such factors exist, our counterfactuals would be invalid. Specifically, wage differentials due to unobserved differences in skills and tasks between male and female workers would be attributed to the effect of gender itself. For example, if women tend to have better social skills but we do not observe social skills, then we would interpret women outearning men in social skill-intensive jobs as discrimination against men, when in fact it is simply the result of differences in unobserved skills. Therefore, it is critical to come as close as possible to identifying groups of male and female workers who have exactly the same skills and perform exactly the same tasks. If we do so, then any gender wage differentials within this group are attributable to the effect of gender per se. In Section 3 we address this issue by identifying latent worker and job characteristics relevant to productivity and wage determination using the network of worker–job matches.

The second set of assumptions required to build the counterfactual $Y_{1}(x)$ for females in $\Delta$ is related to the choice of an estimation strategy for the function $Y_{1}(\cdot)$. [Footnote 12: Another approach decomposes the wage distributions, as opposed to actual wages, which would be equivalent to switching $Y$ for its distribution $F_{Y}$, but would still require estimating the counterfactual $F_{Y_{1}}(y|x)$ for females (e.g. DiNardo et al. 1996 and Chernozhukov et al. 2013). We choose not to employ these decompositions in this paper as our setup does not satisfy basic conditions for decomposing distributions, such as having a low-dimensional vector of observable characteristics $x$, given the curse of dimensionality, and satisfying the overlapping supports assumption.] A common estimation strategy requires fitting a linear wage regression for males and using its estimated coefficients to predict wages, but inputting female workers’ covariates (Oaxaca 1973 and Blinder 1973). This approach is highly tractable; however, the assumption of a linear functional form is to some extent arbitrary, and using the same regression coefficients to predict counterfactual earnings for distinct female workers (i.e. allowing no heterogeneous returns to observable characteristics) could lead to biased estimates of counterfactual earnings. An alternative approach relies on matching males to each female worker based on similar observable characteristics, and uses the wages of matched male workers in order to inform each female’s counterfactual wage. This less-parametric approach has the advantage of not imposing any functional form assumption for $Y_{1}(\cdot)$; however, it requires us to observe a sufficiently rich set of observable variables that male and female workers with the same observables may be assumed to have similar productivity. Moreover, matching methods are unreliable when we are unable to find a female worker with the same observables as a male worker, or vice versa. In Section 3 we describe a new method to enhance the set of observable characteristics available to the researcher, reducing the scope for unobserved determinants of productivity to cause biased estimates. In Section 4 we compare and contrast different methods to decompose the gender wage gap given a set of observable characteristics, circumventing issues present in counterfactual earnings estimation.
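The regression-based strategy described above can be illustrated with a single covariate. The sketch below is a simplified instance of the Oaxaca-Blinder idea, not the estimator used in the paper. It fits a line to male observations by ordinary least squares and evaluates it at the female mean covariate to obtain the counterfactual mean wage. The helper names `fit_line` and `oaxaca_blinder` and the data are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return the OLS intercept and slope of y = a + b * x (one regressor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def oaxaca_blinder(x_male, y_male, x_female, y_female):
    """Composition and structural terms using the male wage regression."""
    intercept, slope = fit_line(x_male, y_male)
    # Counterfactual: male coefficients applied to the female mean covariate.
    counterfactual = intercept + slope * mean(x_female)
    composition = mean(y_male) - counterfactual
    structural = counterfactual - mean(y_female)
    return composition, structural


# Toy data: x could be years of experience, y a wage measure.
x_male, y_male = [1.0, 2.0, 3.0], [12.0, 14.0, 16.0]
x_female, y_female = [1.0, 1.0, 2.0], [10.0, 10.0, 12.0]
composition, structural = oaxaca_blinder(x_male, y_male, x_female, y_female)
```

Because a single slope is applied to every female worker, this sketch allows no heterogeneity in returns, which is the limitation that motivates the matching alternative discussed in the text.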

3 Revealing latent worker and job heterogeneity using network theory

In this section we present an economic model of monopsonistic wage setting, which rationalizes a wage gap between two groups of workers who have different demographic characteristics, but have the same skills and perform the same tasks. Intuitively, otherwise identical male and female workers may supply labor to individual jobs with different elasticities, and jobs respond by offering them wages with different markdowns. If one group of workers supplies labor to jobs more inelastically, then they will be paid less, holding productivity constant. Moreover, the model microfounds our network-based clustering algorithm, which identifies groups of male and female workers with similar skills who perform similar tasks, and therefore can serve as good counterfactuals for each other. The model builds on the model of the labor market developed in Fogel and Modenesi (2023), with two important differences: (i) in this paper workers have idiosyncratic preferences over individual jobs, not just markets, causing jobs to face upward-sloping labor supply curves, and (ii) firms may offer different wages to men and women, even if they have identical skills and perform identical tasks. The model defines a probability distribution that governs how workers match with jobs, forming the network of worker-job matches observed in linked employer-employee data. We use this probability distribution to assign similar workers to worker types and similar jobs to markets, using a Bayesian method based on generative network theory models, which we present after the economic model.

3.1 Economic model

We propose a model with two primary components: heterogeneous workers who supply labor and firms that produce goods by employing labor to perform tasks. Workers supply their skills to jobs, which are bundles of tasks embedded within firms. Jobs’ tasks are combined by the firms’ production functions to produce output. We assume that firms face an exogenously-determined demand for their goods. [Footnote 13: For an alternative version of the model with endogenous product demand, see Fogel and Modenesi (2023).] Our model of the labor market has the following components:

• Each worker is endowed with a “worker type,” and all workers of the same type have the same skills.
• A job is a bundle of tasks within a firm. As we discuss in Section 5, we define a job in our data as an occupation–establishment pair.
• Each job belongs to a “market,” and all jobs in the same market are composed of the same bundle of tasks.
• There are $I$ worker types, indexed by $\iota$, and $\Gamma$ markets, indexed by $\gamma$.
• The key parameter governing worker-job match propensity is an $I\times\Gamma$ productivity matrix, $\Psi$, where the $(\iota,\gamma)$ cell, $\psi_{\iota\gamma}$, denotes the number of efficiency units of labor a type $\iota$ worker can supply to a job in market $\gamma$. [Footnote 14: We can think of $\psi_{\iota\gamma}$ as $\psi_{\iota\gamma}=f(X_{\iota},Y_{\gamma})$, where $X_{\iota}$ is an arbitrarily high dimensional vector of skills for type $\iota$ workers, $Y_{\gamma}$ is an arbitrarily high dimensional vector of tasks for jobs in market $\gamma$, and $f()$ is a function mapping skills and tasks into productivity. This framework is consistent with Acemoglu and Autor (2011)’s skill and task-based model, and is equivalent to Lindenlaub (2017) and Tan (2018). A key difference is that Lindenlaub and Tan observe $X$ and $Y$ directly and assume a functional form for $f()$, whereas we assume that $X$, $Y$, and $f()$ exist but are latent. We do not identify $X$, $Y$, and $f()$ directly because in our framework $\psi_{\iota\gamma}$ is a sufficient statistic for all of them.]

Time is discrete, with time periods indexed by $t\in\{1,\dots,T\}$ and workers make idiosyncratic moves between jobs over time. Neither workers, households, nor firms make dynamic decisions, meaning that the model may be considered one period at a time. We do not consider capital as an input to production.

3.1.1 Firm’s problem

Each firm, indexed by $f$, has a production function $Y_{f}(\cdot)$ which aggregates tasks from different labor markets, indexed by $\gamma$. Firm $f$ faces exogenously-determined demand for its output, $\bar{Y}_{f}$. The firm’s only cost is labor. As we discuss in the next subsection, firms face upward-sloping labor supply curves and therefore have wage-setting power. Firms demand labor in each market, $\gamma\in\{1,\dots,\Gamma\}$, and offer a different wage per efficiency unit of labor for each market. Firms also may offer different wages to workers in different demographic groups $g\in\{A,B\}$ (e.g. male and female workers), although group $A$ and group $B$ workers belonging to the same worker type $\iota$ are equally productive in all jobs. We define a job $j$ as a firm $f$–market $\gamma$ pair. We denote the wage per efficiency unit of labor for demographic group $g$ workers employed in job $j$ by $w_{j}^{g}$, and the quantity of efficiency units of labor supplied by demographic group $g$ workers to job $j$ by $L_{j}^{g}$.

The firm’s problem is to choose the quantity of labor inputs in each job for each demographic group in order to minimize costs subject to the constraint that production is greater than or equal to the firm’s exogenous product demand, $\bar{Y}_{f}$ :

\displaystyle\min_{\{w_{j}^{A},w_{j}^{B}\}_{j=1}^{\Gamma}}\sum_{j=1}^{\Gamma}\left(w^{A}_{j}L^{A}_{j}+w^{B}_{j}L^{B}_{j}\right)\quad\text{s.t.}\quad Y_{f}\left(L_{1},\ldots,L_{\Gamma}\right)\geq\bar{Y}_{f}

where $L_{j}=L^{A}_{j}+L^{B}_{j}$ is the total amount of efficiency units of labor employed by job $j$ and $Y_{f}$ is a concave and differentiable production function.

Taking the first order condition with respect to $w_{j}^{g}$ allows us to solve for the wage paid by job $j$ to workers in demographic group $g$ as a markdown relative to the marginal revenue product of labor:

\displaystyle w_{j}^{g}=\underbrace{\frac{e_{j}^{g}}{1+e_{j}^{g}}}_{\text{Markdown}}\times\underbrace{\mu_{f}\frac{\partial Y_{f}}{\partial L_{j}}}_{\text{Marg. revenue product of labor}}

(3)

where $\mu_{f}$ is the shadow revenue associated with one more unit of output and $e_{j}^{g}:=\frac{\partial L_{j}^{g}}{\partial w_{j}^{g}}\frac{w_{j}^{g}}{L_{j}^{g}}$ is the labor supply elasticity of workers from group $g$ to job $j$.
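One way to see where the markdown comes from is to write out the first order condition explicitly. The following is a sketch, assuming that $L_{j}^{g}$ depends only on the own wage $w_{j}^{g}$ and that $\mu_{f}$ is the multiplier on the output constraint; the Lagrangian $\mathcal{L}$ is introduced here only for illustration.

\displaystyle\mathcal{L}=\sum_{j=1}^{\Gamma}\sum_{g\in\{A,B\}}w_{j}^{g}L_{j}^{g}(w_{j}^{g})-\mu_{f}\left[Y_{f}\left(L_{1},\ldots,L_{\Gamma}\right)-\bar{Y}_{f}\right]

\displaystyle\frac{\partial\mathcal{L}}{\partial w_{j}^{g}}=L_{j}^{g}+w_{j}^{g}\frac{\partial L_{j}^{g}}{\partial w_{j}^{g}}-\mu_{f}\frac{\partial Y_{f}}{\partial L_{j}}\frac{\partial L_{j}^{g}}{\partial w_{j}^{g}}=0

Dividing by $\frac{\partial L_{j}^{g}}{\partial w_{j}^{g}}$ and using $\frac{L_{j}^{g}}{\partial L_{j}^{g}/\partial w_{j}^{g}}=\frac{w_{j}^{g}}{e_{j}^{g}}$ gives

\displaystyle w_{j}^{g}\left(1+\frac{1}{e_{j}^{g}}\right)=\mu_{f}\frac{\partial Y_{f}}{\partial L_{j}}\quad\Longrightarrow\quad w_{j}^{g}=\frac{e_{j}^{g}}{1+e_{j}^{g}}\,\mu_{f}\frac{\partial Y_{f}}{\partial L_{j}},

which is the expression in equation (3).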

Equation (3) shows that the wage paid to demographic group $g$ workers employed in job $j$ (equivalently, employed in market $\gamma$ by firm $f$) is the product of a markdown and the marginal revenue product of labor in job $j$. The markdown depends on demographic group $g$’s elasticity of labor supply to job $j$. As labor supply becomes more elastic, the markdown converges to 1 and the wage converges to the marginal revenue product of labor. Conversely, as labor supply becomes less elastic, the wage declines further below the marginal revenue product of labor. This equation rationalizes different demographic groups being paid different wages for the same labor: if one demographic group supplies labor more inelastically, they will be paid less. [Footnote 15: We are referring to the elasticity of labor supply to a specific job $j$, which may differ from a group’s labor supply elasticity to the overall labor market. For example, it could be the case that men supply labor more inelastically at the extensive margin, but women have stronger idiosyncratic preferences for specific jobs, making them less likely to change jobs in response to a wage differential. In this case, women would supply labor less elastically to a specific job $j$ and thus receive lower wages.] The firm employs workers in both demographic groups despite paying them different wages because in order to attract the marginal worker from the lower-paid demographic group, it must raise wages for all inframarginal workers in that group. At some point the marginal cost (inclusive of the required raises for inframarginal workers) of hiring workers from the lower-paid demographic group exceeds the marginal cost of hiring workers from the higher-paid demographic group, and the firm will switch to hiring the higher-paid workers.
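The comparative statics of equation (3) can be checked numerically. The sketch below is only illustrative: the elasticities and the marginal revenue product are hypothetical numbers, and the function names are ours rather than part of any estimation code.

```python
def markdown(elasticity):
    """Markdown e / (1 + e) from equation (3)."""
    return elasticity / (1.0 + elasticity)


def wage(elasticity, mrpl):
    """Wage per efficiency unit: markdown times marginal revenue product."""
    return markdown(elasticity) * mrpl


# Hypothetical values: both groups are equally productive in job j,
# but group B supplies labor to this job less elastically than group A.
mrpl = 20.0
e_A, e_B = 4.0, 1.5

w_A = wage(e_A, mrpl)  # markdown 0.8
w_B = wage(e_B, mrpl)  # markdown 0.6

print(f"group A wage: {w_A:.2f}, group B wage: {w_B:.2f}")
print(f"markdown with very elastic supply: {markdown(1e9):.6f}")
```

With identical productivity, the group with the lower job-level elasticity receives the lower wage, and the markdown approaches 1 as supply becomes very elastic.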

3.1.2 Worker’s problem

A worker belonging to worker type $\iota$ and demographic group $g\in\{A,B\}$ faces a two-step decision. First, she chooses a market $\gamma$ in which to look for a job, and second, she chooses a firm $f$ (and by extension a job $j$). The worker’s type defines her skills. Type $\iota$ workers can supply $\psi_{\iota\gamma}$ efficiency units of labor to any job in market $\gamma$, so $\psi_{\iota\gamma}$ is a reduced-form representation of the skill level of a type $\iota$ worker in the various tasks required by a job in market $\gamma$. Units of human capital are perfectly substitutable, meaning that if type 1 workers are twice as productive as type 2 workers in a particular market $\gamma$ (i.e. $\psi_{1\gamma}=2\psi_{2\gamma}$), firms would be indifferent between hiring one type 1 worker and two type 2 workers at a given wage per efficiency unit of labor, $w_{j}$. Therefore, the law of one price holds within each demographic group for each job, and a type $\iota$ worker belonging to demographic group $g$ employed in a job in market $\gamma$ is paid $\psi_{\iota\gamma}w_{j}^{g}$. Because workers’ time is indivisible, each worker may supply labor to only one job in each period and we do not consider the hours margin.

Workers choose job $j$ , equivalent to $\gamma f$ , in order to maximize utility, which is the sum of log earnings $\log(\psi_{\iota\gamma}w_{j}^{g})$ and an idiosyncratic preference for job $j$ , $\varepsilon_{ij}^{g}$ :

\displaystyle j^{*}=\arg\max_{j}\quad\log(\psi_{\iota\gamma}w_{j}^{g})+\varepsilon_{ij}^{g}.

We assume that $\varepsilon_{ij}^{g}$ follows a nested logit distribution with parameter $\nu_{\gamma}^{g}$ , with the $\gamma$ subscript indicating that nests are defined by $\gamma$ :

\displaystyle\varepsilon_{ij}^{g}\sim NestedLogit(\nu_{\gamma}^{g})

It follows from this assumption about the distribution of $\varepsilon_{ij}^{g}$ that the probability that worker $i$ belonging to worker type $\iota$ and demographic group $g$ matches with job $j$ in market $\gamma$ is given by the following expression. [Footnote 16: Details of the derivation of the choice probability are in Appendix A.]

\displaystyle P(j=j^{*}|j\in\gamma,i\in\iota,g)=\underbrace{\frac{\exp(I_{\iota\gamma}^{g})^{\nu_{\gamma}^{g}}}{\sum_{\gamma}\exp(I_{\iota\gamma}^{g})^{\nu_{\gamma}^{g}}}}_{\text{1st step (market choice): }P(\gamma=\gamma^{*}|i\in\iota,j\in\gamma,g)}\times\underbrace{\frac{(\psi_{\iota\gamma}w_{j}^{g})^{\frac{1}{\nu_{\gamma}^{g}}}}{\sum_{j\in\gamma}(\psi_{\iota\gamma}w_{j}^{g})^{\frac{1}{\nu_{\gamma}^{g}}}}}_{\text{2nd step (job choice): }P(j=j^{*}|i\in\iota,j\in\gamma,\gamma=\gamma^{*},g)}

(4)

where $I^{g}_{\iota\gamma}:=\log\sum_{j\in\gamma}(\psi_{\iota\gamma}w_{j}^{g})^{\frac{1}{\nu_{\gamma}^{g}}}$, also referred to as the inclusive value, is the expected utility a type $\iota$ worker faces when choosing market $\gamma$. Intuitively, the nested logit assumption decomposes the job choice probability into a first stage in which the worker chooses a market and then a second stage in which the worker chooses a job conditional on their choice of a market.
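The two-step structure of equation (4) can be illustrated with a small numerical sketch for a single worker type and demographic group. We assume here that the inclusive value is the log of the within-market sum, which is the convention under which $\exp(I_{\iota\gamma}^{g})$ equals the denominator of the second-step probability. The market names, wages, productivities, and nest parameters are all hypothetical.

```python
import math


def nested_logit_probs(psi, wages, nu):
    """Job-choice probabilities for one worker type and demographic group.

    psi[m]   : efficiency units the worker type supplies in market m
    wages[m] : wages per efficiency unit offered by the jobs in market m
    nu[m]    : nest parameter of market m
    Returns a dict mapping (market, job index) to a choice probability.
    """
    # Within-market terms (psi * w)^(1/nu) for every job.
    within = {m: [(psi[m] * w) ** (1.0 / nu[m]) for w in wages[m]]
              for m in wages}
    # Inclusive value: log of the within-market sum (assumed convention).
    inclusive = {m: math.log(sum(terms)) for m, terms in within.items()}
    # First step: market choice, proportional to exp(I)^nu.
    market_weight = {m: math.exp(inclusive[m]) ** nu[m] for m in wages}
    total = sum(market_weight.values())

    probs = {}
    for m, terms in within.items():
        p_market = market_weight[m] / total
        denom = sum(terms)
        for k, term in enumerate(terms):
            # Second step: job choice within the chosen market.
            probs[(m, k)] = p_market * term / denom
    return probs


# Hypothetical inputs.
psi = {"construction": 2.0, "nursing": 1.0}
wages = {"construction": [10.0, 12.0], "nursing": [11.0, 11.0, 9.0]}
nu = {"construction": 0.5, "nursing": 0.5}

probs = nested_logit_probs(psi, wages, nu)
for key, p in sorted(probs.items()):
    print(key, round(p, 4))
```

Within a market, the job paying more per efficiency unit is chosen with higher probability, and the probabilities over all jobs sum to one.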

3.2 Identifying worker types and markets

3.2.1 Deriving the likelihood

Now that we have derived the probability of worker $i$ matching with job $j$ from the primitives of our model, the next step is using this probability as the basis for a maximum likelihood procedure that assigns workers to worker types and jobs to markets based on the observed set of worker–job matches. This procedure builds on Fogel and Modenesi (2023), by allowing workers in the same worker type but different demographic groups to have different vectors of match probabilities over jobs.

We decompose the choice probability in equation (4) into a component that depends only on variation at the $\iota,\gamma,g$ level and a component that depends on wages at individual jobs:

\displaystyle P(j=j^{*}|j\in\gamma,i\in\iota,g)=\underbrace{\frac{\exp(I_{\iota\gamma}^{g})^{\nu_{\gamma}^{g}-1}}{\sum_{\gamma}\exp(I_{\iota\gamma}^{g})^{\nu_{\gamma}^{g}}}\psi_{\iota\gamma}^{\frac{1}{\nu_{\gamma}^{g}}}}_{=:\Omega_{\iota\gamma}^{g}\ (\iota\text{-}\gamma\text{-}g\text{ component})}\times\underbrace{(w_{j}^{g})^{\frac{1}{\nu_{\gamma}^{g}}}}_{=:d_{j}^{g}\ (j\text{-}g\text{ component})}.

(5)

The first term reflects workers choosing markets according to comparative advantage, while the second captures the fact that some jobs in market $\gamma$ require more workers than others (due to exogenous product demand differences), and since jobs face upward-sloping labor supply curves, they must pay higher wages to attract greater numbers of workers. Isolating the group-level ( $\iota,\gamma,g$ ) variation from the idiosyncratic job-level variation allows us to cluster workers into worker types and jobs into markets on the basis of having the same group-level match probabilities, as we discuss below.

The choice probabilities we have discussed thus far refer to a single job search for worker $i$. In reality, we may observe workers searching for jobs multiple times, and each of these searches is informative about the latent worker skills and job tasks that define worker types $\iota$ and markets $\gamma$. We incorporate repeated searches by assuming that workers periodically receive exogenous separation shocks which arrive following a Poisson process. Upon receiving a separation shock, the worker draws a new $\varepsilon_{ij}^{g}$ shock and repeats the job choice process described above. Assuming that Poisson-distributed exogenous separations happen at a rate $d_{i}^{g}$ for individual worker $i$, the expected number of times she will match with job $j$ throughout our sample period is given by

\displaystyle d_{i}^{g}\cdot P(j=j^{*}|j\in\gamma,i\in\iota,g)=\Omega_{\iota\gamma}^{g}d_{i}^{g}d_{j}^{g}.

(6)

Equation (6) forms the basis of our algorithm for clustering workers into worker types and jobs into markets, but before proceeding we must define some notation. Let $N_{W}$ and $N_{J}$ denote the number of workers and jobs, respectively, in our data. Define $A_{ij}$ as the number of times that worker $i$ is observed to match with job $j$. Further, define $\bm{A}$ as the matrix with typical element $A_{ij}$. $\bm{A}$ is an $N_{W}\times N_{J}$ matrix and represents the full set of worker–job matches observed in our data. As discussed previously, each individual worker belongs to a latent worker type denoted by $\iota$ and each job belongs to a latent market denoted by $\gamma$. The list of all latent worker type and market assignments is stored in the $(N_{W}+N_{J})\times 1$ vector denoted by $\bm{b}$, known as the node membership vector. We define $\bm{g}$ as the $N_{W}\times 1$ vector containing each worker’s demographic group affiliation. The matrix of worker–job matches $\bm{A}$ and workers’ demographic groups $\bm{g}$ are the data we use to cluster workers and jobs, while the node membership vector $\bm{b}$ is the latent object identified by the maximum likelihood procedure we discuss below.
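To make the notation concrete, the sketch below builds a sparse version of $\bm{A}$ and the group vector $\bm{g}$ from a list of observed matches. The record layout and identifiers are hypothetical and do not reflect the actual structure of our data.

```python
from collections import Counter

# Hypothetical match records: one (worker_id, job_id) tuple per observed match.
records = [
    ("w1", "j1"), ("w1", "j2"),
    ("w2", "j1"), ("w2", "j1"),  # worker w2 matches with job j1 twice
    ("w3", "j3"),
]

# Sparse representation of A: A[(i, j)] is the number of i-j matches,
# and pairs that never match are implicitly zero.
A = Counter(records)

# Demographic group of each worker (the vector g).
g = {"w1": "A", "w2": "B", "w3": "A"}

# Row and column sums of A: total matches per worker and per job.
worker_degree = Counter()
job_degree = Counter()
for (i, j), count in A.items():
    worker_degree[i] += count
    job_degree[j] += count

print(dict(A))
print(dict(worker_degree), dict(job_degree))
```

Storing only the nonzero cells matters in practice because almost all worker–job pairs never match.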

Following equation (6), the expected number of matches between a worker–job pair, $A_{ij}$, can be written as in equation (7). [Footnote 17: It is worth mentioning that: (i) the information $i\in\iota,j\in\gamma$ is contained in $\bm{b}$; and (ii) $A_{ij}$ is the number of matches between worker $i$ and job $j$, which makes the event that $j=j^{*}|i$ equivalent to the event that $A_{ij}=1$. These two facts allow us to use more succinct notation that directly links theoretical objects in our model to data: $P(j=j^{*}|j\in\gamma,i\in\iota,g)=P(A_{ij}=1|\bm{b},g)$, whose distributional form we know. This connects notation from the economic model to the network model, but it still lacks the precise definition of the likelihood of interest, $P(\bm{A},\bm{g}|\bm{b})$, where $A_{ij}$ can assume values other than just $1$.]

\displaystyle E[A_{ij}|\bm{b},g]=\Omega_{\iota\gamma}^{g}d_{i}^{g}d_{j}^{g}.

(7)

We prove in Appendix C that our assumption of Poisson-distributed exogenous separation shocks implies that $A_{ij}$ follows a Poisson distribution:

\displaystyle A_{ij}|\bm{b},g\sim Poisson(\Omega_{\iota\gamma}^{g}d_{i}^{g}d_{j}^{g})

(8)

Finally, we incorporate equation (8) above to fully characterize the likelihood of our data as a function of the unknown parameters, by applying Bayes rule:

\displaystyle P(A_{ij},g|\bm{b})=\underbrace{P(A_{ij}|\bm{b},g)}_{Poisson(\Omega_{\iota\gamma}^{g}d_{i}^{g}d_{j}^{g})}\underbrace{P(g|\bm{b})}_{\alpha_{\iota\gamma}^{g}},

(9)

where $\alpha_{\iota\gamma}^{g}\equiv P(g|\bm{b})$ is the fraction of type $\iota$ workers employed in market $\gamma$ jobs who belong to demographic group $g$. Equation (9) corresponds to a commonly used method from network theory known as the bipartite degree-corrected stochastic block model with edge weights (SBM). The SBM clusters nodes in a network (workers and jobs) into groups (worker types and markets) based on patterns of connections between nodes. [Footnote 18: Larremore et al. (2014) lay out the advantages of using bipartite models over using one-sided network projections to fit SBMs; Karrer and Newman (2011) present the methodology for degree correction, which significantly enhances the ability of the SBM to fit large-scale real-world networks; and Peixoto (2018) deals with weighted SBM inference, which is how we accommodate discrimination influencing matches within the SBM.] The main parameter of interest is the set of assignments of workers to worker types and jobs to markets contained in $\bm{b}$, while all of the other parameters are nuisance parameters that can be straightforwardly determined after $\bm{b}$ is defined (Karrer and Newman 2011). The next step is to maximize the likelihood defined in equation (9), which we address in the next subsection.
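As an illustration of how equations (8) and (9) translate into an objective function, the sketch below evaluates a log-likelihood for a given assignment $\bm{b}$. It assumes that worker–job pairs are conditionally independent given $\bm{b}$, so that the pairwise terms can be summed, and it treats $\Omega$, $d_{i}$, $d_{j}$, and $\alpha$ as known inputs. All names and numbers are hypothetical.

```python
import math


def log_likelihood(A, workers, jobs, b_worker, b_job, group,
                   omega, d_worker, d_job, alpha):
    """Sum over worker-job pairs of log P(A_ij | b, g) + log P(g | b)."""
    total = 0.0
    for i in workers:
        for j in jobs:
            key = (b_worker[i], b_job[j], group[i])  # (iota, gamma, g)
            rate = omega[key] * d_worker[i] * d_job[j]
            a = A.get((i, j), 0)
            # Poisson log pmf: a * log(rate) - rate - log(a!)
            total += a * math.log(rate) - rate - math.lgamma(a + 1)
            # Group term alpha_{iota, gamma}^g
            total += math.log(alpha[key])
    return total


# Hypothetical one-worker, one-job example that can be checked by hand.
workers, jobs = ["w1"], ["j1"]
A = {("w1", "j1"): 1}
b_worker, b_job, group = {"w1": 0}, {"j1": 0}, {"w1": "A"}
omega = {(0, 0, "A"): 2.0}
alpha = {(0, 0, "A"): 0.5}
d_worker, d_job = {"w1": 1.0}, {"j1": 1.0}

ll = log_likelihood(A, workers, jobs, b_worker, b_job, group,
                    omega, d_worker, d_job, alpha)
print(ll)  # log(2) - 2 - log(1!) + log(0.5) = -2
```

In a real application the double loop over all pairs would be replaced by sums over nonzero cells and block-level totals, which is what makes the SBM tractable for large networks.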

3.2.2 A Bayesian approach to recovering worker types and markets

In order to make the estimation of worker types and markets feasible, together with using a principled method for choosing the number of clusters, we employ Bayesian methods from the network literature (Peixoto 2017). We can rewrite equation (9) as

\displaystyle P(\bm{b}|A_{ij},g)\propto P(A_{ij},g|\bm{b})P(\bm{b})=\underbrace{P(A_{ij}|\bm{b},g)}_{Poisson(\Omega_{\iota\gamma}^{g}d_{i}^{g}d_{j}^{g})}\underbrace{P(g|\bm{b})}_{\alpha_{\iota\gamma}^{g}}\underbrace{P(\bm{b})}_{\text{Prior}}

(10)

Maximizing the posterior distribution means assigning individual workers to worker types $\iota$ and jobs to markets $\gamma$. The basic intuition, described in greater detail in Fogel and Modenesi (2023), is as follows: workers belong to the same worker type if they have approximately the same vector of match probabilities over jobs, while jobs belong to the same market if they have approximately the same vector of match probabilities over workers. The key difference in this paper is that workers in the same worker type $\iota$ may belong to different demographic groups $g$, and each worker type–demographic group pair may face its own wage and therefore have its own match probability. Equation (10) allows for this by letting the match probabilities $P(A_{ij},g|\bm{b})$ depend on the workers’ demographic group $g$ in addition to the worker types and markets stored in $\bm{b}$.

If worker types are defined by having common vectors of match probabilities over jobs, but match probabilities are allowed to vary by demographic group within a worker type, how do we know that type $\iota$ workers in group $A$ belong to the same worker type as type $\iota$ workers in group $B$? The answer is embedded in equation (10). The $\alpha_{\iota\gamma}^{g}$ term in equation (10) adjusts workers’ match probabilities so that they are relative to their own gender. Suppose women are significantly underrepresented in construction jobs and overrepresented in nursing jobs, and vice versa for men. Once we incorporate this adjustment, we would assign workers to a construction-intensive worker type if they are disproportionately likely to match with construction jobs, relative to other workers of their gender. Once we adjust the raw match probabilities to account for this selection, we obtain identical adjusted match probability vectors for this group of men and this group of women, causing us to assign them to the same worker type, $\iota$.

Equation (10) assumes that we know the number of worker types and markets a priori; however, this is rarely the case in real-world applications. Therefore we must choose the number of worker types and markets, $I$ and $\Gamma$ respectively. We do so using the principle of minimum description length (MDL), an information-theoretic approach that is commonly used in the network theory literature. MDL chooses the number of worker types and markets to minimize the total amount of information necessary to describe the data, where the total includes both the complexity of the model conditional on the parameters and the complexity of the parameter space itself. MDL will penalize a model that fits the data very well but overfits by using a large number of parameters (corresponding to a large number of worker types and markets), and therefore requires a large amount of information to encode it. MDL effectively adds a penalty term to our objective function, such that our algorithm finds a parsimonious model. See Fogel and Modenesi (2023) for greater detail.

Equation (10) defines a combinatorial optimization problem. If we had infinite computing resources, we would test all possible assignments of workers to worker types and jobs to markets and choose the one that maximizes the posterior in equation (10); however, this is not computationally feasible for large networks like ours. Therefore, we use a Markov chain Monte Carlo (MCMC) approach in which we modify the assignment of each worker to a worker type and each job to a market in a random fashion and accept or reject each modification with a probability given as a function of the change in the likelihood. We repeat the procedure for multiple different starting values to reduce the chances of settling on a local maximum. We implement the procedure using a Python package called graph-tool (https://graph-tool.skewed.de/; see Peixoto (2014) for details). Now that we have dealt with the issue of important worker and job characteristics being unobserved, we turn our attention to estimating counterfactuals for wage gap decompositions.
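The accept/reject logic of such an MCMC search can be sketched with a generic Metropolis sampler on a toy network. This is not the algorithm implemented in graph-tool, and the likelihood below is a simple planted-partition stand-in rather than equation (10); the graph, the probabilities `P_IN` and `P_OUT`, and the function names are all hypothetical.

```python
import math
import random

# Toy network: two triangles with no edges between them.
EDGES = {(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)}
N_NODES = 6
P_IN, P_OUT = 0.9, 0.1  # assumed edge probabilities within / across groups


def log_lik(labels):
    """Stand-in log-likelihood of the toy network given group labels."""
    total = 0.0
    for i in range(N_NODES):
        for j in range(i + 1, N_NODES):
            p = P_IN if labels[i] == labels[j] else P_OUT
            total += math.log(p if (i, j) in EDGES else 1.0 - p)
    return total


def metropolis(log_lik, n_nodes, n_groups, n_sweeps, rng):
    """Propose random single-node relabelings; accept with prob min(1, e^delta)."""
    labels = [rng.randrange(n_groups) for _ in range(n_nodes)]
    current = log_lik(labels)
    best, best_ll = list(labels), current
    for _ in range(n_sweeps):
        for node in range(n_nodes):
            old = labels[node]
            labels[node] = rng.randrange(n_groups)
            proposed = log_lik(labels)
            if rng.random() < math.exp(min(0.0, proposed - current)):
                current = proposed  # accept the move
                if current > best_ll:
                    best, best_ll = list(labels), current
            else:
                labels[node] = old  # reject and restore the old label
    return best, best_ll


# Several random starting values, keeping the best assignment found.
rng = random.Random(0)
runs = [metropolis(log_lik, N_NODES, 2, 200, rng) for _ in range(5)]
best, best_ll = max(runs, key=lambda run: run[1])
print(best, round(best_ll, 3))
```

On this toy example the sampler recovers the two triangles as separate groups; in large networks, efficient implementations rely on smarter proposals and incremental likelihood updates rather than recomputing the full likelihood at every step.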

4 Wage gap decomposition

This section lays out the estimation strategies we use to decompose the Brazilian gender wage gap, while circumventing some of the issues associated with conventional decomposition methods. We decompose the gender wage gap into the quantities listed in equation (2): the composition component $E[Y_{1}(x_{ij})|G_{i}=1]-E[Y_{1}(x_{ij})|G_{i}=0]$ and the structural component $E[Y_{1}(x_{ij})-Y_{0}(x_{ij})|G_{i}=0]$ . The quantity $E[Y_{g}(x_{ij})|G_{i}=g]=E[Y_{ij}|G_{i}=g]$ , $g\in\{0,1\}$ can be consistently and straightforwardly estimated since it is directly observable. The challenge is estimating the counterfactual wage function $E[Y_{1}(x_{ij})|G_{i}=0]$ , given that the potential outcome $Y_{1}(x_{ij})$ is not observed for female workers. Estimating $E[Y_{1}(x_{ij})|G_{i}=0]$ requires us to use data on male workers to estimate a relationship between observable characteristics $x_{ij}$ and male earnings $Y_{1}$ and then extrapolate this relationship to female workers.

In this paper, we consider two approaches to estimating counterfactual wage functions. The first is the commonly-used Oaxaca-Blinder decomposition, which we henceforth refer to as OB (Oaxaca 1973; Blinder 1973). For the OB decomposition, we estimate two linear regressions — one for the set of male workers and another for the set of female workers — to estimate the functionals $Y_{1}(\cdot)$ and $Y_{0}(\cdot)$ , respectively, as denoted in equation (11). Values for $E[Y_{g}(x_{ij})|G_{i}=g]$ are obtained by averaging out the fitted values of the respective linear regressions. Estimates for the counterfactual $E[Y_{1}(x_{ij})|G_{i}=0]$ are obtained by using the coefficients from the linear regression fitted for males, $\hat{\beta}_{G=1}$ , and multiplying them by the average female covariates, $\bar{x}_{G=0}$ , as defined in equation (11). This is equivalent to producing fitted values for the males’ regression, while inputting females’ covariates.

OB regressions: $\displaystyle Y_{g}(x_{ij})=x_{ij}^{T}\beta_{G=g}+\epsilon_{gij},\qquad g\in\{0,1\}$ (11)

OB counterfactual estimate: $\displaystyle\widehat{E[Y_{1}(x_{ij})|G_{i}=0]}:=\bar{x}_{G=0}^{T}\hat{\beta}_{G=1},\quad\bar{x}_{G=0}:=\sum_{i|G_{i}=0}\frac{x_{ij}}{n}$
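The mechanics of the OB counterfactual can be illustrated with a deliberately small example using one covariate and an intercept. The data are made up, and the closed-form `ols` helper is a stand-in for a full regression routine with many covariates.

```python
def mean(values):
    return sum(values) / len(values)


def ols(xs, ys):
    """Intercept and slope of a one-covariate least squares regression."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


# Hypothetical data: x is years of education, y is log wage.
male_x, male_y = [8, 10, 12, 16], [2.0, 2.3, 2.5, 3.0]
female_x, female_y = [10, 12, 14, 16], [2.0, 2.2, 2.4, 2.7]

# Male regression coefficients (beta hat for G = 1).
intercept_m, slope_m = ols(male_x, male_y)

# Counterfactual: male coefficients evaluated at average female covariates.
counterfactual = intercept_m + slope_m * mean(female_x)

gap = mean(male_y) - mean(female_y)
composition = mean(male_y) - counterfactual    # differences in covariates
structural = counterfactual - mean(female_y)   # differences in returns

print(f"gap={gap:.3f} composition={composition:.3f} structural={structural:.3f}")
```

Because a regression with an intercept fits the sample mean exactly, the composition and structural components add up to the raw gap.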

As discussed in section 2, the OB decomposition has several important limitations. Although highly tractable, OB imposes potentially restrictive assumptions on $Y_{1}(x_{ij})$ . First, it assumes that its expectation is linear in $x_{ij}$ . Although linear regressions allow for flexible transformations of its covariates, the functional form is still a somewhat arbitrary researcher choice. Second, by using a linear regression to estimate the potential outcome function, $Y_{1}(\cdot)$ , as in equation (11), it uses the same functional form to compute counterfactuals for all male workers. In other words, it imposes the same average returns to covariates for all workers, which would create biases in the counterfactual estimation if returns to worker characteristics are heterogeneous. The third limitation of the OB is related to the overlapping supports assumption, also referred to as the common supports assumption. This assumption imposes that the support of $x$ for one of the genders has to fully overlap with the support of $x$ for the other gender, and is imposed by almost all decomposition methods in economics (Fortin et al. 2011). The overlapping supports assumption is imposed to ensure that the counterfactual function $Y_{1}(x)$ estimated using male data, $x_{G_{i}=1}$ , is only used to predict counterfactual earnings for females whose values of $x$ lie within the male support of $x$ . When this condition is not satisfied in the data, observations that are outside of the common support are typically trimmed or given virtually zero weight in the estimation process, potentially eliminating significant numbers of workers from the analysis and making the analysis representative of only a subset of the population (Modenesi 2022). This is particularly salient when $x$ lies in a high-dimensional space, as is the case in our application with high-dimensional worker types and markets.

Our preferred decomposition strategy relies on matching male and female workers with similar observable characteristics and using matched workers of different genders as counterfactuals for each other. This approach was initially proposed by Ñopo (2008) and was further extended by Modenesi (2022). Not only does this approach avoid the strong functional form assumptions made by OB, it includes a framework for handling a lack of common support. In this paper, we choose to use the original estimation strategy laid out by Ñopo (2008), given its tractability especially for a high-dimensional set of covariates like ours, and we refer to it as the matching decomposition henceforth.

The matching decomposition has two main components: (i) matching observations and (ii) relaxing the overlapping supports assumption. First, counterfactual female earnings $Y_{1}(x_{ij})|G_{i}=0$ (what female workers would have earned if their gender were changed to male but nothing else about them changed) are obtained by exactly matching each female worker to one or more male workers with similar observable characteristics and then taking a sample average of the matched males’ earnings. [Footnote 19: In this paper we coarsen a few variables, such as years of education and age, and we use the coarsened versions of these variables to perform the exact matching. This serves the purpose of matching more individuals, giving more statistical power to the method, since workers with, e.g., just a 1 year difference in age are, ceteris paribus, roughly the same in terms of productivity.] This method for building counterfactuals is non-parametric: it assumes no functional form for $Y_{1}(\cdot)$, it does not extrapolate outside the support of $x$, and it avoids using data from all workers to build counterfactuals for a specific worker. Second, the matching decomposition handles the lack of common support by allowing unmatched workers, i.e. those outside of the common support of $x$, to contribute to the overall observed gap. In the matching decomposition, we add two terms, $\Delta_{M}$ and $\Delta_{F}$, to the expression for the overall wage gap $\Delta$ in equation (1), which capture the contributions of unmatched male and female workers, respectively. The resulting expression is

\displaystyle\Delta=E[Y_{ij}|G_{i}=1]-E[Y_{ij}|G_{i}=0]=:\Delta_{X}+\Delta_{0}+\Delta_{M}+\Delta_{F},

(12)

where

$\displaystyle\Delta_{X}:=E\left[Y_{ij}|Matched,G_{i}=1\right]-E\left[Y_{1}(x_{ij})|Matched,G_{i}=0\right]$
$\displaystyle\Delta_{0}:=E\left[Y_{1}(x_{ij})|Matched,G_{i}=0\right]-E\left[Y_{ij}|Matched,G_{i}=0\right]$
$\displaystyle\Delta_{M}:=\left\{E\left[Y_{ij}|Unmatched,G_{i}=1\right]-E\left[Y_{ij}|Matched,G_{i}=1\right]\right\}P\left(Unmatched|G_{i}=1\right)$
$\displaystyle\Delta_{F}:=\left\{E\left[Y_{ij}|Matched,G_{i}=0\right]-E\left[Y_{ij}|Unmatched,G_{i}=0\right]\right\}P\left(Unmatched|G_{i}=0\right)$

Notice that if all observations are matched, the $\Delta_{M}$ and $\Delta_{F}$ terms vanish and this method collapses back to the original decomposition we have in equation (2). The terms $\Delta_{X}$ and $\Delta_{0}$ still have the same interpretation as discussed in Section 2 (composition and structural, respectively), but now only similar workers of one gender are used to build counterfactuals for the other gender, using an agnostic functional form for the counterfactual function. The extra terms $\Delta_{M}$ and $\Delta_{F}$ measure the contribution of unmatched male and female workers to the overall observed gender gap. Each of them measures the difference between matched and unmatched workers of a given gender, weighted by the proportion of unmatched workers within that gender. [Footnote 20: Precise definitions of each of the terms in the matching decomposition can be found in Appendix B.] For example, if unmatched male workers have an average log wage that is 0.2 higher than the average log wage for matched male workers and 10% of male workers are unmatched, then $\Delta_{M}=0.2\times 0.1=0.02$.
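The four terms of equation (12) can be computed with a short sketch of exact matching on cells. Here $\Delta_{0}$ is taken as the counterfactual wage of matched female workers minus their observed wage, which is the definition under which the four terms add up to $\Delta$. The cells and log wages are hypothetical, and the function assumes that at least one cell contains both genders.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def matching_decomposition(males, females):
    """males, females: lists of (cell, log_wage); a cell is a covariate bin."""
    m_cells, f_cells = defaultdict(list), defaultdict(list)
    for cell, y in males:
        m_cells[cell].append(y)
    for cell, y in females:
        f_cells[cell].append(y)
    common = set(m_cells) & set(f_cells)  # region of common support

    m_matched = [y for c, y in males if c in common]
    m_unmatched = [y for c, y in males if c not in common]
    f_matched = [y for c, y in females if c in common]
    f_unmatched = [y for c, y in females if c not in common]

    # Counterfactual: each matched woman receives the mean wage of the
    # men in her cell, averaged over matched women.
    counterfactual = mean([mean(m_cells[c]) for c, _ in females if c in common])

    delta_x = mean(m_matched) - counterfactual
    delta_0 = counterfactual - mean(f_matched)
    delta_m = ((mean(m_unmatched) - mean(m_matched))
               * len(m_unmatched) / len(males)) if m_unmatched else 0.0
    delta_f = ((mean(f_matched) - mean(f_unmatched))
               * len(f_unmatched) / len(females)) if f_unmatched else 0.0
    return delta_x, delta_0, delta_m, delta_f


# Hypothetical data: (cell, log wage).
males = [("nurse", 2.0), ("nurse", 2.2), ("builder", 2.6), ("footballer", 4.0)]
females = [("nurse", 1.9), ("nurse", 2.1), ("nurse", 2.0),
           ("builder", 2.4), ("nanny", 1.5)]

dx, d0, dm, df = matching_decomposition(males, females)
gap = mean([y for _, y in males]) - mean([y for _, y in females])
print(f"gap={gap:.3f}  X={dx:.3f}  0={d0:.3f}  M={dm:.3f}  F={df:.3f}")
```

The unmatched "footballer" and "nanny" cells enter through $\Delta_{M}$ and $\Delta_{F}$ rather than being dropped, so the four terms reproduce the raw gap exactly.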

To understand how the matching decomposition handles a lack of common support, consider male workers employed as professional football players. These workers will not be matched to female workers and therefore would be omitted from the analysis if we simply restrict it to the region of common support. However, the male workers do contribute meaningfully to the overall gender wage gap because they earn significantly more than the average female worker. The matching decomposition would handle this by including these workers in the $\Delta_{M}$ term. Intuitively, it would say that some of the gender wage gap can be decomposed within the region of common support, while some of it is explained by male workers outside the region of common support earning more than male workers within the region of common support, and similarly for female workers.

Our preferred specifications in this paper use the matching decomposition in conjunction with the latent skills and tasks clusters revealed by our network methodology developed in Section 3. Since we define labor market gender discrimination as workers with similar skills performing similar tasks with similar productivity but being paid differently based on gender, our worker type–market clusters serve as natural cells within which workers are considered as equivalent in terms of productivity. With the matching decomposition we are able to ensure that only similar workers are used when estimating counterfactual earnings, mitigating counterfactual biases, and also avoid dropping unmatched workers from the estimation procedure as mentioned above. Although the original matching decomposition is not considered to be a “detailed decomposition” by the literature of decompositions in economics, in combination with our network clusters, it is possible to compute an economically principled distribution of the gender gap (and its components) for a vast amount of cells of workers in the labor market, mapping how discrimination is spread in different parts of the market.

5 Data

5.1 Administrative Brazilian data

We use the Brazilian linked employer-employee data set RAIS. The data contain detailed information on all employment contracts in the Brazilian formal sector, going back to the 1980s. The sample we work with includes all workers between the ages of 25 and 55 employed in the formal sector in the Rio de Janeiro metro area at least once between 2009 and 2018. These workers are defined as matching with unemployment (or the informal sector) in years in which we do not observe them. We exclude the public sector, because institutional barriers make flows between the Brazilian public and private sectors rare, and we also exclude the military. Finally, we exclude the small number of jobs that do not pay workers on a monthly basis.

Our wage variable is the real hourly log wage in December, defined as total December earnings divided by hours worked. We deflate wages using the national inflation index. We exclude workers who were not employed for the entire month of December because we do not have accurate hours worked information for such workers. We define a job as an occupation-establishment pair. This implicitly assumes that all workers employed in the same occupation at the same establishment are performing approximately the same tasks.

Our data contain 4,578,210 unique workers, 289,836 unique jobs, and 7,940,483 unique worker–job matches. The average worker matches with 1.73 jobs and the average job matches with 27.4 workers. 42% of workers match with more than one job during our sample. Figure 1(b) presents histograms of the number of matches for workers and jobs, respectively. In network theory parlance, these are known as degree distributions.
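The averages reported above follow directly from the counts, since the mean number of matches on each side of the network is the number of worker–job matches divided by the number of nodes on that side. A quick check, with variable names chosen only for illustration:

```python
n_workers = 4_578_210
n_jobs = 289_836
n_matches = 7_940_483  # unique worker-job matches

# Mean degree on each side of the bipartite network.
mean_worker_degree = n_matches / n_workers
mean_job_degree = n_matches / n_jobs

print(round(mean_worker_degree, 2), round(mean_job_degree, 1))
```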


Our network-based classification algorithm identifies 187 worker types ( $\iota$ ) and 341 markets ( $\gamma$ ). Figure 2(b) presents histograms of the number of workers per worker type and jobs per market. The average worker belongs to a worker type with 20,896 workers and the median worker belongs to a worker type with 14,211 workers. The average job belongs to a market with 1,156 jobs and the median job belongs to a market with 1,127 jobs.

6 Results

6.1 Aggregate wage gap decomposition

Table 1 presents the results of performing gender wage decompositions using each of our two methods: OB and matching. For each method, we have three specifications. The first, presented in columns (1) and (4), estimates counterfactual earnings distributions using a standard set of observable characteristics: experience, education, race, industry and union status. The second, presented in columns (2) and (5), estimates counterfactual earnings distributions using the worker types and markets identified by the SBM. The third specification, presented in columns (3) and (6) uses both standard observable characteristics and worker types and markets. The first row of each column presents the overall wage gap: the average male worker earns 16.7 percent more than the average female worker in our sample. The second row presents the wage gap that would exist if male and female workers with the same productivity were paid equivalently but the observed differences between the distributions of male and female productivity — as proxied by observable characteristics and/or worker types and markets — remained, the composition component. The third row presents the wage gap that would exist if male and female workers had identical productivity distributions, but the observed earnings differences conditional on productivity remained, the structural component. The fourth and fifth rows present the wage gap explained by male and female workers outside the region of common support, respectively. For the OB method the composition and structural components add up to the overall wage gap; for the matching method the overall wage gap equals the sum of the composition and structural components and the components due to a lack of common support.

The qualitative stories told by the OB method and the matching method are similar. When we define counterfactual earnings using observable characteristics (columns 1 and 4), we find that if male and female workers with the same productivity were paid similarly, then female workers would significantly outearn male workers (composition component): by 12.7% using the OB method and 8.8% using the matching method. By contrast, if female workers had the male productivity distribution but the observed pay differences conditional on productivity remained, they would be paid significantly less than male workers (structural component): 29.4% less using the OB method and 25.6% less using the matching method. When we define counterfactuals using worker types and markets instead of observable characteristics (columns 2 and 5), we find that the wage gap would nearly disappear if male and female workers with the same productivity were paid similarly. By contrast, the wage gap that would exist if male and female workers had the same productivity distribution (17.9% according to OB and 17.8% according to matching) is almost equal to the overall wage gap of 16.7%. In other words, when we compute counterfactuals using worker types and markets, we find that differential pay for similar productivity explains roughly the entire gender wage gap. This tells us that the results of gender wage gap decompositions are highly sensitive to the way in which we define counterfactuals. If, as we argue, worker types and markets do a better job of capturing the latent productivity of worker–job matches than do standard observable characteristics, then these results imply that gender wage gaps are almost entirely due to similarly productive male and female workers being paid differently, not to male and female workers having different productivity distributions.

Columns (3) and (6) of Table 1 use both observable characteristics and worker types and markets to form counterfactuals for the gender wage gap decompositions. The OB method finds that female workers have covariates that would imply that they would outearn male workers if equally productive workers were paid equivalently, similar to the findings in column (1), where we included only observable characteristics and not worker types and markets. By contrast, the matching method finds that male workers’ covariates imply 3.4% higher earnings than female workers’ covariates and that male workers are paid 18.5% more than similarly productive female workers. Why do we observe a discrepancy between the OB and matching methods once we include observable characteristics together with worker types and markets? The answer lies in the final two rows of Table 1, which present the fractions of male and female workers, respectively, for whom we are able to find a counterfactual. Once we try to match workers on such a large set of variables, many workers cannot be matched, and a significant part of the gender wage gap occurs among such workers. The matching method allows us to take this into account, while the OB method simply makes a linear extrapolation. However, a linear extrapolation outside the region of common support is likely to lead to incorrect inferences. Furthermore, the matching estimator yields similar results when we use worker types and markets alone as it does when we use worker types, markets, and other observable characteristics, but not when we use other observables alone. This implies that worker types and markets capture significant determinants of productivity, and that omitting them leads to incorrect inferences. This highlights the importance of using a sufficiently rich set of worker characteristics when estimating counterfactuals, and our method for identifying previously unobserved heterogeneity enhances our ability to do so.
All of the results presented in this section correspond to the aggregate gender wage gap. In the next section, we consider heterogeneity in wage gaps within different subsets of the labor market.
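To make the contrast with matching concrete, the following is a minimal sketch of a two-fold Oaxaca-Blinder decomposition with a single covariate. It is an illustration under stated assumptions, not the specification estimated in Table 1: the function names, the toy data, and the choice of the male wage function as the reference (mirroring the role of $Y_{1}$ in Appendix B) are all assumptions. The fitted line is evaluated at the female covariate mean wherever that mean lies, which is the linear extrapolation discussed above.

```python
from statistics import mean


def ols_line(x, y):
    """Intercept and slope of a one-covariate least-squares fit."""
    x_bar, y_bar = mean(x), mean(y)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
        (a - x_bar) ** 2 for a in x
    )
    return y_bar - slope * x_bar, slope


def oaxaca_blinder(x_m, y_m, x_f, y_f):
    """Two-fold decomposition using male coefficients as the reference (an assumption)."""
    a_m, b_m = ols_line(x_m, y_m)
    a_f, b_f = ols_line(x_f, y_f)
    gap = mean(y_m) - mean(y_f)
    # Composition: different covariate means, priced with the male wage function.
    composition = b_m * (mean(x_m) - mean(x_f))
    # Structural: different wage functions, evaluated at the female covariate mean.
    structural = (a_m - a_f) + (b_m - b_f) * mean(x_f)
    return gap, composition, structural


# Hypothetical toy data: x is a covariate such as experience, y is log earnings.
x_m, y_m = [1, 2, 3, 4], [2.1, 2.4, 2.9, 3.2]
x_f, y_f = [2, 3, 4, 5], [2.0, 2.2, 2.5, 2.7]
gap, composition, structural = oaxaca_blinder(x_m, y_m, x_f, y_f)
print(gap, composition, structural)
```

Because each regression includes an intercept, the two components sum to the overall gap by construction, which is the additivity property of the OB columns.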

	Oaxaca-Blinder			Matching
	Observables	$\iota\times\gamma$	Full model	Observables	$\iota\times\gamma$	Full model
	(1)	(2)	(3)	(4)	(5)	(6)
Gap	0.167	0.167	0.167	0.167	0.167	0.167
Composition	-0.127	-0.011	-0.084	-0.088	-0.006	0.034
Structural	0.294	0.179	0.250	0.256	0.178	0.185
Males unmatched	-	-	-	0.000	-0.005	-0.076
Females unmatched	-	-	-	0.000	0.000	0.024
Fraction of males matched	-	-	-	1.00	0.98	0.57
Fraction of females matched	-	-	-	1.00	0.99	0.74
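The additivity property stated in Section 6.1 can be checked directly against the point estimates in Table 1. The short script below is only a consistency check; the variable names are ours, and the tolerance is an assumption that allows for each reported figure being rounded to three decimals.

```python
# Point estimates copied from Table 1, keyed by column number.
# Columns 1-3 (OB): [composition, structural].
# Columns 4-6 (matching): [composition, structural, males unmatched, females unmatched].
TABLE_1 = {
    1: [-0.127, 0.294],
    2: [-0.011, 0.179],
    3: [-0.084, 0.250],
    4: [-0.088, 0.256, 0.000, 0.000],
    5: [-0.006, 0.178, -0.005, 0.000],
    6: [0.034, 0.185, -0.076, 0.024],
}
GAP = 0.167  # overall gap, identical in every column

# Each entry is rounded to three decimals, so allow a few thousandths of slack.
ROUNDING_TOLERANCE = 0.0025


def additivity_residual(column):
    """Overall gap minus the sum of the reported components for one column."""
    return GAP - sum(TABLE_1[column])


for column in sorted(TABLE_1):
    residual = additivity_residual(column)
    print(f"column ({column}): residual = {residual:+.3f}")
```

In every column the components sum to the overall gap up to rounding.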

6.2 Wage gaps within worker type–market cells

An appealing feature of our worker types and markets is that they allow us to further decompose gender wage gaps and identify heterogeneity in gender wage gaps across the labor market. We do so by computing overall wage gaps, $\Delta$, and then decomposing them following the matching decomposition, within each worker type–market cell.

For each worker type–market cell we decompose the overall wage gap (Row 1 of Table 1) into its four components: composition, structural, males unmatched, and females unmatched (Rows 2–5 of Table 1). Figure 3 presents kernel density plots of the resulting distributions of overall wage gaps and their four components. Several clear patterns emerge. First, the overall wage gaps $\Delta$ are almost universally positive, meaning that male workers outearning their female counterparts is a widespread phenomenon. Specifically, 91% of workers are in clusters where males outearn females. Second, the distribution of the structural component, $\Delta_{0}$, is similar to the distribution of the overall wage gap. This suggests that the result from the aggregate decomposition in Section 6.1, that almost the entire overall gender wage gap is explained by the structural component, holds within worker type–market cells as well. The fact that the structural component roughly coincides with the overall wage gap implies that the other three components (composition, males outside the common support, and females outside the common support) must contribute relatively little to the overall gender wage gap. This is confirmed by the fact that the distributions of these three components are centered close to zero and have low variances. We present the same results quantitatively in Table 2. Together, these results tell us that while there is significant variability in gender wage gaps across different worker type–market pairs, the overall qualitative pattern holds in the disaggregated results as well as the aggregated results: male workers outearn their female counterparts, and almost all of this gap is explained by differential returns to the same skills rather than by different skills.

	mean	sd	min	max	count
$\Delta$	0.215	0.240	-1.183	6.228	4791014
$\Delta_{0}$	0.196	0.172	-2.506	9.384	4791014
$\Delta_{M}$	0.016	0.134	-3.577	3.448	4783255
$\Delta_{F}$	-0.011	0.116	-2.632	4.684	4724863
$\Delta_{X}$	0.013	0.153	-1.150	2.418	4791014
Frac. Male Workers Matched	0.766	0.238	0.004	1.000	4791014
Frac. Female Workers Matched	0.875	0.199	0.008	1.000	4791014
Frac. Workers that Are Male	0.617	0.162	0.037	0.999	4791014

7 Conclusion

In this paper we reconsider the wage gap decomposition literature and make three key contributions. First, we propose a new method for identifying unobserved determinants of workers’ earnings from the information revealed by detailed data on worker–job matching patterns. The method builds on Fogel and Modenesi (2023) and provides a blueprint for incorporating observable variables into the clustering algorithm, while also relaxing the assumption of perfect competition in labor markets. Second, we non-parametrically estimate counterfactual wage functions for male and female workers and use them to decompose gender wage gaps into a composition component, in which male and female workers earn different wages because they possess different skills and perform different tasks, and a structural component, in which male and female workers who possess similar skills and perform similar tasks nonetheless earn different wages. Third, we address the issue of male workers’ observable characteristics falling outside the support of female workers’ observable characteristics, and vice versa, by augmenting the wage decomposition with components attributable to male and female workers, respectively, outside the region of common support.

We apply these methods to Brazilian administrative data and find that almost the entire gender wage gap is attributable to male and female workers who possess similar skills and perform similar tasks being paid differently. This is true at the aggregate level, and remains true when we perform wage decompositions within each worker type–market cell, indicating that this is a widespread phenomenon, not one driven by large wage differentials in small subsets of the labor market. We find that wage decompositions based on standard observable variables suffer from omitted variable bias, emphasizing the need for detailed worker and job characteristics in the form of worker types and markets. We also find that wage decompositions based on linear regressions yield similar findings to those based on matching when a lack of common support is not an issue. However, when male and female workers’ characteristics do not share a common support, the matching estimator with corrections for a lack of common support outperforms alternatives.

While this paper focuses on gender wage gaps, the methods are applicable to other wage gaps, for instance racial wage gaps. Moreover, our strategy for using worker–job matching patterns to control for previously unobserved, but potentially confounding, covariates may be applied in a wide variety of contexts.

References

Acemoglu, Daron and David Autor, “Skills, tasks and technologies: Implications for employment and earnings,” 2011, 4, 1043–1171.
Autor, David H., “The ‘task approach’ to labor markets: an overview,” 2013.
Autor, David H., Frank Levy, and Richard J. Murnane, “The Skill Content of Recent Technological Change: An Empirical Exploration,” The Quarterly Journal of Economics, 2003, 118 (4), 1279–1333.
Barsky, Robert, John Bound, Kerwin Kofi Charles, and Joseph P. Lupton, “Accounting for the Black-White Wealth Gap: A Nonparametric Approach,” Journal of the American Statistical Association, 2002, 97 (459), 663–673.
Blinder, Alan S., “Wage Discrimination: Reduced Form and Structural Estimates,” The Journal of Human Resources, 1973, 8 (4), 436–455.
Card, David, Ana Rute Cardoso, and Patrick Kline, “Bargaining, Sorting, and the Gender Wage Gap: Quantifying the Impact of Firms on the Relative Pay of Women,” The Quarterly Journal of Economics, October 2015, 131 (2), 633–686.
Card, David, Ana Rute Cardoso, Joerg Heining, and Patrick Kline, “Firms and Labor Market Inequality: Evidence and Some Theory,” Journal of Labor Economics, 2018, 36 (S1), S13–S70.
Chernozhukov, Victor, Iván Fernández-Val, and Blaise Melly, “Inference on Counterfactual Distributions,” Econometrica, 2013, 81 (6), 2205–2268.
DiNardo, John, Nicole M. Fortin, and Thomas Lemieux, “Labor Market Institutions and the Distribution of Wages, 1973-1992: A Semiparametric Approach,” Econometrica, 1996, 64 (5), 1001–1044.
Firpo, Sergio P., Nicole M. Fortin, and Thomas Lemieux, “Decomposing Wage Distributions Using Recentered Influence Function Regressions,” Econometrics, May 2018, 6 (2), 1–40.
Fogel, Jamie and Bernardo Modenesi, “What is a Labor Market? Classifying Workers and Jobs Using Network Theory,” 2023.
Fortin, Nicole, Thomas Lemieux, and Sergio Firpo, “Chapter 1 - Decomposition Methods in Economics,” in Orley Ashenfelter and David Card, eds., Vol. 4 of Handbook of Labor Economics, Elsevier, 2011, pp. 1–102.
Garcia, Luana Marquez, Hugo Nopo, and Paola Salardi, “Gender and Racial Wage Gaps in Brazil 1996-2006: Evidence Using a Matching Comparisons Approach,” Research Department Publications 4626, Inter-American Development Bank, Research Department, May 2009.
Gerard, François, Lorenzo Lagos, Edson Severnini, and David Card, “Assortative Matching or Exclusionary Hiring? The Impact of Firm Policies on Racial Wage Differences in Brazil,” Working Paper 25176, National Bureau of Economic Research, October 2018.
Goldin, Claudia, “A Grand Gender Convergence: Its Last Chapter,” American Economic Review, April 2014, 104 (4), 1091–1119.
Hurst, Erik, Yona Rubinstein, and Kazuatsu Shimizu, “Task-Based Discrimination,” Working Paper 29022, National Bureau of Economic Research, July 2021.
Jarosch, Gregor, Jan Sebastian Nimczik, and Isaac Sorkin, “Granular search, market structure, and wages,” Technical Report, National Bureau of Economic Research, 2019.
Kantenga, Kory, “The effect of job-polarizing skill demands on the US wage structure,” 2018.
Karrer, Brian and Mark E. J. Newman, “Stochastic blockmodels and community structure in networks,” Physical Review E, 2011, 83 (1), 016107.
Larremore, Daniel B., Aaron Clauset, and Abigail Z. Jacobs, “Efficiently inferring community structure in bipartite networks,” Physical Review E, 2014, 90 (1), 012805.
Lindenlaub, Ilse, “Sorting multidimensional types: Theory and application,” The Review of Economic Studies, 2017, 84 (2), 718–789.
McFadden, Daniel, “Modeling the choice of residential location,” Transportation Research Record, 1978, (673).
Modenesi, Bernardo, “Advancing Distribution Decomposition Methods Beyond Common Supports: Applications to Racial Wealth Disparities,” 2022.
Morello, Thiago and Jacqueline Anjolim, “Gender wage discrimination in Brazil from 1996 to 2015: A matching analysis,” EconomiA, 2021.
Nimczik, Jan Sebastian, “Job Mobility Networks and Endogenous Labor Markets,” 2018.
Ñopo, Hugo, “Matching as a Tool to Decompose Wage Gaps,” The Review of Economics and Statistics, 2008, 90 (2), 290–299.
Oaxaca, Ronald, “Male-Female Wage Differentials in Urban Labor Markets,” International Economic Review, 1973, 14 (3), 693–709.
Peixoto, Tiago P., “Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models,” Physical Review E, 2014, 89 (1), 012804.
Peixoto, Tiago P., “Nonparametric Bayesian inference of the microcanonical stochastic block model,” Physical Review E, 2017, 95 (1), 012317.
Peixoto, Tiago P., “Nonparametric weighted stochastic block models,” Physical Review E, January 2018, 97, 012306.
Peixoto, Tiago P., “Bayesian stochastic blockmodeling,” Advances in Network Clustering and Blockmodeling, 2019, pp. 289–332.
Roy, Andrew Donald, “Some thoughts on the distribution of earnings,” Oxford Economic Papers, 1951, 3 (2), 135–146.
Sorkin, Isaac, “Ranking firms using revealed preference,” The Quarterly Journal of Economics, 2018, 133 (3), 1331–1393.
Tan, Joanne, “Multidimensional heterogeneity and matching in a frictional labor market - An application to polarization,” 2018.
Train, Kenneth E., Discrete choice methods with simulation, Cambridge University Press, 2010.


Appendix A Nested Logit Choice Probability

Following Train (2010), and as originally developed by McFadden (1978), choosing the alternative $j$, nested within a group $\gamma$, that maximizes utility

j^{*}:=\arg\max_{j}\quad W_{\gamma}+Y_{j}+\varepsilon_{j}

(13)

with $\varepsilon_{j}\sim NestedLogit(\nu_{\gamma})$ results in the following choice probability:

	$\displaystyle P(j=j^{*})$	$\displaystyle=P(\text{Choose }\gamma)P(j=j^{*}|\gamma)$
		$\displaystyle=\frac{\exp(W_{\gamma}+\nu_{\gamma}I_{\gamma})}{\sum_{\gamma}\exp(W_{\gamma}+\nu_{\gamma}I_{\gamma})}\frac{\exp(Y_{j})^{\frac{1}{\nu_{\gamma}}}}{\sum_{j\in\gamma}\exp(Y_{j})^{\frac{1}{\nu_{\gamma}}}}$

where $I_{\gamma}=\log\left(\sum_{j\in\gamma}\exp(Y_{j})^{\frac{1}{\nu_{\gamma}}}\right)$ .

Our problem is similar, with workers choosing job $j$ within a market $\gamma$ in order to maximize the sum of log earnings $\log(\psi_{\iota\gamma}w_{j}^{g})$ and an idiosyncratic preference for job $j$ , $\varepsilon_{ij}^{g}$ :

\displaystyle j^{*}=\arg\max_{j}\quad\log(\psi_{\iota\gamma}w_{j}^{g})+\varepsilon_{ij}^{g}.

(14)

We also assume that $\varepsilon_{ij}^{g}\sim NestedLogit(\nu_{\gamma}^{g})$. One difference between our setup and the one covered by Train (2010) is that we add two worker indexes, $\iota$ for the worker's skills and $g$ for the worker's gender, and we condition our probabilities on $\iota$ and $g$. Comparing equations 13 and 14, we have $W_{\gamma}=0$ and $Y_{j}=\log(\psi_{\iota\gamma}w_{j}^{g})$, which results in the following choice probabilities:

	$\displaystyle P(j=j^{*}|j\in\gamma,i\in\iota,g)$	$\displaystyle=P(\gamma=\gamma^{*}|i\in\iota,j\in\gamma,g)P(j=j^{*}|i\in\iota,j\in\gamma,\gamma=\gamma^{*},g)$
		$\displaystyle=\frac{\exp(\nu_{\gamma}^{g}I_{\iota\gamma}^{g})}{\sum_{\gamma}\exp(\nu_{\gamma}^{g}I_{\iota\gamma}^{g})}\frac{\exp(\log(\psi_{\iota\gamma}w_{j}^{g}))^{\frac{1}{\nu_{\gamma}^{g}}}}{\sum_{j\in\gamma}\exp(\log(\psi_{\iota\gamma}w_{j}^{g}))^{\frac{1}{\nu_{\gamma}^{g}}}}\quad\text{\tiny(plugging objects in)}$
		$\displaystyle=\frac{\exp(I_{\iota\gamma}^{g})^{\nu_{\gamma}^{g}}}{\sum_{\gamma}\exp(I_{\iota\gamma}^{g})^{\nu_{\gamma}^{g}}}\frac{(\psi_{\iota\gamma}w_{j}^{g})^{\frac{1}{\nu_{\gamma}^{g}}}}{\sum_{j\in\gamma}(\psi_{\iota\gamma}w_{j}^{g})^{\frac{1}{\nu_{\gamma}^{g}}}}\quad\text{\tiny(similar to equation \ref{eq_worker_choice})}$
		$\displaystyle=\frac{\exp(I_{\iota\gamma}^{g})^{\nu_{\gamma}^{g}}}{\sum_{\gamma}\exp(I_{\iota\gamma}^{g})^{\nu_{\gamma}^{g}}}\frac{(\psi_{\iota\gamma}w_{j}^{g})^{\frac{1}{\nu_{\gamma}^{g}}}}{\exp(I_{\iota\gamma}^{g})}\quad\text{\tiny(by definition of $I_{\iota\gamma}^{g}$)}$
		$\displaystyle=\underbrace{\frac{\exp(I_{\iota\gamma}^{g})^{\nu_{\gamma}^{g}-1}}{\sum_{\gamma}\exp(I_{\iota\gamma}^{g})^{\nu_{\gamma}^{g}}}\psi_{\iota\gamma}^{\frac{1}{\nu_{\gamma}^{g}}}}_{=:\Omega_{\iota\gamma}^{g}\ (\iota\text{-}\gamma\text{-}g\text{ component})}\ \underbrace{(w_{j}^{g})^{\frac{1}{\nu_{\gamma}^{g}}}}_{=:d_{j}^{g}\ (j\text{-}g\text{ component})}\quad\text{\tiny(similar to equation \ref{eq_worker_choice_separation})}$

where $I_{\iota\gamma}^{g}=\log\left[\sum_{j\in\gamma}\exp(\log(\psi_{\iota\gamma}w_{j}^{g}))^{\frac{1}{\nu_{\gamma}^{g}}}\right]=\log\left[\sum_{j\in\gamma}(\psi_{\iota\gamma}w_{j}^{g})^{\frac{1}{\nu_{\gamma}^{g}}}\right]$.
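The derivation above can be checked numerically. The sketch below fixes one worker type $\iota$ and one gender $g$ (so those indexes are dropped), computes the nested logit choice probabilities from the inclusive values, and confirms that they coincide with the factored form $\Omega_{\iota\gamma}^{g}d_{j}^{g}$. The function name and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def choice_probabilities(psi, wages, nu):
    """Nested logit probabilities for one worker type and gender.

    psi[m]   : productivity of the worker type in market m
    wages[m] : list of wages w_j for the jobs in market m
    nu[m]    : nesting parameter of market m
    Returns (direct, factored), both keyed by (market, job index).
    """
    # Inclusive value I_m = log sum_j (psi_m * w_j)^(1 / nu_m)
    incl = {
        m: math.log(sum((psi[m] * w) ** (1 / nu[m]) for w in wages[m]))
        for m in wages
    }
    denom = sum(math.exp(nu[m] * incl[m]) for m in wages)

    direct, factored = {}, {}
    for m in wages:
        p_market = math.exp(nu[m] * incl[m]) / denom
        # Omega: the part of the probability that varies only with the market.
        omega = math.exp(incl[m]) ** (nu[m] - 1) / denom * psi[m] ** (1 / nu[m])
        for j, w in enumerate(wages[m]):
            p_job_given_market = (psi[m] * w) ** (1 / nu[m]) / math.exp(incl[m])
            direct[(m, j)] = p_market * p_job_given_market
            factored[(m, j)] = omega * w ** (1 / nu[m])  # Omega * d_j
    return direct, factored


# Hypothetical values: two markets with three and two jobs.
psi = {"A": 1.5, "B": 0.8}
wages = {"A": [10.0, 12.0, 9.0], "B": [20.0, 18.0]}
nu = {"A": 0.5, "B": 0.7}
direct, factored = choice_probabilities(psi, wages, nu)
print(sum(direct.values()))
```

The probabilities sum to one across all jobs, and the direct and factored forms agree job by job.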

Appendix B Terms in the NP decomposition

The terms in the NP decomposition from equation 12 can be more formally defined as follows:

	$\displaystyle\Delta_{M}:=\left[\int_{\bar{S}_{F}}Y_{1}(x)\frac{dF_{M}(x)}{\mu_{M}(\bar{S}_{F})}-\int_{S_{F}}Y_{1}(x)\frac{dF_{M}(x)}{\mu_{M}(S_{F})}\right]\mu_{M}(\bar{S}_{F})$		(15)
	$\displaystyle\Delta_{X}:=\int_{S_{M}\cap S_{F}}Y_{1}(x)\left[\frac{dF_{M}(x)}{\mu_{M}(S_{F})}-\frac{dF_{F}(x)}{\mu_{F}(S_{M})}\right]$
	$\displaystyle\Delta_{0}:=\int_{S_{M}\cap S_{F}}\left[Y_{1}(x)-Y_{0}(x)\right]\frac{dF_{F}(x)}{\mu_{F}(S_{M})}$
	$\displaystyle\Delta_{F}:=\left[\int_{S_{M}}Y_{0}(x)\frac{dF_{F}(x)}{\mu_{F}(S_{M})}-\int_{\bar{S}_{M}}Y_{0}(x)\frac{dF_{F}(x)}{\mu_{F}(\bar{S}_{M})}\right]\mu_{F}(\bar{S}_{M})$

where: $F_{M}(x)$ and $F_{F}(x)$ denote the distributions of $x$ for males and females, respectively; $\mu_{M}$ and $\mu_{F}$ measure the proportions of males and females over regions of the supports of $x$; and the support of $x$ for a gender $g$, $supp(X_{g})$, is partitioned as $supp(X_{g}):=S_{g}\cup\bar{S}_{g}$, with $S_{g}\cap\bar{S}_{g}=\emptyset$, for $g\in\{F,M\}$.
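For discrete covariates, the four terms have simple sample analogues. The sketch below is a minimal illustration that assumes exact matching on covariate cells, with a worker "matched" when the other gender is observed in the same cell and $Y_{1}$ taken to be the male wage function; the function name, the record layout, and the toy data are all hypothetical and are not the estimator code used for the results.

```python
from collections import defaultdict
from statistics import mean


def matching_decomposition(records):
    """records: iterable of (cell, is_male, log_wage); a cell is one covariate combination."""
    by_group = {True: defaultdict(list), False: defaultdict(list)}
    for cell, is_male, y in records:
        by_group[bool(is_male)][cell].append(y)
    male, female = by_group[True], by_group[False]
    common = sorted(set(male) & set(female))  # cells observed for both genders
    if not common:
        raise ValueError("no common support: nothing can be matched")

    def pool(group, cells):
        return [y for c in cells for y in group[c]]

    m_all, f_all = pool(male, male), pool(female, female)
    m_in, f_in = pool(male, common), pool(female, common)
    m_out = pool(male, [c for c in male if c not in common])
    f_out = pool(female, [c for c in female if c not in common])

    # Male wage function averaged over the female covariate distribution
    # on the common support.
    y1_at_female_x = sum(mean(male[c]) * len(female[c]) for c in common) / len(f_in)

    return {
        "gap": mean(m_all) - mean(f_all),
        "delta_X": mean(m_in) - y1_at_female_x,
        "delta_0": y1_at_female_x - mean(f_in),
        "delta_M": (mean(m_out) - mean(m_in)) * len(m_out) / len(m_all) if m_out else 0.0,
        "delta_F": (mean(f_in) - mean(f_out)) * len(f_out) / len(f_all) if f_out else 0.0,
    }


# Hypothetical toy data: cells "a" and "b" are common, "c" is male-only, "d" is female-only.
toy = [
    ("a", True, 3.0), ("a", True, 3.2), ("b", True, 2.0), ("c", True, 4.0),
    ("a", False, 2.8), ("b", False, 1.8), ("b", False, 2.0), ("d", False, 1.0),
]
result = matching_decomposition(toy)
print(result)
```

By construction the four components sum to the overall gap, which is the additivity property of the matching decomposition.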

Appendix C Proof that $A_{ij}$ follows a Poisson distribution

If an individual worker $i$ only searched for a job once, then the probability of worker $i$ matching with job $j$ would be equal to $\mathbb{P}_{ij}=\mathcal{P}_{\iota\gamma}d_{j}$ and $A_{ij}$ would follow a Bernoulli distribution:

A_{ij}\sim Bernoulli(\mathcal{P}_{\iota\gamma}d_{j}).

However, since worker $i$ searches for jobs $c_{i}\equiv\sum_{t=1}^{T}c_{it}$ times, $A_{ij}$ is actually the sum of $c_{i}$ Bernoulli random variables, and is therefore a Binomial random variable. Conditional on knowing $c_{i}$ ,

A_{ij}|c_{i}\sim Binomial(c_{i},\mathcal{P}_{\iota\gamma}d_{j}).

However, we still need to take into account the fact that $c_{i}$ is a Poisson-distributed random variable with arrival rate $d_{i}$ . Consequently, the unconditional distribution of $A_{ij}$ is Poisson as well:

A_{ij}\sim Poisson(d_{i}d_{j}\mathcal{P}_{\iota\gamma}).

We prove this fact by multiplying the conditional density of $A_{ij}|c_{i}$ by the marginal density of $c_{i}$ to get the joint density of $A_{ij}$ and $c_{i}$ , and then integrating out $c_{i}$ .

\displaystyle P(A_{ij},c_{i})=\underbrace{P(A_{ij}|c_{i})}_{Bin(c_{i},d_{j}\mathcal{P}_{\iota\gamma})}\quad\times\quad\underbrace{P(c_{i})}_{Poisson(d_{i})}

Deriving the joint distribution:

\displaystyle P(A_{ij},c_{i})=\binom{c_{i}}{A_{ij}}(d_{j}\mathcal{P}_{\iota\gamma})^{A_{ij}}(1-d_{j}\mathcal{P}_{\iota\gamma})^{c_{i}-A_{ij}}\times\frac{d_{i}^{c_{i}}\exp(-d_{i})}{c_{i}!}

We want to find out the marginal distribution of $A_{ij}$ :

	$\displaystyle P(A_{ij})$	$\displaystyle=\sum_{c_{i}=0}^{\infty}P(A_{ij},c_{i})$
		$\displaystyle=\sum_{c_{i}=0}^{\infty}\binom{c_{i}}{A_{ij}}(d_{j}\mathcal{P}_{\iota\gamma})^{A_{ij}}(1-d_{j}\mathcal{P}_{\iota\gamma})^{c_{i}-A_{ij}}\times\frac{d_{i}^{c_{i}}\exp(-d_{i})}{c_{i}!}$
		$\displaystyle=\sum_{c_{i}=0}^{\infty}\frac{c_{i}!}{A_{ij}!(c_{i}-A_{ij})!}(d_{j}\mathcal{P}_{\iota\gamma})^{A_{ij}}(1-d_{j}\mathcal{P}_{\iota\gamma})^{c_{i}-A_{ij}}\times\frac{d_{i}^{c_{i}}\exp(-d_{i})}{c_{i}!}$
		$\displaystyle=\frac{(d_{j}\mathcal{P}_{\iota\gamma})^{A_{ij}}\exp(-d_{i})}{A_{ij}!}\sum_{c_{i}=0}^{\infty}\frac{1}{(c_{i}-A_{ij})!}(1-d_{j}\mathcal{P}_{\iota\gamma})^{c_{i}-A_{ij}}d_{i}^{c_{i}}$

If the summation term is equal to

\sum_{c_{i}=0}^{\infty}\frac{1}{(c_{i}-A_{ij})!}(1-d_{j}\mathcal{P}_{\iota\gamma})^{c_{i}-A_{ij}}d_{i}^{c_{i}}=d_{i}^{A_{ij}}\exp(d_{i}(1-d_{j}\mathcal{P}_{\iota\gamma}))

(16)

then $P(A_{ij})=\frac{(d_{i}d_{j}\mathcal{P}_{\iota\gamma})^{A_{ij}}\exp(-d_{i}d_{j}\mathcal{P}_{\iota\gamma})}{A_{ij}!}$, i.e. $A_{ij}$ would be Poisson distributed:

A_{ij}\sim Poisson(d_{i}d_{j}\mathcal{P}_{\iota\gamma})

Proving (16) is equivalent to proving the following equality:

\displaystyle 1=\frac{1}{d_{i}^{A_{ij}}\exp(d_{i}(1-d_{j}\mathcal{P}_{\iota\gamma}))}\sum_{c_{i}=0}^{\infty}\frac{1}{(c_{i}-A_{ij})!}(1-d_{j}\mathcal{P}_{\iota\gamma})^{c_{i}-A_{ij}}d_{i}^{c_{i}}

Proof:

	$\displaystyle d_{i}^{-A_{ij}}\exp(-d_{i}(1-d_{j}\mathcal{P}_{\iota\gamma}))\sum_{c_{i}=0}^{\infty}\frac{1}{(c_{i}-A_{ij})!}(1-d_{j}\mathcal{P}_{\iota\gamma})^{c_{i}-A_{ij}}d_{i}^{c_{i}}$
	$\displaystyle=\sum_{c_{i}=0}^{\infty}\frac{\exp(-d_{i}(1-d_{j}\mathcal{P}_{\iota\gamma}))}{(c_{i}-A_{ij})!}(1-d_{j}\mathcal{P}_{\iota\gamma})^{c_{i}-A_{ij}}d_{i}^{c_{i}-A_{ij}}$
	$\displaystyle=\sum_{c_{i}=0}^{\infty}\frac{\exp(-d_{i}(1-d_{j}\mathcal{P}_{\iota\gamma}))}{(c_{i}-A_{ij})!}(d_{i}(1-d_{j}\mathcal{P}_{\iota\gamma}))^{c_{i}-A_{ij}}$
	Let $\lambda=d_{i}(1-d_{j}\mathcal{P}_{\iota\gamma})$ and apply the change of variables $z=c_{i}-A_{ij}$. Because $c_{i}\geq A_{ij}$ in our problem, we have $z\geq 0$:
	$\displaystyle=\sum_{z=0}^{\infty}\frac{\exp(-\lambda)}{z!}\lambda^{z}$
	$\displaystyle=1,$
	since the summand is the probability mass function of a Poisson random variable, i.e. $z\sim Poisson(\lambda)$. $\square$

Therefore, we have

A_{ij}\sim Poisson(d_{i}d_{j}\mathcal{P}_{\iota\gamma})\qed
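The result can also be verified numerically by summing the joint distribution over $c_{i}$ and comparing with the Poisson probability mass function. The sketch below is only a sanity check: the function names are ours, the parameter values are hypothetical, and the infinite sum is truncated at a large `c_max`, which is an approximation.

```python
import math


def poisson_pmf(k, rate):
    """P(K = k) for K ~ Poisson(rate)."""
    return rate ** k * math.exp(-rate) / math.factorial(k)


def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def marginal_pmf(a, d_i, p, c_max=150):
    """Sum the joint P(A_ij = a, c_i = c) over c = a, ..., c_max (truncated sum)."""
    return sum(
        binomial_pmf(a, c, p) * poisson_pmf(c, d_i) for c in range(a, c_max + 1)
    )


# Hypothetical values: d_i is the search rate, p stands for d_j * P_{iota gamma}.
d_i, p = 3.0, 0.2
max_error = max(
    abs(marginal_pmf(a, d_i, p) - poisson_pmf(a, d_i * p)) for a in range(8)
)
print(max_error)
```

The discrepancy is at the level of floating-point error, consistent with $A_{ij}$ being Poisson with rate $d_{i}d_{j}\mathcal{P}_{\iota\gamma}$.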

Appendix D Soft assignment of workers and jobs to worker types and markets

In Section 3, at the maximum of the posterior derived in Section 3.2.2, each worker is assigned to only one skill cluster, a process of hard assignment. However, it is possible that, given the pattern of worker matches, a particular worker could be revealed to possess certain skills $\iota_{1}$ in most of her matches, and skills $\iota_{2}$ in a few of her other matches. Creating a single worker skill group to accommodate her hybrid skills might not improve model fit if only a few workers exhibit similar matches. Instead, allowing her to have mixed skills $\iota_{1}$ and $\iota_{2}$, i.e. soft assignment, with weights according to her matching history, provides more nuanced information to the researcher. We propose using the Bayesian setup to recover these weights.

It turns out that the posterior $P(\bm{b}|\bm{A},\bm{g})$ carries the measure of workers’ skill profiles needed to control for workers’ unobserved skills in the wage gap estimation. Given a total of $I$ clusters of workers competing for the same jobs in the labor market network, i.e. workers with similar skills, the posterior distribution provides the probability that each worker belongs to a given skill cluster, conditional on the worker's demographic group $g$ and the entire network $\bm{A}$. More formally, the skill profile of worker $i$ is defined as:

\displaystyle\vec{P}_{i}:=\left[P(i\in\iota_{1}|\bm{A},\bm{g})\qquad P(i\in\iota_{2}|\bm{A},\bm{g})\qquad\cdots\qquad P(i\in\iota_{I}|\bm{A},\bm{g})\right]^{T}

(17)
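One way the skill profile in equation 17 could be approximated in practice is sketched below. This is an assumption on our part rather than a procedure described in the text: it supposes that the posterior is represented by a collection of sampled partitions (for instance from an MCMC sampler) whose worker type labels have already been aligned across draws, and it estimates each membership probability by the share of draws in which the worker is assigned to that type. The function name and the toy draws are hypothetical.

```python
from collections import Counter


def skill_profile(draws, worker, n_types):
    """Share of posterior draws assigning `worker` to each worker type 0..n_types-1.

    draws : list of dicts mapping worker id -> worker type label, one dict per
            sampled partition (labels assumed aligned across draws).
    """
    counts = Counter(draw[worker] for draw in draws)
    return [counts[t] / len(draws) for t in range(n_types)]


# Hypothetical draws: worker "w1" is mostly type 0 and occasionally type 1.
draws = [
    {"w1": 0, "w2": 2},
    {"w1": 0, "w2": 2},
    {"w1": 1, "w2": 2},
    {"w1": 0, "w2": 1},
]
profile_w1 = skill_profile(draws, "w1", n_types=3)
print(profile_w1)  # [0.75, 0.25, 0.0]
```

The resulting vector can then serve as a set of soft-assignment weights in place of a single hard assignment.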

