Revealing New Skills Trends in Emerging Economies: The Power of Online Data and NLP Techniques

By Verónica Escudero and Franziska Riepl.

This blog post was first published by the ILO in August 2024 and is available here.

In a rapidly evolving job landscape marked by technological advancements and the transition to a more sustainable economy, acquiring and adapting the right skills has become imperative. Possessing these skills can increase resilience to global transformations and shocks, while their absence raises the risk of unfavourable labour market outcomes, leaving individuals vulnerable to poverty and exclusion. Recognising this urgency, institutions and governments have elevated skills development and lifelong learning to policy priorities (ILO 2023; OECD 2023; UNESCO UIL 2020; European Commission 2020).

Scholarly research in Europe and the United States has extensively used skills taxonomies, which aim to capture trends in broadly applicable skills groups. This research leverages granular data from occupational classification systems, such as the U.S. O*NET and the European ESCO, as well as vacancy data, to study skills dynamics and their effect on wages and employment (e.g., Autor, Levy, and Murnane 2003; Acemoglu and Autor 2011; Deming and Kahn 2018; Atalay et al. 2020). However, knowledge of skills dynamics outside these regions remains limited due to data constraints. Existing skills classifications and taxonomies are not easily transferable to diverse country-specific contexts. While efforts like the PIAAC and STEP skills measurement surveys (see OECD 2019 and World Bank 2014, respectively) provide some insight, they are limited in scope and coverage.

This brief presents an innovative solution to these challenges by introducing a conceptual framework and a methodology to leverage big data, originally developed in Escudero, Liepmann, and Podjanin (Forthcoming) and further refined in ongoing work at the ILO Research Department in preparation for the forthcoming World Employment and Social Outlook (WESO) Report on Lifelong Learning and Skills Dynamics. The conceptual framework categorizes tasks performed on the job into skills categories and subcategories; coupled with a natural language processing (NLP) methodology, it makes it possible to extract skills information from unstructured online vacancy data.

The taxonomy captures transferable skills that can be applied across various occupations, rather than technical occupation-specific skills. It comprises 15 unique skills subcategories across cognitive, socioemotional, and manual categories, which are tailored to low- and middle-income economies and adaptable to country-specific contexts. It captures all sectors and occupations, including manual labour, and can also be applied to applicant data to study skills supply and mismatch as well as the relationship between skills and job transitions. The skills subcategories are designed to be comprehensive enough for high-level analyses while encompassing a broad set of skills across various occupations.
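To make the idea of mapping unstructured vacancy text onto skills categories more concrete, the following is a minimal sketch in Python. It is not the methodology of Escudero, Liepmann, and Podjanin: it assumes a simple keyword-dictionary approach, and the subcategory names, keywords, and function names (`compile_patterns`, `tag_vacancy`, `skill_shares`) are hypothetical placeholders chosen for illustration only. The actual taxonomy has 15 subcategories, which are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical mini-taxonomy (illustration only, NOT the ILO taxonomy):
# category -> subcategory -> keywords that signal the skill in a vacancy text.
TAXONOMY = {
    "cognitive": {
        "analytical": ["analyse", "analyze", "problem solving", "data analysis"],
    },
    "socioemotional": {
        "teamwork": ["team", "collaborate", "collaboration"],
        "customer service": ["customer", "client"],
    },
    "manual": {
        "operating machinery": ["forklift", "machine operation", "operate machinery"],
    },
}


def compile_patterns(taxonomy):
    """Build one case-insensitive regex per (category, subcategory) pair."""
    patterns = {}
    for category, subcategories in taxonomy.items():
        for subcategory, keywords in subcategories.items():
            alternation = "|".join(re.escape(k) for k in keywords)
            # \b anchors the match at the start of a word, so "team" also
            # matches "teamwork" but not "steam".
            patterns[(category, subcategory)] = re.compile(
                rf"\b(?:{alternation})", re.IGNORECASE
            )
    return patterns


def tag_vacancy(text, patterns):
    """Return the set of (category, subcategory) pairs mentioned in one vacancy."""
    return {key for key, pattern in patterns.items() if pattern.search(text)}


def skill_shares(vacancies, patterns):
    """Share of vacancies that mention each subcategory at least once."""
    if not vacancies:
        return {}
    counts = Counter()
    for text in vacancies:
        counts.update(tag_vacancy(text, patterns))
    return {key: counts[key] / len(vacancies) for key in patterns}


if __name__ == "__main__":
    # Made-up vacancy texts, used only to exercise the functions.
    sample = [
        "We need a forklift driver who works well in a team.",
        "Data analysis and client reporting.",
    ]
    patterns = compile_patterns(TAXONOMY)
    for key, share in skill_shares(sample, patterns).items():
        print(key, share)
```

A dictionary-based tagger like this is easy to audit and to adapt to country-specific vocabulary, but it misses synonyms and context; real vacancy text would also need language-specific preprocessing before matching.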

This methodology aims to shed light on skills dynamics in previously understudied economies, as job board data is now available across numerous countries and years. These insights will empower governments, businesses, and individuals to target skills development efforts more effectively, fostering resilient economies and promoting decent work for all.

The remainder of this brief outlines the key elements of the methodology and demonstrates its implementation with data from Uruguay and South Africa. The taxonomy and accompanying methodology form the basis for future work on skills within the ILO, with further analyses based on data from Brazil and the Russian Federation already under way.