Past research into the employability of PhD graduates has mainly consisted of surveying and interviewing employers and academics about how well graduates are prepared for the workplace. The problem with this method is that it cannot measure, evaluate, or track employer demand for PhD graduates. To address this, the Australian National University (ANU) and Data61 undertook a research project that uses machine learning (ML) to read job advertisements and find roles that may suit PhD graduates, even when the ad does not say so.

PhD programs were originally designed to prepare candidates for a life in academia. In practice, a large number of PhD graduates find work outside academia. The problem is that many job advertisements do not mention a PhD as a keyword or a required qualification, even though the role could suit a PhD graduate. Job ads are a rich and robust source of data about employer needs and expectations, but without ML and natural language processing (NLP), there is no simple, accurate, and objective way to measure non-academic employer demand for PhD graduates.

The project took a ‘big data’ approach, using ML and NLP to ‘read’ a large set of job ads and assess employer demand for graduates with research skills. ML and NLP were able to reveal the nature and extent of the ‘hidden job market’ for PhD graduates in a dataset of 29,693 authentic job ads.

A job ad outlines the type of work required and the type of person the employer would like to hire. The problem is that employers often do not use “PhD” as a keyword in a job ad. Only 20.7 per cent of non-academic job ads asked for a PhD qualification, yet as many as 43 per cent required a high level of research skills and capabilities that are typical of a PhD graduate.
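
The gap between keyword matches and actual skill requirements can be illustrated with a small sketch. The three ads below are invented for illustration and are not taken from the project dataset, and the pattern and function name are assumptions rather than anything the researchers used.

```python
import re

# Hypothetical ads, invented for illustration only.
ADS = [
    "Senior analyst. PhD in statistics or a related field required.",
    "Policy officer. You will design studies, analyse data and publish findings.",
    "Retail assistant. Customer service experience preferred.",
]

# Matches "PhD", "Ph.D", "Ph D", "doctorate" and "doctoral", case-insensitively.
PHD_PATTERN = re.compile(r"\b(ph\.?\s?d|doctorate|doctoral)\b", re.IGNORECASE)


def mentions_phd(ad_text):
    """Return True if the ad names a PhD qualification explicitly."""
    return bool(PHD_PATTERN.search(ad_text))


keyword_hits = [ad for ad in ADS if mentions_phd(ad)]
```

Here a keyword search returns only the first ad. The second ad describes research work but never names the qualification, so a keyword filter would miss it, which is the kind of role the ML approach is meant to surface.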

How is machine learning solving this problem?

Researchers developed machine learning algorithms that scan thousands of job ads and look for jobs that would suit PhD graduates. The algorithm reads each job ad and assesses the level of research skills the job requires. It found that half of all scanned job ads specified a need for a high level of education, including research skills, and that PhD graduates have the skill sets required in many industries, particularly where the role emphasises research.

How was ML used?

The research created an expert-annotated Gold Standard (GS) of approximately 500 ads, labelled using the Research Skills Annotation Schema, for use in ML. The researchers then evaluated how well Support Vector Machines (SVMs) and Conditional Random Fields (CRFs) could automate the annotation tasks on a test set of ads. Performance was measured with the average swapped pairs percentage (ASP%), a measure of ranking error that ranges from zero for the best performance to 100 for the worst.
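
To make the ranking-error idea concrete, here is a minimal sketch of one plausible reading of ASP%: the percentage of ad pairs that the model orders differently from the experts, averaged over several test runs. The exact definition used in the research is not given here, so the function names, the handling of ties, and the averaging over runs are all assumptions.

```python
from itertools import combinations


def swapped_pairs_percentage(gold_scores, predicted_scores):
    """Percentage of item pairs whose predicted order contradicts the gold order.

    Assumption: pairs tied in the gold scores are skipped, and pairs tied
    only in the prediction are not counted as swapped.
    """
    if len(gold_scores) != len(predicted_scores):
        raise ValueError("score lists must have the same length")
    comparable = 0
    swapped = 0
    for i, j in combinations(range(len(gold_scores)), 2):
        gold_diff = gold_scores[i] - gold_scores[j]
        if gold_diff == 0:
            continue  # the experts did not rank this pair
        comparable += 1
        pred_diff = predicted_scores[i] - predicted_scores[j]
        if gold_diff * pred_diff < 0:
            swapped += 1  # the model ordered the pair the opposite way
    if comparable == 0:
        raise ValueError("no pairs with distinct gold scores")
    return 100.0 * swapped / comparable


def average_swapped_pairs_percentage(runs):
    """Mean swapped-pairs percentage over several (gold, predicted) runs."""
    values = [swapped_pairs_percentage(gold, pred) for gold, pred in runs]
    return sum(values) / len(values)
```

Under this reading, a model that reproduces the expert ranking exactly scores 0, and one that reverses it completely scores 100, which matches the range described above.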

The process of developing the machine learning approach entailed:

1) convening an expert workshop to help develop the initial ontology

2) four iterations of hand annotations by expert coders to refine an annotation schema

3) extraction of a final expert-annotated set of unique ads which was declared as the Gold Standard

4) experimenting with ML-based NLP algorithms to learn to automate the data annotation process.
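
Step 4 can be sketched in miniature as follows. This is not the researchers' implementation: it is a toy linear SVM trained by sub-gradient descent on the hinge loss, using bag-of-words counts as features, and it stands in for whatever SVM setup the project actually used. The four annotated ads, the labels (+1 for research-intensive, -1 otherwise), and all function names and parameter values are hypothetical.

```python
import re
from collections import Counter

# Hypothetical annotated ads, invented for illustration (+1 = research-intensive).
GOLD_STANDARD = [
    ("Design and conduct independent research and publish findings", 1),
    ("Analyse complex data and write research reports", 1),
    ("Serve customers and operate the cash register", -1),
    ("Stock shelves and maintain store cleanliness", -1),
]


def featurize(text):
    """Bag-of-words counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def decision_value(features, weights, bias):
    """Linear score: weighted sum of token counts plus bias."""
    return sum(weights.get(tok, 0.0) * n for tok, n in features.items()) + bias


def train_linear_svm(examples, epochs=50, learning_rate=0.1, reg=0.01):
    """Train a linear SVM with sub-gradient descent on the hinge loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in examples:
            features = featurize(text)
            margin = label * decision_value(features, weights, bias)
            # L2 regularisation: shrink every weight slightly.
            for tok in weights:
                weights[tok] *= 1.0 - learning_rate * reg
            # Hinge loss: update only when the example is inside the margin.
            if margin < 1.0:
                for tok, n in features.items():
                    weights[tok] = weights.get(tok, 0.0) + learning_rate * label * n
                bias += learning_rate * label
    return weights, bias


def research_score(ad_text, weights, bias):
    """Positive scores suggest a research-intensive role under this toy model."""
    return decision_value(featurize(ad_text), weights, bias)


weights, bias = train_linear_svm(GOLD_STANDARD)
```

Once trained on the annotated examples, the model can score an unseen ad such as "Conduct research and analyse data" with research_score. Sorting ads by that score gives the kind of ranking that an error measure like ASP% evaluates.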

A future application of the machine learning algorithm will be a web portal that supports graduates in their search for work.

Black Belt Digital can work with you to understand your business problem better. We can determine the best use of Machine Learning for your business problem.