Aim

I took the job description for the Data Scientist role at iManage and wanted to see whether the sentences in its text would group into clusters of similar meaning.

The original link to the job description is here

Why would I do this?

Clustering sentences can be a useful way to get a summary of what a text is talking about and to identify possible themes.

Process

The Jupyter notebook is on my GitHub here. The broad approach I took was to:

Extract the job description from LinkedIn using Beautiful Soup and requests
Do a little bit of text cleaning
Use spaCy’s pre-trained model to extract sentences
Use TensorFlow Hub to get the high-dimensional vectors for each sentence (I used Google’s Universal Sentence Encoder in this instance)
Use PCA to reduce the dimensionality to n_components = 2
Fit a k-means clustering model on the data, assuming n_clusters = 3 to begin with
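
For anyone who wants to try the steps above without opening the notebook, here is a minimal sketch rather than the notebook's actual code. It assumes the spaCy model en_core_web_sm and version 4 of the Universal Sentence Encoder on TensorFlow Hub (the post does not say which ones were used), and it assumes k-means is fitted on the 2-D PCA output rather than on the full vectors. The function names are hypothetical, and the last few lines use random vectors as a stand-in for real sentence embeddings so the PCA and clustering step can run on its own.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def split_sentences(text):
    """Split text into sentences with spaCy (model name is an assumption)."""
    import spacy  # imported lazily so the rest of the sketch runs without it

    nlp = spacy.load("en_core_web_sm")
    return [sent.text.strip() for sent in nlp(text).sents if sent.text.strip()]


def embed_sentences(sentences):
    """Embed sentences with the Universal Sentence Encoder from TensorFlow Hub."""
    import tensorflow_hub as hub  # lazy import, as above

    # Assumed model handle; the post does not say which USE version was used.
    encoder = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
    return encoder(sentences).numpy()


def reduce_and_cluster(embeddings, n_components=2, n_clusters=3, seed=0):
    """Project embeddings with PCA, then fit k-means on the projection."""
    coords = PCA(n_components=n_components, random_state=seed).fit_transform(embeddings)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(coords)
    return coords, labels


# Stand-in data: 30 random 512-dimensional vectors in place of real embeddings.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(30, 512))
coords, labels = reduce_and_cluster(fake_embeddings)
print(coords.shape, sorted(set(labels.tolist())))
```

With real text, the same call would be chained as reduce_and_cluster(embed_sentences(split_sentences(text))), and the returned coords and labels are what a scatter plot needs for its x, y and colour values.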

Results

There are three broad clusters in the job description:

iManage Values & Culture - the values and culture of iManage and how they are expressed
The roles & responsibilities of the job - the overarching responsibilities and the contribution required from a Data Scientist
The Technical / Data Science skills needed - the specific technical skills associated with the role

I’ve visualised this in a Plotly scatter plot below. Feel free to tinker with it.

I hope you enjoyed this super short demo. Let me know if you want to take things further. Thanks! Sam

iManage Data Scientist Job Description on LinkedIn
