Aim

I took the job description for the Data Scientist role at iManage and wanted to see whether its sentences fall into clusters of similar meaning.

The original link to the job description is here

Why would I do this?

Clustering sentences can be a useful way to summarise what a text is talking about and to identify any possible themes.

Process

The Jupyter notebook is on my GitHub here. The broad approach I took was to:

  • Extract the job description from LinkedIn using Beautiful Soup and requests
  • Do a little bit of text cleaning
  • Use spaCy’s pre-trained model to extract sentences
  • Use TensorFlow Hub to get a high-dimensional vector for each sentence (I used Google’s Universal Sentence Encoder in this instance)
  • Use PCA to reduce the dimensionality to n_components = 2
  • Fit a K-Means clustering model on the data, assuming n_clusters = 3 to begin with
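
For anyone who wants a feel for the steps above without opening the notebook, here is a minimal sketch of the sentence, embedding, PCA and K-Means stages. It is not the notebook code. The function names (embed_sentences, cluster_embeddings), the spaCy model name "en_core_web_sm", version 4 of the Universal Sentence Encoder and the random seed are all my assumptions for illustration. The demo at the bottom runs on synthetic 512-dimensional vectors standing in for real sentence embeddings, so it works without downloading any models.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def embed_sentences(text):
    """Split text into sentences with spaCy, then embed each one.

    Assumptions (not confirmed by the post): the "en_core_web_sm" spaCy
    model and version 4 of the Universal Sentence Encoder on TensorFlow Hub.
    """
    # Imported lazily so the clustering demo below runs without these packages.
    import spacy
    import tensorflow_hub as hub

    nlp = spacy.load("en_core_web_sm")
    sentences = [s.text.strip() for s in nlp(text).sents if s.text.strip()]
    encoder = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
    # The encoder returns one 512-dimensional vector per sentence.
    return sentences, encoder(sentences).numpy()


def cluster_embeddings(embeddings, n_components=2, n_clusters=3, seed=0):
    """Reduce embeddings with PCA, then fit K-Means on the reduced points."""
    coords = PCA(n_components=n_components, random_state=seed).fit_transform(embeddings)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(coords)
    return coords, labels


# Demo on synthetic data: three well-separated groups of ten fake "sentences".
# These vectors are a stand-in for real embeddings, not real results.
rng = np.random.default_rng(0)
centres = rng.normal(scale=5.0, size=(3, 512))
fake_embeddings = np.vstack(
    [c + rng.normal(scale=0.5, size=(10, 512)) for c in centres]
)

coords, labels = cluster_embeddings(fake_embeddings)
print(coords.shape)                # one 2-D point per fake sentence
print(sorted(set(labels.tolist())))  # the cluster ids that were assigned
```

With real text you would call embed_sentences(job_description_text) first and pass the returned vectors to cluster_embeddings. The 2-D coordinates and labels are then ready to plot.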

Results

There are three broad clusters in the job description:

  • iManage Values & Culture - the values and culture of iManage and how these are expressed
  • The roles & responsibilities of the job - the overarching responsibilities and the contribution required from a Data Scientist
  • The Technical / Data Science skills needed - the specific technical skills associated with the role

I’ve visualised this in a Plotly scatter plot below. Feel free to tinker with it.

I hope you enjoyed this super short demo. Let me know if you want to take things further. Thanks! Sam

Plot