Aim
I took the job description for the Data Scientist role at iManage and wanted to see whether its sentences fall into clusters of similar meaning.
The original link to the job description is here
Why would I do this?
Clustering sentences can be a useful way to summarise what a text is talking about and to identify possible themes.
Process
The Jupyter notebook is on my GitHub here, but the broad approach I took was to:
- Extract the job description from LinkedIn using Beautiful Soup and requests
- Do a little bit of text cleaning
- Use spaCy’s pre-trained model to extract sentences
- Use TensorFlow Hub to get the high-dimensional vectors for each sentence (I used Google’s Universal Sentence Encoder in this instance)
- Use PCA to reduce the dimensionality to n_components = 2
- Fit a K-Means clustering model on the data, assuming n_clusters = 3 to begin with
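To make the last step concrete, here is a minimal, dependency-free sketch of K-Means (Lloyd's algorithm) on 2-D points. It is not the notebook code: the points are made up to stand in for PCA-reduced sentence vectors, and the helper names (`sq_dist`, `farthest_point_init`, `kmeans`) are hypothetical. The farthest-point seeding is a simple deterministic choice assumed here for illustration; in practice a library implementation such as scikit-learn's `KMeans` would be the usual choice.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate between assigning points to the
    nearest centroid and moving each centroid to the mean of its points."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


# Made-up 2-D points standing in for PCA-reduced sentence embeddings.
points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),   # toy group 1
    (5.0, 5.0), (5.2, 4.9), (4.9, 5.3),   # toy group 2
    (0.0, 5.0), (0.3, 5.1), (0.1, 4.8),   # toy group 3
]

labels, centroids = kmeans(points, k=3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Each label is the cluster index for the point (sentence) at the same position, so sentences can be grouped by label to read each cluster as a theme.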
Results
There are three broad clusters in the job description:
- iManage Values & Culture - the values and culture of iManage and how these are expressed
- The roles & responsibilities of the job - the overarching responsibilities and the contribution required from a Data Scientist
- The Technical / Data Science skills needed - the specific technical skills associated with the role
I’ve visualised this in a Plotly scatter plot below. Feel free to tinker with it.
I hope you enjoyed this super short demo. Let me know if you want to take things further. Thanks! Sam