
Objective: Find out which skills and trends are most sought after in the industry right now. To do this, we will scrape data from a job portal.

In this article, we will try to find answers to a few important questions, which every data science job seeker will have in mind.

  1. What are the top skills companies are looking for?

  2. What is the most desired experience level in the industry?

  3. What are the companies that are actively offering jobs in this field?

  4. Which locations have the most openings?

 

 

1. Web Scraping:

I gathered all the relevant job information from Naukri.com, a top job portal in India that almost every job aspirant and recruiter uses these days. I used Selenium with Python for the scraping, since the traditional BeautifulSoup approach didn't work well on this site.

 

1_3Bie163B1SA81VsJiow0dQ_edited.jpg

We will scrape these five elements for each job: Role, Company name, Experience, Location and Key Skills.
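A minimal sketch of how such a scrape might be structured with Selenium follows. It is not the original scraper: the URL and every CSS selector are left as parameters because they are assumptions that have to be read off the live page, and the `Job`, `to_job` and `scrape_page` names are hypothetical.

```python
from dataclasses import dataclass, asdict

# The five fields collected for each job listing.
FIELDS = ("role", "company", "experience", "location", "skills")


@dataclass
class Job:
    role: str = ""
    company: str = ""
    experience: str = ""
    location: str = ""
    skills: str = ""


def to_job(raw):
    """Build a Job from a dict of scraped text, stripping stray whitespace."""
    return Job(**{f: (raw.get(f) or "").strip() for f in FIELDS})


def scrape_page(url, card_selector, field_selectors):
    """Collect one Job per listing card on a page.

    card_selector and field_selectors (field name -> CSS selector) are
    supplied by the caller; they are assumptions, not the site's real markup.
    """
    # Imported here so the helpers above work without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        jobs = []
        for card in driver.find_elements(By.CSS_SELECTOR, card_selector):
            raw = {}
            for field, selector in field_selectors.items():
                # find_elements returns [] instead of raising when absent.
                found = card.find_elements(By.CSS_SELECTOR, selector)
                raw[field] = found[0].text if found else ""
            jobs.append(to_job(raw))
        return jobs
    finally:
        driver.quit()
```

On a page that loads its listings with JavaScript, an explicit wait before reading the cards would probably be needed as well.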

 

2. Preprocessing:

Let’s do some basic preprocessing before we dive in.

​

2.1. Handling missing values:

I did a basic cleanup by finding the missing values and dropping them.
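As a rough illustration of this step, here is a plain-Python sketch that drops any row with a missing or blank field. The field names and the toy rows are assumptions made for the example, not the scraped data.

```python
FIELDS = ("role", "company", "experience", "location", "skills")


def drop_incomplete(rows):
    """Keep only rows where every field is present and non-blank."""
    return [
        row for row in rows
        if all((row.get(f) or "").strip() for f in FIELDS)
    ]


# Toy rows for illustration only.
rows = [
    {"role": "Data Scientist", "company": "A", "experience": "2-5 Yrs",
     "location": "Pune", "skills": "python, sql"},
    {"role": "Data Scientist", "company": "B", "experience": "",
     "location": "Mumbai", "skills": "r"},
    {"role": "Lead Data Scientist", "company": "C", "experience": "5-10 Yrs",
     "location": None, "skills": "spark"},
]

clean = drop_incomplete(rows)
print(len(rows) - len(clean), "rows dropped")
```

With pandas, the equivalent is typically `df.dropna()`, which treats `None`/NaN as missing but not empty strings.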

​

2.2. Handling duplicate data:

We need to be really careful when handling duplicate data. A company might post the same requirement multiple times because the job is still open, or it might be advertising a completely new opening with the same requirement. To keep it simple, I've not dropped any data.
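Even when nothing is dropped, it can help to see how many listings repeat. The sketch below only counts exact repeats across the five fields; the field names and toy rows are assumptions for illustration.

```python
from collections import Counter

KEY_FIELDS = ("role", "company", "experience", "location", "skills")


def repeated_listings(rows):
    """Map each listing that occurs more than once to its number of occurrences."""
    counts = Counter(tuple(row[f] for f in KEY_FIELDS) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Toy rows for illustration only, not real scraped data.
posting = {"role": "data scientist", "company": "a", "experience": "2-5 yrs",
           "location": "pune", "skills": "python, sql"}
other = dict(posting, company="b")
rows = [posting, dict(posting), other]

for key, n in repeated_listings(rows).items():
    print(n, "x", key[0], "at", key[1])
```

Reporting the repeats rather than deleting them keeps the decision about reposts versus genuinely new openings with the reader.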

​

2.3. Tokenizing locations and skills columns

I converted all strings to lower case to avoid redundancy (the same value written in different cases), and tokenized the locations and skills columns, since these columns can hold more than one value.

This is how it looked after the preprocessing.

 

1_Dd18nwSHr5W3PdvUaj9OJA.png
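The lower-casing and tokenizing described in 2.3 can be sketched as below. It assumes the values in a cell are comma-separated, which may not match the portal's actual format; `tokenize` is a hypothetical helper name.

```python
def tokenize(cell, sep=","):
    """Lower-case a delimited cell and split it into trimmed, non-empty tokens."""
    return [t.strip() for t in cell.lower().split(sep) if t.strip()]


print(tokenize("Python, Machine Learning ,SQL,"))
print(tokenize("Bengaluru, Mumbai"))
```

Lower-casing before splitting means "Python" and "python" end up as the same token when the skills are counted later.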

3. Analysis:

Now we have everything we need to get started.

 

1_ZSCaHGjkW_VX91wClJfNwQ.png

3.1. Which location offers more openings? :

  1. In the plot above, almost 38% of the jobs are located in Bengaluru.

  2. The top 4 cities, namely Bengaluru, Mumbai, Hyderabad and Pune, account for almost 72% of total data science jobs in the country.

  3. So if you are from any of these cities, your chances of getting a data scientist job are probably higher than in other cities.
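Percentages like these can be derived from the tokenized locations column. The sketch below is one way to do it, on toy rows rather than the scraped data. It counts a listing once for every city it names, which is an assumption about how the plot was built.

```python
from collections import Counter


def location_shares(rows):
    """Percentage of location mentions per city, largest first."""
    counts = Counter(city for row in rows for city in row["location"])
    total = sum(counts.values())
    if total == 0:
        return []
    return [(city, 100 * n / total) for city, n in counts.most_common()]


def top_k_share(shares, k):
    """Combined percentage held by the k largest cities."""
    return sum(pct for _, pct in shares[:k])


# Toy rows for illustration only.
rows = [
    {"location": ["bengaluru"]},
    {"location": ["bengaluru", "pune"]},
    {"location": ["mumbai"]},
]

shares = location_shares(rows)
print(shares[0])
print(top_k_share(shares, 2))
```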

​

3.2. What Companies are actively recruiting? :

  1. Analytics Vidhya educon topped the list with almost 21% of total job listings.

  2. There are many consultancies on the list too. These consultancies usually conduct recruitment for their clients.

  3. In general, competition on job portals is very high. Most of the time your profile might not even be viewed by the recruiter because of the huge number of applications received. There are instances where, even for a single vacancy, you have to compete with hundreds of other applicants. It is better to know which companies are recruiting actively so that you can apply directly through their official websites, which increases the probability of landing an interview.

​

 

1_2pQ93FEMocz5ifMLmCHSyw.png

3.3. What is the most desired Experience? :

  1. We can observe that companies are clearly looking for experienced candidates. There seem to be more vacancies for candidates with 5–10 years of experience. This makes sense, since a data scientist's job involves key decision-making skills that come with experience.
    Candidates with at least 2 years of experience have fairly good opportunities.

  2. This doesn't mean that freshers can't get in; it's just that there are more openings for experienced candidates than for freshers. Companies usually don't recruit freshers from these job portals; they recruit them directly through campus recruitment. Freshers can always opt to work for startups to gain the necessary experience.
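To compare experience levels, the raw experience text first has to be turned into numbers. The sketch below assumes strings shaped like "5-10 Yrs" and uses bucket boundaries chosen only for illustration; neither is confirmed by the original data.

```python
import re
from collections import Counter

# Matches ranges such as "5-10" or "5 – 10" (hyphen or en dash).
RANGE = re.compile(r"(\d+)\s*[-–]\s*(\d+)")


def parse_experience(text):
    """Return (min_years, max_years) from a string like '5-10 Yrs', else None."""
    match = RANGE.search(text)
    if not match:
        return None
    return int(match.group(1)), int(match.group(2))


def bucket(min_years):
    """Assign a minimum-experience value to a coarse, illustrative bucket."""
    if min_years < 2:
        return "0-2"
    if min_years < 5:
        return "2-5"
    if min_years < 10:
        return "5-10"
    return "10+"


# Toy values for illustration only.
raw = ["5-10 Yrs", "0-1 Yrs", "6-9 Yrs", "Not disclosed"]
parsed = [parse_experience(text) for text in raw]
counts = Counter(bucket(p[0]) for p in parsed if p is not None)
print(counts.most_common())
```

Bucketing on the minimum of the range is a design choice; using the midpoint instead would shift some listings into a higher bucket.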

​

 

1_krRQ8kfU-gN-jI7lyMnX8g.png

3.4. What are the Roles in demand:

This is an important step because, after a few results, job portals usually start showing other jobs that are irrelevant to the search. Just to be sure that we are looking at the right roles, let's check the 10 most frequently mentioned roles.

  1. In the previous section we saw more vacancies for people with more experience, which raises the question of how openings are distributed across roles.

  2. Most of the vacancies are still titled Data Scientist, followed by Senior Data Scientist and Lead Data Scientist, which of course need good previous experience.

 

1_Oa0UExBm5JQKmXopYM0pOw.png

3.5. Skills that companies are looking for:

This looks very complex, right? Don't worry, I will break it down in the following sections. I have included many skills in the plot because data science spans so many areas.
Though the plot above shows some of the top skills, on its own it still doesn't serve the purpose of this analysis.

 

1_rVKLkwXQMh3LJBsx9ZrIAw.png

3.5.1. Must-Have Skills?:

  1. Machine learning: no surprise that this is the most important skill for a data scientist.

  2. Data mining and data analysis are key activities that every data scientist has to go through.

  3. Strong statistical modeling is required to be a better data scientist.

  4. Companies expect a good knowledge of deep learning, since it provides state-of-the-art techniques for solving some interesting real-world problems in fields like NLP and Computer Vision.

  5. Employers expect candidates to know big data technologies because of the huge rise in the amount of data recorded every day. In practice, we might work on huge datasets where these skills will definitely come in handy.

1_Iu3uYIsrnPUYPbKzwxF6dw.png

3.5.2. Programming Language in demand? :

  1. If you are starting out in data science, you will definitely find it hard to choose the right programming language at first. Though there are many languages, the competition has always been between Python and R. Let's see what the data tells us.

  2. The industry still favors Python, thanks to its rich libraries, followed by R.

  3. SQL is a must for every data scientist. Though it isn't usually treated as a programming language, I have taken my chances and included it here :).

  4. After Python and R, there seems to be good demand for SAS and C++.
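A comparison like this one boils down to counting how many listings mention each skill from a chosen group. The sketch below does that on toy rows, assuming the skills column has already been tokenized into lower-case lists; `count_mentions` is a hypothetical helper name.

```python
from collections import Counter


def count_mentions(rows, candidates):
    """Count listings whose tokenized skills include each candidate skill."""
    wanted = set(candidates)
    counts = Counter()
    for row in rows:
        # set() so a skill repeated within one listing is counted once.
        counts.update(wanted & set(row["skills"]))
    return counts.most_common()


# Toy rows for illustration only.
rows = [
    {"skills": ["python", "sql", "python"]},
    {"skills": ["r", "python"]},
    {"skills": ["tableau"]},
]

languages = ["python", "r", "sql", "sas", "c++"]
result = count_mentions(rows, languages)
print(result)
```

Swapping in a different candidate list (frameworks, cloud providers, visualization tools) gives the same kind of count for the other comparisons. Matching is on exact tokens, so "python" would not match a token such as "python programming".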

1_qVa5-LpGdKfR9OYOGUuiXg.png

3.5.3. Deep learning Framework to opt for? :

  1. With the sudden rise of deep learning, many deep learning frameworks came onto the market from giants like Google and Facebook.

  2. The industry favors TensorFlow over PyTorch.

  3. Keras has a good share of the market; people love it because it is simple and easy to use.

  4. Though there are many other frameworks like Caffe and MXNet, there don't seem to be many openings for them. If not in the world, at

1_omyHXVY7RS-JvadfXPs9yw.png

3.5.4. Which big data technology has the edge?

  1. Spark tops the list. One can go for PySpark, the Python version of Spark.

  2. Hadoop offers almost the same number of opportunities as Spark, with only a minor difference.

  3. There are a considerable number of openings for Hive too.

1_izBjsbO8Q2B5Gb-75RIPcA.png

3.5.5. Which Cloud provider is in demand for ML?

  1. Training models involves huge computations, which can easily get very expensive. Companies are searching for cheaper ways to get the work done, and that is where cloud platforms come into the picture.

  2. AWS tops the list, followed by Azure.

  3. Companies are moving quickly towards cloud options. These technologies are likely to play a major role in data science in the coming days.

1_xunequz0bKMzys0rywLesw.png

3.5.6. Data Visualization Tool in demand?

  1. Employers are showing more interest in Tableau for data visualization.

  2. Microsoft's Power BI is still lagging behind.

1_aTkuPPc3om37zZ1Zhrp4Xw.png

Final Words:

If you are good at all the must-have skills mentioned for a data scientist, the best approach is to start attending interviews and, in the meantime, fill the gaps in your understanding and learn the tools and technologies you feel will give you an edge over other candidates.
You can find the complete code on my GitHub. You can connect with me on LinkedIn.
If you find this helpful or have any questions, do let me know.
See you later. Happy Coding..!
