Data Science : Data Science Roadmap

Data science is a vital field, and mastering it requires knowledge from several disciplines. Here I discuss a complete roadmap for the data science field.

I have broken the process down into 9 easy steps and given you an idea of how you can learn each one.

image source: https://www.edureka.co/blog/how-to-become-a-data-scientist/

Disclaimer:

This roadmap is based on my own limited experience in data science. It is not the be-all and end-all learning plan, and it may need to change to better suit a specific domain or field of study. It was also created with Python in mind, as I personally prefer to use Python.

The steps are as follows:

1 – Good knowledge of statistics and mathematics:

Statistical methods are a central part of data science. Many data science interviews focus heavily on descriptive and inferential statistics.

Try to master these topics: descriptive statistics, inferential statistics, linear algebra, and calculus.
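To make the difference between descriptive and inferential statistics concrete, here is a minimal sketch using only Python's standard library. The sample values are made up for illustration, and the confidence interval uses a normal approximation (for a sample this small, a t-distribution would usually be the better choice).

```python
import math
import statistics

# Hypothetical sample: page-load times in seconds (made-up numbers).
sample = [2.1, 2.4, 1.9, 2.8, 2.2, 2.5, 2.0, 2.3]

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(sample)
median = statistics.median(sample)
stdev = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)

# Inferential statistics: estimate a range for the unknown population mean.
# Assumption: normal approximation, so the critical value is about 1.96.
z = statistics.NormalDist().inv_cdf(0.975)
margin = z * stdev / math.sqrt(len(sample))
ci = (mean - margin, mean + margin)

print(f"mean={mean:.3f} median={median:.3f} stdev={stdev:.3f}")
print(f"approx. 95% CI for the mean: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The first three numbers describe the data you have; the interval is a statement about data you have not seen, which is the core idea of inference.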

Resources:

[Book] Practical Statistics for Data Scientists (highly recommended): a thorough guide to the important statistical methods, with clean and concise applications and examples.

[Book] Naked Statistics — a non-technical but detailed guide to understanding the impact of statistics on our routine events, sports, recommendation systems, and many more instances.

Statistical thinking in Python — a foundation course to help you start thinking statistically. There is a second part to this course as well.

2 – Learning coding (R | Python | Julia | Scala):

Make sure you have sound programming skills. You should be an expert in at least one of the programming languages.

Specific topics include:

Common data structures (data types, lists, dictionaries, sets, and tuples), writing functions, logic, control flow, searching and sorting algorithms, object-oriented programming, and working with external libraries.
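As a rough illustration of several of these topics together (built-in data structures, writing a function, control flow, and a searching algorithm), here is a small sketch in Python. The scores are hypothetical, and `binary_search` is written by hand purely for practice.

```python
from collections import Counter


def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return -1


# Hypothetical exam scores, used only to show each data structure.
scores = [88, 72, 95, 72, 60]    # list: ordered, allows duplicates
unique_scores = set(scores)      # set: duplicates removed, no order
score_counts = Counter(scores)   # dict subclass: value -> number of occurrences
best = (max(scores), "top")      # tuple: fixed-size, immutable record

ordered = sorted(unique_scores)  # binary search requires sorted input
position = binary_search(ordered, 88)
print(ordered, position, score_counts[72], best)
```

Writing small utilities like this by hand is a good way to practice control flow before relying on library functions.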

Resources for Python:

learnpython.org [free] — a free resource for beginners. It covers all the basic programming topics from scratch. You get an interactive shell to practice those topics side-by-side.

Kaggle [free]: a free, interactive guide to learning Python. It is a short tutorial covering the topics most important for data science.

Python Course by freeCodeCamp on YouTube [free]: a 5-hour course that you can follow to practice the basic concepts.

Intermediate Python [free]: another free course, by Patrick, featured on freecodecamp.org.

3 – Knowledge of databases:

Make sure you have a good knowledge of SQL and of relational (structured) databases such as MySQL. Try to get a basic idea of non-relational databases such as MongoDB as well.

SQL scripting: Querying databases using joins, aggregations, and subqueries.
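The three query patterns named above can be tried without installing a database server by using Python's built-in `sqlite3` module. This is only a sketch: the `customers` and `orders` tables and their rows are invented for the example.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 30.0), (3, 2, 20.0);
""")

# Join + aggregation: total spend per customer.
# LEFT JOIN keeps customers who have no orders; COALESCE turns their NULL sum into 0.
totals = conn.execute("""
SELECT c.name, COALESCE(SUM(o.amount), 0) AS total
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY total DESC, c.name
""").fetchall()

# Subquery: customers whose total spend exceeds the average order amount.
big_spenders = conn.execute("""
SELECT c.name
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
HAVING SUM(o.amount) > (SELECT AVG(amount) FROM orders)
ORDER BY c.name
""").fetchall()

conn.close()
print(totals)
print(big_spenders)
```

The same SQL should carry over to MySQL and most other relational databases with little or no change.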

Resources:

Intro to SQL and Advanced SQL on Kaggle.

Datacamp also offers many courses on SQL.

4 – Master data wrangling, visualization, EDA, reporting, and storytelling:

The core principle of data science is finding appropriate data that can help you solve the problem. You can collect data from different sources, such as scraping (if the website allows it), APIs, databases, and publicly available repositories.

Once you have the data in hand, you can start working with it. In the real world, data is rarely clean and well formatted; it usually contains noise.

In Python, two libraries, pandas and NumPy, are at your disposal to go from dirty data to ready-to-analyze data.
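As an illustration of going from dirty to ready-to-analyze data, here is a minimal sketch that assumes pandas and NumPy are installed. The column names and values are hypothetical, and the cleaning choices (dropping rows with no city, filling missing temperatures with the median) are just one reasonable option.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with typical problems: stray whitespace,
# inconsistent casing, a non-numeric placeholder, a duplicate, a missing key.
raw = pd.DataFrame({
    "city": [" Pune", "pune", "Delhi ", "Delhi", None],
    "temp_c": ["31.5", "n/a", "28", "28", "30"],
})

clean = raw.copy()
clean["city"] = clean["city"].str.strip().str.title()              # normalize text
clean["temp_c"] = pd.to_numeric(clean["temp_c"], errors="coerce")  # "n/a" becomes NaN
clean = clean.dropna(subset=["city"])                              # drop rows with no city
clean = clean.drop_duplicates()                                    # remove exact duplicate rows
clean["temp_c"] = clean["temp_c"].fillna(clean["temp_c"].median()) # impute missing values
clean = clean.reset_index(drop=True)

# NumPy works directly on the cleaned numeric column.
mean_temp = float(np.mean(clean["temp_c"].to_numpy()))
print(clean)
print(mean_temp)
```

Which rows to drop and which values to impute depends on the problem, so treat each step here as a decision to justify, not a fixed recipe.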

The next stratum to master is data analysis and storytelling. Drawing insights from the data and then communicating the same to the management in simple terms and visualizations is the core responsibility of a Data Analyst.

The storytelling part requires you to be proficient with data visualization along with excellent communication skills.

Specific topics:

Exploratory data analysis — defining questions, handling missing values, outliers, formatting, filtering, univariate and multivariate analysis.

Data visualization — plotting data using libraries like matplotlib, seaborn, and plotly. Knowledge to choose the right chart to communicate the findings from the data.

Resources:

Data Manipulation using pandas [fee]: an interactive course from DataCamp that can quickly get you started with manipulating data using pandas. Learn transformations, aggregations, subsetting, and indexing of dataframes.

Kaggle pandas tutorial [free] — A short and concise hands-on tutorial that will walk you through commonly used data manipulation skills.

Data cleaning course by Kaggle.

Coursera course on Introduction to Data Science in Python [fee] — this is the first course in the Applied Data Science with Python Specialization.

Data Analysis with Python — by IBM on Coursera. The course covers wrangling, exploratory analysis, and simple model development using python.

Data Visualization — by Kaggle. Another interactive course that lets you practice all the commonly used plots.

Data Visualization in Spreadsheets, Excel, Tableau, and Power BI: pick any one.

5 – Machine Learning and AI:

After grilling yourself through all the major aforementioned concepts, you are now ready to get started with the fancy ML algorithms.

There are three major types of learning:

Supervised Learning — Includes regression and classification problems. Study simple linear regression, multiple regression, polynomial regression, naive Bayes, logistic regression, KNNs, tree models, ensemble models. Learn about evaluation metrics.

Unsupervised Learning — Clustering and dimensionality reduction are the two widely used applications of unsupervised learning. Dive deep into PCA, K-means clustering, hierarchical clustering, and Gaussian mixtures.

Reinforcement learning: trains an agent to make decisions by rewarding or penalizing its actions. Learn how to optimize rewards, use the TF-Agents library, build Deep Q-networks, and so on.
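To show what a supervised learning workflow looks like at its smallest (fit on training data, predict on held-out data, score with an evaluation metric), here is a sketch of simple linear regression written with the standard library only. The numbers are invented, and in practice you would more likely reach for a library such as scikit-learn than hand-written functions like `fit_line`.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(ys, preds):
    """Average squared difference between true and predicted values."""
    return statistics.fmean([(y - p) ** 2 for y, p in zip(ys, preds)])


# Hypothetical data: hours studied (feature) and exam score (label).
train_x, train_y = [1, 2, 3, 4, 5], [52, 55, 61, 64, 68]
test_x, test_y = [6, 7], [72, 75]  # held out, never used for fitting

slope, intercept = fit_line(train_x, train_y)
preds = [slope * x + intercept for x in test_x]
mse = mean_squared_error(test_y, preds)
print(f"slope={slope:.2f} intercept={intercept:.2f} test MSE={mse:.3f}")
```

Evaluating on data the model never saw during fitting is the habit to build early; the same pattern applies to every algorithm in the list above.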

Resources:

[book] Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition — one of my all-time favorite books on machine learning. Doesn’t only cover the theoretical mathematical derivations but also showcases the implementation of algorithms through examples. You should solve the exercises given at the end of each chapter.

Machine Learning Course by Andrew Ng — the go-to course for anyone trying to learn machine learning. Hands down!

Introduction to Machine Learning — Interactive course by Kaggle.

Intro to Game AI and Reinforcement learning — another interactive course on Kaggle on reinforcement learning.

6 – Level up with big data and deep learning:

Now it’s time to level up and boost your knowledge. Learn deep learning; there are various frameworks for it, such as TensorFlow and PyTorch.

Learn Scala, Hadoop, and Hive. These become important when you work with huge amounts of data, because at that scale you have to work on a distributed system.

Resources:

[book] Fundamentals of Deep Learning: Designing Next-Generation Machine Intelligence Algorithms

[video] Krish Naik – you can follow his video tutorials.

7 – Gain experience, keep practicing, and follow data science platforms:

Participate in hackathons on Kaggle, Analytics Vidhya, and other platforms.

Work on a variety of projects, and try to build at least one end-to-end project.

Try to connect with data scientists on LinkedIn, Twitter, and other platforms, and read blogs and research papers.

8 – Try to get an internship and job:

Look for good opportunities such as internships; search on LinkedIn, Internshala, and other platforms.

Make an attractive resume and portfolio.

Prepare hard for interviews.

9 – Follow and engage with the data science community:

Nowadays, data science communities are emerging and spreading everywhere. Get in touch with these communities and follow them; engaging with them will help you become a great data scientist.

All the best for your future!

HAPPY LEARNING :-)

Monday, January 18, 2021
