Data science is a multidisciplinary field, and mastering it requires
knowledge from several areas. In this post I lay out a complete
roadmap for the data science field.
I have broken the process down into 9 steps, with pointers on how you can learn each one.
1 – Good knowledge of statistics and mathematics:
Statistical methods are central to data science, and data science interviews lean heavily on descriptive and inferential statistics.
Try to master these topics: descriptive statistics, inferential statistics, linear algebra, and calculus.
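To make the difference between descriptive and inferential statistics concrete, here is a minimal sketch using only Python's standard library. The sample values are made up for illustration, and the confidence interval assumes a normal approximation is acceptable.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample: daily order counts observed over ten days.
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 18]

# Descriptive statistics summarize the sample itself.
mean = statistics.mean(sample)
median = statistics.median(sample)
stdev = statistics.stdev(sample)  # sample standard deviation (n - 1)

# Inferential statistics reason about the population behind the sample.
# Assumption: a normal approximation is good enough here; with only ten
# points a t-interval would be the more careful choice.
z = NormalDist().inv_cdf(0.975)  # critical value for a 95% interval
margin = z * stdev / math.sqrt(len(sample))
ci_low, ci_high = mean - margin, mean + margin

print(f"mean={mean:.2f} median={median} stdev={stdev:.2f}")
print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```

The first half describes the data you have; the second half makes a hedged statement about data you have not seen, which is the step interviews tend to probe.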
Resources:
[Book] Practical Statistics for Data Scientists (highly recommended): a thorough guide to the important statistical methods, with clean and concise applications and examples.
[Book] Naked Statistics: a non-technical but detailed guide to how statistics shapes everyday events, sports, recommendation systems, and much more.
Statistical Thinking in Python: a foundation course to help you start thinking statistically. There is a second part to this course as well.
2 – Learning to code (R | Python | Julia | Scala):
Make sure you have sound programming skills. You should be proficient in at least one of these programming languages.
Specific topics include:
Data types and common data structures (lists, dictionaries, sets, and tuples), writing functions, logic, control flow, searching and sorting algorithms, object-oriented programming, and working with external libraries.
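As a rough illustration of several of these topics at once, the sketch below writes a function, uses control flow to implement binary search, and touches lists, sets, and dictionary-like counters. The records are hypothetical.

```python
from collections import Counter


def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return -1


# Hypothetical records: (name, favourite language) tuples in a list.
records = [("asha", "python"), ("ben", "r"), ("chen", "python"), ("dee", "scala")]

languages = {lang for _, lang in records}            # set: unique values
by_language = Counter(lang for _, lang in records)   # dict-like: counts per key
names = sorted(name for name, _ in records)          # list: sorted for searching
position = binary_search(names, "chen")

print(languages, dict(by_language), position)
```

Binary search only works on sorted input, which is why the names are sorted before the call.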
Resources for Python:
learnpython.org [free]: a resource for beginners that covers the basic programming topics from scratch. You get an interactive shell to practice each topic as you go.
Kaggle [free]: a short, interactive Python tutorial covering the topics that matter most for data science.
Python course by freeCodeCamp on YouTube [free]: a 5-hour course that you can follow to practice the basic concepts.
Intermediate Python [free]: another free course, by Patrick, featured on freecodecamp.org.
3 – Knowledge of databases:
Make sure you have good knowledge of SQL and of relational databases such as MySQL. Try to get a basic idea of non-relational (NoSQL) databases such as MongoDB as well.
SQL scripting: querying databases using joins, aggregations, and subqueries.
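Joins, aggregations, and subqueries can be practiced without installing a database server. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the tables and rows are hypothetical, and other database engines may differ in small syntax details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
INSERT INTO orders VALUES (1, 1, 40.0), (2, 1, 60.0), (3, 2, 30.0);
""")

# Join + aggregation: total spend per customer. The LEFT JOIN keeps
# customers who have no orders, and COALESCE turns their NULL sum into 0.
totals = conn.execute("""
SELECT c.name, COALESCE(SUM(o.amount), 0) AS total
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY total DESC
""").fetchall()

# Subquery: orders whose amount is above the average order amount.
above_average = conn.execute("""
SELECT id, amount
FROM orders
WHERE amount > (SELECT AVG(amount) FROM orders)
ORDER BY id
""").fetchall()

conn.close()
print(totals)
print(above_average)
```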
Resources:
Intro to SQL and Advanced SQL on Kaggle.
DataCamp also offers many courses on SQL.
4 – Master data wrangling, visualization, EDA, reporting, and storytelling:
The principle of data science is centered on finding appropriate
data that can help you solve the problem. You can collect data from
different sources, such as web scraping (if the website allows it), APIs,
databases, and publicly available repositories.
Once you have the data in hand, you can start working with it.
Real-world data is rarely clean or consistently formatted, so expect noise.
In Python, two libraries, pandas and NumPy, are at your disposal
to go from dirty data to ready-to-analyze data.
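A minimal sketch of that cleanup, assuming pandas and NumPy are installed. The column names and values are invented to show typical problems: stray whitespace, inconsistent case, a duplicate row, numbers stored as text, and missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with the usual real-world problems.
raw = pd.DataFrame({
    "city": [" Pune", "pune ", "Delhi", "Delhi", None],
    "price": ["100", "100", "250", "n/a", "80"],
})

clean = raw.copy()
clean["city"] = clean["city"].str.strip().str.title()            # normalize text
clean["price"] = pd.to_numeric(clean["price"], errors="coerce")  # "n/a" becomes NaN
clean = clean.dropna(subset=["city"])                            # drop rows with no city
clean = clean.drop_duplicates()                                  # remove repeated rows
clean["price"] = clean["price"].fillna(clean["price"].median())  # impute missing prices
clean["log_price"] = np.log1p(clean["price"])                    # NumPy transform
clean = clean.reset_index(drop=True)

print(clean)
```

Filling missing prices with the median is only one possible choice; whether to impute, drop, or flag missing values depends on the problem.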
The next layer to master is data analysis and
storytelling. Drawing insights from the data and communicating them to
management in simple terms and clear visualizations is the core responsibility of
a data analyst.
The storytelling part requires proficiency in
data visualization along with excellent communication skills.
Specific topics:
Exploratory data analysis: defining questions, handling missing values and outliers, formatting, filtering, univariate and multivariate analysis.
Data visualization: plotting data using libraries like matplotlib, seaborn, and plotly, and knowing how to choose the right chart to communicate the findings from the data.
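The exploratory steps above can be sketched in a few lines of pandas. The data is hypothetical, and the 1.5 × IQR rule used to flag outliers is a common convention rather than the only option.

```python
import pandas as pd

# Hypothetical data: one age value is clearly suspicious.
df = pd.DataFrame({
    "age": [23, 25, 31, 35, 29, 41, 38, 27, 95],
    "income": [30, 32, 45, 50, 40, 62, 58, 35, 60],
})

# Univariate analysis: summary statistics for a single column.
summary = df["age"].describe()

# Outlier detection with the 1.5 * IQR rule.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

# Multivariate analysis: correlation between two columns, outliers excluded.
corr = df.loc[~is_outlier, ["age", "income"]].corr().loc["age", "income"]

print(summary)
print(outliers)
print(f"age/income correlation without outliers: {corr:.3f}")
```

Plotting the same columns with matplotlib or seaborn is the natural next step once the numbers look sensible.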
Resources:
Data Manipulation using pandas [fee]: an interactive course from DataCamp that can quickly get you started with manipulating data using pandas. Learn transformations, aggregations, subsetting, and indexing of dataframes.
Kaggle pandas tutorial [free]: a short, hands-on tutorial that walks you through commonly used data manipulation skills.
Data cleaning course by Kaggle.
Coursera course on Introduction to Data Science in Python [fee]: the first course in the Applied Data Science with Python Specialization.
Data Analysis with Python, by IBM on Coursera: the course covers wrangling, exploratory analysis, and simple model development using Python.
Data Visualization, by Kaggle: another interactive course that lets you practice all the commonly used plots.
Data visualization in spreadsheets, Excel, Tableau, or Power BI: pick any one.
5 – Machine Learning and AI:
After working through the concepts above, you are ready to
get started with machine learning algorithms.
There are three major types of learning:
Supervised learning: includes regression and classification problems. Study simple linear regression, multiple regression, polynomial regression, naive Bayes, logistic regression, KNN, tree models, and ensemble models. Learn about evaluation metrics.
Unsupervised learning: clustering and dimensionality reduction are its two most widely used applications. Dive deep into PCA, K-means clustering, hierarchical clustering, and Gaussian mixtures.
Reinforcement learning: trains an agent to choose actions that maximize a reward. Learn to optimize rewards, use the TF-Agents library, build Deep Q-networks, and so on.
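To show what the simplest supervised workflow looks like end to end, here is a minimal sketch of simple linear regression with a held-out test set and two evaluation metrics. It uses only NumPy; the data is synthetic, and in practice you would more likely reach for a library such as scikit-learn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): y = 3x + 2 plus noise.
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, size=200)

# Hold out the last 25% of the points to evaluate on unseen data.
x_train, x_test = x[:150], x[150:]
y_train, y_test = y[:150], y[150:]

# Fit slope and intercept by ordinary least squares.
A = np.column_stack([x_train, np.ones_like(x_train)])
(slope, intercept), *_ = np.linalg.lstsq(A, y_train, rcond=None)

# Evaluation metrics computed on the test set only.
pred = slope * x_test + intercept
rmse = float(np.sqrt(np.mean((y_test - pred) ** 2)))
r2 = float(1 - np.sum((y_test - pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2))

print(f"slope={slope:.2f} intercept={intercept:.2f} rmse={rmse:.2f} r2={r2:.3f}")
```

Evaluating on data the model never saw during fitting is the habit that carries over to every other supervised algorithm in the list.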
Resources:
[Book] Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition: one of my all-time favorite books on machine learning. It covers not only the theoretical mathematical derivations but also the implementation of algorithms through examples. You should solve the exercises given at the end of each chapter.
Machine Learning course by Andrew Ng: the go-to course for anyone trying to learn machine learning. Hands down!
Introduction to Machine Learning: an interactive course by Kaggle.
Intro to Game AI and Reinforcement Learning: another interactive Kaggle course, this one on reinforcement learning.
6 – Level up with big data and deep learning:
Now it’s time to level up. Learn deep learning using one of
the available frameworks, such as TensorFlow or PyTorch.
Learn Scala, Hadoop, and Hive as well. They matter when the data is too large for a single machine and you have to work on a distributed system.
Resources:
[Book] Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition: the same book recommended in step 5. It also covers deep learning with Keras and TensorFlow.
[Book] Fundamentals of Deep Learning: Designing Next-Generation Machine Intelligence Algorithms
[Video] Krish Naik: you can follow his video tutorials.
7 – Get experience, practice regularly, and follow data science platforms:
Participate in hackathons on Kaggle, Analytics Vidhya, and other platforms.
Work on a variety of projects, and try to build at least one end-to-end project.
Connect with data scientists on LinkedIn, Twitter, and other platforms, and read blogs and research papers.
8 – Try to get an internship and a job:
Look for good opportunities, such as internships, on LinkedIn, Internshala, and other platforms.
Build an attractive resume and portfolio.
Prepare hard for interviews.
9 – Follow and engage with the data science community:
Data science communities are growing everywhere. Get in touch with the
community and stay engaged with it as you work toward becoming a great data scientist.
All the best for your future
HAPPY LEARNING :-)