Monday, January 18, 2021

Data Science Roadmap

 


Data science is a vital field, and mastering it requires knowledge from several disciplines. Here I am going to discuss the complete roadmap of the data science field.

I have broken this process down into 9 steps and, for each one, given you an idea of how you can learn it.



image source: https://www.edureka.co/blog/how-to-become-a-data-scientist/

Disclaimer
This roadmap is based on my limited experience in data science. It is not the be-all and end-all learning plan, and it may need to change to better suit a specific domain or field of study. It was also created with Python in mind, as I personally prefer to use Python.

The steps are as follows:

1 – Good knowledge of statistics and mathematics:

Statistical methods are a central part of data science. Many data science interviews focus heavily on descriptive and inferential statistics.

Try to master these topics: descriptive statistics, inferential statistics, linear algebra, and calculus.
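To make the difference between descriptive and inferential statistics concrete, here is a minimal sketch using only Python's standard `statistics` module. The `visits` numbers are made up for illustration, and the confidence interval uses a normal approximation, which is an assumption made for simplicity (a t-interval would suit a sample this small better).

```python
import statistics
from statistics import NormalDist

# Hypothetical sample: daily website visits (made-up numbers)
visits = [120, 135, 128, 150, 142, 138, 131, 160, 125, 145]

# Descriptive statistics: summarize the sample itself
mean = statistics.mean(visits)
median = statistics.median(visits)
stdev = statistics.stdev(visits)  # sample standard deviation (divides by n - 1)

# Inferential statistics: approximate 95% confidence interval for the
# population mean, assuming a normal approximation is acceptable
z = NormalDist().inv_cdf(0.975)
margin = z * stdev / len(visits) ** 0.5
ci = (mean - margin, mean + margin)

print(f"mean={mean:.1f} median={median:.1f} stdev={stdev:.2f}")
print(f"approx. 95% CI for the mean: ({ci[0]:.1f}, {ci[1]:.1f})")
```

The first three values describe the data you have; the interval is a statement about the wider population the sample came from.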

Resources:

[Book] Practical Statistics for Data Scientists (highly recommended) — A thorough guide on all the important statistical methods along with clean and concise applications/examples.

[Book] Naked Statistics — a non-technical but detailed guide to understanding the impact of statistics on our routine events, sports, recommendation systems, and many more instances.

Statistical thinking in Python — a foundation course to help you start thinking statistically. There is a second part to this course as well.

2 – Learning coding (R | Python | Julia | Scala):

Make sure you have sound programming skills. You should be proficient in at least one of these programming languages.

Specific topics include:

Common data structures (data types, lists, dictionaries, sets, and tuples), writing functions, logic, control flow, searching and sorting algorithms, object-oriented programming, and working with external libraries.
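As a quick illustration of several of these topics together, the sketch below uses a list, a set, a tuple, and a dictionary, and defines a function that implements binary search. All names and values here (`scores`, `student`, `binary_search`) are hypothetical examples, not part of any course listed below.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return -1


scores = [88, 72, 95, 72, 60]    # list: ordered and mutable
unique_scores = set(scores)      # set: duplicates removed
ranked = sorted(unique_scores)   # sorted() returns a new ascending list
student = ("Asha", 95)           # tuple: fixed-size record

counts = {}                      # dict: maps each score to how often it occurs
for s in scores:
    counts[s] = counts.get(s, 0) + 1

print(ranked)
print(binary_search(ranked, 88))
print(counts)
```

Binary search only works on sorted input, which is why the list is sorted before it is searched.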

Resources for Python:

learnpython.org [free] — a free resource for beginners. It covers all the basic programming topics from scratch. You get an interactive shell to practice those topics side-by-side.

Kaggle [free] — a free and interactive guide to learning Python. It is a short tutorial covering all the important topics for data science.

Python Course by freeCodeCamp on YouTube [free] — This is a 5-hour course that you can follow to practice the basic concepts.

Intermediate Python [free] — another free course by Patrick featured on freeCodeCamp.org.

3 – Knowledge of databases:

Make sure you have a good knowledge of SQL and of relational databases such as MySQL. Also try to get a basic idea of non-relational (NoSQL) databases such as MongoDB.

SQL scripting: Querying databases using joins, aggregations, and subqueries.
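Since this roadmap is Python-based, one way to practice these query types without installing a database server is Python's built-in `sqlite3` module. The sketch below is only an illustration: the `customers` and `orders` tables and their rows are made up, and the SQL is written for SQLite, so other databases may need small syntax changes.

```python
import sqlite3

# In-memory database with two small, made-up tables
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 30.0), (3, 2, 20.0);
""")

# Join + aggregation: total spend per customer.
# LEFT JOIN keeps customers who have no orders; COALESCE turns their NULL sum into 0.
totals = conn.execute("""
    SELECT c.name, COALESCE(SUM(o.amount), 0.0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Subquery: orders whose amount is above the average order amount
big_orders = conn.execute("""
    SELECT id, amount
    FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY id
""").fetchall()

print(totals)
print(big_orders)
conn.close()
```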

Resources:

Intro to SQL and Advanced SQL on Kaggle.

Datacamp also offers many courses on SQL.

4 – Master data wrangling, visualization, EDA, reporting, and storytelling:

The core principle of data science is finding appropriate data that can help you solve the problem. You can collect data from different sources, such as web scraping (if the website allows it), APIs, databases, and publicly available repositories.

Once you have the data in hand, you can start working with it. Real-world data is rarely clean and well formatted; it usually contains noise.

In Python, two libraries, pandas and NumPy, are at your disposal to go from dirty data to ready-to-analyze data.

The next layer to master is data analysis and storytelling. Drawing insights from the data and then communicating them to management in simple terms and visualizations is the core responsibility of a data analyst.

The storytelling part requires you to be proficient with data visualization along with excellent communication skills.

Specific topics:

Exploratory data analysis — defining questions, handling missing values, outliers, formatting, filtering, univariate and multivariate analysis.

Data visualization — plotting data using libraries like matplotlib, seaborn, and plotly, and knowing how to choose the right chart to communicate the findings from the data.
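Two of the EDA tasks listed above, handling missing values and spotting outliers, can be sketched in a few lines. To keep the example dependency-free it uses only the standard library; in practice you would more likely do the same with pandas (for example `isna`, `fillna`, and `quantile`). The `ages` column is made up, and median imputation and the 1.5 × IQR rule are common conventions chosen here as assumptions, not the only valid choices.

```python
import statistics

# Hypothetical column of ages; None marks a missing value
ages = [23, 25, None, 31, 29, 27, 120, None, 26, 30]

# Missing values: count them, then fill them with the median of the observed values
observed = [a for a in ages if a is not None]
n_missing = len(ages) - len(observed)
median_age = statistics.median(observed)
filled = [median_age if a is None else a for a in ages]

# Outliers: flag values more than 1.5 * IQR below Q1 or above Q3
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [a for a in observed if a < lower or a > upper]

print(f"missing values: {n_missing}, filled with median {median_age}")
print(f"outliers by the 1.5 * IQR rule: {outliers}")
```

Whether a flagged value such as 120 is a data-entry error or a genuine observation is a judgment call, which is why defining your questions comes first in EDA.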

Resources:

Data Manipulation using pandas [fee] — an interactive course from DataCamp that can quickly get you started with manipulating data using pandas. Learn to transform, aggregate, subset, and index DataFrames.

Kaggle pandas tutorial [free] — A short and concise hands-on tutorial that will walk you through commonly used data manipulation skills.

Data cleaning course by Kaggle.

Coursera course on Introduction to Data Science in Python [fee] — this is the first course in the Applied Data Science with Python Specialization.

Data Analysis with Python — by IBM on Coursera. The course covers wrangling, exploratory analysis, and simple model development using Python.

Data Visualization — by Kaggle. Another interactive course that lets you practice all the commonly used plots.

Data Visualization in Spreadsheets, Excel, Tableau, and Power BI — pick any one.

5 – Machine Learning and AI:

After working through all the major concepts above, you are ready to get started with the fancy ML algorithms.

There are three major types of learning:

Supervised Learning — Includes regression and classification problems. Study simple linear regression, multiple regression, polynomial regression, naive Bayes, logistic regression, KNNs, tree models, ensemble models. Learn about evaluation metrics.

Unsupervised Learning — Clustering and dimensionality reduction are two widely used applications of unsupervised learning. Dive deep into PCA, K-means clustering, hierarchical clustering, and Gaussian mixtures.

Reinforcement Learning — trains agents that learn from reward signals. Learn to optimize rewards, use the TF-Agents library, create Deep Q-Networks, etc.
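As a first taste of supervised learning, here is a minimal sketch of simple linear regression fitted with the closed-form least-squares formulas, followed by one evaluation metric (R²). It is written in plain Python so you can see the arithmetic; libraries such as scikit-learn do the same job for you in real projects. The `hours` and `scores` data and the function names are made up for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical training data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_simple_linear_regression(hours, scores)
r2 = r_squared(hours, scores, slope, intercept)
predicted = slope * 6 + intercept  # prediction for an unseen input

print(f"score = {slope:.2f} * hours + {intercept:.2f}, R^2 = {r2:.3f}")
print(f"predicted score for 6 hours: {predicted:.1f}")
```

Note that R² is computed here on the training data only; a proper evaluation would hold out separate test data.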

Resources:

[book] Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition — one of my all-time favorite books on machine learning. It not only covers the theoretical mathematical derivations but also showcases the implementation of algorithms through examples. You should solve the exercises given at the end of each chapter.

Machine Learning Course by Andrew Ng — the go-to course for anyone trying to learn machine learning. Hands down!

Introduction to Machine Learning — Interactive course by Kaggle.

Intro to Game AI and Reinforcement learning — another interactive course on Kaggle on reinforcement learning.

6 – Level up with big data and deep learning:

Now it’s time to level up and boost your knowledge. Learn deep learning; popular frameworks include TensorFlow and PyTorch.

Learn Scala, Hadoop, and Hive. These become important when you work with very large amounts of data, because at that scale you have to work on a distributed system.
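To get a feel for how distributed processing is organized, the sketch below imitates the map, shuffle, and reduce phases of the MapReduce model (the processing model Hadoop is known for) with a word count. It runs in a single Python process, so it only illustrates the idea; the `partitions` list is a made-up stand-in for chunks of data stored on different machines, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: each string stands in for a chunk of data on a different machine
partitions = ["big data big", "data science", "big science data"]


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk."""
    return [(word, 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# In a real cluster the map calls would run in parallel, one per partition
mapped = [pair for chunk in partitions for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))

print(word_counts)
```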

Resources:

[book] Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition — the same book recommended in step 5; its later chapters cover neural networks and deep learning.

[book] Fundamentals of Deep Learning: Designing Next-Generation Machine Intelligence Algorithms

[video] Krish Naik – you can follow his video tutorials.

7 – Get experience, keep practicing, and follow data science platforms:

Participate in hackathons on Kaggle, Analytics Vidhya, and other platforms.

Work on a variety of projects, and try to build at least one end-to-end project.

Connect with data scientists on LinkedIn, Twitter, and other platforms, and read blogs and research papers.

8 – Try to get an internship or a job:

Look for good opportunities such as internships, and search for them on LinkedIn, Internshala, and other platforms.

Make an attractive resume and portfolio.

Prepare hard for interviews.

9 – Follow and engage with the data science community:

Data science communities are emerging and growing everywhere. Get in touch with these communities and follow them as you work toward becoming a great data scientist.


All the best for your future

HAPPY LEARNING :-)



Saturday, January 16, 2021

Data Science and Analytics


Data Science – A True Definition:

Data science is the study of data that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
Nowadays data is everything. Data can tell us a great deal; we just need the right perspective, skills, and knowledge to uncover the patterns hidden in it.
For example, we are surrounded by a huge amount of data, both structured and unstructured. So my first question is: can we understand everything about the data that we have or see? Yes, we can understand some basic things, but can we understand all the hidden patterns?
So data science is the field that is all about data: the more you dig into it, the more you can understand from it.


Brief: Data Science Field: 

Data Science is the field of applying advanced analytics techniques, scientific and programming principles to extract important information from data for business decision-making, strategic planning, risk management, and other uses.
The insights that data science generates help organizations increase operational efficiency, improve business opportunities, and improve marketing and sales programs.


Why Data Science Is Important:

Data science plays an important role in all aspects of business operations and strategies. For example, it provides information about customers that helps companies create stronger marketing campaigns and increase product sales. It helps manage financial risks and detect fraud. It also helps block cyberattacks and other security threats in IT systems.
On a more fundamental level, these insights point the way to increased efficiency and reduced costs. Data science also enables companies to create business plans and strategies that are based on an informed analysis of customer behavior, market trends, and competition. Without it, businesses may miss opportunities and make flawed decisions.
Data science is a vital area, and it is not limited to business problems. In healthcare, its uses include diagnosis of medical conditions, image analysis, treatment planning, and medical research. Sports teams analyze player performance and plan game strategies via data science. Government agencies and public policy organizations are also big users.


Data Science Process and Life cycle:

Data science projects involve a series of data collection, analysis, and visualization steps. The process and life cycle can be broken down into six primary steps:

1. Identify a business-related hypothesis to test.
2. Gather data and prepare it for analysis.
3. Experiment with different statistical analytical models.
4. Pick the best model and run it against the data.
5. Present the results to business executives.
6. Deploy the model for ongoing use with fresh data.


Benefits of Data Science: 

Generally speaking, one of data science's biggest benefits is to empower and facilitate better decision-making. Organizations that invest in it can factor quantifiable, data-based evidence into their business decisions. Ideally, such data-driven decisions will lead to stronger business performance, cost savings, and smoother business processes and workflows.
One of the advantages of data science is that organizations can find when and where their products sell best. This can help deliver the right products at the right time and can help companies develop new products to meet their customers' needs. It can also support personalized customer experiences.


Data Science Applications and Uses:

Common applications that data scientists engage in include predictive modeling, pattern recognition, anomaly detection, fraud detection, classification, categorization, sentiment analysis, time series analysis, and object detection as well as the development of technologies such as recommendation engines, personalization systems, and artificial intelligence (AI) tools like chatbots and autonomous vehicles and machines.
These applications drive a wide variety of use cases in organizations, including the following:
1. Customer lifetime value analysis
2. Fraud detection
3. Risk management
4. Stock analysis
5. Image Classification
6. Object Recognition
7. Speech Recognition
8. Natural Language Processing
9. Medical Diagnosis
10. Cybersecurity



Data Science for Marketing and Planning

Data science can be applied in marketing and planning to help organizations make better decisions by analyzing large amounts of data from va...