Data Science : March 2021

Wednesday, March 24, 2021

Decision Trees Basics

Decision tree learning is one of the predictive modeling approaches used in statistics, data mining, and machine learning. It uses a decision tree (as a predictive model) to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. Decision trees are among the most popular machine learning algorithms given their intelligibility and simplicity.

The Decision Tree algorithm belongs to the family of supervised learning algorithms. Like several other supervised learning algorithms, it can be used to solve both regression and classification problems.

The goal of using a Decision Tree is to create a model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from prior data (the training data).

Let us take an example: a man gets a job offer from a company, and we build a decision tree to predict whether he accepts the offer or not.

Suppose the first feature is 'salary': how much salary is he offered? In the diagram above, salary is the first feature, and it ranges between $50k and $80k. There are two outcomes based on this salary range: if the salary is in the middle of the range, the man accepts the offer; otherwise, he rejects it.

So far we have checked one condition, but there are more features. Our second feature is 'office near the house': is the office near his house or not? If yes, he accepts the offer; otherwise, he rejects it.

The third feature is whether the company provides a 'cab facility' or not. If yes, he accepts the offer; otherwise, he rejects it.

Therefore, based on these features, we created a tree-based structure and used it to decide whether the offer is accepted or not.
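The example can be sketched as a chain of simple checks. This is only an illustration, not a trained model: it assumes the three features are checked one after another, that a "no" at any step means the offer is rejected, and that the salary test means "between 50k and 80k". The function name accepts_offer is made up for this sketch.

```python
def accepts_offer(salary, office_near_house, cab_facility):
    """Walk the example tree from the root to a leaf and return the decision."""
    # Root node: is the salary inside the 50k-80k range from the example?
    # (Assumption: the whole range counts as acceptable.)
    if not 50_000 <= salary <= 80_000:
        return False
    # Second decision node: is the office near the house?
    if not office_near_house:
        return False
    # Third decision node: does the company provide a cab facility?
    return cab_facility


print(accepts_offer(65_000, True, True))    # every condition holds
print(accepts_offer(65_000, False, True))   # office is not near the house
print(accepts_offer(95_000, True, True))    # salary is outside the range
```

In a real decision tree the order of the checks and the thresholds are learned from training data instead of being written by hand.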

Important Terminology related to Decision Trees

Root Node: It represents the entire population or sample and this further gets divided into two or more homogeneous sets.

Splitting: It is a process of dividing a node into two or more sub-nodes.

Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.

Leaf / Terminal Node: A node that does not split is called a leaf or terminal node.

Pruning: When we remove the sub-nodes of a decision node, the process is called pruning. It is the opposite of splitting.

Branch / Sub-Tree: A subsection of the entire tree is called a branch or sub-tree.

Parent and Child Node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are the children of the parent node.
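To make the terminology concrete, here is a minimal sketch of a tree data structure. The Node class and its method names are hypothetical, chosen only to mirror the terms above; they are not taken from any library.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

    def is_leaf(self):
        # A leaf (terminal) node has no sub-nodes.
        return not self.children

    def split(self, *labels):
        # Splitting: divide this node into two or more sub-nodes.
        self.children = [Node(label) for label in labels]
        return self.children

    def prune(self):
        # Pruning: remove the sub-nodes, turning this node back into a leaf.
        self.children = []


root = Node("salary in range?")                      # root node (parent)
reject, office = root.split("reject", "office near house?")
office.split("reject", "accept")                     # office is now a decision node
print(root.is_leaf(), office.is_leaf(), reject.is_leaf())

office.prune()                                       # opposite of splitting
print(office.is_leaf())
```

Here root is the parent of reject and office, and the part of the tree that starts at office is a branch (sub-tree).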

So far, we have covered the basic understanding of the decision tree algorithm and its terminology. In an upcoming blog, we will discuss advanced topics such as - how to split decision trees, different splitting criteria, how to optimize the performance of decision trees, and more.

Happy Learning :-)

References -

image reference - img_ref

Wikipedia reference - wiki

Wednesday, March 10, 2021

Central Limit Theorem

The Central Limit Theorem (CLT) implies that the mean of a sufficiently large random sample is approximately the same as the mean of the population. No matter how large the population is, we can infer its statistics with the help of samples.

Central Limit Theorem Definition:

The central limit theorem states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normally distributed.

Central Limit Theorem Example:

Let us take an example to understand the concept of the Central Limit Theorem (CLT):

Suppose you want to calculate the average height of the people in a country. The direct approach would be to measure the height of every person individually, add the heights together, and divide the sum by the total number of people. This gives the exact average height, but for a large population the method is impractical because it would be tiresome and take very long.

So, we will use the CLT (Central Limit Theorem) to make the calculation easier. In this method, we randomly pick people from different cities to form samples. We make the samples city-wise, and each sample includes some people. Then we follow these steps:

Find the mean of each sample.

Now find the mean of the sample means.

This way we get the approximate mean height of the people in the country.

If we plot a histogram of these sample mean heights, we get a bell-curve shape.
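The steps above can be tried out with a small simulation. This is a sketch under stated assumptions: the population is made-up data (100,000 heights drawn uniformly between 140 cm and 200 cm, so it is deliberately not bell-shaped), and the sample size of 50 and the count of 1,000 samples are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 heights in cm, deliberately not bell-shaped.
population = [rng.uniform(140, 200) for _ in range(100_000)]
population_mean = statistics.fmean(population)

# Draw 1,000 samples of 50 people each and record the mean of every sample.
sample_means = [
    statistics.fmean(rng.choices(population, k=50))  # sampling with replacement
    for _ in range(1_000)
]
mean_of_sample_means = statistics.fmean(sample_means)

print(f"population mean:      {population_mean:.2f}")
print(f"mean of sample means: {mean_of_sample_means:.2f}")
```

The two printed values should be very close. Plotting sample_means as a histogram would show the bell shape even though the population itself is flat.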

Central Limit Theorem Formula:

The central limit theorem applies when the sample size is sufficiently large (n ≥ 30 is a common rule of thumb). The formula for the central limit theorem can be stated as follows:

μx = μ and σx = σ / √n

Where,

μ = Population mean

σ = Population standard deviation

μx = Mean of the sample means

σx = Standard deviation of the sample means

n = Sample size

Applications of Central Limit Theorem:

Statistical Applications

If the distribution of the population is not known or not normal, we can still treat the distribution of the sample means as approximately normal according to the CLT. This does not require the population itself to be normally distributed. It helps in analyzing data with methods such as constructing confidence intervals.

To estimate the population mean more accurately, we can increase the sample size, which decreases the standard deviation of the sample means.
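The effect of the sample size can be checked numerically against the standard relation σx = σ / √n. The sketch below uses made-up data again (a uniform population of heights), and the helper name spread_of_sample_means is hypothetical.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
population = [rng.uniform(140, 200) for _ in range(50_000)]  # hypothetical heights (cm)
sigma = statistics.pstdev(population)  # population standard deviation


def spread_of_sample_means(n, repeats=2_000):
    """Standard deviation of the means of `repeats` random samples of size n."""
    means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(repeats)]
    return statistics.pstdev(means)


for n in (25, 100, 400):
    observed = spread_of_sample_means(n)
    predicted = sigma / math.sqrt(n)
    print(f"n={n:>3}  observed={observed:.2f}  sigma/sqrt(n)={predicted:.2f}")
```

Each time the sample size is multiplied by four, the spread of the sample means is roughly halved, which is why larger samples give tighter estimates.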

Practical Significance

One of the most common applications of the CLT is in election polls, where it is used to estimate the percentage of people supporting a candidate. These estimates are reported in the news together with confidence intervals.

It is also used to estimate the mean family income in a particular region.

Saturday, March 6, 2021

Generative Adversarial Networks (GANs)

Deep neural networks are used mainly for supervised learning: classification or regression. Generative Adversarial Networks or GANs, however, use neural networks for a very different purpose: Generative modeling.

Introduction to Generative Modeling

Generative modeling is an unsupervised learning task in machine learning that involves automatically discovering and learning the regularities or patterns in input data in such a way that the model can be used to generate or output new examples that plausibly could have been drawn from the original dataset.

While there are many approaches used for generative modeling, a Generative Adversarial Network takes the following approach:

There are two neural networks: a Generator and a Discriminator. The generator generates a "fake" sample given a random vector/matrix, and the discriminator attempts to detect whether a given sample is "real" (picked from the training data) or "fake" (generated by the generator). Training happens in tandem: we train the discriminator for a few epochs, then train the generator for a few epochs, and repeat. This way both the generator and the discriminator get better at doing their jobs.

GANs, however, can be notoriously difficult to train and are extremely sensitive to hyperparameters, activation functions, and regularization.

Discriminator Network

The discriminator takes an image as input and tries to classify it as "real" or "generated". In this sense, it's like any other neural network. We'll use a convolutional neural network (CNN) which outputs a single number for every image. We'll use a stride of 2 to progressively reduce the size of the output feature map.
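To see how a stride of 2 shrinks the feature map, the standard output-size formula for a convolution can be applied layer by layer. The layer settings below (kernel 4, stride 2, padding 1, followed by a final kernel-7 layer) are assumptions picked for illustration; the post does not specify them. The 28 x 28 input matches the image size mentioned for the generator.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial size after a convolution (standard formula, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28  # height and width of the input image
# Hypothetical layers as (kernel, stride, padding); the last one collapses
# the remaining 7 x 7 map into a single number.
for kernel, stride, padding in [(4, 2, 1), (4, 2, 1), (7, 1, 0)]:
    size = conv_output_size(size, kernel, stride, padding)
    print(size)  # 14, then 7, then 1
```

Each stride-2 layer halves the height and width, so only a few layers are needed before the map is small enough to reduce to one output value.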

Just like any other binary classification model, the output of the discriminator is a single number between 0 and 1, which can be interpreted as the probability of the input image being real i.e. picked from the original dataset.
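Because the discriminator outputs a probability, both networks are commonly trained with binary cross-entropy. The sketch below computes the two losses by hand for a single pair of outputs. The values 0.9 and 0.2 are made up, and the generator loss shown is the common "label generated images as real" variant, which is an assumption, since the post does not state which loss it uses.

```python
import math


def bce(prediction, target):
    """Binary cross-entropy for one prediction in (0, 1) and a 0/1 target."""
    return -(target * math.log(prediction) + (1 - target) * math.log(1 - prediction))


d_real = 0.9  # hypothetical discriminator output on a real image
d_fake = 0.2  # hypothetical discriminator output on a generated image

# Discriminator: real images should score 1, generated images should score 0.
discriminator_loss = bce(d_real, 1) + bce(d_fake, 0)
# Generator: it wants the discriminator to score its images as real (1).
generator_loss = bce(d_fake, 1)

print(round(discriminator_loss, 3), round(generator_loss, 3))
```

With these numbers the discriminator is doing well (low loss) and the generator is doing poorly (high loss). As the generator improves, d_fake rises and the balance shifts.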

Generator Network

The input to the generator is typically a vector or a matrix of random numbers (referred to as a latent tensor) which is used as a seed for generating an image. The generator will convert a latent tensor of shape (128, 1, 1) into an image tensor of shape 3 x 28 x 28. To achieve this, we'll use a transposed convolution (also referred to as a deconvolution).
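The growth from a 1 x 1 latent tensor to a 28 x 28 image can be checked with the standard output-size formula for a transposed convolution. The three layer settings below are assumptions chosen so that the sizes work out; the post does not list the actual layers.

```python
def conv_transpose_output_size(size, kernel, stride, padding):
    """Spatial size after a transposed convolution (no dilation or output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


size = 1  # the latent tensor is 128 channels of size 1 x 1
# Hypothetical layers as (kernel, stride, padding).
for kernel, stride, padding in [(7, 1, 0), (4, 2, 1), (4, 2, 1)]:
    size = conv_transpose_output_size(size, kernel, stride, padding)
    print(size)  # 7, then 14, then 28
```

The channel count would shrink in the other direction (from 128 down to 3) while the height and width grow.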

In our next blog, I will show a practical application of GANs: generating anime faces with PyTorch.

Reference -

https://machinelearningmastery.com/what-are-generative-adversarial-networks-gans/

Friday, March 5, 2021

Design Thinking Process

Design Thinking --

Design thinking is a human-centered and collaborative approach to problem-solving that is creative, iterative, and practical. In this guide, we’ll give you a detailed definition of Design Thinking, illustrate exactly what the process involves, and underline why it matters: what is the value of Design Thinking, and in what contexts is it particularly useful?

What is Design Thinking?

Design Thinking is an approach used for practical and creative problem-solving. It is based heavily on the methods and processes that designers use (hence the name), but it has actually evolved from a range of different fields — including architecture, engineering, and business. Design Thinking can also be applied to any field; it doesn’t necessarily have to be design-specific.

What is the Design Thinking process?

The Design Thinking process is progressive and highly user-centric. Before looking at the process in more detail, let’s consider the four principles of Design Thinking as laid out by Christoph Meinel and Larry Leifer of the Hasso-Plattner-Institute of Design at Stanford University, California.

The Four Principles of Design Thinking

The human rule: No matter what the context, all design activity is social in nature, and any social innovation will bring us back to the “human-centric point of view”.

The ambiguity rule: Ambiguity is inevitable, and it cannot be removed or oversimplified. Experimenting at the limits of your knowledge and ability is crucial in being able to see things differently.

The redesign rule: All design is redesign. While technology and social circumstances may change and evolve, basic human needs remain unchanged. We essentially only redesign the means of fulfilling these needs or reaching desired outcomes.

The tangibility rule: Making ideas tangible in the form of prototypes enables designers to communicate them more effectively.

Based on these four principles, the Design Thinking process can be broken down into five steps or phases, as per the aforementioned Hasso-Plattner-Institute of Design at Stanford (otherwise known as d.school): Empathize, Define, Ideate, Prototype, and Test.

Let’s explore each of these in more detail.

Empathize -

Empathy provides the critical starting point for Design Thinking. The first stage of the process is spent getting to know the user and understanding their wants, needs, and objectives.

In this stage, you should identify:

Customers’ Insights: The deep motivations that make them behave or act as they do.

Customers’ Needs: What could be really useful for them?

Define -

The second stage in the Design Thinking process is dedicated to defining the problem. You’ll gather all of your findings from the empathize phase and start to make sense of them.

In this stage you should:

List all the conclusions reached in the previous stage.

Start defining potential solutions for them.

Ideate -

With a solid understanding of your users and a clear problem statement in mind, it’s time to start working on potential solutions. The third phase in the Design Thinking process is where creativity happens, and it’s crucial to point out that the ideation stage is a judgment-free zone! Designers will hold ideation sessions in order to come up with as many new angles and ideas as possible. There are many different types of ideation techniques like brainstorming and mind mapping.

In this stage you should:

Start figuring out different approaches that could fulfill your customers’ requirements.

You should also:

Be optimistic but realistic.

Keep the focus on the customer.

Prototype -

The fourth step in the Design Thinking process is all about experimentation and turning ideas into tangible products. A prototype is basically a scaled-down version of the product which incorporates the potential solutions identified in the previous stages.

In this stage you should:

Build feasible final products for your customers.

You must also ensure that you:

Fulfill the customers’ requirements.

Deliver a solid, working final prototype.

Test -

After prototyping comes user testing, but it’s important to note that this is rarely the end of the Design Thinking process. In reality, the results of the testing phase will often lead you back to a previous step, providing the insights you need to redefine the original problem statement or to come up with new ideas you hadn’t thought of before.

In this stage you should:

Distribute your final prototype among potential customers and receive feedback.

Go back to a certain stage (if required by the customers) in order to improve the final product.

Purpose of Design Thinking -

Now we know more about how Design Thinking works, let’s consider why it matters. There are many benefits of using a Design Thinking approach — be it in a business, educational, personal or social context. First and foremost, Design Thinking fosters creativity and innovation. As human beings, we rely on the knowledge and experiences we have accumulated to inform our actions. We form patterns and habits that, while useful in certain situations, can limit our view of things when it comes to problem-solving. Rather than repeating the same tried-and-tested methods, Design Thinking encourages us to remove our blinkers and consider alternative solutions. The entire process lends itself to challenging assumptions and exploring new pathways and ideas. Another great benefit of Design Thinking is that it puts humans first. By focusing so heavily on empathy, it encourages businesses and organizations to consider the real people who use their products and services — meaning they are much more likely to hit the mark when it comes to creating meaningful user experiences. For the user, this means better, more useful products that actually improve our lives. For businesses, this means happy customers and a healthier bottom line.

“Wicked problem” in Design Thinking -

Design Thinking is especially useful when it comes to solving “wicked problems”. The term “wicked problem” was coined by design theorist Horst Rittel in the 1970s to describe particularly tricky problems that are highly ambiguous in nature. With wicked problems, there are many unknown factors; unlike “tame” problems, there is no definitive solution. In fact, solving one aspect of a wicked problem is likely to reveal or give rise to further challenges. Another key characteristic of wicked problems is that they have no stopping point; as the nature of the problem changes over time, so must the solution. Solving wicked problems is, therefore, an ongoing process that requires Design Thinking! Some examples of wicked problems in our society today include things like poverty, hunger and climate change.

Benefits of Design Thinking at work -

Integrating Design Thinking into your process can add huge business value, ultimately ensuring that the products you design are not only desirable for customers but also viable in terms of company budget and resources.

Significantly reduces time-to-market: With its emphasis on problem-solving and finding viable solutions, Design Thinking can significantly reduce the amount of time spent on design and development.

Improves customer retention and loyalty: Design Thinking ensures a user-centric approach, which ultimately boosts user engagement and customer retention in the long term.

Fosters innovation: Design Thinking is all about challenging assumptions and established beliefs, encouraging all stakeholders to think outside the box. This fosters a culture of innovation that extends well beyond the design team.

Design thinking methodology in action: Case study –

Problem Statement - Executives at the Eye Hospital wanted to transform the patient experience from the typically grim, anxiety-riddled affair into something much more pleasant and personal. To do this, they incorporated Design Thinking and design principles into their planning process. Here’s how they did it:

Empathize –

First, they set out to understand their target user — patients entering the hospital for treatment. The hospital CEO, managers, staff, and doctors established that most patients came into the hospital with the fear of going blind.

Define –

Based on their findings from the empathize stage, they determined that fear reduction needed to be a priority. Their problem statement may have looked something like the following: “Patients coming into our hospital need to feel comfortable and at ease.”

Ideate –

Armed with a deep understanding of their patients and a clear mission statement, they started to brainstorm potential solutions. As any good design thinker would, they sought inspiration from a range of both likely and unlikely sources.

Prototype –

In the prototyping stage, the team presented the most promising ideas they had come up with so far to those in charge of caregiving at the hospital. These teams of caregivers then used these insights to design informal, small-scale experiments that could test a potential solution and see if it was worthy of wide-scale adoption.

Test –

The testing phase consisted of running the aforementioned experiments and seeing if they took off.

The Outcome –

By adopting a Design Thinking approach, the Eye Hospital was able to get to the heart of their users’ needs and find effective solutions to fulfill them.
