We went over the basics of probability and how we can apply simple theorems to complex tasks.
In this chapter, we will explore more complicated theorems of probability and how we can use them in a predictive capacity.
Advanced topics, such as Bayes theorem and random variables, give rise to common machine learning algorithms. This chapter will focus on more advanced topics in probability theory, including the following:
- Naive Bayes theorem
Bayes theorem is the central result of Bayesian inference. Let's see how it comes about. Recall that we previously defined the following:
- P(A) = The probability that event A occurs
- P(A|B) = The probability that A occurs, given that B occurred
- P(A, B) = The probability that A and B both occur
- P(A, B) = P(A) * P(B|A)
That last bullet can be read as follows: the probability that A and B occur is the probability that A occurs times the probability that B occurs, given that A has already occurred.
It’s from that last bullet point that Bayes theorem takes its shape.
We know that:
P(A, B) = P(A) * P(B|A)
P(B, A) = P(B) * P(A|B)
P(A, B) = P(B, A)
P(B) * P(A|B) = P(A) * P(B|A)
Dividing both sides by P(B) gives us Bayes theorem, as shown:
P(A|B) = P(A) * P(B|A) / P(B)
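The identity can be checked numerically. The following sketch uses made-up counts from 100 hypothetical trials (the numbers 40, 25, and 10 are assumptions for illustration, not from the text) and verifies that estimating P(A|B) directly agrees with computing it via Bayes theorem:

```python
# Hypothetical counts: out of 100 trials, A occurs in 40,
# B occurs in 25, and both occur together in 10.
n, n_a, n_b, n_ab = 100, 40, 25, 10

p_a = n_a / n              # P(A)   = 0.40
p_b = n_b / n              # P(B)   = 0.25
p_b_given_a = n_ab / n_a   # P(B|A) = joint count / count of A
p_a_given_b = n_ab / n_b   # P(A|B) = joint count / count of B

# Bayes theorem: P(A|B) = P(A) * P(B|A) / P(B)
assert abs(p_a_given_b - p_a * p_b_given_a / p_b) < 1e-12
print(p_a_given_b)  # 0.4
```

Both routes give the same answer because each side is just the joint count n_ab divided by n_b, written two different ways.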
How Does the Naive Bayes Theorem Work?
Let’s understand it using an example. Suppose we have a training data set of weather conditions and a corresponding target variable ‘Play’ (indicating whether a game was played). We need to classify whether players will play based on the weather conditions. Let’s follow these steps:
Step 1: Convert the data set into a frequency table
Step 2: Create a likelihood table by computing the relevant probabilities; for example, P(Overcast) = 4/14 ≈ 0.29 and P(Play = Yes) = 9/14 ≈ 0.64.
Step 3: Now, use the Naive Bayes equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction.
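The three steps above can be sketched in code. The observations below are an assumed reconstruction chosen only to match the counts quoted in this example (14 observations, P(Overcast) = 4/14 ≈ 0.29, P(Yes) = 9/14 ≈ 0.64, and 3 of the 5 Sunny days being Yes); the original table may differ:

```python
from collections import Counter

# Hypothetical (weather, play) observations consistent with the
# probabilities quoted in the text.
data = (
    [("Sunny", "Yes")] * 3 + [("Sunny", "No")] * 2
    + [("Overcast", "Yes")] * 4
    + [("Rainy", "Yes")] * 2 + [("Rainy", "No")] * 3
)

# Step 1: frequency table of (weather, play) pairs
freq = Counter(data)

# Step 2: likelihood table -- marginal probabilities from the counts
n = len(data)
p_overcast = sum(c for (w, _), c in freq.items() if w == "Overcast") / n
p_yes = sum(c for (_, p), c in freq.items() if p == "Yes") / n
print(round(p_overcast, 2), round(p_yes, 2))  # 0.29 0.64

# Step 3: posterior for the Yes class given Sunny, via Bayes theorem
p_sunny = sum(c for (w, _), c in freq.items() if w == "Sunny") / n
p_sunny_given_yes = freq[("Sunny", "Yes")] / (p_yes * n)
posterior_yes = p_sunny_given_yes * p_yes / p_sunny
print(round(posterior_yes, 2))  # 0.6
```

The frequency table (step 1) is just raw counts; the likelihood table (step 2) turns those counts into probabilities, which is all Bayes theorem needs in step 3.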
Problem: Players will play if the weather is sunny. Is this statement correct?
We can solve it using the above-discussed method of posterior probability.
P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
Here we have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, and P(Yes) = 9/14 = 0.64.
Now, P(Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which is higher than the complementary P(No | Sunny) = 0.40, so we predict that the players will play.
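Because every quantity in the worked example is a ratio of small counts, the arithmetic can be done exactly with Python's fractions module, which confirms that the rounded 0.60 is in fact exact:

```python
from fractions import Fraction

# Exact arithmetic for the worked example above.
p_sunny_given_yes = Fraction(3, 9)   # P(Sunny | Yes)
p_yes = Fraction(9, 14)              # P(Yes)
p_sunny = Fraction(5, 14)            # P(Sunny)

# Bayes theorem: P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(p_yes_given_sunny)         # 3/5
print(float(p_yes_given_sunny))  # 0.6
```

Working in fractions avoids the small rounding drift you get from multiplying 0.33 * 0.64 / 0.36 in floating point.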
Naive Bayes uses a similar method to predict the probability of different classes based on various attributes. This algorithm is mostly used in text classification and in problems with multiple classes.
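To make the text-classification use case concrete, here is a minimal from-scratch sketch of a multinomial Naive Bayes classifier with Laplace smoothing. The toy spam/ham documents are invented for illustration and are not from the text:

```python
import math
from collections import Counter, defaultdict

# Tiny made-up training corpus: (document, label) pairs.
train = [
    ("free prize money now", "spam"),
    ("win money free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project meeting notes", "ham"),
]

# Count words per class and documents per class.
word_counts = defaultdict(Counter)
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Return the class with the highest (log) posterior probability."""
    words = text.split()
    best_label, best_logp = None, -math.inf
    for label in class_counts:
        # log P(class) + sum of log P(word | class), Laplace-smoothed
        logp = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for w in words:
            logp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

print(predict("free money offer"))  # spam
print(predict("monday meeting"))    # ham
```

The "naive" assumption is visible in the inner loop: each word contributes an independent P(word | class) factor, and the add-one (Laplace) smoothing keeps unseen words from zeroing out a whole class.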
I hope you learned something today. Feel free to leave a message if you have any feedback, and share this with anyone who might find it useful.
Source:
- Sinan Ozdemir, Principles of Data Science (Packt)