# NaiveBayesian

An implementation of a simple machine learning algorithm, the Naive Bayes classifier, used to judge whether a given list of strings is abusive or not.

## Usage:

This program was written specifically for my Android diary app Whisper, a private diary whose highlight is judging whether a diary entry is positive or not and giving corresponding feedback.
The diary app “Whisper” is on GitHub.
The code in Whisper lags a little behind this one.

## Theory foundation:

The Naive Bayes classifier is a simple and effective classification method based on Bayes' theorem.

### Bayes' theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
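
As a quick numeric check of the formula, here is a minimal Java sketch with made-up probabilities. The class name `BayesTheorem` and the numbers are illustrative only, not taken from this project.

```java
/** Bayes' theorem, P(A|B) = P(B|A) P(A) / P(B), as a plain function. */
final class BayesTheorem {
    /** Returns P(A|B) from the likelihood P(B|A), the prior P(A) and the evidence P(B). */
    static double posterior(double pBGivenA, double pA, double pB) {
        if (pB <= 0.0) {
            throw new IllegalArgumentException("P(B) must be positive");
        }
        return pBGivenA * pA / pB;
    }

    public static void main(String[] args) {
        // Made-up numbers: A = "diary is negative", B = "diary contains the word sad".
        // P(B|A) = 0.30, P(A) = 0.40, P(B) = 0.15
        System.out.println(posterior(0.30, 0.40, 0.15)); // approximately 0.8
    }
}
```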


Now let $A$ be a specific category $C$ from a set of categories $C_1, C_2, \ldots, C_m$, and let $B$ be the combination of $n$ features $F_1, F_2, \ldots, F_n$ of a given individual.
Our goal is to infer which category an individual belongs to from its set of features. Applying Bayes' theorem to classification, this means finding the category $C$ that maximizes:

$$P(C \mid F_1 F_2 \cdots F_n) = \frac{P(F_1 F_2 \cdots F_n \mid C)\,P(C)}{P(F_1 F_2 \cdots F_n)}$$


The denominator $P(F_1 F_2 \cdots F_n)$ can be left out, because it has the same value for every $C$. The problem then reduces to finding the maximum of:

$$P(F_1 F_2 \cdots F_n \mid C)\,P(C)$$
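
Why it is safe to drop the denominator can be checked numerically: dividing every score by the same positive number rescales the scores but keeps their order. A minimal sketch with hypothetical scores (`ArgMaxDemo` is an illustrative name):

```java
/** Dropping the shared denominator P(F1...Fn) does not change which class wins. */
final class ArgMaxDemo {
    /** Index of the largest value. */
    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    /**
     * Divides every score P(F1...Fn|C)P(C) by their sum, which equals P(F1...Fn)
     * assuming the categories are mutually exclusive and cover all cases.
     */
    static double[] normalize(double[] scores) {
        double evidence = 0.0;
        for (double s : scores) {
            evidence += s;
        }
        double[] posteriors = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            posteriors[i] = scores[i] / evidence;
        }
        return posteriors;
    }

    public static void main(String[] args) {
        // Hypothetical unnormalized scores P(F1...Fn|C)P(C) for three categories.
        double[] scores = {0.002, 0.005, 0.001};
        System.out.println(argMax(scores));            // 1
        System.out.println(argMax(normalize(scores))); // 1
    }
}
```

The normalized values are the true posteriors $P(C \mid F_1 F_2 \cdots F_n)$ under that assumption, but for picking a class only their order matters.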


### Naive Bayes assumption:

Going a step further, if we assume that all $n$ features are independent of each other given the category, we get the Naive Bayes classifier, and the expression simplifies again:

$$P(F_1 F_2 \cdots F_n \mid C)\,P(C) = P(F_1 \mid C)\,P(F_2 \mid C) \cdots P(F_n \mid C)\,P(C)$$


Every probability on the right-hand side can be estimated from the training set, and the problem is solved.
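
The counting step can be sketched in Java as follows. This is an illustration under assumptions, not the project's code: documents are hypothetical word-count vectors over a fixed dictionary, labels are 0 or 1, and each $P(F_i \mid C)$ is estimated as the relative frequency of word $i$ among all words of class $C$.

```java
/** Unsmoothed Naive Bayes: P(C) multiplied by P(F_i | C) once per occurrence of word i. */
final class NaiveBayesScore {
    /** Relative frequency of each word among all words in documents of class cls. */
    static double[] wordProbabilities(int[][] docs, int[] labels, int cls) {
        int vocabSize = docs[0].length;
        double[] probs = new double[vocabSize];
        double total = 0.0;
        for (int d = 0; d < docs.length; d++) {
            if (labels[d] != cls) {
                continue;
            }
            for (int i = 0; i < vocabSize; i++) {
                probs[i] += docs[d][i];
                total += docs[d][i];
            }
        }
        for (int i = 0; i < vocabSize; i++) {
            probs[i] /= total; // assumes class cls contains at least one word
        }
        return probs;
    }

    /** P(C) * product over i of P(F_i | C)^count_i, computed directly (no logarithms yet). */
    static double score(double prior, double[] wordProbs, int[] testVec) {
        double s = prior;
        for (int i = 0; i < testVec.length; i++) {
            s *= Math.pow(wordProbs[i], testVec[i]);
        }
        return s;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary {"sad", "gone", "happy"}; each row counts those words in one document.
        int[][] docs = {{1, 1, 0}, {0, 0, 1}, {1, 0, 0}, {0, 1, 1}};
        int[] labels = {0, 1, 0, 1}; // 2 of 4 documents per class, so P(C) = 0.5 for both
        int[] test = {1, 1, 0};      // a sentence containing "sad" and "gone"
        double s0 = score(0.5, wordProbabilities(docs, labels, 0), test);
        double s1 = score(0.5, wordProbabilities(docs, labels, 1), test);
        System.out.println("class 0: " + s0);
        System.out.println("class 1: " + s1); // class 1: 0.0
    }
}
```

With this toy data, class 1 never contains the word at index 0, so its whole product collapses to zero for any sentence containing that word. This is the usual motivation for smoothing.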

## Key points and codes:

### From training vectors to feature vectors:

We apply Laplace smoothing here (via a function called laplace), and we also initialize the denominator to a fixed value, so that a word count that is too small does not cause unexpected results such as a probability of zero.
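
A sketch of the smoothing step, assuming standard add-one (Laplace) smoothing: each word count starts at 1 and the denominator is the class's total word count plus the vocabulary size. The exact constants of the original code are not given, so this does not reproduce its fixed-denominator choice. The results are returned as logarithms, which matches how `p0Vec` and `p1Vec` are summed in the classification loop. `LaplaceTrainer`, `logWordProbabilities` and `priorOfClass1` are hypothetical names.

```java
import java.util.Arrays;

/** Add-one (Laplace) smoothed word probabilities, stored as logarithms. */
final class LaplaceTrainer {
    /** log P(word_i | cls) = log((count_i + 1) / (totalWords + vocabSize)). */
    static double[] logWordProbabilities(int[][] docs, int[] labels, int cls) {
        int vocabSize = docs[0].length;
        double[] counts = new double[vocabSize];
        Arrays.fill(counts, 1.0);       // every word starts with a pseudo-count of 1
        double denominator = vocabSize; // so the denominator starts at the vocabulary size
        for (int d = 0; d < docs.length; d++) {
            if (labels[d] != cls) {
                continue;
            }
            for (int i = 0; i < vocabSize; i++) {
                counts[i] += docs[d][i];
                denominator += docs[d][i];
            }
        }
        double[] logProbs = new double[vocabSize];
        for (int i = 0; i < vocabSize; i++) {
            logProbs[i] = Math.log(counts[i] / denominator); // never log(0): counts[i] >= 1
        }
        return logProbs;
    }

    /** Fraction of training documents labelled 1 (the role pAbusive plays in the classifier). */
    static double priorOfClass1(int[] labels) {
        int ones = 0;
        for (int label : labels) {
            if (label == 1) {
                ones++;
            }
        }
        return (double) ones / labels.length;
    }
}
```

Because every numerator is at least 1, a word that never appeared in one class now gets a small but nonzero probability instead of zeroing the whole score.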

### Classify the given sentence:

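```java
// p0Vec / p1Vec hold log P(word | class) for every dictionary word;
// testVec holds the word counts of the sentence being classified.
double p0 = 0.0, p1 = 0.0;
for (int x = 0; x < testVec.size(); x++) {
    p0 += p0Vec.get(x) * testVec.get(x);
    p1 += p1Vec.get(x) * testVec.get(x);
}
// Add the log prior of each class; pAbusive is the prior of class 1,
// the class reported as "Positive Words!" below.
p1 += Math.log(pAbusive);
p0 += Math.log(1 - pAbusive);
// Heuristic cut-offs: if either log score is above its threshold, report neutral.
if (p0 > -0.4 || p1 > -1.4) {
    System.out.println("Neutral Words!");
} else if (p1 > p0) {
    System.out.println("Positive Words!");
} else if (p0 > p1) {
    System.out.println("Negative Words!");
}
```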


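You may have noticed that we add logarithms instead of multiplying probabilities. This avoids floating-point underflow: the individual probabilities can be very small, and multiplying many of them can round the product down to zero, while the sum of their logarithms stays representable.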
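
A small demonstration of the underflow problem, using a hypothetical sentence of 400 words that each have probability $10^{-3}$ (`UnderflowDemo` is an illustrative name):

```java
/** Direct multiplication underflows; summing logarithms does not. */
final class UnderflowDemo {
    /** Multiplies n copies of p directly. */
    static double product(double p, int n) {
        double r = 1.0;
        for (int i = 0; i < n; i++) {
            r *= p;
        }
        return r;
    }

    /** Adds n copies of log(p), i.e. the logarithm of the same product. */
    static double logSum(double p, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++) {
            s += Math.log(p);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(product(1e-3, 400)); // 0.0 (1e-1200 is below the smallest double)
        System.out.println(logSum(1e-3, 400));  // approximately -2763.1
    }
}
```

Because the logarithm is strictly increasing, comparing the log sums p0 and p1 gives the same answer that comparing the underlying products would, if they could be represented.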

### No need to train every time:

I also added a store function so that the raw data does not need to be trained every time. When a saved dictionary and feature vectors already exist, the program simply parses the .txt file and uses them, which makes it much faster and more convenient.
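
One possible shape for such a store function, as a sketch only: the file layout (the prior on the first line, then one tab-separated line per dictionary word holding the word, log P(word | class 0) and log P(word | class 1)) is an assumption for illustration, not the project's actual format, and `ModelStore` with its methods is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical text format: line 1 = pAbusive, then "word<TAB>logP0<TAB>logP1" (words contain no tabs). */
final class ModelStore {
    static final class Model {
        final double pAbusive;
        final List<String> dictionary;
        final double[] p0Vec;
        final double[] p1Vec;

        Model(double pAbusive, List<String> dictionary, double[] p0Vec, double[] p1Vec) {
            this.pAbusive = pAbusive;
            this.dictionary = dictionary;
            this.p0Vec = p0Vec;
            this.p1Vec = p1Vec;
        }
    }

    static void save(Model m, Path file) {
        List<String> lines = new ArrayList<>();
        lines.add(Double.toString(m.pAbusive)); // Double.toString round-trips exactly
        for (int i = 0; i < m.dictionary.size(); i++) {
            lines.add(m.dictionary.get(i) + "\t" + m.p0Vec[i] + "\t" + m.p1Vec[i]);
        }
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Model load(Path file) {
        List<String> lines;
        try {
            lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        double pAbusive = Double.parseDouble(lines.get(0));
        int n = lines.size() - 1;
        List<String> dictionary = new ArrayList<>(n);
        double[] p0Vec = new double[n];
        double[] p1Vec = new double[n];
        for (int i = 0; i < n; i++) {
            String[] parts = lines.get(i + 1).split("\t");
            dictionary.add(parts[0]);
            p0Vec[i] = Double.parseDouble(parts[1]);
            p1Vec[i] = Double.parseDouble(parts[2]);
        }
        return new Model(pAbusive, dictionary, p0Vec, p1Vec);
    }

    /** Reuses the stored model when the file exists; otherwise trains once and saves the result. */
    static Model loadOrTrain(Path file, Supplier<Model> trainer) {
        if (Files.exists(file)) {
            return load(file);
        }
        Model m = trainer.get();
        save(m, file);
        return m;
    }
}
```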

## See the output:

Take the simple sentence “Sad my friend is gone” as an example. The algorithm uses the dictionary to convert this sentence into a vector, then calculates p0 and p1 using the feature vectors. This is the console output:

```
p0=-29.07335725097412   p1=-34.40233568665568   pA=0.421406667326145
Negative Words!
```
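
The sentence-to-vector step described for this example could look like the following bag-of-words sketch. The tokenization (lower-casing, then splitting on non-word characters) is an assumption that only suits English text, and `SentenceVectorizer` and the example dictionary are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Bag-of-words: counts how often each dictionary word occurs in a sentence. */
final class SentenceVectorizer {
    /** Words that are not in the dictionary are ignored. */
    static int[] toVector(List<String> dictionary, String sentence) {
        int[] vec = new int[dictionary.size()];
        for (String token : sentence.toLowerCase(Locale.ROOT).split("\\W+")) {
            int idx = dictionary.indexOf(token);
            if (idx >= 0) {
                vec[idx]++;
            }
        }
        return vec;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary; a real one would be built from the training data.
        List<String> dictionary = Arrays.asList("sad", "friend", "gone", "happy");
        System.out.println(Arrays.toString(toVector(dictionary, "Sad my friend is gone"))); // [1, 1, 1, 0]
    }
}
```

The resulting array plays the role of `testVec` in the classification loop.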