Basic concepts: Naive Bayes algorithm for classification

I think I understand Naive Bayes more or less, but I have a few questions regarding its implementation for a simple binary text classification task.

Let's say that document D_i is some subset of the vocabulary x_1, x_2, ..., x_n.

There are two classes c_i that any document can fall into, and I want to compute P(c_i|D) for some input document D, which is proportional to P(D|c_i)P(c_i). Under the naive independence assumption, P(D|c_i) factorizes into the product of P(x_j|c_i) over the words x_j in D.

I have three questions:

  • Is P(c_i) #docs in c_i / #total docs, or #words in c_i / #total words?
  • Should P(x_j|c_i) be #times x_j appears in D / #times x_j appears in c_i?
  • Suppose an x_j doesn't exist in the training set; do I give it a probability of 1 so that it doesn't alter the calculations?
  • For example, let us say that I have this tiny training set:

    training = [("hello world", "good")
                ("bye world", "bad")]
    

    so the classes would have

    good_class = {"hello": 1, "world": 1}
    bad_class = {"bye":1, "world:1"}
    all = {"hello": 1, "world": 2, "bye":1}
    

    so now, if I want to compute the probability of a test string being good:

    test1 = ["hello", "again"]
    p_good = sum(good_class.values())/sum(all.values())
    p_hello_good = good_class["hello"]/all["hello"]
    p_again_good = 1 # because "again" doesn't exist in our training set
    
    p_test1_good = p_good * p_hello_good * p_again_good
    

This question is a bit broad, so I can only answer in a limited way:

1st: Is P(c_i) #docs in c_i / #total docs, or #words in c_i / #total words?

    P(c_i) = #docs in c_i / #total docs
    
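A minimal sketch of that prior, assuming the two-document training list from the question:

    from collections import Counter

    training = [("hello world", "good"),
                ("bye world", "bad")]

    # P(c_i): fraction of training documents labeled with class c_i
    doc_counts = Counter(label for _, label in training)
    priors = {c: n / len(training) for c, n in doc_counts.items()}
    print(priors)  # {'good': 0.5, 'bad': 0.5}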

2nd: Should P(x_j|c_i) be #times x_j appears in D / #times x_j appears in c_i?

As @larsmans pointed out: no. It is the number of occurrences of the word across the documents of that class, divided by the total number of words in that class over the whole dataset:

    P(x_j|c_i) = #times x_j appears in c_i / #total words in c_i
    
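A sketch of that likelihood on the same toy data; `word_counts` here is a hypothetical per-class word-frequency table built from the training list:

    from collections import Counter

    training = [("hello world", "good"),
                ("bye world", "bad")]

    # class -> Counter of word frequencies across all documents of that class
    word_counts = {}
    for text, label in training:
        word_counts.setdefault(label, Counter()).update(text.split())

    def likelihood(word, c):
        # P(x_j|c_i) = occurrences of x_j in class c_i / total words in c_i
        return word_counts[c][word] / sum(word_counts[c].values())

    print(likelihood("hello", "good"))  # 1/2 = 0.5

Note that this still gives likelihood("bye", "good") == 0, which is exactly why the third question matters.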

3rd: Suppose an x_j doesn't exist in the training set; do I give it a probability of 1 so that it doesn't alter the calculations?

No. For that we have the Laplace correction (additive smoothing), applied as

    P(x_j|c_i) = (#times x_j appears in c_i + 1) / (#total words in c_i + |V|)

where |V| is the vocabulary size; this neutralizes the effect of features that never occur in a class.
    
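Putting all three answers together, a hedged end-to-end sketch (the class names and data are the toy example from the question; log probabilities are one common way to avoid underflow on longer documents):

    import math
    from collections import Counter

    training = [("hello world", "good"),
                ("bye world", "bad")]

    doc_counts = Counter()  # class -> number of documents
    word_counts = {}        # class -> Counter of word frequencies
    vocab = set()
    for text, label in training:
        words = text.split()
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)

    def log_score(doc, c):
        # log P(c) plus the sum of log P(x_j|c), with Laplace smoothing:
        # P(x_j|c) = (count of x_j in c + 1) / (total words in c + |V|)
        score = math.log(doc_counts[c] / sum(doc_counts.values()))
        denom = sum(word_counts[c].values()) + len(vocab)
        for w in doc:
            score += math.log((word_counts[c][w] + 1) / denom)
        return score

    test1 = ["hello", "again"]
    print(max(doc_counts, key=lambda c: log_score(test1, c)))  # good

The unseen word "again" gets (0 + 1) / (2 + 3) = 0.2 under both classes here, so it neither zeroes out the product nor is artificially pinned to 1.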