XGBoost for multilabel classification?
Is it possible to use XGBoost for multilabel classification? Currently I use OneVsRestClassifier over GradientBoostingClassifier from sklearn. It works, but it uses only one core of my CPU. In my data I have ~45 features, and the task is to predict about 20 columns of binary (boolean) data. The metric is mean average precision (map@7). If you have a short code example, that would be great...
There are a couple of ways to do that, one of which is the one you already suggested:
1.
from xgboost import XGBClassifier
from sklearn.multiclass import OneVsRestClassifier
# If you want to avoid the OneVsRestClassifier magic switch
# from sklearn.multioutput import MultiOutputClassifier

# Example params (adjust to taste); n_jobs controls how many cores each booster uses
params = {'n_estimators': 100, 'max_depth': 4, 'n_jobs': -1}
clf_multilabel = OneVsRestClassifier(XGBClassifier(**params))
clf_multilabel will fit one binary classifier per class, and it will use however many cores you specify in params (FYI, you can also specify n_jobs in OneVsRestClassifier, but that eats up more memory).
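For instance, here is a minimal end-to-end sketch of option 1. The dataset is synthetic, generated with sklearn's make_multilabel_classification to mimic your ~45 features and 20 binary targets, so the shapes and hyperparameters are assumptions, not your actual setup:

from sklearn.datasets import make_multilabel_classification
from sklearn.multiclass import OneVsRestClassifier
from xgboost import XGBClassifier

# Synthetic stand-in for your data: 45 features, 20 binary target columns
X, Y = make_multilabel_classification(n_samples=1000, n_features=45,
                                      n_classes=20, random_state=0)

params = {'n_estimators': 100, 'max_depth': 4, 'n_jobs': -1}
clf_multilabel = OneVsRestClassifier(XGBClassifier(**params))
clf_multilabel.fit(X, Y)

# One probability column per class; threshold them, or take the top 7
# per sample if you are scoring map@7
proba = clf_multilabel.predict_proba(X)
print(proba.shape)  # (1000, 20)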
2. If you first massage your data a little by making k copies of every data point that has k correct labels, you can hack your way to a simpler multiclass problem. At that point, just
clf = XGBClassifier(**params)
clf.fit(train_data, train_labels)  # train_labels: one class label per (duplicated) row
pred_proba = clf.predict_proba(test_data)
to get classification margins/probabilities for each class and decide what threshold you want for predicting a label. Note that this solution is not exact: if a product has tags (1, 2, 3), you artificially introduce two negative samples for each class.
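A sketch of that duplication step, again on synthetic data (expand_multilabel is a hypothetical helper name, and rows with no positive labels are simply dropped):

import numpy as np
from sklearn.datasets import make_multilabel_classification
from xgboost import XGBClassifier

def expand_multilabel(X, Y):
    # Make k copies of each row that has k positive labels,
    # turning the multilabel problem into single-label multiclass.
    rows, labels = [], []
    for x, y in zip(X, Y):
        for cls in np.flatnonzero(y):  # one copy per positive label
            rows.append(x)
            labels.append(cls)
    return np.array(rows), np.array(labels)

X, Y = make_multilabel_classification(n_samples=1000, n_features=45,
                                      n_classes=20, random_state=0)
X_flat, y_flat = expand_multilabel(X, Y)

clf = XGBClassifier(n_estimators=100, max_depth=4, n_jobs=-1)
clf.fit(X_flat, y_flat)
pred_proba = clf.predict_proba(X)  # one column per class; threshold as needed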