xgb+lr
Published: 2019-06-14

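The script below follows the familiar GBDT+LR recipe: XGBoost is trained first, each sample is then described by the leaf it lands in for every tree, those leaf indices are one-hot encoded, and a logistic regression is fitted on the encoded features. As a rough illustration of the encoding step only, here is a dependency-free sketch. The helper name `one_hot_leaves` and the toy leaf matrix are invented for this example and are not part of the original script.

```python
def one_hot_leaves(leaf_matrix):
    """One-hot encode a matrix of leaf indices (rows = samples, columns = trees).

    Hypothetical helper for illustration only; it mimics what OneHotEncoder
    does with the output of bst.predict(..., pred_leaf=True).
    """
    n_trees = len(leaf_matrix[0])
    # Collect the distinct leaf indices seen in each tree, in sorted order.
    leaves_per_tree = [sorted({row[t] for row in leaf_matrix}) for t in range(n_trees)]
    # Assign one output column to every (tree, leaf) pair.
    column_of = {}
    for t, leaves in enumerate(leaves_per_tree):
        for leaf in leaves:
            column_of[(t, leaf)] = len(column_of)
    encoded = []
    for row in leaf_matrix:
        features = [0] * len(column_of)
        for t, leaf in enumerate(row):
            features[column_of[(t, leaf)]] = 1
        encoded.append(features)
    return encoded


# Toy input: 3 samples, 2 trees (the leaf indices are invented).
toy_leaves = [[3, 5], [4, 5], [3, 6]]
toy_encoded = one_hot_leaves(toy_leaves)
for row in toy_encoded:
    print(row)
```

Every row contains exactly one 1 per tree, so the logistic regression effectively learns one weight per (tree, leaf) pair.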

# -*- coding: utf-8 -*-
import joblib  # sklearn.externals.joblib was deprecated and later removed from scikit-learn
import numpy as np
import xgboost as xgb
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

from data_deal import data_deal  # the author's own data-loading module (not shown in this post)

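# Random data for a quick smoke test:
# data = np.random.rand(5000, 10)
# label = np.random.randint(2, size=5000)

# Load the feature matrix and the binary labels.
data, label = data_deal()

# Hold out 30% of the samples for testing.
x_train, x_test, y_train, y_test = train_test_split(data, label, test_size=0.3, random_state=0)

# Optional cap on the sample size (it would have to run before the split to take effect):
# data = data[:100000]
# label = label[:100000]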

# Disabled alternative: take the leaf indices from scikit-learn's GBDT instead of XGBoost.
'''
gbdt_model = GradientBoostingClassifier(n_estimators = 100)
gbdt_model.fit(data, label)
p = gbdt_model.apply(data)
'''

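# Training matrix with labels, used to fit the booster.
dtrain = xgb.DMatrix(x_train, label=y_train)
# dtest = xgb.DMatrix(data)
# The same features without labels, used to read out the leaf indices.
dtrain_x = xgb.DMatrix(x_train)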

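param = {
    'booster': 'gbtree',
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
    'max_depth': 5,
    'lambda': 10,             # L2 regularization on leaf weights
    'subsample': 0.8,         # row sampling per tree
    'colsample_bytree': 0.8,  # column sampling per tree
    'min_child_weight': 10,
    'eta': 0.1,               # learning rate
    'seed': 0,
    'nthread': 8,
    'verbosity': 0,           # replaces the deprecated 'silent': 1
}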

# Report AUC on the training set after each boosting round.
evallist = [(dtrain, 'train')]
num_round = 300
bst = xgb.train(param, dtrain, num_round, evallist)
bst.save_model('xgb_test.model')
# bst.load_model('xgb_test.model')

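# pred_leaf=True returns, for every sample, the index of the leaf it lands in
# for each tree: an array of shape (n_samples, num_round).
p = bst.predict(dtrain_x, pred_leaf=True)

# One-hot encode the leaf indices; each (tree, leaf) pair becomes one binary feature.
# handle_unknown='ignore' keeps transform() from failing on leaves never seen in training.
one_hot_encoder = OneHotEncoder(handle_unknown='ignore')
one_hot_encoder.fit(p)
joblib.dump(one_hot_encoder, "one_hot_encoder.model")
one_hot_encoder_feature = one_hot_encoder.transform(p).toarray()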

# Fit logistic regression on the one-hot leaf features.
lr_model = LogisticRegression()
lr_model.fit(one_hot_encoder_feature, y_train)
joblib.dump(lr_model, "lr_test.model")

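predict_label = lr_model.predict(one_hot_encoder_feature)
# AUC needs a ranking score, so use the positive-class probability, not the hard labels.
predict_prob = lr_model.predict_proba(one_hot_encoder_feature)[:, 1]

print('train AUC:', roc_auc_score(y_train, predict_prob))
print('train report:\n', classification_report(y_train, predict_label))
print('train confusion matrix:\n', confusion_matrix(y_train, predict_label))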

# Test-set evaluation (disabled in the original; remove the triple quotes to run it).
'''
dtest_x = xgb.DMatrix(x_test)

bst_load = xgb.Booster({'nthread': 8})  # init model
bst_load.load_model('xgb_test.model')
p = bst_load.predict(dtest_x, pred_leaf=True)

one_hot_encoder_load = joblib.load('one_hot_encoder.model')
one_hot_encoder_feature = one_hot_encoder_load.transform(p).toarray()

lr_model_load = joblib.load('lr_test.model')
predict_label = lr_model_load.predict(one_hot_encoder_feature)
predict_prob = lr_model_load.predict_proba(one_hot_encoder_feature)[:, 1]

print('test AUC:', roc_auc_score(y_test, predict_prob))
print('test report:\n', classification_report(y_test, predict_label))
print('test confusion matrix:\n', confusion_matrix(y_test, predict_label))
'''
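A note on reading the AUC numbers: AUC measures how well a score ranks positives above negatives, so it is normally computed from predicted probabilities. Hard 0/1 labels collapse the ranking into two levels and usually give a different value. The sketch below is not from the original post; it computes AUC by pair counting in plain Python on invented toy values, and the function name `pairwise_auc` is hypothetical.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative counts as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented toy values: true labels and predicted positive-class probabilities.
y_true = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.45, 0.8]
# Thresholding at 0.5 turns the probabilities into hard labels: [0, 0, 0, 1].
labels = [1 if p >= 0.5 else 0 for p in probs]

auc_from_probs = pairwise_auc(y_true, probs)    # 1.0: every positive outranks every negative
auc_from_labels = pairwise_auc(y_true, labels)  # 0.75: two of the four pairs become ties
print(auc_from_probs, auc_from_labels)
```

The probabilities rank the toy samples perfectly, but thresholding hides that, which is why `predict_proba` output is the more informative input for `roc_auc_score`.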

 

Reposted from: https://www.cnblogs.com/kayy/p/9429477.html
