$$\frac{1}{\sqrt{2\pi}\,\sigma_{c,i}}\exp\!\left(-\frac{(x_i-\mu_{c,i})^2}{2\sigma_{c,i}^2}\right)\tag{6}$$
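The Gaussian class-conditional density of Eq. (6) is easy to evaluate directly. The sketch below is illustrative; the sample values are not taken from the text.

```python
import math

def gaussian_likelihood(x, mu, sigma):
    """Class-conditional density of a continuous attribute x under a
    Gaussian with class mean mu and standard deviation sigma (Eq. 6)."""
    coef = 1.0 / (math.sqrt(2 * math.pi) * sigma)
    return coef * math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# Illustrative values only (not from the text): x = 1.7, mu = 1.6, sigma = 0.1
print(round(gaussian_likelihood(1.7, 1.6, 0.1), 4))  # 2.4197
```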
For the sample Z = (Height = Short, Hair = blond, Eye = brown), first compute the class priors: P(C1) = 5/9 and P(C2) = 4/9. For the attribute Height, P(Height = Short | C1) = 2/5 and P(Height = Short | C2) = 1/4; for the attribute Hair, P(Hair = blond | C1) = 2/5 and P(Hair = blond | C2) = 1/2; for the attribute Eye, P(Eye = brown | C1) = 3/5 and P(Eye = brown | C2) = 0.

Therefore (dropping the normalizer P(Z), which is common to both classes):

$$P(C_1 \mid Z) \propto P(C_1)\,P(\text{Height=Short} \mid C_1)\,P(\text{Hair=blond} \mid C_1)\,P(\text{Eye=brown} \mid C_1) = \tfrac{5}{9}\cdot\tfrac{2}{5}\cdot\tfrac{2}{5}\cdot\tfrac{3}{5} \approx 0.0533$$

$$P(C_2 \mid Z) \propto P(C_2)\,P(\text{Height=Short} \mid C_2)\,P(\text{Hair=blond} \mid C_2)\,P(\text{Eye=brown} \mid C_2) = \tfrac{4}{9}\cdot\tfrac{1}{4}\cdot\tfrac{1}{2}\cdot 0 = 0$$

Without smoothing, the zero conditional probability P(Eye = brown | C2) = 0 drives P(C2 | Z) to 0, so the sample Z is classified as C1.
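The hand computation above can be reproduced in a few lines. The counts are taken directly from the worked example; exact fractions avoid rounding until the end.

```python
from fractions import Fraction as F

# Priors and per-attribute conditional probabilities from the worked example.
priors = {"C1": F(5, 9), "C2": F(4, 9)}
cond = {
    "C1": {"Height=Short": F(2, 5), "Hair=blond": F(2, 5), "Eye=brown": F(3, 5)},
    "C2": {"Height=Short": F(1, 4), "Hair=blond": F(1, 2), "Eye=brown": F(0, 1)},
}

def naive_bayes_score(cls, sample):
    """Unnormalized posterior: P(C) * prod_i P(x_i | C)."""
    score = priors[cls]
    for attr in sample:
        score *= cond[cls][attr]
    return score

z = ["Height=Short", "Hair=blond", "Eye=brown"]
print(float(naive_bayes_score("C1", z)))  # 0.05333...
print(float(naive_bayes_score("C2", z)))  # 0.0
```

Because any single zero factor annihilates the whole product, practical implementations apply Laplace smoothing to the conditional counts.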
Lab Part
Suppose a supermarket wants to promote pasta. Using the data in "Transactions.txt" as training data, build a decision-tree model with the C5.0 algorithm to predict whether a customer will buy pasta.
1. Build a decision tree from the dataset "Transactions.txt", using the other fields to predict the "pasta" field. With the Type node under Field Ops, set the "type" of every field except COD to "Flag", set the "type" of COD to "Typeless", and set the "direction" of the "pasta" field to "out". With the C5.0 node under Modeling, select "Expert", set "Pruning severity" to 65, and set "Minimum records per child branch" to 95.
Figure 5 is a screenshot of Clementine in use. The decision tree built from the dataset "Transactions.txt" is shown in Figure 6.

Figure 6: Decision tree
2. Use the model built above to predict, for each of the 20 customers in "rollout.txt", whether the customer will buy pasta.

Figures 7 and 8 show the data-type configuration and the prediction results for "rollout.txt", respectively.
tomato souce = 1 [ Mode: 1 ]
    tunny = 1 [ Mode: 1 ] => 1
    tunny = 0 [ Mode: 1 ]
        rice = 1 [ Mode: 1 ] => 1
        rice = 0 [ Mode: 0 ]
            brioches = 1 [ Mode: 1 ] => 1
            brioches = 0 [ Mode: 0 ]
                frozen vegetables = 1 [ Mode: 1 ] => 1
                frozen vegetables = 0 [ Mode: 0 ]
                    coffee = 1 [ Mode: 1 ] => 1
                    coffee = 0 [ Mode: 0 ] => 0
tomato souce = 0 [ Mode: 0 ]
    rice = 1 [ Mode: 0 ]
        coffee = 1 [ Mode: 1 ] => 1
        coffee = 0 [ Mode: 0 ]
            biscuits = 1 [ Mode: 1 ] => 1
            biscuits = 0 [ Mode: 0 ]
                coke = 1 [ Mode: 1 ] => 1
                coke = 0 [ Mode: 0 ] => 0
    rice = 0 [ Mode: 0 ]
        tunny = 1 [ Mode: 0 ] => 0
        tunny = 0 [ Mode: 0 ]
            oil = 1 [ Mode: 0 ] => 0
            oil = 0 [ Mode: 0 ]
                water = 1 [ Mode: 0 ] => 0
                water = 0 [ Mode: 0 ]
                    milk = 1 [ Mode: 0 ] => 0
                    milk = 0 [ Mode: 0 ]
                        yoghurt = 1 [ Mode: 0 ] => 0
                        yoghurt = 0 [ Mode: 0 ]
                            coke = 1 [ Mode: 0 ] => 0
                            coke = 0 [ Mode: 0 ]
                                biscuits = 1 [ Mode: 0 ] => 0
                                biscuits = 0 [ Mode: 0 ]
                                    brioches = 1 [ Mode: 0 ] => 0
                                    brioches = 0 [ Mode: 1 ]
                                        coffee = 1 [ Mode: 0 ] => 0
                                        coffee = 0 [ Mode: 1 ]
                                            frozen vegetables = 1 [ Mode: 0 ] => 0
                                            frozen vegetables = 0 [ Mode: 1 ]
                                                beer = 1 [ Mode: 0 ] => 0
                                                beer = 0 [ Mode: 1 ]
                                                    juices = 1 [ Mode: 0 ] => 0
                                                    juices = 0 [ Mode: 1 ]
                                                        mozzarella = 1 [ Mode: 0 ] => 0
                                                        mozzarella = 0 [ Mode: 1 ]
                                                            crackers = 1 [ Mode: 0 ] => 0
                                                            crackers = 0 [ Mode: 1 ]
                                                                frozen fish = 1 [ Mode: 0 ] => 0
                                                                frozen fish = 0 [ Mode: 1 ] => 1
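A rule set like the one above translates mechanically into nested conditionals. As a sketch, the fragment below encodes only the `tomato souce = 1` branch, with the field names kept exactly as printed in the rule browser (including the original spellings "souce" and "tunny"); the function name is mine, not Clementine's.

```python
def predict_tomato_souce_1(basket):
    """Prediction for the 'tomato souce = 1' branch of the rule set above.
    `basket` maps field names (as printed) to 0/1 flags."""
    if basket["tunny"] == 1:
        return 1
    if basket["rice"] == 1:
        return 1
    if basket["brioches"] == 1:
        return 1
    if basket["frozen vegetables"] == 1:
        return 1
    # Final split: coffee = 1 => 1, coffee = 0 => 0
    return basket["coffee"]

basket = {"tunny": 0, "rice": 0, "brioches": 0,
          "frozen vegetables": 0, "coffee": 1}
print(predict_tomato_souce_1(basket))  # 1
```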
Build a model from the labeled dataset of an online training system to predict the final-exam results of its other members. The dataset comes from the system's logs and records each member's online learning behavior. Try several different models and parameter settings to build a high-quality predictive model.
The training set contains 873 records and the test set 461. Each record has the following fields:

Person ID | Total online time (min) | Online reading time (min) | Online testing time (min) | Full-text reading count | Smart reading count | Knowledge-point reading count | Question reading count | Source-text backtrack count | Question-bank test count | Mock-exam count | Mock exams rated excellent | Mock exams rated good | Mock exams rated pass | Mock exams rated fail | Class
1. Run decision-tree classification on the training set. Set every field except "Person ID" as input; set the "direction" of "Class" to "out" and its "type" to "Flag". Customize "pruning severity" and "minimum records per child branch", and check "use global pruning".
The results for the different parameter combinations are shown in Figure 9, where PS stands for pruning severity and MRPCB for minimum records per child branch. As the figure shows, the "best" combination is PS = 5, MRPCB = 5.
Further results are shown in Figure 10 and Figure 11.
Comparing the PS = 5, MRPCB = 5 decision tree against a neural network and a logistic-regression model at their default settings, the decision tree outperforms the other two models on accuracy, recall, and precision alike; this alone, however, does not mean the decision tree is the better fit for this dataset.
REF
Original article: https://blog.csdn.net/weixin_46221946/article/details/134693284