Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow

Reading notes (updated slowly)

Part 1. The Fundamentals of Machine Learning

The Content of The Machine Learning Landscape

Part 1. The Fundamentals (fundament: n. foundation; buttocks) of Machine Learning
1. The Machine Learning Landscape (landscape: n. scenery; overall situation; v. to landscape)
  What Is Machine Learning?
  Why Use Machine Learning?
  Types of Machine Learning Systems
    Supervised/Unsupervised (supervise: v. to oversee) Learning
    Batch (n. a group handled together; v. to process in groups) and Online Learning
    Instance-Based Versus (prep. as opposed to) Model-Based Learning
  Main Challenges of Machine Learning
    Insufficient (sufficient: adj. enough) Quantity (n. amount; large number) of Training Data
    Nonrepresentative (represent: v. to stand for) Training Data
    Poor-Quality Data
    Irrelevant (relevant: adj. related; pertinent; appropriate) Features
    Overfitting (overfit: v. to fit too closely) the Training Data
    Underfitting (underfit: v. to fit too loosely) the Training Data
    Stepping (step: n. pace; stair; stage; measure; v. to walk a short distance) Back (roughly "taking a step back"?)
  Testing and Validating (validate: v. to approve; to confirm as valid)
    Hyperparameter (parameter: n. limit; variable) Tuning (tune: n. melody; v. to adjust) and Model Selection
    Data Mismatch (match: n. contest; counterpart; v. to equal; to pair)
  Exercises

The Machine Learning Landscape

With Early Release ebooks (n. electronic books), you get books in their earliest form, the author’s raw and unedited content as he or she writes, so you can take advantage of (to make use of) these technologies long before the official release of these titles. The following will be Chapter 1 in the final release of the book.

When most people hear “Machine Learning,” they picture (n. image; v. to imagine) a robot: a dependable butler (n. head household servant) or a deadly Terminator, depending on who you ask. But Machine Learning is not just a futuristic (adj. belonging to an imagined future) fantasy; it’s already here. In fact, it has been around for decades in some specialized applications (n. uses; programs; formal requests), such as Optical Character Recognition (OCR). But the first ML application that really became mainstream (n./adj. the prevailing current), improving the lives of hundreds of millions of people, took over (take over: to take control of) the world back in the 1990s: it was the spam filter (spam: n. junk email; v. to mass-mail; filter: n. something that screens things out; v. to screen). Not exactly a self-aware (adj. conscious of oneself) Skynet (the fictional AI in the Terminator films), but it does technically qualify as Machine Learning (it has actually learned so well that you seldom need to flag an email as spam anymore). It was followed by (followed by: succeeded by) hundreds of ML applications that now quietly power (v. to drive) hundreds of products and features that you use regularly (regular: adj. usual; n. frequent customer), from better recommendations (recommend: v. to suggest; to advise) to voice search.

Where does Machine Learning start and where does it end? What exactly does it mean for a machine to learn something? If I download a copy of Wikipedia, has my computer really “learned” something? Is it suddenly smarter? In this chapter we will start by clarifying (clarify: v. to make clear) what Machine Learning is and why you may want to use it.

Then, before we set out (to depart) to explore the Machine Learning continent (n. large landmass), we will take a look at the map and learn about the main regions (n. areas; domains) and the most notable (adj. noteworthy; n. prominent person) landmarks (n. prominent features; milestones; turning points): supervised versus (prep. as opposed to) unsupervised learning, online versus batch learning, instance-based versus model-based learning. Then we will look at the workflow (n. sequence of work steps) of a typical ML project, discuss the main challenges you may face, and cover how to evaluate and fine-tune (v. to make small adjustments to) a Machine Learning system.

This chapter introduces a lot of fundamental concepts (and jargon (n. specialized terminology)) that every data scientist should know by heart. It will be a high-level overview (n. summary) (the only chapter without much code), all rather simple, but you should make sure everything is crystal-clear (adj. perfectly clear; crystal: n. a clear mineral) to you before continuing to the rest (n. remainder; n./v. pause) of the book. So grab (v. to seize) a coffee and let’s get started!

If you already know all the Machine Learning basics, you may want to skip (v. to pass over; n. a hop) directly (direct: adj. straight; frank; v. to guide; to instruct) to Chapter 2. If you are not sure, try to answer all the questions listed at the end of the chapter before moving on.

What Is Machine Learning?

Machine Learning is the science (and art) of programming computers so they can learn from data.

Here is a slightly (slight: adj. small in degree; v./n. snub) more general (adj. broad; common; approximate) definition (define: v. to state the meaning of):

[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed. (Arthur Samuel, 1959)

And a more engineering-oriented (orient: v. to direct toward; to find one's bearings) one:

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. (Tom Mitchell, 1997)

For example, your spam filter is a Machine Learning program that can learn to flag spam given examples of spam emails (e.g., flagged by users) and examples of regular (nonspam, also called “ham”) emails. The examples that the system uses to learn are called the training set. Each training example is called a training instance (or sample). In this case, the task T is to flag spam for new emails, the experience E is the training data, and the performance measure P needs to be defined; for example, you can use the ratio (n. proportion) of correctly classified emails. This particular (adj. specific; n. detail) performance (n. how well something works; a show) measure (n. metric; step taken; v. to gauge) is called accuracy (n. correctness) and it is often used in classification tasks.
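To make the performance measure P concrete, here is a minimal sketch of accuracy computed as the ratio of correctly classified emails. The function name `accuracy` and the label lists are made up for illustration; they are not from the book.

```python
def accuracy(predicted_labels, true_labels):
    """Return the fraction of emails whose predicted label matches the true label."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("need at least one labeled email")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


# Hypothetical labels for five emails: True means "spam", False means "ham".
true_labels = [True, False, True, False, False]
predicted_labels = [True, False, False, False, False]

print(accuracy(predicted_labels, true_labels))  # 4 of 5 correct -> 0.8
```

In Mitchell's terms, a filter "learns" if this number goes up on new emails as it sees more training data.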

If you just download a copy of Wikipedia, your computer has a lot more data, but it is not suddenly better at any task. Thus, it is not Machine Learning.

Why Use Machine Learning?

Consider how you would write a spam filter using traditional programming techniques (Figure (n. diagram; number; shape; v. to estimate) 1-1):

1. First you would look at what spam typically looks like. You might notice that some words or phrases (phrase: n. group of words; idiom; v. to word) (such as “4U,” “credit (n. borrowing; trust; praise; v. to add to an account) card,” “free,” and “amazing”) tend to come up (to appear) a lot in the subject (n. topic; field of study; v. to subjugate). Perhaps you would also notice a few other patterns (pattern: n. regularity; model; v. to decorate; to imitate) in the sender’s (sender: n. person who sends) name, the email’s body (the main text of the email), and so on.

2. You would write a detection algorithm (detect: v. to notice; to identify) for each of the patterns that you noticed, and your program would flag emails as spam if a number of these patterns are detected.

3. You would test your program, and repeat steps 1 and 2 until it is good enough.

Since the problem is not trivial, your program will likely become a long list of complex rules that is pretty hard to maintain.
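The traditional approach described above can be sketched as a handful of hand-written rules. This is only an illustration: the pattern list, the `count_matches` and `is_spam` helpers, and the threshold of two matches are assumptions, not the book's code.

```python
import re

# Hand-written patterns, one per observation from step 1 (illustrative only).
SUBJECT_PATTERNS = [
    re.compile(r"\b4U\b", re.IGNORECASE),
    re.compile(r"\bcredit card\b", re.IGNORECASE),
    re.compile(r"\bfree\b", re.IGNORECASE),
    re.compile(r"\bamazing\b", re.IGNORECASE),
]


def count_matches(subject):
    """Count how many hand-written patterns occur in the subject line."""
    return sum(1 for pattern in SUBJECT_PATTERNS if pattern.search(subject))


def is_spam(subject, threshold=2):
    """Flag the email if at least `threshold` patterns are detected (step 2)."""
    return count_matches(subject) >= threshold


print(is_spam("Free gift 4U"))               # True: "free" and "4U" both match
print(is_spam("Meeting notes for Tuesday"))  # False: no pattern matches
print(is_spam("Free gift For U"))            # False: the "4U" rule no longer fires
```

The last call shows the maintenance problem: every new wording needs another rule added by hand.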

In contrast, a spam filter based on Machine Learning techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the ham examples (Figure 1-2). The program is much shorter, easier to maintain, and most likely more accurate.
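As a rough sketch of what "learning which words are good predictors" could look like, the code below scores each word by how much more often it appears in spam examples than in ham examples. The smoothed log-odds score is one simple choice made here for illustration (it is not necessarily what the book has in mind), and the tiny training set and the `tokenize`, `train`, and `is_spam` names are invented.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(spam_examples, ham_examples):
    """Return a per-word log-odds score; positive means 'more frequent in spam'."""
    spam_counts = Counter(w for text in spam_examples for w in set(tokenize(text)))
    ham_counts = Counter(w for text in ham_examples for w in set(tokenize(text)))
    scores = {}
    for word in set(spam_counts) | set(ham_counts):
        # Add-one smoothing so a word missing from one class never gives a zero.
        p_spam = (spam_counts[word] + 1) / (len(spam_examples) + 2)
        p_ham = (ham_counts[word] + 1) / (len(ham_examples) + 2)
        scores[word] = math.log(p_spam / p_ham)
    return scores


def is_spam(text, scores):
    # Words never seen during training contribute nothing to the total.
    return sum(scores.get(w, 0.0) for w in set(tokenize(text))) > 0


spam = ["free credit card 4u", "amazing offer 4u", "free amazing prize"]
ham = ["meeting notes for tuesday", "lunch on friday", "notes for the free workshop"]
scores = train(spam, ham)

print(scores["4u"] > 0, scores["notes"] < 0)  # True True
print(is_spam("free prize 4u", scores))       # True
print(is_spam("notes for friday", scores))    # False
```

No rule mentions "4u" explicitly; its positive score comes entirely from the examples.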

Moreover, if spammers notice that all their emails containing “4U” are blocked, they might start writing “For U” instead. A spam filter using traditional programming techniques would need to be updated to flag “For U” emails. If spammers keep working around your spam filter, you will need to keep writing new rules forever.

In contrast, a spam filter based on Machine Learning techniques automatically notices that “For U” has become unusually frequent in spam flagged by users, and it starts flagging them without your intervention (Figure 1-3).
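One way to picture this automatic adaptation is a filter that keeps running counts and updates them each time a user flags an email. The sketch below tracks two-word phrases; the `PhraseStats` class, its smoothing, and the sample emails are hypothetical and only illustrate the idea.

```python
from collections import Counter


class PhraseStats:
    """Running counts of two-word phrases in spam and ham (hypothetical helper)."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.spam_total = 0
        self.ham_total = 0

    @staticmethod
    def phrases(text):
        words = text.lower().split()
        return set(zip(words, words[1:]))

    def update(self, text, flagged_as_spam):
        """Record one email together with the label a user gave it."""
        if flagged_as_spam:
            self.spam_counts.update(self.phrases(text))
            self.spam_total += 1
        else:
            self.ham_counts.update(self.phrases(text))
            self.ham_total += 1

    def spam_ratio(self, phrase):
        """Smoothed frequency of the phrase in spam divided by its frequency in ham."""
        p_spam = (self.spam_counts[phrase] + 1) / (self.spam_total + 2)
        p_ham = (self.ham_counts[phrase] + 1) / (self.ham_total + 2)
        return p_spam / p_ham


stats = PhraseStats()
for text in ["lunch for u and me", "notes from the meeting", "see you on friday"]:
    stats.update(text, flagged_as_spam=False)
for text in ["free prize 4u", "amazing deal 4u"]:
    stats.update(text, flagged_as_spam=True)
before = stats.spam_ratio(("for", "u"))

# Spammers switch wording, and users flag the new emails.
for text in ["free prize for u", "amazing deal for u", "credit card for u"]:
    stats.update(text, flagged_as_spam=True)
after = stats.spam_ratio(("for", "u"))

print(before < 1 < after)  # True: "for u" went from ham-leaning to spam-leaning
```

Nobody wrote a "For U" rule; the ratio crossed 1 only because the flagged examples changed.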