Reading notes (updated slowly)

Contents

Part 1. The Fundamentals of Machine Learning

The Content of The Machine Learning Landscape

The Machine Learning Landscape


Part 1. The Fundamentals (fundament n. foundation; buttocks) of Machine Learning
1. The Machine Learning Landscape (n. scenery; general situation v. to landscape)
What Is Machine Learning?
Why Use Machine Learning?
Types of Machine Learning Systems
  Supervised/Unsupervised (supervise v. to oversee) Learning
  Batch (n. a group of items v. to process in groups) and Online Learning
  Instance-Based Versus (prep. as opposed to) Model-Based Learning
Main Challenges of Machine Learning
  Insufficient (sufficient a. enough) Quantity (n. amount; large number) of Training Data
  Nonrepresentative (represent v. to stand for) Training Data
  Poor-Quality Data
  Irrelevant (relevant a. related; pertinent; appropriate; valuable) Features
  Overfitting (overfit v. to fit the training data too closely) the Training Data
  Underfitting (underfit v. to fit the training data too loosely) the Training Data
  Stepping (step n. pace; footstep; stair; measure; stage v. to walk a short distance) Back
Testing and Validating (validate v. to approve; to confirm; to verify as valid)
  Hyperparameter (parameter n. limit; range; variable) Tuning (tune n. melody; song v. to adjust) and Model Selection
  Data Mismatch (match n. contest; opponent; spouse; marriage v. to equal; to pair)
Exercises

The Machine Learning Landscape

  With Early Release ebooks (ebook n. electronic book), you get books in their earliest form (the author’s raw and unedited content as he or she writes) so you can take advantage of (to make use of) these technologies long before the official release of these titles. The following will be Chapter 1 in the final release of the book.

  When most people hear “Machine Learning,” they picture (n. image; painting; photo; portrait v. to imagine; to paint; to photograph) a robot: a dependable butler (n. head servant of a household) or a deadly Terminator, depending on who you ask. But Machine Learning is not just a futuristic (a. relating to the future) fantasy; it’s already here. In fact, it has been around for decades in some specialized applications (application n. formal request; use; program), such as Optical Character Recognition (OCR). But the first ML application that really became mainstream (n. the dominant trend a. widely accepted v. to bring into the mainstream), improving the lives of hundreds of millions of people, took over (take over: to assume control of) the world back in the 1990s: it was the spam filter (spam n. unsolicited bulk email v. to send such email in bulk; filter n. device or program that screens things out v. to screen; to seep through). Not exactly a self-aware (a. conscious of oneself) Skynet, but it does technically qualify as Machine Learning (it has actually learned so well that you seldom need to flag an email as spam anymore). It was followed by (followed by: with something coming next) hundreds of ML applications that now quietly power (v. to drive) hundreds of products and features that you use regularly (regular a. routine n. frequent customer), from better recommendations (recommend v. to advise; to suggest; to endorse) to voice search.

  Where does Machine Learning start and where does it end? What exactly does it mean for a machine to learn something? If I download a copy of Wikipedia, has my computer really “learned” something? Is it suddenly smarter? In this chapter we will start by clarifying (clarify v. to make clear; to explain) what Machine Learning is and why you may want to use it.

  Then, before we set out (to start a journey) to explore the Machine Learning continent (n. large landmass), we will take a look at the map and learn about the main regions (region n. area; district; field; part of the body) and the most notable (a. prominent; noteworthy n. prominent person) landmarks (landmark n. recognizable feature; milestone; turning point): supervised versus (prep. as opposed to; against) unsupervised learning, online versus batch learning, instance-based versus model-based learning. Then we will look at the workflow (n. sequence of work steps) of a typical ML project, discuss the main challenges you may face, and cover how to evaluate and fine-tune (v. to make small adjustments to) a Machine Learning system.

  This chapter introduces a lot of fundamental concepts (and jargon (n. specialized terminology)) that every data scientist should know by heart. It will be a high-level overview (n./v. summary) (the only chapter without much code), all rather simple, but you should make sure everything is crystal-clear (a. completely clear; crystal n. crystalline solid; fine glass a. clear and transparent) to you before continuing to the rest (n./v. repose n. remainder) of the book. So grab (n./v. to seize) a coffee and let’s get started!

  If you already know all the Machine Learning basics, you may want to skip (v. to pass over n. a hop) directly (direct a. straight; immediate; frank v. to give directions; to guide; to direct a film; to instruct; to order) to Chapter 2. If you are not sure, try to answer all the questions listed at the end of the chapter before moving on.

What Is Machine Learning? 

  Machine Learning is the science (and art) of programming computers so they can learn from data.

Here is a slightly (slight a. small in degree v. to snub n. a snub) more general (a. widespread; ordinary; usual; approximate) definition (define v. to state the meaning of; to explain):

  [Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed. —Arthur Samuel, 1959

And a more engineering-oriented (orient v. to face; to find one's bearings) one:

  A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. (Tom Mitchell, 1997)

  For example, your spam filter is a Machine Learning program that can learn to flag spam given examples of spam emails (e.g., flagged by users) and examples of regular (nonspam, also called “ham”) emails. The examples that the system uses to learn are called the training set. Each training example is called a training instance (or sample). In this case, the task T is to flag spam for new emails, the experience E is the training data, and the performance measure P needs to be defined; for example, you can use the ratio (n. proportion) of correctly classified emails. This particular (a. specific n. detail) performance (n. show; how well something works a. high-performance) measure (n. step taken; method; unit v. to gauge; to assess; to record) is called accuracy (n. correctness; precision) and it is often used in classification tasks.
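
  The performance measure P described above can be written down in a few lines. This is only a minimal sketch: the function name `accuracy` and the five example labels are made up for illustration and do not come from the book.

```python
def accuracy(predicted, actual):
    """Return the fraction of emails whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for five emails: True means "spam", False means "ham".
actual_labels = [True, False, True, False, False]
predicted_labels = [True, False, False, False, False]

# Four of the five predictions match the true labels.
print(accuracy(predicted_labels, actual_labels))  # 0.8
```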

  If you just download a copy of Wikipedia, your computer has a lot more data, but it is not suddenly better at any task. Thus, it is not Machine Learning.

Why Use Machine Learning?

  Consider how you would write a spam filter using traditional programming techniques (Figure 1-1; figure n. number; body shape; diagram; price v. to estimate; to understand; to calculate):

1. First you would look at what spam typically looks like. You might notice that some words or phrases (phrase n. group of words; idiom v. to express in a particular way) (such as “4U,” “credit card,” “free,” and “amazing”) tend to come up (to approach; to appear; to arrive) a lot in the subject (n. topic; field of study; course v. to subjugate a. subordinate) line. (credit n. trust; loan; praise; reputation; balance; subsidy; course unit v. to deposit into an account; to believe.) Perhaps you would also notice a few other patterns (pattern n. model; sample; design v. to decorate with a design; to imitate) in the sender’s (sender n. person who sends) name, the email’s body, and so on.

2. You would write a detection algorithm (detect v. to notice; to discover; to identify) for each of the patterns that you noticed, and your program would flag emails as spam if a number of these patterns are detected.

3. You would test your program, and repeat steps 1 and 2 until it is good enough.

  Since the problem is not trivial, your program will likely become a long list of complex rules, pretty hard to maintain.
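
  To make the traditional approach concrete, here is a rough sketch of what such a hand-written rule list might look like. The pattern list, the function names, and the threshold of two matches are all assumptions chosen for illustration; the book does not prescribe them.

```python
import re

# One hand-written pattern per observation from step 1 (illustrative only).
SUBJECT_PATTERNS = [
    re.compile(r"\b4U\b", re.IGNORECASE),
    re.compile(r"\bcredit card\b", re.IGNORECASE),
    re.compile(r"\bfree\b", re.IGNORECASE),
    re.compile(r"\bamazing\b", re.IGNORECASE),
]


def count_matches(subject):
    """Count how many of the hand-written patterns occur in the subject line."""
    return sum(1 for pattern in SUBJECT_PATTERNS if pattern.search(subject))


def is_spam(subject, threshold=2):
    """Flag the email when at least `threshold` patterns are detected."""
    return count_matches(subject) >= threshold


print(is_spam("Amazing FREE credit card 4U"))  # True
print(is_spam("Meeting notes for Tuesday"))    # False
```

  Every new observation means another entry in the pattern list and probably another round of threshold tuning, which is how such a program grows into the long rule list described above.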

  In contrast, a spam filter based on Machine Learning techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the ham examples (Figure 1-2). The program is much shorter, easier to maintain, and most likely more accurate.
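
  One very simple way to picture "detecting unusually frequent words" is to compare how often each word appears in spam versus ham. The sketch below is not the method used in the book or by any real spam filter; the function names, the add-one smoothing, the `min_ratio` value, and the tiny example emails are all assumptions made for illustration.

```python
from collections import Counter


def word_document_counts(emails):
    """For each word, count the number of emails that contain it at least once."""
    counts = Counter()
    for text in emails:
        counts.update(set(text.lower().split()))
    return counts


def learn_spam_words(spam_emails, ham_emails, min_ratio=2.5):
    """Return the words that are much more frequent in spam than in ham."""
    spam_counts = word_document_counts(spam_emails)
    ham_counts = word_document_counts(ham_emails)
    learned = set()
    for word, spam_count in spam_counts.items():
        # Add-one smoothing keeps the ratio finite for words never seen in ham.
        spam_rate = (spam_count + 1) / (len(spam_emails) + 2)
        ham_rate = (ham_counts[word] + 1) / (len(ham_emails) + 2)
        if spam_rate / ham_rate >= min_ratio:
            learned.add(word)
    return learned


# Hypothetical training set.
spam_examples = ["amazing offer 4u", "free credit card 4u", "amazing free prize"]
ham_examples = ["meeting notes for tuesday", "free for lunch tomorrow", "your invoice for march"]

spam_words = learn_spam_words(spam_examples, ham_examples)
print(sorted(spam_words))  # ['4u', 'amazing']
```

  Note that "free" is not selected in this toy data set because it also shows up in a ham email. The predictors come from the examples, not from a hand-written list.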

  Moreover, if spammers notice that all their emails containing “4U” are blocked, they might start writing “For U” instead. A spam filter using traditional programming techniques would need to be updated to flag “For U” emails. If spammers keep working around your spam filter, you will need to keep writing new rules forever.

  In contrast, a spam filter based on Machine Learning techniques automatically notices that “For U” has become unusually frequent in spam flagged by users, and it starts flagging them without your intervention (Figure 1-3).
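
  The adaptation described here can be pictured as recomputing the same statistic whenever users flag new spam. The following is a toy sketch under stated assumptions: the helper names, the naive substring matching, the `min_gap` threshold, and the example emails are all hypothetical.

```python
def phrase_rate(phrase, emails):
    """Fraction of emails containing the phrase (naive, case-insensitive substring match)."""
    if not emails:
        return 0.0
    needle = phrase.lower()
    return sum(needle in text.lower() for text in emails) / len(emails)


def is_spam_indicator(phrase, spam_emails, ham_emails, min_gap=0.3):
    """Treat a phrase as a spam indicator when it is much more common in spam than in ham."""
    gap = phrase_rate(phrase, spam_emails) - phrase_rate(phrase, ham_emails)
    return gap >= min_gap


# Hypothetical data: emails flagged as spam by users, and regular emails.
flagged_spam = ["amazing deal 4u", "free prize 4u", "credit card offer 4u"]
ham = ["notes for tuesday", "lunch tomorrow", "invoice for march"]

before = is_spam_indicator("for u", flagged_spam, ham)
print(before)  # False: no flagged spam contains "for u" yet

# Spammers switch wording, and users flag the new emails.
flagged_spam += ["amazing deal for u", "free prize for u", "credit card for u"]

after = is_spam_indicator("for u", flagged_spam, ham)
print(after)  # True: the phrase is now frequent in flagged spam
```

  Nobody edited a rule between the two checks; only the flagged examples changed.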

Original article: https://blog.csdn.net/naozibuok/article/details/134644668
