20180702 MARTES

Mathematics

I use Online LaTeX Equation Editor to write formulas in LATEX.

结构风险定义

\[R_{srm}(f)=\frac{1}{N}\sum_{i=1}^{N}L(y_{i},f(x_{i}))+\lambda J(f)\]

训练数据集的平均损失

\[R_{emp}(\widehat{f}) = \frac{1}{N}\sum_{i=1}^{N}L(y_{i},\widehat{f}(x_{i}))\]

测试误差

\[e_{test} = \frac{1}{N^{'}}\sum_{i=1}^{N^{'}}L(y_{i},\widehat{f}(x_{i}))\]

公司安排学习机器学习的知识。这周开始，学习《统计学习方法》。这里记录一下学习中的公式。

GitHub 对 AsciiDoctor 的支持不好，这块的公式只能显示出原始内容。

Algorithm

Median of Two Sorted Arrays - LeetCode

There are two sorted arrays nums1 and nums2 of size m and n respectively.

Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n)).

Unresolved include directive in modules/ROOT/pages/20180702.adoc - include::https://raw.githubusercontent.com/diguage/leetcode/master/src/main/java/com/diguage/algorithm/leetcode/MedianOfTwoSortedArrays.java[]

Review

I read the Raft paper this week. My notes on the paper is followings:

Abstract

Raft is a consensus algorithm for managing a replicated log.

What did Raft solve? How to use it?

Introduction

Inder to improve understandability, Raft decompose some component and state space reduction.

leader election
log replication
safety
Cluster membership changes
Log compaction

Raft has several novel features:

Strong leader
Leader election
Membership changes

Replicated state machines

Consensus algorithms typically arise in the context of replicated state machines.

Replicated state machines are typically implemented using a replicated log.

Keeping the replicated log consistent is the job of the consensus algorithm.

Consensus algorithms for practical systems typically have the following properties:

They ensure safety (never returning an incorrect result) under all non-Byzantine conditions, including network delays, partitions, and packet loss, duplication, and reordering.
They are fully functional (available) as long as any majority of the servers are operational and can communicate with each other and with clients. Thus, a typical cluster of five servers can tolerate the failure of any two servers. Servers are assumed to fail by stopping; they may later recover from state on stable storage and rejoin the cluster.
They do not depend on timing to ensure the consistency of the logs: faulty clocks and extreme message delays can, at worst, cause availability problems.
In the common case, a command can complete as soon as a majority of the cluster has responded to a single round of remote procedure calls; a minority of slow servers need not impact overall system performance.

What’s wrong with Paxos?

Paxos has two significant drawbacks:

The first drawback is that Paxos is exceptionally difficult to understand.
The second problem with Paxos is that it does not provide a good foundation for building practical implementations.
1. One reason is that there is no widely agreed-upon algorithm for multi-Paxos.
2. Furthermore, the Paxos architecture is a poor one for building practical systems; this is another consequence of the single-decree decomposition.
3. Another problem is that Paxos uses a symmetric peer-to-peer approach at its core (though it eventually suggests a weak form of leadership as a performance optimization).

As a result, practical systems bear little resemblance to Paxos.

Designing for understandability

Designing goals of Raft:

It must provide a complete and practical foundation for system building;
it must be safe under all conditions and available under typical operating conditions;
it must be efficient for common operations.
Our most important goal—and most difficult challenge—was understandability.
It must be possible to develop intuitions about the algorithm.

We used two techniques that are generally applicable.

The first technique is the well-known approach of problem decomposition: wherever possible, we divided problems into separate pieces that could be solved, explained, and understood relatively independently.
Our second approach was to simplify the state space by reducing the number of states to consider, making the system more coherent and eliminating nondeterminism where possible.

The Raft consensus algorithm

Raft basics

Raft decomposes the con- sensus problem into three relatively independent subproblems:

Leader election: a new leader must be chosen when an existing leader fails.
Log replication: the leader must accept log entries from clients and replicate them across the cluster, forcing the other logs to agree with its own.
Safety

Three states：

Leader — The leader handles all client requests (if a client contacts a follower, the follower redirects it to the leader).
Follower — They issue no requests on their own but simply respond to requests from leaders and candidates.

Candidate — It is used to elect a new leader.

Terms act as a logical clock in Raft, and they allow servers to detect obsolete information such as stale leaders.

If one server’s current term is smaller than the other’s, then it updates its current term to the larger value.

How to update?

Raft servers communicate using remote procedure calls (RPCs), and the basic consensus algorithm requires only two types of RPCs.

RequestVote RPCs are initiated by candidates during elections.
AppendEntries RPCs are initiated by leaders to replicate log entries and to provide a form of heartbeat.

Leader election

Two type timeout:

election timeout .

A candidate wins an election if it receives votes from a majority of the servers in the full cluster for the same term. Each server will vote for at most one candidate in a given term, on a first-come-first-served basis.

If the leader’s term (included in its RPC) is at least as large as the candidate’s current term, then the candidate recognizes the leader as legitimate and returns to follower state.

Rejects the RPCs that the term in is smaller than the current term.

Raft uses randomized election timeouts to ensure that split votes are rare and that they are resolved quickly.

Log replication

Term

the term number
the integer idex

Raft guarantees that committed entries are durable and will eventually be executed by all of the available state machines.

How to define the committed entry?

We designed the Raft log mechanism to maintain a high level of coherency between the logs on different servers.

If two entries in different logs have the same index and term, then they store the same command.
If two entries in different logs have the same index and term, then the logs are identical in all preceding entries.

The first property follows from the fact that a leader creates at most one entry with a given log index in a given term, and log entries never change their position in the log.

The second property is guaranteed by a simple consistency check performed by AppendEntries.

In Raft, the leader handles inconsistencies by forcing the followers’ logs to duplicate its own.

Find the latest log entry where the two logs agree, delete any entries in the follower’s log after that point, and send the follower all of the leader’s entries after that point.

Removes any conflicting entries in the follower’s log.

In the normal case a new entry can be replicated with a single round of RPCs to a majority of the cluster.

Why can a new entry be replicated with a single round of RPCs?

Safety

Election restriction

The RequestVote RPC implements this restriction: the RPC includes information about the candidate’s log, and the voter denies its vote if its own log is more up-to-date than that of the candidate.

Raft determines which of two logs is more up-to-date by comparing the index and term of the last entries in the logs.

I do not understand "Committing entries from previous terms" and "Safety argument" sections.

Follower and candidate crashes

Timing and availability

Leader election is the aspect of Raft where timing is most critical.

References

Tip

Tip sharing: writing JavaDoc in AsciiDoctor.

It is easy to generate JavaDoc with Maven. You can write in AsciiDoctor, then use the plugin to generate JavaDoc.

See configuaration below:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-javadoc-plugin</artifactId>
    <version>3.0.1</version>
    <configuration>
        <source>1.8</source>
        <doclet>org.asciidoctor.Asciidoclet</doclet>
        <docletArtifact>
            <groupId>org.asciidoctor</groupId>
            <artifactId>asciidoclet</artifactId>
            <version>1.5.4</version>
        </docletArtifact>
        <!--<overview>src/main/java/overview.adoc</overview>-->
        <additionalparam>
            --base-dir ${project.basedir}
            --attribute "name=${project.name}"
            --attribute "version=${project.version}"
            --attribute "title-link=http://example.com[${project.name} ${project.version}]"
        </additionalparam>
    </configuration>
</plugin>

If you use IntelliJ IDEA, it will add <p> on JavaDoc empty lines by default. You should close it. Here is how:

Preferences > Editor > Code Style > Java > JavaDoc > 'Generate "<p>" on empty lines'

Uncheck the option then it is fine.

English

First, I keep on reciting more than 50 English sentences every day, including 5 new sentences and reviewing 48 sentences.

During reading the Raft paper, I chose 52 words which I had not known before.

attendee
authority
challenge
compound
condense
consult
contradiction
decree
demonstrate
dense
densely
derive
duplicate
durable
elapse
establish
exacerbate
extraneous
facilitate
fashion
fraction
grant
hypothesize
identical
indefinite
indefinitely
induction
inevitable
inevitably
intervene
intuition
legitimate
magnitude
meld
notorious
notoriously
obsolete
opaque
overall
precede
preceding
prone
randomized
recipient
restriction
revert
sketch
steady
struggle
subtle
superior
survey

一个读友向我询问：该如何学习 Java？

经沟通发现，群友自己在国外读研，迫于生计，从其他专业转行自学 Java。目前是学习 Java 刚刚入门。由于学校课程繁多，只能自学 IT，距离毕业不到两年时间。希望能在毕业之际，找到一个可以安身立命的工作。

鉴于此，考虑到前端知识点不多，学习任务相对较少，而且刷新一下浏览器就能里面看到效果，获得学习反馈。我个人建议学习前端知识：HTML、CSS、JavaScript 为本，后期可以把 HTTP 和算法补起来。推荐了几本经典书籍如下：

HTML & CSS设计与构建网站 (豆瓣) — 入门，培养兴趣。
JavaScript高级程序设计（第3版） (豆瓣) — JavaScript 入门，重点看三四五六七八、十、十三、十四章。三遍以上。
JavaScript DOM编程艺术（第2版） (豆瓣) — 学习 JavaScript 主要作用 DOM 编程。照着撸一遍代码。
ECMAScript 6 入门 — 现在已经是 ES6 的天下了，不学没法出门。
CSS 实战手册（第四版） (豆瓣) — 两本 CSS 入门加全面学习。
CSS揭秘 (豆瓣)
你不知道的JavaScript（上卷） (豆瓣) — 一套很经典的 JavaScript 经典书籍。
你不知道的JavaScript（中卷） (豆瓣)
你不知道的JavaScript（下卷） (豆瓣)
HTML5秘籍（第2版） (豆瓣) — HTML 全面介绍。

我还推荐了 MDN： MDN Web Docs。

推荐了一些工具网站：

CodePen - Front End Developer Playground & Code Editor in the Browser — 可以查询某些经典的组件实现。
神器——Chrome开发者工具(一)
谷歌浏览器开发工具综述 - Chrome 开发工具指南
概述 · Chrome 开发者工具中文文档
Chrome开发者工具的小技巧

明确地定义了一个目标：半年后模仿 CSS Zen Garden 来自己实现一套效果样式。

聊到 ARTS 任务，我个人认为 ARTS 的目的是坚持学习。工作经验丰富的人，可以分享高大上的东西；但是，这对于刚刚入门的人来说，几乎是不可能的任务。所以，应该根据自身的情况，来调整。对于入门的小伙伴来说，分享自己的学习笔记也是一种不错的选择。

从实用主义的角度来考虑，我建议这位读友先入门，学习完基本知识，以能让自己找到工作为目标来调整自己的 ARTS。

另外，需要强调：不要降低对自己的标准。如果没有完成 ARTS 任务，那么就需要反思，节省下来的时间干嘛去了？有没有用来学习？

后续，我又做了一些探索。

通过 Google Trends 可以看到不同前端框架在加拿大的搜索情况，以此也能看到工作机会的大小：

React, Vue, angular, jQuery 探索 - Google 趋势

后续还看了想过的一些相关资料：

从这些资料来看，入门前端后，掌握 React 也是一个几乎必要的技能。

写这个文档时候，我突发奇想，又在 Google Trends 中，加入了 jQuery 的对比，从上面的结果来看，jQuery 生命力依然强劲，也是一个必备技能。想到这里，我想起了我推荐的 jQuery 源码分析课：逐行分析jQuery源码的奥秘 - 网易云课堂。又想起了一个 JavaScript 入门视频： JavaScript教程-从入门到精通 - 网易云课堂也还可以。

最近祝这位读友学习顺利！