• - [x] Theano functions for beam search - [x] Extract necessary inputs - [x] Search itself - [x] Multinomial probabilities in Theano for ndim > 2 - [ ] Test beam search</p><p>该提问来源于开源项目&#...
• Beam Search 目标 知道beam search的概念和原理 能够在代码中使用Beam search 完成预测过程 1. Beam Search的介绍 在进行模型评估的过程中，每次我们选择概率最大的token id作为输出，那么整个输出的句子的概率...
Beam Search
目标

知道beam search的概念和原理
能够在代码中使用Beam search 完成预测过程

1. Beam Search的介绍

在进行模型评估的过程中，每次我们选择概率最大的token id作为输出，那么整个输出的句子的概率就是最大的么？
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-wTYTOEhs-1613919938213)(…/images/2.3/greedy search.png)]

Beam search的又被称作束集搜索，是一种seq2seq中用来优化输出结果的算法(不在训练过程中使用)。
例如：传统的获取解码器输出的过程中，每次只选择概率最大的那个结果，作为当前时间步的输出，等到输出结束，我们会发现，整个句子可能并不通顺。虽然在每一个时间步上的输出确实是概率最大的，但是整体的概率确不一定最大的，我们经常把它叫做greedy search[贪心算法]
为了解决上述的问题，可以考虑计算全部的输出的概率乘积，选择最大的哪一个，但是这样的话，意味着如果句子很长，候选词很多，那么需要保存的数据就会非常大，需要计算的数据量就很大
那么Beam Search 就是介于上述两种方法的一个这种的方法，假设Beam width=2，表示每次保存的最大的概率的个数，这里每次保存两个，在下一个时间步骤一样，也是保留两个，这样就可以达到约束搜索空间大小的目的，从而提高算法的效率。
beam width =1 时，就是贪心算法，beam width=候选词的时候，就是计算全部的概率。beam width 是一个超参数。
比如在下图中:
使用一个树状图来表示每个time step的可能输出，其中的数字表示是条件概率
黄色的箭头表示的是一种greedy search，概率并不是最大的
如果把beam width设置为2，那么后续可以找到绿色路径的结果，这个结果是最大的
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-VRW6HWct-1613919938221)(…/images/2.3/greedy search.png)]
下图是要给beam width=3的例子

首先输入start token <s>,然后得到四个输出(这里假设一个就四个输出:x,y,z,</s>)，选择概率最大三个，x,y,w
然后分别把x,y,z放到下一个time step中作为输入，分别得到三个不同的输出，找到三个输出中概率最大的三个，x,y,y
继续重复上述步骤，直到获得结束符(概率最大)或者是达到句子的最大长度，那么此时选择概率乘积最大的一个。
拼接整个路径上概率最大的所有结果，比如这里可能是<s>,y,y,x,w,</s>

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-VER93VHt-1613919938225)(…/images/2.3/beamsearch.png)]
2. Beam serach的实现
在上述描述的思路中，我们需要注意以下几个内容：

数据该如何保存，每一次的输出的最大的beam width个结果，和之后之前的结果该如何保存
保存了之后的概率应该如何比较大小，保留下概率最大的三个
不能够仅仅只保存当前概率最大的信息，还需要有当前概率最大的三个中，前面的路径的输出结果

2.1 数据结构-堆-的认识
对于上面所说的，保留有限个数据，同时需要根据大小来保留，可以使用一种带有优先级的数据结构来实现，这里我们可以使用堆这种数据结构
堆是一种优先级的队列，但是他其实并不是队列，我们常说的队列都是先进先出或者是先进后出，但是堆只根据优先级的高低来取出数据。
和堆在一起的另外一种数据结构叫做栈,有入栈和出栈的操作，可以理解为是一种先进后出的数据结构，关于栈，大家可以下来在了解。
在python自带的模块中，有一个叫做heapq的模块，提供了堆所有的方法。通过下面的代码我们来了解下heapq的使用方法
my_heap = [] #使用列表保存数据

#往列表中插入数据，优先级使用插入的内容来表示，就是一个比较大小的操作，越大优先级越高
heapq.heappush(my_heap,[29,True,"xiaohong"])
heapq.heappush(my_heap,[28,False,"xiaowang"])
heapq.heappush(my_heap,[29,False,"xiaogang"])

for i in range(3):
ret= heapq.heappop(my_heap)  #pop操作，优先级最小的数据
print(ret)

#输出如下：
[28, False, 'xiaowang']
[29, False, 'xiaogang']
[29, True, 'xiaohong']

可以发现，输出的顺序并不是数据插入的顺序，而是根据其优先级，从小往大pop（False<True）。
2.2 使用堆来实现beam search
为了实现数据的的保存，我们可以把beam search中的数据保存在堆中，同时在往这个堆中添加数据的同时，判断数据的个数，仅仅保存beam width个数据
class Beam:
def __init__(self):
self.heap = list() #保存数据的位置
self.beam_width = config.beam_width #保存数据的总数

"""
添加数据，同时判断总的数据个数，多则删除
:param probility: 概率乘积
:param complete: 最后一个是否为EOS
:param seq: list，所有token的列表
:param decoder_input: 下一次进行解码的输入，通过前一次获得
:param decoder_hidden: 下一次进行解码的hidden，通过前一次获得
:return:
"""
heapq.heappush(self.heap,[probility,complete,seq,decoder_input,decoder_hidden])
#判断数据的个数，如果大，则弹出。保证数据总个数小于等于3
if len(self.heap)>self.beam_width:
heapq.heappop(self.heap)

def __iter__(self):#让该beam能够被迭代
return iter(self.heap)

实现方法，完成模型eval过程中的beam search搜索
思路：

构造<SOS>开始符号等第一次输入的信息，保存在堆中
取出堆中的数据，进行forward_step的操作，获得当前时间步的output，hidden
从output中选择topk（k=beam width）个输出，作为下一次的input
把下一个时间步骤需要的输入等数据保存在一个新的堆中
获取新的堆中的优先级最高（概率最大）的数据，判断数据是否是EOS结尾或者是否达到最大长度，如果是，停止迭代
如果不是，则重新遍历新的堆中的数据

代码如下
# decoder中的新方法
def evaluatoin_beamsearch_heapq(self,encoder_outputs,encoder_hidden):
"""使用 堆 来完成beam search，对是一种优先级的队列，按照优先级顺序存取数据"""

batch_size = encoder_hidden.size(1)
#1. 构造第一次需要的输入数据，保存在堆中
decoder_input = torch.LongTensor([[word_sequence.SOS] * batch_size]).to(config.device)
decoder_hidden = encoder_hidden #需要输入的hidden

prev_beam = Beam()
while True:
cur_beam = Beam()
#2. 取出堆中的数据，进行forward_step的操作，获得当前时间步的output，hidden
#这里使用下划线进行区分
for _probility,_complete,_seq,_decoder_input,_decoder_hidden in prev_beam:
#判断前一次的_complete是否为True，如果是，则不需要forward
#有可能为True，但是概率并不是最大
if _complete == True:
else:
decoder_output_t, decoder_hidden,_ = self.forward_step(_decoder_input, _decoder_hidden,encoder_outputs)
value, index = torch.topk(decoder_output_t, config.beam_width)  # [batch_size=1,beam_widht=3]
#3. 从output中选择topk（k=beam width）个输出，作为下一次的input
for m, n in zip(value[0], index[0]):
decoder_input = torch.LongTensor([[n]]).to(config.device)
seq = _seq + [n]
probility = _probility * m
if n.item() == word_sequence.EOS:
complete = True
else:
complete = False

#4. 把下一个实践步骤需要的输入等数据保存在一个新的堆中
decoder_input,decoder_hidden)
#5. 获取新的堆中的优先级最高（概率最大）的数据，判断数据是否是EOS结尾或者是否达到最大长度，如果是，停止迭代
best_prob,best_complete,best_seq,_,_ = max(cur_beam)
if best_complete == True or len(best_seq)-1 == config.max_len: #减去sos
return self._prepar_seq(best_seq)
else:
#6. 则重新遍历新的堆中的数据
prev_beam = cur_beam

def _prepar_seq(self,seq):#对结果进行基础的处理，共后续转化为文字使用
if seq[0].item() == word_sequence.SOS:
seq=  seq[1:]
if  seq[-1].item() == word_sequence.EOS:
seq = seq[:-1]
seq = [i.item() for i in seq]
return seq

2.3 修改seq2seq
在seq2seq中使用evaluatoin_beamsearch_heapq查看效果，会发现使用beam search的效果比单独使用attention的效果更好
使用小黄鸡语料（50万个问答），单个字作为token，5个epoch之后的训练结果，左边为问，右边是回答
你在干什么 >>>>> 你想干啥？
你妹 >>>>> 不是我
你叫什么名字 >>>>> 你猜
你个垃圾 >>>>> 你才是，你
你是傻逼 >>>>> 是你是傻
笨蛋啊 >>>>> 我不是，你



展开全文
• <div><p>use beam search to select the best suitable sample</p><p>该提问来源于开源项目：hunkim/word-rnn-tensorflow</p></div>
• Beam Search—&gt;Optimal Beam Search 原文 Beam Search Diverse Beam Search 原文 example diverse-beam-search Optimal Beam Search
Beam Search—>Optimal Beam Search

原文

Beam Search

Diverse Beam Search

原文 example diverse-beam-search

Optimal Beam Search

…
展开全文
• <div><p>This PR contains the implementation for beam search in the jitscript syntax. The unit test is confirmed working on Pytorch master as of tonight. <p>I've also added <code>onnx_full_export....
• Home Learners Instructors Developers Download Beam Search Algorithm (Draft by Andrew Jungwirth) Objectives ...To show how the Beam Search Algorithm uses a heuristic ...


Home
Learners
Instructors
Developers

Beam Search Algorithm (Draft by Andrew Jungwirth)
Objectives
To show how the Beam Search Algorithm uses a heuristic function and a given beam width in an attempt to simulate the Breadth-First Search in a memory-efficient way.

To emphasize the importance of limiting a graph-search's memory consumption when finding solutions to large problem spaces.

To demonstrate the strengths and weaknesses of the Beam Search Algorithm.Preparation
In order to understand this algorithm, you must be familiar with the concept of a graph as a group of nodes/vertices and edges connecting these nodes. It is also helpful to understand the how a search tree can be used to show the progress of a graph search.
Additionally, knowledge of the Breadth-First Search Algorithm is required because Beam Search is a modification of this algorithm.
Beam Search Algorithm
Even though the Breadth-First Search Algorithm is guaranteed to find the shortest path from a start node to a goal node in an unweighted graph, it is infeasible to use this algorithm on large search spaces because its memory consumption is exponential. This
causes the algorithm run out of main memory before a solution can be found to most large, nontrivial problems. For this reason, Beam Search was developed in an attempt to achieve the optimal solution found by the Breadth-First Search Algorithm without consuming
too much memory.
In order to accomplish this goal, Beam Search utilizes a heuristic function, h, to estimate the cost to reach the goal from a given node. It also uses a beam width, B, which specifies the number of nodes that are stored at each level of
the Breadth-First Search. Thus, while the Breadth-First Search stores all the frontier nodes (the nodes connected to the closing vertices) in memory, the Beam Search Algorithm only stores the B nodes with the best heuristic values at each level of
the search. The idea is that the heuristic function will allow the algorithm to select nodes that will lead it to the goal node, and the beam width will cause the algorithm to store only these important nodes in memory and avoid running out of memory before
finding the goal state.
Instead of the open list used by the Breadth-First Search Algorithm, the Beam Search Algorithm uses the BEAM to store the nodes that are to be expanded in the next loop of the algorithm. A hash table is used to store nodes that
have been visited, similar to the closed list used in the Breadth-First Search. Beam Search initially adds the starting node to the BEAM and the hash table. Then, each time through the main loop of the algorithm, Beam Search adds
all of the nodes connected to the nodes in the BEAM to its SET of successor nodes and then adds the B nodes with the best heuristic values from the SET to the BEAM and the hash table. Note that a node that
is already in the hash table is not added to the BEAM because a shorter path to that node has already been found. This process continues until the goal node is found, the hash table becomes full (indicating that the memory available
has been exhausted), or the BEAM is empty after the main loop has completed (indicating a dead end in the search).
The Beam Search Algorithm is shown by the pseudocode below. This pseudocode assumes that the Beam Search is used on an unweighted graph so the variable g is used to keep track of the depth of the search, which is the cost of reaching a node at that
level.
Pseudocode for the Beam Search Algorithm

/* initialization */
g = 0;
hash_table = { start };
BEAM = { start };

/* main loop */
while(BEAM ≠ ∅){                             // loop until the BEAM contains no nodes
SET = ∅;                                   // the empty set

/* generate the SET nodes */
for(each state in BEAM){
for(each successor of state){
if(successor == goal) return g + 1;
SET = SET ∪ { successor };             // add successor to SET
}
}

BEAM = ∅;                                  // the empty set
g = g + 1;

/* fill the BEAM for the next loop */
while((SET ≠ ∅) AND (B > |BEAM|)){         // set is not empty and the number of nodes in BEAM is less than B
state = successor in SET with smallest h value;
SET = SET \ { state };                   // remove state from SET
if(state ∉ hash_table){                  // state is not in the hash_table
if(hash_table is full) return ∞;
hash_table = hash_table ∪ { state };   // add state to hash_table
BEAM = BEAM ∪ { state };               // add state to BEAM
}
}
}

// goal was not found, and BEAM is empty - Beam Search failed to find the goal
return ∞;

Example Trace of the Beam Search Algorithm
The following traces of the Beam Search Algorithm use two rows to represent each main loop of the algorithm's execution. The first row of each numbered loop displays the nodes added to the SET. These nodes are ordered by their heuristic values,
with alphabetical ordering used to sort nodes with identical h values. Since the SET is a mathematical set, if a node is inserted into the SET more than once from multiple parents, it only appears in the SET once. The second
row of each numbered loop lists the nodes from the SET that are added to the BEAM in the second part of the main loop. Both rows also display the hash table to show its current state. Notice that the hash table has only
seven slots, indicating that the memory size for this example trace is seven. A simple linear hashing scheme with key values determined by the node names' ASCII values mod 7 is used for simplicity. In all three of these lists, nodes are listed in the format node_name(predecessor_name).
The algorithm is traced four times with different values of B to demonstrate the strengths and weaknesses of the algorithm. Each trace includes a search tree that shows the BEAM at each level of the search. In the graph, the numbers under
the node names are the h values for the nodes. These traces show how Beam Search attempts to find the shortest path from node I to node B in the graph shown in Figure 1. (Figure 1 is included above each trace for convenience.)

Figure 1
Trace 1, B = 1

loop
number
SET (first row per numbered loop)BEAM (second row per numbered loop)
hash_table

BEAM = { I(null) }
hash_table = { _, I(null), _, _, _, _, _ }
1
SET = { G(I), J(I), E(I), H(I) }
hash_table = { _, I(null), _, _, _, _, _ }
1
BEAM = { G(I) }
hash_table = { _, I(null), _, _, _, _, G(I) }
2
SET = { D(G), J(G), I(G) }
hash_table = { _, I(null), _, _, _, _, G(I) }
2
BEAM = { D(G) }
hash_table = { _, I(null), _, D(G), _, _, G(I) }
3
SET = { G(D) }
hash_table = { _, I(null), _, D(G), _, _, G(I) }
3
BEAM = { }
hash_table = { _, I(null), _, D(G), _, _, G(I) }

Figure 2

At this point, the BEAM is empty, and the Beam Search Algorithm has reached a dead-end in its search. Since the node G in the SET was already in the hash table, it could not be added to the BEAM, which left the BEAM empty.
This trace illustrates the greatest weakness of the Beam Search Algorithm: An inaccurate heuristic function can lead the algorithm into a situation in which it cannot find a goal, even if a path to the goal exists. While increasing the value of Bmay
allow Beam Search to find the goal, increasing B by too much may cause the algorithm to run out of memory before it finds the goal. For this reason, the choice of B has a large impact on Beam Search's performance. Figure 2 shows the BEAM nodes
at each level in this dead-end search.

Figure 1
Trace 2, B = 2

loop
number
SET (first row per numbered loop)BEAM (second row per numbered loop)
hash_table

BEAM = { I(null) }
hash_table = { _, I(null), _, _, _, _, _ }
1
SET = { G(I), J(I), E(I), H(I) }
hash_table = { _, I(null), _, _, _, _, _ }
1
BEAM = { G(I), J(I) }
hash_table = { _, I(null), J(I), _, _, _, G(I) }
2
SET = { A(J), D(G), G(J), J(G), E(J), I(G) }
hash_table = { _, I(null), J(I), _, _, _, G(I) }
2
BEAM = { A(J), D(G) }
hash_table = { A(J), I(null), J(I), D(G), _, _, G(I) }
3
SET = { C(A), G(D), J(A) }
hash_table = { A(J), I(null), J(I), D(G), _, _, G(I) }
3
BEAM = { C(A) }
hash_table = { A(J), I(null), J(I), D(G), C(A), _, G(I) }
4
SET = { B(C) [goal found - algorithm returns], A(C) }
hash_table = { A(J), I(null), J(I), D(G), C(A), _, G(I) }

Figure 3

In this trace, the Beam Search Algorithm successfully found the goal via the path IJACB. Even though a solution was found, this solution is not optimal because IECB is a shorter path to the goal node. Once again, an inaccurate heuristic
function reduced the effectiveness of the Beam Search Algorithm. Figure 3 shows the BEAM nodes at each level of the search. Notice that only one node appears in the BEAM at level three in the tree. This demonstrates that Beam Search may not
always be able to fill the BEAM at each level in the search. In the last level of the tree, node A was first added to the SET, and then node B (the goal node) was found and caused the search to complete.

Figure 1
Trace 3, B = 3

loop
number
SET (first row per numbered loop)BEAM (second row per numbered loop)
hash_table

BEAM = { I(null) }
hash_table = { _, I(null), _, _, _, _, _ }
1
SET = { G(I), J(I), E(I), H(I) }
hash_table = { _, I(null), _, _, _, _, _ }
1
BEAM = { G(I), J(I), E(I) }
hash_table = { _, I(null), J(I), _, E(I), _, G(I) }
2
SET = { A(J), C(E), D(G), F(E), G(J), J(E), E(J), H(E), I(E) }
hash_table = { _, I(null), J(I), _, E(I), _, G(I) }
2
BEAM = { A(J), C(E), D(G) }
hash_table = { A(J), I(null), J(I), C(E), E(I), D(G), G(I) }
3
SET = { B(C) [goal found - algorithm returns], A(C), C(A), J(A) }
hash_table = { A(J), I(null), J(I), C(E), E(I), D(G), G(I) }

Figure 4

With B = 3, the Beam Search Algorithm found the optimal path to the goal. However, the larger beam width caused the algorithm to fill the entire memory available for the hash table. Figure 4 shows the BEAM nodes at each level in the search.
In the last level of the tree, nodes A, C, and J were added to theSET, and then the goal node B was found, which caused to search to complete.

Figure 1
Trace 4, B = 4

loop
number
SET (first row per numbered loop)BEAM (second row per numbered loop)
hash_table

BEAM = { I(null) }
hash_table = { _, I(null), _, _, _, _, _ }
1
SET = { G(I), J(I), E(I), H(I) }
hash_table = { _, I(null), _, _, _, _, _ }
1
BEAM = { G(I), J(I), E(I), H(I) }
hash_table = { H(I), I(null), J(I), _, E(I), _, G(I) }
2
SET = { A(J), C(E), D(G), F(E), G(J), J(E), E(H), H(E), I(E) }
hash_table = { H(I), I(null), J(I), _, E(I), _, G(I) }
2
BEAM = { A(J), C(E), D(G) [not enough memory - algorithm returns] }
hash_table = { H(I), I(null), J(I), A(J), E(I), C(E), G(I) }

Figure 5

Using B = 4, the Beam Search Algorithm quickly ran out of memory. This shows the second major weakness of the Beam Search Algorithm: When B becomes large, the algorithm consumes memory very quickly like the Breadth-First Search Algorithm.
Figure 5 shows the BEAM at each level in the search. The last level in the tree shows the progress of the search when the algorithm ran out of memory.
Efficiency/Algorithm Analysis
It is generally effective to analyze graph-search algorithms by considering four traits:
Completeness: A search algorithm is complete if it will find a solution (goal node) when a solution exists.

Optimality: A search algorithm is optimal if it finds the optimal solution. In the case of the Beam Search Algorithm, this means that the algorithm must find the shortest path from the start node to the goal node.

Time complexity: This is an order-of-magnitude estimate of the speed of the algorithm. The time complexity is determined by analyzing the number of nodes that are generated during the algorithm's execution.

Space complexity: This is an order-of-magnitude estimate of the memory consumption of the algorithm. The space complexity is determined by the maximum number of nodes that must be stored at any one time during the algorithm's execution.Completeness
In general, the Beam Search Algorithm is not complete. This is illustrated in Trace 1 above. Even though the memory was not depleted, the algorithm failed to find the goal because it could not add any nodes to the BEAM. Thus, even given unlimited
time and memory, it is possible for the Beam Search Algorithm to miss the goal node when there is a path from the start node to the goal node. A more accurate heuristic function and a larger beam width can improve Beam Search's chances of finding the goal.
However, this lack of completeness is one of the foremost weaknesses of the Beam Search Algorithm.
Optimality
Just as the Beam Search Algorithm is not complete, it is also not guaranteed to be optimal. This is shown by Trace 2 above. In this example, Beam Search found the goal node but failed to find the optimal path to the goal, even though the heuristic in Figure
1 is admissible (underestimates the cost to the goal from every node) and consistent (underestimates the cost between neighboring nodes). This happened because the beam width and an inaccurate heuristic function caused the algorithm to miss expanding the shortest
path. A more precise heuristic function and a larger beam width can make Beam Search more likely to find the optimal path to the goal.
Time Complexity
The time for the Beam Search Algorithm to complete tends to depend on the accuracy of the heuristic function. An inaccurate heuristic function usually forces the algorithm to expand more nodes to find the goal and may even cause it to fail to find the goal.
In the worst case, the heuristic function leads Beam Search all the way to the deepest level in the search tree. Thus, the worst case time is O(Bm), where B is the beam width, and m is the maximum depth of any path in the search
tree. This time complexity is linear because the Beam Search Algorithm only expands B nodes at each level; it does not branch out more widely at each level like many search algorithms that have exponential time complexities. The speed with which this
algorithm executes is one of its greatest strengths.
Space Complexity
Beam Search's memory consumption is its most desirable trait. Because the algorithm only stores B nodes at each level in the search tree, the worst-case space complexity is O(Bm), where B is the beam width, and m is the
maximum depth of any path in the search tree. This linear memory consumption allows Beam Search to probe very deeply into large search spaces and potentially find solutions that other algorithms cannot reach.
Compare with Your Textbook
Algorithms can look differently but still operate in almost the same ways. Compare the pseudocode above with the description in your textbook (if available). Then consider these questions:
Does your textbook use a hash table to store the nodes that have been expanded? If not, how does it store these nodes?

Does your textbook explain what type of structure should be used to implement the SET? If so, what structure does it use?Exploring the Algorithm's Dynamic Behavior
Explore the Algorithm within JHAVÉ
You can practice the Beam Search Algorithm using the algorithm visualization system JHAVÉ. If you have not used JHAVÉ before, please take the time to view the instructions on using JHAVÉ first. If your browser supports Java Webstart, you can launch a visualization
of the Beam Search Algorithm directly from this link.
The Beam Search visualization has fifteen example graphs with which you can experiment. The first four examples, perfect1, perfect2, perfect3, and perfect4, have perfect heuristic functions that allow the algorithm to
find the optimal path if it has enough memory. The next seven graphs, errant1, errant2, errant3,errant4, errant5, errant6, and errant7, have inaccurate heuristic functions that can lead the algorithm
to find paths that are longer than optimal if the beam width is too small. The last four graphs, end1, end2, end3, and end4, result in dead-end searches when using smaller beam widths. For each of these examples, you can
set the values for the beam width, B, and the memory size, M, to see how different values of these parameters affect the outcome of the algorithm on the graph. Finally, the level of detail option allows you to control how the animation steps
through the pseudocode. The high detail option shows how each node is added to the SET one-by-one and is useful when you are less familiar with the algorithm. The low detail option generates all the nodes in the SET in one step so that you
can more easily focus on the other aspects of the algorithm.
Step through the examples in the visualization and test how the different parameters modify the results found by the Beam Search Algorithm. Answer the questions that appear during the visualization to assess your understanding of the algorithm. When you
can consistently answer the questions correctly, try the exercises below.
Exercises
Using the values of B = 2 and M = 10, trace the Beam Search Algorithm on the graph in Figure 6 from start node J to goal node D.

Figure 6

Did your trace from above find the optimal path from node J to node D? If not, is there a value of B that will find the shortest path with M = 10?

Is it possible for Beam Search to make a dead-end search (BEAM is empty at the beginning of the main loop) in the graph above (Figure 6) from node J to node D? If so, what value(s) of B produce(s) this situation?

Modify the heuristic values in the graph above (Figure 6) so that the Beam Search Algorithm can find the optimal path from J to D with B = 1. How much memory is required to find the shortest path in this new graph with B =
1?Designing Data Sets
Creating input for an algorithm is an effective way to demonstrate your understanding of the algorithm's behavior.
Design a graph such that Beam Search fails to find the optimal path from the start node to the goal node with B = 1 but finds this shortest path with B = 2.

Construct a graph so that a Beam Search with B = 2 reaches a dead-end and fails to find the goal node. (Note: Here a dead-end search refers to the situation in which the BEAM is empty at the beginning of the main loop and does
not mean the search runs out of memory.)Modifying the Algorithm
Consider modifying the Beam Search Algorithm as described above so that the lines previously written as

if(hash_table is full) return ∞;
hash_table = hash_table ∪ { state };
BEAM = BEAM ∪ { state };

are reordered as

hash_table = hash_table ∪ { state };
BEAM = BEAM ∪ { state };
if(hash_table is full) return ∞;

Does this have any effect on the results found by the algorithm? Can you devise an example in which the first version of the algorithm finds the goal, but the modified version returns before finding the goal?
Create Your Own Visualization
Using your own source code, presentation software, or manually produced drawings, create your own visualization of the Beam Search Algorithm.
Presentation
Develop a ten-minute presentation on the Beam Search Algorithm that uses the visualization you developed above to explain the strengths and weaknesses of the Beam Search Algorithm.

Copyright © 2005 -- Revised: May 2006

转载于:https://www.cnblogs.com/StevenL/p/6818493.html
展开全文
• <div><p>This is a WIP integration of beam search based on my port of the ctc beam decoder in tensorflow. Will address #86. <p>Remaining: - [x] additional testing - [x] add optional dictionary ...
• Seq2Seq中常用到的优化方法就是Beam Search，但是Beam Search的一个缺点就是生成的N个回答往往差异性很小，无法体现语言的多样性（比如文本摘要、机器翻译的生成文本，往往有不止一种表述方式）。最近看论文的时候...


Seq2Seq中常用到的优化方法就是Beam Search，但是Beam Search的一个缺点就是生成的N个回答往往差异性很小，无法体现语言的多样性（比如文本摘要、机器翻译的生成文本，往往有不止一种表述方式）。最近看论文的时候发现Google提出的改进Beam Search方法，下面来稍微总结下。

论文地址：https://arxiv.org/pdf/1610.02424.pdf

具体如下：选择 Beam size 为 B，然后将其分为 G 组，每一组就有 $\frac{B}{G}$个beam。每个单独的组内跟 beam search很像，不断延展序列。同时通过引入一个dissimilarity 项来保证组与组之间有差异。

如上图所示，B = 6, G=3，每一组的beam width为2。

组内与 beam search 很像：从t-1到 t 时刻，不断的减少搜索空间（如同beam search一样）。

组间差异：对于t=4时刻，我们先对第一组输出y（t=4），然后我们开始对第二组输出y（t=4），但是第二组y（t=4）的score不仅取决于第二组之前的y（t=3），也取决于其与第一组的相似程度。以此类推，在t=4时刻对于第三组的输出，我们从上图可以看到其score的打分标准。这儿对于其 dissimilarity 项的计算采用的办法是 hamming diversity，这个理解起来很简单，比如这个时刻可能输出的词在上面的组出现过，我们就对这个词的分数-1，如果这个时刻可能输出的词在上面组没有出现过，我们就对这个词的分数不惩罚。

Reference

文本生成解码之 Beam Search

展开全文
• beam search:在test的过程中生成几段序列的方式，通常用于机器翻译或看图说话中。 beam search 中有一个重要的参数：beam size = k，表示最后生成的得分最高的前k个序列 在看图说话或机器翻译中，最后生成的句子中...
• 转载 https://blog.csdn.net/xyz1584172808/article/details/89220906 ... ... beam search算法 在看论文Sequence to Sequence Learning with neural networks时看到了beam s...
• ## BeamSearch 简介

万次阅读 2016-06-01 11:41:18
概要传统的广度优先策略能够找到最优的路径，但是在搜索空间非常大的情况下，内存占用是指数级增长，很容易造成内存溢出，因此提出了beam search的算法。 beam search尝试在广度优先基础上进行进行搜索空间的优化...
• <ul><li>[x] Add class <code>model.beam_search.HybridBeamSearchSampler</code></li>[x] Allow method <code>model.beam_search._expand_to_beam_size</code> to accept Symbols</li><li>[x] Add <code>_extract_...
• Wiki定义：Incomputer science,beam searchis aheuristicsearch algorithmthat explores a graph by expanding the most promising node in a limited set. Beam search is an optimization ofbest-first search...

...