Disclaimer - Seoul National University

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Korea License. You may copy, distribute, transmit, display, and perform the work under the following conditions:

Attribution: you must clearly indicate the license terms attached to the work and attribute it to its original author.
NonCommercial: you may not use this work for commercial purposes.
NoDerivs: you may not alter, transform, or build upon this work.

These conditions may be waived with permission from the copyright holder. Your fair-use and other rights are in no way affected by the above. This is a human-readable summary of the Legal Code.
Master's Thesis in Engineering

Relation Memory Network:
A Simple Neural Architecture for
Text-based Reasoning

(관계 메모리 네트워크: 텍스트 형식의 추론 문제를 푸는 간단한 신경망 구조)

February 2018

Department of Industrial Engineering
The Graduate School
Seoul National University

Jihyung Moon (문지형)
Abstract
Relation Memory Network:
A Simple Neural Architecture for
Text-based Reasoning
Jihyung Moon
Department of Industrial Engineering
The Graduate School
Seoul National University
With the advancement of machine learning algorithms that deal with text, more
and more companies are using chatbots for their business. Current chatbot technology
relies on keyword extraction through Natural Language Processing (NLP) and
human-generated IF-THEN rules. However, as recent developments in deep learning
show great potential to imitate human reasoning ability without external intervention,
many deep learning models are being developed to find the right answer to a given
question. To solve text-based question answering tasks that require relational
reasoning, it is necessary to memorize a large amount of information and to retrieve
the question-relevant information from the memory. Most approaches have been
based on an external memory and the four components proposed by Memory Network.
What chiefly distinguishes these approaches is the way they find the necessary
information, and this largely determines their performance. Recently, a simple but
powerful neural network module for reasoning called Relation Network (RN) has been
introduced. We analyzed RN from the view of Memory Network, and realized that its
MLP component is able to reveal the complicated relation between a question and an
object pair. Motivated by this, we introduce Relation Memory Network (RMN), which
uses an MLP to retrieve relevant information on the Memory Network architecture.
It sets new state-of-the-art results on the jointly trained bAbI-10k story-based
question answering tasks and the bAbI dialog-based question answering tasks. By
virtue of RMN's high reasoning ability even in memory-intensive situations, it is
expected to generalize better to practical settings than other models.
Keywords: Natural Language Processing; Deep Learning; Reasoning; Memory Net-
work; Relation Network
Student Number: 2016-21101
Contents
Abstract i
Contents iv
List of Tables vi
List of Figures vii
Chapter 1 Introduction 1
Chapter 2 Related Work 6
2.1 Memory-Augmented Neural Network . . . . . . . . . . . . . . . . . . . 6
2.2 Relation Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Chapter 3 Relation Memory Network 14
3.1 Embedding component . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2 Attention component . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3 Updating component . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4 Reasoning component . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Chapter 4 Experiments 19
4.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.1.1 bAbI story-based QA dataset . . . . . . . . . . . . . . . . . . . 19
4.1.2 bAbI dialog dataset . . . . . . . . . . . . . . . . . . . . . . . . 21
4.2 Training details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.2.1 bAbI story-based QA dataset . . . . . . . . . . . . . . . . . . . 23
4.2.2 bAbI dialog dataset . . . . . . . . . . . . . . . . . . . . . . . . 24
Chapter 5 Results and Discussion 25
5.1 bAbI story-based QA . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.2 bAbI dialog-based QA . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.3 Model analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.3.1 Effectiveness of the MLP-based output feature map . . . . . . 35
5.3.2 Performance of RN and RMN according to memory size . . . . 36
5.3.3 Effectiveness of the number of hops . . . . . . . . . . . . . . . . 37
Chapter 6 Conclusion 38
Bibliography 40
Appendix 43
A Model details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
B Additional Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
국문초록 (Korean Abstract) 47
List of Tables
Table 2.1 MemN2N, GMemN2N, DMN+, and RN in terms of MemNN
architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Table 5.1 Average error rate on bAbI story-based tasks with 10k training
samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Table 5.2 bAbI story-based task visualization of α . . . . . . . . . . . . . 29
Table 5.3 Test error rate on bAbI dialog tasks . . . . . . . . . . . . . 32
Table 5.4 Visualization of α on bAbI dialog tasks without match type . . 33
Table 5.5 Test error of RMN on bAbI story-based QA dataset with dif-
ferent configurations . . . . . . . . . . . . . . . . . . . . . . . . 36
Table 5.6 Test error and training time of RN and RMN on bAbI story-
based QA dataset with different memory size . . . . . . . . . . 37
Table 5.7 Test error of RMN on bAbI story-based QA dataset with dif-
ferent number of hops . . . . . . . . . . . . . . . . . . . . . . . 38
Table A.1 Hyperparameters of Relation Memory Network on bAbI dialog
tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Table B.1 Visualization of α1 and α2 on user input revised bAbI dialog
task 3 without match type . . . . . . . . . . . . . . . . . . . . . 46
List of Figures
Figure 1.1 Chatbot benefits for business . . . . . . . . . . . . . . . . . . . 3
Figure 2.1 MemN2N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Figure 2.2 GMemN2N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Figure 2.4 Relation Network . . . . . . . . . . . . . . . . . . . . . . . . . 13
Figure 2.5 The output vector of gθ when the input objects are related
and unrelated . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Figure 3.1 Relation Memory Network . . . . . . . . . . . . . . . . . . . . 15
Figure 3.2 One-hot represented sentence and its embedding example us-
ing LSTM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Figure 4.1 Sample statements and questions from bAbI story based QA
tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Figure 4.2 Sample dialog from bAbI dialog based QA tasks . . . . . . . . 23
Chapter 1
Introduction
With advances in machine learning algorithms that deal with unstructured data, there
is an increasing trend of using chatbots for business, such as Google Assistant, Siri,
Alexa, Bixby, and so on. A chatbot can provide personalized counseling and personalized
content to consumers, and can also serve as a business channel for marketing
because it is in direct contact with the user. Its benefits are described in Figure 1.1.
Currently, most chatbot frameworks are based on Natural Language
Processing (NLP) and IF-THEN rules. To understand user intention, NLP techniques
such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging are used
to extract keywords from the user utterance. Then, predefined IF-THEN rules are
applied to these keywords to select the most suitable answer from an established
set of answer candidates. This seems quite simple; however, a large number of IF-THEN
rules must be generated to cover all exceptions in the language system.
Figure 1.1: Chatbot benefits for business

Recently, neural network based deep learning models have made enormous
progress in artificial intelligence. Deep learning models can handle highly complicated
patterns and have therefore been developed to process and analyze text itself. However,
for the task of question answering that requires logical reasoning, more complex and
careful handling of features is called for. A reasoning task requires the machine to answer a simple question
upon the delivery of a series of sequential information. For example, imagine that the
machine is given the following three sentences: “Mary got the milk there.”, “John
moved to the bedroom.”, and “Mary traveled to the hallway.” Once prompted with
the question, “Where is the milk?”, the machine then needs to sequentially focus on
the two sentences, “Mary got the milk there.” and “Mary traveled to the hallway.”
in order to successfully determine that the milk is located in the hallway.
Inspired by this reasoning mechanism, J. Weston and Bordes (2015) introduced
the memory network (MemNN), which consists of an external memory and
four components: input feature map (I), generalization (G), output feature map (O),
and response (R) [7]. The external memory enables the model to deal with a knowledge
base without loss of information. The input feature map embeds the incoming
sentences. Generalization updates old memories given the new input, and the output
feature map finds relevant information in the memory. Finally, the response produces
the final output.
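The four components can be sketched as a hypothetical Python skeleton (a toy illustration of the I/G/O/R interface described above, not the authors' implementation; the bag-of-words embedding and the table sizes are our own assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = {}                                # word -> index, grown on the fly
EMB = rng.standard_normal((1000, 16))     # toy embedding table (assumed sizes)

def embed(text):
    """Toy bag-of-words stand-in for a learned embedding: sum of word vectors."""
    idxs = [VOCAB.setdefault(w, len(VOCAB)) for w in text.lower().split()]
    return EMB[idxs].sum(axis=0)

class MemNN:
    """Sketch of the four components: I embeds, G stores, O retrieves, R answers."""
    def __init__(self):
        self.memory = []                  # external memory m

    def I(self, sentence):                # input feature map
        return embed(sentence)

    def G(self, x):                       # generalization: store the new input
        self.memory.append(x)

    def O(self, u):                       # output feature map: best-matching memory
        scores = np.array([u @ m for m in self.memory])
        return self.memory[int(scores.argmax())]

    def R(self, o):                       # response: here, simply the retrieved feature
        return o

net = MemNN()
for s in ["Mary got the milk there.", "John moved to the bedroom."]:
    net.G(net.I(s))
answer_feature = net.R(net.O(net.I("Where is the milk?")))
```

In a real MemNN the scoring in O and the response R are learned; here they are stubs that only show how information flows through the four components.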
Based on the memory network architecture, neural network based models such as the
end-to-end memory network (MemN2N) [14], the gated end-to-end memory network (GMemN2N)
[10], the dynamic memory network (DMN) [9], and the improved dynamic memory network (DMN+)
[16] have been proposed. Since strong reasoning ability depends on whether the model can
sequentially catch the right supporting sentences that lead to the answer, the
component that most clearly discriminates these models is the way they construct the
output feature map. As the output feature map becomes more complex, it is able
to learn patterns for more complicated relations. For example, MemN2N, which has
the lowest performance among the four models, measures the relatedness between
question and sentence by an inner product, while the best-performing DMN+ uses the
inner product and the absolute difference with two embedding matrices.
Recently, a new architecture called Relation Network (RN) [12] has been proposed
as a general solution to relational reasoning. The design philosophy behind it is to
directly capture the supporting relations between sentences through a multi-layer
perceptron (MLP). Despite its simplicity, RN achieves better performance than
previous models without any catastrophic failure.
The interesting thing we found is that RN can also be interpreted in terms of
MemNN. It is composed of O and R: one MLP, which focuses on the related pair,
corresponds to O, and another MLP, which infers the answer, corresponds to R.
RN does not need G because it directly finds all the supporting sentences at once.
From this point of view, the significant component is the MLP-based output feature
map. As an MLP is enough to recognize highly non-linear patterns, RN can find the
proper relation for answering the given question better than previous models.
However, as RN considers one pair at a time, unlike MemNN, the number of relations
that RN learns is n^2 when the number of input sentences is n. When n is small, the cost
of learning each relation is reduced by a factor of n compared to MemNN based models, which
enables more data-efficient learning [12]. However, when n increases, the performance
becomes worse than that of the previous models: the pair-wise operation produces far
more non-related sentence pairs than related ones, thereby confusing RN's learning.
Santoro et al. (2017) suggested attention mechanisms as a solution to filter out
unimportant relations; however, since this interrupts the reasoning operation, it may
not be the optimal solution to the problem [12].
Our proposed model, "Relation Memory Network" (RMN), is able to find complex
relations even when a large amount of information is given. It stores information in
memory, and uses an MLP to retrieve question-relevant information, inheriting RN's
MLP-based output feature map within the Memory Network architecture. For answers
that need more reasoning steps, RMN simply erases the information already used so
that it can sequentially retrieve the necessary information. The final answer is
selected from the answer candidates via an MLP-based response, and the model's
reasoning ability is evaluated through this result. Among recent deep learning based
models, RMN shows the state-of-the-art result on text-based question answering tasks.
We present the related work in Chapter 2, and the detailed components of RMN
in Chapter 3. Chapter 4 contains the description of the datasets used and the training
details, and Chapter 5 describes the experimental results and detailed discussions.
Finally, we conclude in Chapter 6.
Chapter 2
Related Work
2.1 Memory-Augmented Neural Network
Since text is inherently sequential, models that apply the same transition
function with the same parameters at every time step have been proposed. These are called
recurrent neural networks (RNNs), and their improved variants, long short-term memory
(LSTM) [5] and the gated recurrent unit (GRU) [2], are widely applied to text data. However,
to answer a question from a given set of facts, the model needs to memorize the
facts from the past, while RNNs and LSTMs are inept at remembering past stories
because of their small internal memory [14].
To cope with this problem, J. Weston and Bordes (2015) proposed a new
class of memory-augmented models called Memory Network (MemNN) [7]. MemNN
comprises an external memory m and four components: input feature map (I),
generalization (G), output feature map (O), and response (R). I encodes the sentences,
which are stored in memory m. G updates the memory, whereas O reads an output
feature o from the memory. Finally, R infers an answer from o.
Figure 2.1: MemN2N, (a) with hop 1 and (b) with hop 3

MemN2N is the first model to design MemNN's idea as an end-to-end trainable
neural network. Its components are well described in Figure 2.1a. I embeds the entire
set of sentences x_i to memory vectors m_i using an embedding matrix A. The query
q is also embedded via another embedding matrix B to obtain an internal state u.
O first computes the match between u and each memory m_i by taking the inner product
followed by a softmax:

p_i = Softmax(u^T m_i)    (2.1)

The output feature vector o is a sum over the transformed inputs c_i, weighted by the
probability vector from the input:

o = Σ_i p_i c_i    (2.2)

The answer a is produced from the sum of the output vector o and the input embedding u:

a = Softmax(W(o + u))    (2.3)
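A single MemN2N hop (equations 2.1 to 2.3) can be sketched in numpy; the dimensions and the random inputs below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def memn2n_hop(u, m, c, W):
    """One MemN2N hop: p_i = Softmax(u^T m_i), o = sum_i p_i c_i,
    a = Softmax(W (o + u)).  m, c: (n, d) memory/output embeddings."""
    p = softmax(m @ u)          # eq. 2.1: match question state with each memory
    o = p @ c                   # eq. 2.2: weighted sum of transformed inputs
    a = softmax(W @ (o + u))    # eq. 2.3: answer distribution over vocabulary
    return a, p

# toy sizes (assumptions): n = 5 sentences, d = 8, vocabulary V = 10
rng = np.random.default_rng(0)
n, d, V = 5, 8, 10
u = rng.standard_normal(d)
m = rng.standard_normal((n, d))
c = rng.standard_normal((n, d))
W = rng.standard_normal((V, d))
a, p = memn2n_hop(u, m, c, W)
```

For a k-hop model, the returned output o + u would become the internal state u of the next hop, mirroring Figure 2.1b.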
Figure 2.2: GMemN2N
Equations 2.1, 2.2, and 2.3 are repeated to handle multiple relations, as described in
Figure 2.1b. This operation is called the k-hop operation, where k is the number of repetitions.

GMemN2N, as shown in Figure 2.2, introduced a gate mechanism into the output
feature map and showed an accuracy improvement over MemN2N.
DMN+, another model that follows the structure of MemNN, consists of the I, G,
O, and R components. The overall architecture is shown in Figure 2.3a. Figure 2.3b
describes the input feature map of DMN+, which embeds the entire
set of sentences x_i to memory vectors f_i using a GRU. Xiong et al. (2016) use a new
concept called episodic memory m to cope with the sequential reasoning process [16].
They refer to the episodic memory on the t-th pass over the inputs as m^t, as seen in
Figure 2.3c. The most important and distinctive part of DMN+ is its output feature
map, which is based on various measures of relatedness, such as absolute difference as well as inner product,
to understand the relation between sentence and question at various points.

Figure 2.3: DMN+. (a) Overall architecture; (b) input feature map; (c) output feature map

This is represented in equations 2.4, 2.5, 2.6, and 2.7:
z_i^t = [f_i ∘ q; f_i ∘ m^{t-1}; |f_i − q|; |f_i − m^{t-1}|]    (2.4)

Z_i^t = W^{(2)} tanh(W^{(1)} z_i^t + b^{(1)}) + b^{(2)}    (2.5)

g_i^t = exp(Z_i^t) / Σ_k exp(Z_k^t)    (2.6)

c^t = Σ_i g_i^t f_i    (2.7)
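The gate computation of equations 2.4 to 2.7 can be sketched as follows; the fact vectors, weight matrices, and toy sizes are illustrative assumptions:

```python
import numpy as np

def dmnplus_gates(f, q, m_prev, W1, b1, W2, b2):
    """DMN+ output feature map (eqs. 2.4-2.7): interaction features z,
    a two-layer scoring network Z, softmax gates g, and the context c^t.
    f: (n, d) fact vectors; q, m_prev: (d,) question and previous episode."""
    z = np.concatenate(
        [f * q, f * m_prev, np.abs(f - q), np.abs(f - m_prev)], axis=1)  # eq. 2.4, (n, 4d)
    Z = np.tanh(z @ W1 + b1) @ W2 + b2          # eq. 2.5, one score per fact
    g = np.exp(Z - Z.max()); g /= g.sum()       # eq. 2.6: softmax over facts
    c = g @ f                                   # eq. 2.7: gate-weighted context
    return g, c

rng = np.random.default_rng(1)
n, d, h = 6, 8, 16                     # assumed toy sizes
f = rng.standard_normal((n, d))
q = rng.standard_normal(d)
m_prev = q.copy()                      # the first pass uses m^0 = q
W1 = rng.standard_normal((4 * d, h)); b1 = np.zeros(h)
W2 = rng.standard_normal(h);          b2 = 0.0
g, c = dmnplus_gates(f, q, m_prev, W1, b1, W2, b2)
```

Note how z stacks four different interaction terms; this is the richer relatedness measure that distinguishes DMN+ from the pure inner-product matching of MemN2N.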
Table 2.1 summarizes how the memory network based models differ. The more
difficult the task, the more complex the output feature map needed to get the correct answer.
On a dataset testing the text-based reasoning ability of models, the overall
accuracy increases in the order of MemN2N, GMemN2N, and DMN+, as the
complexity of the component increases.
Table 2.1: MemN2N, GMemN2N, DMN+, and RN in terms of MemNN architecture

Model     Output feature map                                                  Generalization
MemN2N    α_i^t = Softmax((r^t)^T m_i^t)   (r^0 = q)                          m^t = W^t m^{t-1}
          o^t = Σ_i α_i^t m_i^t
          r^{t+1} = o^t + r^t
GMemN2N   α_i^t = Softmax((r^t)^T m_i^t)   (r^0 = q)                          m^t = W^t m^{t-1}
          o^t = Σ_i α_i^t m_i^t
          T^t = σ(W^t o^t)
          r^{t+1} = o^t ⊙ T^t + r^t ⊙ (1 − T^t)
DMN+      α_i^t = Softmax(g([m_i ∘ q; m_i ∘ r^{t-1}; |m_i − q|; |m_i − r^{t-1}|]))   r^t = GRU(α_i, r^{t-1})
          (r^0 = q)
RN        r_i = g_θ([m_i, m_j, q])   (g_θ: MLP)                               -
          o = Σ_i r_i
2.2 Relation Network
Figure 2.4: Relation Network
Relation Network (RN) has emerged as a new and simpler framework for solving
general reasoning problems. Under the assumption that a relation is inferred from
the connection between one object and another, RN takes a pair of objects as its
input. For the question answering problem, RN additionally conditions on the question, and
then processes this input through two MLPs. The mathematical expression is as
follows:
RN(O) = f_φ( Σ_{i,j} g_θ([o_i, o_j, q]) )    (2.8)

where g_θ and f_φ are MLPs with parameters θ and φ, and o_i is an embedded object.
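Equation 2.8 can be sketched directly; the tiny random MLPs and sizes below are assumptions for illustration, not the configuration used in the original paper:

```python
import numpy as np

def mlp(x, Ws):
    """Tiny MLP: ReLU on all but the last layer (biases omitted for brevity)."""
    for W in Ws[:-1]:
        x = np.maximum(x @ W, 0.0)
    return x @ Ws[-1]

def relation_network(objects, q, g_Ws, f_Ws):
    """RN(O) = f_phi( sum_{i,j} g_theta([o_i, o_j, q]) )  (eq. 2.8)."""
    pair_sum = sum(
        mlp(np.concatenate([oi, oj, q]), g_Ws)
        for oi in objects for oj in objects)        # all n^2 ordered pairs
    return mlp(pair_sum, f_Ws)

rng = np.random.default_rng(2)
n, d, h, out = 4, 8, 32, 10                          # assumed toy sizes
objects = [rng.standard_normal(d) for _ in range(n)]
q = rng.standard_normal(d)
g_Ws = [rng.standard_normal((3 * d, h)), rng.standard_normal((h, h))]
f_Ws = [rng.standard_normal((h, h)), rng.standard_normal((h, out))]
logits = relation_network(objects, q, g_Ws, f_Ws)
```

The double loop makes the n^2 cost discussed in Chapter 1 explicit: every ordered object pair passes through g_θ before f_φ aggregates the sum.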
The role of each MLP is not clearly defined in the original paper, but from the
view of MemNN, it can be understood that gθ corresponds to O and fφ corresponds
to R. To verify the role of gθ, we compare the output when pairs are made with
supporting sentences and when made with unrelated sentences. For example, there
is a question "Is the chocolate bigger than the box?" given "The container is bigger
than the chocolate.", "The box of chocolates is bigger than the chocolate.", "The box
is bigger than the box of chocolates.", "The box of chocolates fits inside the chest.",
and "The chocolate fits inside the box.". In this case, the supporting sentences are
"The box of chocolates is bigger than the chocolate." and "The box is bigger than the
box of chocolates.", and the unrelated sentences are "The box of chocolates fits inside
the chest." and "The chocolate fits inside the box.". Figure 2.5 shows the visualization
result of each output.

Figure 2.5: The output vector of g_θ for (a) related objects and (b) unrelated objects
When we focus on whether the values are activated or not, we can see that g_θ
distinguishes supporting sentence pairs from non-supporting pairs, just as an output
feature map examines how relevant each sentence is to the question. Therefore, we
can conclude that the output of g_θ reveals the relation between the object pair and the
question, and that f_φ aggregates all these outputs to infer the answer. Table 2.1 summarizes
the interpretation of RN in terms of MemNN.
Chapter 3
Relation Memory Network
Motivated by MemNN's architecture and the potential of an MLP to discover related
objects, Relation Memory Network (RMN) uses an MLP in the output feature map while
following MemNN's overall framework.
Figure 3.1: Relation Memory Network. The diagram illustrates the 2 hop model on a
sample story ("18 Mary travelled to the kitchen. ... 144 Daniel grabbed the apple.
145 Mary went to the bedroom. 146 Daniel travelled to the hallway. 147 Daniel went
back to the garden.") with the question "148 Where was the apple before the garden?"
and the model answer "hallway".
RMN is composed of four components: embedding, attention, updating, and reasoning.
It takes as input a set of sentences x_1, x_2, ..., x_n and a related question
u, and outputs an answer a. Each of x_i, u, and a is made up of one-hot representations
of words; for example, x_i = {x_{i1}, x_{i2}, x_{i3}, ..., x_{in_i}} (x_{ij} ∈ R^V, j = 1, 2, ..., n_i, V =
vocabulary size, n_i = number of words in sentence i).
Figure 3.2: One-hot represented sentence and its embedding example using LSTM
3.1 Embedding component
We first embed the words in each x_i = {x_{i1}, x_{i2}, x_{i3}, ..., x_{in_i}} and in u into a continuous space
by multiplying them with an embedding matrix A ∈ R^{d×V}. The embedded sentence is then stored
and represented as a memory object m_i, while the question is represented as q. Any of the
following methods is available for the embedding component: simple sum (equation 3.1),
position encoding [7] (equation 3.2), concatenation (equation 3.3), LSTM, or GRU.
In the case of LSTM or GRU, m_i is its final hidden state, as described in Figure 3.2.
m_i = Σ_j A x_{ij}    (3.1)

m_i = Σ_j l_j ∘ A x_{ij},   where l_{kj} = (1 − j/n_i) − (k/d)(1 − 2j/n_i)    (3.2)

m_i = [A x_{i1}; A x_{i2}; ...; A x_{in_i}]    (3.3)
As the following attention component takes the concatenation of m_i and q, the sentence
and question embeddings need not have the same dimension, unlike in previous
memory-augmented neural networks.
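The position-encoding scheme of equation 3.2 can be sketched in numpy; the sentence length and embedding dimension below are arbitrary assumptions:

```python
import numpy as np

def position_encoding(embedded_words):
    """Eq. 3.2: m_i = sum_j l_j * (A x_ij), with
    l_kj = (1 - j/n_i) - (k/d)(1 - 2j/n_i), j and k 1-indexed.
    embedded_words: (n_i, d) matrix whose rows are the word embeddings A x_ij."""
    n_i, d = embedded_words.shape
    j = np.arange(1, n_i + 1)[:, None]     # word positions, shape (n_i, 1)
    k = np.arange(1, d + 1)[None, :]       # embedding dimensions, shape (1, d)
    l = (1 - j / n_i) - (k / d) * (1 - 2 * j / n_i)
    return (l * embedded_words).sum(axis=0)

rng = np.random.default_rng(3)
m_i = position_encoding(rng.standard_normal((7, 16)))   # 7 words, d = 16
```

Unlike the simple sum of equation 3.1, the weighting l_{kj} makes the resulting memory object sensitive to word order within the sentence.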
3.2 Attention component
The attention component can be applied more than once depending on the problem;
Figure 3.1 illustrates the 2 hop version of RMN. We refer to the i-th embedded sentence
on the t-th hop as m_i^t.

To constitute the attention component, we apply a simple MLP, denoted g_θ^t.
It must end with a one-unit output layer to provide a scalar weight w_i^t, which leads
to an attention weight α_i^t between 0 and 1. In the beginning, the vector concatenating
m_i^1 and q flows into g_θ^1. From the result of g_θ^1, the attention weight α_i^1 is calculated
using an additional variable β^1 (≥ 1) to control the intensity of attention, inspired by
the way the Neural Turing Machine [3] reads from memory. Then we get the related
memory object r^1, a weighted sum of α_i^1 and the memory objects m_i^1 over all i. If there are
more hops, r^1 is passed directly to the next hop, and the process iterates with the
updated memory objects m_i^2. The whole procedure is written as equations 3.4, 3.5,
and 3.6:
w_i^t ← g_θ^t([m_i^t; r^{t-1}])    (i = 1, 2, ..., n;  r^0 = q)    (3.4)

α_i^t ← exp(β^t w_i^t) / Σ_i exp(β^t w_i^t)    (β^t(z) = 1 + log(1 + exp(z)))    (3.5)

r^t ← Σ_i α_i^t · m_i^t    (3.6)
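One hop of the attention component (equations 3.4 to 3.6) might look like the following sketch; the two-layer g_θ, the scalar z behind β, and all sizes are illustrative assumptions:

```python
import numpy as np

def attention_hop(m, r_prev, g_Ws, z):
    """Eqs. 3.4-3.6: score each memory with an MLP g_theta over [m_i; r^{t-1}],
    sharpen with beta = 1 + log(1 + exp(z)), softmax, and return the
    weighted sum r^t.  m: (n, d) memory objects; r_prev: (d,)."""
    inp = np.concatenate([m, np.broadcast_to(r_prev, m.shape)], axis=1)
    h = np.maximum(inp @ g_Ws[0], 0.0)          # hidden layer with ReLU
    w = (h @ g_Ws[1]).ravel()                   # eq. 3.4: one scalar per memory
    beta = 1.0 + np.log1p(np.exp(z))            # eq. 3.5: intensity of attention
    e = np.exp(beta * (w - w.max()))
    alpha = e / e.sum()                         # attention weights in (0, 1)
    return alpha @ m, alpha                     # eq. 3.6: related memory r^t

rng = np.random.default_rng(4)
n, d, h_dim = 5, 8, 16                          # assumed toy sizes
m = rng.standard_normal((n, d))
q = rng.standard_normal(d)                      # r^0 = q
g_Ws = [rng.standard_normal((2 * d, h_dim)), rng.standard_normal((h_dim, 1))]
r1, alpha1 = attention_hop(m, q, g_Ws, z=0.5)
```

Because β ≥ 1, the softmax can only become sharper than the plain scores, letting the model focus more strongly on a single supporting sentence when needed.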
3.3 Updating component
To forget the information already used, we use an intuitive updating component to renew
the memory: each memory object is replaced by the unconsumed portion of the old one:

m_i^{t+1} ← (1 − α_i^t) m_i^t    (3.7)

Contrary to the other components, updating is not mandatory. When the model is
configured with 1 hop, there is no need to use it.
3.4 Reasoning component
Similar to the attention component, the reasoning component is also made up of an MLP,
represented as f_φ. It receives both q and the final result of the attention component, r^f,
and then takes a softmax to produce the model answer a:

a ← Softmax(f_φ([r^f; q]))    (3.8)
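Putting the components together, a 2 hop forward pass with the updating rule (eq. 3.7) and the reasoning step (eq. 3.8) can be sketched as below; the random weights, the shared β, and the toy answer vocabulary are assumptions, not the trained configuration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rmn_forward(m, q, g_list, f_Ws, z=0.5, hops=2):
    """RMN forward pass sketch. Per hop: MLP attention (eqs. 3.4-3.6), then
    erase used information m <- (1 - alpha) * m (eq. 3.7). Finally the
    reasoning MLP f_phi answers from [r^f; q] (eq. 3.8)."""
    r = q                                           # r^0 = q
    beta = 1.0 + np.log1p(np.exp(z))
    for t in range(hops):
        inp = np.concatenate([m, np.broadcast_to(r, m.shape)], axis=1)
        w = (np.maximum(inp @ g_list[t][0], 0.0) @ g_list[t][1]).ravel()
        alpha = softmax(beta * w)
        r = alpha @ m                               # related memory r^t
        m = (1.0 - alpha)[:, None] * m              # eq. 3.7: updating component
    h = np.maximum(np.concatenate([r, q]) @ f_Ws[0], 0.0)
    return softmax(h @ f_Ws[1])                     # eq. 3.8: answer distribution

rng = np.random.default_rng(5)
n, d, hd, V = 6, 8, 16, 20                          # assumed toy sizes
m = rng.standard_normal((n, d))
q = rng.standard_normal(d)
g_list = [[rng.standard_normal((2 * d, hd)), rng.standard_normal((hd, 1))]
          for _ in range(2)]
f_Ws = [rng.standard_normal((2 * d, hd)), rng.standard_normal((hd, V))]
a = rmn_forward(m, q, g_list, f_Ws)
```

The erasure step is what lets the second hop attend to a different supporting sentence than the first, since heavily attended memories are suppressed before the next pass.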
Chapter 4
Experiments
4.1 Dataset
4.1.1 bAbI story-based QA dataset
The bAbI story-based QA dataset [15] is composed of 20 different types of tasks for testing
natural language reasoning ability. Each task requires a different method to infer the
answer. The dataset includes sets of statements comprised of multiple sentences, a
question, and an answer. A statement can be as short as two sentences and as long as
320 sentences. To answer the question, it is necessary to find the one or more
sentences relevant to the given question and derive the answer from them. The answer is
typically a single word, but in a few tasks answers are a set of words. A task is regarded
as passed when the accuracy is greater than 95%. There are two versions of this
dataset, one with 1k training examples and the other with 10k examples. Most
previous models test their accuracy on the 10k dataset with all tasks trained jointly.
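For concreteness, the raw bAbI files can be read with a minimal parser such as the sketch below; the assumed file layout (numbered lines, with the answer and supporting-fact ids tab-separated on question lines, and numbering restarting at 1 for each new story) follows the released dataset format, and the sample lines are our own:

```python
def parse_babi(lines):
    """Minimal bAbI reader: returns (context, question, answer, supports)
    tuples. Assumes the released file layout described above."""
    stories, context = [], []
    for line in lines:
        nid, _, text = line.strip().partition(" ")
        if int(nid) == 1:
            context = []                       # a new story begins
        if "\t" in text:                       # question line
            question, answer, support = text.split("\t")
            stories.append((list(context), question.strip(), answer,
                            [int(s) for s in support.split()]))
        else:
            context.append(text)               # plain statement line
    return stories

sample = [
    "1 Mary got the milk there.",
    "2 John moved to the bedroom.",
    "3 Mary travelled to the hallway.",
    "4 Where is the milk?\thallway\t1 3",
]
stories = parse_babi(sample)
```

The supporting-fact ids (here 1 and 3) are exactly the sentences a model must attend to, which is what the α visualizations in Chapter 5 inspect.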
Figure 4.1: Sample statements and questions from bAbI story based QA tasks
4.1.2 bAbI dialog dataset
The bAbI dialog dataset [1] is a set of 5 tasks within the goal-oriented context of restaurant
reservation. It is designed to test whether a model can learn various abilities such as performing
dialog management, querying knowledge bases (KBs), and interpreting the output of
such queries. The KB can be queried using API calls with 4 fields (a type of cuisine,
a location, a price range, and a party size), all of which must be filled to issue an API
call. Task 1 tests the capacity to interpret a request and ask the right questions
needed to issue an API call. Task 2 checks the ability to modify an API call. Tasks 3 and 4
test the capacity to use the outputs of an API call to propose options in the order
of rating and to provide the extra information the user asks for. Task 5 combines
everything. The maximum dialog length differs per task: 14 for task
1, 20 for task 2, 78 for task 3, 13 for task 4, and 96 for task 5. As restaurant names,
locations, and cuisine types continually introduce new entities, there are normal and OOV test
sets to assess the model's generalization ability. Training sets consist of 1k examples,
which is not a large amount, creating realistic learning conditions.
Figure 4.2: Sample dialog from bAbI dialog based QA tasks
4.2 Training details
4.2.1 bAbI story-based QA dataset
We trained a 2 hop RMN jointly on all tasks using the 10k dataset so that the model
infers the solution suited to each type of task. We limited the input to the last 70 stories for all
tasks except task 3, for which we limited the input to the last 130 stories, similar to Xiong
et al. (2016); this is the hardest condition among previous models [16]. Then, we
labeled each sentence with its relative position. The embedding component is similar
to Santoro et al. (2017), where story and question are embedded through different
LSTMs: 32 unit word-lookup embeddings and a 32 unit LSTM for both story and question [12].
For the attention component, as we use a 2 hop RMN, there are g_θ^1 and g_θ^2; both are three-layer
MLPs consisting of 256, 128, and 1 units with the ReLU activation function [11]. f_φ is
a three-layer MLP composed of 512, 512, and 159 units (159 is the number of words appearing
in the bAbI dataset) with ReLU non-linearities, where the final layer is a linear layer
producing logits for a softmax over the answer vocabulary. For regularization, we
use batch normalization [6] for all MLPs. The softmax output was optimized with a
cross-entropy loss function using the Adam optimizer [8] with a learning rate of 2e-4.
4.2.2 bAbI dialog dataset
We trained on full dialog scripts, with every model response as the answer, all previous
dialog history as the story, and the last user utterance as the question. The model selects the
most probable response from 4,212 candidates, which are drawn from the set of all bot
utterances appearing in the training, validation, and test sets (plain and OOV) for all tasks
combined. We also report results when using match type features for dialog. A match
type feature is an additional label on a candidate indicating whether a word is found in
the dialog history; this feature can alleviate the OOV problem. Training was done with the
Adam optimizer and a learning rate of 1e-4 for all tasks. Additional model details
are given in Appendix A.
Chapter 5
Results and Discussion
5.1 bAbI story-based QA
As we can see in Table 5.1, RMN achieves a state-of-the-art result on the bAbI story-based
question answering dataset: 98.8% accuracy, failing on only a single task and with no
catastrophic failure. It succeeded on tasks 17, 18, and 19, which MemN2N, GMemN2N, and
DMN failed to solve, and on task 16, on which DMN+, DNC, and EntNet scored
high error rates.
Table 5.2 shows how our model solved several tasks. RMN's attention components
g_θ^1 and g_θ^2 complement each other to identify the facts needed to answer correctly.
Sometimes both g_θ^1 and g_θ^2 concentrate on the same sentences, all of which are critical to
answering the question; sometimes g_θ^1 finds a fact related to the given question, and
with this information g_θ^2 chooses the key fact for the answer. While trained jointly, RMN
learns these different solutions for each task. For task 3, the only failed task, the
attention component still functions well; it focuses sequentially on the supporting
sentences. However, the reasoning component, f_φ, had difficulty catching the word
'before'. We can easily figure out that 'before' implies 'just before' a certain situation,
whereas RMN confused its meaning. As shown in Table 5.2c, our model found all
previous locations before the garden. Still, it is remarkable that the simple MLP
carried out all of these various roles.
Table 5.1: Average error rate on bAbI story-based tasks with 10k training samples
Task MemNN MemN2N GMemN2N DMN DMN+ DNC EntNeta RNb RMN
1: Single Supporting Fact 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0
2: Two Supporting Facts 0.0 0.3 0.0 1.8 0.3 0.4 2.8 8.3 0.5
3: Three Supporting Facts 0.0 9.3 4.5 4.8 1.1 1.8 10.6 17.1 14.7
4: Two Argument Relations 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5: Three Argument Relations 2.0 0.6 0.2 0.7 0.5 0.8 0.4 0.7 0.4
6: Yes/No Questions 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.0 0.0
7: Counting 15.0 3.7 1.8 3.1 2.4 0.6 0.8 0.4 0.5
8: Lists/Sets 9.0 0.8 0.3 3.5 0.0 0.3 0.1 0.3 0.3
9: Simple Negation 0.0 0.8 0.0 0.0 0.0 0.2 0.0 0.0 0.0
10: Indefinite Knowledge 2.0 2.4 0.2 2.5 0.0 0.2 0.0 0.0 0.0
11: Basic Coreference 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.4 0.5
12: Conjunction 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
13: Compound Coreference 0.0 0.0 0.0 0.2 0.0 0.1 0.0 0.0 0.0
14: Time Reasoning 1.0 0.0 0.0 0.0 0.0 0.4 3.6 0.0 0.0
15: Basic Deduction 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
16: Basic Induction 0.0 0.4 0.0 0.6 45.3 33.1 52.1 4.9 0.9
17: Positional Reasoning 35.0 40.7 27.8 40.4 4.2 12.0 11.7 1.6 0.3
18: Size Reasoning 5.0 6.7 8.5 4.7 2.1 0.8 2.1 2.1 2.3
19: Path Finding 64.0 66.5 31.0 65.5 0.0 3.9 63.0 3.2 2.9
20: Agent’s Motivations 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Mean error (%) 6.7 6.6 3.7 6.4 2.8 2.7 7.4 2.0 1.2
Failed tasks (err. >5%) 4 4 3 2 1 2 4 2 1
a For a fair comparison, we report EntNet's result [4], which was jointly trained on all tasks; it was reported in the appendix of that paper.
b Our implementation. The result differs from what Santoro et al. (2017) reported, which is caused by the initialization [12].
Table 5.2: bAbI story-based task visualization of α
(a) Task 7
Seq. Task 7: Counting α1 α2
8 John grabbed the apple there . 0.02 0.00
9 John gave the apple to Mary . 0.08 0.16
10 Mary passed the apple to John . 0.17 0.24
11 Mary journeyed to the hallway . 0.00 0.01
13 Sandra went to the garden . 0.00 0.00
14 Mary went to the kitchen . 0.00 0.03
15 Mary picked up the football there . 0.29 0.24
16 Mary picked up the milk there . 0.27 0.32
User input How many objects is Mary carrying?
Answer Two
Model answer Two [Correct]
(b) Task 14
Seq. Task 14: Time reasoning α1 α2
1 Mary went back to the school yesterday . 0.00 0.01
2 Fred went to the school yesterday . 0.00 0.00
3 Julie went back to the kitchen yesterday . 0.13 0.98
4 Fred journeyed to the kitchen this morning . 0.00 0.00
5 This morning julie journeyed to the school . 0.66 0.02
6 This evening mary went back to the school . 0.01 0.00
7 This afternoon julie went to the bedroom . 0.06 0.00
User input Where was Julie before the school ?
Answer Kitchen
Model answer Kitchen [Correct]
(c) Task 3
Seq. Task 3: Three Supporting Facts α1 α2
33 Daniel took the football . 0.46 0.01
39 Sandra travelled to the bedroom . 0.01 0.00
40 Daniel moved to the bathroom . 0.01 0.31
41 Sandra got the milk . 0.00 0.00
42 Daniel travelled to the garden . 0.04 0.42
46 Daniel went to the hallway . 0.00 0.25
50 Daniel put down the apple . 0.00 0.01
51 Daniel put down the football there . 0.38 0.00
User input Where was the football before the garden ?
Answer Bathroom
Model answer Bathroom [Correct]
Seq. Task 3: Three Supporting Facts α1 α2
1 Mary got the football . 0.26 0.00
3 Mary picked up the football . 0.39 0.00
5 Mary moved to the office . 0.00 0.05
6 Mary went to the hallway . 0.00 0.05
11 Mary travelled to the garden . 0.00 0.21
12 Mary travelled to the kitchen . 0.00 0.29
14 Mary moved to the office . 0.00 0.25
16 Mary went to the garden . 0.00 0.05
22 Mary discarded the football there . 0.24 0.00
User input Where was the football before the garden ?
Answer Office
Model answer Kitchen [Incorrect]
5.2 bAbI dialog-based QA
The results in Table 5.3 show that RMN achieves the best results under all conditions.
Without any match type, RN and RMN outperform the previous memory-augmented
models on both the normal and OOV tasks. This is mainly attributable to the strong
result on task 4, which can be interpreted as an effect of the MLP-based output
feature map.
To solve task 4, it is critical to understand the relation between ‘phone number’ in the
user input and ‘r phone’ in the previous dialog, as shown in Table 5.4d. We conjectured
that the inner product was not sufficient to capture this implicit similarity and ran a
supporting experiment: when we replaced RMN’s attention component with inner-
product-based attention, the error rate increased to 11.3%.
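The contrast between the two attention types can be sketched in a few lines of plain Python. All vectors and weights below are hypothetical toy values, not the trained model's: the point is only that an MLP over the concatenated (memory, question) pair can be made to prefer a memory that a plain dot product cannot prefer.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def inner_product_score(q, m):
    # MemN2N-style attention: a fixed similarity between question and memory.
    return sum(qi * mi for qi, mi in zip(q, m))

def mlp_score(q, m, W1, b1, w2):
    # RMN-style attention: an MLP over the concatenated (memory, question)
    # pair, which can express relations a plain dot product cannot.
    x = m + q  # concatenation
    h = [max(0.0, sum(wij * xj for wij, xj in zip(row, x)) + bi)
         for row, bi in zip(W1, b1)]
    return sum(wi * hi for wi, hi in zip(w2, h))

# Toy 2-d embeddings: the inner product must attend to the memory most
# similar to the question, while a (hand-picked, untrained) MLP can be
# made to attend to the orthogonal one instead.
q = [1.0, 0.0]
memories = [[0.0, 1.0], [1.0, 0.0]]

ip_attention = softmax([inner_product_score(q, m) for m in memories])

W1 = [[-1.0, 1.0, 0.0, 0.0]]  # one hidden unit over concat([m, q])
b1 = [0.0]
w2 = [1.0]
mlp_attention = softmax([mlp_score(q, m, W1, b1, w2) for m in memories])
```

Here the inner product necessarily puts more weight on the memory aligned with the question, while the hand-picked MLP weights flip that preference, illustrating why a learned MLP scorer has more room to capture implicit relatedness.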
For tasks 3 and 5, whose maximum dialog length is much longer than that of the
other tasks, RN performs worse than MemN2N, GMemN2N, and RMN. The large
number of unnecessary object pairs created by RN not only increases processing time
but also decreases accuracy.
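The cost asymmetry can be made concrete with a small counting sketch. It assumes RN forms all ordered object pairs (the exact pairing convention varies by implementation), and the dialog lengths are illustrative:

```python
def rn_num_pairs(n_objects):
    # RN feeds every ordered pair of memory objects through its MLP,
    # so work grows quadratically with the number of stored sentences.
    return n_objects * n_objects

def rmn_num_comparisons(n_objects, n_hops):
    # RMN scores each memory slot against the question once per hop,
    # so work grows only linearly in the memory size.
    return n_objects * n_hops

short_dialog, long_dialog = 20, 130
pairs_short = rn_num_pairs(short_dialog)        # 400
pairs_long = rn_num_pairs(long_dialog)          # 16900
rmn_long = rmn_num_comparisons(long_dialog, 2)  # 260
```

A 6.5x longer dialog multiplies RN's pair count by over 40 while RMN's comparison count grows only 6.5x, which is consistent with RN degrading on the long-dialog tasks.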
With the match type feature, all models other than RMN significantly improve their
performance on every task except task 3 compared to the plain condition. RMN
benefits from the match type only on the OOV tasks, which implies that RMN can
already find the relevant relations without it on the normal tasks.
Looking at the OOV tasks more closely, RMN failed to perform well on OOV tasks
1 and 2 even though g1θ properly focused on the related object, as shown in
Table 5.4a. We attribute this to the fact that tasks 1 and 2 contain more keywords
than task 4. In tasks 1 and 2, all four keywords (cuisine, location, number, and price)
must be correctly extracted from the supporting sentence in order to issue the correct
API call, which is harder than task 4. Compare the examples in Table 5.4a and
Table 5.4d: the supporting sentence of task 4 has one keyword out of three words,
whereas the supporting sentences of tasks 1 and 2 contain four keywords (cuisine,
location, number, and price) out of sixteen words.
Unlike on the other tasks, RMN yields the same error rate (25.1%) as MemN2N
and GMemN2N on task 3. The main goal of task 3 is to recommend restaurants
from the knowledge base in order of rating. All failed cases occur when displaying a
restaurant in response to the user input <silence>, which is an ambiguous trigger for
finding the relevant previous utterance. As shown in Table 5.4c, there are two different
types of response to the same user input. One is to check whether all the required
fields have been given in the previous utterances and then either ask the user for the
missing fields or send an “Ok let me look into some options for you” message. The
other is to recommend restaurants starting from the highest rating. All models lack
the ability to discriminate between these two types of silence and therefore converge
to the same result. To verify this claim, we performed an additional experiment on
task 3 and confirmed a performance gain (the extra result is given in Table B.1 of
Appendix B).
Table 5.3: Average error rate on bAbI dialog tasks^a

                                                    Plain                       With Match
Task                                   MemN2N GMemN2N  RN^b   RMN   MemN2N GMemN2N  RN^b   RMN
1: Issuing API calls                     0.1     0.0    0.0    0.0    0.0     0.0    0.0    0.0
2: Updating API calls                    0.0     0.0    0.5    0.0    1.7     0.0    0.0    0.0
3: Displaying options                   25.1    25.1   26.6   25.1   25.1    25.1   27.1   25.1
4: Providing extra information          40.5    42.8    0.0    0.0    0.0     0.0    0.0    0.0
5: Conducting full dialogs               3.9     3.7   23.3    2.5    6.6     2.0   16.6    1.8
Average error rates (%)                 13.9    14.3   10.1    5.5    6.7     5.4    8.7    5.4
1 (OOV): Issuing API calls              27.7    17.6   17.8   16.8    3.5     0.0    1.5    0.0
2 (OOV): Updating API calls             21.1    21.1   23.2   21.1    5.5     5.8    0.0    0.0
3 (OOV): Displaying options             25.6    24.7   27.2   24.9   24.8    24.9   29.8   25.1
4 (OOV): Providing extra information    42.4    43.0    0.0    0.0    0.0     0.0    0.0    0.0
5 (OOV): Conducting full dialogs        34.5    33.3   38.3   34.5   22.3    20.6   28.4   21.7
Average error rates (%)                 30.3    27.9   21.3   19.5   11.2    10.3   12.0    9.4

^a Only compared with models under the same conditions. ^b Our implementation.
Table 5.4: Visualization of α on bAbI dialog tasks without match type
(a) Plain result of task 1
Seq. Locutor Task 1: Issuing API calls α1
1 user Good morning 0.01
2 bot Hello what can i help you with today 0.01
3 user Can you make restaurant reservation for eight people in cheap price range with british cuisine 0.23
4 bot I’m on it 0.00
5 user <silence> 0.14
6 bot Where should it be 0.01
7 user In madrid 0.25
8 bot Ok let me look into some options for you 0.18
User input <silence>
Answer API call british madrid eight cheap
Model answer API call british madrid eight cheap [Correct]
(b) OOV result of task 1
Seq. Locutor Task 1 (OOV): Issuing API calls α1
1 user Hi 0.01
2 bot Hello what can i help you with today 0.01
3 user May I have table with cantonese cuisine in cheap price range in seoul for two people 0.49
4 bot I’m on it 0.10
5 user <silence> 0.01
6 bot Ok let me look into some options for you 0.27
User input <silence>
Answer API call cantonese seoul two cheap
Model answer API call italian paris two cheap [Incorrect]
(c) Same user input with different goal in task 3
Seq. Locutor Task 3: Displaying options α1
7 user resto 1 r rating 1 0.01
14 user resto 8 r rating 8 0.01
21 user resto 3 r rating 3 0.01
22 user Hello 0.01
23 bot Hello what can i help you with today 0.01
24 user Can you book table with french food for two people in madrid in moderate price range 0.21
25 bot I’m on it 0.01
User input <silence>
Answer Where should it be?
Model answer Where should it be? [Correct]
Seq. Locutor Task 3: Displaying options α1
7 user resto 1 r rating 1 0.00
14 user resto 8 r rating 8 0.00
21 user resto 3 r rating 3 0.00
24 user Can you book table with french food for two people in madrid in moderate price range 0.14
25 bot I’m on it 0.00
26 user <silence> 0.09
27 bot Ok let me look into some options for you 0.15
28 user <silence> 0.10
29 bot What do you think of this option : resto 8 0.08
30 user No this does not work for me 0.06
31 bot Sure let me find other option for you 0.08
User input <silence>
Answer What do you think of this option: resto 3
Model answer What do you think of this option: resto 7 [Incorrect]
(d) Plain result of task 4
Seq. Locutor Task 4: Providing extra information α1
1 user resto 3 r phone resto 3 phone 0.77
3 user resto 3 r address resto 3 address 0.01
4 user resto 3 r location london 0.01
6 user resto 3 r price cheap 0.01
10 user I’d like to book table at resto 3 0.02
11 bot Great let me do reservation 0.03
User input What is phone number of restaurant?
Answer Here it is: resto 3 phone
Model answer Here it is: resto 3 phone [Correct]
(e) OOV result of task 4
Seq. Locutor Task 4 (OOV): Providing extra information α1
1 user resto 1 r phone resto 1 phone 0.79
3 user resto 1 r address resto 1 address 0.01
4 user resto 1 r location hanoi 0.01
6 user resto 1 r price expensive 0.01
10 user Can you make restaurant reservation at resto 1 0.01
11 bot Great let me do reservation 0.04
User input What is phone number of restaurant?
Answer Here it is: resto 1 phone
Model answer Here it is: resto 1 phone [Correct]
5.3 Model analysis
5.3.1 Effectiveness of the MLP-based output feature map
The most important feature that distinguishes MemNN-based models is the output
feature map. Table 5.5 summarizes the results on the bAbI story-based QA dataset
when RMN’s MLP-based output feature map is replaced with those of previous
models: the inner product used in MemN2N, the gated inner product used in
GMemN2N, and the inner product and absolute difference with two embedding
matrices used in DMN and DMN+. As Table 5.5 shows, the more expressive the
output feature map, the better the overall performance. From this point of view, the
MLP is an effective output feature map.
Table 5.5: Test error of RMN on bAbI story-based QA dataset with different configurations

Output feature map                                                    Error rate (%)
inner product                                                              29.4
inner product with gate                                                    25.9
inner product and absolute difference with two embedding matrices          11.2
MLP                                                                         1.2
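The four output feature maps compared in Table 5.5 can be sketched as scoring functions for a single memory slot m and question u. The weights below are hypothetical stand-ins, and the DMN-style learned combination layer is replaced by a plain sum for brevity:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def memn2n_score(m, u):
    # MemN2N: plain inner product between memory and question.
    return dot(m, u)

def gmemn2n_score(m, u, w_gate, b_gate):
    # GMemN2N: inner product modulated by a learned scalar gate on u.
    return sigmoid(dot(w_gate, u) + b_gate) * dot(m, u)

def dmn_score(m, u, m2, u2):
    # DMN/DMN+: inner products and element-wise absolute difference,
    # using a second embedding (m2, u2) of the same slot and question.
    feats = [dot(m, u), dot(m2, u2)] + [abs(a - b) for a, b in zip(m, u)]
    return sum(feats)  # stand-in for the learned combination layer

def rmn_score(m, u, W1, b1, w2):
    # RMN: an MLP over the concatenated pair (the thesis's g_theta).
    x = m + u
    h = [max(0.0, dot(row, x) + bi) for row, bi in zip(W1, b1)]
    return dot(w2, h)
```

Each successive variant adds expressive power over the previous one, matching the monotone improvement in Table 5.5.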
5.3.2 Performance of RN and RMN according to memory size
Additional experiments were conducted with the bAbI story-based QA dataset to see
how memory size affects both the performance and training time of RN and RMN.
Test errors, with training times in parentheses, are summarized in Table 5.6.
When the memory size is small, we observe the data efficiency of RN: it reaches
performance similar to RMN’s in less time. However, when the memory size increases,
RN’s performance drops significantly compared to RMN’s, even though it is trained
for a longer time; it is even lower than RN’s own result with memory size 20. RMN,
on the other hand, maintains high performance even as the memory size increases.
Table 5.6: Test error and training time of RN and RMN on bAbI story-based QA
dataset with different memory size
memory size RN RMN
20 2.0 (0.65 days) 1.5 (1.46 days)
130 9.8 (9.47 days) 1.2 (4.94 days)
5.3.3 Effectiveness of the number of hops
The tasks in the bAbI story-based QA dataset differ in the number of supporting
sentences that must be referenced to solve them. For example, tasks 1, 2, and 3
require one, two, and three supporting facts, respectively. The mean error rate for
each task according to the number of hops is given in Table 5.7.
Overall, the required number of hops is correlated with the number of supporting
sentences. In this respect, when the number of relations increases, RMN could reason
over them by increasing the number of hops to 3, 4, or more.
Table 5.7: Test error of RMN on bAbI story-based QA dataset with different number
of hops

Task           hop 1   hop 2   hop 3
task 1 (1)^a     0.0     0.0     0.2
task 2 (2)      62.0     0.5     2.1
task 3 (3)      62.4    14.7    14.6
task 10 (1)      0.0     0.0     3.6

^a Number in parentheses indicates the number of supporting sentences to solve the task.
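The multi-hop procedure this experiment varies can be sketched as follows. The additive query update below is a simplification standing in for RMN's actual updating component; the idea is only that each hop attends over memory and folds the result back into the query, so tasks needing k supporting facts tend to need roughly k hops:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def one_hop(memories, query):
    # One hop: score every memory slot against the current query, then fold
    # the attention-weighted memory back into the query for the next hop.
    weights = softmax([dot(m, query) for m in memories])
    attended = [sum(w * m[i] for w, m in zip(weights, memories))
                for i in range(len(query))]
    return [q + a for q, a in zip(query, attended)]

def multi_hop(memories, query, n_hops):
    # Chaining hops lets a later hop use what an earlier hop retrieved,
    # which is what multi-fact tasks (e.g. task 3) require.
    for _ in range(n_hops):
        query = one_hop(memories, query)
    return query
```

With too few hops the query never incorporates the intermediate fact, which is consistent with the large hop-1 errors on tasks 2 and 3 in Table 5.7.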
Chapter 6
Conclusion
Our work, RMN, is a simple and powerful architecture that effectively handles text-
based question answering tasks that require a large memory and high reasoning
ability. Like most existing approaches, it accesses an external memory multiple times
through a multi-hop approach to find the necessary information. However, by using
an MLP that can effectively capture complex relatedness when searching for the right
supporting sentences among many, RMN achieved state-of-the-art performance on
both the story-based QA dataset and the goal-oriented dialog QA dataset. Compared
with RN, which also uses an MLP to understand relations, RMN was more effective
when the memory is large. By virtue of RMN’s high reasoning ability regardless of
memory size, it brings us one step closer to an end-to-end chatbot built with artificial
intelligence.
Future work will apply RMN to image-based reasoning tasks (e.g., CLEVR,
DAQUAR, and VQA). To extract features from an image, VGG net [13] is
conventionally used; it outputs 196 objects of 512-dimensional vectors, which also
requires a large memory. An important direction will be to find an appropriate way
to focus sequentially on the related objects, which was comparatively easy in
text-based reasoning.
Bibliography
[1] Antoine Bordes and Jason Weston. Learning end-to-end goal-oriented dialog.
CoRR, abs/1605.07683, 2016.
[2] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Em-
pirical evaluation of gated recurrent neural networks on sequence modeling.
CoRR, abs/1412.3555, 2014.
[3] Alex Graves, Greg Wayne, and Ivo Danihelka. Neural turing machines. arXiv
preprint arXiv:1410.5401, 2014.
[4] Mikael Henaff, Jason Weston, Arthur Szlam, Antoine Bordes, and Yann Le-
Cun. Tracking the world state with recurrent entity networks. arXiv preprint
arXiv:1612.03969, 2016.
[5] Sepp Hochreiter and Jurgen Schmidhuber. Long short-term memory. Neural
computation, 9(8):1735–1780, 1997.
[6] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep net-
work training by reducing internal covariate shift. In International Conference
on Machine Learning, pages 448–456, 2015.
[7] Jason Weston, Sumit Chopra, and Antoine Bordes. Memory networks. Interna-
tional Conference on Learning Representations (ICLR), 2015.
[8] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization.
arXiv preprint arXiv:1412.6980, 2014.
[9] Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan
Gulrajani, Victor Zhong, Romain Paulus, and Richard Socher. Ask me anything:
Dynamic memory networks for natural language processing. In International
Conference on Machine Learning, pages 1378–1387, 2016.
[10] Fei Liu and Julien Perez. Gated end-to-end memory networks. In Proceedings
of the 15th Conference of the European Chapter of the Association for Compu-
tational Linguistics: Volume 1, Long Papers, volume 1, pages 1–10, 2017.
[11] Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted
boltzmann machines. In Proceedings of the 27th international conference on
machine learning (ICML-10), pages 807–814, 2010.
[12] Adam Santoro, David Raposo, David GT Barrett, Mateusz Malinowski, Razvan
Pascanu, Peter Battaglia, and Timothy Lillicrap. A simple neural network mod-
ule for relational reasoning. Advances in neural information processing systems,
2017.
[13] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for
large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[14] Sainbayar Sukhbaatar, Jason Weston, Rob Fergus, et al. End-to-end memory
networks. In Advances in neural information processing systems, pages 2440–
2448, 2015.
[15] Jason Weston, Antoine Bordes, Sumit Chopra, and Tomas Mikolov. To-
wards ai-complete question answering: A set of prerequisite toy tasks. CoRR,
abs/1502.05698, 2015.
[16] Caiming Xiong, Stephen Merity, and Richard Socher. Dynamic memory net-
works for visual and textual question answering. In International Conference on
Machine Learning, pages 2397–2406, 2016.
Appendix A
Model details
Table A.1: Hyperparameters of Relation Memory Networks on bAbI dialog tasks

Task  Story and Question  Word-lookup     Hop  gθ                     fφ                       Activation  Use Batch Norm
      Embedding           Embedding Dim
1     sum                 128             1    2048, 2048, 1          2048, 2048, 4212         tanh        True
2     sum                 128             1    1024, 1024, 1          1024, 1024, 4212         tanh        True
3     sum                 128             1    1024, 1024, 1024, 1    1024, 1024, 1024, 4212   tanh        True
4     concatenation       50              1    1024, 1024, 1          1024, 1024, 4212         tanh        True
5     concatenation       64              2    4096, 4096, 1          4096, 4096, 4212         tanh        True
Appendix B
Additional Results
We modify the user input from <silence> to <silence><silence> when looking for
restaurant recommendations. This lets the model distinguish the two different
situations: whether to ask for additional fields or to recommend a restaurant.
Table B.1: Visualization of α1 and α2 on user input revised bAbI dialog task 3 without
match type
Seq. Locutor Task 3: Displaying options α1 α2
1 user resto 8 r phone resto 8 phone 0.00 0.01
2 user resto 8 r cuisine french 0.01 0.01
3 user resto 8 r address resto 8 address 0.01 0.01
4 user resto 8 r location madrid 0.01 0.01
5 user resto 8 r number two 0.00 0.01
6 user resto 8 r price moderate 0.00 0.01
7 user resto 8 r rating 8 0.01 0.39
8 user resto 3 r phone resto 3 phone 0.01 0.01
9 user resto 3 r cuisine french 0.00 0.01
10 user resto 3 r address resto 3 address 0.00 0.01
11 user resto 3 r location madrid 0.00 0.01
12 user resto 3 r number two 0.00 0.01
13 user resto 3 r price moderate 0.00 0.01
14 user resto 3 r rating 3 0.00 0.01
15 user resto 1 r phone resto 1 phone 0.01 0.01
16 user resto 1 r cuisine french 0.01 0.01
17 user resto 1 r address resto 1 address 0.01 0.01
18 user resto 1 r location madrid 0.01 0.01
19 user resto 1 r number two 0.00 0.01
20 user resto 1 r price moderate 0.00 0.01
21 user resto 1 r rating 1 0.00 0.01
22 user Good morning 0.01 0.01
23 bot Hello what can I help you with today? 0.12 0.01
24 user May I have table for two with french food? 0.08 0.02
25 bot I’m on it 0.03 0.01
26 user <silence> 0.01 0.01
27 bot Where should it be 0.09 0.01
28 user In madrid 0.01 0.01
29 bot Which price range are looking for? 0.09 0.01
30 user In moderate price range please 0.01 0.01
31 bot Ok let me look into some options for you 0.24 0.01
User input <silence><silence>
Answer What do you think of this option: resto 8
Model answer What do you think of this option: resto 8 [correct]
Korean Abstract

As machine learning has advanced and the ability to handle text has improved, more
and more businesses are adopting chatbots. Current chatbot technology relies on
keyword extraction through Natural Language Processing (NLP) and human-
generated IF-THEN rules. Recently, however, advances in deep learning have
produced models that imitate human reasoning ability without external intervention
and find the correct answer when a question is given. For text-based question
answering to work well, a model must memorize a large amount of information and
have the reasoning ability to find the question-relevant information among it.
Accordingly, most models have been designed around an external memory and the
four components proposed by the Memory Network (MemNN), and have shown good
performance. These models differ most prominently in the component that finds the
necessary information: the more elaborate this component, the better the model
captures complex relations and the better its performance. Recently, a simple but
powerful neural network module for reasoning called the Relation Network (RN) was
introduced. Analyzing the Relation Network from the perspective of the Memory
Network, we found that its multi-layer perceptron (MLP) component can represent
complex relations between the question and the stored information. Based on this,
we introduce the Relation Memory Network (RMN), which uses an MLP to find the
relevant information within a Memory Network architecture. Despite its simplicity,
RMN achieves the highest accuracy on the bAbI-10k story-based and bAbI dialog-
based question answering tasks. Moreover, because it maintains high accuracy even
when the memory is large, unlike other models, we expect it to apply better to
common data-rich settings in the future.

Keywords: Natural Language Processing; Deep Learning; Reasoning; Memory
Network; Relation Network
Student Number: 2016-21101