Stella: Problem Solving with Conversational Agents


Dante Tam, 5/29/2017


In this article, we work through an intricate computer science problem. We analyze it, brainstorm and implement solutions, and then evaluate those solutions against clearly defined tests and metrics, while maintaining best practices and good software architecture.


Stating the problem: My conversational agent, Stella, takes in a text command and needs to return the appropriate action. Given a list of actions, this is a relatively easy task for a human. It is much harder for a computer, which must infer grammatical structure, word context, and so on.


In machine learning, this is called a classification problem. For review, a classification problem asks: given some data point $X_i$, what is its class or category $y_i$? In this case, our input data is a sentence, and our possible class choices are the various commands. For example,


$$"Please\; show\; my\; finances."\;\Rightarrow \;the\; finance\; command$$


First Approach, Text Similarity: We model every command as having some important keywords; the more keywords the command and the input share, the more likely that command is correct. We extend this to include synonyms of the words present. Technically, we achieved this using WordNet, a collection of English words organized into a graph, in which similar words, i.e. synonyms, are connected together. We measured the similarity of two words by their degree of connection: much as LinkedIn surfaces your closer friends and 1st-degree connections first, we prioritized first-degree synonyms. We also gave lesser but significant weight to the similarity of definitions and example sentences, i.e. dictionary entries. In the future, this method could be improved with NLP research on Lesk dictionary methods, which rely on words' contexts.
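As an illustration, here is a minimal sketch of this keyword-overlap scoring using NLTK's WordNet interface (not the original Stella code); the per-command keyword lists are hypothetical:

```python
# Minimal sketch of keyword-overlap scoring with first-degree WordNet
# synonyms. Requires: nltk.download('wordnet'). The command keyword
# lists below are hypothetical, for illustration only.
from nltk.corpus import wordnet as wn

def expand_with_synonyms(words):
    """Return the given words plus their first-degree WordNet synonyms."""
    expanded = set(words)
    for word in words:
        for synset in wn.synsets(word):
            for lemma in synset.lemmas():
                expanded.add(lemma.name().lower())
    return expanded

def score_command(input_words, command_keywords):
    """Score a command by its synonym-expanded keyword overlap with the input."""
    return len(set(input_words) & expand_with_synonyms(command_keywords))

commands = {"finance": ["show", "finances", "money"],
            "citations": ["show", "citations", "references"]}
tokens = "please show my finances".split()
best_command = max(commands, key=lambda c: score_command(tokens, commands[c]))
```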


Second Approach, Zero-One Encoded Vectors w/ Softmax Regression: We represent the inputs as vectors and the commands as numbered labels. The problem is transformed as shown below, from the first form into the second:


$$"Please\; show\; my\; finances."\;\Rightarrow \;the\; finance\; command$$

$$...$$

$$[0,0,1,0,1,1,1,...]\;\Rightarrow 17$$


Simply put, we want to find a function that turns a sentence into a label. Mathematically, our goal is to find a weight vector $w$ such that


$$Xw = \hat{y} \approx y \quad\Leftrightarrow\quad \arg\min_w \lVert Xw - y \rVert_2^2$$


where $\hat{y}$ is our best guess. We use a method called softmax regression, which comes close to this goal. Google's machine learning library TensorFlow helps us greatly with this task.
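As a sketch, softmax regression can be written in a few lines with TensorFlow's Keras API (the original 2017 implementation likely used an older TensorFlow API; the vocabulary size and class count below are assumptions, and the zero-one sentence vectors are constructed as described next):

```python
# Minimal softmax-regression sketch in TensorFlow/Keras. A single dense
# layer with a softmax activation is exactly softmax regression:
# probabilities = softmax(xW + b). Shapes are assumptions for illustration.
import tensorflow as tf

V = 5000           # vocabulary size (assumed)
NUM_COMMANDS = 30  # number of command classes (assumed)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(V,)),
    tf.keras.layers.Dense(NUM_COMMANDS, activation="softmax"),
])
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# With X_train as zero-one sentence vectors and y_train as command labels:
# model.fit(X_train, y_train, epochs=20, validation_split=0.1)
```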


To encode the sentences as vectors, we collect all the unique words, numbered $1$ to $V$. Then every sentence can be represented as a vector $s$ of length $V$, where $s_i = 1$ means the sentence contains the $i$th word, and $s_i = 0$ otherwise (see the sketch after the example below). For example,


$$"Please\; show\; my\; finances.",\; "Show\; my\; citations."$$

$$\text{Please} \Leftrightarrow 1,\; \text{show} \Leftrightarrow 2,\; \text{my} \Leftrightarrow 3,\; \text{finances} \Leftrightarrow 4,\; \text{citations} \Leftrightarrow 5$$

$$[1,1,1,1,0],\;[0,1,1,0,1]$$
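A minimal Python sketch of this encoding, reproducing the example above (tokenization is a naive lowercase split, an assumption for illustration):

```python
# Zero-one ("bag of words") encoding of sentences over a shared vocabulary.
def build_vocab(sentences):
    """Assign each unique word an index, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().replace(".", "").split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(sentence, vocab):
    """Return a length-V vector with 1 at each word the sentence contains."""
    vec = [0] * len(vocab)
    for word in sentence.lower().replace(".", "").split():
        if word in vocab:
            vec[vocab[word]] = 1
    return vec

sentences = ["Please show my finances.", "Show my citations."]
vocab = build_vocab(sentences)
print([encode(s, vocab) for s in sentences])
# [[1, 1, 1, 1, 0], [0, 1, 1, 0, 1]]
```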


Overall, this approach achieved roughly 73% validation accuracy; while quite neat, it was not strong enough for our application.


Third Approach, Convolutional Neural Networks for Sentence Classification: We apply a convolutional neural network (CNN) to a large sentence matrix, as described in the research paper of the same name (Yoon Kim, 2014). We use word2vec, a tool that converts words into vectors of length $k$, such that similar words have similar vectors. A sentence then becomes an $n$ by $k$ matrix, where $n$ is the maximum length over all sentences and $k$ is the dimension from earlier. This essentially gives us an image, which CNNs process well. On regular images, CNN filters can become edge detectors, brightness detectors, and so on; on our sentence matrices, the CNN can begin to recognize large, semi-abstract concepts such as grammatical structures and word contexts.
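Here is a sketch of a Kim (2014)-style architecture in TensorFlow/Keras (not the original Stella code; the dimensions, filter widths, and filter counts are assumptions taken loosely from the paper):

```python
# Sketch of a CNN for sentence classification in the style of Kim (2014):
# parallel convolutions over 3-, 4-, and 5-word windows of the n-by-k
# sentence matrix, max-pooled over time, then a softmax over commands.
import tensorflow as tf

n, k = 50, 300     # max sentence length and word2vec dimension (assumed)
NUM_COMMANDS = 30  # number of command classes (assumed)

inputs = tf.keras.Input(shape=(n, k))  # one sentence as an n-by-k "image"
pooled = []
for width in (3, 4, 5):
    conv = tf.keras.layers.Conv1D(100, width, activation="relu")(inputs)
    pooled.append(tf.keras.layers.GlobalMaxPooling1D()(conv))
features = tf.keras.layers.Concatenate()(pooled)
features = tf.keras.layers.Dropout(0.5)(features)
outputs = tf.keras.layers.Dense(NUM_COMMANDS, activation="softmax")(features)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```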


This method was state of the art in 2014, and unsurprisingly it fared well, scoring 97% training accuracy.


We finally have an algorithm for our conversational agent, Stella, which takes in user requests and completes the appropriate action. It is a robust solution, scoring very highly on our validation tests. It is extensible, as we can add many more classes, commands, and training examples. It is organized into its own Python module, which can be called from a desktop application, a web server, etc. It is a good solution.




Citations


Princeton University "About WordNet." WordNet. Princeton University. 2010. <http://wordnet.princeton.edu>

Mihalcea, Rada, and Paul Tarau. "TextRank: Bringing Order into Texts." University of North Texas, 2004. <https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf>

Brin, S., and L. Page. "The Anatomy of a Large-Scale Hypertextual Web Search Engine." Computer Networks and ISDN Systems, 30(1-7), 1998.

Pak, Alexander, and Patrick Paroubek. "Twitter as a Corpus for Sentiment Analysis and Opinion Mining." Université Paris-Sud, Laboratoire LIMSI-CNRS, 2010. <http://web.archive.org/web/20111119181304/http://deepthoughtinc.com/wp-content/uploads/2011/01/Twitter-as-a-Corpus-for-Sentiment-Analysis-and-Opinion-Mining.pdf>

Tan, Pang-Ning, Michael Steinbach, and Vipin Kumar. "Association Analysis: Basic Concepts and Algorithms." Introduction to Data Mining. Pearson, 2005. <https://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf>

Porter, M.F. "An Algorithm for Suffix Stripping." Program, vol. 14, no. 3, 1980, pp. 130-137.

Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. "The Stanford CoreNLP Natural Language Processing Toolkit." In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2014, pp. 55-60.

Manning, Christopher D., et al. Introduction to Information Retrieval. Cambridge, England: Cambridge University Press, 2009.

Olah, Christopher. "Understanding LSTM Networks." Colah's Blog, 27 Aug. 2015. Web. 16 May 2017. <http://colah.github.io/posts/2015-08-Understanding-LSTMs/>

O. Vinyals, Q.V. Le. A Neural Conversational Model. ICML Deep Learning Workshop, arXiv 2015.

With help from the APIs: Google (Calendar, Gmail, Maps, Custom Search Engine, Finance, StaticMap); MediaWiki Wikipedia Content and Search; Facebook; Messenger; Yahoo (YQL, Finance); NASA GIBS

Inspired by UC Berkeley's CS 170 and CS 189 courses in algorithms and machine learning, as well as natural language processing.

Built with d3.js, Bootstrap, jQuery & AJAX, deployed with GitHub Pages.

Icons from game-icons.net.