We have all heard by now that deep learning is making its way into healthcare. Here is a paper from the University of Toronto that gives preliminary results on using deep learning to interpret mammogram and chest radiograph reports. The study is, as its title says, preliminary: it is still at the stage of comparing against good old AI techniques such as random forests and SVMs. Still, the work makes several interesting architectural choices that are worth your time, e.g. the use of a bi-CNN instead of a plain CNN.
This paper from the MILA group, with Yoshua Bengio as the last author, proposes a rather intriguing idea for evaluating dialogue systems. First, some context: a dialogue system, deep learning or not, is usually evaluated with metrics borrowed from statistical machine translation (SMT) and summarization, such as BLEU or ROUGE. In a nutshell, both techniques require human-written references, and the correctness of a response is checked against those references. So the more words from the reference appear in the response, the higher the score you generally get.
But we know that dialogue is not exactly machine translation. Aren't there many valid ways to respond to the same utterance in a dialogue? "Great", "Good" and "Fine" are all pretty much acceptable responses to the question "How are you?" But what if the references contain only "Great" and "Good"? That's the problem with what the authors call "word overlap" metrics. If a word doesn't appear in the reference, you can't get a high score, even if your response makes sense.
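To make the "How are you?" example concrete, here is a minimal sketch of a word-overlap score. It is a simplified unigram precision, not real BLEU or ROUGE (which add n-grams, clipping and length penalties), and the function name `unigram_overlap` is made up for illustration.

```python
def unigram_overlap(response, references):
    """Fraction of response tokens that appear in at least one reference.

    A deliberately simplified stand-in for word-overlap metrics;
    real BLEU/ROUGE are more involved.
    """
    resp_tokens = response.lower().split()
    if not resp_tokens:
        return 0.0
    # Collect every word seen in any reference.
    ref_vocab = set()
    for ref in references:
        ref_vocab.update(ref.lower().split())
    matched = sum(1 for tok in resp_tokens if tok in ref_vocab)
    return matched / len(resp_tokens)


# References for the question "How are you?"
references = ["Great", "Good"]

score_good = unigram_overlap("Good", references)
score_fine = unigram_overlap("Fine", references)
print(score_good, score_fine)
```

"Fine" is a perfectly sensible answer, yet it scores zero simply because the word never shows up in the references.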
So instead of comparing whole words, the authors ask: "can't we just compare in the embedded word space?" That is, after all, the idea behind word embeddings: words with similar meanings end up close to each other. The intriguing part of the paper is that it poses dialogue evaluation as measuring distance in this embedding space. The authors use HRED, which they found to be a good representation of the reference dialogue.
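Here is a rough sketch of what "comparing in the embedding space" can look like. To be clear, this is not the paper's model: it swaps HRED for a plain average of word vectors, and the 3-dimensional vectors in `TOY_EMBEDDINGS` are hand-picked, hypothetical values rather than trained embeddings.

```python
import math

# Hypothetical, hand-picked toy vectors (NOT trained embeddings).
TOY_EMBEDDINGS = {
    "great": [0.9, 0.8, 0.1],
    "good": [0.8, 0.9, 0.2],
    "fine": [0.7, 0.7, 0.3],
    "terrible": [-0.8, -0.9, 0.1],
}


def embed(sentence):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [TOY_EMBEDDINGS[t] for t in sentence.lower().split()
               if t in TOY_EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embedding_score(response, references):
    """Best cosine similarity between the response and any reference."""
    resp_vec = embed(response)
    if resp_vec is None:
        return 0.0
    ref_vecs = [embed(ref) for ref in references]
    scores = [cosine(resp_vec, rv) for rv in ref_vecs if rv is not None]
    return max(scores) if scores else 0.0


references = ["Great", "Good"]
fine_score = embedding_score("Fine", references)
terrible_score = embedding_score("Terrible", references)
print(fine_score, terrible_score)
```

With these toy vectors, "Fine" lands close to the references even though it shares no word with them, while "Terrible" points the opposite way. A learned encoder such as HRED would play the role of `embed` here.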
The result is a rather powerful method. Not only does it show a high correlation with human scores, but new responses can also be evaluated more easily because the comparison happens in the semantic space.
We think this is a highly interesting paper of the week. So check it out!