The new AI Index report for 2018 was published just last week. We are excited because you can think of it as the one report that captures the state of AI development each year. Looking through the report, you will find sections on the growth of AI as a field and its public perception. You can also learn about trends in the state of the art in several fields, such as computer vision and machine translation.
The 2017 report was criticized for focusing too much on North American developments, but this year's report covers activities from around the world. For example, we learn that Tsinghua University has seen the highest increase in AI course enrollment.
Notably missing from the report is automatic speech recognition (ASR). That perhaps has to do with the difficulty of settling on one golden benchmark for ASR. As you may know, ASR performance varies tremendously across different noise conditions.
In any case, we recommend you read the report in detail.
In the last few years, we have seen several works on deep learning training that use large batch sizes. The advantage of this approach is that you can easily spread the computation of a large batch across several machines.
But here is the problem: how do you decide what batch size to use, or whether a task is even suitable for a large batch size? OpenAI's work gives at least one answer to this problem. They found that a simple quantity, the gradient noise scale, correlates with the optimal batch size for a training task.
Reading the paper, there is a lot to unpack. For example, the authors favor a simplified version of the metric, which requires only the squared norm of the gradient and the trace of the covariance matrix of the per-example gradients. They found that this measure correlates with the optimal batch size.
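To make the simplified metric concrete, here is a minimal NumPy sketch. It assumes you already have per-example gradients flattened into a matrix (how you extract them depends on your framework), and it computes the simple form of the noise scale, trace of the gradient covariance divided by the squared norm of the mean gradient. The function name and the toy data are our own illustration, not from the paper.

```python
import numpy as np

def simple_noise_scale(per_example_grads):
    """Estimate the simple gradient noise scale: tr(Sigma) / |G|^2.

    per_example_grads: array of shape (n_examples, n_params),
    one flattened gradient vector per training example.
    This is a rough sketch, not the paper's unbiased estimator.
    """
    g = np.asarray(per_example_grads, dtype=float)
    mean_grad = g.mean(axis=0)                 # estimate of the true gradient G
    trace_cov = g.var(axis=0, ddof=1).sum()    # tr(Sigma): sum of per-parameter variances
    return trace_cov / np.dot(mean_grad, mean_grad)

# Toy usage: noisy per-example gradients scattered around a fixed direction.
rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0, 0.5])
grads = true_grad + 0.1 * rng.standard_normal((256, 3))
print(simple_noise_scale(grads))  # prints the estimated noise scale
```

A larger noise scale suggests the gradient signal is drowned out by per-example noise, so larger batches should help; a small value suggests extra batch parallelism is wasted.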
All in all, this is an interesting finding, because now you have guidance for tuning one important parameter in your training: the batch size. Perhaps the practical question is: can you measure the gradient noise scale on a sample of your training set and apply it to a larger set? And more importantly, is there similar guidance for tuning other parameters? Those are interesting questions to ask, and if we can answer them, maybe we can truly call neural network training a science rather than an art.
There are many posts on AIDL asking how to use a deep learning framework, but few of them are about the experience of contributing to a deep learning engine. Singh's post is a notable exception.
Notable AI Blog Posts