Key takeaways from ICML 2019

Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask

Continuing our article series on the 2019 International Conference on Machine Learning in June, I’ll dive into my thoughts on the presentation titled Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask by Hattie Zhou, Janice Lan, Rosanne Liu, and Jason Yosinski. In my opinion this was one of the most interesting and probably the best-presented talks of the week, as the Uber AI team shared its research on the Lottery Ticket phenomenon.

The Lottery Ticket hypothesis, as originally stated by Frankle and Carbin, is this: “A randomly-initialized dense neural network contains a subnetwork that is initialized such that, when trained in isolation, it can match the test accuracy of the original network after training for at most the same number of iterations.”

Testing the Lottery Ticket hypothesis uses an iterative pruning procedure: randomly initialize a full dense network, train it to convergence, prune the weights with the smallest final magnitude, rewind the remaining weights to their original initial values, and repeat. Pruning these “low-impact” weights in this way was shown to produce sparse subnetworks that match, and sometimes exceed, the accuracy of the original dense network.
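To make the procedure concrete, here is a minimal sketch of one round of iterative magnitude pruning with rewinding, written in plain NumPy. The layer representation, the prune_fraction value, and the train_to_convergence callable are placeholders of my own for illustration, not the authors’ code.

```python
import numpy as np

def lottery_ticket_round(init_weights, train_to_convergence, mask=None, prune_fraction=0.2):
    """One round of iterative magnitude pruning with rewinding.

    init_weights: dict of layer name -> initial weight array
    train_to_convergence: callable that trains the masked network and
        returns the final weights (stand-in for a real training loop)
    mask: dict of layer name -> binary array; None means nothing pruned yet
    """
    if mask is None:
        mask = {name: np.ones_like(w) for name, w in init_weights.items()}

    # Train the masked network, starting from the original initialization.
    start = {name: w * mask[name] for name, w in init_weights.items()}
    final_weights = train_to_convergence(start, mask)

    # Prune the smallest-magnitude surviving weights in each layer.
    for name, w_final in final_weights.items():
        surviving = mask[name].astype(bool)
        threshold = np.quantile(np.abs(w_final[surviving]), prune_fraction)
        new_mask = mask[name].copy()
        new_mask[np.abs(w_final) < threshold] = 0.0
        mask[name] = new_mask * mask[name]  # keep previously pruned weights pruned

    # Rewind: surviving weights go back to their original initial values.
    rewound = {name: w * mask[name] for name, w in init_weights.items()}
    return rewound, mask
```

Repeating this round several times, always rewinding to the same initial values rather than continuing from the previous final weights, yields progressively sparser subnetworks.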

The team tested and experimented with many aspects of this theory and got interesting results. The experiments resulted in three key findings: 

1) Setting the pruned weights to zero turned out to be the best approach, as opposed to freezing them at their (small) initial values.

2) They found better results when, instead of pruning the weights with the smallest final magnitude, they pruned the weights whose magnitude changed the least during training (keeping the ones that moved the most).

3) It is not necessary to preserve the exact original initialization values of the surviving weights, but their signs do matter (see the sketch after this list).
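The first two findings amount to a choice of mask criterion, and the third to how the surviving weights are re-initialized. The sketch below illustrates both ideas; the criterion names, the keep_fraction parameter, and the constant-magnitude re-initialization scale are my own illustrative choices, not the paper’s exact setup.

```python
import numpy as np

def build_mask(w_init, w_final, keep_fraction=0.2, criterion="large_final"):
    """Return a binary mask that keeps the top keep_fraction of weights.

    "large_final"  - keep the weights with the largest final magnitude
                     (the original lottery-ticket criterion)
    "large_change" - keep the weights whose magnitude changed the most
                     during training, pruning those that barely moved
                     (related to finding 2)
    Pruned weights are multiplied by zero, per finding 1.
    """
    if criterion == "large_final":
        score = np.abs(w_final)
    elif criterion == "large_change":
        score = np.abs(np.abs(w_final) - np.abs(w_init))
    else:
        raise ValueError(f"unknown criterion: {criterion}")

    threshold = np.quantile(score, 1.0 - keep_fraction)
    return (score >= threshold).astype(w_init.dtype)


def sign_preserving_reinit(w_init, mask, scale=0.1):
    """Re-initialize surviving weights to a constant magnitude while
    keeping their original signs (finding 3: the exact initial values
    are not needed, but flipping the signs hurts accuracy)."""
    return np.sign(w_init) * scale * mask
```

The second function corresponds to the observation that keeping only the signs of the original initialization is enough to recover most of the benefit of exact rewinding.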

It would be interesting to test this out on some of our own neural networks (e.g., the CPM predictor). In other words, by pruning weights from the CPM predictor’s neural network, we may be able to make better predictions of CPM with a smaller model.