Abstract:
1) Generalization in deep nets: the role of distance from initialization
2) Entropy-SG(L)D optimizes the prior of a (valid) PAC-Bayes bound
3) Large Batch Training of DNNs with Layer-wise Adaptive Rate Scaling