no code implementations • 15 May 2023 • Chaoyue Liu, Like Hui
Compared with linear neural networks, we show that a randomly initialized wide ReLU neural network has a larger angle separation for similar data in the feature space of the model gradient, and a smaller condition number for the NTK.
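A minimal sketch (not the authors' code) of the angle-separation claim: it compares the angle between the model-gradient features grad_theta f(x) of two nearby inputs for a randomly initialized wide network with ReLU versus identity activation. The width, input dimension, and perturbation size are assumptions.

```python
import torch

def grad_feature(model, x):
    """Flattened gradient of the scalar output w.r.t. all parameters."""
    model.zero_grad()
    model(x).sum().backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

def angle_deg(u, v):
    """Angle in degrees between two flat gradient-feature vectors."""
    cos = torch.dot(u, v) / (u.norm() * v.norm())
    return torch.rad2deg(torch.arccos(cos.clamp(-1.0, 1.0)))

torch.manual_seed(0)
d, width = 20, 4096                      # assumed sizes
x1 = torch.randn(d)
x2 = x1 + 0.01 * torch.randn(d)          # a "similar" input

for name, act in [("relu", torch.nn.ReLU()), ("linear", torch.nn.Identity())]:
    net = torch.nn.Sequential(
        torch.nn.Linear(d, width), act, torch.nn.Linear(width, 1))
    g1, g2 = grad_feature(net, x1), grad_feature(net, x2)
    print(f"{name}: angle = {angle_deg(g1, g2).item():.4f} deg")
```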
no code implementations • 8 Feb 2023 • Like Hui, Mikhail Belkin, Stephen Wright
We provide an extensive set of experiments on multi-class classification problems showing that the squentropy loss outperforms both pure cross-entropy and rescaled square losses in terms of classification accuracy.
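A minimal sketch (not the paper's reference code) of the squentropy loss: standard cross-entropy plus the average squared logit over the incorrect classes. The tensor shapes and the use of PyTorch are assumptions.

```python
import torch
import torch.nn.functional as F

def squentropy(logits, targets):
    """logits: (batch, num_classes); targets: (batch,) integer labels."""
    ce = F.cross_entropy(logits, targets)
    num_classes = logits.shape[1]
    correct = F.one_hot(targets, num_classes).bool()
    # mean of squared logits over the (num_classes - 1) incorrect classes
    sq = logits.masked_fill(correct, 0.0).pow(2).sum(dim=1) / (num_classes - 1)
    return ce + sq.mean()

# usage: drop-in replacement for F.cross_entropy in a training loop
loss = squentropy(torch.randn(8, 10), torch.randint(0, 10, (8,)))
```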
no code implementations • 17 Feb 2022 • Like Hui, Mikhail Belkin, Preetum Nakkiran
We refine the Neural Collapse conjecture into two separate conjectures: collapse on the train set (an optimization property) and collapse on the test distribution (a generalization property).
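A minimal sketch (an assumed metric, not the paper's exact one) of how the two conjectures can be probed separately: a within-class to between-class feature-variance ratio, evaluated once on train features and once on test features.

```python
import numpy as np

def collapse_ratio(features, labels):
    """features: (n, d) penultimate-layer activations; labels: (n,)."""
    global_mean = features.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        within += ((fc - mu_c) ** 2).sum()
        between += len(fc) * ((mu_c - global_mean) ** 2).sum()
    return within / between   # -> 0 under exact within-class collapse

# usage: collapse_ratio(train_feats, train_labels) can approach 0
# (optimization-side collapse) while collapse_ratio(test_feats, test_labels)
# stays bounded away from 0 (no generalization-side collapse)
```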
no code implementations • ICLR 2021 • Like Hui, Mikhail Belkin
We explore several major neural architectures and a range of standard benchmark datasets for NLP, automatic speech recognition (ASR), and computer vision tasks. With the same hyper-parameter settings reported in the literature, these architectures perform comparably or better when trained with the square loss, even after equalizing computational resources.
Automatic Speech Recognition (ASR) +2
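A minimal sketch (not the paper's training setup) of swapping cross-entropy for the square loss on one-hot targets; the paper also rescales targets for tasks with many classes, which is omitted here. The model and data are dummies.

```python
import torch
import torch.nn.functional as F

def square_loss(logits, targets):
    """MSE against one-hot encodings of the integer class labels."""
    one_hot = F.one_hot(targets, logits.shape[1]).float()
    return (logits - one_hot).pow(2).sum(dim=1).mean()

model = torch.nn.Linear(32, 10)                      # stand-in architecture
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
for _ in range(100):
    opt.zero_grad()
    loss = square_loss(model(x), y)                  # vs. F.cross_entropy
    loss.backward()
    opt.step()
```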
no code implementations • 6 Nov 2018 • Like Hui, Siyuan Ma, Mikhail Belkin
We apply a fast kernel method to mask-based single-channel speech enhancement.
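A minimal sketch (not the paper's fast kernel solver) of the mask-based setup: kernel ridge regression maps noisy spectrogram frames to an ideal ratio mask, which is then applied to the noisy magnitudes. The synthetic data, kernel choice, and regularization are all assumptions.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
n_frames, n_bins = 2000, 64                 # assumed STFT dimensions
clean = np.abs(rng.standard_normal((n_frames, n_bins)))
noise = np.abs(rng.standard_normal((n_frames, n_bins)))
noisy = clean + noise
irm = clean / (clean + noise)               # ideal ratio mask (training target)

model = KernelRidge(kernel="rbf", gamma=0.1, alpha=1e-3)
model.fit(noisy[:1500], irm[:1500])         # noisy frames as feature vectors

mask = np.clip(model.predict(noisy[1500:]), 0.0, 1.0)
enhanced = mask * noisy[1500:]              # apply predicted mask
```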