no code implementations • 7 Jul 2023 • Marc Duquesnoy, Chaoyue Liu, Vishank Kumar, Elixabete Ayerbe, Alejandro A. Franco
This ML pipeline enables inverse design of the process parameters to adopt when manufacturing electrodes for energy or power applications.
no code implementations • 7 Jun 2023 • Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, Mikhail Belkin
In this paper, we first present an explanation for the common occurrence of spikes in the training loss when neural networks are trained with stochastic gradient descent (SGD).
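As a rough illustration of the setting only (not the paper's explanation), the sketch below trains a small two-layer ReLU network with mini-batch SGD and logs the full-batch training loss at every step; whether visible spikes appear depends on the step size, width, and data, all of which are arbitrary choices here.

```python
# Minimal sketch of the setup only: mini-batch SGD on a small two-layer ReLU
# network, logging the full-batch loss every step so loss spikes can be inspected.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 256, 10, 512                      # samples, input dim, hidden width
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
W = rng.standard_normal((d, m)) / np.sqrt(d)
v = rng.standard_normal(m)

def forward(Xb):
    H = np.maximum(Xb @ W, 0.0)             # ReLU hidden activations
    return H, H @ v / np.sqrt(m)

lr, batch, losses = 1.0, 32, []
for step in range(500):
    idx = rng.choice(n, batch, replace=False)
    Hb, pred = forward(X[idx])
    err = pred - y[idx]
    # gradients of the half mean-squared error on the mini-batch
    gv = Hb.T @ err / (batch * np.sqrt(m))
    gW = X[idx].T @ ((err[:, None] * v) * (Hb > 0)) / (batch * np.sqrt(m))
    v -= lr * gv
    W -= lr * gW
    losses.append(0.5 * np.mean((forward(X)[1] - y) ** 2))

print(f"final loss {losses[-1]:.4f}, max/median loss ratio {max(losses) / np.median(losses):.1f}")
```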
no code implementations • 5 Jun 2023 • Chaoyue Liu, Amirhesam Abedsoltan, Mikhail Belkin
This behaviour is believed to be a result of neural networks learning the patterns of the clean data first and fitting the noise later in training, a phenomenon that we refer to as clean-priority learning.
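The phenomenon suggests a simple measurement, sketched below under assumptions of my own (synthetic data, logistic regression, 20% flipped labels, none of which come from the paper): track the training loss separately on the clean and label-noise subsets as SGD proceeds.

```python
# Illustrative measurement only, not the paper's experiments: flip a fraction of
# labels, train logistic regression with single-sample SGD, and track the loss
# on the clean and noisy subsets separately.
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 20
X = rng.standard_normal((n, d))
y = (X @ rng.standard_normal(d) > 0).astype(float)   # clean labels
noisy = rng.random(n) < 0.2                           # flip 20% of the labels
y_train = np.where(noisy, 1.0 - y, y)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def logloss(w, Xs, ys):
    p = sigmoid(Xs @ w)
    return -np.mean(ys * np.log(p + 1e-12) + (1 - ys) * np.log(1 - p + 1e-12))

w, lr = np.zeros(d), 0.5
for epoch in range(10):
    for i in rng.permutation(n):                      # one SGD pass over the data
        w -= lr * (sigmoid(X[i] @ w) - y_train[i]) * X[i]
    print(f"epoch {epoch}:  clean-subset loss {logloss(w, X[~noisy], y_train[~noisy]):.3f}, "
          f"noisy-subset loss {logloss(w, X[noisy], y_train[noisy]):.3f}")
```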
no code implementations • 15 May 2023 • Chaoyue Liu, Like Hui
Compared with linear neural networks, we show that a ReLU-activated wide neural network at random initialization has a larger angle separation for similar data in the feature space of the model gradient, and a smaller condition number for the NTK.
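A small numerical illustration of both quantities (my own construction, not the paper's analysis): for a random two-layer network, compare the angle between the tangent features of two nearly identical inputs and the condition number of the empirical NTK Gram matrix, with a ReLU versus an identity activation.

```python
# Numerical illustration only: tangent features (gradients of the output w.r.t.
# all parameters) of a random two-layer net, with ReLU vs. identity activation.
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 5, 2000, 20
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)         # unit-norm inputs

def tangent_features(Xs, W, v, relu=True):
    pre = Xs @ W.T                                     # pre-activations, shape (n, m)
    act = np.maximum(pre, 0.0) if relu else pre
    deriv = (pre > 0).astype(float) if relu else np.ones_like(pre)
    grad_v = act / np.sqrt(m)                          # d f / d v
    grad_W = (deriv * v)[:, :, None] * Xs[:, None, :] / np.sqrt(m)   # d f / d W
    return np.hstack([grad_v, grad_W.reshape(Xs.shape[0], -1)])

for relu in (True, False):
    W, v = rng.standard_normal((m, d)), rng.standard_normal(m)
    Phi = tangent_features(X, W, v, relu)
    K = Phi @ Phi.T                                    # empirical NTK Gram matrix
    x1 = X[0]
    x2 = x1 + 0.1 * rng.standard_normal(d)
    x2 /= np.linalg.norm(x2)                           # a nearby, similar input
    P = tangent_features(np.vstack([x1, x2]), W, v, relu)
    cos = P[0] @ P[1] / (np.linalg.norm(P[0]) * np.linalg.norm(P[1]))
    print(f"relu={relu}:  cond(NTK) = {np.linalg.cond(K):.2e},  "
          f"feature angle = {np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))):.2f} deg")
```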
1 code implementation • 24 May 2022 • Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, Mikhail Belkin
While neural networks can be approximated by linear models as their width increases, certain properties of wide neural networks cannot be captured by linear models.
no code implementations • 24 May 2022 • Libin Zhu, Chaoyue Liu, Mikhail Belkin
In this paper we show that feedforward neural networks corresponding to arbitrary directed acyclic graphs undergo transition to linearity as their "width" approaches infinity.
no code implementations • ICLR 2022 • Chaoyue Liu, Libin Zhu, Mikhail Belkin
Wide neural networks with a linear output layer have been shown to be near-linear, and to have a near-constant neural tangent kernel (NTK), in a region containing the optimization path of gradient descent.
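A rough numerical check of the near-constancy claim (a sketch under my own choices of width, data, and step size, not the paper's argument): train a two-layer ReLU network with full-batch gradient descent and measure how far the empirical tangent kernel moves from its value at initialization, at two widths.

```python
# Rough numerical check: relative movement of the empirical tangent kernel
# K = J J^T along the gradient-descent path, where J is the Jacobian of the
# outputs w.r.t. all parameters, compared at two widths.
import numpy as np

rng = np.random.default_rng(0)
n, d, lr = 50, 10, 1.0
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = rng.standard_normal(n)

def model(theta, m):
    """Outputs and n x P Jacobian of f(x) = v^T relu(Wx) / sqrt(m)."""
    v, W = theta[:m], theta[m:].reshape(m, d)
    pre = X @ W.T
    act = np.maximum(pre, 0.0)
    J_v = act / np.sqrt(m)
    J_W = ((pre > 0) * v)[:, :, None] * X[:, None, :] / np.sqrt(m)
    return act @ v / np.sqrt(m), np.hstack([J_v, J_W.reshape(n, -1)])

for m in (100, 5000):
    theta = rng.standard_normal(m + m * d)
    _, J0 = model(theta, m)
    K0 = J0 @ J0.T                                   # tangent kernel at initialization
    for _ in range(200):                             # full-batch gradient descent on MSE
        out, J = model(theta, m)
        theta -= lr * (J.T @ (out - y)) / n
    _, J = model(theta, m)
    K = J @ J.T
    print(f"width {m:5d}:  relative NTK change = "
          f"{np.linalg.norm(K - K0) / np.linalg.norm(K0):.3f}")
```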
no code implementations • 8 Dec 2021 • Chaoyue Liu, Yulai Zhang
Hyper-parameter optimization is a crucial problem in machine learning, as it aims to achieve state-of-the-art performance for any given model.
no code implementations • NeurIPS 2020 • Chaoyue Liu, Libin Zhu, Mikhail Belkin
We show that the transition to linearity of the model and, equivalently, constancy of the (neural) tangent kernel (NTK) result from the scaling properties of the norm of the Hessian matrix of the network as a function of the network width.
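One way to see this numerically (a sketch for a two-layer tanh network, my own construction rather than the paper's proof): the Hessian of the output with respect to the parameters decouples into per-unit blocks, so its spectral norm is the largest block norm, and it shrinks as the width grows.

```python
# Numerical sketch: for f(x) = v^T tanh(Wx) / sqrt(m), the Hessian of the output
# w.r.t. the parameters is block diagonal over hidden units (one (1+d)x(1+d)
# block per unit), so its spectral norm is the largest per-unit block norm.
import numpy as np

rng = np.random.default_rng(0)
d = 10
x = rng.standard_normal(d)
x /= np.linalg.norm(x)

def hessian_spectral_norm(m):
    W = rng.standard_normal((m, d))
    v = rng.standard_normal(m)
    pre = W @ x
    s1 = 1.0 - np.tanh(pre) ** 2                 # sigma'
    s2 = -2.0 * np.tanh(pre) * s1                # sigma''
    norms = []
    for k in range(m):                           # per-unit block over (v_k, w_k)
        Hk = np.zeros((1 + d, 1 + d))
        Hk[0, 1:] = s1[k] * x                    # d^2 f / (dv_k dw_k)
        Hk[1:, 0] = s1[k] * x
        Hk[1:, 1:] = v[k] * s2[k] * np.outer(x, x)   # d^2 f / (dw_k dw_k)
        norms.append(np.linalg.norm(Hk, 2))
    return max(norms) / np.sqrt(m)               # shared 1/sqrt(m) prefactor

for m in (100, 1000, 10000):
    print(f"width {m:6d}:  spectral norm of Hessian = {hessian_spectral_norm(m):.4f}")
```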
no code implementations • 29 Feb 2020 • Chaoyue Liu, Libin Zhu, Mikhail Belkin
The success of deep learning is due, to a large extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks.
1 code implementation • ICLR 2020 • Chaoyue Liu, Mikhail Belkin
This is in contrast to the classical results in the deterministic scenario, where the same step size ensures accelerated convergence of Nesterov's method over optimal gradient descent.
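For reference, the deterministic comparison looks as follows in a minimal sketch (a textbook setup, not the paper's stochastic setting): on an ill-conditioned quadratic, gradient descent and Nesterov's method both use the step size 1/L, and the accelerated iterate reaches a given accuracy far sooner.

```python
# Deterministic illustration only: gradient descent vs. Nesterov's accelerated
# method with the same step size 1/L on an ill-conditioned quadratic.
import numpy as np

kappa = 1000.0                               # condition number L / mu
L, mu, d = 1.0, 1.0 / 1000.0, 100
rng = np.random.default_rng(0)
eig = np.linspace(mu, L, d)                  # eigenvalues of the quadratic's Hessian
x_star = rng.standard_normal(d)

def f(x):
    return 0.5 * np.sum(eig * (x - x_star) ** 2)

def grad(x):
    return eig * (x - x_star)

step = 1.0 / L
beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)   # momentum for strongly convex f

x_gd = np.zeros(d)
x_nag, y_nag = np.zeros(d), np.zeros(d)
for t in range(1, 5001):
    x_gd -= step * grad(x_gd)                        # plain gradient descent
    x_new = y_nag - step * grad(y_nag)               # Nesterov: gradient step at look-ahead point
    y_nag = x_new + beta * (x_new - x_nag)           # ... then momentum extrapolation
    x_nag = x_new
    if t in (100, 1000, 5000):
        print(f"iter {t:5d}:  f_GD = {f(x_gd):.3e}   f_Nesterov = {f(x_nag):.3e}")
```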
no code implementations • 28 Feb 2018 • Chaoyue Liu, Mikhail Belkin
Analyses of accelerated (momentum-based) gradient descent usually assume a bounded condition number to obtain exponential convergence rates.
1 code implementation • Elsevier 2017 • Xin Yang, Chaoyue Liu, Zhiwei Wang, Jun Yang, Hung Le Min, Liang Wang, Kwang-Ting (Tim) Cheng
Each network is trained using images of a single modality in a weakly-supervised manner, by providing a set of prostate images with image-level labels indicating only the presence of PCa, without priors on lesion locations.
no code implementations • NeurIPS 2016 • Chaoyue Liu, Mikhail Belkin
Clustering, in particular $k$-means clustering, is a central topic in data analysis.
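For concreteness, a compact reference implementation of the standard Lloyd iteration for k-means is sketched below; it only fixes notation for the clustering problem and is not the method studied in the paper.

```python
# Standard Lloyd's algorithm for k-means, shown for reference.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]          # random initialization
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # assignment step: each point goes to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # update step: each center moves to the mean of its cluster
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# tiny usage example on two Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
centers, labels = kmeans(X, k=2)
print(centers)
```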