1 code implementation • ICLR 2022 • Juntang Zhuang, Boqing Gong, Liangzhe Yuan, Yin Cui, Hartwig Adam, Nicha Dvornek, Sekhar Tatikonda, James Duncan, Ting Liu
Instead, we define a "surrogate gap", a measure equivalent to the dominant eigenvalue of the Hessian at a local minimum when the radius of the neighborhood (used to derive the perturbed loss) is small.
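For the small-radius claim, a hedged sketch of the second-order argument (the symbols below are assumed for illustration, not taken verbatim from the paper):

```latex
% Perturbed loss and surrogate gap (notation assumed):
%   L_p(w) = max_{||delta|| <= rho} L(w + delta),    h(w) = L_p(w) - L(w).
% At a local minimum w*, grad L(w*) = 0, so a second-order Taylor expansion gives
\begin{align}
h(w^*) &= \max_{\|\delta\|\le\rho}
          \Big[ L(w^*) + \tfrac{1}{2}\,\delta^\top \nabla^2 L(w^*)\,\delta
                + O(\rho^3) \Big] - L(w^*) \\
       &\approx \tfrac{\rho^2}{2}\,\lambda_{\max}\!\big(\nabla^2 L(w^*)\big),
\end{align}
% i.e. for small rho the gap tracks the dominant Hessian eigenvalue
% up to the constant factor rho^2 / 2.
```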
2 code implementations • NeurIPS 2021 • Juntang Zhuang, Yifan Ding, Tommy Tang, Nicha Dvornek, Sekhar Tatikonda, James S. Duncan
We demonstrate that ACProp has a convergence rate of $O(\frac{1}{\sqrt{T}})$ for the stochastic non-convex case, which matches the oracle rate and outperforms the $O(\frac{\log T}{\sqrt{T}})$ rate of RMSProp and Adam.
no code implementations • 14 Feb 2021 • Juntang Zhuang, Nicha Dvornek, Sekhar Tatikonda, Xenophon Papademetris, Pamela Ventola, James Duncan
Furthermore, MSA uses the adjoint method for accurate gradient estimation in the ODE; since the adjoint method is generic, MSA is a generic method for both linear and non-linear systems, and does not require re-derivation of the algorithm as in EM.
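As a rough illustration of how an adjoint pass recovers parameter gradients from an ODE solve, here is a toy sketch on linear dynamics dx/dt = -theta*x with a scalar terminal loss. The function names, the Euler discretization, and the specific dynamics are assumptions for illustration, not the paper's MSA algorithm:

```python
import numpy as np

def f(x, theta):
    # Toy dynamics: dx/dt = -theta * x
    return -theta * x

def loss_and_grad(theta, x0=2.0, target=0.5, T=1.0, n=100):
    """Forward Euler integration, then a backward adjoint pass: the
    adjoint a_k = dL/dx_k is propagated from t = T back to t = 0, and
    the parameter gradient is accumulated along the way."""
    h = T / n
    xs = np.empty(n + 1)
    xs[0] = x0
    for k in range(n):                       # forward pass
        xs[k + 1] = xs[k] + h * f(xs[k], theta)
    L = 0.5 * (xs[n] - target) ** 2          # terminal loss
    a = xs[n] - target                       # a_N = dL/dx(T)
    g = 0.0
    for k in range(n - 1, -1, -1):           # backward (adjoint) pass
        g += a * h * (-xs[k])                # df/dtheta = -x
        a = a * (1.0 + h * (-theta))         # df/dx = -theta
    return L, g
```

Because this is the discrete adjoint of the forward Euler scheme, the returned gradient matches finite differences of the discretized loss to numerical precision.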
1 code implementation • ICLR 2021 • Juntang Zhuang, Nicha C. Dvornek, Sekhar Tatikonda, James S. Duncan
Neural ordinary differential equations (Neural ODEs) are a new family of deep-learning models with continuous depth.
Ranked #19 on Image Generation on ImageNet 64×64 (bits per dim metric)
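A minimal sketch of the continuous-depth idea: instead of stacking discrete layers, the hidden state is integrated through a parameterized vector field. Fixed-step Euler, the network sizes, and the `tanh` field below are illustrative assumptions, not the paper's solver:

```python
import numpy as np

def neural_ode_forward(h0, W, b, T=1.0, n_steps=50):
    """Continuous-depth forward pass: integrate dh/dt = tanh(W h + b)
    from t = 0 to t = T with fixed-step Euler, rather than applying a
    fixed stack of discrete layers."""
    h = h0.copy()
    dt = T / n_steps
    for _ in range(n_steps):
        h = h + dt * np.tanh(W @ h + b)
    return h

# Usage on random inputs (hypothetical sizes):
rng = np.random.default_rng(0)
h0 = rng.normal(size=4)
W = rng.normal(size=(4, 4)) * 0.1
b = np.zeros(4)
hT = neural_ode_forward(h0, W, b)
```

Increasing `n_steps` refines the same continuous trajectory, which is the sense in which depth is continuous rather than discrete.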
no code implementations • NeurIPS Workshop DL-IG 2020 • Juntang Zhuang, Tommy Tang, Sekhar Tatikonda, Nicha C Dvornek, Yifan Ding, Xenophon Papademetris, James S Duncan
We propose AdaBelief optimizer to simultaneously achieve three goals: fast convergence as in adaptive methods, good generalization as in SGD, and training stability.
8 code implementations • NeurIPS 2020 • Juntang Zhuang, Tommy Tang, Yifan Ding, Sekhar Tatikonda, Nicha Dvornek, Xenophon Papademetris, James S. Duncan
We view the exponential moving average (EMA) of the noisy gradient as a prediction of the gradient at the next time step: if the observed gradient deviates greatly from the prediction, we distrust the observation and take a small step; if the observed gradient is close to the prediction, we trust it and take a large step.
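The belief-based update can be sketched in scalar form as follows. This assumes the standard bias-corrected formulation with Adam-style hyperparameter names; it is an illustrative sketch, not the paper's reference implementation:

```python
import numpy as np

def adabelief_step(param, grad, m, s, t, lr=1e-1, beta1=0.9,
                   beta2=0.999, eps=1e-8):
    """One AdaBelief update: the EMA m predicts the next gradient, and
    the second moment s tracks the squared *deviation* (grad - m)^2,
    i.e. the "belief" in the observation, unlike Adam, which tracks
    grad^2 itself."""
    m = beta1 * m + (1 - beta1) * grad
    s = beta2 * s + (1 - beta2) * (grad - m) ** 2 + eps
    m_hat = m / (1 - beta1 ** t)            # bias correction
    s_hat = s / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(s_hat) + eps)
    return param, m, s

# Usage: minimize f(x) = x^2 (gradient 2x) from x = 3
x, m, s = 3.0, 0.0, 0.0
for t in range(1, 201):
    x, m, s = adabelief_step(x, 2 * x, m, s, t)
```

When the deviation `(grad - m)` is small, `s` is small and the effective step is large; when gradients are noisy, `s` grows and the step shrinks.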
2 code implementations • ICML 2020 • Juntang Zhuang, Nicha Dvornek, Xiaoxiao Li, Sekhar Tatikonda, Xenophon Papademetris, James Duncan
Neural ordinary differential equations (NODEs) have recently attracted increasing attention; however, their empirical performance on benchmark tasks (e.g., image classification) is significantly inferior to that of discrete-layer models.
no code implementations • 27 Aug 2018 • Javid Dadashkarimi, Alexander Fabbri, Sekhar Tatikonda, Dragomir R. Radev
In this paper we propose to use feature transfer in a zero-shot experimental setting on the task of semantic parsing.
no code implementations • 19 Jul 2018 • Javid Dadashkarimi, Sekhar Tatikonda
Generating logical-form equivalents of human language is a fresh application for neural architectures, in which long short-term memory (LSTM) units effectively capture dependencies in both the encoder and decoder.
no code implementations • 22 Nov 2016 • Patrick Rebeschini, Sekhar Tatikonda
This paper investigates the behavior of the Min-Sum message passing scheme to solve systems of linear equations in the Laplacian matrices of graphs and to compute electric flows.
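For quadratic objectives such as ½xᵀAx − bᵀx, Min-Sum message passing reduces to Gaussian belief propagation, which is exact on tree-structured systems. A hedged sketch with synchronous messages follows; the (precision, linear-term) parameterization is a standard one and not necessarily the paper's:

```python
import numpy as np

def min_sum_solve(A, b, iters=20):
    """Min-Sum (Gaussian BP) for minimizing 0.5 x'Ax - b'x, i.e. for
    solving Ax = b.  Each directed edge i->j carries a quadratic
    message parameterized by a precision part alpha and a linear part
    beta; on tree-structured systems (e.g. a grounded path-graph
    Laplacian) the fixed point gives the exact solution."""
    n = len(b)
    edges = [(i, j) for i in range(n) for j in range(n)
             if i != j and A[i, j] != 0]
    alpha = {e: 0.0 for e in edges}
    beta = {e: 0.0 for e in edges}
    for _ in range(iters):
        new_a, new_b = {}, {}
        for (i, j) in edges:
            # Combine node potential with messages from neighbors k != j,
            # then minimize out x_i to get a quadratic message in x_j.
            a_i = A[i, i] + sum(alpha[(k, l)] for (k, l) in edges
                                if l == i and k != j)
            b_i = b[i] + sum(beta[(k, l)] for (k, l) in edges
                             if l == i and k != j)
            new_a[(i, j)] = -A[i, j] ** 2 / a_i
            new_b[(i, j)] = -A[i, j] * b_i / a_i
        alpha, beta = new_a, new_b
    # Node beliefs: x_i minimizes its local potential plus all messages.
    x = np.empty(n)
    for i in range(n):
        x[i] = (b[i] + sum(beta[(k, l)] for (k, l) in edges if l == i)) / \
               (A[i, i] + sum(alpha[(k, l)] for (k, l) in edges if l == i))
    return x

# Usage: a grounded path-graph Laplacian (tree-structured, so exact)
A = np.array([[2., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
b = np.array([1., 0., 0.])
x = min_sum_solve(A, b)
```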
no code implementations • 12 Feb 2016 • Patrick Rebeschini, Sekhar Tatikonda
We propose a notion of correlation in constrained optimization that is based on the sensitivity of the optimal solution upon perturbations of the constraints.
no code implementations • 12 Jul 2015 • Michael J. Kane, Bryan Lewis, Sekhar Tatikonda, Simon Urbanek
Linear regression models depend directly on the design matrix and its properties.
no code implementations • 7 Dec 2012 • Ramji Venkataramanan, Tuhin Sarkar, Sekhar Tatikonda
The proposed encoding algorithm sequentially chooses columns of the design matrix to successively approximate the source sequence.
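A matching-pursuit-style sketch of sequential column selection: at each stage, pick one column from the current section of the design matrix and subtract its contribution from the residual. Note the actual codes fix the nonzero coefficient values per section; the least-squares coefficient used here is an illustrative simplification:

```python
import numpy as np

def successive_encode(A, source, n_sections, section_size):
    """Greedy sequential encoding sketch: the design matrix A is split
    into sections of columns; for each section in turn, choose the
    column most correlated with the current residual and subtract its
    projection, successively approximating the source sequence."""
    residual = source.copy()
    chosen = []
    for s in range(n_sections):
        cols = A[:, s * section_size:(s + 1) * section_size]
        j = int(np.argmax(np.abs(cols.T @ residual)))
        c = cols[:, j]
        coef = (c @ residual) / (c @ c)   # least-squares coefficient (simplification)
        residual = residual - coef * c
        chosen.append(s * section_size + j)
    return chosen, residual

# Usage: encode a random source with 4 sections of 8 columns each
rng = np.random.default_rng(1)
A = rng.normal(size=(64, 4 * 8))
src = rng.normal(size=64)
chosen, res = successive_encode(A, src, n_sections=4, section_size=8)
```

The codeword is determined by the chosen column index in each section, and the residual norm shrinks with each stage.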
no code implementations • 3 Feb 2012 • Ramji Venkataramanan, Antony Joseph, Sekhar Tatikonda
We study a new class of codes for lossy compression with the squared-error distortion criterion, designed using the statistical framework of high-dimensional linear regression.