no code implementations • 7 Apr 2024 • Yuqing Li, Tao Luo, Qixuan Zhou
While the NTK regime typically assumes that $\lim_{m\to\infty}\frac{\log \kappa}{\log m}=\frac{1}{2}$ and requires each weight parameter to carry the scaling factor $\frac{1}{\sqrt{m}}$, our $\theta$-lazy regime discards this factor and relaxes the condition to $\lim_{m\to\infty}\frac{\log \kappa}{\log m}>0$.
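A minimal numpy sketch of the contrast, assuming a two-layer ReLU network; how the scale $\kappa$ enters the parameterization (here as an explicit divisor $\kappa = m^{\gamma}$) is an illustrative assumption, not the paper's construction.

```python
import numpy as np

m, d = 10_000, 5                     # width m, input dimension d
rng = np.random.default_rng(0)
x = rng.standard_normal(d)

W = rng.standard_normal((m, d))      # hidden weights ~ N(0, 1)
a = rng.standard_normal(m)           # output weights ~ N(0, 1)

relu = lambda z: np.maximum(z, 0.0)

# NTK parameterization: the 1/sqrt(m) factor keeps the output O(1),
# i.e. kappa = sqrt(m), so log(kappa)/log(m) = 1/2.
f_ntk = (a @ relu(W @ x)) / np.sqrt(m)

# Relaxed, theta-lazy-style variant (illustrative assumption): drop the
# fixed 1/sqrt(m) factor and allow any scale kappa = m**gamma with
# gamma > 0, so that log(kappa)/log(m) -> gamma > 0.
gamma = 0.75
kappa = m ** gamma
f_lazy = (a @ relu(W @ x)) / kappa

print(f_ntk, f_lazy)
```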
1 code implementation • 15 Mar 2024 • Binbin Li, Yuqing Li, Siyu Jia, Bingnan Ma, Yu Ding, Zisen Qi, Xingbang Tan, Menghan Guo, Shenghui Liu
This necessitates a dual focus on the syntactic information within individual utterances and the semantic interactions among them.
1 code implementation • 11 Mar 2024 • Yuting Wei, Yuanxing Xu, Xinru Wei, Simin Yang, Yangfu Zhu, Yuqing Li, Di Liu, Bin Wu
Given the importance of ancient Chinese in capturing a rich historical and cultural heritage, the rapid advancement of Large Language Models (LLMs) necessitates benchmarks that can effectively evaluate their understanding of ancient contexts.
1 code implementation • 27 Sep 2023 • Yuqing Li, Wenyuan Zhang, Binbin Li, Siyu Jia, Zisen Qi, Xingbang Tan
Conversational aspect-based sentiment quadruple analysis (DiaASQ) aims to extract target-aspect-opinion-sentiment quadruples from a dialogue.
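A hypothetical example of the target structure; the dialogue and labels below are invented for illustration and are not taken from the paper's dataset.

```python
# A toy dialogue and the quadruples a DiaASQ system would extract.
dialogue = [
    "A: Just got the new phone yesterday.",
    "B: How is it? I heard the battery drains fast.",
    "A: The battery life is actually great, but the camera is disappointing.",
]

# Each quadruple: (target, aspect, opinion, sentiment)
quadruples = [
    ("phone", "battery life", "great", "positive"),
    ("phone", "camera", "disappointing", "negative"),
]
```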
no code implementations • 25 May 2023 • Zhongwang Zhang, Yuqing Li, Tao Luo, Zhi-Qin John Xu
To investigate the underlying mechanism by which dropout facilitates the identification of flatter minima, we study the noise structure of the derived stochastic modified equation for dropout.
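For orientation, stochastic modified equations of this kind generally take the following schematic form, where $\eta$ is the learning rate, $W_t$ is Brownian motion, and $\Sigma(\Theta)$ is the noise covariance; this is the generic template from the SME literature, not the equation derived in the paper.

```latex
% Schematic SME: gradient flow plus a state-dependent diffusion term
% whose covariance \Sigma(\Theta) encodes the structure of the
% dropout-induced noise (generic template, not the paper's equation).
\begin{equation*}
  \mathrm{d}\Theta_t \;=\; -\nabla L(\Theta_t)\,\mathrm{d}t
  \;+\; \sqrt{\eta}\,\Sigma(\Theta_t)^{1/2}\,\mathrm{d}W_t .
\end{equation*}
```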
no code implementations • 17 May 2023 • Zhangchen Zhou, Hanxu Zhou, Yuqing Li, Zhi-Qin John Xu
Previous research has shown that fully-connected networks with small initialization, trained by gradient-based methods, exhibit a phenomenon known as condensation, in which the input weights of hidden neurons align along a few isolated directions during training.
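A simple way to see condensation numerically is to track pairwise cosine similarities of hidden-neuron input weights; the diagnostic below is an assumed illustration, not the paper's exact protocol.

```python
import numpy as np

def cosine_similarity_matrix(W):
    """W: (m, d) array of hidden-neuron input weight vectors."""
    U = W / np.linalg.norm(W, axis=1, keepdims=True)
    return U @ U.T

# Small initialization: weights start with tiny magnitude.
W = np.random.default_rng(0).standard_normal((8, 3)) * 1e-3
C = cosine_similarity_matrix(W)

# At initialization, directions are roughly uniform; condensation during
# training would drive |C[i, j]| toward 1 for most neuron pairs (i, j).
print(np.round(C, 2))
```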
no code implementations • 12 Mar 2023 • Zhengan Chen, Yuqing Li, Tao Luo, Zhangchen Zhou, Zhi-Qin John Xu
The phenomenon of distinct behaviors exhibited by neural networks under varying scales of initialization remains an enigma in deep learning research.
no code implementations • 30 Nov 2021 • Yaoyu Zhang, Yuqing Li, Zhongwang Zhang, Tao Luo, Zhi-Qin John Xu
We prove a general Embedding Principle of the loss landscape of deep neural networks (NNs) that unravels a hierarchical structure of the loss landscape, i.e., the loss landscape of an NN contains all critical points of all narrower NNs.
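A minimal numpy sketch of the one-neuron splitting idea behind such embeddings (a simplified instance, not the paper's general construction): duplicating a neuron and splitting its output weight leaves the network function, and hence the loss, unchanged, so a critical point of the narrow net maps to a point of equal loss in the wider net.

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def net(x, W, a):
    """Two-layer network: f(x) = a . relu(W x)."""
    return a @ relu(W @ x)

rng = np.random.default_rng(0)
d, m = 3, 4
x = rng.standard_normal(d)
W, a = rng.standard_normal((m, d)), rng.standard_normal(m)

# Splitting embedding: copy neuron 0 and split its output weight as
# alpha * a[0] and (1 - alpha) * a[0]; the sum of the two copies'
# contributions equals the original neuron's contribution.
alpha = 0.3
W_wide = np.vstack([W, W[0]])
a_wide = np.concatenate([a, [0.0]])
a_wide[0], a_wide[-1] = alpha * a[0], (1 - alpha) * a[0]

assert np.allclose(net(x, W, a), net(x, W_wide, a_wide))
```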
no code implementations • 30 Mar 2021 • Yuqing Li, Tao Luo, Chao Ma
In an attempt to better understand the structural benefits and generalization power of deep neural networks, we first present a novel graph-theoretical formulation of neural network models, including fully connected networks, residual networks (ResNet), and densely connected networks (DenseNet).
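As a rough illustration (the adjacency-list encodings below are assumed for intuition and are not the paper's formalism), the three architectures differ only in which forward edges their computation graphs contain.

```python
# Vertices 0..L stand for layers; each entry maps a layer to the later
# layers it feeds. These encodings are illustrative assumptions.
L = 4
chain    = {i: [i + 1] for i in range(L)}                       # MLP: i -> i+1
resnet   = {i: [i + 1] + ([i + 2] if i + 2 <= L else [])        # plus skip
            for i in range(L)}                                  # edges i -> i+2
densenet = {i: list(range(i + 1, L + 1)) for i in range(L)}     # all forward edges
print(chain, resnet, densenet, sep="\n")
```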
no code implementations • 7 Jul 2020 • Yuqing Li, Tao Luo, Nung Kwan Yip
Gradient descent yields zero training loss in polynomial time for deep neural networks despite the non-convex nature of the objective function.
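Results of this type typically take the following schematic form, shown here as a generic template under an NTK-style least-eigenvalue assumption rather than the paper's precise statement; $u(t)$ denotes the vector of network outputs on the training set at step $t$, $y$ the labels, $\eta$ the step size, and $\lambda_0 > 0$ the least eigenvalue of the limiting kernel.

```latex
% Generic template for NTK-style convergence guarantees: for width
% polynomially large in the problem parameters, gradient descent decays
% the training error geometrically (not the paper's exact statement).
\begin{equation*}
  \| y - u(t) \|_2^2 \;\le\;
  \Bigl(1 - \tfrac{\eta \lambda_0}{2}\Bigr)^{t}\, \| y - u(0) \|_2^2 .
\end{equation*}
```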