no code implementations • 24 May 2024 • Zhangchen Zhou, Yaoyu Zhang, Zhi-Qin John Xu
Grokking is the phenomenon where neural networks (NNs) initially fit the training data and only later generalize to the test data during training.
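A minimal sketch (not the paper's code) of how grokking is typically observed: a small MLP trained on modular addition with weight decay reaches high training accuracy early, while test accuracy climbs much later. The dataset split, model size, and hyperparameters below are illustrative assumptions.

```python
import torch
import torch.nn as nn

p = 97  # modular addition: predict (a + b) mod p
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
n_train = len(pairs) // 2
train_idx, test_idx = perm[:n_train], perm[n_train:]

def one_hot(x):
    # encode (a, b) as concatenated one-hot vectors of length 2*p
    return torch.cat([nn.functional.one_hot(x[:, 0], p),
                      nn.functional.one_hot(x[:, 1], p)], dim=1).float()

model = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

X_train, y_train = one_hot(pairs[train_idx]), labels[train_idx]
X_test, y_test = one_hot(pairs[test_idx]), labels[test_idx]

for step in range(20000):
    opt.zero_grad()
    loss = loss_fn(model(X_train), y_train)
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            tr = (model(X_train).argmax(1) == y_train).float().mean()
            te = (model(X_test).argmax(1) == y_test).float().mean()
        # train accuracy saturates early; test accuracy rises much later
        print(f"step {step:6d}  train acc {tr:.2f}  test acc {te:.2f}")
```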
no code implementations • 24 May 2024 • Zhiwei Wang, Yunji Wang, Zhongwang Zhang, Zhangchen Zhou, Hui Jin, Tianyang Hu, Jiacheng Sun, Zhenguo Li, Yaoyu Zhang, Zhi-Qin John Xu
Large language models have consistently struggled with complex reasoning tasks, such as mathematical problem-solving.
no code implementations • 16 Jan 2024 • Zhongwang Zhang, Zhiwei Wang, Junjie Yao, Zhangchen Zhou, Xiaolong Li, Weinan E, Zhi-Qin John Xu
However, language model research faces significant challenges, especially for academic research groups with constrained resources.
no code implementations • 17 May 2023 • Zhangchen Zhou, Hanxu Zhou, Yuqing Li, Zhi-Qin John Xu
Previous research has shown that fully-connected networks with small initialization and gradient-based training methods exhibit a phenomenon known as condensation during training.
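An illustrative sketch (assumptions, not the paper's setup) of how condensation can be detected: in a two-layer ReLU network with small initialization, the input weight vectors of many hidden neurons align into a few directions during training, which shows up as near-1 pairwise cosine similarities. The target function and hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, m, n = 5, 100, 200
X = torch.randn(n, d)
y = torch.sin(X.sum(dim=1, keepdim=True))   # arbitrary smooth target

model = nn.Sequential(nn.Linear(d, m), nn.ReLU(), nn.Linear(m, 1))
with torch.no_grad():                        # small initialization scale
    for layer in model:
        if isinstance(layer, nn.Linear):
            layer.weight.mul_(0.01)
            layer.bias.mul_(0.01)

opt = torch.optim.SGD(model.parameters(), lr=0.05)
for step in range(5000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()

# pairwise cosine similarity between hidden neurons' input weight vectors
W = model[0].weight.detach()                 # shape (m, d)
cos = nn.functional.normalize(W, dim=1) @ nn.functional.normalize(W, dim=1).T
print("fraction of neuron pairs with |cos| > 0.95:",
      (cos.abs() > 0.95).float().mean().item())
```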
no code implementations • 12 Mar 2023 • Zhengan Chen, Yuqing Li, Tao Luo, Zhangchen Zhou, Zhi-Qin John Xu
The phenomenon of distinct behaviors exhibited by neural networks under varying scales of initialization remains an enigma in deep learning research.