2 code implementations • 6 Feb 2024 • Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos
State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and input-dependent token selection to mitigate the quadratic cost of multi-head attention.
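To make the selection mechanism concrete, below is a minimal sketch of an input-dependent (selective) state-space recurrence in the spirit of Mamba's selective scan. The module, its layer names, and the discretization are illustrative assumptions, not the paper's exact architecture; a real implementation replaces the Python loop with a hardware-aware parallel scan.

```python
# Toy selective SSM: the transition A is fixed, while B, C, and the step
# size dt are computed from the input token (the "selection" mechanism).
import torch
import torch.nn as nn

class ToySelectiveSSM(nn.Module):
    def __init__(self, d_model: int, d_state: int):
        super().__init__()
        self.A_log = nn.Parameter(torch.randn(d_model, d_state))
        self.to_B = nn.Linear(d_model, d_state)   # input-dependent B
        self.to_C = nn.Linear(d_model, d_state)   # input-dependent C
        self.to_dt = nn.Linear(d_model, d_model)  # input-dependent step size

    def forward(self, x):  # x: (batch, seq_len, d_model)
        batch, seq_len, d_model = x.shape
        A = -torch.exp(self.A_log)  # negative, so the state decays stably
        h = x.new_zeros(batch, d_model, self.A_log.shape[1])
        ys = []
        for t in range(seq_len):
            xt = x[:, t]                                       # (batch, d_model)
            dt = torch.nn.functional.softplus(self.to_dt(xt))  # positive step
            B, C = self.to_B(xt), self.to_C(xt)                # (batch, d_state)
            # Discretized linear recurrence: h <- exp(dt*A) * h + dt*x * B
            h = torch.exp(dt.unsqueeze(-1) * A) * h \
                + (dt * xt).unsqueeze(-1) * B.unsqueeze(1)
            ys.append((h * C.unsqueeze(1)).sum(-1))  # read state out via C
        return torch.stack(ys, dim=1)                # (batch, seq_len, d_model)

# Usage: y = ToySelectiveSSM(d_model=64, d_state=16)(torch.randn(2, 128, 64))
```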
no code implementations • 29 Oct 2022 • Zheyang Xiong, Fangshuo Liao, Anastasios Kyrillidis
The strong Lottery Ticket Hypothesis (LTH) claims the existence of a subnetwork in a sufficiently large, randomly initialized neural network that approximates some target neural network without the need for training.
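A standard intuition behind strong-LTH constructions is a subset-sum argument: each target weight can be approximated by summing a pruned subset of random weights, so masking alone can stand in for training. The brute-force toy below illustrates only that single-weight subset-sum step; it is a hedged sketch, not the paper's construction, and the hypothesis itself concerns whole subnetworks.

```python
# Approximate one target weight by the sum of a masked subset of random,
# never-trained weights (the subset-sum step used in strong-LTH proofs).
import itertools
import numpy as np

rng = np.random.default_rng(0)
target = 0.37                        # a single target weight to match
candidates = rng.uniform(-1, 1, 12)  # random initialization, frozen

best_mask, best_err = (), float("inf")
for r in range(len(candidates) + 1):
    for subset in itertools.combinations(range(len(candidates)), r):
        err = abs(candidates[list(subset)].sum() - target)
        if err < best_err:
            best_mask, best_err = subset, err

print(f"target={target:.4f} "
      f"approx={candidates[list(best_mask)].sum():.4f} "
      f"error={best_err:.2e} kept {len(best_mask)}/12 weights")
```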
2 code implementations • 11 Feb 2021 • Jianjin Xu, Zheyang Xiong, Xiaolin Hu
To ensure temporal consistency between the frames of the stylized video, a common approach is to estimate the optical flow of the pixels in the original video and make the generated pixels match the estimated optical flow.
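As a concrete sketch of this consistency measure, the function below estimates optical flow between two original frames with OpenCV's Farneback estimator (one plausible choice; the paper's estimator may differ), backward-warps the previous stylized frame along that flow, and reports the mean per-pixel deviation of the current stylized frame from the warp.

```python
# Flow-based temporal consistency check: flow is estimated on the *original*
# frames, then used to warp the previous *stylized* frame for comparison.
import cv2
import numpy as np

def temporal_consistency_error(orig_prev, orig_cur, styl_prev, styl_cur):
    """Mean per-pixel deviation of styl_cur from the flow-warped styl_prev."""
    g0 = cv2.cvtColor(orig_prev, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(orig_cur, cv2.COLOR_BGR2GRAY)
    # Flow from the current frame back to the previous frame, so each
    # current-frame pixel knows where its content came from.
    flow = cv2.calcOpticalFlowFarneback(g1, g0, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = g0.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    warped = cv2.remap(styl_prev, map_x, map_y, cv2.INTER_LINEAR)
    return float(np.mean(np.abs(styl_cur.astype(np.float32) -
                                warped.astype(np.float32))))
```

A low error means the stylized pixels move with the scene rather than flickering; in training, the same warped-difference term is typically used as a consistency loss rather than an evaluation metric.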