no code implementations • 10 Dec 2023 • Jianwei Li, Tianchi Zhang, Ian En-Hsu Yen, Dongkuan Xu
Transformer-based models, such as BERT, have been applied to a wide range of natural language processing tasks.
no code implementations • 16 Jul 2022 • Ian En-Hsu Yen, Zhibin Xiao, Dongkuan Xu
Moreover, the degree of sparsity one can exploit has grown as model sizes have increased with the trend of pre-training giant models.
no code implementations • 25 Nov 2019 • Lingfei Wu, Ian En-Hsu Yen, Siyu Huo, Liang Zhao, Kun Xu, Liang Ma, Shouling Ji, Charu Aggarwal
In this paper, we present a new class of global string kernels that aims to (i) discover global properties hidden in the strings through global alignments, (ii) maintain positive-definiteness of the kernel without introducing a diagonally dominant kernel matrix, and (iii) achieve a training cost that is linear not only in the length of the strings but also in the number of training string samples.
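As a hedged illustration of the random-embedding route to such kernels (the plain edit distance and sampling scheme below are simplifying assumptions, not the paper's global-alignment construction), one can represent each string by its distances to a few random reference strings, giving a feature map whose cost is linear in the number of training strings:

```python
import numpy as np

def edit_distance(s, t):
    """Levenshtein distance via the classic dynamic program."""
    dp = list(range(len(t) + 1))
    for i in range(1, len(s) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(t) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                       # deletion
                        dp[j - 1] + 1,                   # insertion
                        prev + (s[i - 1] != t[j - 1]))   # substitution
            prev = cur
    return dp[-1]

def random_string_features(strings, R=32, max_len=5, alphabet="abcd", seed=0):
    """Embed each string by exp(-distance) to R random reference strings;
    the cost is R alignments per string, linear in the training set size."""
    rng = np.random.default_rng(seed)
    refs = ["".join(rng.choice(list(alphabet), size=rng.integers(1, max_len + 1)))
            for _ in range(R)]
    return np.array([[np.exp(-edit_distance(s, r)) for r in refs]
                     for s in strings]) / np.sqrt(R)
```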
no code implementations • 25 Nov 2019 • Lingfei Wu, Ian En-Hsu Yen, Zhen Zhang, Kun Xu, Liang Zhao, Xi Peng, Yinglong Xia, Charu Aggarwal
In particular, RGE is shown to achieve (quasi-)linear scalability with respect to the number and the size of the graphs.
no code implementations • NeurIPS 2018 • Ian En-Hsu Yen, Wei-Cheng Lee, Kai Zhong, Sung-En Chang, Pradeep K. Ravikumar, Shou-De Lin
We consider a generalization of mixed regression where the response is an additive combination of several mixture components.
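To make the setting concrete, here is a small data-generation sketch (the dimensions and the Bernoulli activation pattern are illustrative assumptions): each response sums the contributions of several active components instead of coming from a single one.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, K = 500, 10, 3                    # samples, features, mixture components
W = rng.normal(size=(K, d))             # one regressor per mixture component
X = rng.normal(size=(n, d))
Z = (rng.random(size=(n, K)) < 0.5).astype(float)  # active components per sample
# Each response adds up the active components' predictions, plus noise.
y = np.einsum("nd,kd,nk->n", X, W, Z) + 0.1 * rng.normal(size=n)
```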
no code implementations • NeurIPS 2018 • Lingfei Wu, Ian En-Hsu Yen, Kun Xu, Liang Zhao, Yinglong Xia, Michael Witbrock
Graph kernels are among the most important methods for graph data analysis and have been successfully applied in diverse domains.
1 code implementation • 14 Sep 2018 • Lingfei Wu, Ian En-Hsu Yen, Jin-Feng Yi, Fangli Xu, Qi Lei, Michael Witbrock
The proposed kernel does not suffer from the issue of diagonal dominance while naturally enjoying a Random Features (RF) approximation, which reduces the computational complexity of existing DTW-based techniques from quadratic to linear in both the number and the length of the time series.
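A minimal sketch of the random-features idea (the reference-series distribution and lengths are illustrative assumptions, not the paper's exact construction): embed each time series by its DTW distances to a handful of short random series, so that comparing two series reduces to a dot product.

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance via the standard O(len(a)*len(b)) DP."""
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1, -1]

def dtw_random_features(series_list, R=64, max_ref_len=10, seed=0):
    """Map each series to R features exp(-DTW) against random short series.
    Building the feature matrix is linear in the number of series, versus
    quadratic for a full pairwise DTW kernel matrix."""
    rng = np.random.default_rng(seed)
    refs = [rng.normal(size=rng.integers(2, max_ref_len + 1)) for _ in range(R)]
    return np.array([[np.exp(-dtw(s, r)) for r in refs]
                     for s in series_list]) / np.sqrt(R)
```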
no code implementations • ICML 2018 • Ian En-Hsu Yen, Satyen Kale, Felix Yu, Daniel Holtmann-Rice, Sanjiv Kumar, Pradeep Ravikumar
For problems with large output spaces, evaluating the loss function and its gradient is expensive, typically taking time linear in the size of the output space.
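To see where that linear cost comes from, here is a plain softmax cross-entropy in numpy (a generic illustration, not the paper's decomposition): both the loss and the gradient touch every one of the K output weights.

```python
import numpy as np

def softmax_xent(W, x, y):
    """Full softmax cross-entropy for one example: the loss and its gradient
    both touch every row of W, i.e. O(K * d) work for K classes."""
    logits = W @ x                     # O(K * d)
    logits -= logits.max()             # for numerical stability
    p = np.exp(logits)
    p /= p.sum()
    loss = -np.log(p[y])
    grad = np.outer(p, x)              # O(K * d) again: dense over all classes
    grad[y] -= x
    return loss, grad

rng = np.random.default_rng(0)
K, d = 100_000, 64                     # large output space
W = rng.normal(size=(K, d)) * 0.01
loss, grad = softmax_xent(W, rng.normal(size=d), y=123)
```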
1 code implementation • 25 May 2018 • Lingfei Wu, Pin-Yu Chen, Ian En-Hsu Yen, Fangli Xu, Yinglong Xia, Charu Aggarwal
Moreover, our method exhibits linear scalability in both the number of data samples and the number of RB features.
Ranked #5 on Image/Document Clustering on pendigits
no code implementations • 14 Feb 2018 • Lingfei Wu, Ian En-Hsu Yen, Fangli Xu, Pradeep Ravikumar, Michael Witbrock
For many machine learning problem settings, particularly with structured inputs such as sequences or sets of objects, a distance measure between inputs can be specified more naturally than a feature representation.
no code implementations • ICML 2017 • Ian En-Hsu Yen, Wei-Cheng Lee, Sung-En Chang, Arun Sai Suggala, Shou-De Lin, Pradeep Ravikumar
The latent feature model (LFM), proposed in (Griffiths & Ghahramani, 2005), but possibly with earlier origins, is a generalization of a mixture model, where each instance is generated not from a single latent class but from a combination of latent features.
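A hedged sketch of a linear-Gaussian instance of the LFM (the feature probabilities and dimensions are illustrative assumptions): each instance activates a binary subset of latent features, whereas a mixture model would activate exactly one.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, d = 200, 5, 20                   # instances, latent features, observed dims
A = rng.normal(size=(K, d))            # one "dictionary atom" per latent feature
Z = (rng.random(size=(n, K)) < 0.3).astype(float)  # binary feature ownership
X = Z @ A + 0.1 * rng.normal(size=(n, d))
# A mixture model is the special case where each row of Z has exactly one 1.
```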
no code implementations • ICML 2017 • Qi Lei, Ian En-Hsu Yen, Chao-yuan Wu, Inderjit S. Dhillon, Pradeep Ravikumar
We consider the popular problem of sparse empirical risk minimization with linear predictors and a large number of both features and observations.
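For reference, a minimal sketch of the vanilla baseline in this setting: proximal gradient (ISTA) on an l1-regularized least-squares objective, where each iteration pays the full O(n * d) cost that methods in this line of work aim to avoid (this is the generic algorithm, not the paper's).

```python
import numpy as np

def ista(X, y, lam=0.1, iters=200):
    """Proximal gradient for min_w 0.5/n * ||Xw - y||^2 + lam * ||w||_1.
    Every iteration does a full O(n * d) pass over features and observations."""
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(iters):
        g = X.T @ (X @ w - y) / n          # full gradient
        z = w - step * g
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return w
```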
no code implementations • NeurIPS 2016 • Ian En-Hsu Yen, Xiangru Huang, Kai Zhong, Ruohan Zhang, Pradeep K. Ravikumar, Inderjit S. Dhillon
In this work, we show that, by decomposing the training of a Structural Support Vector Machine (SVM) into a series of multiclass SVM problems connected through messages, one can replace the expensive structured oracle with a Factorwise Maximization Oracle (FMO), which allows an efficient implementation with complexity sublinear in the size of the factor domain.
1 code implementation • ICML 2016 • Ian En-Hsu Yen, Xiangru Huang, Pradeep Ravikumar, Kai Zhong, Inderjit S. Dhillon
In this work, we show that a margin-maximizing loss with an l1 penalty, in the case of Extreme Classification, yields an extremely sparse solution in both the primal and the dual without sacrificing the expressive power of the predictor.
no code implementations • NeurIPS 2015 • Ian En-Hsu Yen, Kai Zhong, Cho-Jui Hsieh, Pradeep K. Ravikumar, Inderjit S. Dhillon
Over the past decades, Linear Programming (LP) has been widely used in many areas and is considered one of the mature technologies in numerical optimization.
no code implementations • NeurIPS 2015 • Ian En-Hsu Yen, Shan-Wei Lin, Shou-De Lin
In the past few years, several techniques have been proposed for training linear Support Vector Machines (SVMs) in the limited-memory setting, where a dual block-coordinate descent (dual-BCD) method is used to balance the cost spent on I/O and on computation.
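A minimal in-memory sketch of dual coordinate descent for a linear SVM (single-coordinate rather than block updates, and no I/O management; the limited-memory scheduling is the paper's contribution and is not shown here):

```python
import numpy as np

def dual_cd_svm(X, y, C=1.0, epochs=20, seed=0):
    """Dual coordinate descent for the hinge-loss linear SVM (labels in {-1,+1}):
    maintain w = sum_i alpha_i * y_i * x_i and update one alpha_i at a time."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha, w = np.zeros(n), np.zeros(d)
    q = (X ** 2).sum(axis=1)              # Q_ii = ||x_i||^2
    for _ in range(epochs):
        for i in rng.permutation(n):
            if q[i] == 0.0:
                continue
            g = y[i] * (X[i] @ w) - 1.0   # partial gradient of the dual
            new = np.clip(alpha[i] - g / q[i], 0.0, C)
            w += (new - alpha[i]) * y[i] * X[i]
            alpha[i] = new
    return w
```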
no code implementations • NeurIPS 2014 • Ian En-Hsu Yen, Ting-Wei Lin, Shou-De Lin, Pradeep K. Ravikumar, Inderjit S. Dhillon
In this paper, we propose a Sparse Random Feature algorithm, which learns a sparse non-linear predictor by minimizing an l1-regularized objective function over the Hilbert space induced by the kernel function.
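A hedged sketch of the algorithm's flavor from off-the-shelf parts (the paper analyzes its own randomized training procedure; here scikit-learn's RBFSampler stands in for sampling random features and Lasso supplies the l1-regularized fit):

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# Random Fourier features approximate the RBF kernel's feature space...
Z = RBFSampler(gamma=0.5, n_components=500, random_state=0).fit_transform(X)

# ...and the l1 penalty keeps only a sparse subset of those random features.
model = Lasso(alpha=0.01).fit(Z, y)
print("nonzero features:", np.count_nonzero(model.coef_), "of", Z.shape[1])
```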
no code implementations • NeurIPS 2014 • Ian En-Hsu Yen, Cho-Jui Hsieh, Pradeep K. Ravikumar, Inderjit S. Dhillon
State-of-the-art statistical estimators for high-dimensional problems take the form of regularized, and hence non-smooth, convex programs.