no code implementations • 10 Dec 2023 • Jianwei Li, Tianchi Zhang, Ian En-Hsu Yen, Dongkuan Xu
Transformer-based models, such as BERT, have been applied to a wide range of natural language processing tasks.
no code implementations • 16 Jul 2022 • Ian En-Hsu Yen, Zhibin Xiao, Dongkuan Xu
Moreover, the degree of sparsity one can exploit has grown as model sizes have increased with the trend of pre-training giant models.
no code implementations • 25 Nov 2019 • Lingfei Wu, Ian En-Hsu Yen, Siyu Huo, Liang Zhao, Kun Xu, Liang Ma, Shouling Ji, Charu Aggarwal
In this paper, we present a new class of global string kernels that aims to (i) discover global properties hidden in the strings through global alignments, (ii) maintain positive-definiteness of the kernel without introducing a diagonally dominant kernel matrix, and (iii) achieve a training cost that is linear not only in the length of the strings but also in the number of training string samples.
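As a hedged illustration of the random-embedding route to such kernels (the plain edit distance and sampling scheme below are simplifying assumptions, not the paper's global-alignment construction), one can represent each string by its distances to a few random reference strings, giving a feature map whose cost is linear in the number of training strings:

```python
import numpy as np

def edit_distance(s, t):
    """Levenshtein distance via the classic dynamic program."""
    dp = list(range(len(t) + 1))
    for i in range(1, len(s) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(t) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                       # deletion
                        dp[j - 1] + 1,                   # insertion
                        prev + (s[i - 1] != t[j - 1]))   # substitution
            prev = cur
    return dp[-1]

def random_string_features(strings, R=32, max_len=5, alphabet="abcd", seed=0):
    """Embed each string by exp(-distance) to R random reference strings;
    the cost is R alignments per string, linear in the training set size."""
    rng = np.random.default_rng(seed)
    refs = ["".join(rng.choice(list(alphabet), size=rng.integers(1, max_len + 1)))
            for _ in range(R)]
    return np.array([[np.exp(-edit_distance(s, r)) for r in refs]
                     for s in strings]) / np.sqrt(R)
```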
no code implementations • 25 Nov 2019 • Lingfei Wu, Ian En-Hsu Yen, Zhen Zhang, Kun Xu, Liang Zhao, Xi Peng, Yinglong Xia, Charu Aggarwal
In particular, RGE is shown to achieve (quasi-)linear scalability with respect to the number and the size of the graphs.
no code implementations • NeurIPS 2018 • Ian En-Hsu Yen, Wei-Cheng Lee, Kai Zhong, Sung-En Chang, Pradeep K. Ravikumar, Shou-De Lin
We consider a generalization of mixed regression where the response is an additive combination of several mixture components.
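To make the setting concrete, here is a small data-generation sketch (the dimensions and the Bernoulli activation pattern are illustrative assumptions): each response sums the contributions of several active components instead of coming from a single one.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, K = 500, 10, 3                    # samples, features, mixture components
W = rng.normal(size=(K, d))             # one regressor per mixture component
X = rng.normal(size=(n, d))
Z = (rng.random(size=(n, K)) < 0.5).astype(float)  # active components per sample
# Each response adds up the active components' predictions, plus noise.
y = np.einsum("nd,kd,nk->n", X, W, Z) + 0.1 * rng.normal(size=n)
```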
no code implementations • NeurIPS 2018 • Lingfei Wu, Ian En-Hsu Yen, Kun Xu, Liang Zhao, Yinglong Xia, Michael Witbrock
Graph kernels are among the most important methods for graph data analysis and have been successfully applied in diverse domains.
1 code implementation • 14 Sep 2018 • Lingfei Wu, Ian En-Hsu Yen, Jin-Feng Yi, Fangli Xu, Qi Lei, Michael Witbrock
The proposed kernel does not suffer from the issue of diagonal dominance while naturally enjoying a Random Features (RF) approximation, which reduces the computational complexity of existing DTW-based techniques from quadratic to linear in both the number and the length of the time series.
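A minimal sketch of the random-features idea (the reference-series distribution and lengths are illustrative assumptions, not the paper's exact construction): embed each time series by its DTW distances to a handful of short random series, so that comparing two series reduces to a dot product.

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance via the standard O(len(a)*len(b)) DP."""
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1, -1]

def dtw_random_features(series_list, R=64, max_ref_len=10, seed=0):
    """Map each series to R features exp(-DTW) against random short series.
    Building the feature matrix is linear in the number of series, versus
    quadratic for a full pairwise DTW kernel matrix."""
    rng = np.random.default_rng(seed)
    refs = [rng.normal(size=rng.integers(2, max_ref_len + 1)) for _ in range(R)]
    return np.array([[np.exp(-dtw(s, r)) for r in refs]
                     for s in series_list]) / np.sqrt(R)
```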
no code implementations • ICML 2018 • Ian En-Hsu Yen, Satyen Kale, Felix Yu, Daniel Holtmann-Rice, Sanjiv Kumar, Pradeep Ravikumar
For problems with large output spaces, evaluating the loss function and its gradient is expensive, typically taking time linear in the size of the output space.
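To see where that linear cost comes from, here is a plain softmax cross-entropy in numpy (a generic illustration, not the paper's decomposition): both the loss and the gradient touch every one of the K output weights.

```python
import numpy as np

def softmax_xent(W, x, y):
    """Full softmax cross-entropy for one example: the loss and its gradient
    both touch every row of W, i.e. O(K * d) work for K classes."""
    logits = W @ x                     # O(K * d)
    logits -= logits.max()             # for numerical stability
    p = np.exp(logits)
    p /= p.sum()
    loss = -np.log(p[y])
    grad = np.outer(p, x)              # O(K * d) again: dense over all classes
    grad[y] -= x
    return loss, grad

rng = np.random.default_rng(0)
K, d = 100_000, 64                     # large output space
W = rng.normal(size=(K, d)) * 0.01
loss, grad = softmax_xent(W, rng.normal(size=d), y=123)
```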
1 code implementation • 25 May 2018 • Lingfei Wu, Pin-Yu Chen, Ian En-Hsu Yen, Fangli Xu, Yinglong Xia, Charu Aggarwal
Moreover, our method exhibits linear scalability in both the number of data samples and the number of RB features.
Ranked #5 on Image/Document Clustering on pendigits
no code implementations • 14 Feb 2018 • Lingfei Wu, Ian En-Hsu Yen, Fangli Xu, Pradeep Ravikumar, Michael Witbrock
For many machine learning problem settings, particularly with structured inputs such as sequences or sets of objects, a distance measure between inputs can be specified more naturally than a feature representation.
no code implementations • ICML 2017 • Ian En-Hsu Yen, Wei-Cheng Lee, Sung-En Chang, Arun Sai Suggala, Shou-De Lin, Pradeep Ravikumar
The latent feature model (LFM), proposed in (Griffiths & Ghahramani, 2005), but possibly with earlier origins, is a generalization of a mixture model, where each instance is generated not from a single latent class but from a combination of latent features.
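A hedged sketch of a linear-Gaussian instance of the LFM (the feature probabilities and dimensions are illustrative assumptions): each instance activates a binary subset of latent features, whereas a mixture model would activate exactly one.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, d = 200, 5, 20                   # instances, latent features, observed dims
A = rng.normal(size=(K, d))            # one "dictionary atom" per latent feature
Z = (rng.random(size=(n, K)) < 0.3).astype(float)  # binary feature ownership
X = Z @ A + 0.1 * rng.normal(size=(n, d))
# A mixture model is the special case where each row of Z has exactly one 1.
```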
no code implementations • ICML 2017 • Qi Lei, Ian En-Hsu Yen, Chao-yuan Wu, Inderjit S. Dhillon, Pradeep Ravikumar
We consider the popular problem of sparse empirical risk minimization with linear predictors and a large number of both features and observations.
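For reference, a minimal sketch of the vanilla baseline in this setting: proximal gradient (ISTA) on an l1-regularized least-squares objective, where each iteration pays the full O(n * d) cost that methods in this line of work aim to avoid (this is the generic algorithm, not the paper's).

```python
import numpy as np

def ista(X, y, lam=0.1, iters=200):
    """Proximal gradient for min_w 0.5/n * ||Xw - y||^2 + lam * ||w||_1.
    Every iteration does a full O(n * d) pass over features and observations."""
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(iters):
        g = X.T @ (X @ w - y) / n          # full gradient
        z = w - step * g
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return w
```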
no code implementations • NeurIPS 2016 • Ian En-Hsu Yen, Xiangru Huang, Kai Zhong, Ruohan Zhang, Pradeep K. Ravikumar, Inderjit S. Dhillon
In this work, we show that, by decomposing the training of a Structural Support Vector Machine (SVM) into a series of multiclass SVM problems connected through messages, one can replace the expensive structured oracle with a Factorwise Maximization Oracle (FMO), which allows an efficient implementation with complexity sublinear in the size of the factor domain.
1 code implementation • ICML 2016 • Ian En-Hsu Yen, Xiangru Huang, Pradeep Ravikumar, Kai Zhong, Inderjit S. Dhillon
In this work, we show that a margin-maximizing loss with an l1 penalty, in the case of Extreme Classification, yields an extremely sparse solution in both the primal and the dual without sacrificing the expressive power of the predictor.
no code implementations • NeurIPS 2015 • Ian En-Hsu Yen, Kai Zhong, Cho-Jui Hsieh, Pradeep K. Ravikumar, Inderjit S. Dhillon
Over the past decades, Linear Programming (LP) has been widely used in many areas and is considered one of the mature technologies in numerical optimization.
no code implementations • NeurIPS 2015 • Ian En-Hsu Yen, Shan-Wei Lin, Shou-De Lin
In the past few years, several techniques have been proposed for training linear Support Vector Machines (SVMs) in the limited-memory setting, where a dual block-coordinate descent (dual-BCD) method is used to balance the cost spent on I/O and on computation.
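A minimal in-memory sketch of dual coordinate descent for a linear SVM (single-coordinate rather than block updates, and no I/O management; the limited-memory scheduling is the paper's contribution and is not shown here):

```python
import numpy as np

def dual_cd_svm(X, y, C=1.0, epochs=20, seed=0):
    """Dual coordinate descent for the hinge-loss linear SVM (labels in {-1,+1}):
    maintain w = sum_i alpha_i * y_i * x_i and update one alpha_i at a time."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha, w = np.zeros(n), np.zeros(d)
    q = (X ** 2).sum(axis=1)              # Q_ii = ||x_i||^2
    for _ in range(epochs):
        for i in rng.permutation(n):
            if q[i] == 0.0:
                continue
            g = y[i] * (X[i] @ w) - 1.0   # partial gradient of the dual
            new = np.clip(alpha[i] - g / q[i], 0.0, C)
            w += (new - alpha[i]) * y[i] * X[i]
            alpha[i] = new
    return w
```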
no code implementations • NeurIPS 2014 • Ian En-Hsu Yen, Ting-Wei Lin, Shou-De Lin, Pradeep K. Ravikumar, Inderjit S. Dhillon
In this paper, we propose a Sparse Random Feature algorithm, which learns a sparse non-linear predictor by minimizing an l1-regularized objective function over the Hilbert space induced by the kernel function.
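A hedged sketch of the algorithm's flavor from off-the-shelf parts (the paper analyzes its own randomized training procedure; here scikit-learn's RBFSampler stands in for sampling random features and Lasso supplies the l1-regularized fit):

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# Random Fourier features approximate the RBF kernel's feature space...
Z = RBFSampler(gamma=0.5, n_components=500, random_state=0).fit_transform(X)

# ...and the l1 penalty keeps only a sparse subset of those random features.
model = Lasso(alpha=0.01).fit(Z, y)
print("nonzero features:", np.count_nonzero(model.coef_), "of", Z.shape[1])
```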
no code implementations • NeurIPS 2014 • Ian En-Hsu Yen, Cho-Jui Hsieh, Pradeep K. Ravikumar, Inderjit S. Dhillon
State-of-the-art statistical estimators for high-dimensional problems take the form of regularized, and hence non-smooth, convex programs.