no code implementations • 29 May 2024 • Naoki Nishikawa, Taiji Suzuki
Deep neural networks based on state space models (SSMs) are attracting much attention in sequence modeling since their computational cost is significantly smaller than that of Transformers.
1 code implementation • 10 May 2024 • Rom N. Parnichkun, Stefano Massaroli, Alessandro Moro, Jimmy T. H. Smith, Ramin Hasani, Mathias Lechner, Qi An, Christopher Ré, Hajime Asama, Stefano Ermon, Taiji Suzuki, Atsushi Yamashita, Michael Poli
We approach designing a state-space model for deep learning applications through its dual representation, the transfer function, and uncover a highly efficient sequence parallel inference algorithm that is state-free: unlike other proposed algorithms, state-free inference does not incur any significant memory or computational cost with an increase in state size.
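As a generic illustration of the transfer-function view (a minimal sketch under simplifying assumptions, not the paper's algorithm: the diagonal SSM, sizes, and variable names are all illustrative), the following compares a stateful recurrence with a "state-free" path that evaluates the rational transfer function at FFT nodes and convolves in the frequency domain, never materializing the state:

```python
import numpy as np

rng = np.random.default_rng(0)
L, n = 64, 4                        # sequence length, state size (illustrative)
a = rng.uniform(-0.8, 0.8, n)       # diagonal of a stable state matrix A
b = rng.normal(size=n)              # input map B
c = rng.normal(size=n)              # output map C
u = rng.normal(size=L)              # input sequence

# Stateful recurrence: x_t = A x_{t-1} + B u_t, y_t = C x_t
x = np.zeros(n)
y_rec = np.zeros(L)
for t in range(L):
    x = a * x + b * u[t]
    y_rec[t] = c @ x

# State-free alternative: evaluate the transfer function
# H(z) = sum_i c_i b_i / (1 - a_i z^{-1}) at FFT nodes, multiply with the
# input spectrum, and invert -- the state vector never appears.
N = 2 * L                           # zero-pad to avoid circular wrap-around
z = np.exp(2j * np.pi * np.arange(N) / N)
H = np.sum((c * b)[None, :] / (1.0 - a[None, :] / z[:, None]), axis=1)
y_tf = np.fft.ifft(H * np.fft.fft(u, N)).real[:L]

max_err = np.max(np.abs(y_rec - y_tf))
```

The two paths agree up to a tiny aliasing error controlled by $|a_i|^N$, which is why the frequency-domain route can scale in state size without a memory penalty.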
no code implementations • 30 Apr 2024 • Toshimitsu Uesaka, Taiji Suzuki, Yuhta Takida, Chieh-Hsin Lai, Naoki Murata, Yuki Mitsufuji
Multimodal representation learning to integrate different modalities, such as text, vision, and audio, is important for real-world applications.
no code implementations • 26 Mar 2024 • Michael Poli, Armin W Thomas, Eric Nguyen, Pragaash Ponnusamy, Björn Deiseroth, Kristian Kersting, Taiji Suzuki, Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli
The development of deep learning architectures is a resource-demanding process, due to a vast design space, long prototyping times, and high compute costs associated with at-scale model training and evaluation.
no code implementations • 22 Mar 2024 • Shokichi Takakura, Taiji Suzuki
In this paper, we study the feature learning ability of two-layer neural networks in the mean-field regime through the lens of kernel methods.
no code implementations • 8 Feb 2024 • Michael E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré
More precisely, focusing on commuting orthogonal matrices $W$, we first show that a trained one-layer linear Transformer implements one step of gradient descent for the minimization of an inner objective function, when considering augmented tokens.
no code implementations • 2 Feb 2024 • Juno Kim, Taiji Suzuki
However, existing theoretical studies on how this phenomenon arises are limited to the dynamics of a single layer of attention trained on linear regression tasks.
no code implementations • 2 Dec 2023 • Juno Kim, Kakei Yamamoto, Kazusato Oko, Zhuoran Yang, Taiji Suzuki
In this paper, we extend mean-field Langevin dynamics to minimax optimization over probability distributions for the first time with symmetric and provably convergent updates.
no code implementations • 15 Nov 2023 • Shuhei Nitta, Taiji Suzuki, Albert Rodríguez Mulet, Atsushi Yaguchi, Ryusuke Hirai
In this paper, we propose an effective federated learning method named ScalableFL, where the depths and widths of the local models for each client are adjusted according to the client's input image size and number of output categories.
1 code implementation • 1 Aug 2023 • Kishan Wimalawarne, Taiji Suzuki, Sophie Langer
Learning the Green's function using deep learning models enables solving different classes of partial differential equations.
no code implementations • 24 Jun 2023 • Wei Huang, Yuan Cao, Haonan Wang, Xin Cao, Taiji Suzuki
Graph neural networks (GNNs) have pioneered advancements in graph representation learning, exhibiting superior feature learning and performance over multilayer perceptrons (MLPs) when handling graph inputs.
no code implementations • 12 Jun 2023 • Taiji Suzuki, Denny Wu, Atsushi Nitanda
Despite the generality of our results, we achieve an improved convergence rate in both the SGD and SVRG settings when specialized to the standard Langevin dynamics.
no code implementations • 30 May 2023 • Shokichi Takakura, Taiji Suzuki
Despite the great success of Transformer networks in various applications such as natural language processing and computer vision, their theoretical aspects are not well understood.
no code implementations • 13 May 2023 • Atsushi Suzuki, Atsushi Nitanda, Taiji Suzuki, Jing Wang, Feng Tian, Kenji Yamanishi
However, recent theoretical analyses have shown a much higher upper bound on the generalization error of non-Euclidean graph embeddings than of Euclidean ones, where a high generalization error indicates that the incompleteness and noise in the data can significantly damage learning performance.
no code implementations • 6 Mar 2023 • Atsushi Nitanda, Kazusato Oko, Denny Wu, Nobuhito Takenouchi, Taiji Suzuki
The entropic fictitious play (EFP) is a recently proposed algorithm that minimizes the sum of a convex functional and entropy in the space of measures -- such an objective naturally arises in the optimization of a two-layer neural network in the mean-field regime.
no code implementations • 3 Mar 2023 • Kazusato Oko, Shunta Akiyama, Taiji Suzuki
While efficient distribution learning is no doubt behind the groundbreaking success of diffusion modeling, its theoretical guarantees are quite limited.
no code implementations • 12 Feb 2023 • Yuka Hashimoto, Sho Sonoda, Isao Ishikawa, Atsushi Nitanda, Taiji Suzuki
Our bound is tighter than existing norm-based bounds when the condition numbers of weight matrices are small.
no code implementations • 8 Feb 2023 • Tomoya Murata, Taiji Suzuki
In the previous work, the best known utility bound is $\widetilde O(\sqrt{d}/(n\varepsilon_\mathrm{DP}))$ in terms of the squared full gradient norm, which is achieved by Differential Private Gradient Descent (DP-GD) as an instance, where $n$ is the sample size, $d$ is the problem dimensionality and $\varepsilon_\mathrm{DP}$ is the differential privacy parameter.
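As a generic sketch of differentially private gradient descent on a smooth loss (the textbook clip-average-perturb template, not the paper's contribution; the noise level `sigma` here is a hypothetical knob, not calibrated to a specific privacy budget):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.1 * rng.normal(size=n)

# clip and sigma are hypothetical knobs; a real deployment would calibrate
# sigma to the desired privacy parameter.
clip, sigma, eta, steps = 1.0, 0.5, 0.1, 300
theta = np.zeros(d)
for _ in range(steps):
    g = (X @ theta - y)[:, None] * X                 # per-sample gradients (squared loss)
    scale = np.maximum(1.0, np.linalg.norm(g, axis=1) / clip)
    g_bar = (g / scale[:, None]).mean(axis=0)        # clipped average, sensitivity clip/n
    theta -= eta * (g_bar + sigma * clip / n * rng.normal(size=d))

param_err = np.linalg.norm(theta - theta_true)
```

Clipping bounds each sample's influence, so the added Gaussian noise masks any individual's contribution while the iterate still approaches a small-gradient point.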
no code implementations • 12 Sep 2022 • Kishan Wimalawarne, Taiji Suzuki
Additionally, we propose adaptive learning between directly graph polynomial convolution models and learning directly from the adjacency matrix.
no code implementations • 1 Sep 2022 • Kazusato Oko, Shunta Akiyama, Tomoya Murata, Taiji Suzuki
While variance reduction methods have shown great success in solving large scale optimization problems, many of them suffer from accumulated errors and therefore periodically require full gradient computation.
no code implementations • 30 May 2022 • Shunta Akiyama, Taiji Suzuki
While deep learning has outperformed other methods on various tasks, theoretical frameworks that explain the reasons for its success have not been fully established.
no code implementations • 3 May 2022 • Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$, where $\boldsymbol{W}\in\mathbb{R}^{d\times N}, \boldsymbol{a}\in\mathbb{R}^{N}$ are randomly initialized, and the training objective is the empirical MSE loss: $\frac{1}{n}\sum_{i=1}^n (f(\boldsymbol{x}_i)-y_i)^2$.
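The setup above is easy to reproduce numerically. A minimal sketch (illustrative sizes and step size; `tanh` stands in for a generic activation, and only the first layer is updated, matching the setting described):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 100, 20, 50
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

W = rng.normal(size=(d, N)) / np.sqrt(d)     # first-layer weights
a = rng.normal(size=N)                        # second-layer weights (frozen)

def f(W):
    return np.tanh(X @ W) @ a / np.sqrt(N)

def mse(W):
    return np.mean((f(W) - y) ** 2)

# one gradient descent step on W only
eta = 0.1
r = f(W) - y                                  # residuals, shape (n,)
S = 1.0 - np.tanh(X @ W) ** 2                 # tanh'(XW), shape (n, N)
grad_W = (2.0 / n) * X.T @ (r[:, None] * S * (a / np.sqrt(N))[None, :])
W1 = W - eta * grad_W

loss_before, loss_after = mse(W), mse(W1)
```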
1 code implementation • 30 Mar 2022 • Yuri Kinoshita, Taiji Suzuki
Stochastic gradient Langevin dynamics is one of the most fundamental algorithms for solving sampling problems and the non-convex optimization problems appearing in several machine learning applications.
no code implementations • 19 Mar 2022 • Kanji Sato, Akiko Takeda, Reiichiro Kawai, Taiji Suzuki
Gradient Langevin dynamics and a variety of its variants have attracted increasing attention owing to their convergence towards the global optimal solution, initially in the unconstrained convex setting and, more recently, even for non-convex problems with convex constraints.
no code implementations • 12 Feb 2022 • Tomoya Murata, Taiji Suzuki
In centralized nonconvex distributed learning and federated learning, local methods are among the promising approaches for reducing communication time.
no code implementations • 25 Jan 2022 • Atsushi Nitanda, Denny Wu, Taiji Suzuki
In this work, we give a concise and self-contained convergence rate analysis of the mean field Langevin dynamics with respect to the (regularized) objective function in both continuous and discrete time settings.
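As a rough numerical illustration of mean-field Langevin dynamics (a generic sketch, not the paper's analysis: each particle is one neuron of a mean-field two-layer network, and the entropy weight `lam`, step size, and teacher signal are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, M = 200, 3, 100              # samples, input dim, particles (= neurons)
X = rng.normal(size=(n, d))
y = np.tanh(X @ np.ones(d))        # simple synthetic teacher

lam, eta, steps = 1e-3, 0.2, 2000  # entropy strength and step size (illustrative)
theta = rng.normal(size=(M, d))    # each particle is one neuron's weight vector

def predict(theta):
    return np.tanh(X @ theta.T).mean(axis=1)   # mean-field average over neurons

loss0 = np.mean((predict(theta) - y) ** 2)
for _ in range(steps):
    r = predict(theta) - y                     # residuals, shape (n,)
    S = 1 - np.tanh(X @ theta.T) ** 2          # tanh', shape (n, M)
    # gradient of the first variation of the risk, evaluated at each particle
    grad = (X.T @ (r[:, None] * S)).T * (2.0 / n)
    # Langevin update: the Gaussian noise scale is tied to the entropy weight
    theta += -eta * grad + np.sqrt(2 * eta * lam) * rng.normal(size=theta.shape)

loss = np.mean((predict(theta) - y) ** 2)
```

The injected noise corresponds to the entropy term of the regularized objective; the risk is convex in the particle distribution, which is what makes global convergence statements possible.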
no code implementations • 29 Sep 2021 • Hiroki Naganuma, Taiji Suzuki, Rio Yokota, Masahiro Nomura, Kohta Ishikawa, Ikuro Sato
Generalization measures are intensively studied in the machine learning community for better modeling generalization gaps.
no code implementations • ICLR 2022 • Kazusato Oko, Taiji Suzuki, Atsushi Nitanda, Denny Wu
We introduce Particle-SDCA, a gradient-based optimization algorithm for two-layer neural networks in the mean field regime that achieves an exponential convergence rate in regularized empirical risk minimization.
no code implementations • ICLR 2022 • Jimmy Ba, Murat A Erdogdu, Marzyeh Ghassemi, Shengyang Sun, Taiji Suzuki, Denny Wu, Tianzong Zhang
Stein variational gradient descent (SVGD) is a deterministic inference algorithm that evolves a set of particles to fit a target distribution.
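The standard SVGD update is short enough to show directly (a textbook sketch with a fixed RBF bandwidth and a standard-normal target; all choices here are illustrative, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(0)
M, d = 100, 2
x = rng.normal(size=(M, d)) + 3.0    # particles, deliberately initialized off-target

def grad_logp(x):                     # target: standard normal N(0, I)
    return -x

h, eta = 1.0, 0.1                     # fixed RBF bandwidth and step size
for _ in range(500):
    diff = x[:, None, :] - x[None, :, :]                 # x_j - x_i, shape (M, M, d)
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * h ** 2))
    # phi(x_i) = (1/M) sum_j [ K(x_j, x_i) grad_logp(x_j) + grad_{x_j} K(x_j, x_i) ]
    phi = (K @ grad_logp(x) + np.sum(-diff / h ** 2 * K[..., None], axis=0)) / M
    x += eta * phi

sample_mean = x.mean(axis=0)
sample_std = x.std()
```

The kernel-weighted score term pulls particles toward high density while the kernel-gradient term repels them from each other, so the deterministic particle set spreads out to cover the target.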
no code implementations • ICLR 2022 • Sho Okumoto, Taiji Suzuki
Although the approximation and estimation errors of neural networks suffer from the curse of dimensionality in the existing analyses for typical function spaces such as the Hölder and Besov spaces, we show that, by considering anisotropic smoothness, these errors can avoid the exponential dependency on the dimensionality and instead depend only on the smoothness of the target functions.
1 code implementation • 25 Aug 2021 • Hiroaki Mikami, Kenji Fukumizu, Shogo Murai, Shuji Suzuki, Yuta Kikuchi, Taiji Suzuki, Shin-ichi Maeda, Kohei Hayashi
Synthetic-to-real transfer learning is a framework in which a synthetically generated dataset is used to pre-train a model to improve its performance on real vision tasks.
no code implementations • 24 Aug 2021 • Kishan Wimalawarne, Taiji Suzuki
We investigate adaptive layer-wise graph convolution in deep GCN models.
no code implementations • 5 Aug 2021 • Chihiro Watanabe, Taiji Suzuki
However, it is limited to a two-mode reordering (i.e., the rows and columns are reordered separately) and it cannot be applied in the one-mode setting (i.e., the same node order is used for reordering both rows and columns), owing to the characteristics of its model architecture.
no code implementations • 11 Jun 2021 • Shunta Akiyama, Taiji Suzuki
Deep learning empirically achieves high performance in many applications, but its training dynamics has not been fully understood theoretically.
no code implementations • NeurIPS 2021 • Atsushi Nitanda, Denny Wu, Taiji Suzuki
An important application of the proposed method is the optimization of neural networks in the mean-field regime, which is theoretically attractive due to the presence of nonlinear feature learning, but for which a quantitative convergence rate can be challenging to obtain.
no code implementations • 26 Mar 2021 • Chihiro Watanabe, Taiji Suzuki
This denoised mean matrix can be used to visualize the global structure of the reordered observed matrix.
no code implementations • 23 Feb 2021 • Chihiro Watanabe, Taiji Suzuki
Biclustering is a method for detecting homogeneous submatrices in a given observed matrix, and it is an effective tool for relational data analysis.
no code implementations • 5 Feb 2021 • Tomoya Murata, Taiji Suzuki
Recently, local SGD has received much attention and has been extensively studied in the distributed learning community to overcome the communication bottleneck problem.
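A minimal sketch of local SGD with periodic averaging (a generic template, not this paper's algorithm: the least-squares problem, shard split, and all constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
K, n, d = 4, 400, 10                      # workers, total samples, dimension
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.05 * rng.normal(size=n)
shards = np.array_split(np.arange(n), K)  # each worker holds one shard

eta, H, rounds = 0.05, 10, 50             # local step size, local steps, comm rounds
theta = np.zeros(d)
for _ in range(rounds):
    local_models = []
    for s in shards:
        w = theta.copy()
        for _ in range(H):                # H local SGD steps, no communication
            i = rng.choice(s)
            w -= eta * (X[i] @ w - y[i]) * X[i]
        local_models.append(w)
    theta = np.mean(local_models, axis=0) # communication: average the local models

err = np.linalg.norm(theta - theta_true)
```

Communication happens only once per round instead of once per gradient step, which is the source of the communication savings.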
no code implementations • ICLR 2021 • Taiji Suzuki, Shunta Akiyama
Establishing a theoretical analysis that explains why deep learning can outperform shallow learning such as kernel methods is one of the biggest issues in the deep learning literature.
no code implementations • 23 Sep 2020 • Kazuma Tsuji, Taiji Suzuki
In this study, we focus on the adaptivity of deep learning; to this end, we treat the variable-exponent Besov space, whose smoothness depends on the input location $x$.
no code implementations • 19 Sep 2020 • Kengo Machida, Kuniaki Uto, Koichi Shinoda, Taiji Suzuki
To overcome this problem, we propose a method called minimum stable rank DARTS (MSR-DARTS), for finding a model with the best generalization error by replacing architecture optimization with the selection process using the minimum stable rank criterion.
Ranked #24 on Neural Architecture Search on CIFAR-10
no code implementations • 30 Jul 2020 • Akira Nakagawa, Keizo Kato, Taiji Suzuki
According to the Rate-distortion theory, the optimal transform coding is achieved by using an orthonormal transform with PCA basis where the transform space is isometric to the input.
no code implementations • NeurIPS 2020 • Taiji Suzuki
Existing frameworks such as mean field theory and neural tangent kernel theory for neural network optimization analysis typically require taking the limit of infinite width of the network to show its global convergence.
no code implementations • ICLR 2021 • Atsushi Nitanda, Taiji Suzuki
In this study, we show that the averaged stochastic gradient descent can achieve the minimax optimal convergence rate, with the global convergence guarantee, by exploiting the complexities of the target function and the RKHS associated with the NTK.
no code implementations • 19 Jun 2020 • Tomoya Murata, Taiji Suzuki
In this paper, we study the importance labeling problem, in which we are given many unlabeled data points, select a limited number of them to be labeled, and then execute a learning algorithm on the selected set.
no code implementations • ICLR 2021 • Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu
While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question.
1 code implementation • NeurIPS 2020 • Kenta Oono, Taiji Suzuki
By combining it with generalization gap bounds in terms of transductive Rademacher complexity, we derive a test error bound for a specific type of multi-scale GNNs that decreases with the number of node aggregations under some conditions.
no code implementations • 27 May 2020 • Chihiro Watanabe, Taiji Suzuki
In this case, it becomes crucial to consider the selection bias in the block structure; that is, the block structure is selected by the clustering algorithm from all possible cluster memberships based on some criterion.
no code implementations • ICLR 2020 • Jimmy Ba, Murat Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang
This paper investigates the generalization properties of two-layer neural networks in high dimensions, i.e., when the number of samples $n$, features $d$, and neurons $h$ tend to infinity at the same rate.
no code implementations • 4 Mar 2020 • Yusuke Hayashi, Taiji Suzuki
To address this challenge, we design a novel meta-regularization objective using {\it cyclical annealing schedule} and {\it maximum mean discrepancy} (MMD) criterion.
no code implementations • 29 Feb 2020 • Boris Muzellec, Kanji Sato, Mathurin Massias, Taiji Suzuki
In this work, we provide a convergence analysis of GLD and SGLD when the optimization space is an infinite dimensional Hilbert space.
no code implementations • 14 Jan 2020 • Jingling Li, Yanchao Sun, Jiahao Su, Taiji Suzuki, Furong Huang
Recently proposed complexity measures have provided insights to understanding the generalizability in neural networks from perspectives of PAC-Bayes, robustness, overparametrization, compression and so on.
no code implementations • 26 Dec 2019 • Laurent Dillard, Yosuke Shinya, Taiji Suzuki
We also show that our method outperforms an existing compression method studied in the DA setting by a large margin for high compression rates.
no code implementations • 13 Nov 2019 • Shingo Yashima, Atsushi Nitanda, Taiji Suzuki
To address this problem, sketching and stochastic gradient methods are the most commonly used techniques to derive efficient large-scale learning algorithms.
1 code implementation • 29 Oct 2019 • Atsushi Yaguchi, Taiji Suzuki, Shuhei Nitta, Yukinobu Sakata, Akiyuki Tanizawa
Compressing DNNs is important for real-world applications operating on resource-constrained devices.
no code implementations • NeurIPS 2021 • Taiji Suzuki, Atsushi Nitanda
The results show that deep learning has better dependence on the input dimensionality if the target function possesses anisotropic smoothness, and it achieves an adaptive rate for functions with spatially inhomogeneous smoothness.
no code implementations • Approximate Inference (AABI) Symposium 2019 • Jimmy Ba, Murat A. Erdogdu, Marzyeh Ghassemi, Taiji Suzuki, Shengyang Sun, Denny Wu, Tianzong Zhang
Particle-based inference algorithms are promising methods for efficiently generating samples from an intractable target distribution by iteratively updating a set of particles.
no code implementations • 25 Sep 2019 • Atsushi Yaguchi, Taiji Suzuki, Shuhei Nitta, Yukinobu Sakata, Akiyuki Tanizawa
Compressing deep neural networks (DNNs) is important for real-world applications operating on resource-constrained devices.
no code implementations • ICLR 2020 • Taiji Suzuki, Hiroshi Abe, Tomoaki Nishimura
However, the compression-based bound can be applied only to a compressed network, and it is not applicable to the non-compressed original network.
no code implementations • 9 Sep 2019 • Yosuke Shinya, Edgar Simo-Serra, Taiji Suzuki
Furthermore, we propose a method for automatically determining the widths (the numbers of channels) of object detectors based on the eigenspectrum.
no code implementations • 26 Jun 2019 • Kosuke Haruki, Taiji Suzuki, Yohei Hamakawa, Takeshi Toda, Ryuji Sakai, Masahiro Ozawa, Mitsuhiro Kimura
Large-batch stochastic gradient descent (SGD) is widely used for training in distributed deep learning because of its training-time efficiency. However, extremely large-batch SGD leads to poor generalization and easily converges to sharp minima, which prevents naive large-scale data-parallel SGD (DP-SGD) from converging to good minima.
no code implementations • 10 Jun 2019 • Chihiro Watanabe, Taiji Suzuki
Latent block models are used for probabilistic biclustering, which is shown to be an effective method for analyzing various relational data sets.
no code implementations • 29 May 2019 • Tomoya Murata, Taiji Suzuki
Several works have shown that the {\it sparsified} stochastic gradient descent (SGD) method with {\it error feedback} asymptotically achieves the same rate as (non-sparsified) parallel SGD.
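The error-feedback mechanism is compact enough to sketch (a generic top-$k$ version on a toy noiseless least-squares problem; all sizes and constants are illustrative assumptions, not this paper's setting):

```python
import numpy as np

def topk(v, k):
    """Keep the k largest-magnitude coordinates, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(0)
n, d, k = 200, 20, 4
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true                      # noiseless, so exact recovery is possible

eta, steps = 0.02, 3000
theta = np.zeros(d)
mem = np.zeros(d)                       # error-feedback memory
for _ in range(steps):
    i = rng.integers(n)
    g = (X[i] @ theta - y[i]) * X[i]    # stochastic gradient
    p = mem + eta * g                   # add back previously dropped coordinates
    comp = topk(p, k)                   # "transmit" only k of d coordinates
    mem = p - comp                      # remember what was dropped
    theta -= comp

err = np.linalg.norm(theta - theta_true)
```

The memory term ensures every gradient coordinate is eventually applied, which is why the compressed method can match the uncompressed rate asymptotically.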
1 code implementation • ICLR 2020 • Kenta Oono, Taiji Suzuki
We show that when the Erdős-Rényi graph is sufficiently dense and large, a broad range of GCNs on it suffers from the "information loss" in the limit of infinite layers with high probability.
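The "information loss" (over-smoothing) effect is easy to observe numerically. A sketch under simplifying assumptions (pure propagation with a symmetrically normalized adjacency, no weights or nonlinearity; all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 100, 0.3, 8
A = np.triu((rng.random((n, n)) < p).astype(float), 1)
A = A + A.T                                        # symmetric Erdős-Rényi graph
A_hat = A + np.eye(n)                              # add self-loops (GCN convention)
deg = A_hat.sum(axis=1)
P = A_hat / np.sqrt(deg[:, None] * deg[None, :])   # symmetrically normalized adjacency

H = rng.normal(size=(n, d))                        # random node features

def rank_one_ratio(H):
    s = np.linalg.svd(H, compute_uv=False)
    return s[1] / s[0]              # near 1: diverse features; near 0: collapsed

r0 = rank_one_ratio(H)
for _ in range(20):                 # 20 rounds of propagation
    H = P @ H
r20 = rank_one_ratio(H)
```

After repeated propagation the features collapse toward a rank-one matrix at a rate governed by the spectral gap, so node representations become indistinguishable.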
no code implementations • 23 May 2019 • Atsushi Nitanda, Geoffrey Chinot, Taiji Suzuki
Most existing studies, with few exceptions, have focused on regression problems with the squared loss function, and the importance of the positivity of the neural tangent kernel has been pointed out.
no code implementations • 22 May 2019 • Satoshi Hayakawa, Taiji Suzuki
Whereas existing theoretical studies of deep learning have been based mainly on mathematical theories of well-known function classes such as Hölder and Besov classes, we focus on function classes with discontinuity and sparsity, which are those naturally assumed in practice.
no code implementations • ICLR 2019 • Kenta Oono, Taiji Suzuki
We develop new approximation and statistical learning theories of convolutional neural networks (CNNs) via the ResNet-type structure where the channel size, filter size, and width are fixed.
no code implementations • 24 Mar 2019 • Kenta Oono, Taiji Suzuki
The key idea is that we can replicate the learning ability of fully-connected neural networks (FNNs) by tailored CNNs, as long as the FNNs have \textit{block-sparse} structures.
no code implementations • 19 Dec 2018 • Atsushi Yaguchi, Taiji Suzuki, Wataru Asano, Shuhei Nitta, Yukinobu Sakata, Akiyuki Tanizawa
In recent years, deep neural networks (DNNs) have been applied to various machine learning tasks, including image recognition, speech recognition, and machine translation.
no code implementations • ICLR 2019 • Taiji Suzuki
In addition to this, it is shown that deep learning can avoid the curse of dimensionality if the target function is in a mixed smooth Besov space.
no code implementations • NeurIPS 2018 • Tomoya Murata, Taiji Suzuki
We develop new stochastic gradient methods for efficiently solving sparse linear regression in a partial attribute observation setting, where learners are only allowed to observe a fixed number of actively chosen attributes per example at training and prediction times.
no code implementations • 26 Aug 2018 • Taiji Suzuki, Hiroshi Abe, Tomoya Murata, Shingo Horiuchi, Kotaro Ito, Tokuma Wachi, So Hirai, Masatoshi Yukishima, Tomoaki Nishimura
The concept of model compression is also important for analyzing the generalization error of deep learning, known as the compression-based error bound.
no code implementations • 14 Jun 2018 • Atsushi Nitanda, Taiji Suzuki
In this paper, we show an exponential convergence of the expected classification error in the final phase of the stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions.
no code implementations • 8 Mar 2018 • Heishiro Kanagawa, Hayato Kobayashi, Nobuyuki Shimizu, Yukihiro Tagami, Taiji Suzuki
The behavior of users in one service can serve as a clue for inferring their preferences and can be used to make recommendations in other services they have never used.
no code implementations • ICML 2018 • Atsushi Nitanda, Taiji Suzuki
Residual Networks (ResNets) have become state-of-the-art models in deep learning and several theoretical studies have been devoted to understanding why ResNet works so well.
no code implementations • 7 Jan 2018 • Atsushi Nitanda, Taiji Suzuki
In this paper, this phenomenon is explained from the functional gradient method perspective of the gradient layer.
no code implementations • 14 Dec 2017 • Atsushi Nitanda, Taiji Suzuki
The superior performance of ensemble methods with infinite models is well known.
no code implementations • 6 Nov 2017 • Masaaki Takada, Taiji Suzuki, Hironori Fujisawa
However, one of the biggest issues in sparse regularization is that its performance is quite sensitive to correlations between features.
no code implementations • 29 May 2017 • Taiji Suzuki
Our point of view is to deal with the ordinary finite dimensional deep neural network as a finite approximation of the infinite dimensional one.
1 code implementation • NeurIPS 2017 • Song Liu, Akiko Takeda, Taiji Suzuki, Kenji Fukumizu
Density ratio estimation is a vital tool in both the machine learning and statistics communities.
no code implementations • NeurIPS 2017 • Tomoya Murata, Taiji Suzuki
In this paper, we develop a new accelerated stochastic gradient method for efficiently solving the convex regularized empirical risk minimization problem in mini-batch settings.
no code implementations • 6 Jan 2017 • Song Liu, Kenji Fukumizu, Taiji Suzuki
Recent years have seen an increasing popularity of learning the sparse \emph{changes} in Markov Networks.
no code implementations • NeurIPS 2016 • Taiji Suzuki, Heishiro Kanagawa, Hayato Kobayashi, Nobuyuki Shimizu, Yukihiro Tagami
We investigate the statistical performance and computational efficiency of the alternating minimization procedure for nonparametric tensor learning.
no code implementations • 8 Mar 2016 • Tomoya Murata, Taiji Suzuki
We consider a composite convex minimization problem associated with regularized empirical risk minimization, which often arises in machine learning.
no code implementations • 2 Apr 2015 • Song Liu, Taiji Suzuki, Masashi Sugiyama, Kenji Fukumizu
We learn the structure of a Markov Network between two groups of random variables from joint observations.
no code implementations • 13 Aug 2014 • Taiji Suzuki
In this paper, we investigate the statistical convergence rate of a Bayesian low-rank tensor estimator.
no code implementations • 7 Jul 2014 • Ryota Tomioka, Taiji Suzuki
We show that the spectral norm of a random $n_1\times n_2\times \cdots \times n_K$ tensor (or higher-order array) scales as $O\left(\sqrt{(\sum_{k=1}^{K}n_k)\log(K)}\right)$ under some sub-Gaussian assumption on the entries.
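The claimed scaling can be probed numerically with a simple alternating (higher-order power iteration) estimate of the tensor spectral norm. A sketch with illustrative sizes; the estimator only finds a local maximizer, so it is a lower bound on the true norm:

```python
import numpy as np

rng = np.random.default_rng(0)

def tensor_spectral_norm(T, iters=100):
    """Estimate max over unit vectors of <T, u1 x u2 x u3> by alternating updates."""
    u = [rng.normal(size=m) for m in T.shape]
    u = [v / np.linalg.norm(v) for v in u]
    for _ in range(iters):
        v1 = np.einsum('ijk,j,k->i', T, u[1], u[2]); u[0] = v1 / np.linalg.norm(v1)
        v2 = np.einsum('ijk,i,k->j', T, u[0], u[2]); u[1] = v2 / np.linalg.norm(v2)
        v3 = np.einsum('ijk,i,j->k', T, u[0], u[1]); u[2] = v3 / np.linalg.norm(v3)
    return np.einsum('ijk,i,j,k->', T, *u)

ns = (20, 20, 20)
T = rng.normal(size=ns)                 # random Gaussian tensor
est = tensor_spectral_norm(T)
bound_scale = np.sqrt(sum(ns))          # the predicted O(sqrt(sum_k n_k)) scaling
```

For a Gaussian tensor the estimate lands within a small constant factor of $\sqrt{\sum_k n_k}$, consistent with the stated scaling (up to the mild $\log K$ factor).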
no code implementations • 2 Jul 2014 • Song Liu, Taiji Suzuki, Raissa Relator, Jun Sese, Masashi Sugiyama, Kenji Fukumizu
We study the problem of learning sparse structure changes between two Markov networks $P$ and $Q$.
no code implementations • 4 Nov 2013 • Taiji Suzuki
We propose a new stochastic dual coordinate ascent technique that can be applied to a wide range of regularized learning problems.
no code implementations • 25 Apr 2013 • Song Liu, John A. Quinn, Michael U. Gutmann, Taiji Suzuki, Masashi Sugiyama
We propose a new method for detecting changes in Markov network structure between two sets of samples.
no code implementations • NeurIPS 2013 • Ryota Tomioka, Taiji Suzuki
We discuss structured Schatten norms for tensor decomposition that includes two recently proposed norms ("overlapped" and "latent") for convex-optimization-based tensor decomposition, and connect tensor decomposition with wider literature on structured sparsity.
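As a sketch of the "overlapped" flavor of structured Schatten norms (written here, under the common convention, as the sum of nuclear norms of all mode unfoldings; an illustrative assumption, not a transcription of the paper's definitions):

```python
import numpy as np

def unfold(T, mode):
    """Mode-k unfolding: move mode k to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def overlapped_schatten_1(T):
    """Sum of nuclear (Schatten-1) norms over all mode unfoldings."""
    return sum(np.linalg.norm(unfold(T, m), 'nuc') for m in range(T.ndim))

rng = np.random.default_rng(0)
# rank-one tensor: every unfolding is rank one, so each nuclear norm
# equals the Frobenius norm of the tensor
u, v, w = rng.normal(size=5), rng.normal(size=6), rng.normal(size=7)
T_low = np.einsum('i,j,k->ijk', u, v, w)
T_rand = rng.normal(size=(5, 6, 7))

n_low = overlapped_schatten_1(T_low)
n_rand = overlapped_schatten_1(T_rand)
```

The norm is small exactly when every unfolding is simultaneously low-rank, which is what makes it a convex surrogate for multilinear rank in tensor decomposition.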
no code implementations • NeurIPS 2012 • Masashi Sugiyama, Takafumi Kanamori, Taiji Suzuki, Marthinus D. Plessis, Song Liu, Ichiro Takeuchi
A naive approach is a two-step procedure of first estimating two densities separately and then computing their difference.
no code implementations • 2 Mar 2012 • Taiji Suzuki, Masashi Sugiyama
If the ground truth is smooth, we show a faster convergence rate for the elastic-net regularization with less conditions than $\ell_1$-regularization; otherwise, a faster convergence rate for the $\ell_1$-regularization is shown.
no code implementations • NeurIPS 2011 • Makoto Yamada, Taiji Suzuki, Takafumi Kanamori, Hirotaka Hachiya, Masashi Sugiyama
Divergence estimators based on direct approximation of density-ratios without going through separate approximation of numerator and denominator densities have been successfully applied to machine learning tasks that involve distribution comparison such as outlier detection, transfer learning, and two-sample homogeneity test.
no code implementations • NeurIPS 2011 • Taiji Suzuki
Finally, we show that, when the complexities of candidate reproducing kernel Hilbert spaces are inhomogeneous, dense-type regularization achieves a better learning rate than sparse $\ell_1$ regularization.
no code implementations • NeurIPS 2011 • Ryota Tomioka, Taiji Suzuki, Kohei Hayashi, Hisashi Kashima
We analyze the statistical performance of a recently proposed convex tensor decomposition algorithm.
1 code implementation • 15 Dec 2009 • Takafumi Kanamori, Taiji Suzuki, Masashi Sugiyama
We show that the kernel least-squares method has a smaller condition number than a version of kernel mean matching and other M-estimators, implying that the kernel least-squares method has preferable numerical properties.
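A minimal sketch of kernel least-squares density-ratio fitting (a generic uLSIF-style estimator written from the standard least-squares formulation; the kernel-center choice, bandwidth, and regularization are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
xp = rng.normal(0.5, 1.0, 500)         # samples from p = N(0.5, 1)
xq = rng.normal(0.0, 1.0, 500)         # samples from q = N(0, 1)
# the true ratio p/q is exp(0.5*x - 0.125): > 1 for large x, < 1 for small x

centers = xq[:50]                       # Gaussian kernel centers (hypothetical choice)
h, lam = 1.0, 0.01                      # bandwidth and ridge regularization

def K(x, c):
    return np.exp(-(x[:, None] - c[None, :]) ** 2 / (2 * h ** 2))

# least-squares fit of r(x) = sum_l alpha_l k(x, c_l):
# minimize (1/2) E_q[r^2] - E_p[r]  =>  (H + lam I) alpha = hvec
Hmat = K(xq, centers).T @ K(xq, centers) / len(xq)   # ~ E_q[k(x) k(x)^T]
hvec = K(xp, centers).mean(axis=0)                    # ~ E_p[k(x)]
alpha = np.linalg.solve(Hmat + lam * np.eye(len(centers)), hvec)

r_hat = lambda x: K(x, centers) @ alpha
ratio_pos = r_hat(np.array([1.0]))[0]   # analytically, p/q at x=1 is about 1.45
ratio_neg = r_hat(np.array([-1.0]))[0]  # analytically, p/q at x=-1 is about 0.54
mean_under_q = r_hat(xq).mean()         # E_q[p/q] = 1, so this should be near 1
```

The solve involves a single well-conditioned ridge system, which reflects the numerical stability advantage the abstract attributes to the least-squares formulation.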