Search Results for author: Wen-mei Hwu

Found 48 papers, 23 papers with code

xER: An Explainable Model for Entity Resolution using an Efficient Solution for the Clique Partitioning Problem

no code implementations • NAACL (TrustNLP) 2021 • Samhita Vadrevu, Rakesh Nagi, JinJun Xiong, Wen-mei Hwu

In this paper, we use the Clique Partitioning Problem (CPP), which is an Integer Program (IP), to formulate ER as a graph partitioning problem and then highlight the explainable nature of this method.

Entity Resolution graph partitioning

Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level

1 code implementation • 7 Mar 2024 • Ali Hassani, Wen-mei Hwu, Humphrey Shi

We observe that our fused kernels successfully circumvent some of the unavoidable inefficiencies in unfused implementations.

Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses

1 code implementation • 28 Jun 2023 • Jeongmin Brian Park, Vikram Sharma Mailthody, Zaid Qureshi, Wen-mei Hwu

To address these issues, we propose the GPU Initiated Direct Storage Access (GIDS) dataloader, to enable GPU-oriented GNN training for large-scale graphs while efficiently utilizing all hardware resources, such as CPU memory, storage, and GPU memory with a hybrid data placement strategy.

Graph Sampling

IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research

1 code implementation • 27 Feb 2023 • Arpandeep Khatua, Vikram Sharma Mailthody, Bhagyashree Taleka, Tengfei Ma, Xiang Song, Wen-mei Hwu

Most existing public datasets for GNNs are relatively small, which limits the ability of GNNs to generalize to unseen data.

Node Classification

Hector: An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures

no code implementations • 16 Jan 2023 • Kun Wu, Mert Hidayetoğlu, Xiang Song, Sitao Huang, Da Zheng, Israt Nisa, Wen-mei Hwu

Relational graph neural networks (RGNNs) are graph neural networks with dedicated structures for modeling the different types of nodes and edges in heterogeneous graphs.

8k C++ code +1

Submission-Aware Reviewer Profiling for Reviewer Recommender System

no code implementations • 8 Nov 2022 • Omer Anjum, Alok Kamatar, Toby Liang, JinJun Xiong, Wen-mei Hwu

We propose an approach that learns from each abstract published by a potential reviewer the topics studied and the explicit context in which the reviewer studied the topics.

Recommendation Systems

Can Language Models Be Specific? How?

1 code implementation • 11 Oct 2022 • Jie Huang, Kevin Chen-Chuan Chang, JinJun Xiong, Wen-mei Hwu

We hope this work can bring to awareness the notion of specificity of language models and encourage the research community to further explore this important but understudied problem.

Language Modelling Specificity

DEER: Descriptive Knowledge Graph for Explaining Entity Relationships

1 code implementation • 21 May 2022 • Jie Huang, Kerui Zhu, Kevin Chen-Chuan Chang, JinJun Xiong, Wen-mei Hwu

Experiments demonstrate that our system can extract and generate high-quality relation descriptions for explaining entity relationships.

BIG-bench Machine Learning Descriptive +4

Understanding Jargon: Combining Extraction and Generation for Definition Modeling

1 code implementation • 14 Nov 2021 • Jie Huang, Hanyin Shao, Kevin Chen-Chuan Chang, JinJun Xiong, Wen-mei Hwu

From the composition of this phrase, machines may guess twin prime is a certain kind of prime, but it is still difficult to deduce exactly what twin stands for without additional knowledge.

Text Generation

Graph Neural Network Training with Data Tiering

no code implementations • 10 Nov 2021 • Seung Won Min, Kun Wu, Mert Hidayetoğlu, JinJun Xiong, Xiang Song, Wen-mei Hwu

With our data tiering method, we additionally provide a new data placement and access strategy to further minimize the CPU-GPU communication overhead.

Fraud Detection

MLHarness: A Scalable Benchmarking System for MLCommons

no code implementations • 9 Nov 2021 • Yen-Hsiang Chang, Jianhao Pu, Wen-mei Hwu, JinJun Xiong

With society's growing adoption of machine learning (ML) and deep learning (DL) for various intelligent solutions, it becomes increasingly imperative to standardize a common set of measures for ML/DL models with large-scale open datasets under common development practices and resources so that people can benchmark and compare model quality and performance on common ground.

Benchmarking

Open Relation Modeling: Learning to Define Relations between Entities

1 code implementation • Findings (ACL) 2022 • Jie Huang, Kevin Chen-Chuan Chang, JinJun Xiong, Wen-mei Hwu

Relations between entities can be represented by different instances, e.g., a sentence containing both entities or a fact in a Knowledge Graph (KG).

Open Relation Modeling Relation +1

Measuring Fine-Grained Domain Relevance of Terms: A Hierarchical Core-Fringe Approach

1 code implementation • ACL 2021 • Jie Huang, Kevin Chen-Chuan Chang, JinJun Xiong, Wen-mei Hwu

To support a fine-grained domain without relying on a matching corpus for supervision, we develop hierarchical core-fringe learning, which learns core and fringe terms jointly in a semi-supervised manner contextualized in the hierarchy of the domain.

Pseudo-IoU: Improving Label Assignment in Anchor-Free Object Detection

1 code implementation • 29 Apr 2021 • Jiachen Li, Bowen Cheng, Rogerio Feris, JinJun Xiong, Thomas S. Huang, Wen-mei Hwu, Humphrey Shi

Current anchor-free object detectors are quite simple and effective yet lack accurate label assignment methods, which limits their potential in competing with classic anchor-based models that are supported by well-designed assignment methods based on the Intersection-over-Union (IoU) metric.

Object object-detection +1

Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture

1 code implementation • 4 Mar 2021 • Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, JinJun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu

In this work, we propose a novel GPU-oriented data communication approach for GCN training, where GPU threads directly access sparse features in host memory through zero-copy accesses without much CPU help.

Recommendation Systems

PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses

1 code implementation • 20 Jan 2021 • Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, JinJun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu

While this process accounts for a significant portion of the training time, we find existing GNN implementations using popular deep neural network (DNN) libraries such as PyTorch are limited to a CPU-centric approach for the entire data preparation step.

Improving Random-Sampling Neural Architecture Search by Evolving the Proxy Search Space

1 code implementation • 1 Jan 2021 • Yuhong Li, Cong Hao, Xiaofan Zhang, JinJun Xiong, Wen-mei Hwu, Deming Chen

This raises the question of whether we can find an effective proxy search space (PS) that is only a small subset of GS to dramatically improve RandomNAS’s search efficiency while at the same time keeping a good correlation for the top-performing architectures.

Image Classification Neural Architecture Search

TEMPI: An Interposed MPI Library with a Canonical Representation of CUDA-aware Datatypes

1 code implementation • 28 Dec 2020 • Carl Pearson, Kun Wu, I-Hsin Chung, JinJun Xiong, Wen-mei Hwu

MPI derived datatypes are an abstraction that simplifies handling of non-contiguous data in MPI applications.

Distributed, Parallel, and Cluster Computing

Interpretable Visual Reasoning via Induced Symbolic Space

1 code implementation • ICCV 2021 • Zhonghao Wang, Kai Wang, Mo Yu, JinJun Xiong, Wen-mei Hwu, Mark Hasegawa-Johnson, Humphrey Shi

Finally, we achieve a higher level of interpretability by imposing OCCAM on the objects represented in the induced symbolic concept space.

Ranked #3 on Visual Question Answering (VQA) on CLEVR

Visual Question Answering (VQA) Visual Reasoning

Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices

no code implementations • 14 Oct 2020 • Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, JinJun Xiong, Wen-mei Hwu, Deming Chen

High quality AI solutions require joint optimization of AI algorithms, such as deep neural networks (DNNs), and their hardware accelerators.

Exploring Semantic Capacity of Terms

1 code implementation • EMNLP 2020 • Jie Huang, Zilong Wang, Kevin Chen-Chuan Chang, Wen-mei Hwu, JinJun Xiong

We introduce and study semantic capacity of terms.

regression

At-Scale Sparse Deep Neural Network Inference with Efficient GPU Implementation

1 code implementation • 28 Jul 2020 • Mert Hidayetoglu, Carl Pearson, Vikram Sharma Mailthody, Eiman Ebrahimi, JinJun Xiong, Rakesh Nagi, Wen-mei Hwu

This paper presents GPU performance optimization and scaling results for inference models of the Sparse Deep Neural Network Challenge 2020.

EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions

no code implementations • 6 May 2020 • Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, JinJun Xiong, Wen-mei Hwu, Deming Chen

We formulate the co-search problem by fusing DNN search variables and hardware implementation variables into one solution space, and maximize both algorithm accuracy and hardware implementation quality.

Neural Architecture Search

Alleviating Semantic-level Shift: A Semi-supervised Domain Adaptation Method for Semantic Segmentation

no code implementations • 2 Apr 2020 • Zhonghao Wang, Yunchao Wei, Rogerio Feris, JinJun Xiong, Wen-mei Hwu, Thomas S. Huang, Humphrey Shi

A key challenge of this task is how to alleviate the data distribution discrepancy between the source and target domains, i.e., reducing domain shift.

Domain Adaptation Semantic Segmentation +1

Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation

1 code implementation • CVPR 2020 • Zhonghao Wang, Mo Yu, Yunchao Wei, Rogerio Feris, JinJun Xiong, Wen-mei Hwu, Thomas S. Huang, Humphrey Shi

We consider the problem of unsupervised domain adaptation for semantic segmentation by easing the domain shift between the source domain (synthetic data) and the target domain (real data) in this work.

Ranked #8 on Semantic Segmentation on DensePASS

Semantic Segmentation Unsupervised Domain Adaptation

DLSpec: A Deep Learning Task Exchange Specification

no code implementations • 26 Feb 2020 • Abdul Dakkak, Cheng Li, JinJun Xiong, Wen-mei Hwu

Deep Learning (DL) innovations are being introduced at a rapid pace.

MLModelScope: A Distributed Platform for Model Evaluation and Benchmarking at Scale

no code implementations • 19 Feb 2020 • Abdul Dakkak, Cheng Li, JinJun Xiong, Wen-mei Hwu

Machine Learning (ML) and Deep Learning (DL) innovations are being introduced at such a rapid pace that researchers are hard-pressed to analyze and study them.

Benchmarking

The Design and Implementation of a Scalable DL Benchmarking Platform

no code implementations • 19 Nov 2019 • Cheng Li, Abdul Dakkak, JinJun Xiong, Wen-mei Hwu

MLModelScope defines abstractions for frameworks and supports a broad range of DL models and evaluation scenarios.

Benchmarking

DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs (Extended)

no code implementations • 18 Nov 2019 • Cheng Li, Abdul Dakkak, JinJun Xiong, Wen-mei Hwu

We show that DLBricks provides an accurate performance estimate for the DL models and reduces the benchmarking time across systems (e.g., within $95\%$ accuracy and up to $4.4\times$ benchmarking time speedup on Amazon EC2 c5.xlarge).

Benchmarking Image Classification +3

NAIS: Neural Architecture and Implementation Search and its Applications in Autonomous Driving

no code implementations • 18 Nov 2019 • Cong Hao, Yao Chen, Xinheng Liu, Atif Sarwari, Daryl Sew, Ashutosh Dhar, Bryan Wu, Dongdong Fu, JinJun Xiong, Wen-mei Hwu, Junli Gu, Deming Chen

The rapidly growing demands for powerful AI algorithms in many application domains have motivated massive investment in both high-quality deep neural network (DNN) models and high-efficiency implementations.

Autonomous Driving

Benanza: Automatic $μ$Benchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs

no code implementations • 16 Nov 2019 • Cheng Li, Abdul Dakkak, JinJun Xiong, Wen-mei Hwu

An important avenue for such improvement is to profile the execution of these models and characterize their performance to identify possible optimization opportunities.

Benchmarking

MLModelScope: A Distributed Platform for ML Model Evaluation and Benchmarking at Scale

no code implementations • 25 Sep 2019 • Cheng Li, Abdul Dakkak, JinJun Xiong, Wen-mei Hwu

Machine Learning (ML) and Deep Learning (DL) innovations are being introduced at such a rapid pace that researchers are hard-pressed to analyze and study them.

Benchmarking

PaRe: A Paper-Reviewer Matching Approach Using a Common Topic Space

no code implementations • IJCNLP 2019 • Omer Anjum, Hongyu Gong, Suma Bhat, Wen-mei Hwu, JinJun Xiong

Finding the right reviewers to assess the quality of conference submissions is a time-consuming process for conference organizers.

Topic Models

SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems

2 code implementations • 20 Sep 2019 • Xiaofan Zhang, Haoming Lu, Cong Hao, Jiachen Li, Bowen Cheng, Yuhong Li, Kyle Rupnow, JinJun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen

Object detection and tracking are challenging tasks for resource-constrained embedded systems.

Efficient Neural Network Object +3

SPGNet: Semantic Prediction Guidance for Scene Parsing

no code implementations • ICCV 2019 • Bowen Cheng, Liang-Chieh Chen, Yunchao Wei, Yukun Zhu, Zilong Huang, JinJun Xiong, Thomas Huang, Wen-mei Hwu, Honghui Shi

The multi-scale context module refers to the operations to aggregate feature responses from a large spatial extent, while the single-stage encoder-decoder structure encodes the high-level semantic information in the encoder path and recovers the boundary information in the decoder path.

Decoder Pose Estimation +3

XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs

no code implementations • 19 Aug 2019 • Cheng Li, Abdul Dakkak, JinJun Xiong, Wei Wei, Lingjie Xu, Wen-mei Hwu

Such an endeavor is challenging as the characteristics of an ML model depend on the interplay between the model, framework, system libraries, and the hardware (or the HW/SW stack).

BIG-bench Machine Learning

SkyNet: A Champion Model for DAC-SDC on Low Power Object Detection

1 code implementation • 25 Jun 2019 • Xiaofan Zhang, Cong Hao, Haoming Lu, Jiachen Li, Yuhong Li, Yuchen Fan, Kyle Rupnow, JinJun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen

Developing artificial intelligence (AI) at the edge is always challenging, since edge devices have limited computation capability and memory resources but need to meet demanding requirements, such as real-time processing, high throughput performance, and high inference accuracy.

object-detection Object Detection

A Retrospective Recount of Computer Architecture Research with a Data-Driven Study of Over Four Decades of ISCA Publications

no code implementations • 22 Jun 2019 • Omer Anjum, Wen-mei Hwu, JinJun Xiong

Recently we decided to conduct a more thorough study based on all past papers of the International Symposium on Computer Architecture (ISCA) from 1973 to 2018, which resulted in this article.

document understanding Natural Language Understanding

A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices

2 code implementations • 20 May 2019 • Xiaofan Zhang, Cong Hao, Yuhong Li, Yao Chen, JinJun Xiong, Wen-mei Hwu, Deming Chen

Developing deep learning models for resource-constrained Internet-of-Things (IoT) devices is challenging, as it is difficult to achieve both good quality of results (QoR), such as DNN model inference accuracy, and quality of service (QoS), such as inference latency, throughput, and power consumption.

object-detection Object Detection

Challenges and Pitfalls of Machine Learning Evaluation and Benchmarking

no code implementations • 29 Apr 2019 • Cheng Li, Abdul Dakkak, JinJun Xiong, Wen-mei Hwu

An increasingly complex and diverse collection of Machine Learning (ML) models as well as hardware/software stacks, collectively referred to as "ML artifacts", are being proposed - leading to a diverse landscape of ML.

Benchmarking BIG-bench Machine Learning

FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge

2 code implementations • 9 Apr 2019 • Cong Hao, Xiaofan Zhang, Yuhong Li, Sitao Huang, JinJun Xiong, Kyle Rupnow, Wen-mei Hwu, Deming Chen

While embedded FPGAs are attractive platforms for DNN acceleration on edge-devices due to their low latency and high energy efficiency, the scarcity of resources of edge-scale FPGA devices also makes it challenging for DNN deployment.

C++ code object-detection +1

Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus

no code implementations • NAACL 2019 • Hongyu Gong, Suma Bhat, Lingfei Wu, JinJun Xiong, Wen-mei Hwu

Our generator employs an attention-based encoder-decoder to transfer a sentence from the source style to the target style.

Decoder reinforcement-learning +4

PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference

no code implementations • 29 Jan 2019 • Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Geoffrey Ndu, Martin Foltin, R. Stanley Williams, Paolo Faraboschi, Wen-mei Hwu, John Paul Strachan, Kaushik Roy, Dejan S Milojicic

We also present the PUMA compiler which translates high-level code to PUMA ISA.

Emerging Technologies Hardware Architecture

TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments

no code implementations • 24 Nov 2018 • Abdul Dakkak, Cheng Li, Simon Garcia de Gonzalo, JinJun Xiong, Wen-mei Hwu

Deep neural networks (DNNs) have become core computation components within low latency Function as a Service (FaaS) prediction pipelines, including image recognition, object detection, natural language processing, speech synthesis, and personalized recommendation pipelines.

Distributed, Parallel, and Cluster Computing

Frustrated with Replicating Claims of a Shared Model? A Solution

no code implementations • 24 Nov 2018 • Abdul Dakkak, Cheng Li, JinJun Xiong, Wen-mei Hwu

Machine Learning (ML) and Deep Learning (DL) innovations are being introduced at such a rapid pace that model owners and evaluators are hard-pressed to analyze and study them.

A Simple Non-i.i.d. Sampling Approach for Efficient Training and Better Generalization

no code implementations • 23 Nov 2018 • Bowen Cheng, Yunchao Wei, Jiahui Yu, Shiyu Chang, JinJun Xiong, Wen-mei Hwu, Thomas S. Huang, Humphrey Shi

While training on samples drawn from an independent and identical distribution has been the de facto paradigm for optimizing image classification networks, humans learn new concepts progressively, in an easy-to-hard manner and on selected examples.

General Classification Image Classification +6