no code implementations • NAACL (TrustNLP) 2021 • Samhita Vadrevu, Rakesh Nagi, JinJun Xiong, Wen-mei Hwu
In this paper, we use the Clique Partitioning Problem (CPP), which is an Integer Program (IP), to formulate ER as a graph partitioning problem and then highlight the explainable nature of this method.
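As a rough illustration of the clique partitioning idea, the sketch below groups records so that the total pairwise similarity inside groups is maximized. It assumes the textbook CPP objective and enumerates partitions by brute force instead of solving an integer program. The record names and weights are made up for the example and are not taken from the paper.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List

Weights = Dict[FrozenSet[str], float]


def partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every partition of `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Place `first` into each existing group in turn...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or give it a group of its own.
        yield [[first]] + part


def score(part: List[List[str]], w: Weights) -> float:
    """Sum of pair weights over all pairs that share a group."""
    return sum(w[frozenset(p)] for g in part for p in combinations(g, 2))


def best_partition(items: List[str], w: Weights) -> List[List[str]]:
    """Brute-force stand-in for the IP solver; only viable for tiny inputs."""
    return max(partitions(items), key=lambda part: score(part, w))


# Hypothetical similarity scores: positive means "likely the same entity".
weights: Weights = {
    frozenset(("a", "b")): 2.0,
    frozenset(("a", "c")): 1.0,
    frozenset(("b", "c")): 1.5,
    frozenset(("a", "d")): -1.0,
    frozenset(("b", "d")): -2.0,
    frozenset(("c", "d")): -0.5,
}

result = best_partition(["a", "b", "c", "d"], weights)
```

Each group in the result can be justified by listing the pair weights it contains, which is one way to read the explainability claim.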
1 code implementation • 7 Mar 2024 • Ali Hassani, Wen-mei Hwu, Humphrey Shi
We observe that our fused kernels successfully circumvent some of the unavoidable inefficiencies in unfused implementations.
1 code implementation • 28 Jun 2023 • Jeongmin Brian Park, Vikram Sharma Mailthody, Zaid Qureshi, Wen-mei Hwu
To address these issues, we propose the GPU Initiated Direct Storage Access (GIDS) dataloader, to enable GPU-oriented GNN training for large-scale graphs while efficiently utilizing all hardware resources, such as CPU memory, storage, and GPU memory with a hybrid data placement strategy.
1 code implementation • 27 Feb 2023 • Arpandeep Khatua, Vikram Sharma Mailthody, Bhagyashree Taleka, Tengfei Ma, Xiang Song, Wen-mei Hwu
Most existing public datasets for GNNs are relatively small, which limits the ability of GNNs to generalize to unseen data.
no code implementations • 16 Jan 2023 • Kun Wu, Mert Hidayetoğlu, Xiang Song, Sitao Huang, Da Zheng, Israt Nisa, Wen-mei Hwu
Relational graph neural networks (RGNNs) are graph neural networks with dedicated structures for modeling the different types of nodes and edges in heterogeneous graphs.
no code implementations • 8 Nov 2022 • Omer Anjum, Alok Kamatar, Toby Liang, JinJun Xiong, Wen-mei Hwu
We propose an approach that learns from each abstract published by a potential reviewer the topics studied and the explicit context in which the reviewer studied the topics.
1 code implementation • 11 Oct 2022 • Jie Huang, Kevin Chen-Chuan Chang, JinJun Xiong, Wen-mei Hwu
We hope this work can bring to awareness the notion of specificity of language models and encourage the research community to further explore this important but understudied problem.
1 code implementation • 21 May 2022 • Jie Huang, Kerui Zhu, Kevin Chen-Chuan Chang, JinJun Xiong, Wen-mei Hwu
Experiments demonstrate that our system can extract and generate high-quality relation descriptions for explaining entity relationships.
1 code implementation • 14 Nov 2021 • Jie Huang, Hanyin Shao, Kevin Chen-Chuan Chang, JinJun Xiong, Wen-mei Hwu
From the composition of this phrase, machines may guess twin prime is a certain kind of prime, but it is still difficult to deduce exactly what twin stands for without additional knowledge.
no code implementations • 10 Nov 2021 • Seung Won Min, Kun Wu, Mert Hidayetoğlu, JinJun Xiong, Xiang Song, Wen-mei Hwu
With our data tiering method, we additionally provide a new data placement and access strategy to further minimize the CPU-GPU communication overhead.
no code implementations • 9 Nov 2021 • Yen-Hsiang Chang, Jianhao Pu, Wen-mei Hwu, JinJun Xiong
With society's growing adoption of machine learning (ML) and deep learning (DL) for various intelligent solutions, it becomes increasingly imperative to standardize a common set of measures for ML/DL models with large-scale open datasets under common development practices and resources so that people can benchmark and compare model quality and performance on common ground.
1 code implementation • Findings (ACL) 2022 • Jie Huang, Kevin Chen-Chuan Chang, JinJun Xiong, Wen-mei Hwu
Relations between entities can be represented by different instances, e.g., a sentence containing both entities or a fact in a Knowledge Graph (KG).
1 code implementation • ACL 2021 • Jie Huang, Kevin Chen-Chuan Chang, JinJun Xiong, Wen-mei Hwu
To support a fine-grained domain without relying on a matching corpus for supervision, we develop hierarchical core-fringe learning, which learns core and fringe terms jointly in a semi-supervised manner contextualized in the hierarchy of the domain.
1 code implementation • 29 Apr 2021 • Jiachen Li, Bowen Cheng, Rogerio Feris, JinJun Xiong, Thomas S. Huang, Wen-mei Hwu, Humphrey Shi
Current anchor-free object detectors are quite simple and effective yet lack accurate label assignment methods, which limits their potential in competing with classic anchor-based models that are supported by well-designed assignment methods based on the Intersection-over-Union (IoU) metric.
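For readers unfamiliar with the IoU metric mentioned here, a minimal sketch follows. It assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`. The function name and box convention are choices made for this example, not something specified by the paper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs get an IoU of 0 instead of dividing by zero.
    return inter / union if union > 0 else 0.0
```

Anchor-based detectors typically label an anchor as positive when its IoU with a ground-truth box exceeds a chosen threshold.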
1 code implementation • 4 Mar 2021 • Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, JinJun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu
In this work, we propose a novel GPU-oriented data communication approach for GCN training, where GPU threads directly access sparse features in host memory through zero-copy accesses without much CPU help.
1 code implementation • 20 Jan 2021 • Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, JinJun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu
While this process accounts for a significant portion of the training time, we find existing GNN implementations using popular deep neural network (DNN) libraries such as PyTorch are limited to a CPU-centric approach for the entire data preparation step.
1 code implementation • 1 Jan 2021 • Yuhong Li, Cong Hao, Xiaofan Zhang, JinJun Xiong, Wen-mei Hwu, Deming Chen
This raises the question of whether we can find an effective proxy search space (PS) that is only a small subset of GS to dramatically improve RandomNAS’s search efficiency while at the same time keeping a good correlation for the top-performing architectures.
1 code implementation • 28 Dec 2020 • Carl Pearson, Kun Wu, I-Hsin Chung, JinJun Xiong, Wen-mei Hwu
MPI derived datatypes are an abstraction that simplifies handling of non-contiguous data in MPI applications.
Distributed, Parallel, and Cluster Computing
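To make "non-contiguous data" concrete, the sketch below mimics in plain Python the layout that a vector-style derived datatype describes: `count` blocks of `blocklength` elements whose starts are `stride` elements apart. It does not call MPI and does not reflect the paper's implementation. The helper names are hypothetical.

```python
from typing import List, Sequence


def vector_offsets(count: int, blocklength: int, stride: int) -> List[int]:
    """Element offsets covered by `count` blocks of `blocklength` elements,
    with consecutive block starts `stride` elements apart."""
    return [b * stride + i for b in range(count) for i in range(blocklength)]


def pack(buffer: Sequence[int], offsets: Sequence[int]) -> List[int]:
    """Gather the selected elements into a contiguous list."""
    return [buffer[o] for o in offsets]


# A 4x4 row-major matrix stored as a flat buffer holding 0..15.
matrix = list(range(16))

# First column: 4 blocks of 1 element, one row (4 elements) apart.
first_column = pack(matrix, vector_offsets(count=4, blocklength=1, stride=4))

# Top-left 2x2 tile: 2 blocks of 2 elements, one row apart.
tile = pack(matrix, vector_offsets(count=2, blocklength=2, stride=4))
```

A derived datatype lets the application state this layout once and hand it to the library, instead of writing the gather loop by hand.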
1 code implementation • ICCV 2021 • Zhonghao Wang, Kai Wang, Mo Yu, JinJun Xiong, Wen-mei Hwu, Mark Hasegawa-Johnson, Humphrey Shi
Finally, we achieve a higher level of interpretability by imposing OCCAM on the objects represented in the induced symbolic concept space.
Ranked #3 on Visual Question Answering (VQA) on CLEVR
no code implementations • 14 Oct 2020 • Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, JinJun Xiong, Wen-mei Hwu, Deming Chen
High-quality AI solutions require joint optimization of AI algorithms, such as deep neural networks (DNNs), and their hardware accelerators.
1 code implementation • EMNLP 2020 • Jie Huang, Zilong Wang, Kevin Chen-Chuan Chang, Wen-mei Hwu, JinJun Xiong
We introduce and study semantic capacity of terms.
1 code implementation • 28 Jul 2020 • Mert Hidayetoglu, Carl Pearson, Vikram Sharma Mailthody, Eiman Ebrahimi, JinJun Xiong, Rakesh Nagi, Wen-mei Hwu
This paper presents GPU performance optimization and scaling results for inference models of the Sparse Deep Neural Network Challenge 2020.
no code implementations • 6 May 2020 • Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, JinJun Xiong, Wen-mei Hwu, Deming Chen
We formulate the co-search problem by fusing DNN search variables and hardware implementation variables into one solution space, and maximize both algorithm accuracy and hardware implementation quality.
no code implementations • 2 Apr 2020 • Zhonghao Wang, Yunchao Wei, Rogerio Feris, JinJun Xiong, Wen-mei Hwu, Thomas S. Huang, Humphrey Shi
A key challenge of this task is how to alleviate the data distribution discrepancy between the source and target domains, i.e., reducing domain shift.
1 code implementation • CVPR 2020 • Zhonghao Wang, Mo Yu, Yunchao Wei, Rogerio Feris, JinJun Xiong, Wen-mei Hwu, Thomas S. Huang, Humphrey Shi
We consider the problem of unsupervised domain adaptation for semantic segmentation by easing the domain shift between the source domain (synthetic data) and the target domain (real data) in this work.
Ranked #8 on Semantic Segmentation on DensePASS
no code implementations • 26 Feb 2020 • Abdul Dakkak, Cheng Li, JinJun Xiong, Wen-mei Hwu
Deep Learning (DL) innovations are being introduced at a rapid pace.
no code implementations • 19 Feb 2020 • Abdul Dakkak, Cheng Li, JinJun Xiong, Wen-mei Hwu
Machine Learning (ML) and Deep Learning (DL) innovations are being introduced at such a rapid pace that researchers are hard-pressed to analyze and study them.
no code implementations • 19 Nov 2019 • Cheng Li, Abdul Dakkak, JinJun Xiong, Wen-mei Hwu
MLModelScope defines abstractions for frameworks and supports a broad range of DL models and evaluation scenarios.
no code implementations • 18 Nov 2019 • Cheng Li, Abdul Dakkak, JinJun Xiong, Wen-mei Hwu
We show that DLBricks provides an accurate performance estimate for the DL models and reduces the benchmarking time across systems (e.g., within $95\%$ accuracy and up to $4.4\times$ benchmarking time speedup on Amazon EC2 c5.xlarge).
no code implementations • 18 Nov 2019 • Cong Hao, Yao Chen, Xinheng Liu, Atif Sarwari, Daryl Sew, Ashutosh Dhar, Bryan Wu, Dongdong Fu, JinJun Xiong, Wen-mei Hwu, Junli Gu, Deming Chen
The rapidly growing demands for powerful AI algorithms in many application domains have motivated massive investment in both high-quality deep neural network (DNN) models and high-efficiency implementations.
no code implementations • 16 Nov 2019 • Cheng Li, Abdul Dakkak, JinJun Xiong, Wen-mei Hwu
An important avenue for such improvement is to profile the execution of these models and characterize their performance to identify possible optimization opportunities.
no code implementations • 25 Sep 2019 • Cheng Li, Abdul Dakkak, JinJun Xiong, Wen-mei Hwu
Machine Learning (ML) and Deep Learning (DL) innovations are being introduced at such a rapid pace that researchers are hard-pressed to analyze and study them.
no code implementations • IJCNLP 2019 • Omer Anjum, Hongyu Gong, Suma Bhat, Wen-mei Hwu, JinJun Xiong
Finding the right reviewers to assess the quality of conference submissions is a time-consuming process for conference organizers.
2 code implementations • 20 Sep 2019 • Xiaofan Zhang, Haoming Lu, Cong Hao, Jiachen Li, Bowen Cheng, Yuhong Li, Kyle Rupnow, JinJun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen
Object detection and tracking are challenging tasks for resource-constrained embedded systems.
no code implementations • ICCV 2019 • Bowen Cheng, Liang-Chieh Chen, Yunchao Wei, Yukun Zhu, Zilong Huang, JinJun Xiong, Thomas Huang, Wen-mei Hwu, Honghui Shi
The multi-scale context module refers to the operations to aggregate feature responses from a large spatial extent, while the single-stage encoder-decoder structure encodes the high-level semantic information in the encoder path and recovers the boundary information in the decoder path.
no code implementations • 19 Aug 2019 • Cheng Li, Abdul Dakkak, JinJun Xiong, Wei Wei, Lingjie Xu, Wen-mei Hwu
Such an endeavor is challenging as the characteristics of an ML model depend on the interplay between the model, framework, system libraries, and the hardware (or the HW/SW stack).
1 code implementation • 25 Jun 2019 • Xiaofan Zhang, Cong Hao, Haoming Lu, Jiachen Li, Yuhong Li, Yuchen Fan, Kyle Rupnow, JinJun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen
Developing artificial intelligence (AI) at the edge is always challenging, since edge devices have limited computation capability and memory resources but need to meet demanding requirements, such as real-time processing, high throughput performance, and high inference accuracy.
no code implementations • 22 Jun 2019 • Omer Anjum, Wen-mei Hwu, JinJun Xiong
Recently we decided to conduct a more thorough study based on all past papers of the International Symposium on Computer Architecture (ISCA) from 1973 to 2018, which resulted in this article.
2 code implementations • 20 May 2019 • Xiaofan Zhang, Cong Hao, Yuhong Li, Yao Chen, JinJun Xiong, Wen-mei Hwu, Deming Chen
Developing deep learning models for resource-constrained Internet-of-Things (IoT) devices is challenging, as it is difficult to achieve both good quality of results (QoR), such as DNN model inference accuracy, and quality of service (QoS), such as inference latency, throughput, and power consumption.
no code implementations • 29 Apr 2019 • Cheng Li, Abdul Dakkak, JinJun Xiong, Wen-mei Hwu
An increasingly complex and diverse collection of Machine Learning (ML) models as well as hardware/software stacks, collectively referred to as "ML artifacts", are being proposed - leading to a diverse landscape of ML.
2 code implementations • 9 Apr 2019 • Cong Hao, Xiaofan Zhang, Yuhong Li, Sitao Huang, JinJun Xiong, Kyle Rupnow, Wen-mei Hwu, Deming Chen
While embedded FPGAs are attractive platforms for DNN acceleration on edge-devices due to their low latency and high energy efficiency, the scarcity of resources of edge-scale FPGA devices also makes it challenging for DNN deployment.
no code implementations • NAACL 2019 • Hongyu Gong, Suma Bhat, Lingfei Wu, JinJun Xiong, Wen-mei Hwu
Our generator employs an attention-based encoder-decoder to transfer a sentence from the source style to the target style.
no code implementations • 29 Jan 2019 • Aayush Ankit, Izzat El Hajj, Sai Rahul Chalamalasetti, Geoffrey Ndu, Martin Foltin, R. Stanley Williams, Paolo Faraboschi, Wen-mei Hwu, John Paul Strachan, Kaushik Roy, Dejan S Milojicic
We also present the PUMA compiler which translates high-level code to PUMA ISA.
Emerging Technologies Hardware Architecture
no code implementations • 24 Nov 2018 • Abdul Dakkak, Cheng Li, Simon Garcia de Gonzalo, JinJun Xiong, Wen-mei Hwu
Deep neural networks (DNNs) have become core computation components within low-latency Function as a Service (FaaS) prediction pipelines, including image recognition, object detection, natural language processing, speech synthesis, and personalized recommendation pipelines.
Distributed, Parallel, and Cluster Computing
no code implementations • 24 Nov 2018 • Abdul Dakkak, Cheng Li, JinJun Xiong, Wen-mei Hwu
Machine Learning (ML) and Deep Learning (DL) innovations are being introduced at such a rapid pace that model owners and evaluators are hard-pressed to analyze and study them.
no code implementations • 23 Nov 2018 • Bowen Cheng, Yunchao Wei, Jiahui Yu, Shiyu Chang, JinJun Xiong, Wen-mei Hwu, Thomas S. Huang, Humphrey Shi
While training on samples drawn from an independent and identical distribution has been a de facto paradigm for optimizing image classification networks, humans learn new concepts in an easy-to-hard manner, progressing through selected examples.
3 code implementations • 5 Oct 2018 • Bowen Cheng, Yunchao Wei, Rogerio Feris, JinJun Xiong, Wen-mei Hwu, Thomas Huang, Humphrey Shi
In particular, DCR places a separate classification network in parallel with the localization network (base detector).
2 code implementations • 18 Sep 2018 • Carl Pearson, Abdul Dakkak, Cheng Li, Sarah Hashash, JinJun Xiong, Wen-mei Hwu
This report presents the design of the Scope infrastructure for extensible and portable benchmarking.
Performance