2023
Eadom Dessalene, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos
Motivated by Goldman’s Theory of Human Action—a framework in which action decomposes into 1) base physical movements, and 2) the context in which they occur—the authors propose a novel learning formulation for motion and context, where context is derived as the complement to motion.
2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)
Eadom Dessalene, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos
LEAP is a novel method for generating video-grounded action programs through use of a Large Language Model (LLM). These action programs represent the motoric, perceptual, and structural aspects of action, and consist of sub-actions, pre- and post-conditions, and control flows. LEAP’s action programs are centered on egocentric video and employ recent developments in LLMs both as a source for program knowledge and as an aggregator and assessor of multimodal video information.
arXiv.org
Dehao Yuan, Furong Huang, Cornelia Fermüller, Yiannis Aloimonos
Given samples of a continuous object (e.g., a function), Hyper-Dimensional Function Encoding (HDFE) produces an explicit vector representation of that object, invariant to the sample distribution and density. This invariance lets HDFE encode continuous objects consistently regardless of how they are sampled, so neural networks can receive continuous objects as inputs for machine learning tasks such as classification and regression. HDFE requires no training and provably maps the object into an organized embedding space, which facilitates the training of downstream tasks. In addition, the encoding is decodable, which enables neural networks to regress continuous objects by regressing their encodings. HDFE can thus serve as an interface for processing continuous objects.
arXiv.org
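The distribution- and density-invariance idea can be sketched with a toy encoder: sample pairs (x, f(x)) are mapped to random-phase hypervectors, bound elementwise, bundled by averaging, and normalized. This is a simplified illustration under an assumed uniform sampling, not the paper's actual HDFE construction; the names (`hdfe_sketch`, `encode_scalar`) are invented for illustration.

```python
import numpy as np

def encode_scalar(v, basis):
    # Random-phase (Fourier-feature-style) encoding of scalars into complex hypervectors
    return np.exp(1j * basis * v)

def hdfe_sketch(xs, ys, dim=2000, seed=0):
    """Encode samples (x, f(x)) of a 1-D function into a single hypervector.

    Binding is the elementwise product; bundling is the mean over samples;
    the final normalization removes the dependence on the sample count.
    """
    rng = np.random.default_rng(seed)
    bx = rng.normal(size=dim)   # random basis for inputs
    by = rng.normal(size=dim)   # random basis for outputs
    pairs = encode_scalar(xs[:, None], bx) * encode_scalar(ys[:, None], by)
    bundle = pairs.mean(axis=0)
    return bundle / np.linalg.norm(bundle)

f = np.sin
dense, sparse = np.linspace(0, 1, 400), np.linspace(0, 1, 60)
a, b = hdfe_sketch(dense, f(dense)), hdfe_sketch(sparse, f(sparse))
# The two codes stay similar despite very different sampling densities
print(abs(np.vdot(a, b)))
```

Two encodings of the same function, one from 400 samples and one from 60, remain highly similar, which is the property that lets downstream networks consume such codes directly.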
Amir-Hossein Shahidzadeh, Seong Jong Yoo, Pavan Mantripragada, Chahat Deep Singh, Cornelia Fermüller, Yiannis Aloimonos
Tactile exploration plays a crucial role in understanding object structures for fundamental robotics tasks such as grasping and manipulation. However, efficiently exploring such objects with tactile sensors is challenging because the environments are large-scale and unknown while the sensors' coverage is limited. Here, the authors present AcTExplore, an active tactile exploration method driven by reinforcement learning that automatically explores object surfaces at scale in a limited number of steps. Through sufficient exploration, the algorithm incrementally collects tactile data and reconstructs the objects' 3D shapes, which can serve as representations for higher-level downstream tasks. The method achieves an average of 95.97% IoU coverage on unseen YCB objects while being trained only on primitive shapes.
arXiv.org
Matthew S. Evanusa, Vaishnavi Patil, Michelle Girvan, Joel Goodman, Cornelia Fermüller, Yiannis Aloimonos
Robots are active agents that operate in dynamic scenarios with noisy sensors. Predictions based on these noisy sensor measurements often lead to errors and can be unreliable. To this end, roboticists have used fusion methods using multiple observations. Lately, neural networks have dominated the accuracy charts for perception-driven predictions for robotic decision-making and often lack uncertainty metrics associated with the predictions. Here, the authors present a mathematical formulation to obtain the heteroscedastic aleatoric uncertainty of any arbitrary distribution without prior knowledge about the data. The approach has no prior assumptions about the prediction labels and is agnostic to network architecture.
International Conference on Artificial Neural Networks (ICANN 2023) as compiled in International Conference on Artificial Neural Networks and Machine Learning by Springer
Nitin Sanket, Chahat Deep Singh, Cornelia Fermüller, Yiannis Aloimonos
Robots are active agents that operate in dynamic scenarios with noisy sensors. Predictions based on these noisy sensor measurements often lead to errors and can be unreliable. To this end, roboticists have used fusion methods using multiple observations. Lately, neural networks have dominated the accuracy charts for perception-driven predictions for robotic decision-making and often lack uncertainty metrics associated with the predictions. Here, the authors present a mathematical formulation to obtain the heteroscedastic aleatoric uncertainty of any arbitrary distribution without prior knowledge about the data. The approach has no prior assumptions about the prediction labels and is agnostic to network architecture.
Science Robotics
Chinmaya Devaraj, Cornelia Fermüller, Yiannis Aloimonos
GCN-based zero-shot learning approaches commonly use fixed input graphs representing external knowledge that usually comes from language. However, such input graphs fail to incorporate the nuances of the visual domain. The authors introduce a method to visually ground the external knowledge graph. The method is demonstrated on a novel concept of grouping actions according to a shared notion and is shown to achieve superior zero-shot action recognition performance on two challenging human manipulation action datasets: EPIC Kitchens and Charades. They further show that visually grounding the knowledge graph enhances the performance of GCNs when an adversarial attack corrupts the input graph.
Computer Vision Foundation workshop
Snehesh Shrestha, Ishan Tamrakar, Cornelia Fermüller, Yiannis Aloimonos
Haptic sensing can provide a new dimension to enhance people’s musical and cinematic experiences. However, designing a haptic pattern is neither intuitive nor trivial, and imagined haptic patterns tend to differ from experienced ones. As a result, researchers use simple step-curve patterns to create haptic stimuli. Here, the authors designed and developed an intuitive haptic pattern designer that lets users rapidly prototype creative patterns. The simple architecture, wireless connectivity, and easy-to-program communication protocol make it modular and easy to scale. In this demo, workshop participants can select from a library of haptic patterns and design new ones, feeling each pattern as they make changes in the user interface.
arXiv.org
Neal Anwar, Chethan Parameshwara, Cornelia Fermüller, Yiannis Aloimonos
Hyperdimensional Computing (HDC) is an emerging neuroscience-inspired framework wherein data of various modalities can be represented uniformly in high-dimensional space as long, redundant holographic vectors. When equipped with the proper Vector Symbolic Architecture (VSA) and applied to neuromorphic hardware, HDC-based networks have been demonstrated to be capable of solving complex visual tasks with substantial energy efficiency gains and increased robustness to noise when compared to standard Artificial Neural Networks (ANNs). Here, the authors present a bipolar HD encoding mechanism designed for encoding spatiotemporal data, which captures the contours of DVS-generated time surfaces created by moving objects by fitting to them local surfaces which are individually encoded into HD vectors and bundled into descriptive high-dimensional representations.
IEEE 2023 57th Annual Conference on Information Sciences and Systems (CISS)
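The bind-and-bundle vocabulary underlying such HD representations can be illustrated with generic bipolar hypervectors. This is a standard VSA sketch, not the paper's time-surface encoder; the "scene" analogy to bundled local-surface codes is only loose.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10000                                          # hypervector dimensionality

def rand_hv():
    """Random bipolar hypervector in {-1, +1}^D."""
    return rng.choice([-1, 1], size=D)

bind = lambda a, b: a * b                          # elementwise product, self-inverse
bundle = lambda vs: np.sign(np.sum(vs, axis=0))    # componentwise majority vote
sim = lambda a, b: (a @ b) / D                     # normalized similarity in [-1, 1]

# Bundle three role-filler bindings into one descriptive representation
roles = [rand_hv() for _ in range(3)]
fillers = [rand_hv() for _ in range(3)]
scene = bundle([bind(r, f) for r, f in zip(roles, fillers)])

# Unbinding a role recovers a vector close to its own filler and nearly
# orthogonal to the others, which is what makes the bundle queryable
recovered = bind(scene, roles[0])
print(sim(recovered, fillers[0]), sim(recovered, fillers[1]))
```

The redundancy of the long vectors is what gives HDC its robustness to noise: flipping a modest fraction of components barely changes the similarities above.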
Michael Maynord, M. Mehdi Farhangi, Cornelia Fermüller, Yiannis Aloimonos, Gary Levine, Nicholas Petrick, Berkman Sahiner, Aria Pezeshk
Proposes a cooperative labeling method that allows researchers to make use of weakly annotated medical imaging data for training a machine learning algorithm. As most clinically produced data is weakly-annotated - produced for use by humans rather than machines, and lacking information machine learning depends upon - this approach allows researchers to incorporate a wider range of clinical data and thereby increase the training set size.
Medical Physics
Snehesh Shrestha, William Sentosatio, Huaishu Peng, Cornelia Fermüller, Yiannis Aloimonos
This work introduces FEVA, a video annotation tool with streamlined interaction techniques and a dynamic interface that makes labeling tasks easy and fast. FEVA focuses on speed, accuracy, and simplicity to make annotation quick, consistent, and straightforward.
arXiv.org
2022
Hussam Amrouch, Mohsen Imani, Xun Jiao, Yiannis Aloimonos, Cornelia Fermüller, Dehao Yuan, Dongning Ma, Hamza E. Barkam, Paul Genssler, Peter Sutor
Hyperdimensional Computing (HDC) is rapidly emerging as an attractive alternative to traditional deep learning algorithms. Despite the profound success of Deep Neural Networks (DNNs) in many domains, the amount of computational power and storage that they demand during training makes deploying them in edge devices very challenging if not infeasible. This, in turn, inevitably necessitates streaming the data from the edge to the cloud, which raises serious concerns about availability, scalability, security, and privacy. Further, the data that edge devices receive from sensors is inherently noisy, and DNN algorithms are very sensitive to noise, which makes accomplishing the required learning tasks with high accuracy immensely difficult. In this paper, the authors aim to provide a comprehensive overview of the latest advances in HDC.
2022 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)
Chahat Deep Singh, Riya Kumari, Cornelia Fermüller, Nitin J. Sanket, Yiannis Aloimonos
WorldGen is an open-source framework for autonomously generating countless structured and unstructured 3D photorealistic scenes, such as city views, object collections, and object fragmentations, along with rich ground-truth annotation data. Because WorldGen is generative, the user has full access to and control over features such as texture, object structure, motion, and camera and lens properties, improving generalizability by diminishing data bias in the network. The authors demonstrate the effectiveness of WorldGen with an evaluation on deep optical flow. They hope such a tool can open doors for future research in a myriad of domains related to robotics and computer vision by reducing manual labor and the cost of acquiring rich, high-quality data.
arXiv.org
Xiaomin Lin, Nitin J. Sanket, Nare Karapetyan, Yiannis Aloimonos
A new way to mathematically model oysters and render images of oysters in simulation. This new method can boost detection performance with minimal real data, especially when used in conjunction with underwater robots.
arXiv.org
Peter Sutor, Dehao Yuan, Douglas Summers-Stay, Cornelia Fermüller, Yiannis Aloimonos
The authors explore the notion of using binary hypervectors to directly encode the final, classifying output signals of neural networks in order to fuse differing networks together at the symbolic level.
arXiv.org
Matthew Evanusa, Snehesh Shrestha, Vaishnavi Patil, Cornelia Fermüller, Michelle Girvan & Yiannis Aloimonos
Echo State Networks (ESNs) are a class of recurrent neural networks that can learn to regress on or classify sequential data by keeping the recurrent component random and training only a set of readout weights, a property of interest to the edge computing and neuromorphic communities. However, they have struggled with demanding regression and classification tasks and therefore could not compete with traditional RNNs such as LSTM and GRU networks. To address this limitation, the authors developed a hybrid network, the Parallelized Deep Readout Echo State Network, that combines a deep learning readout with a fast random recurrent component, with multiple ESNs computing in parallel.
SN Computer Science (Springer)
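A minimal classical ESN makes the "random recurrence, trained readout" idea concrete. The paper's network replaces the linear readout below with a deep network and runs multiple reservoirs in parallel, so this sketch only shows the baseline mechanism; the task and hyperparameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200                                         # reservoir size
Win = rng.uniform(-0.5, 0.5, N)                 # input weights (fixed, random)
W = rng.normal(size=(N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))       # spectral radius < 1 (echo state property)

def run_reservoir(u):
    """Drive the fixed random reservoir with input sequence u; collect states."""
    x, states = np.zeros(N), []
    for ut in u:
        x = np.tanh(Win * ut + W @ x)
        states.append(x.copy())
    return np.array(states)

# Task: one-step-ahead prediction of a sine wave. Only the readout is trained.
t = np.arange(0, 30, 0.1)
u, y = np.sin(t[:-1]), np.sin(t[1:])
X, Y = run_reservoir(u)[50:], y[50:]            # discard a washout period
Wout = np.linalg.lstsq(X, Y, rcond=None)[0]     # linear readout via least squares
pred = X @ Wout
print(np.max(np.abs(pred - Y)))
```

Training touches only `Wout`, which is why ESNs are cheap enough for edge and neuromorphic hardware; the paper's contribution is making that readout expressive without giving up this economy.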
2021
Behzad Sadrfaridpour, Yiannis Aloimonos, Miao Yu, Yang Tao, Donald Webster
To test the idea that advancements in robotics and artificial intelligence offer the potential to improve the monitoring of oyster beds, the researchers prepared a remote operated underwater vehicle (ROV) with a camera and filmed in the Chesapeake Bay. They then used these videos to train convolutional neural networks (CNNs) to count oysters and track them in consecutive image frames so they are not identified multiple times.
arXiv.org
Chethan M. Parameshwara, Simin Li, Cornelia Fermüller, Nitin J. Sanket, Matthew S. Evanusa, Yiannis Aloimonos
The researchers propose SpikeMS, the first deep encoder-decoder SNN architecture for the real-world large-scale problem of motion segmentation using the event-based DVS camera as input.
arXiv.org
2020
Matthew Evanusa, Cornelia Fermüller, Yiannis Aloimonos
The researchers present a novel Deep Reservoir Network for time series prediction and classification that learns through non-differentiable hidden reservoir layers using a biologically inspired alternative to backpropagation. This alternative, called Direct Feedback Alignment, resembles global dopamine signal broadcasting in the brain. The researchers demonstrate its efficacy on two real-world multidimensional time series datasets.
arXiv.org
Matthew Evanusa, Cornelia Fermüller, Yiannis Aloimonos
The researchers show that a large, deep layered spiking neural network with dynamical, chaotic activity mimicking the mammalian cortex with biologically-inspired learning rules, such as STDP, is capable of encoding information from temporal data.
arXiv.org
Matthew Evanusa, Snehesh Shrestha, Michelle Girvan, Cornelia Fermüller, Yiannis Aloimonos
Demonstrates the use of a backpropagation hybrid mechanism for parallel reservoir computing with a meta ring structure and its application to a real-world gesture recognition dataset. This mechanism can be used as an alternative to state-of-the-art recurrent neural networks such as LSTMs and GRUs.
arXiv.org
Eadom Dessalene, Michael Maynord, Chinmaya Devaraj, Cornelia Fermüller, Yiannis Aloimonos
Introduces Egocentric Object Manipulation Graphs (Ego-OMG): a novel representation for activity modeling and anticipation of near-future actions.
arXiv.org
Anton Mitrokhin, Peter Sutor, Douglas Summers-Stay, Cornelia Fermüller, Yiannis Aloimonos
By using hashing neural networks to produce binary vector representations of images, the authors show how hyperdimensional vectors can be constructed such that vector-symbolic inference arises naturally out of their output.
Frontiers in Robotics and AI
Anton Mitrokhin, Zhiyuan Hua, Cornelia Fermüller, Yiannis Aloimonos
Presents a Graph Convolutional neural network for the task of scene motion segmentation by a moving camera. Describes spatial and temporal features of event clouds, which provide cues for motion tracking and segmentation.
Computer Vision Foundation
Chethan M. Parameshwara, Nitin J. Sanket, Arjun Gupta, Cornelia Fermüller, Yiannis Aloimonos
A solution to multi-object motion segmentation that combines classical optimization methods with deep learning and does not require prior knowledge of the 3D motion or of the number and structure of the objects.
arXiv.org
John Kanu, Eadom Dessalene, Xiaomin Lin, Cornelia Fermüller, Yiannis Aloimonos
A novel robotic agent framework for learning to perform temporally extended tasks using spatial reasoning in a deep reinforcement learning framework, by sequentially imagining visual goals and choosing appropriate actions to fulfill imagined goals.
arXiv.org
1996
Cornelia Fermüller, Yiannis Aloimonos
If 3D rigid motion can be correctly estimated from image sequences, the structure of the scene can be correctly derived using the equations for image formation. However, an error in the estimation of 3D motion will result in the computation of a distorted version of the scene structure. Of computational interest are those regions in space where the distortions are such that the depths become negative, because for the scene to be visible it has to lie in front of the image, and thus the corresponding depth estimates must be positive. The stability analysis for the structure from motion problem presented in this paper investigates the optimal relationship between the errors in the estimated translational and rotational parameters of a rigid motion that results in the estimation of a minimum number of negative depth values. The input used is the value of the flow along some direction, which is more general than optic flow or correspondence. For a planar retina it is shown that the optimal configuration is achieved when the projections of the translational and rotational errors on the image plane are perpendicular. Furthermore, the projections of the actual and the estimated translation lie on a line through the center. For a spherical retina, given a rotational error, the optimal translation is the correct one; given a translational error, the optimal rotational error depends both in direction and value on the actual and estimated translation as well as the scene in view. The proofs, besides illuminating the confounding of translation and rotation in structure from motion, have an important application to ecological optics.
The same analysis provides a computational explanation of why it is easier to estimate self-motion in the case of a spherical retina and why shape can be estimated easily in the case of a planar retina, thus suggesting that nature’s design of compound eyes (or panoramic vision) for flying systems and camera-type eyes for primates (and other systems that perform manipulation) is optimal.
International Journal of Computer Vision
2020
Maria Coelho, Mark Austin, Shivam Mishra, Mark Blackburn
Due to remarkable advances in computer, communications, and sensing technologies over the past three decades, large-scale urban systems are now far more heterogeneous and automated than their predecessors. They may, in fact, be connected to other types of systems in completely new ways. These characteristics make the tasks of system design, analysis, and integration of multi-disciplinary concerns much more difficult than in the past. We believe these challenges can be addressed by teaching machines to understand urban networks. This paper explores opportunities for using a recently developed graph autoencoding approach to encode network structure and associated attributes as low-dimensional vectors. We exercise the proposed approach on a problem involving identification of leaks in urban water distribution systems.
IARIA International Journal on Advances in Networks and Services
2023
Armin Lederer, Erfaun Noorani, John Baras, Sandra Hirche
While the focus of inhibitory control has been on risk-neutral formulations, human studies have shown a tight link between response inhibition and risk attitude. Inspired by this insight, the authors propose a flexible, risk-sensitive method for inhibitory control. Their method is based on a risk-aware condition for value functions, which guarantees the satisfaction of state constraints.
arXiv.org
Erfaun Noorani, Christos Mavridis, John Baras
Incorporating risk in the decision-making process has been shown to lead to significant performance improvement in optimal control and reinforcement learning algorithms. Here, the authors construct a temporal-difference risk-sensitive reinforcement learning algorithm using the exponential criteria commonly used in risk-sensitive control. The proposed method resembles an actor-critic architecture with the ‘actor’ implementing a policy gradient algorithm based on the exponential of the reward-to-go, which is estimated by the ‘critic.’ The novelty of the update rule of the ‘critic’ lies in the use of a modified objective function that corresponds to the underlying multiplicative Bellman’s equation.
2023 American Control Conference
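A toy version of the exponential criterion can be shown with plain REINFORCE on a two-armed bandit, where passing the reward through exp(beta * R) makes the agent risk-sensitive. This sketch omits the paper's actor-critic structure and temporal-difference critic entirely; the bandit, function names, and hyperparameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def pull(arm):
    # Arm 0: safe, always 0.5.  Arm 1: risky, 1.0 or 0.0 with equal probability.
    return 0.5 if arm == 0 else float(rng.random() < 0.5)

def train(beta, steps=4000, lr=0.05):
    """REINFORCE with the exponential utility exp(beta * R) in place of R.

    beta < 0 is risk-averse, beta > 0 risk-seeking; beta -> 0 recovers the
    risk-neutral policy gradient (up to scaling).
    """
    theta = np.zeros(2)                          # softmax policy logits
    for _ in range(steps):
        p = np.exp(theta - theta.max())
        p /= p.sum()
        a = rng.choice(2, p=p)
        r = pull(a)
        grad_logpi = -p
        grad_logpi[a] += 1.0
        theta += lr * (np.exp(beta * r) / beta) * grad_logpi   # exponential criterion
    p = np.exp(theta - theta.max())
    return p / p.sum()

# A risk-averse agent concentrates on the safe arm despite equal mean rewards
print(train(beta=-5.0))
```

Both arms have the same expected reward, so a risk-neutral learner is indifferent; the exponential criterion breaks the tie in favor of the low-variance arm.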
2022
Erfaun Noorani, Christos Mavridis, John Baras
Risk-sensitive reinforcement learning algorithms have been studied to introduce robustness and sample efficiency, and lead to better real-life performance. Here, the authors introduce new model-free risk-sensitive reinforcement learning algorithms as variations of widely-used Policy Gradient algorithms with similar implementation properties.
arXiv.org
Christos Mavridis, John Baras
Hierarchical learning algorithms that gradually approximate a solution to a data-driven optimization problem are essential to decision-making systems, especially under limitations on time and computational resources. In this study, the authors introduce a general-purpose hierarchical learning architecture based on the progressive partitioning of a possibly multi-resolution data space. The optimal partition is gradually approximated by solving a sequence of optimization sub-problems that yield a sequence of partitions with an increasing number of subsets. The authors show that the solution of each optimization problem can be estimated online using gradient-free stochastic approximation updates.
arXiv.org
Erfaun Noorani, John Baras
Trust facilitates collaboration and coordination in teams and is paramount to achieving optimality in the absence of direct communication and formal coordination devices. The authors investigate the influence of agents' risk attitudes on trust and the emergence of coordination in multi-agent environments. They consider independent risk-sensitive policy-gradient (Risk-sensitive REINFORCE) RL agents in repeated two-agent coordination games, and suggest that risk-sensitive agents can achieve better results in multi-agent task environments.
2022 European Control Conference (ECC)
Christos Mavridis, Erfaun Noorani, John Baras
Prototype-based learning methods have been extensively studied as fast, recursive, data-driven, interpretable, and robust learning algorithms. The authors study the effect of entropy regularization in prototype-based learning regarding (i) robustness with respect to the dataset and the initial conditions, and (ii) the generalization properties of the learned representation. A duality relationship, with respect to a Legendre-type transform, between free energy and Kullback-Leibler divergence measures is used to show that entropy-regularized prototype-based learning is connected to exponential objectives associated with risk-sensitive learning.
2022 30th Mediterranean Conference on Control and Automation (MED)
Christos Mavridis, George Kontoudis, John Baras
Here, the authors introduce a sparse Gaussian process regression model whose covariance function is parameterized by the locations of a progressively growing set of pseudo-inputs generated by an online deterministic annealing optimization algorithm. This is an active learning approach, which, in contrast to most existing works, can modify already selected pseudo-inputs and is trained with recursive, gradient-free updates.
61st IEEE Conference on Decision and Control (2022)
Christos Mavridis, John Baras
The authors introduce a learning model designed to meet the needs of applications in which computational resources are limited, and robustness and interpretability are prioritized.
arXiv.org
Anousheh Gholami, Nariman Torkzaban, John Baras
Federated learning (FL) has received significant attention from both academia and industry as an emerging paradigm for building machine learning models in a communication-efficient and privacy-preserving manner. It potentially enables a massive number of resource-constrained agents (e.g., mobile and IoT devices) to train a model through a repeated process of local training on the agents and centralized model aggregation on a central server. The paper proposes trust as a metric to measure the trustworthiness of the FL agents and thereby enhance the security of FL training.
2022 IEEE 19th Annual Consumer Communications & Networking Conference (CCNC)
2021
Christos Mavridis, John Baras
Principles from mathematics, system theory, and optimization are used to investigate the structure of a data-agnostic learning architecture that resembles the “one learning algorithm” believed to exist in the visual and auditory cortices of the human brain. The authors' approach consists of a closed-loop system with (i) a multi-resolution analysis pre-processor, (ii) a group-invariant feature extractor, and (iii) a progressive knowledge-based learning module, along with multi-resolution feedback loops that are used for learning.
arXiv.org
Nilesh Suriyarachchi, John Baras
The communication and sensing capabilities of modern connected autonomous vehicles (CAVs) will allow new approaches in control to help solve the problem of stop-and-go waves in highway networks. The paper introduces a communication-based cooperative control method for CAVs in multi-lane highways in a mixed traffic setting. Each vehicle is able to take proactive control actions. This is an improvement over existing reactive methods which rely on shock waves already being present. In addition, the new method’s performance is independent of the highway structure; the algorithm performs identically on ring roads like “beltways” and straight roads. The method allows for proactive control application and exhibits good shock wave dissipation performance even when only a few CAVs are present amongst conventional vehicles. The results were verified on a three-lane circular highway loop using realistic traffic simulation software.
IEEE 94th Vehicular Technology Conference (Fall 2021)
Nilesh Suriyarachchi, Faizan Tariq, Christos Mavridis, John Baras
Highway on-ramp merge junctions remain a major bottleneck in transportation networks. However, with the introduction of Connected Autonomous Vehicles (CAVs) with advanced sensing and communication capabilities, modern algorithms can capitalize on cooperation between vehicles. This paper enhances highway merging efficiency by optimally coordinating CAVs to maximize the flow of vehicles while satisfying all safety constraints.
2021 IEEE International Intelligent Transportation Systems Conference (ITSC)
Christos Mavridis, John Baras
An online prototype-based learning algorithm for clustering and classification, based on the principles of deterministic annealing.
arXiv.org
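The deterministic-annealing principle behind such algorithms can be sketched in batch form, after Rose's classic formulation: anneal a temperature that controls the softness of the cluster assignments. The paper's algorithm is online, with gradient-free stochastic-approximation updates, which this simplified sketch does not reproduce; the data and parameters are invented for illustration.

```python
import numpy as np

def da_cluster(X, k, T0=5.0, Tmin=0.01, cooling=0.8, iters=30):
    """Batch deterministic-annealing clustering sketch.

    At temperature T, points are softly assigned to prototypes through a
    Gibbs distribution and prototypes are re-estimated; T is then lowered.
    High T keeps the problem smooth; low T recovers hard k-means-like clusters.
    """
    rng = np.random.default_rng(1)
    mu = X.mean(0) + 0.01 * rng.normal(size=(k, X.shape[1]))  # near-coincident start
    T = T0
    while T > Tmin:
        for _ in range(iters):
            d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
            p = np.exp(-(d2 - d2.min(1, keepdims=True)) / T)  # Gibbs assignments
            p /= p.sum(1, keepdims=True)
            mu = (p.T @ X) / p.sum(0)[:, None]                # weighted prototypes
        T *= cooling
    return mu

rng = np.random.default_rng(2)
X = np.concatenate([rng.normal(0, 0.3, (100, 2)), rng.normal(4, 0.3, (100, 2))])
centers = sorted(da_cluster(X, k=2)[:, 0])
print(centers)   # prototypes settle near the two cluster centers
```

Because all prototypes start coincident and only split as the temperature crosses critical values, the method is largely insensitive to initialization, which is the robustness property the entry refers to.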
2020
Fatemeh Alimardani, Nilesh Suriyarachchi, Faizan Tariq, John Baras
Explores the integration of two of the most common traffic management strategies, namely, ramp metering and route guidance, into existing highway networks with human-driven vehicles.
Chapter in the forthcoming book, Transportation Systems for Smart, Sustainable, Inclusive and Secure Cities
Christos Mavridis, John Baras
The researchers investigate the convergence properties of stochastic vector quantization (VQ) and its supervised counterpart, Learning Vector Quantization (LVQ), using Bregman divergences. They employ the theory of stochastic approximation to study the conditions on the initialization and on the Bregman-divergence generating functions under which the algorithms converge to desired configurations. These results formally support the use of Bregman divergences, such as the Kullback-Leibler divergence, in vector quantization algorithms.
johnbaras.com
Aneesh Raghavan, John Baras
This paper pertains to stochastic multi-agent decision-making problems. The authors revisit the concepts of event-state-operation structure and the relationship of incompatibility from the literature, and use them as tools to study the algebraic structure of a set of events. They consider a multi-agent hypothesis testing problem and show that the set of events forms an ortholattice. They then consider the binary hypothesis testing problem with finite observation space.
arXiv.org
Aneesh Raghavan, John Baras
This paper pertains to hypothesis testing problems, specifically the problem of collaborative binary hypothesis testing.
arXiv.org
Ion Matei, Johan de Kleer, Christoforos Somarakis, Rahul Rai, John Baras
To understand changes in physical systems and facilitate decisions, explaining how model predictions are made is crucial. In this paper the authors use model-based interpretability, where models of physical systems are constructed by composing basic constructs that explain locally how energy is exchanged and transformed.
arXiv.org
2019
Mohammad Mamduhi, Karl Johansson, Ehsan Hashemi, John Baras
This paper proposes an event-triggered, add-on safety mechanism in a networked vehicular system that can adjust control parameters for timely braking while maintaining maneuverability.
arXiv.org
2022
Tianchen Liu, Nikhil Chopra, Jayesh Samtani
Many strawberry growers in some areas of the United States rely on customers to pick the fruit during the peak harvest months. Unfavorable weather conditions such as high humidity and excessive rainfall can quickly promote fruit rot and disease. This study establishes an elementary farm information system that delivers timely information on farm and fruit conditions (ripe, unripe) to growers. The system processes a video clip or a sequence of images from a camera to produce a map that can be viewed to estimate the quantities of strawberries at different stages of ripeness. The farm map is built with state-of-the-art vision-based simultaneous localization and mapping (SLAM) techniques, which generate the map and track the motion trajectory using image features. It can help farm labor direct traffic to specific strawberry locations within a farm where fruit needs to be picked or where berries need to be removed. The resulting system can help reduce farm revenue loss and promote sustainable crop production.
Proceedings of the 2022 Biology and Life Sciences Forum
2021
Kushal Chakrabarti, Nikhil Chopra
Accelerated gradient-based methods are being extensively used for solving non-convex machine learning problems, especially when the data points are abundant or the available data is distributed across several agents. Two of the prominent accelerated gradient algorithms are AdaGrad and Adam. AdaGrad is the simplest accelerated gradient method, particularly effective for sparse data. Adam has been shown to perform favorably in deep learning problems compared to other methods. Here the authors propose a new fast optimizer, Generalized AdaGrad (G-AdaGrad), for accelerating the solution of potentially non-convex machine learning problems.
arXiv.org
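For context, plain AdaGrad, which G-AdaGrad generalizes, is only a few lines: per-coordinate step sizes shrink with the accumulated squared gradients. This is the standard textbook update, not the paper's G-AdaGrad; the test problem is invented for illustration.

```python
import numpy as np

def adagrad(grad, x0, lr=1.0, eps=1e-8, steps=500):
    """Plain AdaGrad: per-coordinate steps scaled by accumulated squared gradients.

    Coordinates that rarely see large gradients (e.g. from sparse features) keep
    a large effective step size, which is why AdaGrad suits sparse data.
    """
    x = np.array(x0, dtype=float)
    g2 = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x)
        g2 += g * g
        x -= lr * g / (np.sqrt(g2) + eps)
    return x

# Minimize the badly scaled quadratic f(x) = 100*x0^2 + x1^2
quad_grad = lambda x: np.array([200.0 * x[0], 2.0 * x[1]])
print(adagrad(quad_grad, [1.0, 1.0]))   # converges to the optimum at the origin
```

Note how the adaptive scaling equalizes progress across the two very differently curved coordinates, something a single global step size cannot do.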
2020
Kushal Chakrabarti, Nirupam Gupta, Nikhil Chopra
This paper considers the multi-agent linear least-squares problem in a server-agent network. The system comprises multiple agents, each having a set of local data points, that are connected to a server. The goal for the agents is to compute a linear mathematical model that optimally fits the collective data points held by all the agents, without sharing their individual local data points. The paper proposes an iterative pre-conditioning technique that mitigates the deleterious effect of the conditioning of data points on the rate of convergence of the gradient-descent method.
arXiv.org
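The conditioning effect the paper addresses is easy to demonstrate: on an ill-conditioned least-squares problem, plain gradient descent stalls along the weak direction, while a pre-conditioned update converges immediately. The sketch below is centralized and uses the idealized pre-conditioner K = H^{-1} for illustration; the paper instead constructs its pre-conditioner iteratively in a server-agent network without sharing local data.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 2)) @ np.diag([10.0, 0.1])  # ill-conditioned data matrix
x_true = np.array([1.0, -2.0])
b = A @ x_true

H = A.T @ A                                  # Hessian of the least-squares cost
grad = lambda x: H @ x - A.T @ b

def gd(K, lr, steps=200):
    """Gradient descent with a fixed pre-conditioner matrix K."""
    x = np.zeros(2)
    for _ in range(steps):
        x -= lr * K @ grad(x)
    return x

plain = gd(np.eye(2), lr=1.0 / np.linalg.eigvalsh(H).max())
precond = gd(np.linalg.inv(H), lr=1.0)       # idealized pre-conditioner K = H^{-1}
print(plain, precond)   # plain GD stalls on the weak direction; precond recovers x_true
```

The gap between the two runs grows with the condition number of `H`, which is exactly the dependence the paper's iterative pre-conditioning mitigates.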
2023
Rance Cleaveland, Jeroen J. A. Keiren, Peter Fontana
The authors establish relative expressiveness results for several modal mu-calculi interpreted over timed automata. These mu-calculi combine modalities for expressing the passage of (real) time with a general framework for defining formulas recursively; several variants have been proposed in the literature. They show that one logic, which they call L^rel_{ν,μ}, is strictly more expressive than the other mu-calculi considered. It is also more expressive than the temporal logic TCTL, while the other mu-calculi are incomparable with TCTL in the setting of general timed automata.
arXiv.org
2022
Jeroen J. A. Keiren, Rance Cleaveland
This paper revisits soundness and completeness of proof systems for proving that sets of states in infinite-state labeled transition systems satisfy formulas in the modal mu-calculus. The authors' results rely on novel results in lattice theory, which give constructive characterizations of both greatest and least fixpoints of monotonic functions over complete lattices. They show how these results may be used to reconstruct the sound and complete tableau method for this problem due to Bradfield and Stirling. They also show how the flexibility of their lattice-theoretic basis simplifies reasoning about tableau-based proof strategies for alternative classes of systems. In particular, the authors extend the modal mu-calculus with timed modalities, and prove that the resulting tableaux method is sound and complete for timed transition systems.
arXiv.org
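The constructive fixpoint characterizations underlying the tableau method can be illustrated in the finite case, where Kleene iteration computes a least fixpoint directly on a powerset lattice. The states, transition relation, and formula below are hypothetical toy choices, a finite shadow of the lattice-theoretic machinery the paper develops for infinite-state systems.

```python
# Kleene iteration for the least fixpoint of a monotone function over a
# finite powerset lattice. mu Z. goal \/ <>Z denotes the set of states
# from which `goal` is reachable.
states = {0, 1, 2, 3}
succ = {0: {1}, 1: {2}, 2: {2}, 3: {0}}   # hypothetical transitions
goal = {2}

def lfp(f, bottom=frozenset()):
    # iterate f from the bottom element until the chain stabilizes
    x = bottom
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt

reach = lfp(lambda Z: frozenset(goal | {s for s in states if succ[s] & Z}))
print(sorted(reach))
```

Greatest fixpoints work dually, iterating down from the full state set; the paper's contribution is making both constructions usable when the lattice is infinite.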
2020
Samuel Huang, Rance Cleaveland
This paper describes a technique for inferring temporal-logic properties for sets of finite data streams. Such data streams arise in many domains, including server logs, program testing, and financial and marketing data; temporal-logic formulas that are satisfied by all data streams in a set can provide insight into the underlying dynamics of the system generating these streams. The authors' approach makes use of so-called Linear Temporal Logic (LTL) queries, which are LTL formulas containing a missing subformula and interpreted over finite data streams. Solving such a query involves computing a subformula that can be inserted into the query so that the resulting grounded formula is satisfied by all data streams in the set. The paper describes an automaton-driven approach to solving this query-checking problem and demonstrates a working implementation via a pilot study.
arXiv.org
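On finite data streams, the query-checking problem can be shown with a brute-force solver: substitute each candidate subformula for the placeholder and keep those that make the grounded formula hold on every stream. This is a toy stand-in for the paper's automaton-driven approach; the stream contents and proposition names are hypothetical, and only the query G(?) over atomic propositions is handled.

```python
# Each stream is a finite sequence of states; each state is the set of
# atomic propositions true at that instant.
streams = [
    [{"req", "busy"}, {"req", "ack"}, {"req"}],
    [{"req"}, {"req", "busy"}],
]
candidates = {"req", "ack", "busy"}

def globally(p, stream):            # finite-trace semantics of G p
    return all(p in state for state in stream)

# Solve the query G(?): which propositions hold globally on all streams?
solutions = {p for p in candidates
             if all(globally(p, s) for s in streams)}
print(sorted(solutions))
```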
Peter Fontana, Rance Cleaveland
This report contains the descriptions of the timed automata (models) and the prop-erties (specifications) that are used as the “benchmark examples in Data structure choices for on-the-fly model checking of real-time systems” and “The power of proofs: New algorithms for timed automata model checking.” The four models from those sources are: CSMA, FISCHER, LEADER, and GRC. Additionally we include in this re-port two additional models: FDDI and PATHOS. These six models are often used to benchmark timed automata model checker speed throughout timed automata model checking papers.
arXiv.org
Rance Cleaveland
This paper shows how the use of Structural Operational Semantics (SOS) in the style popularized by the process-algebra community can lead to a more succinct and useful construction for building finite automata from regular expressions.
arXiv.org
2022
Ahmed Adel Attia, Carol Espy-Wilson
A deep learning-based approach that uses Masked Autoencoders to accurately reconstruct mistracked articulatory recordings for 41 of the 47 speakers in the XRMB dataset. (The University of Wisconsin X-Ray Microbeam (XRMB) dataset is one of several datasets that provide articulatory recordings synced with audio recordings.) The authors' model reconstructs articulatory trajectories that closely match ground truth, even when three out of eight articulators are mistracked, and retrieves 3.28 of 3.4 hours of previously unusable recordings.
arXiv.org
Deanna Kelly, Glen Coppersmith, John Dickerson, Carol Espy-Wilson, Hanna Michel, Philip Resnik
Machine learning approaches to mental health face a challenging tension between scalability and validity. Tools are needed to help predict symptoms, but important uncertainties remain. How can we be confident that remote data surveillance reflects an individual’s true clinical condition? How do we obtain such data at a large scale for machine learning techniques? The authors present work aimed at addressing these gaps.
Biological Psychiatry
Yashish Maduwantha Siriwardena, Ganesh Sivaraman, Carol Espy-Wilson
Multi-task learning (MTL) frameworks have proven to be effective in diverse speech-related tasks like automatic speech recognition (ASR) and speech emotion recognition. This paper proposes an MTL framework that performs acoustic-to-articulatory speech inversion by simultaneously learning an acoustic-to-phoneme mapping as a shared task.
arXiv.org
Rahil Parikh, Ilya Kavalerov, Carol Espy-Wilson, Shihab Shamma
Recent advancements in deep learning have led to drastic improvements in speech segregation models. Despite their success and growing applicability, few efforts have been made to analyze the underlying principles that these networks learn in order to perform segregation. The authors analyze the role of harmonicity in two state-of-the-art Deep Neural Network (DNN)-based models: Conv-TasNet and DPT-Net.
arXiv.org
Nadee Seneviratne, Carol Espy-Wilson
The authors develop a multimodal depression classification system that uses articulatory coordination features extracted from vocal tract variables together with text transcriptions obtained from an automatic speech recognition tool, yielding improvements in area under the receiver operating characteristic curve compared to unimodal classifiers.
arXiv.org
2021
Yashish Maduwantha Siriwardena, Chris Kitchen, Deanna L. Kelly, Carol Espy-Wilson
The authors investigate speech articulatory coordination in schizophrenia subjects exhibiting strong positive symptoms (e.g., hallucinations and delusions), using two distinct channel-delay correlation methods. They show that schizophrenia subjects who have strong positive symptoms and are markedly ill exhibit more complex articulatory coordination patterns in facial and speech gestures than healthy subjects.
arXiv.org; Proceedings of the 2021 International Conference on Multimodal Interaction (ICMI ’21)
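The core of a channel-delay correlation analysis is correlating every channel with every other channel at a grid of time delays, producing a feature tensor that captures coordination structure. A minimal NumPy sketch follows; the delay grid and the use of four mock vocal tract variables are hypothetical simplifications of the authors' methods.

```python
import numpy as np

def channel_delay_corr(signals, delays):
    # signals: (n_channels, T) array; returns (n_ch, n_ch, n_delays)
    # where entry [i, j, k] correlates channel i at time t with
    # channel j at time t + delays[k].
    n_ch, T = signals.shape
    feats = np.empty((n_ch, n_ch, len(delays)))
    for k, d in enumerate(delays):
        a = signals[:, : T - d] if d else signals
        b = signals[:, d:] if d else signals
        feats[:, :, k] = np.corrcoef(np.vstack([a, b]))[:n_ch, n_ch:]
    return feats

rng = np.random.default_rng(3)
tvs = rng.normal(size=(4, 400))        # 4 mock vocal tract variables
F = channel_delay_corr(tvs, delays=[0, 5, 10])
print(F.shape)
```

Summary statistics of such tensors (e.g., eigenvalue spreads) are the kind of articulatory coordination features fed to downstream classifiers.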
Carol Espy-Wilson
Dr. Espy-Wilson discusses a speech inversion system her group has developed that maps the acoustic signal to vocal tract variables (TVs). The trajectories of the TVs show the timing and spatial movement of speech gestures. She explains how her group uses machine learning techniques to compute articulatory coordination features (ACFs) from the TVs. The ACFs serve as an input into a deep learning model for mental health classification. Espy-Wilson also illustrates the key acoustic differences between speech produced by subjects when they are mentally ill relative to when they are in remission and relative to healthy controls. The ultimate goal of this research is the development of a technology (perhaps an app) for patients that can help them, their therapists and caregivers monitor their mental health status between therapy sessions.
Keynote speech at the 2021 Acoustical Society of America Annual Meeting, June 8, 2021
View a press release from the Acoustical Society of America about this speech
Nadee Seneviratne, Carol Espy-Wilson
The paper proposes a new multi-stage architecture trained on vocal tract variable (TV)-based articulatory coordination features (ACFs) for depression severity classification which clearly outperforms the baseline models. The authors establish that the robustness of ACFs based on TVs holds beyond mere detection of depression and even in severity level classification. This work can be extended to develop a multi-modal system that can take advantage of textual information obtained through Automatic Speech Recognition tools. Linguistic features can reveal important information regarding the verbal content of a depressed patient relating to their mental health condition.
arXiv.org; accepted for Interspeech2021, Aug. 30-Sept. 3, 2021
Yashish Maduwantha Siriwardena, Chris Kitchen, Deanna L. Kelly, Carol Espy-Wilson
This study, conducted with AIM-HI funding, investigates speech articulatory coordination in schizophrenia subjects exhibiting strong positive symptoms (e.g., hallucinations and delusions), using a time-delay embedded correlation analysis. It finds a distinction between healthy and schizophrenia subjects in the neuromotor coordination of speech.
ResearchGate.net
2020
Saurabh Sahu, Rahul Gupta, Carol Espy-Wilson
Implements three autoencoder- and GAN-based models that synthetically generate, from a simpler prior distribution p_z, higher-dimensional feature vectors useful for speech emotion recognition.
IEEE Transactions on Affective Computing
2019
Saurabh Sahu, Vikramjit Mitra, Nadee Seneviratne, Carol Espy-Wilson
The paper leverages multi-modal learning and automated speech recognition (ASR) systems toward building a speech-only emotion recognition model.
Interspeech 2019
2023
Eadom Dessalene, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos
Motivated by Goldman’s Theory of Human Action—a framework in which action decomposes into 1) base physical movements, and 2) the context in which they occur—the authors propose a novel learning formulation for motion and context, where context is derived as the complement to motion.
2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)
Eadom Dessalene, Michael Maynord, Cornelia Fermüller, Yiannis Aloimonos
LEAP is a novel method for generating video-grounded action programs through use of a Large Language Model (LLM). These action programs represent the motoric, perceptual, and structural aspects of action, and consist of sub-actions, pre- and post-conditions, and control flows. LEAP’s action programs are centered on egocentric video and employ recent developments in LLMs both as a source for program knowledge and as an aggregator and assessor of multimodal video information.
arXiv.org
Dehao Yuan, Furong Huang, Cornelia Fermüller, Yiannis Aloimonos
Given samples of a continuous object (e.g. a function), Hyper-Dimensional Function Encoding (HDFE) produces an explicit vector representation of the given object, invariant to the sample distribution and density. Sample distribution and density invariance enables HDFE to consistently encode continuous objects regardless of their sampling, and therefore allows neural networks to receive continuous objects as inputs for machine learning tasks such as classification and regression. HDFE does not require any training and is proved to map the object into an organized embedding space, which facilitates the training of the downstream tasks. In addition, the encoding is decodable, which enables neural networks to regress continuous objects by regressing their encodings. HDFE can be used as an interface for processing continuous objects.
arXiv.org
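A loose sketch of the sample-density-invariance idea: encode each (x, f(x)) sample with random projections, then bundle by averaging and normalizing, so that sparse and dense samplings of the same function land near the same vector. The projection scheme below is a hypothetical stand-in; in particular, the paper's iterative refinement, which buys invariance to the sample distribution as well, is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4096                                   # embedding dimensionality
Wx = rng.normal(size=D)                    # random projection for input
Wy = rng.normal(size=D)                    # random projection for value

def encode(xs, ys):
    # Bind each sample's position/value codes, bundle by averaging,
    # then normalize to unit length.
    phi = np.cos(np.outer(xs, Wx) + np.outer(ys, Wy))
    v = phi.mean(axis=0)
    return v / np.linalg.norm(v)

xs_a = rng.uniform(0, np.pi, 50)           # sparse sampling of sin
xs_b = rng.uniform(0, np.pi, 500)          # dense sampling of sin
same = encode(xs_a, np.sin(xs_a)) @ encode(xs_b, np.sin(xs_b))
diff = encode(xs_a, np.sin(xs_a)) @ encode(xs_b, np.cos(xs_b))
print(round(float(same), 2), round(float(diff), 2))
```

Encodings of the same function sampled at different densities stay closer to each other than to encodings of a different function, which is what lets a downstream network treat the vector as a proxy for the function itself.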
Amir-Hossein Shahidzadeh, Seong Jong Yoo, Pavan Mantripragada, Chahat Deep Singh, Cornelia Fermüller, Yiannis Aloimonos
Tactile exploration plays a crucial role in understanding object structures for fundamental robotics tasks such as grasping and manipulation. However, efficiently exploring such objects using tactile sensors is challenging, due to large-scale unknown environments and the limited sensing coverage of these sensors. Here, the authors present AcTExplore, an active tactile exploration method driven by reinforcement learning that automatically explores object surfaces in a limited number of steps. Through sufficient exploration, the algorithm incrementally collects tactile data and reconstructs 3D shapes of the objects, which can serve as a representation for higher-level downstream tasks. The method achieves an average of 95.97% IoU coverage on unseen YCB objects while being trained only on primitive shapes.
arXiv.org
Matthew S. Evanusa, Vaishnavi Patil, Michelle Girvan, Joel Goodman, Cornelia Fermüller, Yiannis Aloimonos
Robots are active agents that operate in dynamic scenarios with noisy sensors. Predictions based on these noisy sensor measurements often lead to errors and can be unreliable. To this end, roboticists have used fusion methods using multiple observations. Lately, neural networks have dominated the accuracy charts for perception-driven predictions for robotic decision-making and often lack uncertainty metrics associated with the predictions. Here, the authors present a mathematical formulation to obtain the heteroscedastic aleatoric uncertainty of any arbitrary distribution without prior knowledge about the data. The approach has no prior assumptions about the prediction labels and is agnostic to network architecture.
International Conference on Artificial Neural Networks (ICANN 2023) as compiled in International Conference on Artificial Neural Networks and Machine Learning by Springer
Nitin Sanket, Chahat Deep Singh, Cornelia Fermüller, Yiannis Aloimonos
Robots are active agents that operate in dynamic scenarios with noisy sensors. Predictions based on these noisy sensor measurements often lead to errors and can be unreliable. To this end, roboticists have used fusion methods using multiple observations. Lately, neural networks have dominated the accuracy charts for perception-driven predictions for robotic decision-making and often lack uncertainty metrics associated with the predictions. Here, the authors present a mathematical formulation to obtain the heteroscedastic aleatoric uncertainty of any arbitrary distribution without prior knowledge about the data. The approach has no prior assumptions about the prediction labels and is agnostic to network architecture.
Science Robotics
Daniel Deniz, Eduardo Ross, Cornelia Fermüller, Manuel Rodriguez-Alvarez, Francisco Barranco
Neuromorphic visual sensors are artificial retinas that output sequences of asynchronous events when brightness changes occur in the scene. These sensors offer many advantages including very high temporal resolution, no motion blur and smart data compression ideal for real-time processing. In this study, the authors introduce an event-based dataset on fine-grained manipulation actions and perform an experimental study on the use of transformers for action prediction with events. There is enormous interest in the fields of cognitive robotics and human-robot interaction on understanding and predicting human actions as early as possible. Early prediction allows anticipating complex stages for planning, enabling effective and real-time interaction. Their transformer network uses events to predict manipulation actions as they occur, using online inference.
arXiv.org
Chinmaya Devaraj, Cornelia Fermüller, Yiannis Aloimonos
GCN-based zero-shot learning approaches commonly use fixed input graphs representing external knowledge that usually comes from language. However, such input graphs fail to incorporate the visual domain nuances. The authors introduce a method to ground the external knowledge graph visually. The method is demonstrated on a novel concept of grouping actions according to a shared notion and shown to be of superior performance in zero-shot action recognition on two challenging human manipulation action datasets, the EPIC Kitchens dataset, and the Charades dataset. They further show that visually grounding the knowledge graph enhances the performance of GCNs when an adversarial attack corrupts the input graph.
Computer Vision Foundation workshop
Snehesh Shrestha, Ishan Tamrakar, Cornelia Fermüller, Yiannis Aloimonos
Haptic sensing can provide a new dimension to enhance people’s musical and cinematic experiences. However, designing a haptic pattern is neither intuitive nor trivial. Imagined haptic patterns tend to be different from experienced ones. As a result, researchers use simple step-curve patterns to create haptic stimuli. Here, the authors designed and developed an intuitive haptic pattern designer that lets users rapidly prototype creative patterns. The simple architecture, wireless connectivity, and easy-to-program communication protocol make it modular and easy to scale. In this demo, workshop participants can select from a library of haptic patterns and design new ones. They can feel the pattern as they make changes in the user interface.
arXiv.org
Neal Anwar, Chethan Parameshwara, Cornelia Fermüller, Yiannis Aloimonos
Hyperdimensional Computing (HDC) is an emerging neuroscience-inspired framework wherein data of various modalities can be represented uniformly in high-dimensional space as long, redundant holographic vectors. When equipped with the proper Vector Symbolic Architecture (VSA) and applied to neuromorphic hardware, HDC-based networks have been demonstrated to be capable of solving complex visual tasks with substantial energy efficiency gains and increased robustness to noise when compared to standard Artificial Neural Networks (ANNs). Here, the authors present a bipolar HD encoding mechanism designed for encoding spatiotemporal data: it captures the contours of DVS-generated time surfaces created by moving objects by fitting local surfaces to them, which are individually encoded into HD vectors and bundled into descriptive high-dimensional representations.
IEEE 2023 57th Annual Conference on Information Sciences and Systems (CISS)
Snehesh Shrestha, William Sentosatio, Huaishu Peng, Cornelia Fermüller, Yiannis Aloimonos
This work introduces FEVA, a video annotation tool with streamlined interaction techniques and a dynamic interface that makes labeling tasks easy and fast. FEVA focuses on speed, accuracy, and simplicity to make annotation quick, consistent, and straightforward.
arXiv.org
2022
Hussam Amrouch, Mohsen Imani, Xun Jiao, Yiannis Aloimonos, Cornelia Fermüller, Dehao Yuan, Dongning Ma, Hamza E. Barkam, Paul Genssler, Peter Sutor
Hyperdimensional Computing (HDC) is rapidly emerging as an attractive alternative to traditional deep learning algorithms. Despite the profound success of Deep Neural Networks (DNNs) in many domains, the amount of computational power and storage that they demand during training makes deploying them in edge devices very challenging if not infeasible. This, in turn, inevitably necessitates streaming the data from the edge to the cloud which raises serious concerns when it comes to availability, scalability, security, and privacy. Further, the nature of data that edge devices often receive from sensors is inherently noisy. However, DNN algorithms are very sensitive to noise, which makes accomplishing the required learning tasks with high accuracy immensely difficult. In this paper, we aim at providing a comprehensive overview of the latest advances in HDC.
2022 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)
Peter Sutor, Dehao Yuan, Douglas Summers-Stay, Cornelia Fermüller, Yiannis Aloimonos
The authors explore the notion of using binary hypervectors to directly encode the final, classifying output signals of neural networks in order to fuse differing networks together at the symbolic level.
arXiv.org
Matthew Evanusa, Snehesh Shrestha, Vaishnavi Patil, Cornelia Fermüller, Michelle Girvan & Yiannis Aloimonos
Echo State Networks are a class of recurrent neural networks that can learn to regress on or classify sequential data by keeping the recurrent component random and training only on a set of readout weights, which is of interest to the current edge computing and neuromorphic community. However, they have struggled to perform well on regression and classification tasks and therefore could not compete in performance with traditional RNNs, such as LSTM and GRU networks. To address this limitation, the authors have developed a hybrid network called Parallelized Deep Readout Echo State Network that combines the deep learning readout with a fast random recurrent component, with multiple ESNs computing in parallel.
SN Computer Science (Springer)
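The Echo State Network idea is illustrated below in NumPy: the recurrent weights stay fixed and random, and only a linear readout is trained, here by ridge regression on a one-step-ahead sine prediction task. The paper replaces this linear readout with a deep network and runs several reservoirs in parallel; this sketch keeps the single-reservoir, linear-readout baseline, with all sizes and scalings chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200                                      # reservoir size
Win = rng.uniform(-0.5, 0.5, (N, 1))         # input weights, fixed
W = rng.normal(size=(N, N))                  # recurrent weights, fixed
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius < 1

def reservoir_states(u):
    # run the input through the fixed random reservoir
    x, states = np.zeros(N), []
    for t in range(len(u)):
        x = np.tanh(Win[:, 0] * u[t] + W @ x)
        states.append(x.copy())
    return np.array(states)

u = np.sin(np.arange(500) * 0.1)             # toy signal
S = reservoir_states(u[:-1])
target = u[1:]                               # one-step-ahead prediction
# train only the readout, by ridge regression
Wout = np.linalg.solve(S.T @ S + 1e-6 * np.eye(N), S.T @ target)
pred = S @ Wout
print(round(float(np.abs(pred[100:] - target[100:]).max()), 4))
```

Because only the readout is trained, fitting is a single linear solve, which is what makes ESNs attractive for edge and neuromorphic settings.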
2021
Chethan M. Parameshwara, Simin Li, Cornelia Fermüller, Nitin J. Sanket, Matthew S. Evanusa, Yiannis Aloimonos
The researchers propose SpikeMS, the first deep encoder-decoder SNN architecture for the real-world large-scale problem of motion segmentation using the event-based DVS camera as input.
arXiv.org
2020
Matthew Evanusa, Cornelia Fermüller, Yiannis Aloimonos
The researchers present a novel Deep Reservoir Network for time series prediction and classification that learns through non-differentiable hidden reservoir layers using a biologically-inspired back propagation alternative. This alternative, called Direct Feedback Alignment, resembles global dopamine signal broadcasting in the brain. The researchers demonstrate its efficacy on two real-world multidimensional time series datasets.
arXiv.org
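Direct Feedback Alignment can be sketched in a few lines: the output error is broadcast to the hidden layer through a fixed random matrix instead of the transpose of the forward weights, so no gradient needs to flow backward through the network. The toy task, sizes, and learning rates below are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(256, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)  # XOR-like labels

W1 = rng.normal(0, 0.5, (2, 32))
W2 = rng.normal(0, 0.5, (32, 1))
B1 = rng.normal(0, 0.5, (1, 32))    # fixed random feedback matrix

def forward(X):
    h = np.tanh(X @ W1)
    return h, 1 / (1 + np.exp(-h @ W2))

_, out0 = forward(X)
loss0 = float(np.mean((out0 - y) ** 2))

for _ in range(3000):
    h, out = forward(X)
    e = out - y                      # output error
    # DFA: the hidden layer receives the error through the fixed random
    # B1 instead of through W2.T as ordinary backpropagation would.
    dh = (e @ B1) * (1 - h ** 2)
    W2 -= 0.5 * h.T @ e / len(X)
    W1 -= 0.5 * X.T @ dh / len(X)

_, out = forward(X)
loss = float(np.mean((out - y) ** 2))
print(round(loss0, 3), round(loss, 3))
```

The forward weights gradually align with the fixed feedback matrix, which is why training still reduces the loss despite the random error pathway, and which is what allows learning through non-differentiable reservoir layers.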
Matthew Evanusa, Cornelia Fermüller, Yiannis Aloimonos
The researchers show that a large, deep layered spiking neural network with dynamical, chaotic activity mimicking the mammalian cortex with biologically-inspired learning rules, such as STDP, is capable of encoding information from temporal data.
arXiv.org
Matthew Evanusa, Snehesh Shrestha, Michelle Girvan, Cornelia Fermüller, Yiannis Aloimonos
Demonstrates the use of a backpropagation hybrid mechanism for parallel reservoir computing with a meta-ring structure and its application to a real-world gesture recognition dataset. This mechanism can serve as an alternative to state-of-the-art recurrent neural networks such as LSTMs and GRUs.
arXiv.org
Eadom Dessalene, Michael Maynord, Chinmaya Devaraj, Cornelia Fermüller, Yiannis Aloimonos
Introduces Egocentric Object Manipulation Graphs (Ego-OMG): a novel representation for activity modeling and anticipation of near-future actions.
arXiv.org
Anton Mitrokhin, Peter Sutor, Douglas Summers-Stay, Cornelia Fermüller, Yiannis Aloimonos
By using hashing neural networks to produce binary vector representations of images, the authors show how hyperdimensional vectors can be constructed such that vector-symbolic inference arises naturally out of their output.
Frontiers in Robotics and AI
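The vector-symbolic inference the authors build on binary image codes can be illustrated with a generic bind/bundle/unbind round trip on random bipolar hypervectors. The roles, fillers, and codebook below are hypothetical; the paper's contribution is obtaining such vectors from hashing networks applied to images rather than drawing them at random.

```python
import numpy as np

rng = np.random.default_rng(7)
D = 10000
def hv():
    # random bipolar hypervector; near-orthogonal to all others w.h.p.
    return rng.choice([-1, 1], size=D)

COLOR, SHAPE = hv(), hv()            # role vectors (hypothetical)
RED, ROUND, APPLE = hv(), hv(), hv() # filler vectors (hypothetical)

# Encode a record by binding (elementwise product) roles to fillers
# and bundling (signed sum) the bound pairs into one vector.
record = np.sign(COLOR * RED + SHAPE * ROUND)

# Unbinding: multiply by a role vector to recover a noisy filler, then
# clean up against a codebook by similarity.
codebook = {"RED": RED, "ROUND": ROUND, "APPLE": APPLE}
query = record * COLOR
best = max(codebook, key=lambda k: int(codebook[k] @ query))
print(best)
```

Because binding and unbinding are the same cheap elementwise operation, symbolic queries like "what was bound to COLOR?" reduce to a multiply and a nearest-neighbor lookup.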
Anton Mitrokhin, Zhiyuan Hua, Cornelia Fermüller, Yiannis Aloimonos
Presents a Graph Convolutional neural network for the task of scene motion segmentation by a moving camera. Describes spatial and temporal features of event clouds, which provide cues for motion tracking and segmentation.
Computer Vision Foundation
Chethan M. Parameshwara, Nitin J. Sanket, Arjun Gupta, Cornelia Fermüller, Yiannis Aloimonos
A solution to multi-object motion segmentation that combines classical optimization methods with deep learning and does not require prior knowledge of the 3D motion or of the number and structure of the objects.
arXiv.org
John Kanu, Eadom Dessalene, Xiaomin Lin, Cornelia Fermüller, Yiannis Aloimonos
A novel robotic agent framework for learning to perform temporally extended tasks using spatial reasoning in a deep reinforcement learning framework, by sequentially imagining visual goals and choosing appropriate actions to fulfill imagined goals.
arXiv.org
2022
L. A. Prashanth, Michael Fu
Prashanth and Fu are the editors of this volume in the Foundations and Trends® in Machine Learning series published by now publishers Inc. This publication includes sections on Markov decision processes (MDPs), risk measures, background on policy evaluation and gradient estimation, policy gradient templates for risk-sensitive RL, MDPs with risk as the constraint, and MDPs with risk as the objective.
Foundations and Trends® in Machine Learning series
2019
Michael Fu
The deep neural networks of AlphaGo and AlphaZero can be traced back to an adaptive multistage sampling (AMS) simulation-based algorithm for Markov decision processes published by H. S. Chang, Michael C. Fu, and Steven I. Marcus in Operations Research in 2005. Here, Fu retraces this history, discusses the impact of the initial research, and suggests enhancements for the future.
Asian-Pacific Journal of Operational Research
2023
Sangeeth Balakrishnan, Francis VanGessel, Brian Barnes, Ruth Doherty, William Wilson, Zois Boukouvalas, Mark Fuge, Peter Chung
Data-driven machine learning techniques can be useful for the rapid evaluation of material properties in extreme environments, particularly in cases where direct access to the materials is not possible. Such problems occur in high-throughput material screening and material design approaches where many candidates may not be amenable to direct experimental examination. In this paper, the authors perform an exhaustive examination of the applicability of machine learning for the estimation of isothermal shock compression properties, specifically the shock Hugoniot, for diverse material systems. A comprehensive analysis is conducted where effects of scarce data, variances in source data, feature choices, and model choices are systematically explored. New modeling strategies are introduced based on feature engineering, including a feature augmentation approach, to mitigate the effects of scarce data. The findings show significant promise of machine learning techniques for design and discovery of materials suited for shock compression applications.
Journal of Applied Physics
2022
Jesse Hearn, Sangeeth Balakrishnan, Francis VanGessel, Zois Boukouvalas, Brian Barnes, Ian Michel-Tyler, Ruth Doherty, William Wilson, William Durant, Mark Fuge, Peter Chung
High energy density polymeric binders are a class of polymer materials that can be used in lieu of inert binders in high energy density mixtures. By using higher energy binders, the overall internal energy of the mixture can be designed intentionally and proactively. This paper presents recent efforts to develop a machine learning approach to learn, predict, and design novel energetic polymers. The scarcity of data available for energetic polymers is a particular challenge that the authors overcome through transfer learning techniques. Generally speaking, transfer learning is a class of machine learning algorithms that assists the learning of general trends within one dataset using alternate datasets. In their work, the researchers use a feature transfer learning approach based on low-level physicochemical data that may be obtained for any molecule.
APS 22nd Biennial Conference of the APS Topical Group on Shock Compression of Condensed Matter
Connor O'Ryan, Francis VanGessel, Zois Boukouvalas, Mark Fuge, Peter Chung, Ian Michel-Tyler, Ruth Doherty, William Wilson, Kevin Hayes
Within the past two decades machine-learning algorithms have seen diverse development and implementation in a variety of domains, including those related to shock compression. These developments include advances in computationally assisted synthesis planning and natural language processing for text documents in the context of chemical energy. The objective of this work is to explore the intersection of these emergent research capabilities and develop automatable approaches for extracting synthesis information for chemical storage from text documents to create novel representations via knowledge graphs.
APS 22nd Biennial Conference of the APS Topical Group on Shock Compression of Condensed Matter
Sangeeth Balakrishnan, Francis VanGessel, Zois Boukouvalas, Brian Barnes, Mark Fuge, Peter Chung
Deep learning has shown a high potential for generating molecules with desired properties. But generative modeling can often lead to novel, speculative molecules whose synthesis routes are not obvious. Moreover, the cost and time required to calculate or measure high energy properties have restricted the available data set sizes for this class of materials, thereby limiting the usefulness of deep learning-based methods. As a solution to this problem, the authors propose a deep learning-based method that fuses data from multiple molecule classes, effectively enabling the learning and designing of high energy molecules with the assistance of data for general organic molecules, which tend to be available in massive databases.
APS 22nd Biennial Conference of the APS Topical Group on Shock Compression of Condensed Matter
Allen Garcia, Connor O'Ryan, Gaurav Kumar, Zois Boukouvalas, Mark Fuge, Peter Chung
This paper tests whether statistical relationships exist between the language used to discuss energetic materials and their fundamental physicochemical properties. A surprising and remarkable degree of statistical equivalence is found, in some cases showing >90% confidence levels. This work posits a new means for using automated machine-assisted approaches to learn from technical documents and facilitate the search and discovery of new materials.
APS 22nd Biennial Conference of the APS Topical Group on Shock Compression of Condensed Matter
2021
Qiuyi Chen, Phillip Pope, Mark Fuge
The manifold hypothesis forms a pillar of many modern machine learning techniques. Within the context of design, it proposes that valid designs reside on low dimensional manifolds in the high dimensional design spaces. Here, the authors present the optimal-transport-based sibling of their previous work, BézierGAN, that surpasses its predecessor in terms of both manifold approximating precision and learning speed. They also provide methodology that helps determine the intrinsic dimension of the design manifold beforehand.
AIAA SciTech Forum 2022
Arthur Drake, Qiuyi Chen, Mark Fuge
Complex engineering problems such as compressor blade optimization often require large amounts of data and computational resources to produce optimal designs because traditional approaches only operate in the original high-dimensional design space. To mitigate this issue, the authors develop a simple yet effective autoencoder architecture that operates on a prior ε-frontier from examples of past optimization trajectories. This paper focuses on using such non-linear methods to maximize dimensionality reduction on an easily verifiable synthetic dataset, providing a faster alternative to high-fidelity simulation techniques.
aaas.org
2020
Xiaolong Liu, Seda Aslan, Rachel Hess, Paige Mass, Laura Olivieri, Yue-Hin Loke, Narutoshi Hibino, Mark Fuge, Axel Krieger
Develops a computational framework for automatically designing optimal shapes of patient-specific TEVGs for aorta surgery.
42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society
Wei Chen, Mark Fuge
The authors propose a Bayesian optimization approach that only needs to specify an initial search space that does not necessarily include the global optimum, and expands the search space when necessary.
arXiv.org
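The search-space expansion idea can be sketched independently of the Bayesian machinery: optimize within the current box and, whenever the incumbent presses against a boundary, grow the box in that direction. In this toy version, uniform random sampling stands in for the paper's Gaussian-process acquisition step, and the margin and growth factors are hypothetical.

```python
import numpy as np

def optimize_expanding(f, lo, hi, rounds=20, n=64, margin=0.1, seed=0):
    # Minimize f over an initial box [lo, hi] that may exclude the
    # global optimum; expand the box when the best point found so far
    # sits within `margin` of a boundary.
    rng = np.random.default_rng(seed)
    best_x, best_y = None, np.inf
    for _ in range(rounds):
        xs = rng.uniform(lo, hi, size=n)   # stand-in for BO acquisition
        ys = f(xs)
        i = ys.argmin()
        if ys[i] < best_y:
            best_x, best_y = xs[i], ys[i]
        width = hi - lo
        if best_x > hi - margin * width:   # incumbent pressing the edge:
            hi += 0.5 * width              # expand to the right
        if best_x < lo + margin * width:
            lo -= 0.5 * width              # or to the left
    return best_x, best_y

# global optimum at x = 7 lies outside the initial box [0, 2]
x, y = optimize_expanding(lambda x: (x - 7.0) ** 2, 0.0, 2.0)
print(round(float(x), 2))
```

The box grows geometrically while the optimum lies outside it and stabilizes once the optimum is interior, which mirrors the behavior the authors seek without requiring the user to guess bounds containing the optimum up front.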
2019
Eliot Rudnick-Cohen, Shapour Azarm and Jeffrey Herrmann
The paper presents Scenario Generation and Local Refinement Optimization (SGLRO), a new approach for solving non-convex robust optimization problems.
Journal of Mechanical Design
2023
Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha
Continued pre-training (CP) offers multiple advantages, like target domain adaptation and the potential to exploit the continuous stream of unlabeled data available online. However, continued pre-training on out-of-domain distributions often leads to catastrophic forgetting of previously acquired knowledge, leading to sub-optimal ASR performance. This paper presents FusDom, a simple and novel methodology for SSL-based continued pre-training.
arXiv.org
Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha
Continued self-supervised (SSL) pre-training for adapting existing SSL models to the target domain has shown to be extremely effective for low-resource Automatic Speech Recognition (ASR). This paper proposes Stable Distillation, a simple and novel approach for SSL-based continued pre-training that boosts ASR performance in the target domain where both labeled and unlabeled data are limited.
arXiv.org
Puneet Mathur, Zhe Liu, Ke Li, Yingyi Ma, Gil Keren, Zeeshan Ahmed, Dinesh Manocha, Xuedong Zhang
PersonaLM is a domain-distributed, span-aggregated K-nearest N-gram retrieval augmentation approach that improves language modeling for Automatic Speech Recognition personalization.
Findings of the Association for Computational Linguistics: EMNLP 2023
Anton Ratnarajah, Sreyan Ghosh, Sonal Kumar, Purva Chiniya, Dinesh Manocha
The choice of input text prompt plays a critical role in the performance of Vision-Language Pretrained (VLP) models such as CLIP. Apollo is a unified multi-modal approach that combines Adapter and Prompt learning for Vision-Language models. The method is designed to substantially improve the generalization capabilities of VLP models when they are fine-tuned in a few-shot setting.
arXiv.org
Anton Ratnarajah, Sreyan Ghosh, Sonal Kumar, Purva Chiniya, Dinesh Manocha
Accurate estimation of Room Impulse Response (RIR), which captures an environment’s acoustic properties, is important for speech processing and AR/VR applications. The authors propose AV-RIR, a novel multi-modal multi-task learning approach to accurately estimate the RIR from a given reverberant speech signal and the visual cues of its corresponding environment.
arXiv.org
Divya Kothandaraman, Tianyi Zhou, Ming Lin, Dinesh Manocha
AerialBooth synthesizes the aerial view from a single input image using its text description. The authors leverage the pretrained text-to-2D image stable diffusion model as prior knowledge of the 3D world. The model is finetuned in two steps to optimize for the text embedding and the UNet that reconstruct the input image and its inverse perspective mapping respectively. The inverse perspective mapping creates variance within the text-image space of the diffusion model, while providing weak guidance for aerial view synthesis. AerialBooth achieves the best viewpoint-fidelity trade-off through quantitative evaluation on 7 metrics analyzing viewpoint and fidelity w.r.t. the input image.
arXiv.org
Divya Kothandaraman, Tianyi Zhou, Ming Lin, Dinesh Manocha
Aerial Diffusion generates aerial views from a single ground-view image using text guidance. Aerial Diffusion leverages a pretrained text-image diffusion model for prior knowledge. The authors address two main challenges: the domain gap between the ground view and the aerial view, and the two views being far apart in the text-image embedding manifold. The approach uses a homography inspired by inverse perspective mapping prior to finetuning the pretrained diffusion model. Aerial Diffusion is the first approach that performs single image ground-to-aerial translation in an unsupervised manner.
SIGGRAPH Asia 2023 Technical Communications
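The inverse-perspective-mapping (IPM) homography mentioned in the Aerial Diffusion summary above can be illustrated with a small sketch. The matrix construction below is a generic textbook IPM built from camera focal length, height, and pitch — illustrative values and a simplified decomposition, not the paper's actual homography:

```python
import numpy as np

def ipm_homography(focal: float, height: float, pitch: float) -> np.ndarray:
    """Build a simple 3x3 homography mapping ground-plane image pixels
    toward a synthetic top-down view. All parameters are illustrative."""
    c, s = np.cos(pitch), np.sin(pitch)
    # Intrinsics (square pixels, principal point at the origin for brevity).
    K = np.array([[focal, 0.0, 0.0],
                  [0.0, focal, 0.0],
                  [0.0, 0.0, 1.0]])
    # Pitch rotation about the camera x-axis.
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, c, -s],
                  [0.0, s, c]])
    # Ground-plane projection scaled by the camera height.
    T = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, -1.0 / height, 1.0]])
    return K @ R @ T @ np.linalg.inv(K)

def warp_point(H: np.ndarray, x: float, y: float) -> tuple:
    """Apply the homography to one pixel in homogeneous coordinates."""
    p = H @ np.array([x, y, 1.0])
    return (p[0] / p[2], p[1] / p[2])
```

In the paper this kind of warp serves only as a weak prior; the diffusion model's finetuning does the actual view synthesis.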
Sreyan Ghosh, Ashish Seth, Sonal Kumar, Utkarsh Tyagi, Chandra Kiran Reddy Evuru, S. Ramaneswaran, S. Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha
Audio-language models (ALMs) trained using a contrastive approach (e.g., CLAP) that learns a shared representation between audio and language modalities have improved performance in many downstream applications, including zero-shot audio classification, audio retrieval, etc. However, the ability of these models to effectively perform compositional reasoning remains largely unexplored and necessitates additional research. Here, the authors propose CompA, a collection of two expert-annotated benchmarks, with a majority of real-world audio samples, to evaluate compositional reasoning in ALMs. The proposed CompA-order evaluates how well an ALM understands the order or occurrence of acoustic events in audio, and CompA-attribute evaluates attribute binding of acoustic events.
arXiv.org
Biao Jia, Dinesh Manocha
Developing proficient brush manipulation capabilities in real-world scenarios is a complex and challenging endeavor, with wide-ranging applications in fields such as art, robotics, and digital design. Here, the authors introduce an approach designed to bridge the gap between simulated environments and real-world brush manipulation. The framework leverages behavior cloning and reinforcement learning to train a painting agent, seamlessly integrating it into both virtual and real-world environments.
arXiv.org
Sreyan Ghosh, Sonal Kumar, Chandra Kiran Reddy Evuru, Ramani Duraiswami, Dinesh Manocha
RECAP is an effective audio captioning system that generates captions conditioned on an input audio clip and on captions similar to the audio retrieved from a datastore. Additionally, the proposed method can transfer to any domain without the need for any additional fine-tuning.
arXiv.org
Puneet Mathur, Rajiv Jain, Jiuxiang Gu, Franck Dernoncourt, Dinesh Manocha, Vlad Morariu
Proposes a new task of language-guided localized document editing, where the user provides a document and an open vocabulary editing request, and the intelligent system produces a command that can be used to automate edits in real-world document editing software.
37th AAAI Conference on Artificial Intelligence (AAAI-23)
Dinesh Manocha, Zherong Pan
SIGGRAPH 2017 (ACM SIGGRAPH Information and Artifacts History website)
Sanjoy Chowdhury, Sreyan Ghosh, Subhrajyoti Dasgupta, Anton Ratnarajah, Utkarsh Tyagi, Dinesh Manocha
AdVerb is a novel audio-visual dereverberation framework that uses visual cues in addition to the reverberant sound to estimate clean audio. Although audio-only dereverberation is a well-studied problem, the researchers' approach incorporates the complementary visual modality to perform audio dereverberation.
arXiv.org
Sreyan Ghosh, Chandra Kiran Reddy Evuru, Sonal Kumar, Utkarsh Tyagi, Sakshi Singh, Sanjoy Chowdhury, Dinesh Manocha
Neural image classifiers can often learn to make predictions by overly relying on non-predictive features that are spuriously correlated with the class labels in the training data. This leads to poor performance in real-world atypical scenarios where such features are absent. Supplementing the training dataset with images without such spurious features can aid robust learning against spurious correlations via better generalization. This paper presents ASPIRE (Language-guided data Augmentation for SPurIous correlation REmoval), a simple yet effective solution for expanding the training dataset with synthetic images without spurious features.
arXiv.org
Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Dinesh Manocha, Furong Huang, Mengdi Wang
The authors consider a bilevel optimization problem and connect it to a principal-agent framework, where the principal specifies the broader goals and constraints of the system at the upper level and the agent solves a Markov Decision Process (MDP) at the lower level. They propose Principal driven Policy Alignment via Bilevel RL (PPA-BRL), which efficiently aligns the policy of the agent with the principal’s goals.
Interactive Learning with Implicit Human Feedback Workshop; ICML 2023
Puneet Mathur, Mihir Goyal, Ramit Sawhney, Ritik Mathur, Jochen L. Leidner, Franck Dernoncourt, Dinesh Manocha
Financial prediction is complex due to the stochastic nature of the stock market. Semi-structured financial documents present comprehensive financial data in tabular formats, and can often contain hundreds of tables' worth of technical analysis along with textual discussion of corporate history, management analysis, compliance, and risks. Existing research focuses on the textual and audio modalities of financial disclosures from company conference calls to forecast stock volatility and price movement, but ignores the rich tabular data available in financial reports. Moreover, the economic realm is still plagued with a severe under-representation of various communities spanning diverse demographics, gender, and native speakers. In this work, the authors show that combining tabular data from financial semi-structured documents with text transcripts and audio recordings not only improves stock volatility and price movement prediction by 5-12% but also reduces gender bias caused by audio-based neural networks by over 30%.
Findings of the Association for Computational Linguistics (EMNLP 2022)
Puneet Mathur, Rajiv Jain, Ashutosh Mehra, Jiuxiang Gu, Franck Dernoncourt, Anandhavelu N, Quan Tran, Verena Kaynig-Fittkau, Ani Nenkova, Dinesh Manocha, Vlad I. Morariu
LayerDoc is an approach that uses visual features, textual semantics, and spatial coordinates along with constraint inference to extract the hierarchical layout structure of documents in a bottom-up, layer-wise fashion.
Winter Conference on Applications of Computer Vision (WACV) 2023, The Computer Vision Foundation
Divya Kothandaraman, Sumit Shekhar, Abhilasha Sancheti, Manoj Ghuhan, Tripti Shukla, Dinesh Manocha
SALAD is a method for the challenging vision task of adapting a pre-trained “source” domain network to a “target” domain, with a small budget for annotation in the “target” domain and a shift in the label space.
Winter Conference on Applications of Computer Vision (WACV) 2023, The Computer Vision Foundation
Chunxiao Cao, Zili An, Zhong Ren, Dinesh Manocha, Kun Zhao
This work introduces the bidirectional edge diffraction response function (BEDRF), a new approach to model wave diffraction around edges with path tracing. The diffraction part of the wave is expressed as an integration on path space, and the wave-edge interaction is expressed using only localized information around points on the edge, similar to a bidirectional scattering distribution function (BSDF) for visual rendering. For an infinite single wedge, the authors' model generates the same result as the analytic solution. This approach can be easily integrated into interactive geometric sound propagation algorithms that use path tracing to compute specular and diffuse reflections.
arXiv.org
Sreyan Ghosh, Utkarsh Tyagi, Manan Suri, Sonal Kumar, S. Ramaneswaran, Dinesh Manocha
Complex Named Entity Recognition (NER) is the task of detecting linguistically complex named entities in low-context text. In this paper, the authors present ACLM (Attention-map aware keyword selection for Conditional Language Model fine-tuning), a novel data augmentation approach, based on conditional generation, to address the data scarcity problem in low-resource complex NER. ACLM alleviates the context-entity mismatch issue, a problem existing NER data augmentation techniques suffer from, which often generates incoherent augmentations by placing complex named entities in the wrong context. ACLM builds on BART and is optimized on a novel text reconstruction or de-noising task. The authors use selective masking (aided by attention maps) to retain the named entities and certain keywords in the input sentence that provide contextually relevant additional knowledge or hints about the named entities. Compared with other data augmentation strategies, ACLM can generate more diverse and coherent augmentations that preserve the true word sense of complex entities in the sentence. The authors demonstrate the effectiveness of ACLM both qualitatively and quantitatively on monolingual, cross-lingual, and multilingual complex NER across various low-resource settings, where ACLM outperforms all neural baselines by a significant margin (1%-36%). In addition, they demonstrate the application of ACLM to other domains that suffer from data scarcity (e.g., biomedical); in practice, ACLM generates more effective and factual augmentations for these domains than prior methods.
arXiv.org
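The selective-masking step described in the ACLM summary above can be illustrated roughly as follows: keep named entities plus the highest-attention keywords, and mask everything else. The tokens, entity flags, and attention scores in this sketch are made-up stand-ins, not the paper's actual pipeline:

```python
def selective_mask(tokens, entity_flags, attn_scores, keep_k=2, mask_token="[MASK]"):
    """Keep named entities plus the keep_k highest-attention keywords;
    replace every other token with the mask token.

    tokens       : list of str
    entity_flags : list of bool, True where the token is part of a named entity
    attn_scores  : list of float, per-token attention-map scores (stand-ins)
    """
    # Rank non-entity tokens by attention score and keep the top keep_k.
    candidates = sorted(
        (i for i, is_ent in enumerate(entity_flags) if not is_ent),
        key=lambda i: attn_scores[i],
        reverse=True,
    )
    keep = set(candidates[:keep_k])
    return [
        tok if entity_flags[i] or i in keep else mask_token
        for i, tok in enumerate(tokens)
    ]
```

In ACLM, a BART-style model is then trained to reconstruct the original sentence from such partially masked input, yielding coherent augmentations around the preserved entities.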
Xijun Wang, Ruiqi Xian, Tianrui Guan, Dinesh Manocha
A new general learning approach for action recognition, Prompt Learning for Action Recognition (PLAR), which leverages the strengths of prompt learning to guide the learning process.
arXiv.org
Sreyan Ghosh, Sonal Kumar, Utkarsh Tyagi, Dinesh Manocha
Biomedical Named Entity Recognition (BioNER) is the fundamental task of identifying named entities from biomedical text. However, BioNER suffers from severe data scarcity and lacks high-quality labeled data due to the highly specialized and expert knowledge required for annotation. Here, the authors present BioAug, a novel data augmentation framework for low-resource BioNER. BioAug, built on BART, is trained to solve a novel text reconstruction task based on selective masking and knowledge augmentation.
arXiv.org
Elizabeth Childs, Ferzam Mohammad, Logan Stevens, Hugo Burbelo, Amanuel Awoke, Nicholas Rewkowski, Dinesh Manocha
Although distance learning presents a number of interesting educational advantages as compared to in-person instruction, it is not without its downsides. Here the authors first assess the educational challenges presented by distance learning as a whole and identify 4 main challenges that distance learning currently presents as compared to in-person instruction: the lack of social interaction, reduced student engagement and focus, reduced comprehension and information retention, and the lack of flexible and customizable instructor resources. After assessing each of these challenges in-depth, they examine how AR/VR technologies might serve to address each challenge along with their current shortcomings, and finally outline the further research that is required to fully understand the potential of AR/VR technologies as they apply to distance learning.
IEEE Transactions on Visualization and Computer Graphics
Anton Ratnarajah, Dinesh Manocha
An end-to-end binaural audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications. The authors propose a novel neural-network-based binaural sound propagation method to generate acoustic effects for 3D models of real environments. Any clean audio or dry audio can be convolved with the generated acoustic effects to render audio corresponding to the real environment. They also include a graph neural network that uses both the material and the topology information of the 3D scenes and generates a scene latent vector, and a conditional generative adversarial network (CGAN) to generate acoustic effects from the scene latent vector.
Harvard University Astrophysics Data System
Souradip Chakraborty, Amrit Singh Bedi, Sicheng Zhu, Bang An, Dinesh Manocha, Furong Huang
This work focuses on the challenge of distinguishing text generated by Large Language Models (LLMs) from text written by humans, an ability of utmost importance in numerous applications; whether such discernment is possible has been a subject of debate within the community. The central question is whether AI-generated text can be detected and, if so, when. The authors provide evidence that it should almost always be possible to detect AI-generated text unless the distributions of human- and machine-generated text are exactly the same over the entire support. This observation follows from standard results in information theory and relies on the fact that as machine text becomes more human-like, more samples are needed to detect it. The authors derive a precise sample complexity bound for AI-generated text detection, which specifies how many samples are needed; this raises the additional challenge of designing detectors that take in n samples rather than just one, which is the scope of future research on this topic. Empirical evaluations support the claim that better detectors exist, demonstrating that AI-generated text detection should be achievable in the majority of scenarios, and the results emphasize the importance of continued research in this area.
arXiv.org
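The sample-complexity claim in the entry above can be made concrete with a standard hypothesis-testing argument. The sketch below is our paraphrase via Hoeffding's inequality, not the paper's exact statement: let $m$ and $h$ be the machine- and human-text distributions, and let $A$ be a set attaining their total variation distance $\mathrm{TV}(m,h) = m(A) - h(A)$. A detector that counts how many of $n$ i.i.d. samples land in $A$ and thresholds at the midpoint $\tfrac{1}{2}\big(m(A)+h(A)\big)$ satisfies

```latex
\[
\Pr[\text{error}] \;\le\; 2\exp\!\Big(-\tfrac{n\,\mathrm{TV}(m,h)^2}{2}\Big),
\qquad\text{so}\qquad
n \;\ge\; \frac{2}{\mathrm{TV}(m,h)^2}\,\log\frac{2}{\delta}
\]
```

samples suffice for error at most $\delta$. This captures the qualitative message: as machine text approaches human text, $\mathrm{TV}(m,h)\to 0$ and the required number of samples grows, but detection remains possible whenever the distributions differ.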
Jaehoon Choi, Dongki Jung, Taejae Lee, Sangwook Kim, Youngdong Jung, Dinesh Manocha, Donghwan Lee
A new pipeline for acquiring a textured mesh in the wild with a single smartphone which offers access to images, depth maps, and valid poses. This method first introduces an RGBD-aided structure from motion, which can yield filtered depth maps and refines camera poses guided by corresponding depth. Then, the authors adopt the neural implicit surface reconstruction method, which allows for high-quality mesh and develops a new training process for applying a regularization provided by classical multi-view stereo methods. Moreover, they apply a differentiable rendering to fine-tune incomplete texture maps and generate textures which are perceptually closer to the original scene. The pipeline can be applied to any common objects in the real world without the need for either in-the-lab environments or accurate mask images.
arXiv.org
Divya Kothandaraman, Tianyi Zhou, Ming Lin, Dinesh Manocha
This is a novel method for generating aerial views from a single ground-view image using text guidance. Aerial Diffusion leverages a pretrained text-image diffusion model for prior knowledge. The authors address two main challenges corresponding to domain gap between the ground-view and the aerial view and the two views being far apart in the text-image embedding manifold. This approach uses a homography inspired by inverse perspective mapping prior to finetuning the pretrained diffusion model.
arXiv.org
Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha
UnFuSeD is a novel approach to leverage self-supervised learning and reduce the need for large amounts of labeled data for audio classification. Unlike prior works, which directly fine-tune a self-supervised pre-trained encoder on a target dataset, the researchers use the encoder to generate pseudo-labels for unsupervised fine-tuning before the actual fine-tuning step.
arXiv.org
Pooja Guhan, Saayan Mitra, Somdeb Sarkhel, Stefano Petrangeli, Ritwik Sinha, Viswanathan Swaminathan, Aniket Bera, Dinesh Manocha
Content personalization is one of the foundations of today’s digital marketing. Often the same image needs to be adapted for different design schemes for content that is created for different occasions, geographic locations or other aspects of the target population. The authors present a novel reinforcement learning (RL) based method for automatically stylizing images to complement the design scheme of media, e.g., interactive websites, apps, or posters. The approach considers attributes related to the design of the media and adapts the style of the input image to match the context.
2022 IEEE International Symposium on Multimedia
Sreyan Ghosh, Manan Suri, Purva Chiniya, Utkarsh Tyagi, Sonal Kumar, Dinesh Manocha
CoSyn is a user- and conversational-context synergized network for detecting implicit hate speech in online conversation trees.
arXiv.org
James Mullen, Dinesh Manocha
PACE is a novel method for modifying motion-captured virtual agents to interact with and move throughout dense, cluttered 3D scenes. The approach changes a given motion sequence of a virtual agent as needed to adjust to the obstacles and objects in the environment.
IEEE Transactions on Visualization and Computer Graphics
Anton Jeran Ratnarajah, Dinesh Manocha
An end-to-end binaural impulse response (BIR) generator to produce plausible sounds in real time for real-world models. This approach uses a novel neural-network-based BIR generator (Scene2BIR) for the reconstructed 3D model.
arXiv.org
2022
Qingyang Tan, Yi Zhou, Tuanfeng Wang, Duygu Ceylan, Xin Sun, Dinesh Manocha
Despite recent success, deep learning-based methods for predicting 3D garment deformation under body motion suffer from interpenetration problems between the garment and the body. To address this problem, this paper proposes a novel collision handling neural network layer called Repulsive Force Unit (ReFU). Based on the signed distance function (SDF) of the underlying body and the current garment vertex positions, ReFU predicts the per-vertex offsets that push any interpenetrating vertex to a collision-free configuration while preserving the fine geometric details.
2022 European Conference on Computer Vision (ECCV 2022)
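The core idea of ReFU in the entry above — pushing interpenetrating garment vertices out along the body SDF gradient — can be sketched geometrically. In the paper the push is predicted by a trained network layer; here it is replaced by a fixed rule for illustration:

```python
import numpy as np

def refu_offsets(vertices, sdf, sdf_grad, margin=0.0):
    """Per-vertex repulsive offsets in the spirit of ReFU.

    vertices : (N, 3) garment vertex positions
    sdf      : callable, signed distance to the body (negative = inside)
    sdf_grad : callable, gradient of the SDF (points away from the body)

    A vertex with sdf(v) < margin is pushed along the SDF gradient just far
    enough to reach the margin; collision-free vertices are left untouched.
    """
    offsets = np.zeros_like(vertices)
    for i, v in enumerate(vertices):
        d = sdf(v)
        if d < margin:  # interpenetrating (or inside the safety margin)
            g = sdf_grad(v)
            g = g / (np.linalg.norm(g) + 1e-9)
            offsets[i] = (margin - d) * g  # push out to the margin
    return offsets
```

Because only colliding vertices receive a nonzero offset, fine geometric detail elsewhere on the garment is preserved, which mirrors the design goal stated in the summary.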
Xutong Jin, Sheng Li, Guoping Wang, Dinesh Manocha
A novel learning-based modal sound synthesis approach that includes a mixed vibration solver for modal analysis and a radiation network for acoustic transfer.
ACM Transactions on Graphics
Trisha Mittal, Ritwik Sinha, Viswanathan Swaminathan, John Collomosse, Dinesh Manocha
As tools for content editing mature, and AI-based algorithms for synthesizing media grow, the presence of manipulated content across online media is increasing. This causes the spread of misinformation, creating a greater need to distinguish between "real" and "manipulated" content. The authors present VideoSham, a dataset consisting of 826 videos (413 real and 413 manipulated). Many of the existing deepfake datasets focus exclusively on two types of facial manipulations -- swapping with a different subject's face or altering the existing face. VideoSham, on the other hand, contains more diverse, context-rich, and human-centric, high-resolution videos manipulated using a combination of 6 different spatial and temporal attacks. The researchers' analysis shows that state-of-the-art manipulation detection algorithms only work for a few specific attacks and do not scale well on VideoSham. They perform a user study on Amazon Mechanical Turk with 1200 participants to understand if they can differentiate between the real and manipulated videos in VideoSham. Finally, they dig deeper into the strengths and weaknesses of performances by humans and SOTA-algorithms to identify gaps that need to be filled with better AI algorithms.
arXiv.org, and accepted to WACV2023 - Workshop on Manipulation, Adversarial, and Presentation Attacks in Biometrics
Jiangbei Yue, Dinesh Manocha, He Wang
Trajectory prediction has been widely pursued in many fields, and many model-based and model-free methods have been explored. The former include rule-based, geometric, or optimization-based models, and the latter are mainly comprised of deep learning approaches. This paper proposes a method combining both methodologies based on a new Neural Differential Equation model, Neural Social Physics (NSP). NSP is a deep neural network within which is an explicit physics model with learnable parameters. The explicit physics model serves as a strong inductive bias in modeling pedestrian behaviors, while the rest of the network provides a strong data-fitting capability in terms of system parameter estimation and dynamics stochasticity modeling.
arXiv.org
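A minimal flavor of the explicit physics component in NSP is a social-force step whose coefficients play the role of the learnable parameters. The sketch below uses fixed illustrative values and omits the neural-network parts that NSP couples with the physics:

```python
import numpy as np

def social_force_step(pos, vel, goal, others, dt=0.1,
                      tau=0.5, v_desired=1.3, a_rep=2.0, b_rep=0.5):
    """One explicit-Euler step of a simplified social-force model.

    tau, v_desired, a_rep, b_rep stand in for the learnable physics
    parameters in NSP; here they are fixed illustrative values.
    """
    # Goal attraction: relax velocity toward the desired velocity.
    to_goal = goal - pos
    desired = v_desired * to_goal / (np.linalg.norm(to_goal) + 1e-9)
    force = (desired - vel) / tau
    # Pedestrian repulsion: exponentially decaying push from each neighbor.
    for o in others:
        diff = pos - o
        dist = np.linalg.norm(diff) + 1e-9
        force += a_rep * np.exp(-dist / b_rep) * diff / dist
    vel = vel + dt * force
    pos = pos + dt * vel
    return pos, vel
```

In NSP, terms like these form the inductive bias, while the surrounding network estimates the parameters per scene and models the stochasticity of the dynamics.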
Geonsun Lee, Jennifer Healey, Dinesh Manocha
The paper presents VRDoc tools, designed to facilitate a better reading experience in virtual reality, from automating the selection and positioning of document windows to magnifying text for readability to allowing gaze-based navigation of longer documents. VRDoc provides users a set of three gaze-based interactions that improve users’ reading experience in a virtual environment: Gaze Select-and-Snap, Gaze MagGlass, and Gaze Scroll. The authors evaluate the results and observe considerable improvement over existing interaction methods.
arXiv.org
Puneet Mathur, Atula Neerkaje, Malika Chhibber, Ramit Sawhney, Fuming Guo, Franck Dernoncourt, Sanghamitra Dutta, Dinesh Manocha
Risk prediction and price movement classification are essential tasks in financial markets. Monetary policy calls (MPC) provide important insights into the actions taken by a country's central bank on economic goals related to inflation, employment, prices, and interest rates. Analyzing visual, vocal, and textual cues from MPC calls can help analysts and policymakers evaluate the economic risks and make sound investment decisions. To aid the analysis of MPC calls, the authors curate the Monopoly dataset, a collection of public conference call videos along with their corresponding audio recordings and text transcripts released by six international banks between 2009 and 2022. This dataset is the first attempt to explore the benefits of visual cues in addition to audio and textual signals for financial prediction tasks. The researchers introduce MPCNet, a competitive baseline architecture that takes advantage of the cross-modal transformer blocks and modality-specific attention fusion to forecast the financial risk and price movement associated with the MPC calls.
Proceedings of the 30th ACM International Conference on Multimedia
Xiao-Rui Chen, Min Tang, Cheng Li, Dinesh Manocha, Ruo-Feng Tong
The algorithm BADF (Bounding Volume Hierarchy Based Adaptive Distance Fields) accelerates the construction of ADFs (adaptive distance fields) of rigid and deformable models on graphics processing units. The authors' approach is based on constructing a bounding volume hierarchy (BVH) and that hierarchy is used to generate an octree-based ADF.
Journal of Computer Science and Technology
Shiguang Liu (Tianjin University, China), Dinesh Manocha
This book gives a broad overview of research on sound simulation driven by a variety of applications. Vibrating objects produce sound, which then propagates through a medium such as air or water before finally being heard by a listener. As a crucial sensory channel, sound plays a vital role in many applications. There is a well-established research community in acoustics that has studied the problems related to sound simulation for six decades. Some of the earliest work was motivated by the design of concert halls, theaters, or lecture rooms with good acoustic characteristics. These problems also have been investigated in other applications, including noise control and sound design for urban planning, building construction, and automotive applications. Moreover, plausible or realistic sound effects can improve the sense of presence in a virtual environment or a game. In these applications, sound can provide important clues such as source directionality and spatial size. The book first surveys various sound synthesis methods, including harmonic synthesis, texture synthesis, spectral analysis, and physics-based synthesis. Next, it provides an overview of sound propagation techniques, including wave-based methods, geometric-based methods, and hybrid methods. The book also summarizes various techniques for sound rendering. Finally, it surveys some recent trends, including the use of machine learning methods to accelerate sound simulation and the use of sound simulation techniques for other applications such as speech recognition, source localization, and computer-aided design.
Part of the Springer book series, "Synthesis Lectures on Visual Computing: Computer Graphics, Animation, Computational Photography and Imaging (SLVCCGACPI)"
James F. Mullen Jr, Divya Kothandaraman, Aniket Bera, Dinesh Manocha
Here is a method for placing a 3D human animation into a 3D scene while maintaining any human-scene interactions in the animation. The authors use the notion of computing the most important meshes in the animation for the interaction with the scene, called "keyframes." These keyframes allow better optimization of the placement of the animation into the scene such that interactions in the animations match the affordances of the scene.
WACV 2023 and arXiv.org
Qingyang Tan, Zherong Pan, Breannan Smith, Takaaki Shiratori, Dinesh Manocha
A robust learning algorithm to detect and handle collisions in 3D deforming meshes.
39th International Conference on Machine Learning
Y.D. Li, M. Tang, Y. Yang, Z. Huang, R.F. Tong, S.C. Yang, Y. Li, Dinesh Manocha
A mesh-based learning approach (N-Cloth) for plausible 3D cloth deformation prediction. This approach is general and can handle cloth or obstacles represented by triangle meshes with arbitrary topologies.
Eurographics 2022 (Computer Graphics Forum)
Amrit Singh Bedi, Chen Fan, Alec Koppel, Anit Kumar Sahu, Brian M. Sadler, Furong Huang, Dinesh Manocha
In federated learning (FL), the objective of collaboratively learning a global model through aggregation of model updates across devices tends to oppose the goal of personalization via local information. In this work, the authors calibrate this tradeoff in a quantitative manner through a multi-criterion optimization-based framework, which is cast as a constrained program: the objective for a device is its local objective, which it seeks to minimize while satisfying nonlinear constraints that quantify the proximity between the local and the global model.
arXiv.org
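In symbols, the constrained program described in the federated learning entry above can be written (our paraphrase of the setup, with a generic proximity measure $D$ and tolerance $\epsilon_i$, not the paper's exact notation) as:

```latex
\[
\min_{w_i}\; f_i(w_i)
\quad\text{subject to}\quad
D\!\left(w_i,\;\bar w\right) \;\le\; \epsilon_i,
\qquad
\bar w \;=\; \frac{1}{N}\sum_{j=1}^{N} w_j,
\]
```

where $f_i$ is device $i$'s local objective, $\bar w$ is the aggregated global model over $N$ devices, and $\epsilon_i$ tunes how far device $i$ may personalize away from the global model.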
Anton Jeran Ratnarajah, Rohith Chandrashekar Aralikatti, Zhenyu Tang, Dinesh Manocha
The authors propose a mesh-based neural network (MESH2IR) to generate acoustic impulse responses (IRs) for indoor 3D scenes represented using a mesh. The IRs are used to create a high-quality sound experience in interactive applications and audio processing.
arXiv.org
Xiaoyu Pan, Jiaming Mai, Xinwei Jiang, Dongxue Tang, Jingxiang Li, Tianjia Shao, Kun Zhou, Xiaogang Jin, Dinesh Manocha
The authors present a learning algorithm that uses bone-driven motion networks to predict the deformation of loose-fitting garment meshes at interactive rates. Given a garment, they generate a simulation database and extract virtual bones from simulated mesh sequences using skin decomposition. At runtime, low- and high-frequency deformations are separately computed in a sequential manner.
Presented at the SIGGRAPH 2022 Conference.
Shiguang Liu, Dinesh Manocha
This book chapter, based on a lecture, gives a broad overview of research on sound simulation driven by a variety of applications.
Book chapter in Synthesis Lectures on Visual Computing: Computer Graphics, Animation, Computational Photography and Imaging
Sarala Padi, Seyed Omid Sadjadi, Dinesh Manocha, and Ram D. Sriram
In this work with NIST researchers, Manocha proposes a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
arXiv.org
Rohith Aralikatti, Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha
Introduces the idea of using reverberant acoustics within a room to help automatic speech recognition systems better separate speakers in a "cocktail party" situation.
2021 IEEE Automatic Speech Recognition and Understanding Workshop
Chunxiao Cao, Zili An, Zhong Ren, Dinesh Manocha, Kun Zhou
In developing virtual acoustic environments, it is important to understand the relationship between the computation cost and the perceptual significance of the resultant numerical error. This paper proposes a quality criterion that evaluates the error significance of path-tracing-based sound propagation simulators. The authors present an analytical formula that estimates the error signal power spectrum. The proposed criterion can explain the human perception of simulation error in a variety of cases.
IEEE Transactions on Visualization and Computer Graphics
Trisha Mittal; Vishy Swaminathan; Somdeb Sarkhel; Ritwik Sinha; David Arbour; Saayan Mitra; Dinesh Manocha
BOhance is an efficient solution for optimizing digital content like images. It effectively extends A/B testing in the continuous domain where A/B testing cannot efficiently test infinitely many variants. BOhance auto-generates candidate content variants to be tested based on the human feedback on prior variants. The authors' experiments show that given a human-enhanced image and an image generated by BOhance, 53% users think that the BOhance image was generated by a human expert.
2021 IEEE International Symposium on Multimedia (ISM)
2021
Niall Williams, Aniket Bera, Dinesh Manocha
ENI is a metric that analyzes the similarity between the physical environment and the virtual environment for natural walking in virtual reality. It is the first general metric that can automatically identify regions of high and low compatibility in physical and virtual environments.
arXiv.org
Rohan Chandra, Aniket Bera, Dinesh Manocha
Socially aware autonomous vehicles can be designed if there exists a mechanism to understand the behaviors of human drivers. This paper presents an approach for autonomous vehicles that leverages machine learning to predict the behaviors of human drivers.
arXiv.org
Ruichan Wang, Dinesh Manocha
5G applications have become increasingly popular as 5G network deployment has grown. For vehicular networks, mmWave band signals have been well studied and used for communication and sensing. The authors propose a dynamic ray tracing algorithm that exploits spatial and temporal coherence.
arXiv.org
Yudi Li, Min Tang, Yun Yang, Zi Huang, Shuangcai Yang, Yao Li, Dinesh Manocha
N-Cloth is a mesh-based learning approach for plausible 3D cloth deformation prediction. The general approach can handle cloth or obstacles represented by triangle meshes with arbitrary topology. Graph convolution is used to transform the cloth and object meshes into a latent space to reduce the non-linearity in the mesh space.
arXiv.org
Uttaran Bhattacharya, Gang Wu, Stefano Petrangeli, Viswanathan Swaminathan, Dinesh Manocha
A domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos.
Computer Vision Foundation
Cheng Li, Min Tang, Ruofeng Tong, Ming Cai, Jieyi Zhao, Dinesh Manocha
A parallel algorithm for cloth simulation that exploits multiple GPUs for fast computation and the handling of very high resolution meshes.
arXiv.org
Qingyang Tan, Zherong Pan, Breannan Smith, Takaaki Shiratori, Dinesh Manocha
A robust learning algorithm to detect and handle collisions in 3D deforming meshes. The collision detector is represented as a bilevel deep autoencoder with an attention mechanism that identifies colliding mesh sub-parts.
arXiv.org
Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu, Zhenyu Tang, Dinesh Manocha, Dong Yu
A neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.
arXiv.org
Pooja Guhan, Naman Awasthi, Ritwika Das, Manas Agarwal, Kathryn McDonald, Kristin Bussell, Gloria Reeves, Dinesh Manocha, Aniket Bera
With an aim to improve existing telemental health services, the authors present TeleEngage, a novel framework leveraging semi-supervised multimodal GAN to detect engagement levels during conversations from videos. Inspired by psychology practices used to capture patient engagement, the researchers create features for affective and cognitive engagement.
UMD GAMMA Lab
Sijia Li, Shiguang Liu, Dinesh Manocha
A learning-based approach for generating binaural audio from mono audio using multi-task learning.
arXiv.org
Nannan Wu, Qianwen Chao, Yanzhen Chen, Weiwei Xu, Chen Liu, Dinesh Manocha, Wenxin Sun, Yi Han, Xinran Yao, Xiaogang Jin
A CPU-based real-time cloth animation method for dressing virtual humans of various shapes and poses.
IEEE Transactions on Visualization and Computer Graphics
Xutong Jin, Sheng Li, Dinesh Manocha, Guoping Wang
A learning-based approach to compute the eigenmodes and acoustic transfer data for the sound synthesis of arbitrary solid objects. The approach combines two network-based solutions to formulate a complete learning-based 3D modal sound model.
arXiv.org
Dongki Jung, Jaehoon Choi, Yonghan Lee, Deokhwa Kim, Changick Kim, Dinesh Manocha, Donghwan Lee
This approach estimates depth from a monocular camera as it moves through complex and crowded indoor environments, e.g., a department store or a metro station. The approach predicts absolute scale depth maps over the entire scene consisting of a static background and multiple moving people, by training on dynamic scenes.
arXiv.org
Uttaran Bhattacharya, Elizabeth Childs, Nicholas Rewkowski, Dinesh Manocha
A generative adversarial network to synthesize 3D pose sequences of co-speech upper-body gestures with appropriate affective expressions.
arXiv.org
Puneet Mathur, Rajiv Jain, Franck Dernoncourt, Vlad Morariu, Quan Hung Tran, Dinesh Manocha
TIMERS is a TIME, Rhetorical and Syntactic-aware model for document-level temporal relation classification. TIMERS leverages rhetorical discourse features and temporal arguments from semantic role labels, in addition to traditional local syntactic features, trained through a Gated Relational-GCN.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (short papers)
Rohith Aralikatti, Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha
A new approach to improving the performance of reverberant speech separation, based on an accurate geometric acoustic simulator (GAS) which generates realistic room impulse responses (RIRs) by modeling both specular and diffuse reflections.
arXiv.org
Niall L. Williams, Aniket Bera, Dinesh Manocha
A new approach for redirected walking in static and dynamic virtual environment scenes that uses techniques from robot motion planning to compute the redirection gains that steer the user on collision-free paths in the physical space.
IEEE Transactions on Visualization and Computer Graphics
Hsien-Yu Meng, Zhenyu Tang, Dinesh Manocha
A novel geometric deep learning method to compute the acoustic scattering properties of geometric objects. This learning algorithm uses a point cloud representation of objects to compute the scattering properties and integrates them with ray tracing for interactive sound propagation in dynamic scenes.
arXiv.org
Zhenyu Tang, Hsien-Yu Meng, Dinesh Manocha
A novel hybrid sound propagation algorithm for interactive applications. The approach is designed for dynamic scenes and uses a neural network-based learned scattered field representation along with ray tracing to efficiently generate specular, diffuse, diffraction and occlusion effects.
2021 IEEE Conference on Virtual Reality and 3D User Interfaces
Uttaran Bhattacharya, Nicholas Rewkowski, Abhishek Banerjee, Pooja Guhan, Aniket Bera, Dinesh Manocha
Text2Gestures is a transformer-based learning method to interactively generate emotive full-body gestures for virtual agents aligned with natural language text inputs.
2021 IEEE Conference on Virtual Reality and 3D User Interfaces
Zhenyu Tang, Dinesh Manocha
A novel method for generating scene-aware training data for far-field automatic speech recognition, using a deep learning-based estimator to non-intrusively compute the sub-band reverberation time of an environment from its speech samples.
arXiv.org
Niall Williams, Aniket Bera, Dinesh Manocha
The authors provide a generalized definition of alignment that allows it to be used in any research problem. They present an example of how alignment can be used to yield significant improvements in VR locomotion with redirected walking.
2021 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW)
Trisha Mittal, Aniket Bera, Dinesh Manocha
A learning model for multimodal context-aware emotion recognition that combines multiple human co-occurring modalities (such as facial, audio, textual, and pose/gaits) and two interpretations of context.
IEEE Multimedia
Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha
A method for improving the quality of synthetic room impulse responses for far-field speech recognition. The authors bridge the gap between the fidelity of synthetic room impulse responses (RIRs) and real room impulse responses using a novel TS-RIRGAN architecture.
arXiv.org
Trisha Mittal, Puneet Mathur, Aniket Bera, Dinesh Manocha
Affect2MM is a learning method for time-series emotion prediction for multimedia content. Its goal is to automatically capture varying emotions depicted by characters in real-life human-centric situations and behaviors. This method uses ideas from emotion causation theories to computationally model and determine the emotional state evoked in movie clips.
arXiv.org
Amanuel Awoke, Hugo Burbelo, Elizabeth Childs, Ferzam Mohammad, Logan Stevens, Nicholas Rewkowski, Dinesh Manocha
Distance learning presents a number of challenges. The authors identify four: the lack of social interaction, reduced student engagement and focus, reduced comprehension and information retention, and the lack of flexible and customizable instructor resources. They then examine how AR/VR technologies might address each challenge, and outline the further research that is required to fully understand the potential.
arXiv.org
Uttaran Bhattacharya, Nicholas Rewkowski, Abhishek Banerjee, Pooja Guhan, Aniket Bera, Dinesh Manocha
Text2Gestures is a transformer-based learning method to interactively generate emotive full-body gestures for virtual agents aligned with natural language text inputs. This method generates emotionally expressive gestures by utilizing relevant biomechanical features for body expressions, also known as affective features.
arXiv.org
Nannan Wu, Qianwen Chao, Yanzhen Chen, Weiwei Xu, Chen Liu, Dinesh Manocha, Wenxin Sun, Yi Han, Xinran Yao, Xiaogang Jin
A real-time cloth animation method for dressing virtual humans of various shapes and poses. The approach formulates clothing deformation as a high-dimensional function of body shape parameters and pose parameters.
arXiv.org
2020
Feixiang Lu, Zongdai Liu, Hui Miao, Peng Wang, Liangjun Zhang, Ruigang Yang, Dinesh Manocha, Bin Zhou
Holistically understanding an object and its 3D movable parts through visual perception models is essential for enabling an autonomous agent to interact with the world. For autonomous driving, the dynamics and states of vehicle parts such as doors, the trunk, and the bonnet can provide meaningful semantic information and interaction states, which are essential to ensure the safety of the self-driving vehicle. Existing visual perception models mainly focus on coarse parsing such as object bounding box detection or pose estimation and rarely tackle these situations. In this paper, the authors address this important problem for autonomous driving by solving two critical issues using visual data augmentation.
arXiv.org
Sheng Li, Xiang Gu, Kangrui Yi, Yanlin Yang, Guoping Wang, Dinesh Manocha
This experiment investigated the occurrence of self-illusion and its contribution to realistic behavior consistent with a virtual role in virtual environments.
IEEE Transactions on Visualization and Computer Graphics
Yuexin Ma, Xinge Zhu, Xinjing Cheng, Ruigang Yang, Jiming Liu, Dinesh Manocha
A label-free algorithm for trajectory extraction and prediction that works directly on raw videos. To better capture the moving objects in videos, the authors introduce dynamic points to model dynamic motions, using a forward-backward extractor to keep temporal consistency and image reconstruction to keep spatial consistency in an unsupervised manner. The method is the first to achieve unsupervised learning of trajectory extraction and prediction.
2020 European Conference on Computer Vision
Zehui Lin, Sheng Li, Xinlu Zeng, Congyi Zhang, Jinzhu Jia, Guoping Wang, Dinesh Manocha
This chi-squared progressive photon mapping algorithm (CPPM) constructs an estimator by controlling the bandwidth to obtain superior image quality.
ACM Transactions on Graphics
Shiguang Liu, Dinesh Manocha
This is a broad overview of research on sound simulation in virtual reality, games, etc. It first surveys various sound synthesis methods, including harmonic synthesis, texture synthesis, spectral analysis, and physics-based synthesis. Then, it summarizes popular sound propagation techniques, namely wave-based methods, geometric-based methods, and hybrid methods. Next, sound rendering methods are reviewed. The authors also highlight some recent methods that use machine learning techniques for synthesis, propagation, and some inverse problems.
arXiv.org
Pooja Guhan, Manas Agarwal, Naman Awasthi, Gloria Reeves, Dinesh Manocha, Aniket Bera
ABC-Net is a semi-supervised multi-modal GAN framework based on psychology literature that detects engagement levels in video conversations. It uses three constructs—behavioral, cognitive, and affective engagement—to extract various features that can effectively capture engagement levels.
arXiv.org
Uttaran Bhattacharya, Nicholas Rewkowski, Pooja Guhan, Niall L. Williams, Trisha Mittal, Aniket Bera, Dinesh Manocha
This autoregression network generates virtual agents that convey various emotions through their walking styles or gaits.
arXiv.org
Jaehoon Choi, Dongki Jung, Yonghan Lee, Deokhwa Kim, Dinesh Manocha, Donghwan Lee
An algorithm for self-supervised monocular depth completion in robotic navigation, computer vision, and autonomous driving. The approach is based on training a neural network that requires only sparse depth measurements and corresponding monocular video sequences, without dense depth labels. The self-supervised algorithm is designed for challenging indoor environments with textureless regions, glossy and transparent surfaces, non-Lambertian surfaces, moving people, long and diverse depth ranges, and scenes captured by complex ego-motions.
arXiv.org
Qingyang Tan, Zherong Pan, Dinesh Manocha
LCollision is a learning-based method that synthesizes collision-free 3D human poses. LCollision is the first approach that can obtain high accuracy in handling non-penetration and collision constraints in a learning framework.
arXiv.org
Rohan Chandra, Aniket Bera, Dinesh Manocha
Autonomous vehicles behave conservatively in a traffic environment with human drivers and do not adapt to local conditions and socio-cultural norms. However, socially aware AVs can be designed if there exists a mechanism to understand the behaviors of human drivers. In this example of Machine Theory of Mind (M-ToM), the authors infer the behaviors of human drivers by observing the trajectories of their vehicles. "StylePredict" is based on trajectory analysis of vehicles. It mimics human ToM to infer driver behaviors, or styles, using a computational mapping between the extracted trajectory of a vehicle in traffic and the driver behaviors, using graph-theoretic techniques including spectral analysis and centrality functions. StylePredict can analyze driver behavior in the USA, China, India, and Singapore, based on traffic density, heterogeneity, and conformity to traffic rules.
arXiv.org
Angelos Mavrogiannis, Rohan Chandra, Dinesh Manocha
A learning algorithm for action prediction and local navigation for autonomous driving that classifies the driver behavior of other vehicles or road-agents (aggressive or conservative) and takes that into account for decision making and safe driving.
IEEE Robotics and Automation Letters 2022
Divya Kothandaraman, Rohan Chandra, Dinesh Manocha
An unsupervised multi-source domain adaptive semantic segmentation approach for autonomous vehicles in unstructured and unconstrained traffic environments.
arXiv.org
Mingliang Xu, Chaochao Li, Pei Lv, Wei Chen, Zhigang Deng, Bing Zhou, Dinesh Manocha
CubeP is a model for crowd simulation that comprehensively considers physiological, psychological, and physical factors. Inspired by the theory of “the devoted actor”, the model determines the movement of each individual by modeling the influence of physical strength and emotion. This is the first time that physiological, psychological, and physical factors have been integrated in a unified manner, with the relationship between the factors explicitly determined. The new model is capable of generating effects similar to real-world scenarios and can also reliably predict the changes in the physical strength and emotion of individuals in an emergency situation.
arXiv.org
Xutong Jin, Sheng Li, Tianshu Qu, Dinesh Manocha, Guoping Wang
Modal sound synthesis is a physically based sound synthesis method used to generate audio content in games and virtual worlds. This paper presents a novel learning-based impact sound synthesis algorithm called Deep-Modal. The approach can handle sound synthesis for common arbitrary objects, especially dynamically generated objects, in real time.
MM '20: Proceedings of the 28th ACM International Conference on Multimedia
Zhenyu Tang, Hsien-Yu Meng, Dinesh Manocha
A novel hybrid sound propagation algorithm for interactive applications.
arXiv.org
Sarala Padi, Dinesh Manocha, Ram Sriram
A novel Multi-Window Data Augmentation (MWA-SER) approach for speech emotion recognition. MWA-SER is a unimodal approach that focuses on two key concepts: designing the speech augmentation method to generate additional data samples, and building deep learning models to recognize the underlying emotion of an audio signal.
arXiv.org
Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha
The paper presents a Generative Adversarial Network (GAN) based room impulse response generator for generating realistic synthetic room impulse responses.
arXiv.org
Cheng Li, Min Tang, Ruofeng Tong, Ming Cai, Jieyi Zhao, Dinesh Manocha
Cloth simulation is an active area of research in computer graphics, computer-aided design (CAD) and the fashion industry. Over the last few decades many methods have been proposed for solving the underlying dynamical system with robust collision handling. The paper presents a novel parallel algorithm for cloth simulation that exploits multiple GPUs for fast computation and the handling of very high resolution meshes. It is the first approach that can perform almost interactive complex cloth simulation with wrinkles, friction and folds on commodity workstations.
arXiv.org
Feixiang Lu, Zongdai Liu, Xibin Song, Dingfu Zhou, Wei Li, Hui Miao, Miao Liao, Liangjun Zhang, Bin Zhou, Ruigang Yang, Dinesh Manocha
The paper presents a robust and effective approach to reconstruct complete 3D poses and shapes of vehicles from a single image. It introduces a novel part-level representation for vehicle segmentation and 3D reconstruction, which significantly improves performance.
arXiv.org
Andrew Best, Sahil Narang, Dinesh Manocha
Sense-Plan-Act (SPA) is a new approach for generating plausible verbal interactions between virtual human-like agents and user avatars in shared virtual environments. It extends prior work in propositional planning and natural language processing to enable agents to plan with uncertain information and to leverage question-and-answer dialogue with other agents and avatars to obtain the needed information and complete their goals. The agents are additionally able to respond to questions from the avatars and other agents using natural language, enabling real-time multi-agent, multi-avatar communication environments.
arXiv.org
Rohan Chandra, Uttaran Bhattacharya, Tanmay Randhavane, Aniket Bera, Dinesh Manocha
RoadTrack is a real-time tracking algorithm for autonomous driving that tracks heterogeneous road-agents in dense traffic videos. The approach is designed for dense traffic scenarios that consist of different road-agents such as pedestrians, two-wheelers, cars, and buses sharing the road.
GAMMA website
Zhiming Hu, Sheng Li, Congyi Zhang, Kangrui Yi, Guoping Wang, Dinesh Manocha
DGaze is a CNN-based model that combines object position sequence, head velocity sequence, and saliency features to predict users' gaze positions in HMD-based applications. The model can be applied to predict not only real-time gaze positions but also gaze positions in the near future, and achieves better performance than prior methods.
IEEE Transactions on Visualization and Computer Graphics
Srihari Pratapa, Dinesh Manocha
RANDM is a random-access depth map compression algorithm for interactive rendering. The compressed representation provides random access to the depth values and enables real-time parallel decompression on commodity hardware. This method partitions the depth range captured in a given scene into equal-sized intervals and uses this partition to generate three separate components that exhibit higher coherence. Each of these components is processed independently to generate the compressed stream.
GAMMA website
Rohan Chandra, Uttaran Bhattacharya, Trisha Mittal, Aniket Bera, Dinesh Manocha
CMetric classifies driver behaviors using centrality functions. The formulation combines concepts from computational graph theory and social traffic psychology to quantify and classify the behavior of human drivers. CMetric is used to compute the probability of a vehicle executing a driving style, as well as the intensity used to execute the style. This approach is designed for real-time autonomous driving applications, where the trajectory of each vehicle or road-agent is extracted from a video.
arXiv.org
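As a hedged sketch of the kind of graph-theoretic centrality function CMetric builds on (the toy proximity graph and the function choice here are illustrative, not the authors' implementation), closeness centrality over a graph of vehicles can be computed with a breadth-first search:

```python
from collections import deque

def closeness_centrality(adj, node):
    """Closeness centrality: (n - 1) divided by the sum of
    shortest-path distances from `node` to all reachable nodes,
    computed with a plain BFS over an adjacency-list graph."""
    dist = {node: 0}
    q = deque([node])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Hypothetical "traffic graph": vehicles as nodes, proximity as edges.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(closeness_centrality(adj, 2))  # vehicle 2 is the most central
```

A vehicle with high centrality interacts with many neighbors, which is the kind of signal such a formulation could map to an aggressive or conservative style.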
Trisha Mittal, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha
The paper presents a learning-based method for detecting fake videos. The authors use the similarity between audio-visual modalities and the similarity between the affective cues of the two modalities to infer whether a video is “real” or “fake.”
arXiv.org
Trisha Mittal, Pooja Guhan, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha
EmotiCon is a learning-based algorithm for context-aware perceived human emotion recognition from videos and images. It uses multiple modalities of faces and gaits, background visual information and socio-dynamic inter-agent interactions to infer the perceived emotion. EmotiCon outperforms prior context-aware emotion recognition methods.
arXiv.org
Abhishek Kumar, Trisha Mittal, Dinesh Manocha
MCQA is a learning-based algorithm for multimodal question answering that explicitly fuses and aligns the multi-modal input (i.e. text, audio, and video) forming the context for the query (question and answer).
arXiv.org
Zhenyu Tang, Dinesh Manocha
Modern computer graphics applications including virtual reality and augmented reality have adopted techniques for both visual rendering and audio rendering. While visual rendering can already synthesize virtual objects into the real world seamlessly, it remains difficult to correctly blend virtual sound with real-world sound using state-of-the-art audio rendering. When the virtual sound is generated unaware of the scene, the corresponding application becomes less immersive, especially for AR. The authors present their current work on generating scene-aware sound using ray-tracing based simulation combined with deep learning and optimization.
2020 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops
Micah Taylor, Anish Chandak, Lakulish Antani, Dinesh Manocha
An algorithm and system for sound propagation and rendering in virtual environments and media applications. The approach uses geometric propagation techniques for fast computation of propagation paths from a source to a listener and takes into account specular reflections, diffuse reflections, and edge diffraction.
Intel
SriSai Naga Jyotish Poonganam, Bharath Gopalakrishnan, Venkata Seetharama Sai Bhargav Kumar Avula, K. Madhava Krishna, Arun Kumar Singh, Dinesh Manocha
A new model predictive control framework that improves reactive navigation for autonomous robots. The framework allows roboticists to compute low cost control inputs while ensuring some upper bound on the risk of collision.
IEEE Robotics and Automation Letters
Dinesh Manocha, Rohan Chandra, Uttaran Bhattacharya, Aniket Bera, Tanmay Randhavane
The authors' RoadTrack algorithm could help autonomous vehicles navigate dense traffic scenarios. The algorithm uses a tracking-by-detection approach to detect vehicles and pedestrians, then predicts where they are going.
IROS 2019
2019
Kurt Gray, Tanmay Randhavane, Kyra Kapsaskis, Uttaran Bhattacharya, Aniket Bera, Dinesh Manocha
A data-driven deep neural algorithm for detecting deceptive walking behavior using nonverbal cues like gaits and gestures.
arXiv.org
Rohan Chandra, Tianrui Guan, Srujan Panuganti, Trisha Mittal, Uttaran Bhattacharya, Aniket Bera, Dinesh Manocha
A novel approach for traffic forecasting in urban traffic scenarios using a combination of spectral graph analysis and deep learning.
arXiv.org
Qiaoyun Wu, Dinesh Manocha, Jun Wang, Kai Xu
The authors improve the cross-target and cross-scene generalization of visual navigation through a learning agent guided by conceiving the next observations it expects to see. A variational Bayesian model, NeoNav, generates the next expected observations (NEO) conditioned on the current observations of the agent and the target view.
PDF
Uttaran Bhattacharya, Christian Roncal, Trisha Mittal, Rohan Chandra, Aniket Bera, Dinesh Manocha
The paper presents an autoencoder-based semi-supervised approach to classify perceived human emotions from walking styles obtained from videos or from motion-captured data and represented as sequences of 3D poses.
arXiv.org
Chaochao Li, Pei Lv, Mingliang Xu, Xinyu Wang, Dinesh Manocha, Bing Zhou, Meng Wang
In many applications such as human-robot interaction, autonomous driving or surveillance, it is important to accurately predict pedestrian trajectories for collision-free navigation or abnormal behavior detection. The authors present a novel trajectory prediction algorithm for pedestrians based on a personality-aware probabilistic feature map.
arXiv.org
Uttaran Bhattacharya, Trisha Mittal, Rohan Chandra, Tanmay Randhavane, Aniket Bera, Dinesh Manocha
STEP is a novel classifier network able to classify perceived human emotion from gaits, based on a Spatial Temporal Graph Convolutional Network architecture. Given an RGB video of an individual walking, STEP implicitly exploits the gait features to classify the emotional state of the human into one of four emotions: happy, sad, angry, or neutral.
arXiv.org
Rohan Chandra, Uttaran Bhattacharya, Trisha Mittal, Xiaoyu Li, Aniket Bera, Dinesh Manocha
The GraphRQI algorithm identifies driver behaviors from road-agent trajectories and is 25 percent more accurate than prior behavior classification algorithms for autonomous vehicles.
arXiv.org
Qingyang Tan, Zherong Pan, Lin Gao, Dinesh Manocha
A new method bridges the gap between mesh embedding and physical simulation for efficient dynamic models of clothes. The key techniques are a graph-based convolutional neural network (CNN) defined on meshes with arbitrary topologies and a new mesh embedding approach based on a physics-inspired loss term. After training, the learned simulator runs 10–100 times faster, and its accuracy is high enough for robot manipulation tasks.
arXiv.org
2022
Sarah Buchanan, Karen Gracy, Joshua Kitchens, Richard Marciano
A discussion of the use of Computational Thinking (CT) in archival educators' instruction, aimed at enhancing the training and professional development of the library and archival workforce to meet the needs of their communities, and at improving digital collection management and access to information and resources through retrospective and born-digital content. Four educators share their teaching strategies for modernizing the way digital LIS and computational education are conducted. Their goal is to create an active and engaged community of future archival practitioners, ready to tackle the digital records and archives of the future.
ai-collaboratory.net
Richard Marciano, Rosemary Grant, Alexis Hill, Phillip Nicholas, Noah Scheer, Alan Wierdak, Mark Conrad, Arthur Ray McCoy
This paper illustrates how to design and implement an engaged computational archival framework that leverages big archival records to respond to social justice and reparations policy imperatives. The work touches on two themes: (1) how to handle histories of people whose lives were deeply impacted by public authorities, and (2) Archives as Big Data as a potential restorative strategy.
EasyChair.org preprint No. 8443
Kristin Strigel Carter, Abby Gondek, William Underwood, Teddy Randby, Richard Marciano
This paper brings together archivists, scholars, and technologists to demonstrate computational treatments of digital cultural assets using Artificial Intelligence and Machine Learning techniques that can help unlock hard-to-reach archival content. It describes an extended, iterative study applied to digitized and datafied WWII-era records housed at the FDR Presidential Library, rich content that is regrettably under-utilized by scholars examining American responses to the Holocaust.
AI & Society
Jennifer Proctor, Richard Marciano
This paper proposes, tests, and evaluates an innovative Computational Archival Science (CAS) framework to enhance the ability to link people, places, and events depicted in historical photography collections. The protocol combines elements of computer vision with natural language processing, entity extraction, and metadata linking techniques to transform and connect existing archival metadata. Development of the framework is built upon a case study based on the Spelman College Archives Photograph Collection and provides background information, reports on the text processing, image analysis, semantic linking, and evaluation aspects associated with the design and use of the AI-supported framework.
2021 IEEE International Conference on Big Data
Aravind Inbasekaran, Rajesh Kumar Gnanasekaran, Richard Marciano
Spelling correction is one of the well-known tasks in Natural Language Processing. Spelling correction of an individual word can be performed with existing tools; however, correcting a word based on the context of the sentence is a challenging task that requires a human-level understanding of the language. The authors introduce a novel experiment applying a machine learning concept called Transfer Learning to the text extracted by OCR tools, thereby optimizing the output text by reducing misspelled words.
2021 IEEE International Conference on Big Data
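As a hedged, minimal illustration of the non-contextual baseline such work improves upon (this is not the authors' transfer-learning model; the vocabulary and token are invented), an isolated OCR token can be corrected by picking the dictionary word with the smallest Levenshtein edit distance:

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via a rolling dynamic-programming row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,       # deletion
                                     dp[j - 1] + 1,   # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def correct(word, vocab):
    """Replace an OCR token with its nearest dictionary word."""
    return min(vocab, key=lambda w: edit_distance(word, w))

vocab = ["incarceration", "internment", "relocation"]
print(correct("internnent", vocab))  # → "internment"
```

This baseline ignores sentence context entirely, which is exactly the limitation that motivates the language-model approach described above.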
Lencia Beltran, Emily Ping O’Brien, Gregory Jansen, Richard Marciano
The entity resolution technique known as “fuzzy matching” and other computational approaches are used to unlock and link biographical data from WWII Japanese American incarceration camps. The authors demonstrate the construction of social graphs that are able to link people, places, and events; support further scholarship; and reveal hidden stories in historical events, especially in contested archival sources. They also show the power of computational analysis to recreate event networks and represent movements of people using maps.
2021 IEEE International Conference on Big Data
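As a hedged sketch of the "fuzzy matching" idea (a toy stand-in, not the authors' pipeline; the names and threshold are invented), Python's difflib can score approximate string similarity between name variants across records:

```python
from difflib import SequenceMatcher

def fuzzy_match(name, candidates, threshold=0.8):
    """Return the candidate most similar to `name` whose similarity
    ratio meets the threshold, or None if no candidate qualifies."""
    best, best_score = None, threshold
    for cand in candidates:
        score = SequenceMatcher(None, name.lower(), cand.lower()).ratio()
        if score >= best_score:
            best, best_score = cand, score
    return best

# Hypothetical name variants of the same person across records.
roster = ["Yamada, Hiroshi", "Tanaka, Kenji"]
print(fuzzy_match("Yamda, Hiroshi", roster))  # → "Yamada, Hiroshi"
```

Resolving such near-duplicate entries is what lets separate camp records be linked into a single social graph of people, places, and events.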
2020
Gregory Jansen, Aaron Coburn, Adam Soroka, Richard Marciano
Describes the development and testing of the next-generation Trellis Linked Data Platform with Memento versioning support.
dcicblog.umd.edu
2019
William Underwood, Richard Marciano
This paper explores whether the computational thinking practices of mathematicians and scientists in the physical and biological sciences are also the practices of archival scientists. It is argued that these practices are essential elements of an archival science education in preparing students for a professional archival career.
Marciano and 20 students in the Digital Curation Innovation Center developed a reframing model for digital curation through computational thinking. Their case study involves adding metadata to non-digital primary records from the WWII Tule Lake Japanese American Internment Camp. Their curation methods led to the discovery of new narratives and connections from this data.
2021
Semih Kara, Nuno Martins
Investigates methods to characterize the stability of a continuous-time dynamical system that models the dynamics of non-cooperative strategic interactions among the members of large populations of bounded-rationality agents.
arXiv.org
2020
Murat Arcak, Nuno Martins
Presents dissipativity tools to establish global asymptotic stability of the set of Nash equilibria in a deterministic model of population games.
arXiv.org
2023
Sunandita Patra, Paul Rademacher, Kristen Jacobson, Kyle Hassold, Onur Kulaksizoglu, Laura Hiatt, Mark Roberts, Dana Nau
The complexity of an environment and the difficulty of an actor's goals both impact transfer learning in Reinforcement Learning (RL). Yet, few works have examined using the environment and goals in tandem to generate a learning curriculum that improves transfer. To explore this relationship, we introduce a task graph that quantifies the environment complexity using environment descriptors and the goal difficulty using goal descriptors; edges in the task graph indicate a change in the environment or the goal. We use the task graph in two sets of studies. First, we evaluate the task graph in two synthetic environments where we control environment and goal complexity. Second, we introduce an algorithm that generates a Task-Graph Curriculum to train policies using the task graph. In a delivery environment with up to ten skills, we demonstrate that a planner can execute these trained policies to achieve long-horizon goals in increasingly complex environments. Our results demonstrate that (1) the task graph promotes skill transfer in the synthetic environments and (2) the Task-Graph Curriculum trains nearly perfect policies and does so significantly faster than learning a policy from scratch.
GenPlan 2023
Paul Zaidins, Mark Roberts, Dana Nau
Two recent approaches to HTN replanning, IPyHOP and SHOPFIXER, replan by adapting the previously planned solution when an action fails. IPyHOP replans the entire solution tree after the failure, while SHOPFIXER uses pre-calculated dependency graphs to replace portions of the tree; neither uses forward simulation of the plan to predict where future failures might occur. This paper describes IPyHOPPER, which improves IPyHOP by retaining more of the information provided by the hierarchy and using forward simulation to repair minimal subtrees that contain future failures.
6th ICAPS Workshop on Hierarchical Planning (HPlan 2023)
2022
Vincent Hsiao, Dana Nau
The paper proposes a Mean Field Game (MFG) generalization of the spatial evolutionary game model, denoted Pair-MFG, such that the behavior of a given spatial evolutionary game (more specifically, the behavior of its pair approximation) is a special-case trajectory of the corresponding MFG.
Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems (AAMAS '22)
Ruoxi Li, Mark Roberts, Morgan Fine-Morris, Dana Nau
Teachable-HTN-Maker is a modified version of the well-known HTN-Maker algorithm that learns Hierarchical Task Network (HTN) methods. Instead of learning methods from all subsequences of a solution plan as HTN-Maker does, Teachable-HTN-Maker learns from a curriculum consisting of examples that are presented in a meaningful order. When Teachable-HTN-Maker is compared against HTN-Maker in two planning domains, the authors observe that it learns fewer methods and better ones.
5th ICAPS Workshop on Hierarchical Planning (HPlan 2022)
Sunandita Patra, Mark Cavolowsky, Onur Kulaksizoglu, Ruoxi Li, Laura M. Hiatt, Dana Nau
Hierarchy and curricula are two techniques commonly used to improve training for Reinforcement Learning (RL) agents. Yet few works have examined how to leverage hierarchical planning to generate a curriculum for training RL Options. The authors formalize a goal skill that extends an RL Option with state-based conditions that must hold during training and execution. They then define a Goal-Skill Network that integrates a Hierarchical Goal Network, a variant of hierarchical planning, with goal skills as the leaves of the network.
35th International Florida Artificial Intelligence Research Society (FLAIRS) Conference
Yash Bansod, Sunandita Patra, Dana Nau, Mark Roberts
When an actor executes a plan, action failures and exogenous events may lead to unexpected states that require replanning from the middle of plan execution. In Hierarchical Task Network (HTN) planning, unless the HTN methods have been carefully written to work well in unexpected states, replanning may either fail or produce plans that perform poorly. To overcome this problem, the authors introduce IPyHOP, a reentrant version of GTPyhop (a SHOP-like HTN planner), and Run-Lazy-Refineahead, a modification of the Run-Lazy-Lookahead actor that utilizes IPyHOP’s reentrant replanning capability to replan during plan execution.
35th International Florida Artificial Intelligence Research Society (FLAIRS) Conference
2021
Sunandita Patra, James Mason, Malik Ghallab, Paolo Traverso, Dana Nau
The authors demonstrate a system with integrated acting, planning and learning algorithms that uses hierarchical operational models to perform tasks in dynamically changing environments. In this acting and planning engine, both planning and acting use the same operational models. These rely on hierarchical task-oriented refinement methods offering rich control structures.
Demonstration at the Thirty-First International Conference on Automated Planning and Scheduling (ICAPS 2021)
Yash Bansod, Dana Nau, Sunandita Patra, Mark Roberts
A major problem with integrating HTN planning and acting is that, unless the HTN methods are very carefully written, unexpected problems can occur when attempting to replan if execution errors or other unexpected conditions occur during acting. To overcome this problem, we present a re-entrant HTN planning algorithm that can be restarted for replanning purposes at the point where an execution error occurred, and an HTN acting algorithm that can restart the HTN planner at this point. We show through experiments that our algorithm is an improvement over a widely used approach to planning and control.
Proceedings of the 4th ICAPS Workshop on Hierarchical Planning (HPlan 2021)
Dana Nau, Yash Bansod, Sunandita Patra, Mark Roberts, Ruoxi Li
The Pyhop planner, released in 2013, was a simple SHOP-style planner written in Python. It was designed to be easily usable as an embedded system in conventional applications such as game programs. Although little effort was made to publicize Pyhop, its simplicity, ease of use, and understandability led to its use in a number of projects beyond its original intent, and to publications by others.
GTPyhop (Goal-and-Task Pyhop) is an extended version of Pyhop that can plan for both goals and tasks, using a combination of SHOP-style task decomposition and GDP-style goal decomposition. It provides a totally-ordered version of Goal-Task-Network (GTN) planning without sharing and task insertion. GTPyhop’s ability to represent and reason about both goals and tasks provides a high degree of flexibility for representing objectives in whichever form seems more natural to the domain designer.
Proceedings of the 4th ICAPS Workshop on Hierarchical Planning (HPlan 2021)
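The SHOP-style, totally ordered task decomposition that Pyhop and GTPyhop build on can be illustrated with a minimal, self-contained sketch. The domain (travel, walk), the state representation, and every function name below are hypothetical illustrations, not GTPyhop's actual API:

```python
# Minimal sketch of SHOP-style HTN planning in the spirit of Pyhop:
# primitive actions update the state; methods decompose tasks into subtasks.

def walk(state, a, b):
    # Primitive action: move from a to b if currently at a.
    if state["loc"] == a:
        state = dict(state)  # copy so backtracking is safe
        state["loc"] = b
        return state
    return None

def travel_by_foot(state, a, b):
    # Method: decompose the 'travel' task when the distance is walkable.
    if state["dist"][(a, b)] <= 2:
        return [("walk", a, b)]
    return None

actions = {"walk": walk}
methods = {"travel": [travel_by_foot]}

def seek_plan(state, tasks, plan):
    # Depth-first, totally ordered decomposition, as in SHOP/Pyhop.
    if not tasks:
        return plan
    name, *args = tasks[0]
    if name in actions:                       # primitive task
        new_state = actions[name](state, *args)
        if new_state is not None:
            return seek_plan(new_state, tasks[1:], plan + [tasks[0]])
    elif name in methods:                     # compound task
        for m in methods[name]:
            subtasks = m(state, *args)
            if subtasks is not None:
                result = seek_plan(state, subtasks + tasks[1:], plan)
                if result is not None:
                    return result
    return None                               # dead end: backtrack

state0 = {"loc": "home", "dist": {("home", "park"): 1}}
plan = seek_plan(state0, [("travel", "home", "park")], [])
# plan == [("walk", "home", "park")]
```

GTPyhop extends this scheme by letting the agenda contain goals as well as tasks, dispatching on goal methods in the same backtracking loop.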
Ruoxi Li, Sunandita Patra, Dana Nau
The authors describe Dec-RPAE, a system for decentralized multi-agent acting and planning in partially observable and non-deterministic environments. The system includes both an acting component and an online planning component.
Proceedings of the Thirty-First International Conference on Automated Planning and Scheduling (ICAPS 2021)
Sunandita Patra, James Mason, Malik Ghallab, Dana Nau, Paolo Traverso
The authors define and implement an integrated acting-and-planning system in which both planning and acting use the same operational models.
Preprint submitted to Artificial Intelligence
Vincent Hsiao, Xinyue Pan, Dana Nau, Rina Dechter
The authors define a framework for modeling spatial evolutionary games using Dynamic Bayesian Networks that capture the underlying stochastic process. The resulting Dynamic Bayesian Networks can be queried for quantities of interest by performing exact inference on the network. They then propose a method for producing approximations of the spatial evolutionary game through the truncation of the corresponding DBN, taking advantage of the high symmetry of the model.
Proceedings of the ACM 20th International Conference on Autonomous Agents and MultiAgent Systems
Michele Gelfand, Joshua Jackson, Xinyue Pan, Dana Nau, Dylan Pieper, Emmy Denison, Munqith Dagher, Paul Van Lange, Chi-Yue Chiu, Mo Wang
The COVID-19 pandemic is a global health crisis, yet certain countries have had far more success in limiting COVID-19 cases and deaths. The authors suggest that collective threats require a tremendous amount of coordination, and that strict adherence to social norms is a key mechanism that enables groups to do so. The paper examines how the strength of social norms—or cultural tightness–looseness—was associated with countries' success in limiting cases and deaths. The results indicated that, compared with nations with high levels of cultural tightness, nations with high levels of cultural looseness are estimated to have had 4.99 times the number of cases (7132 per million vs 1428 per million, respectively) and 8.71 times the number of deaths (183 per million vs 21 per million, respectively), taking into account a number of controls. A formal evolutionary game theoretic model suggested that tight groups coordinate much faster and have higher survival rates than loose groups. The results suggest that tightening social norms might confer an evolutionary advantage in times of collective threat.
The Lancet Planetary Health
2020
Ruoxi Li, Sunandita Patra, Dana Nau
The paper describes Dec-RAE-UPOM, a system for decentralized multi-agent acting and planning in environments that are partially observable, nondeterministic, and dynamically changing.
Dr. Nau's Computer Science paper archives
Sunandita Patra, Amit Kumar, James Mason, Malik Ghallab, Paolo Traverso, Dana Nau
New planning and learning algorithms for the Refinement Acting Engine (RAE), which uses hierarchical operational models to perform tasks in dynamically changing environments.
2020 International Conference on Automated Planning and Scheduling (ICAPS)
2019
Sunandita Patra, Malik Ghallab, Dana Nau, Paolo Traverso
An integrated acting and planning system that addresses the consistency problem by using the actor’s operational models both for acting and for planning.
2023
George Kontoudis and Michael Otte
Active Learning of Gaussian process (GP) surrogates is an efficient way to model unknown environments in various applications. In this paper, we propose an adaptive exploration-exploitation active learning method (ALX) that can be executed rapidly to facilitate real-time decision making. For the exploration phase, we formulate an acquisition function that maximizes the approximated, expected Fisher information. For the exploitation phase, we employ a closed-form acquisition function that maximizes the total expected variance reduction of the search space. The determination of each phase is established with an exploration condition that measures the predictive accuracy of GP surrogates. Extensive numerical experiments in multiple input spaces validate the efficiency of our method.
preprint on Kontoudis website
2020
Rohan Mekala, Adam Porter, Mikael Lindvall
The authors build a black box attack against robust multi-model face recognition pipelines and test it against Google’s FaceNet. They present a novel metamorphic defense pipeline relying on nonlinear image transformations to detect adversarial attacks with a high degree of accuracy. They further use the results to create probabilistic metamorphic relations that define efficient decision boundaries between safe and adversarial examples.
ICSEW'20 May 2020, Seoul, South Korea
2020
Ye Wang, Jian Dong, Yanxin Liu, Chunpei Wang, Gang Qu
A report on ongoing work toward a machine-learning-based runtime approximate computing (AC) approach that can be applied to the data flow graph representation of any software program. The approach uses runtime inputs together with prior information about the software to identify and approximate the noncritical portion of a computation with low runtime overhead. Preliminary experimental results show that, compared with previous runtime AC approaches, this approach can significantly reduce time overhead with little loss in energy efficiency or computation accuracy.
2020 International Conference on Embedded Software (EMSOFT)
Zhichao Xu, Aijiao Cui, Gang Qu
The electronics industry has become the main target of counterfeiting. Integrated circuits (ICs) are highly vulnerable to various types of counterfeiting such as recycling. Recycled ICs do not have the performance and service lifetime of genuine ones, which poses a threat to the reliability of electronic systems. This paper proposes a novel recycled IC detection method: an authentication mechanism and a parallel circuit unit structure, which serves as an aging sensor, are used to distinguish recycled ICs from fresh ones.
GLSVLSI '20: Proceedings of the 2020 on Great Lakes Symposium on VLSI
Ye Wang, Jian Dong, Qian Xu, Zhaojun Lu, Gang Qu
Approximate computing (AC) is an attractive energy efficient technique that can be implemented at almost all the design levels including data, algorithm, and hardware. The basic idea behind AC is to deliberately control the trade-off between computation accuracy and energy efficiency. However, with the introduction of AC, traditional computing frameworks are having many potential security vulnerabilities. This paper analyzes these vulnerabilities and the associated attacks as well as corresponding countermeasures.
GLSVLSI '20: Proceedings of the 2020 on Great Lakes Symposium on VLSI
Jiliang Zhang, Chen Li, Jing Ye, Gang Qu
This article reviews recent research progress on machine learning privacy. First, the privacy threats on data and models in different scenarios are described in detail. Then, typical privacy protection methods are introduced. Finally, the limitations and future development trends of ML privacy research are discussed.
GLSVLSI '20: Proceedings of the 2020 on Great Lakes Symposium on VLSI
2019
Jiqing Xu, Zhengjie Li, Yunbing Pang, Jian Wang, Gang Qu, Jinmei Lai
Given a large set of typical benchmark circuits, the authors show that a representative path delay can effectively characterize the overall timing performance of FPGAs.
2019 IEEE 13th International Conference on ASIC (ASICON)
2022
Rahil Parikh, Ilya Kavalerov, Carol Espy-Wilson, Shihab Shamma
Recent advancements in deep learning have led to drastic improvements in speech segregation models. Despite their success and growing applicability, few efforts have been made to analyze the underlying principles that these networks learn to perform segregation. The authors analyze the role of harmonicity in two state-of-the-art Deep Neural Network (DNN)-based models: Conv-TasNet and DPT-Net.
2023
Ozlem Ozmen Garibay, Brent Winslow, Salvatore Andolina, Margherita Antona, Anja Bodenschatz, Constantinos Coursaris, Gregory Falco, Stephen M. Fiore, Ivan Garibay, Keri Grieman, John C. Havens, Marina Jirotka, Hernisa Kacorri, Waldemar Karwowski, Joe Kider, Joseph Konstan, Sean Koon, Monica Lopez-Gonzalez, Iliana Maifeld-Carucci, Sean McGregor, Gavriel Salvendy, Ben Shneiderman, Constantine Stephanidis, Christina Strobel, Carolyn Ten Holter & Wei Xu
Widespread adoption of artificial intelligence (AI) technologies is substantially affecting the human condition in ways that are not yet well understood. Negative unintended consequences abound including the perpetuation and exacerbation of societal inequalities and divisions via algorithmic decision making. The authors present six grand challenges for the scientific community to create AI technologies that are human-centered, that is, ethical, fair, and enhance the human condition. These grand challenges are the result of an international collaboration across academia, industry and government and represent the consensus views of a group of 26 experts in the field of human-centered artificial intelligence (HCAI). In essence, these challenges advocate for a human-centered approach to AI that (1) is centered in human well-being, (2) is designed responsibly, (3) respects privacy, (4) follows human-centered design principles, (5) is subject to appropriate governance and oversight, and (6) interacts with individuals while respecting human’s cognitive capacities. The authors hope that these challenges and their associated research directions serve as a call for action to conduct research and development in AI that serves as a force multiplier towards more fair, equitable and sustainable societies.
International Journal of Human-Computer Interaction
2022
Ben Shneiderman
If Human-Centered AI design scenarios are oriented to amplifying, augmenting, empowering and enhancing human performance, then the chance of successful outcomes will increase. The passionate advocates of Human-Centered AI are devoted to furthering human values, rights, justice, and dignity, by building reliable, safe, and trustworthy systems.
HUMAN '22: Proceedings of the 5th Workshop on Human Factors in Hypertext (2022)
Catherine Plaisant, Ben Shneiderman, Chris Johnson, Dave Kasik, Mary Whitton
The paper details a history of early information visualization research. This discipline, based on graphical user interfaces with pointing devices, became possible as software matured, hardware sped up, and screen resolution improved. Driven by the concepts of direct manipulation and dynamic queries, the authors recount interactive interfaces that empowered users and opened up new possibilities for the next generation of designers. They also explain how they worked with professionals who had real problems and tested real users to get their feedback; many of the new ideas found their way into widely used commercial products.
IEEE Computer Graphics and Applications; access is through the NIH National Library of Medicine
2021
Ben Shneiderman
Shneiderman presents commentary on Peter Hancock's article on avoiding adverse autonomous agent actions. Shneiderman predicts a brighter future and argues that autonomy is not inevitable because it is not desirable and can be deadly.
Human-Computer Interaction
Ben Shneiderman
The high expectations of AI have triggered worldwide interest and concern, generating 400+ policy documents on responsible AI. Intense discussions over the ethical issues lay a helpful foundation, preparing researchers, managers, policy makers, and educators for constructive discussions that will lead to clear recommendations for building the reliable, safe, and trustworthy systems that will be commercial successes. This Viewpoint focuses on four themes that lead to 15 recommendations for moving forward. The four themes combine AI thinking with human-centered User Experience Design (UXD).
Communications of the ACM
Ben Shneiderman
Researchers, developers, business leaders, policy makers and others are expanding the technology-centered scope of Artificial Intelligence (AI) to include Human-Centered AI (HCAI) ways of thinking. This expansion from an algorithm-focused view to embrace a human-centered perspective, can shape the future of technology so as to better serve human needs. Educators, designers, software engineers, product managers, evaluators, and government agency staffers can build on AI-driven technologies to design products and services that make life better for the users. These human-centered products and services will enable people to better care for each other, build sustainable communities, and restore the environment.
INTERACT 2021, the IFIP Conference on Human-Computer Interaction
Albrecht Schmidt, Fosca Giannotti, Wendy Mackay, Ben Shneiderman, Kaisa Väänänen
This panel discusses the role of human-computer interaction (HCI) in the conception, design, and implementation of human-centered artificial intelligence (AI). For the panelists, it is important that AI and machine learning (ML) are ethical and create value for humans, both as individuals and for society. The discussion emphasizes the opportunities of using HCI and User Experience Design methods to create advanced AI/ML-based systems that will be widely adopted, reliable, safe, trustworthy, and responsible. The resulting systems will integrate AI and ML algorithms while providing user interfaces and control panels that ensure meaningful human control.
INTERACT 2021, the IFIP Conference on Human-Computer Interaction
Dakuo Wang, Pattie Maes, Xiangshi Ren, Ben Shneiderman, Yuanchun Shi, Qianying Wang
Artificial Intelligence (AI) can refer to the machine learning algorithms and the automation applications built on top of these algorithms. Human-computer interaction (HCI) researchers have studied these AI applications and suggested various Human-Centered AI (HCAI) principles for an explainable, safe, reliable, and trustworthy interaction experience. While some designers believe that computers should be supertools and active appliances, others believe that these latest AI systems can be collaborators. We ask whether the supertool or the collaboration metaphors best support work and play? How can we design AI systems to work best with people or for people? What does it take to get there?
2021 ACM CHI Conference on Human Factors in Computing Systems
Ben Shneiderman
Shneiderman's tutorial proposes a new synthesis, in which Artificial Intelligence (AI) algorithms are combined with human-centered thinking to make Human-Centered AI (HCAI). This approach combines research on AI algorithms with user experience design methods to shape technologies that amplify, augment, empower, and enhance human performance. Researchers and developers for HCAI systems value meaningful human control, putting people first by serving human needs, values, and goals.
26th ACM International Conference on Intelligent User Interfaces
Ben Shneiderman
This is a 2006 draft position paper on the emerging science of the web. Readers will gain hope for the future by reading how government services and digital libraries are being redesigned to make them more usable for diverse users. The paper offers a good taste of the breadth of research being done: not only on the diversity of users and their special needs, but also on research methods and outcomes. The breadth of these implications highlights why universal usability research is so important. There is progress and hope, but there are many minds to be changed and much work to be done.
docsbay.net
2020
Ben Shneiderman
This paper bridges the gap between widely discussed ethical principles of Human-Centered AI (HCAI) and practical steps for effective governance.
ACM Transactions on Interactive Intelligent Systems
Ben Shneiderman
A commentary that reverses the current emphasis on algorithms and AI methods, by putting humans at the center of systems design thinking. It offers three ideas: (1) a two-dimensional HCAI framework, which shows how it is possible to have both high levels of human control AND high levels of automation, (2) a shift from emulating humans to empowering people with a plea to shift language, imagery, and metaphors away from portrayals of intelligent autonomous teammates towards descriptions of powerful tool-like appliances and tele-operated devices, and (3) a three-level governance structure that describes how software engineering teams can develop more reliable systems, how managers can emphasize a safety culture across an organization, and how industry-wide certification can promote trustworthy HCAI systems.
AIS Transactions on Human-Computer Interaction
Ben Shneiderman
Proposes a two-dimensional framework alternative to autonomous AI systems called Human-Centered Artificial Intelligence that clarifies how to design for high levels of human control and high levels of computer automation to increase human performance, understand the situations in which full human control or full computer control are necessary, and avoid the dangers of either excessive human control or excessive computer control.
ACM International Journal of Human-Computer Interaction
2020
Emre Ozfatura, Sennur Ulukus, Deniz Gunduz
Introduces a novel coded matrix-vector multiplication scheme, called coded computation with partial recovery (CCPR), which benefits from the advantages of both coded and uncoded computation schemes, and reduces both computation time and decoding complexity by allowing a trade-off between the accuracy and the speed of computation. The approach is extended to distributed implementation of more general computation tasks by proposing a coded communication scheme with partial recovery, where the results of subtasks computed by the workers are coded before being communicated.
2019 IEEE International Conference on Acoustics, Speech and Signal Processing; arXiv.org
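The core idea of coded distributed computation that this line of work builds on can be shown with a classic (2, 3) MDS-coded matrix-vector multiply; this is an illustrative sketch of the general technique, not the paper's CCPR scheme, and the straggler scenario is hypothetical:

```python
# Coded matrix-vector multiplication: split A into two blocks and give a
# third worker the "parity" block A1 + A2, so the results from ANY two of
# the three workers suffice to recover A @ x despite one straggler.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

A1, A2 = A[:2], A[2:]           # two data blocks
tasks = [A1, A2, A1 + A2]       # one encoded task per worker

# Suppose worker 0 straggles: only workers 1 and 2 return results.
y1 = tasks[1] @ x               # A2 @ x
y2 = tasks[2] @ x               # (A1 + A2) @ x
y0 = y2 - y1                    # decode the missing A1 @ x

Ax = np.concatenate([y0, y1])   # reassemble the full product
assert np.allclose(Ax, A @ x)
```

Partial-recovery schemes such as CCPR relax this all-or-nothing decoding by also using the uncoded partial results that stragglers manage to return, trading accuracy for speed.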
Deniz Gunduz, Emre Ozfatura, Sennur Ulukus, Baturalp Buyukates
The age of information (AoI) metric is used to track the recovery frequency of partial computations in distributed gradient descent, the most common approach in supervised machine learning, yielding a new solution to the problem of "straggling" worker machines.
arXiv.org
Brian Kim, Yalin E. Sagduyu, Kemal Davaslioglu, Tugba Erpek, Sennur Ulukus
Presents over-the-air adversarial attacks against deep learning-based modulation classifiers, accounting for realistic channel and broadcast transmission effects. A certified defense method using randomized smoothing is also included.
arXiv.org
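Randomized smoothing, the certified defense mentioned above, classifies many Gaussian-noised copies of an input and returns the majority class. The toy one-dimensional base classifier below is illustrative only, not the modulation classifier from the paper:

```python
# Hedged sketch of randomized smoothing: the smoothed classifier is the
# majority vote of a base classifier evaluated under Gaussian input noise.
from collections import Counter

import numpy as np

def base_classifier(x):
    # Toy 1-D classifier: class 1 if the input exceeds 0, else class 0.
    return int(x > 0.0)

def smoothed_classify(x, sigma=0.25, n=1000, seed=0):
    # Majority vote over n noisy copies of x.
    rng = np.random.default_rng(seed)
    votes = Counter(base_classifier(x + sigma * rng.standard_normal())
                    for _ in range(n))
    return votes.most_common(1)[0][0]

# A tiny adversarial perturbation flips the base classifier near the
# decision boundary, while the smoothed prediction on a clean input that
# sits well inside its class region is stable.
print(base_classifier(-0.01))   # 0: the base classifier is fooled
print(smoothed_classify(1.0))   # 1: the smoothed prediction holds
```

Certification then follows from how far the majority-vote probability sits above 1/2, which bounds the perturbation radius the smoothed classifier can tolerate.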
2021
Behzad Sadrfaridpour, Yiannis Aloimonos, Miao Yu, Yang Tao, Donald Webster
To test the idea that advancements in robotics and artificial intelligence offer the potential to improve the monitoring of oyster beds, the researchers prepared a remote operated underwater vehicle (ROV) with a camera and filmed in the Chesapeake Bay. They then used these videos to train convolutional neural networks (CNNs) to count oysters and track them in consecutive image frames so they are not identified multiple times.
arXiv.org
2023
Lirui Wang, Kaiqing Zhang, Allan Zhou, Max Simchowitz, Russ Tedrake
Fleets of robots ingest massive amounts of streaming data generated by interacting with their environments, far more than can be stored or transmitted with ease. At the same time, the hope is that teams of robots can co-acquire diverse skills through their experiences in varied settings. How can such fleet-level learning be enabled without having to transmit or centralize fleet-scale data? In this paper, the authors investigate distributed learning of policies as a potential solution.
arXiv.org
Qiwen Cui, Kaiqing Zhang, Simon Du
The authors propose a new model, independent linear Markov game, for multi-agent reinforcement learning with a large state space and a large number of agents. This is a class of Markov games with independent linear function approximation, where each agent has its own function approximation for the state-action value functions that are marginalized by other players’ policies.
arXiv.org
Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman
A study of two-player zero-sum stochastic games in which the authors propose a form of independent learning dynamics called Doubly Smoothed Best-Response dynamics, which integrates a discrete and doubly smoothed variant of the best-response dynamics into temporal-difference (TD) learning and minimax value iteration. The resulting dynamics are payoff-based, convergent, rational, and symmetric among players.
arXiv.org
Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra
The authors study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. They pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning (e.g., costs) without reconstructing the observations.
arXiv.org
Asuman Ozdaglar, Sarath Pattathil, Jiawei Zhang, Kaiqing Zhang
This work revisits the linear programming framework for offline reinforcement learning and provides a new reformulation that advances the existing results in several aspects, relaxing certain assumptions and achieving optimal statistical rates in terms of sample size. The key enabler is to introduce proper constraints in the reformulation, instead of using any regularization as in the literature, together with careful choices of the function classes and initial state distributions. The authors hope their insights bring to light the use of LP formulations and the induced primal-dual minimax optimization in offline RL.
arXiv.org
Dongsheng Ding, Chen-Yu Wei, Kaiqing Zhang, Alejandro Ribeiro
A study of the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP). The authors prove that the policy primal-dual iterates of OPG-PD converge to a saddle point that contains an optimal constrained policy, with a linear rate. This work may be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
arXiv.org
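The saddle point referred to above arises from the standard Lagrangian form of a constrained MDP; the notation below is the textbook formulation and may differ from the paper's:

```latex
% Standard constrained-MDP objective and its Lagrangian saddle-point form
% (illustrative; V_r is the reward value, V_g the constraint value,
% \rho the initial state distribution, b the constraint threshold):
\max_{\pi} \; V_r^{\pi}(\rho)
\quad \text{s.t.} \quad V_g^{\pi}(\rho) \ge b
\;\;\Longleftrightarrow\;\;
\max_{\pi} \, \min_{\lambda \ge 0} \; V_r^{\pi}(\rho)
  + \lambda \bigl( V_g^{\pi}(\rho) - b \bigr)
```

Primal-dual methods alternate gradient ascent in the policy with descent in the multiplier \(\lambda\); the last-iterate result summarized above concerns these coupled iterates converging to the saddle point itself rather than only on average.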
2022
Asuman Ozdaglar, Sarath Pattathil, Jiawei Zhang, Kaiqing Zhang (work completed at MIT LIDS Lab)
The authors revisit the LP framework for offline reinforcement learning, and advance the existing results in several aspects, relaxing certain assumptions and achieving optimal statistical rates in terms of sample size.
arXiv.org
Lirui Wang, Kaiqing Zhang, Yunzhu Li, Yonglong Tian, Russ Tedrake (work completed at MIT CSAIL Lab)
The authors study decentralized learning with unlabeled data through the lens of self-supervised learning (SSL), specifically contrastive visual representation learning.
arXiv.org
2023
Zachary Lazri McBride, Ivan Brugere, Xin Tian, Dana Dachman-Soled, Antigoni Polychroniadou, Danial Dervovic, Min Wu
Increases in the deployment of machine learning algorithms for applications that deal with sensitive data have brought attention to the issue of fairness in machine learning. Many works have been devoted to applications that require different demographic groups to be treated fairly. However, algorithms that aim to satisfy inter-group fairness (also called group fairness) may inadvertently treat individuals within the same demographic group unfairly. To address this issue, the authors introduce a formal definition of within-group fairness that maintains fairness among individuals from within the same group.
ADS/ABS Labs, Harvard
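The gap between inter-group and within-group fairness can be made concrete with a toy example; the metric, the data, and the "merit" attribute below are all hypothetical illustrations, not the paper's formal definition:

```python
# Illustrative sketch: a classifier can satisfy inter-group (demographic)
# parity while still treating similar individuals WITHIN a group unequally.
import numpy as np

scores = np.array([0.9, 0.2, 0.8, 0.1])   # model scores for 4 individuals
group  = np.array(["A", "A", "B", "B"])   # demographic group membership
merit  = np.array([0.5, 0.5, 0.9, 0.1])   # hypothetical merit attribute

# Inter-group parity: both groups receive positive outcomes at equal rates.
rate_A = (scores[group == "A"] > 0.5).mean()
rate_B = (scores[group == "B"] > 0.5).mean()
print(rate_A == rate_B)   # True: the groups look fair in aggregate

# Within-group unfairness: inside group A, two individuals with IDENTICAL
# merit receive very different scores, which group-level metrics miss.
a_scores = scores[group == "A"]
a_merit  = merit[group == "A"]
merit_gap, score_gap = np.ptp(a_merit), np.ptp(a_scores)
print(merit_gap < score_gap)  # equal merit, but a large score gap
```

A within-group fairness definition of the kind the paper formalizes adds a constraint of this second type on top of the usual group-level criteria.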