Feixiang Lu, Zongdai Liu, Hui Miao, Peng Wang, Liangjun Zhang, Ruigang Yang, Dinesh Manocha, Bin Zhou
Holistically understanding an object and its 3D movable parts through visual perception models is essential for enabling an autonomous agent to interact with the world. For autonomous driving, the dynamics and states of vehicle parts such as doors, the trunk, and the bonnet can provide meaningful semantic information and interaction states, which are essential to ensure the safety of the self-driving vehicle. Existing visual perception models mainly focus on coarse parsing such as object bounding box detection or pose estimation and rarely tackle these situations. In this paper, the authors address this important problem for autonomous driving by solving two critical issues using visual data augmentation.
Sheng Li, Xiang Gu, Kangrui Yi, Yanlin Yang, Guoping Wang, Dinesh Manocha
This experiment investigated the occurrence of self-illusion and its contribution to realistic behavior consistent with a virtual role in virtual environments.
IEEE Transactions on Visualization and Computer Graphics
Yuexin Ma, Xinge Zhu, Xinjing Cheng, Ruigang Yang, Jiming Liu, Dinesh Manocha
A label-free algorithm for trajectory extraction and prediction that works directly on raw videos. To better capture the moving objects in videos, the authors introduce dynamic points to model dynamic motions, using a forward-backward extractor to keep temporal consistency and image reconstruction to keep spatial consistency in an unsupervised manner. The method is the first to achieve unsupervised learning of trajectory extraction and prediction.
2020 European Conference on Computer Vision
Zehui Lin, Sheng Li, Xinlu Zeng, Congyi Zhang, Jinzhu Jia, Guoping Wang, Dinesh Manocha
This chi-squared progressive photon mapping algorithm (CPPM) constructs an estimator by controlling the bandwidth to obtain superior image quality.
ACM Transactions on Graphics
Shiguang Liu, Dinesh Manocha
This is a broad overview of research on sound simulation in virtual reality, games, etc. It first surveys various sound synthesis methods, including harmonic synthesis, texture synthesis, spectral analysis, and physics-based synthesis. Then, it summarizes popular sound propagation techniques, namely wave-based methods, geometric-based methods, and hybrid methods. Next, sound rendering methods are reviewed. The authors also highlight some recent methods that use machine learning techniques for synthesis, propagation, and some inverse problems.
Pooja Guhan, Manas Agarwal, Naman Awasthi, Gloria Reeves, Dinesh Manocha, Aniket Bera
ABC-Net is a semi-supervised multi-modal GAN framework based on psychology literature that detects engagement levels in video conversations. It uses three constructs—behavioral, cognitive, and affective engagement—to extract various features that can effectively capture engagement levels.
Uttaran Bhattacharya, Nicholas Rewkowski, Pooja Guhan, Niall L. Williams, Trisha Mittal, Aniket Bera, Dinesh Manocha
This autoregression network generates virtual agents that convey various emotions through their walking styles or gaits.
Jaehoon Choi, Dongki Jung, Yonghan Lee, Deokhwa Kim, Dinesh Manocha, Donghwan Lee
An algorithm for self-supervised monocular depth completion in robotic navigation, computer vision and autonomous driving. The approach is based on training a neural network that requires only sparse depth measurements and corresponding monocular video sequences without dense depth labels. The self-supervised algorithm is designed for challenging indoor environments with textureless regions, glossy and transparent surfaces, non-Lambertian surfaces, moving people, longer and diverse depth ranges and scenes captured by complex ego-motions.
Qingyang Tan, Zherong Pan, Dinesh Manocha
LCollision is a learning-based method that synthesizes collision-free 3D human poses. LCollision is the first approach that can obtain high accuracy in handling non-penetration and collision constraints in a learning framework.
Rohan Chandra, Aniket Bera, Dinesh Manocha
Autonomous vehicles behave conservatively in a traffic environment with human drivers and do not adapt to local conditions and socio-cultural norms. However, socially aware AVs can be designed if there exists a mechanism to understand the behaviors of human drivers. In this example of Machine Theory of Mind (M-ToM) the authors infer the behaviors of human drivers by observing the trajectory of their vehicles. "StylePredict" is based on trajectory analysis of vehicles. It mimics human ToM to infer driver behaviors, or styles, using a computational mapping between the extracted trajectory of a vehicle in traffic and the driver behaviors using graph-theoretic techniques, including spectral analysis and centrality functions. StylePredict can analyze driver behavior in the USA, China, India, and Singapore, based on traffic density, heterogeneity, and conformity to traffic rules.
Angelos Mavrogiannis, Rohan Chandra, Dinesh Manocha
A learning algorithm for action prediction and local navigation for autonomous driving that classifies the driver behavior of other vehicles or road-agents (aggressive or conservative) and takes that into account for decision making and safe driving.
Divya Kothandaraman, Rohan Chandra, Dinesh Manocha
An unsupervised multi-source domain adaptive semantic segmentation approach for autonomous vehicles in unstructured and unconstrained traffic environments.
Mingliang Xu, Chaochao Li, Pei Lv, Wei Chen, Zhigang Deng, Bing Zhou, Dinesh Manocha
CubeP is a model for crowd simulation that comprehensively considers physiological, psychological, and physical factors. Inspired by the theory of “the devoted actor”, the model determines the movement of each individual by modeling the physical influence from physical strength and emotion. This is the first time that physiological, psychological, and physical factors are integrated in a unified manner, and the relationship between the factors is explicitly determined. The new model is capable of generating effects similar to real-world scenarios and can also reliably predict the changes in the physical strength and emotion of individuals in an emergency situation.
Xutong Jin, Sheng Li, Tianshu Qu, Dinesh Manocha, Guoping Wang
Modal sound synthesis is a physically-based sound synthesis method used to generate audio content in games and virtual worlds. This paper presents a novel learning-based impact sound synthesis algorithm called Deep-Modal. The approach can handle sound synthesis for common arbitrary objects, especially dynamically generated objects, in real time.
MM '20: Proceedings of the 28th ACM International Conference on Multimedia
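As background for the entry above, classical modal synthesis is commonly described as summing exponentially damped sinusoids, one per vibration mode. The following is a minimal sketch of that general idea only, not of the Deep-Modal algorithm; the function name `synthesize_impact` and the mode values (frequency, damping, amplitude) are hypothetical and chosen purely for illustration.

```python
import math


def synthesize_impact(modes, duration=0.5, sample_rate=8000):
    """Sum exponentially damped sinusoids, one per vibration mode.

    Each mode is a tuple (frequency_hz, damping_per_second, amplitude).
    """
    num_samples = int(duration * sample_rate)
    samples = []
    for i in range(num_samples):
        t = i / sample_rate
        value = sum(
            amp * math.exp(-damp * t) * math.sin(2.0 * math.pi * freq * t)
            for freq, damp, amp in modes
        )
        samples.append(value)
    return samples


# Hypothetical modes for illustration; real values depend on the object's
# geometry and material and are not given in the entry above.
modes = [(440.0, 8.0, 1.0), (1210.0, 14.0, 0.5), (2350.0, 25.0, 0.25)]
signal = synthesize_impact(modes)
```

In a physically-based pipeline the mode parameters would typically come from an analysis of the object; the entry above describes a learning-based approach for obtaining them in real time.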
Zhenyu Tang, Hsien-Yu Meng, Dinesh Manocha
A novel hybrid sound propagation algorithm for interactive applications.
Sarala Padi, Dinesh Manocha, Ram Sriram
A novel Multi-Window Data Augmentation (MWA-SER) approach for speech emotion recognition. MWA-SER is a unimodal approach that focuses on two key concepts: designing the speech augmentation method to generate additional data samples and building the deep learning models to recognize the underlying emotion of an audio signal.
Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha
The paper presents a Generative Adversarial Network (GAN) based room impulse response generator for generating realistic synthetic room impulse responses.
Cheng Li, Min Tang, Ruofeng Tong, Ming Cai, Jieyi Zhao, Dinesh Manocha
Cloth simulation is an active area of research in computer graphics, computer-aided design (CAD) and the fashion industry. Over the last few decades many methods have been proposed for solving the underlying dynamical system with robust collision handling. The paper presents a novel parallel algorithm for cloth simulation that exploits multiple GPUs for fast computation and the handling of very high resolution meshes. It is the first approach that can perform almost interactive complex cloth simulation with wrinkles, friction and folds on commodity workstations.
Feixiang Lu, Zongdai Liu, Xibin Song, Dingfu Zhou, Wei Li, Hui Miao, Miao Liao, Liangjun Zhang, Bin Zhou, Ruigang Yang, Dinesh Manocha
The paper presents a robust and effective approach to reconstruct complete 3D poses and shapes of vehicles from a single image. It introduces a novel part-level representation for vehicle segmentation and 3D reconstruction, which significantly improves performance.
Andrew Best, Sahil Narang, Dinesh Manocha
Sense-Plan-Act (SPA) is a new approach for generating plausible verbal interactions between virtual human-like agents and user avatars in shared virtual environments. It extends prior work in propositional planning and natural language processing to enable agents to plan with uncertain information and to use question-and-answer dialogue with other agents and avatars to obtain the needed information and complete their goals. The agents are additionally able to respond to questions from the avatars and other agents using natural language, enabling real-time multi-agent, multi-avatar communication environments.
Rohan Chandra, Uttaran Bhattacharya, Tanmay Randhavane, Aniket Bera, Dinesh Manocha
RoadTrack is a real-time tracking algorithm for autonomous driving that tracks heterogeneous road-agents in dense traffic videos. The approach is designed for dense traffic scenarios in which different road-agents such as pedestrians, two-wheelers, cars, and buses share the road.
Zhiming Hu, Sheng Li, Congyi Zhang, Kangrui Yi, Guoping Wang, Dinesh Manocha
DGaze is a CNN-based model that combines object position sequence, head velocity sequence, and saliency features to predict users' gaze positions in HMD-based applications. The model can be applied to predict not only real-time gaze positions but also gaze positions in the near future and can achieve better performance than prior methods.
IEEE Transactions on Visualization and Computer Graphics
Srihari Pratapa, Dinesh Manocha
RANDM is a random-access depth map compression algorithm for interactive rendering. The compressed representation provides random access to the depth values and enables real-time parallel decompression on commodity hardware. This method partitions the depth range captured in a given scene into equal-sized intervals and uses this partition to generate three separate components that exhibit higher coherence. Each of these components is processed independently to generate the compressed stream.
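The partitioning step described above can be sketched as follows. This is a minimal illustration that assumes each depth value is split into an interval index and an offset within that interval; the entry does not specify how RANDM derives its three components, so the split and the function names `partition_depth` and `reconstruct_depth` are hypothetical.

```python
def partition_depth(depth, d_min, d_max, num_intervals):
    """Return (interval index, offset inside the interval) for one depth value.

    The range [d_min, d_max] is cut into num_intervals equal-sized intervals.
    """
    width = (d_max - d_min) / num_intervals
    # Clamp so that depth == d_max falls into the last interval.
    index = min(int((depth - d_min) / width), num_intervals - 1)
    offset = depth - (d_min + index * width)
    return index, offset


def reconstruct_depth(index, offset, d_min, d_max, num_intervals):
    """Invert partition_depth for a single value."""
    width = (d_max - d_min) / num_intervals
    return d_min + index * width + offset


# Hypothetical scene whose depth range is [0, 8], split into 4 intervals.
index, offset = partition_depth(5.0, 0.0, 8.0, 4)
restored = reconstruct_depth(index, offset, 0.0, 8.0, 4)
```

Because each value is handled independently of its neighbors, a lookup of this kind is compatible with the random access and parallel decompression that the entry emphasizes.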
Rohan Chandra, Uttaran Bhattacharya, Trisha Mittal, Aniket Bera, Dinesh Manocha
CMetric classifies driver behaviors using centrality functions. The formulation combines concepts from computational graph theory and social traffic psychology to quantify and classify the behavior of human drivers. CMetric is used to compute the probability of a vehicle executing a driving style, as well as the intensity used to execute the style. This approach is designed for real-time autonomous driving applications, where the trajectory of each vehicle or road-agent is extracted from a video.
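To make the idea of a centrality function concrete, the sketch below builds a proximity graph over road-agents at one time step and computes degree centrality for each. It is an illustration under stated assumptions, not the CMetric formulation: the entry does not say which centrality functions or graph construction are used, and the names `proximity_graph`, `degree_centrality`, the radius, and the positions are all hypothetical.

```python
import math


def proximity_graph(positions, radius):
    """Connect two road-agents when they lie within `radius` of each other."""
    ids = list(positions)
    adjacency = {agent: set() for agent in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(positions[a], positions[b]) <= radius:
                adjacency[a].add(b)
                adjacency[b].add(a)
    return adjacency


def degree_centrality(adjacency):
    """Fraction of the other agents that each agent is connected to."""
    n = len(adjacency)
    if n <= 1:
        return {agent: 0.0 for agent in adjacency}
    return {agent: len(nbrs) / (n - 1) for agent, nbrs in adjacency.items()}


# Hypothetical positions (in meters) extracted from one video frame.
positions = {
    "ego": (0.0, 0.0),
    "car_a": (3.0, 0.0),
    "car_b": (0.0, 4.0),
    "truck": (30.0, 30.0),
}
centrality = degree_centrality(proximity_graph(positions, radius=6.0))
```

Tracking how such a value changes across frames is one plausible way to relate a trajectory to a driving style, though the specific mapping used by the authors is not described here.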
Trisha Mittal, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha
The paper presents a learning-based method for detecting fake videos. The authors use the similarity between audio-visual modalities and the similarity between the affective cues of the two modalities to infer whether a video is “real” or “fake.”
Trisha Mittal, Pooja Guhan, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha
EmotiCon is a learning-based algorithm for context-aware perceived human emotion recognition from videos and images. It uses multiple modalities of faces and gaits, background visual information and socio-dynamic inter-agent interactions to infer the perceived emotion. EmotiCon outperforms prior context-aware emotion recognition methods.
Abhishek Kumar, Trisha Mittal, Dinesh Manocha
MCQA is a learning-based algorithm for multimodal question answering that explicitly fuses and aligns the multi-modal input (i.e. text, audio, and video) forming the context for the query (question and answer).
Zhenyu Tang, Dinesh Manocha
Modern computer graphics applications including virtual reality and augmented reality have adopted techniques for both visual rendering and audio rendering. While visual rendering can already synthesize virtual objects into the real world seamlessly, it remains difficult to correctly blend virtual sound with real-world sound using state-of-the-art audio rendering. When the virtual sound is generated unaware of the scene, the corresponding application becomes less immersive, especially for AR. The authors present their current work on generating scene-aware sound using ray-tracing based simulation combined with deep learning and optimization.
2020 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops
Micah Taylor, Anish Chandak, Lakulish Antani, Dinesh Manocha
An algorithm and system for sound propagation and rendering in virtual environments and media applications. The approach uses geometric propagation techniques for fast computation of propagation paths from a source to a listener and takes into account specular reflections, diffuse reflections, and edge diffraction.
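A standard geometric construction for a single specular reflection is the image-source method: mirror the source across the reflecting plane, and the reflected path length equals the distance from the mirrored source to the listener. The sketch below shows only that construction. Whether the authors' system uses it is an assumption, diffuse reflections and edge diffraction are not covered, and no visibility test is performed; the function names and the scene are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # meters per second in air at about 20 degrees Celsius


def mirror_across_plane(point, plane_point, unit_normal):
    """Reflect a 3D point across the plane through plane_point with unit_normal."""
    signed_distance = sum(
        (p - q) * n for p, q, n in zip(point, plane_point, unit_normal)
    )
    return tuple(p - 2.0 * signed_distance * n for p, n in zip(point, unit_normal))


def specular_path_length(source, listener, plane_point, unit_normal):
    """Length of the source -> plane -> listener path for one specular bounce."""
    image = mirror_across_plane(source, plane_point, unit_normal)
    return math.dist(image, listener)


# Hypothetical scene: a floor at z = 0 with the normal pointing up.
source = (0.0, 0.0, 1.0)
listener = (4.0, 0.0, 2.0)
image = mirror_across_plane(source, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
length = specular_path_length(source, listener, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
delay_seconds = length / SPEED_OF_SOUND
```

Higher-order reflections can be handled by mirroring image sources again across further planes, which is why fast path enumeration matters for interactive use.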
SriSai Naga Jyotish Poonganam, Bharath Gopalakrishnan, Venkata Seetharama Sai Bhargav Kumar Avula, K. Madhava Krishna, Arun Kumar Singh, Dinesh Manocha
A new model predictive control framework that improves reactive navigation for autonomous robots. The framework allows roboticists to compute low cost control inputs while ensuring some upper bound on the risk of collision.
IEEE Robotics and Automation Letters
Dinesh Manocha, Rohan Chandra, Uttaran Bhattacharya, Aniket Bera, Tanmay Randhavane
The authors' RoadTrack algorithm could help autonomous vehicles navigate dense traffic scenarios. The algorithm uses a tracking-by-detection approach to detect vehicles and pedestrians, then predicts where they are going.
Kurt Gray, Tanmay Randhavane, Kyra Kapsaskis, Uttaran Bhattacharya, Aniket Bera, Dinesh Manocha
A data-driven deep neural algorithm for detecting deceptive walking behavior using nonverbal cues like gaits and gestures.
Rohan Chandra, Tianrui Guan, Srujan Panuganti, Trisha Mittal, Uttaran Bhattacharya, Aniket Bera, Dinesh Manocha
A novel approach for traffic forecasting in urban traffic scenarios using a combination of spectral graph analysis and deep learning.
Qiaoyun Wu, Dinesh Manocha, Jun Wang, Kai Xu
The authors improve the cross-target and cross-scene generalization of visual navigation through a learning agent guided by conceiving the next observations it expects to see. A variational Bayesian model, NeoNav, generates the next expected observations (NEO) conditioned on the current observations of the agent and the target view.
Uttaran Bhattacharya, Christian Roncal, Trisha Mittal, Rohan Chandra, Aniket Bera, Dinesh Manocha
The paper presents an autoencoder-based semi-supervised approach to classify perceived human emotions from walking styles obtained from videos or from motion-captured data and represented as sequences of 3D poses.
Chaochao Li, Pei Lv, Mingliang Xu, Xinyu Wang, Dinesh Manocha, Bing Zhou, Meng Wang
In many applications such as human-robot interaction, autonomous driving or surveillance, it is important to accurately predict pedestrian trajectories for collision-free navigation or abnormal behavior detection. The authors present a novel trajectory prediction algorithm for pedestrians based on a personality-aware probabilistic feature map.
Uttaran Bhattacharya, Trisha Mittal, Rohan Chandra, Tanmay Randhavane (UNC), Aniket Bera, and Dinesh Manocha
STEP is a novel classifier network able to classify perceived human emotion from gaits, based on a Spatial Temporal Graph Convolutional Network architecture. Given an RGB video of an individual walking, STEP implicitly exploits the gait features to classify the emotional state of the human into one of four emotions: happy, sad, angry, or neutral.
Rohan Chandra, Uttaran Bhattacharya, Trisha Mittal, Xiaoyu Li, Aniket Bera, Dinesh Manocha
The GraphRQI algorithm identifies driver behaviors from road agent trajectories. It is 25 percent more accurate than prior behavior classification algorithms for autonomous vehicles.
Qingyang Tan, Zherong Pan, Lin Gao, and Dinesh Manocha
A new method bridges the gap between mesh embedding and physical simulation for efficient dynamic models of clothes. The key technique is a graph-based convolutional neural network (CNN) defined on meshes with arbitrary topologies, and a new mesh embedding approach based on a physics-inspired loss term. After training, the learned simulator runs 10–100 times faster and the accuracy is high enough for robot manipulation tasks.