Chen-Hsuan Lin
The first name is Chen-Hsuan (neither just Chen nor Hsuan).
Hsuan is pronounced like "shoo-en" with a quick transition.
I am interested in solving 3D reconstruction and view synthesis problems using neural rendering and self-supervised learning techniques. My research goal is to empower AI systems with dense 3D perception and imagination abilities by learning from visual data in the wild, in order to advance towards the next level of visual and 3D spatial artificial intelligence.
Email: chenhsuanl (at) nvidia (dot) com
Updates
10/2021
The oral presentation of our ICCV 2021 paper BARF is now online here!
10/2021
I gave a talk (in person!) at the MIT vision & graphics seminar (recording here).
08/2021
The code of our ICCV 2021 paper BARF is now released, check it out here!
08/2021
I have joined NVIDIA Research as a research scientist!
07/2021
I have one paper accepted to ICCV 2021 as an oral presentation!
04/2021
We released our latest work on training NeRF from unknown camera poses!
10/2020
The code and short talk of our NeurIPS 2020 paper are online! Check out the project page.
09/2020
I have two papers accepted to NeurIPS 2020 and 3DV 2020!
05/2019
I have joined Facebook AI Research (FAIR) for an internship this summer.
02/2019
I have one paper accepted to CVPR 2019!
05/2018
I have joined Adobe Research for a second internship this summer.
02/2018
I have one paper accepted to CVPR 2018!
01/2018
I have one paper accepted to ICRA 2018.
11/2017
I have two papers accepted to AAAI 2018 and WACV 2018.
08/2017
I have started as a Ph.D. student back at Carnegie Mellon University this fall!
07/2017
My oral presentation at CVPR 2017 is online here.
04/2017
I have joined Adobe Research for an internship this summer.
02/2017
I have two papers accepted to CVPR 2017!
07/2016
I have my first (ever) paper accepted to ECCV 2016!
older updates... (show)
Research
BARF: Bundle-Adjusting Neural Radiance Fields
Chen-Hsuan Lin, Wei-Chiu Ma, Antonio Torralba, and Simon Lucey
IEEE International Conference on Computer Vision (ICCV), 2021 (oral presentation)
paper • arXiv • project page • presentation • code • BibTex (show)
@inproceedings{lin2021barf,
title={BARF: Bundle-Adjusting Neural Radiance Fields},
author={Lin, Chen-Hsuan and Ma, Wei-Chiu and Torralba, Antonio and Lucey, Simon},
booktitle={IEEE International Conference on Computer Vision ({ICCV})},
year={2021}
}
Neural Radiance Fields can be trained from unknown camera poses! Inspired by classical image alignment, we show that coarse-to-fine optimization is simple yet effective for joint registration and reconstruction on 3D scene representations.
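The coarse-to-fine idea can be sketched as a positional encoding whose higher-frequency bands are smoothly activated as training progresses, so that optimization first sees low-frequency signals for registration and only later the fine details. A minimal NumPy sketch (the cosine easing schedule follows the paper, but the function name and standalone setup are illustrative, not the released code):

```python
import numpy as np

def annealed_positional_encoding(x, num_freqs, alpha):
    """Positional encoding whose higher-frequency bands are gradually
    enabled as the annealing parameter alpha goes from 0 to num_freqs.
    x: (..., D) array of input coordinates."""
    features = [x]
    for k in range(num_freqs):
        # smooth on/off weight for the k-th frequency band
        w = 0.5 * (1.0 - np.cos(np.pi * np.clip(alpha - k, 0.0, 1.0)))
        features.append(w * np.sin((2.0 ** k) * np.pi * x))
        features.append(w * np.cos((2.0 ** k) * np.pi * x))
    return np.concatenate(features, axis=-1)
```

At alpha = 0 all frequency bands are masked out (coarse signals only); raising alpha to num_freqs recovers the full encoding.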
SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images
Chen-Hsuan Lin, Chaoyang Wang, and Simon Lucey
Advances in Neural Information Processing Systems (NeurIPS), 2020
paper • arXiv • project page • code • BibTex (show)
@inproceedings{lin2020sdfsrn,
title={SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images},
author={Lin, Chen-Hsuan and Wang, Chaoyang and Lucey, Simon},
booktitle={Advances in Neural Information Processing Systems ({NeurIPS})},
year={2020}
}
Implicit 3D shape reconstruction can be trained from static image collections without multi-view supervision! We establish the geometric connection of 2D silhouettes to 3D SDF shapes for scalable single-view training on real-world image data.
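The geometric link exploited for supervision can be illustrated with a toy test: a pixel lies inside the object silhouette if and only if the minimum SDF value along its camera ray is negative (the ray passes through the surface). A minimal NumPy sketch with uniform ray sampling (the function names, sampling scheme, and setup are illustrative, not the paper's implementation):

```python
import numpy as np

def silhouette_from_sdf(sdf, rays_o, rays_d, num_samples=64, far=4.0):
    """Binary silhouette from an SDF: a ray is 'inside' iff the minimum
    SDF value along it is negative. sdf maps (N, 3) points -> (N,) values.
    rays_o, rays_d: (R, 3) ray origins and (unit) directions."""
    t = np.linspace(0.0, far, num_samples)                        # sample depths
    pts = rays_o[:, None, :] + t[None, :, None] * rays_d[:, None, :]
    vals = sdf(pts.reshape(-1, 3)).reshape(len(rays_o), num_samples)
    return (vals.min(axis=1) < 0).astype(float)                   # 1 = inside
```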
Deep NRSfM++: Towards Unsupervised 2D-3D Lifting in the Wild
Chaoyang Wang, Chen-Hsuan Lin, and Simon Lucey
IEEE International Conference on 3D Vision (3DV), 2020 (oral presentation)
paper (arXiv) • BibTex (show)
@inproceedings{wang2020deep,
title={Deep NRSfM++: Towards Unsupervised 2D-3D Lifting in the Wild},
author={Wang, Chaoyang and Lin, Chen-Hsuan and Lucey, Simon},
booktitle={IEEE International Conference on 3D Vision ({3DV})},
year={2020}
}
A self-supervised framework for 3D structure and pose recovery from 2D landmarks, closely related to hierarchical block-sparse coding in non-rigid structure from motion. Our method can handle perspective camera models and missing data.
Photometric Mesh Optimization for Video-Aligned 3D Object Reconstruction
Chen-Hsuan Lin, Oliver Wang, Bryan C. Russell, Eli Shechtman, Vladimir G. Kim, Matthew Fisher, and Simon Lucey
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019
paper • arXiv • project page • code • BibTex (show)
@inproceedings{lin2019photometric,
title={Photometric Mesh Optimization for Video-Aligned 3D Object Reconstruction},
author={Lin, Chen-Hsuan and Wang, Oliver and Russell, Bryan C and Shechtman, Eli and Kim, Vladimir G and Fisher, Matthew and Lucey, Simon},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},
year={2019}
}
3D mesh reconstruction from RGB videos using photometric optimization with learned shape priors. This allows 3D object meshes to deform in a learned shape space while being pixel-aligned against RGB videos without depth or silhouettes.
ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing
Chen-Hsuan Lin, Ersin Yumer, Oliver Wang, Eli Shechtman, and Simon Lucey
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
paper • arXiv • project page • code • BibTex (show)
@inproceedings{lin2018stgan,
title={ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing},
author={Lin, Chen-Hsuan and Yumer, Ersin and Wang, Oliver and Shechtman, Eli and Lucey, Simon},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},
year={2018}
}
GANs can learn to correct the geometry of objects and create realistic image composites! Our method discovers plausible geometric configurations of objects driven solely by appearance realism, where ground-truth supervision is unavailable.
Deep-LK for Efficient Adaptive Object Tracking
Chaoyang Wang, Hamed Kiani Galoogahi, Chen-Hsuan Lin, and Simon Lucey
IEEE International Conference on Robotics and Automation (ICRA), 2018
paper • arXiv • BibTex (show)
@inproceedings{wang2018deeplk,
title={Deep-LK for Efficient Adaptive Object Tracking},
author={Wang, Chaoyang and Galoogahi, Hamed Kiani and Lin, Chen-Hsuan and Lucey, Simon},
booktitle={IEEE International Conference on Robotics and Automation ({ICRA})},
year={2018}
}
We can use Siamese neural networks to learn general object tracking by optimizing a registration-based objective function. The learned feature representations adapt to the regression parameters online with respect to the tracked templates.
Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction
Chen-Hsuan Lin, Chen Kong, and Simon Lucey
AAAI Conference on Artificial Intelligence (AAAI), 2018 (oral presentation)
paper • arXiv • project page • code • BibTex (show)
@inproceedings{lin2018learning,
title={Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction},
author={Lin, Chen-Hsuan and Kong, Chen and Lucey, Simon},
booktitle={AAAI Conference on Artificial Intelligence ({AAAI})},
year={2018}
}
We design a novel differentiable point cloud renderer to approximate the rasterization of dense 3D point clouds generated by a 2D convolutional neural network, so that the generated point clouds can be supervised from training depth images.
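The core projection step being approximated can be sketched as a hard z-buffer rasterization of a point cloud into a depth map; the paper's contribution is a differentiable approximation of this operation, which the hard `min` below is not. A minimal NumPy sketch with a pinhole camera (names and setup are illustrative):

```python
import numpy as np

def render_depth(points, f, cx, cy, H, W):
    """Rasterize a 3D point cloud into an H x W depth map, keeping the
    nearest point per pixel (hard z-buffer; the paper instead uses a
    differentiable approximation so gradients can flow to the points).
    points: (N, 3) array in camera coordinates with z > 0."""
    depth = np.full((H, W), np.inf)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    u = np.round(f * x / z + cx).astype(int)   # pinhole projection
    v = np.round(f * y / z + cy).astype(int)
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (z > 0)
    for ui, vi, zi in zip(u[valid], v[valid], z[valid]):
        depth[vi, ui] = min(depth[vi, ui], zi)  # z-buffer: keep nearest
    return depth
```

A rendered depth map like this can then be compared against training depth images to supervise the generated points.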
Object-Centric Photometric Bundle Adjustment with Deep Shape Prior
Rui Zhu, Chaoyang Wang, Chen-Hsuan Lin, Ziyan Wang, and Simon Lucey
IEEE Winter Conference on Applications of Computer Vision (WACV), 2018
paper • arXiv • extension paper • BibTex (show)
@inproceedings{zhu2017object,
title={Object-Centric Photometric Bundle Adjustment with Deep Shape Prior},
author={Zhu, Rui and Wang, Chaoyang and Lin, Chen-Hsuan and Wang, Ziyan and Lucey, Simon},
booktitle={IEEE Winter Conference on Applications of Computer Vision ({WACV})},
year={2018}
}
3D shape prediction networks can be utilized as a strong semantic prior for object-centric photometric bundle adjustment. We use pretrained 3D point cloud generators to align shapes to videos within an optimization-based inference framework.
Inverse Compositional Spatial Transformer Networks
Chen-Hsuan Lin and Simon Lucey
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017 (oral presentation)
paper • arXiv • project page • presentation • code • BibTex (show)
@inproceedings{lin2017inverse,
title={Inverse Compositional Spatial Transformer Networks},
author={Lin, Chen-Hsuan and Lucey, Simon},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},
year={2017}
}
A redesign of Spatial Transformer Networks inspired by the Lucas-Kanade algorithm. With the same network architecture, our method learns recurrent spatial transformations to resolve geometric redundancies for efficient visual recognition.
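The key mechanism, propagating warp parameters through recurrent updates rather than repeatedly resampling images, can be sketched with affine warp composition. In the sketch below, `net` (the update predictor) and `warp_fn` (a bilinear sampler) are hypothetical placeholders standing in for a learned network and an image resampler, not the paper's architecture:

```python
import numpy as np

def compose_affine(p, dp):
    """Compose two affine warps given as 2x3 parameter matrices.
    Composing parameters lets the ORIGINAL image be resampled once per
    step, avoiding accumulated boundary and interpolation loss."""
    A = np.vstack([p, [0.0, 0.0, 1.0]])
    dA = np.vstack([dp, [0.0, 0.0, 1.0]])
    return (A @ dA)[:2]

def icstn_align(image, net, warp_fn, num_steps):
    """Recurrent alignment: each step predicts a warp update from the
    currently-warped image and composes it into the running warp."""
    p = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # identity warp
    for _ in range(num_steps):
        warped = warp_fn(image, p)        # resample the original image
        p = compose_affine(p, net(warped))
    return p
```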
Using Locally Corresponding CAD Models for Dense 3D Reconstructions from a Single Image
Chen Kong, Chen-Hsuan Lin, and Simon Lucey
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017
paper • BibTex (show)
@inproceedings{kong2017using,
title={Using Locally Corresponding CAD Models for Dense 3D Reconstructions from a Single Image},
author={Kong, Chen and Lin, Chen-Hsuan and Lucey, Simon},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},
year={2017}
}
A 3D shape reconstruction method based on CAD model retrieval. By matching keypoint projections from the input image, we can use local landmark correspondences to solve for sparse linear combinations of a prebuilt CAD model dictionary.
The Conditional Lucas & Kanade Algorithm
Chen-Hsuan Lin, Rui Zhu, and Simon Lucey
European Conference on Computer Vision (ECCV), 2016
paper • arXiv • project page • code • BibTex (show)
@inproceedings{lin2016conditional,
title={The Conditional Lucas \& Kanade Algorithm},
author={Lin, Chen-Hsuan and Zhu, Rui and Lucey, Simon},
booktitle={European Conference on Computer Vision ({ECCV})},
pages={793--808},
year={2016},
organization={Springer International Publishing}
}
A learning-based image registration method inspired by the seminal Lucas-Kanade algorithm. With structured optimization and a conditional loss, our method significantly improves over classical synthesis-based optimization objective functions.
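For context, the classical synthesis-based objective that this work improves upon can be sketched as one Gauss-Newton step of Lucas-Kanade in the simplest setting: 1D translation. The translation-only setup and all names are illustrative; the paper instead learns a conditional regression from appearance to geometric displacement.

```python
import numpy as np

def lk_translation_step(template, image, p):
    """One Gauss-Newton step of classical Lucas-Kanade for 1D translation:
    linearize image(x + p) around the current p and solve the resulting
    least-squares problem for the update dp."""
    xs = np.arange(len(template), dtype=float)
    warped = np.interp(xs + p, xs, image)     # image resampled at x + p
    grad = np.gradient(warped)                # Jacobian w.r.t. translation
    residual = template - warped
    dp = grad @ residual / (grad @ grad)      # closed-form 1D least squares
    return p + dp
```

Iterating this step on smooth signals converges to the true displacement; the Conditional LK formulation replaces the hand-derived Jacobian with learned regression parameters.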
Ph.D. Dissertation
Learning 3D Registration and Reconstruction from the Visual World
Chen-Hsuan Lin
Carnegie Mellon University, 2021
thesis • dissertation talk (show) • BibTex (show)
@phdthesis{lin2021learning,
title={Learning 3D Registration and Reconstruction from the Visual World},
author={Lin, Chen-Hsuan},
year={2021},
month={June},
school={The Robotics Institute, Carnegie Mellon University},
address={Pittsburgh, PA},
number={CMU-RI-TR-21-13},
}
Experiences
NVIDIA Research, 2021 – present
Research Scientist
Research in dense 3D reconstruction, self-supervised learning, and neural rendering.
Carnegie Mellon University, 2014 – 2021
Graduate Research Assistant (with Simon Lucey)
Research in geometric image registration, dense 3D reconstruction, and self-supervised learning.
Adobe Research, 2017
Research Intern (with Eli Shechtman, Oliver Wang, and Ersin Yumer)
Learning geometric corrections of composited objects in images driven by appearance realism.
National Taiwan University, 2011 – 2013
Undergraduate Research Assistant (with Homer H. Chen)
Designing rate-distortion optimization for video compression based on perceptual quality metrics.
Teaching
Visual Learning and Recognition (CMU 16-824), Spring 2019
Teaching Assistant / Graduate Student Instructor (with Abhinav Gupta)
(Lectures: 3D Vision & 3D Reasoning, Semantic Segmentation & Pixel Labeling)
Designing Computer Vision Apps (CMU 16-423), Fall 2015
Teaching Assistant (with Simon Lucey)
Academic Projects
Towards a More Curious Agent
CMU 10-703 Deep Reinforcement Learning & Control
paper
We model intrinsic rewards of agents with the causal distribution of visual observations for policy networks, solving navigation problems with very sparse rewards.
Disentangler Networks with Absolute and Relative Attributes
CMU 16-824 Visual Learning & Recognition
paper
A neural network that disentangles image embeddings into controllable attributes for image manipulation that can be learned from relative ranking supervision.
3D Facial Model Fitting from 2D Videos
CMU CI2CV Computer Vision Lab
video (show)
A 3D reconstruction system that recovers metric-scale faces from self-captured 2D videos, solving for sparse 3D facial landmarks followed by dense 3D mesh fitting.
Video Summarization via Convolutional Neural Networks
CMU 10-701 Machine Learning
report
We design a new objective for end-to-end learning of video summarization, which allows K-means clustering of input video frames in the latent space at test time.
Perceptual Rate-Distortion Optimization of Motion Estimation
NTU Multimedia Processing & Communications Lab
paper
An optimization framework for video coding to find the optimal tradeoff between the encoding bitrate and the decoding distortion using perceptual metrics.
Virtual Piano Keyboard System
NTU Digital Circuit Design Lab
presentation • demo
A sensor-based virtual instrument system using only a paper keyboard, built on real-time fingertip and keyboard pattern recognition on raw CCD sensor data.
© designed by Chen-Hsuan Lin.