Invited speakers
We are excited to bring you world-class speakers to give keynotes at the conference.
Building and using prior shape models for 3D tracking and SLAM
Date: Tue. 9 (morning)
Ian Reid
The University of Adelaide, Australia
Ian Reid is a Professor of Computer Science and an ARC Australian Laureate Fellow at the University of Adelaide. Between 2000 and 2012 he was a Professor of Engineering Science at the University of Oxford.
He received a BSc in Computer Science and Mathematics with first class honours from the University of Western Australia in 1987 and was awarded a Rhodes Scholarship in 1988 to study at the University of Oxford, where he obtained a D.Phil. in 1992. Between then and 2000, when he was appointed to a Lectureship in Oxford, he held various Research Fellowship posts, including an EPSRC Advanced Research Fellowship. His research interests include active vision, visual tracking, SLAM, human motion capture and intelligent visual surveillance, with an emphasis on real-time implementations whenever possible. He has published 150 papers and attracted more than 9000 citations, with prize-winning papers at BMVC '05, '09, '10 and CVPR '08. He serves on the program committees of various national and international conferences, including as Program Chair for the Asian Conference on Computer Vision 2014. He is also on the editorial boards of IEEE T-PAMI and Computer Vision and Image Understanding.
Abstract
Over a number of years the authors have developed various methods for real-time tracking of objects or of mobile cameras, both of which are essential precursors for many applications, not least augmented reality tasks. Though many algorithms for visual tracking are content with simply reporting a 2D x-y position in the image, a richer set of applications becomes available when the full pose of the camera, or the six degrees of freedom of a 3D object, is tracked.
Our formulations rely on probabilistic simultaneous segmentation and tracking, typically using region-based methods. The genesis of this line of work was [1], in which we developed a generative probabilistic model of image formation into foreground and background regions. This model leads to an elegant, robust and fast system for segmentation and tracking of 2D deformable objects within a level-set framework, by considering the pixel-wise posterior (PWP) foreground/background membership of the image data and is the inspiration for much of our more recent work in 3D tracking and model building.
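For readers unfamiliar with [1], the pixel-wise posterior model can be sketched roughly as follows (notation simplified here; this is a paraphrase for context rather than the exact formulation in the paper). Each pixel value $y_i$ at location $x_i$ is assigned posterior foreground/background memberships using appearance models $M_f$, $M_b$:
$$P_f(y_i) = \frac{P(y_i \mid M_f)}{\eta_f\,P(y_i \mid M_f) + \eta_b\,P(y_i \mid M_b)}, \qquad P_b(y_i) = \frac{P(y_i \mid M_b)}{\eta_f\,P(y_i \mid M_f) + \eta_b\,P(y_i \mid M_b)},$$
where $\eta_f$ and $\eta_b$ are the foreground and background area fractions. The posterior over the level-set embedding $\Phi$ then factorizes over pixels,
$$P(\Phi \mid \Omega) \;\propto\; \prod_i \Big( H_\epsilon\big(\Phi(x_i)\big)\,P_f(y_i) + \big(1 - H_\epsilon(\Phi(x_i))\big)\,P_b(y_i) \Big),$$
with $H_\epsilon$ a smoothed Heaviside function; segmentation maximizes this posterior over $\Phi$, and tracking over the pose parameters that warp $\Phi$.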
[2] extended the PWP formulation to the case of tracking the full six degrees of freedom of the pose of a known 3D object, and [3] showed how fusion of cheap and simple MEMS accelerometers can result in a robust 3D hand tracker for use in an augmented reality application. Nevertheless, a limitation of this 3D work is the requirement for the tracked target to have a fixed, known geometry. We addressed this in [4], [5], [6], showing how a Gaussian Process Latent Variable Model can effectively learn a shape space for 3D shapes that can then be fit to silhouette data at run time for combined segmentation and tracking. [7] further extended this to show how such models could be incorporated into densely estimated maps in a visual SLAM framework.
More recently we have explored the idea of using the silhouette to estimate the camera pose and the shape of a stationary target. [8] demonstrated this idea on a mobile phone, using the phone's internal MEMS sensing to give the rotational information of the camera, with the silhouette providing visual cues to resolve the shape and translational pose parameters simultaneously. A similar framework is possible in a 3D tracking scenario for tracking objects using RGB-D data; [9], [10] show how models can be built and tracked from a moving RGB-D sensor.
The work was funded by EU Framework 7 project REWIRE, UK EPSRC project EP/H050795/1, and by the Australian Research Council through the Centre of Excellence for Robotic Vision CE140100016, and Laureate Fellowship FL130100102 to IR.
- [1] C. Bibby and I. Reid, "Robust real-time visual tracking using pixel-wise posteriors," in Proceedings of the European Conference on Computer Vision, 2008.
- [2] V. A. Prisacariu and I. D. Reid, "PWP3D: Real-time segmentation and tracking of 3D objects," Int'l J. of Computer Vision, 2012.
- [3] V. A. Prisacariu and I. D. Reid, "Robust 3D hand tracking for human computer interaction," in IEEE Int'l Conference on Automatic Face and Gesture Recognition, March 2011, pp. 368-375.
- [4] V. A. Prisacariu and I. D. Reid, "Nonlinear shape manifolds as shape priors in level set segmentation and tracking," in IEEE Computer Vision and Pattern Recognition, June 2011, pp. 2185-2192.
- [5] V. A. Prisacariu and I. D. Reid, "Shared shape spaces," in IEEE Int'l Conf. on Computer Vision, Nov. 2011, pp. 2587-2594.
- [6] V. A. Prisacariu, A. V. Segal, and I. D. Reid, "Simultaneous monocular 2D segmentation, 3D pose recovery and 3D reconstruction," in Asian Conf. on Computer Vision, 2012.
- [7] A. Dame, V. A. Prisacariu, C. Y. Ren, and I. D. Reid, "Dense reconstruction using 3D object shape priors," in IEEE Computer Vision and Pattern Recognition, June 2013.
- [8] V. A. Prisacariu, O. Kahler, D. W. Murray, and I. D. Reid, "Simultaneous 3D tracking and reconstruction on a mobile phone," in Int'l Symp. on Mixed and Augmented Reality, Oct. 2013.
- [9] C. Y. Ren, V. A. Prisacariu, D. W. Murray, and I. D. Reid, "STAR3D: Simultaneous tracking and reconstruction of 3D objects using RGB-D data," in IEEE Int'l Conf. on Computer Vision, Dec. 2013.
- [10] C. Y. Ren, V. A. Prisacariu, O. Kahler, D. W. Murray, and I. D. Reid, "3D tracking of multiple objects with identical appearance using RGB-D input," in Int'l Conf. on 3D Vision (3DV), Dec. 2014.
Toward geometric foundations for computer vision
Date: Tue. 9 (afternoon)
Jean Ponce
ENS, France
Abstract
I will argue that projective geometry should not be viewed merely as an analytical device for linearizing calculations (its main role in structure from motion), but as the proper framework for studying the relation between shape and its perspective projections.
I will first illustrate this argument with a classical problem from multi-view geometry: When do the visual rays associated with triplets of point correspondences converge, that is, intersect in a common point? Classical models of trinocular geometry based on the fundamental matrix and the trifocal tensor only provide partial answers to this fundamental question. I will use elementary tools from projective line geometry to provide necessary and sufficient conditions for convergence in terms of transversals to triplets of visual rays, and derive a novel and minimal parameterization of trinocular geometry.
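To give a flavour of the line-geometric tools involved (a standard fact, stated here only for context, not the transversal-based conditions or the parameterization derived in the talk): writing a visual ray in Plücker coordinates as $L = (d, m)$, with direction $d$ and moment $m = p \times d$ for any point $p$ on the ray, two rays are coplanar (intersect or are parallel) exactly when their reciprocal product vanishes,
$$ (L_1 \mid L_2) \;=\; d_1 \cdot m_2 + d_2 \cdot m_1 \;=\; 0, $$
and three visual rays converge to a common point when they satisfy these pairwise incidence constraints without all lying in a single plane.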
I will then switch from points and lines to curved surfaces, and show that classical properties of the outlines of solid shapes bounded by smooth surfaces can be established in a purely projective setting. In particular, I will give new synthetic proofs of Koenderink's famous theorem on convexities and concavities of the image contour, and of the fact that the rim turns in the same direction as the viewpoint in the tangent plane at a convex point, and in the opposite direction at a hyperbolic point.
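As background, the orthographic form of Koenderink's result is usually written (a standard statement, included here only for context) as
$$ K \;=\; \kappa_c\,\kappa_r, $$
where $K$ is the Gaussian curvature of the surface at a rim point, $\kappa_c$ the curvature of the apparent contour in the image, and $\kappa_r$ the normal curvature of the surface along the viewing direction; with the usual orientation conventions $\kappa_r > 0$ at visible rim points, so convexities of the contour correspond to elliptic surface points and concavities to hyperbolic ones.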
I will conclude by a brief discussion of open issues and future work.
Capturing and Editing the Real World in Motion
Date: Wed. 10 (morning)
Christian Theobalt
Max-Planck-Institute for Informatics, Saarbruecken, Germany
Christian Theobalt is a Professor of Computer Science and the head of the research group "Graphics, Vision, & Video" at the Max-Planck-Institute for Informatics, Saarbruecken, Germany. From 2007 until 2009 he was a Visiting Assistant Professor in the Department of Computer Science at Stanford University. He received his MSc degree in Artificial Intelligence from the University of Edinburgh, Scotland, and his Diplom (MS) degree in Computer Science from Saarland University, in 2000 and 2001 respectively. In 2005, he received his PhD (Dr.-Ing.) from Saarland University and the Max-Planck-Institute for Informatics.
Most of his research deals with algorithmic problems that lie on the boundary between the fields of Computer Vision and Computer Graphics, such as dynamic 3D scene reconstruction and marker-less motion capture, computer animation, appearance and reflectance modelling, machine learning for graphics and vision, new sensors for 3D acquisition, advanced video processing, as well as image- and physically-based rendering.
For his work, he received several awards, including the Otto Hahn Medal of the Max-Planck Society in 2007, the EUROGRAPHICS Young Researcher Award in 2009, and the German Pattern Recognition Award 2012. Further, in 2013 he was awarded an ERC Starting Grant by the European Union. He is a Principal Investigator and a member of the Steering Committee of the Intel Visual Computing Institute in Saarbruecken. He is also a co-founder of a spin-off company from his group - www.thecaptury.com - that is commercializing a new generation of marker-less motion and performance capture solutions.
Abstract
Even though many challenges remain unsolved, in recent years computer graphics algorithms to render photo-realistic imagery have seen tremendous progress. An important prerequisite for high-quality renderings is the availability of good models of the scenes to be rendered, namely models of shape, motion and appearance. Unfortunately, the technology to create such models has not kept pace with the technology to render the imagery. In fact, we observe a content creation bottleneck, as it often takes man-months of tedious manual work by animation artists to craft models of moving virtual scenes.
To overcome this limitation, the graphics and vision communities have been developing techniques to capture 4D models of dynamic scenes from real-world examples, for instance from footage of real-world scenes recorded with cameras or other sensors. One example is performance capture methods that measure detailed dynamic surface models, for example of actors or an actor's face, from multi-view video and without markers in the scene. Even though such 4D capture methods have made big strides, they are still at an early stage. Their application is limited to scenes of moderate complexity in controlled environments, reconstructed detail is limited, and captured content cannot be easily modified, to name only a few restrictions.
In this talk, I will elaborate on some ideas on how to go beyond this limited scope of 4D reconstruction, and show some results from our recent work. For instance, I will show how we can capture more complex scenes with many objects or subjects in close interaction, and how we can capture higher shape detail as well as material parameters of scenes. The talk will also show how one can effectively reconstruct very challenging scenes at a smaller scale, such as hand motion. Further, I will discuss how we can capitalize on more sophisticated light transport models to enable high-quality reconstruction in much more uncontrolled scenes, eventually also outdoors, with only a few cameras, or just a single one. Ideas on how to perform these reconstructions in real time will also be presented.
Per-pixel representation for accurate geometry recovery
Date: Wed. 10 (afternoon)
In So Kweon
KAIST, Korea
Abstract
Computing extremely fine-detailed shape information of a scene is becoming increasingly important for many applications, such as 3D printing and digital e-heritage. In this talk, we review our recent work on the recovery of extremely fine-detailed geometry, in which we fuse shading cues or photometric stereo cues with given rough shape information. To obtain a high-quality full 3D model by integrating multiple rough 3D shapes from a conventional structure-from-motion approach, we present a per-pixel representation for the planar parameterization of a 3D mesh, in which the geometry is represented by a base mesh and its 2D displacement map. We demonstrate that our per-pixel planar representation is very effective for fine-detailed geometry recovery, since the displacement map can convey millions of photometric surface normal entries very efficiently. We display our high-quality 3D results and discuss their accuracy in many practical scenarios.
In the case of Kinect depth map refinement, we exploit the shape-from-shading technique with a per-pixel local lighting representation. Our per-pixel local lighting model is more robust than the conventional global parametric lighting model when interreflections and shadows are present in the scene. We also present high-quality 3D depth maps based on the rough depth map given by a Kinect sensor.
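As a rough illustration of the base-mesh-plus-displacement-map idea described above (a minimal sketch with hypothetical function and array names, assuming NumPy; it is not the authors' implementation):

import numpy as np

def displace_surface(base_points, base_normals, displacement_map):
    # base_points      : (H, W, 3) base-mesh positions sampled over the 2D parameter domain
    # base_normals     : (H, W, 3) unit normals of the base mesh at those samples
    # displacement_map : (H, W)    scalar offsets, e.g. integrated from photometric normals
    # Each sample is moved along its base normal by the stored per-pixel displacement.
    return base_points + displacement_map[..., None] * base_normals

# Tiny synthetic example: a flat base surface perturbed by a random displacement map.
H, W = 480, 640
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
base_points = np.stack([xs, ys, np.zeros_like(xs)], axis=-1).astype(float)
base_normals = np.broadcast_to(np.array([0.0, 0.0, 1.0]), (H, W, 3))
displacement_map = 0.01 * np.random.randn(H, W)
refined = displace_surface(base_points, base_normals, displacement_map)
print(refined.shape)  # (480, 640, 3)

The point of such a representation is that the heavy per-pixel detail lives in the 2D displacement (or normal) map, while the base mesh itself stays coarse.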
Democratizing 3D human capture: getting hairy!
Date: Thu. 11 (morning)
Hao Li
University of Southern California, USA
Hao Li has been an assistant professor of Computer Science at USC since 2013 and works at the intersection of computer graphics and vision. His algorithms for dynamic shape reconstruction, non-rigid registration, and human digitization are widely deployed in industry, ranging from leading VFX studios to medical imaging companies. As a research lead at Industrial Light & Magic, he developed the next generation of real-time facial performance capture technologies for virtual production and visual effects. With Artec Group, he also created a 3D scanning software called shapify.me, which allows anyone to create their own 3D printed figurine from home using a Kinect. Hao also spent a year as a postdoc at Columbia and Princeton Universities in 2011 after receiving his PhD from ETH Zurich in 2010. He was a visiting professor at Weta Digital in 2014 and a visiting researcher at EPFL in 2010, Industrial Light & Magic (Lucasfilm) in 2009, Stanford University in 2008, the National University of Singapore in 2006, and ENSIMAG in 2003. He was named one of the world's top 35 innovators under 35 by MIT Technology Review in 2013.
Abstract
The age of social media and immersive technologies has created a growing need for processing detailed visual representations of ourselves. With recent advancements in graphics, we can now generate highly realistic digital characters for games, movies, and virtual reality. However, creating compelling digital content is still associated with a complex and manual workflow. While cutting-edge computer vision algorithms can detect and recognize humans reliably, obtaining functional digital models and their animations automatically still remains beyond reach. Such models are not only visually pleasing but would also bring semantic structure into the captured data, enabling new possibilities such as intuitive data manipulation and machine perception.
With the democratization of 3D sensors, many difficult vision problems can be turned into geometric ones, where effective data-driven solutions exist. My research aims at pushing the boundaries of data-driven digitization of humans and developing frameworks that are accessible to anyone. Such systems should be fully unobtrusive and operate in fully unconstrained environments. With these goals in mind, I will showcase several highlights of our current research efforts in dynamic shape reconstruction, human body scanning, facial capture, and the digitization of human hair. By the end of this decade, our homes will be equipped with 3D sensors that digitally monitor our actions, habits, and health. These advances will help machines understand our appearances and movements, revolutionizing the way we interact with computers and enabling new forms of live communication through compelling virtual avatars.