Invited speakers

We are excited to bring you world-class speakers to give keynotes at the conference.

Ian Reid

Building and using prior shape models for 3D tracking and SLAM

Date: Tue. 9 (morning)

Ian Reid

The University of Adelaide, Australia

Abstract

Over a number of years the authors have developed various methods for real-time tracking of objects or of mobile cameras, both of which are essential precursors for many applications, not least augmented reality. Though many algorithms for visual tracking are content with simply reporting a 2D x-y position in the image, a richer set of applications becomes available when the full pose of the camera, or the six degrees of freedom of a 3D object, is tracked.

Our formulations rely on probabilistic simultaneous segmentation and tracking, typically using region-based methods. The genesis of this line of work was [1], in which we developed a generative probabilistic model of image formation into foreground and background regions. This model leads to an elegant, robust and fast system for segmentation and tracking of 2D deformable objects within a level-set framework, by considering the pixel-wise posterior (PWP) foreground/background membership of the image data and is the inspiration for much of our more recent work in 3D tracking and model building.
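
As a rough illustration of the pixel-wise posterior idea, the following is a minimal sketch, not the authors' implementation: the colour-histogram appearance models, bin counts and function names are assumptions made purely for the example.

    import numpy as np

    def colour_histogram(pixels_rgb, bins=16):
        # Normalised joint RGB histogram serving as an appearance model P(y | model).
        hist, _ = np.histogramdd(pixels_rgb, bins=(bins,) * 3, range=((0, 256),) * 3)
        return hist / max(hist.sum(), 1e-12)

    def pixelwise_posterior(image_rgb, fg_hist, bg_hist, prior_fg=0.5, bins=16):
        # P(foreground | colour) at every pixel, via Bayes' rule on the two histograms.
        idx = np.clip(image_rgb.astype(int) // (256 // bins), 0, bins - 1)
        p_y_fg = fg_hist[idx[..., 0], idx[..., 1], idx[..., 2]]
        p_y_bg = bg_hist[idx[..., 0], idx[..., 1], idx[..., 2]]
        num = prior_fg * p_y_fg
        return num / (num + (1.0 - prior_fg) * p_y_bg + 1e-12)

    # Usage sketch: seed the appearance models from an initial segmentation mask,
    # then drive the level-set / pose update with the posterior map each frame.
    # fg_hist = colour_histogram(image[mask], bins=16)
    # bg_hist = colour_histogram(image[~mask], bins=16)
    # posterior = pixelwise_posterior(image, fg_hist, bg_hist)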

[2] extended the PWP formulation to the case of tracking the full six degrees of freedom of the pose of a known 3D object, and [3] showed how fusion of cheap and simple MEMS accelerometers can result in a robust 3D hand tracker for use in an augmented reality application. Nevertheless, a limitation of this 3D work is the requirement for the tracked target to have a fixed, known geometry. We addressed this in [4], [5], [6], showing how a Gaussian Process Latent Variable Model can effectively learn a shape space for 3D shapes that can then be fit to silhouette data at run time for combined segmentation and tracking. [7] further extended this to show how such models could be incorporated into densely estimated maps in a visual SLAM framework.
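
To make the notion of a learned shape space concrete, here is a deliberately simplified stand-in that uses linear PCA over level-set embeddings rather than the nonlinear GP-LVM of [4]-[6]; all names and array conventions below are illustrative assumptions.

    import numpy as np

    def learn_shape_space(sdf_stack, latent_dim=2):
        # sdf_stack: (num_shapes, H*W) flattened signed-distance embeddings of
        # training shapes. Returns the mean shape and the principal directions
        # of shape variation.
        mean = sdf_stack.mean(axis=0)
        _, _, vt = np.linalg.svd(sdf_stack - mean, full_matrices=False)
        return mean, vt[:latent_dim]

    def decode_shape(latent, mean, basis, shape_hw):
        # Map a low-dimensional latent code back to a signed-distance map;
        # its zero level set is the generated contour / silhouette.
        return (mean + latent @ basis).reshape(shape_hw)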

More recently we have explored the idea of using the silhouette to estimate the camera pose and the shape of a stationary target. [8] demonstrated this idea on a mobile phone, using the phone's internal MEMS sensing to give the rotational information of the camera, and with the silhouette providing visual cues to resolve the shape and translational pose parameters simultaneously. A similar framework is possible in a 3D tracking scenario for tracking objects using RGB-D data; [9], [10] show how models can be built and tracked from a moving RGB-D sensor.

The work was funded by EU Framework 7 project REWIRE, UK EPSRC project EP/H050795/1, and by the Australian Research Council through the Centre of Excellence for Robotic Vision CE140100016, and Laureate Fellowship FL130100102 to IR.

  1. C. Bibby and I. Reid, "Robust real-time visual tracking using pixel-wise posteriors," in Proceedings of European Conference on Computer Vision, 2008.
  2. V. Prisacariu and I. Reid, "PWP3D: Real-time segmentation and tracking of 3D objects," Int'l J. of Computer Vision, 2012.
  3. V. A. Prisacariu and I. D. Reid, "Robust 3D hand tracking for human computer interaction," in IEEE Int'l Conference on Automatic Face and Gesture Recognition, March 2011, pp. 368-375.
  4. V. A. Prisacariu and I. D. Reid, "Nonlinear shape manifolds as shape priors in level set segmentation and tracking," in IEEE Computer Vision and Pattern Recognition, June 2011, pp. 2185-2192.
  5. V. A. Prisacariu and I. D. Reid, "Shared shape spaces," in IEEE Int'l Conf. on Computer Vision, November 2011, pp. 2587-2594.
  6. V. A. Prisacariu, A. V. Segal, and I. D. Reid, "Simultaneous monocular 2D segmentation, 3D pose recovery and 3D reconstruction," in Asian Conf. on Computer Vision, 2012.
  7. A. Dame, V. A. Prisacariu, C. Y. Ren, and I. D. Reid, "Dense reconstruction using 3D object shape priors," in IEEE Computer Vision and Pattern Recognition, June 2013.
  8. V. A. Prisacariu, O. Kahler, D. W. Murray, and I. D. Reid, "Simultaneous 3D tracking and reconstruction on a mobile phone," in Int'l Symp. on Mixed and Augmented Reality, October 2013.
  9. C. Y. Ren, V. A. Prisacariu, D. W. Murray, and I. D. Reid, "STAR3D: Simultaneous tracking and reconstruction of 3D objects using RGB-D data," in IEEE Int'l Conf. on Computer Vision, December 2013.
  10. C. Y. Ren, V. A. Prisacariu, O. Kahler, D. W. Murray, and I. D. Reid, "3D tracking of multiple objects with identical appearance using RGB-D input," in Int'l Conf. on 3D Vision (3DV), December 2014.

Jean Ponce

Toward geometric foundations for computer vision

Date: Tue. 9 (afternoon)

Jean Ponce

ENS, France

Abstract

I will argue that projective geometry should not be viewed merely as an analytical device for linearizing calculations (its main role in structure from motion), but as the proper framework for studying the relation between shape and its perspective projections.

I will first illustrate this argument with a classical problem from multi-view geometry: When do the visual rays associated with triplets of point correspondences converge, that is, intersect in a common point? Classical models of trinocular geometry based on the fundamental matrix and the trifocal tensor only provide partial answers to this fundamental question. I will use elementary tools from projective line geometry to provide necessary and sufficient conditions for convergence in terms of transversals to triplets of visual rays, and derive a novel and minimal parameterization of trinocular geometry.
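
The talk phrases convergence in terms of projective line geometry and transversals; purely as a numerical aside, convergence of three back-projected rays can also be checked in Euclidean terms, as in the following assumed helper (not the parameterization derived in the talk).

    import numpy as np

    def point_to_ray_distance(x, c, d):
        # Distance from point x to the line through camera centre c with direction d.
        d = d / np.linalg.norm(d)
        v = x - c
        return np.linalg.norm(v - np.dot(v, d) * d)

    def rays_converge(centers, directions, tol=1e-6):
        # centers, directions: (3, 3) arrays, one visual ray per row.
        # Finds the least-squares point closest to all three rays and declares
        # convergence when every residual distance is (numerically) zero.
        A, b = np.zeros((3, 3)), np.zeros(3)
        for c, d in zip(centers, directions):
            d = d / np.linalg.norm(d)
            P = np.eye(3) - np.outer(d, d)   # projector onto the plane normal to the ray
            A += P
            b += P @ c
        x = np.linalg.solve(A, b)            # ill-conditioned only if all rays are parallel
        residual = max(point_to_ray_distance(x, c, d) for c, d in zip(centers, directions))
        return residual < tol, x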

I will then switch from points and lines to curved surfaces, and show that classical properties of the outlines of solid shapes bounded by smooth surfaces can be established in a purely projective setting. In particular, I will give new synthetic proofs of Koenderink's famous theorem on convexities and concavities of the image contour, and of the fact that the rim turns in the same direction as the viewpoint in the tangent plane at a convex point, and in the opposite direction at a hyperbolic point.
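
For reference, Koenderink's theorem is commonly stated analytically as follows; this is the standard formulation from the occluding-contour literature, not the synthetic proof given in the talk.

    % At a generic visible rim point, the Gaussian curvature K of the surface
    % equals the product of the apparent curvature \kappa^{a} of the image
    % contour and the radial curvature \kappa^{r} of the surface along the
    % line of sight:
    \[
      K = \kappa^{a}\,\kappa^{r} .
    \]
    % Since \kappa^{r} > 0 there, sign(K) = sign(\kappa^{a}): the contour is
    % convex at projections of elliptic (convex) surface points and concave
    % at projections of hyperbolic (saddle-shaped) points.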

I will conclude with a brief discussion of open issues and future work.

Christian Theobalt

Capturing and Editing the Real World in Motion

Date: Wed. 10 (morning)

Christian Theobalt

Max-Planck-Institute for Informatics, Saarbruecken, Germany

Abstract

Even though many challenges remain unsolved, in recent years computer graphics algorithms to render photo-realistic imagery have seen tremendous progress. An important prerequisite for high-quality renderings is the availability of good models of the scenes to be rendered, namely models of shape, motion and appearance. Unfortunately, the technology to create such models has not kept pace with the technology to render the imagery. In fact, we observe a content creation bottleneck, as it often takes man-months of tedious manual work by animation artists to craft models of moving virtual scenes.

To overcome this limitation, the graphics and vision communities have been developing techniques to capture 4D models of dynamic scenes from real-world examples, for instance from footage of real-world scenes recorded with cameras or other sensors. One example is performance capture methods that measure detailed dynamic surface models, for example of actors or an actor's face, from multi-view video and without markers in the scene. Even though such 4D capture methods have made big strides, they are still at an early stage. Their application is limited to scenes of moderate complexity in controlled environments, reconstructed detail is limited, and captured content cannot be easily modified, to name only a few restrictions.

In this talk, I will elaborate on some ideas on how to go beyond this limited scope of 4D reconstruction, and show some results from our recent work. For instance, I will show how we can capture more complex scenes with many objects or subjects in close interaction, and how we can capture higher shape detail as well as material parameters of scenes. The talk will also show how one can effectively reconstruct very challenging scenes of a smaller scale, such as hand motion. Further on, I will discuss how we can capitalize on more sophisticated light transport models to enable high-quality reconstruction in much more uncontrolled scenes, eventually also outdoors, with only a few cameras, or just a single one. Ideas on how to perform these reconstructions in real time will also be presented.

In So Kweon

Per-pixel representation for accurate geometry recovery

Date: Wed. 10 (afternoon)

In So Kweon

KAIST, Korea

Abstract

Computing the extremely fine detailed shape information of a scene is becoming increasingly important for many applications, such as 3D printing and digital e-heritage. In this talk, we review our recent work on the recovery of extremely fine detailed geometry, in which we fuse shading cues or photometric stereo cues with given rough shape information. To obtain a high-quality full 3D model by integrating multiple rough 3D shapes from a conventional structure-from-motion approach, we present a per-pixel representation for the planar parameterization of a 3D mesh, in which the geometry is represented by a base mesh and its 2D displacement map. We demonstrate that our per-pixel planar representation is very effective for fine-detailed geometry recovery, since the displacement map can convey millions of photometric surface normal entries very efficiently. We show our high-quality 3D results and discuss their accuracy in many practical scenarios.

In the case of Kinect depth map refinement, we exploit the shape-from-shading technique with a per-pixel local lighting representation. Our per-pixel local lighting model is more robust than the conventional global parametric lighting model when interreflections and shadows are present in the scene. We also present high-quality 3D depth maps based on the rough depth map given by a Kinect sensor.
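
As a toy illustration of the "base mesh plus 2D displacement map" representation described above, here is a minimal sketch under assumed array conventions; it is not the speakers' pipeline, and the function name is made up for the example.

    import numpy as np

    def displace_surface(base_points, base_normals, displacement_map):
        # base_points, base_normals: (H, W, 3) base-mesh positions and normals
        # sampled at every pixel of the planar parameterization;
        # displacement_map: (H, W) scalar offsets (e.g. integrated from
        # photometric-stereo normals). Each pixel is pushed along its unit
        # base normal, so the fine detail lives entirely in one 2D image.
        n = base_normals / (np.linalg.norm(base_normals, axis=-1, keepdims=True) + 1e-12)
        return base_points + displacement_map[..., None] * n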

Hao Li

Democratizing 3D human capture: getting hairy!

Date: Thu. 11 (morning)

Hao Li

University of Southern California, USA

Abstract

The age of social media and immersive technologies has created a growing need for processing detailed visual representations of ourselves. With recent advancements in graphics, we can now generate highly realistic digital characters for games, movies, and virtual reality. However, creating compelling digital content is still associated with a complex and manual workflow. While cutting-edge computer vision algorithms can detect and recognize humans reliably, obtaining functional digital models and their animations automatically still remains beyond reach. Such models are not only visually pleasing but would also bring semantic structure into the captured data, enabling new possibilities such as intuitive data manipulation and machine perception. With the democratization of 3D sensors, many difficult vision problems can be turned into geometric ones, where effective data-driven solutions exist.

My research aims at pushing the boundaries of data-driven digitization of humans and developing frameworks that are accessible to anyone. Such systems should be fully unobtrusive and operate in fully unconstrained environments. With these goals in mind, I will showcase several highlights of our current research efforts in dynamic shape reconstruction, human body scanning, facial capture, and the digitization of human hair. By the end of this decade, our homes will be equipped with 3D sensors that digitally monitor our actions, habits, and health. These advances will help machines understand our appearances and movements, revolutionizing the way we interact with computers and enabling new forms of live communication through compelling virtual avatars.