Significant modern papers on facial animation:
E. Chuang and C. Bregler. (2002). Performance driven facial animation using blendshape interpolation. Technical Report CS-TR-2002-02, Stanford University.
Daniel Vlasic, Matthew Brand, Hanspeter Pfister, and Jovan Popovic. (2005). Face transfer with multilinear models. ACM Trans. Graph., 24, 3, 426–433.
2 comments:
Performance Driven...:
To begin, they say a "nearly markerless" capture is needed. I assume the markers they use are the three crosses on the face. How is this really much of an improvement beyond using fewer markers? They seem to emphasize it more than it deserves. In Section 3 they state, "To obtain the database, a handful of images from the video sequence are first labeled with facial contours." Is that labeling automated, or done by hand?
One thing I found severely lacking in their method is the "upper face", especially the eyebrows. If you look at the examples at the bottom, the 3D model mimics the eyes and mouth fairly well (other than the creepy, oversized irises). Where it fails is that the forehead and brow stay static. The two rightmost faces in Figure 8b differ considerably between the human and the 3D model. In general, the 3D model seems to me to always wear a slight look of surprise, because its eyebrows are angled upward, whereas the human actor's brows show real variability.
Is the big question mark in their equation a summation (sigma), a product (pi), or something else? Since blendshape interpolation is described as a weighted combination of key shapes, a summation seems most likely.
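If it is indeed a summation, the core of blendshape interpolation reduces to a weighted sum of key shapes, V = sum_i w_i B_i. A minimal sketch of that idea in numpy; the names `key_shapes` and `weights` are my own for illustration, not from the paper:

```python
import numpy as np

def blend(key_shapes, weights):
    """Weighted sum of key shapes: V = sum_i w_i * B_i.

    key_shapes: (K, N, 3) array of K key shapes with N vertices each.
    weights:    (K,) blending weights, typically constrained to sum to 1.
    """
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights, key_shapes, axes=1)  # -> (N, 3) mesh

# Toy usage: interpolate halfway between two key shapes.
B = np.random.rand(2, 100, 3)   # two key shapes, 100 vertices each
V = blend(B, [0.5, 0.5])        # blended face mesh
```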
Multilinear...:
These papers all seem to follow the same pattern: data, missing data, tracking, transfer. The modern papers (this one and the last, for example) are moving toward more sophisticated statistics. The last used PCA, and this paper bumps up to SVD (an N-mode SVD on a data tensor, to be precise). It also uses multilinear algebra, which I haven't really studied. But in the end, everything seems very similar to me. Maybe I'm missing something, but I read this as "let's do the usual pipeline with a different mathematical method!"
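For what it's worth, the multilinear machinery is less exotic than it sounds: the paper stacks face meshes into a data tensor (vertex coordinates x identities x expressions) and runs an ordinary SVD on each mode's unfolding. A rough numpy sketch of that idea, with made-up dimensions and truncation ranks for illustration:

```python
import numpy as np

def unfold(T, mode):
    """Flatten tensor T along `mode`: rows index that mode, columns the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def n_mode_product(T, M, mode):
    """Multiply tensor T by matrix M along `mode` (T x_mode M)."""
    out = np.tensordot(M, np.moveaxis(T, mode, 0), axes=(1, 0))
    return np.moveaxis(out, 0, mode)

def n_mode_svd(T, ranks):
    """Truncated N-mode SVD (Tucker-style): T ~ core x_0 U0 x_1 U1 x_2 U2."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])   # keep r leading singular vectors per mode
    core = T
    for mode, U in enumerate(factors):
        core = n_mode_product(core, U.T, mode)
    return core, factors

# Toy data tensor: 300 vertex coordinates x 10 identities x 5 expressions.
data = np.random.rand(300, 10, 5)
core, (U_vtx, U_id, U_expr) = n_mode_svd(data, ranks=(50, 8, 4))
```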
They choose key shapes from the source and ask a modeler to create a target model for each key shape. However, facial animators already know a good "basis set" for mouth shapes, so the traditional approach of creating target models for those known key shapes makes more sense. That wouldn't preclude doing automatic blendshape animation with feature tracking exactly as they do it now.
I think they used far too many control points to define the curves of the lips and eyes. Fewer control points would yield better results: each control point really represents a muscle action on the curve, and we don't have that many distinct muscles to actuate. Reducing the number of points would also help from the vision side, because only good corner points can be tracked accurately, not arbitrary points along a contour (see the sketch below).
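To illustrate the corner-point claim: standard trackers like Shi-Tomasi detection followed by pyramidal Lucas-Kanade only keep points with strong two-dimensional gradient structure, which is exactly why sparse corners track more reliably than arbitrary contour samples. A minimal OpenCV sketch of that pipeline; the file name and parameter values are placeholders, and this is not the paper's tracker:

```python
import cv2

cap = cv2.VideoCapture("face_video.avi")   # placeholder input video
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Shi-Tomasi: keep only points whose local gradient structure is corner-like.
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=30,
                              qualityLevel=0.05, minDistance=10)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade: track each corner into the next frame;
    # `status` flags points that could not be followed reliably.
    nxt, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    pts = nxt[status.ravel() == 1].reshape(-1, 1, 2)
    prev_gray = gray
```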
"Solutions that strictly minimize
the least square error of equation (3) often introduce large
positive and negative weights that counteract and
compensate for one another."
That's a good point. However, it would be better handled by solving for weights that also minimize the difference between the weights in sequential frames, i.e. adding a temporal smoothness penalty to the objective.
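Concretely, that suggestion amounts to regularized least squares: minimize ||Aw - b||^2 + lambda * ||w - w_prev||^2, which has the closed-form solution w = (A^T A + lambda I)^(-1) (A^T b + lambda * w_prev). A small numpy sketch, with lambda and the matrix shapes chosen arbitrarily for illustration:

```python
import numpy as np

def smooth_weights(A, b, w_prev, lam=1.0):
    """Least squares with temporal smoothness:
    argmin_w ||A w - b||^2 + lam * ||w - w_prev||^2
    """
    K = A.shape[1]
    lhs = A.T @ A + lam * np.eye(K)   # normal equations + ridge toward w_prev
    rhs = A.T @ b + lam * w_prev
    return np.linalg.solve(lhs, rhs)

# Toy usage: 60 tracked feature coordinates explained by 8 blend weights.
A = np.random.rand(60, 8)
b = np.random.rand(60)
w_prev = np.full(8, 1.0 / 8)          # weights from the previous frame
w = smooth_weights(A, b, w_prev, lam=0.5)
```

Larger lambda trades fitting error for frame-to-frame stability, which directly suppresses the counteracting positive/negative weights the authors describe.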