Thursday, March 4, 2021

Neural network trained to create talking heads

The developers have created a system that undergoes lengthy training on a large set of video data.

By training convolutional neural networks, Russian developers from Samsung and the Skolkovo Institute of Science and Technology (Skoltech) have animated photographs, portraits and paintings.

Synthesizing realistic avatars is known to be difficult for two reasons. First, the human head has high photometric, geometric and kinematic complexity: difficulties arise in modeling not only the face but also the mouth, hair and clothing. The second complicating factor is the acuity of the human visual system, which gives rise to the “uncanny valley” effect. According to this hypothesis, a robot or other artificial figure that makes even small mistakes in imitating a person provokes involuntary revulsion in human observers.

Creating a personalized talking-head model with artificial intelligence normally requires training on a large set of images of the person in question. In many applications, however, such models must be obtained from only a few images of a person, perhaps even from one. The developers have created a system that first undergoes lengthy training on a large set of video data and then generates a mask of the speaker's face. The mask captures the outline of the face and its basic facial expressions. The relationship between the resulting mask and the source video is stored as a vector, so the mask can be transferred to individual images.

During meta-learning, the neural network automates the process of selecting and configuring its components. Three networks were trained on a large database of video interviews with celebrities found on YouTube. The embedder network converted the masks, together with the characteristics of the person's face, into vectors. These vectors were used to initialize the parameters of the generator network. The generator network, in turn, produced video, which the discriminator network compared with the original in order to rate the realism of the result.
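The division of labor between the three networks can be illustrated with a toy sketch in Python. It is not the developers' code: the names `embedder`, `Generator` and `discriminator` are hypothetical, plain arithmetic on short lists stands in for the convolutional networks, and averaging the per-image vectors is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]


def embedder(frames: Sequence[Vector], masks: Sequence[Vector]) -> Vector:
    """Turn (frame, mask) pairs of one person into a single vector.

    Toy stand-in: the per-pair feature is frame minus mask, and the
    features are averaged over all pairs (averaging is an assumption).
    """
    size = len(frames[0])
    total = [0.0] * size
    for frame, mask in zip(frames, masks):
        for i in range(size):
            total[i] += frame[i] - mask[i]
    return [t / len(frames) for t in total]


@dataclass
class Generator:
    """Toy generator whose parameters are initialized from the embedder's vector."""

    weights: Vector

    def __call__(self, mask: Vector) -> Vector:
        # Combine a new mask with the person-specific parameters.
        return [m + w for m, w in zip(mask, self.weights)]


def discriminator(generated: Vector, original: Vector) -> float:
    """Realism score in (0, 1]; 1.0 means the output matches the original exactly."""
    error = sum((g - o) ** 2 for g, o in zip(generated, original)) / len(original)
    return 1.0 / (1.0 + error)


# Hypothetical data: each frame is its mask plus a fixed "identity" offset.
identity = [0.5, 0.25, -0.5]
masks = [[1.0, 2.0, 3.0], [2.0, 0.0, 1.0]]
frames = [[m + i for m, i in zip(mask, identity)] for mask in masks]

person_vector = embedder(frames, masks)       # embedder: pairs -> vector
generator = Generator(weights=person_vector)  # vector initializes the generator
new_mask = [4.0, 1.0, 2.0]                    # mask taken from a driving video
target = [m + i for m, i in zip(new_mask, identity)]
print(discriminator(generator(new_mask), target))
```

In this sketch the discriminator only scores a finished output. In adversarial training its score would also feed back into the generator, a step omitted here for brevity.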

The system was tested using video from a front-facing camera as the driving video and selfie photos as the images to which the motion was transferred. Thirty-two images were enough to obtain a high-quality “talking head”.

© 2019, paradox. All rights reserved.
