Deepfake: When seeing is no longer believing

Maddali Sowmya
9 min read · Nov 19, 2020

Deepfake is a combination of “deep learning” and “fake”. Deepfakes are hyper-realistic videos or images, digitally manipulated to depict people in scenarios that never actually happened. Though fake content is not a new concept, deepfakes carry far more leverage because they use Machine Learning (ML) and Artificial Intelligence (AI) techniques to manipulate visual and audio content. Though the outcome of a deepfake depends on the person using it, that does not change the fact that the cons outnumber the pros.

For instance, deepfakes could bring back the voice of a loved one we have been longing to hear. We could watch great personalities who are no longer alive deliver the famous speeches we have so often quoted, or go back in time to see and hear great scientists proving their landmark theories.

Deepfakes draw parallels to a time-travel movie, in which we can move back and forth in time to re-create an experience we desire. However, the malicious uses of deepfakes largely dominate the positive ones, drawing much flak from the community.

For instance, in March 2019, the CEO of a U.K.-based energy firm listened over the phone as his boss (the leader of the firm’s German parent company) ordered the transfer of €220,000 to a supplier in Hungary. News reports would later detail that the CEO recognised the “slight German accent and the melody” of his boss’s voice and followed the order to transfer the money within an hour. The caller tried several more times to urge a second round of cash transfers, but by then the U.K. executive had grown suspicious and made no further transfers.

The €220,000 initially transferred was tracked to Mexico and channelled to various accounts, and the energy firm reported the incident to its insurance company. The insurer speculates that the thieves used AI to create a deepfake of the German leader’s voice. Though later reports have questioned the lack of supporting evidence, the preliminary conclusion cannot be ruled out.

This example illustrates how technology can be used to hoodwink people into believing that something fake is real.

Purveyors of deepfakes have a history of targeting social media platforms, where conspiracies, rumours, and misinformation spread unchecked in no time. Analysis suggests that social media platforms are the perfect place to find training data for deepfake models, with both expected and unexpected outcomes.

Celebrity images, easily accessible on the internet, are widely used to create a variety of memes, which are also deepfakes, as illustrated in Fig: 1 and Fig: 2.

Fig: 1 — On the left, we have Simon Cowell (a judge of the famous reality show The X Factor), and on the right, his lookalike image generated by deepfake.
Fig: 2 — On the left, we have Jennifer Lawrence (renowned Hollywood actress), and on the right, her image generated by deepfake.

How do Deepfakes work?

A deep-learning system has the potential to create a very convincing counterfeit by studying images and videos of the target person from multiple angles and then mimicking behaviours and speech patterns.

Once the initial fake has been produced, a method known as a Generative Adversarial Network (GAN) makes it far more believable. The GAN seeks to detect flaws in the counterfeit, leading to improvements that address those flaws.

However, this process of detecting flaws does not end in one pass. Over multiple iterations, errors are detected and suitable improvements are made, until the output deepfake video or image is at its most convincing.

Basic Concept governing deepfake:

Let us consider a scenario in which we intend to transfer the face of person A onto person B in a video.

Firstly, we gather hundreds of images of both persons concerned. We then build an encoder that compresses these pictures into smaller files, using a deep convolutional neural network (CNN). A decoder then reconstructs the image from the compressed files. This autoencoder (the encoder and the decoder together) has over 1,000,000 parameters, yet that is still not enough capacity to memorise all the photographs. So the encoder must extract the key features needed to recreate the original input, as shown in Fig: 3.

To decode the features, we use separate decoders for person A and person B. We then train the encoder and both decoders using backpropagation, so that each input closely matches its output. One disadvantage is that the process is time-consuming (images are processed on the order of 10 million times) and takes about three days to get decent results, even with GPU support.

Fig : 3 — Illustration of Deepfake encoder and decoder
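The training scheme above can be sketched with a toy linear autoencoder in NumPy. Everything here is illustrative: a real deepfake autoencoder is a deep convolutional network with millions of parameters trained on face crops, not a pair of small matrices on random vectors.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 16))      # 200 toy "face crops" of 16 pixels each

W_enc = rng.normal(scale=0.1, size=(16, 4))  # encoder: 16 pixels -> 4 key features
W_dec = rng.normal(scale=0.1, size=(4, 16))  # decoder: 4 features -> 16 pixels

def loss():
    """Mean-squared error between each input and its reconstruction."""
    return np.mean((X @ W_enc @ W_dec - X) ** 2)

lr = 0.05
for step in range(1000):
    Z = X @ W_enc                   # encode
    err = Z @ W_dec - X             # reconstruction error (output vs input)
    # Backpropagation: gradients of the reconstruction error w.r.t. each weight
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(round(loss(), 2))   # reconstruction error, down from ~1.0 at initialisation
```

Because the 4-dimensional bottleneck cannot hold all 16 pixels, training forces the encoder to keep only the most informative features, which is exactly the compression the paragraph above describes.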

After training, we process the video frame by frame to swap one person’s face with another. Using face detection, we extract the features of person A’s face and feed them into the encoder. Rather than passing the encoded data to A’s decoder, we use B’s decoder to reconstruct the image. In essence, we draw person B using the features of A from the original video, as illustrated in Fig: 4. Finally, we merge the newly created face with the original image.

Fig: 4 — Illustration of the decoder of person B to reconstruct the whole picture

Intuitively, the encoder is detecting face angle, skin tone, facial expression, lighting, and other information which are vital to reconstruct person A. When we use the second decoder to reconstruct the image, we are drawing person B but with the context of A.
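Putting the pieces together, the swap itself is just “encode with the shared encoder, decode with the other person’s decoder”. A minimal sketch follows, with random weights standing in for the trained networks and flat vectors standing in for real face crops:

```python
import numpy as np

rng = np.random.default_rng(1)
PIXELS, FEATURES = 64, 8    # toy sizes; real models work on e.g. 64x64x3 crops

# One shared encoder and one decoder per person. In a real system these
# weights come from the training described above; here they are random.
W_enc   = rng.normal(scale=0.1, size=(PIXELS, FEATURES))
W_dec_a = rng.normal(scale=0.1, size=(FEATURES, PIXELS))
W_dec_b = rng.normal(scale=0.1, size=(FEATURES, PIXELS))

def swap_face(face_of_a):
    """Encode person A's face, then decode with B's decoder to draw B."""
    features = face_of_a @ W_enc     # face angle, expression, lighting, ...
    return features @ W_dec_b        # ... rendered as person B

frame_crop = rng.normal(size=(1, PIXELS))   # a face crop from one video frame
fake_face = swap_face(frame_crop)
print(fake_face.shape)                      # same shape as the input crop
```

Note that `W_dec_a` is only used during training; at swap time every frame of A is routed through B’s decoder, which is why B appears with A’s pose and expression.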

According to an MIT Technology Review report, a tool that enables deepfakes is “a perfect weapon for purveyors of fake news who want to influence everything from stock prices to elections.”

In older times, talented artists were entrusted with editing images into or out of a video. Today, AI tools achieve the same job with far greater efficiency. The significant advantages are that much time is saved, and all we need to provide these tools are images and videos of the target.

Martin Giles, San Francisco bureau chief of MIT Technology Review, suggests that “GANs didn’t create this problem, but they’ll make it worse.”

GAN

Generative Adversarial Networks (GANs) are an exciting recent innovation in machine learning. They are generative models, which create new data instances that resemble the original training data.

For instance, GANs can create images that look just like photographs of human faces, even though the faces do not belong to any real person.

Fig: 5 — The leftmost image is Jim Parsons (actor famously recognised as Sheldon Cooper in the TV show The Big Bang Theory); the other images are generated by GANs.

The principle governing GANs is a system that pits two neural networks, a generator and a discriminator, against one another to improve the quality of the results.
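A toy version of that push and pull can be written in a few lines of NumPy. Note the heavy simplifications: the “discriminator” is reduced to comparing batch averages, and the “generator” learns only a single mean parameter. This is only meant to show the adversarial feedback loop, not a real GAN.

```python
import numpy as np

rng = np.random.default_rng(2)

def real_batch(n=64):
    """Real 'data': samples from N(4, 1) that the generator must learn to mimic."""
    return rng.normal(loc=4.0, scale=1.0, size=n)

mu = 0.0   # the generator's single learnable parameter

def generate(n=64):
    return mu + rng.normal(size=n)   # generator output: shifted noise

def critic_gap(fake):
    """Stand-in for the discriminator: how do the fakes differ from real data?
    (A real GAN trains a second neural network to answer this.)"""
    return fake.mean() - real_batch().mean()

# Adversarial loop: each round, the critic reports a remaining flaw and the
# generator nudges its parameter to shrink that flaw.
for step in range(200):
    mu -= 0.05 * np.sign(critic_gap(generate()))

print(round(mu, 1))   # mu has moved close to the real mean of 4.0
```

The key point survives the simplification: neither side is given the answer directly; the generator improves only by repeatedly reducing whatever difference the critic can still detect.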

How to detect deepfake?

Though there is no step-by-step instruction manual for detecting deepfakes, some key aspects help in deciphering whether what we are looking at is real, as described below:

· Face — Is someone blinking too much or too little? Do the eyebrows fit the face? Is the hair in the wrong spot? Does the skin look airbrushed, or, on the contrary, too wrinkled?

· Audio — Does someone’s voice not match their appearance? Does the voice of a person you know well suddenly sound very different from usual?

· Lighting — What sort of reflection are a person’s glasses giving under a light? (Deepfakes often fail to represent the natural optics of lighting well.)
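Cues like these can even be turned into crude automatic checks. Below is a hypothetical blink-rate heuristic: the eye-openness signal, the thresholds, and the “normal” blink range are all invented for illustration (real detectors are trained models, and would typically use a facial-landmark detector to measure eye openness per frame).

```python
def blink_rate_suspicious(eye_openness, fps=30, low=2, high=40):
    """Flag a clip whose blink rate falls outside a typical human range.

    eye_openness: one value per frame, ~1.0 for open eyes, ~0.0 when closed.
    Humans blink roughly 15-20 times per minute; early deepfakes often
    blinked far too rarely. All thresholds here are illustrative only.
    """
    blinks = 0
    was_closed = False
    for v in eye_openness:
        closed = v < 0.2
        if closed and not was_closed:
            blinks += 1               # count each open -> closed transition
        was_closed = closed
    minutes = len(eye_openness) / fps / 60
    rate = blinks / minutes if minutes else 0
    return rate < low or rate > high

# A 60-second clip (1800 frames at 30 fps) with a single blink:
signal = [1.0] * 1800
signal[900:905] = [0.0] * 5
print(blink_rate_suspicious(signal))   # True - one blink per minute is suspicious
```

A heuristic like this is easily fooled either way, which is why the manual checks above remain worth applying alongside any automated tool.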

What are the Benefits of deepfake?

Deepfake technology finds positive applications in many industries, including movie production, educational media, games and entertainment, social media, healthcare, material science, and e-commerce.

In the film industry, deepfakes primarily help update footage without reshoots. Moviemakers are empowered to create new movies starring long-dead actors, using computer graphics and advanced face editing in post-production. Deepfakes can also upgrade amateur videos to professional quality, and allow automatic, realistic voice dubbing of movies into any language.

In games and entertainment, deepfakes enable multiplayer gaming and virtual chat worlds with increased telepresence, and give gamers a greater sense of reality through natural-sounding digital doubles of individuals. Further, the technology can digitally recreate an amputee’s limb, and allow transgender people to see themselves in the gender of their preference. Deepfake technology can also help people with Alzheimer’s interact with a younger face they may remember, so that they do not feel out of place.

When it comes to business, deepfakes can transform e-commerce and advertising in significant ways. For example, brands could show fashion outfits on models with a variety of skin tones and heights, giving viewers the feel of supermodels without hiring them. The technology has also enabled virtual fitting, letting potential customers preview how an outfit would look on them before purchasing, a use case akin to dress-up games. In addition, AI equips companies with unique artificial voices that help differentiate their products and make branding distinctions easier.

How Dangerous are deepfakes?

According to a report from University College London, “Deepfakes are the most dangerous form of crime through AI.”

Deepfake content is dangerous for several reasons. Firstly, it is challenging for any detection agency to trace. While deepfake detectors require training on hundreds of videos and must succeed in every instance, malicious individuals only have to succeed once to cause damage.

A second reason is the sort of crimes deepfakes might be used for, such as discrediting someone by impersonation. The long-term effect of such crimes could be a general distrust of audio and video evidence, which researchers argue would cause inherent societal harm.

Deepfakes make many people nervous. Marco Rubio, the Republican senator from Florida and a 2016 presidential candidate, called them the modern equivalent of nuclear weapons. He said: “In the old days, if you wanted to threaten us, you needed ten aircraft carriers, and nuclear weapons, and long-range missiles. Today, you just need access to our internet system, to our banking system, to our electrical grid and infrastructure, and increasingly, all you need is the ability to produce a very realistic fake video that would undermine our elections, that would throw our country into tremendous crisis internally and weaken us deeply.”

In response to Mr Rubio’s statement, Tim Hwang, director of the Ethics and Governance of AI Initiative at the MIT Media Lab, countered: “As dangerous as nuclear bombs? I don’t think so. I think that certainly they’re concerning and they raise a lot of questions, but I’m sceptical they change the game in the way that a lot of people are suggesting.”

Who and What is at risk from deepfake?

On the surface level, it might seem like politics and the entertainment industry are the focal areas for combating manipulated video. However, in reality, targets for manipulation are no longer limited to government leaders or famous personalities.

Quoting Francesca Panetta, XR creative director at the MIT Center for Advanced Virtuality: “It doesn’t need to be an official to be a deepfake. It might even be your friend. It could be you that’s targeted.” This highlights the length and breadth of the risk involved.

Conclusion

At the outset, deepfake technology was exciting: an application of AI concepts to implement ideas once treated as impossible. However, these very ideas, turned into products, carry with them the risk of extreme trouble, drawing the attention of humanity as a whole.

In the end, when seeing is no longer believing, whom can we really trust? Can democracy and truth still survive? These are questions to ponder and deliberate on carefully.

This article was only possible thanks to the inspiration of many articles :

  1. What are deepfakes?: https://mitsloan.mit.edu/ideas-made-to-matter/deepfakes-explained
  2. Emergence of Deepfake technology: https://timreview.ca/article/1282
  3. Illustration of falsified videos via Deepfake: https://medium.com/deepfake/what-is-a-deepfake-9bcc72f3eb1e
  4. Learning process in Deepfake: https://medium.com/@jonathan_hui/how-deep-learning-fakes-videos-deepfakes-and-how-to-detect-it-c0b50fbf7cb9
  5. Deepfakes as the crime of the future: https://www.independent.co.uk/life-style/gadgets-and-tech/news/deepfakes-dangerous-crime-artificial-intelligence-a9655821.html

I would like to extend my gratitude to my professor, Anand Panduranga, who guided me through this article and helped me make it better.
