I used Google Veo to bring my selfies and photos to life – and things got hilariously weird

[Image: Tiernan Ray / Elyse Betters Picaro / ZDNET]

Google this week made available the latest iteration of its Veo video-generation tool to users of its Gemini artificial intelligence program who have a “Pro” or “Ultra” account.

Also: I used Google’s Flow AI to create my own videos with sound and dialogue – Here’s how it went

Veo has been available in preview for some time now. What’s new with the latest implementation is the ability to begin your video by uploading a still image to serve as the initial frame. (ZDNET’s Prakhar Khanna has reported his experience using the capability as a built-in feature of his Honor 400 phone, versus using it through the website as I did.)

How to use Veo to generate videos from photos

You give the system a prompt, press enter, and Veo creates an eight-second video, using your uploaded photo as the reference for the first frame. Veo adds sound, including music, footsteps, and other incidental effects.

Each video takes several minutes to generate.
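If you would rather script that same workflow than click through the Gemini website, Google's public google-genai Python SDK exposes a comparable image-to-video call. The sketch below is only an approximation of the SDK's documented pattern, not a description of what the Gemini app does internally; the model ID, the Image helper, and the polling loop are my assumptions and should be checked against the current documentation.

```python
# A minimal sketch of image-to-video generation with the google-genai Python SDK.
# Assumptions: the model ID, the Image fields, and the polling pattern follow the
# SDK's documented Veo examples at the time of writing and may change.
import time

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder; use your own key

# Load the still photo that will seed the first frame of the generated video.
with open("jogger-promenade.jpg", "rb") as f:
    photo = types.Image(image_bytes=f.read(), mime_type="image/jpeg")

# Kick off an asynchronous, long-running video-generation job.
operation = client.models.generate_videos(
    model="veo-2.0-generate-001",  # assumed model ID; check the current docs
    prompt="The jogger continues to run into the distance along the promenade.",
    image=photo,
)

# Generation takes several minutes, so poll until the job completes.
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

# Download and save the first generated clip.
video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("jogger-veo.mp4")
```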

Also: This interactive AI video generator feels like walking into a video game – how to try it

In my testing so far, I find Veo’s implementation both fascinating and a bit creepy.

My results with Veo’s photo-to-video feature

I tried several still images I had taken, including a selfie and some street photography. Seeing one's pictures come to life, if you will, is jarring. It is disconcerting how well it works and, as the photographer, how sharply the result contrasts with one's memory of the moment.

Also: This new AI video editor is an all-in-one production service for filmmakers – how to try it

On the positive side, the quality of the video is in keeping with the photographic image. The perspective of a scene is generally well maintained, and moving objects in the background are, in some cases, orchestrated to remain consistent.

1. Jogger running along the promenade

Here, for example, is a photo I took of a jogger on the East River promenade in Manhattan. I gave Veo the prompt, “Please make a video in which the jogger continues to run into the distance along the promenade.”

Below is the original still image followed by the Veo video.

[Image and video: jogger-promenade. Credit: Tiernan Ray for ZDNET]

The motion of the jogger is good, as is the camera's movement through space, as if from the photographer's point of view.

This is, to my mind, a substantial technical achievement on a very basic level. Remember that this is eight seconds of 720p video rendered at the standard film rate of 24 frames per second. That means Veo has to create, in a few minutes, 192 frames from the initial image. Given how little effort it took me as the user, it would be easy to overlook just how significant that is from a purely technical point of view. The power of all that computing in the cloud really shines in something like this.
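To put some rough numbers on it, here's a back-of-the-envelope calculation (my own figures, not anything published by Google) of what an eight-second, 720p, 24fps clip amounts to in frames and uncompressed pixel data:

```python
# Back-of-the-envelope math for an eight-second, 720p, 24fps clip.
# These are my own rough figures, not anything published by Google.
seconds = 8
fps = 24
width, height = 1280, 720   # 720p frame dimensions
bytes_per_pixel = 3         # 24-bit RGB, uncompressed

frames = seconds * fps                 # 192 frames
pixels = frames * width * height       # ~177 million pixels
raw_bytes = pixels * bytes_per_pixel   # ~531 MB of raw image data

print(f"{frames} frames, {pixels:,} pixels, ~{raw_bytes / 1e6:.0f} MB uncompressed")
```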

One also sees, however, the artifacts that crop up from Veo's frame-by-frame predictions, giving the result a rather eerie quality.

The jogger on the right, for one, doesn’t really look the same as the jogger in my photo, only vaguely similar (hair is different, stride is different).

Also: Forget Sora: Adobe launches ‘commercially safe’ AI video generator. How to try it

Another artifact is that, at the actual moment in time, the figure moving toward the camera on the left-hand side of the picture was strolling, not jogging. I think that’s clear in the image. But Veo rendered that person jogging as well.

Another artifact emerges on the FDR Drive in the upper left, where vehicles mysteriously vanish partway through their movement. That is a constant theme of the Veo videos: the program's inability to fully maintain continuity.

2. Woman walking past The Horseshoe Bar

A surprising achievement emerged when I submitted a photograph of a bar on 7th Street in the East Village, called 7B, or The Horseshoe Bar. I added the prompt, “Can you show the woman walking past the building?”

[Image and video: 7b. Credit: Tiernan Ray for ZDNET]

The resulting video shows good street perspective, but what's really surprising is that it managed to fill in the white sign bearing the horseshoe symbol above the door on the unseen side of the building. That suggests Veo was able to find, somewhere in its data, a completion of the bar, which is rather amazing.

Also: Midjourney’s new animation tool turns images into short videos – here’s how

However, the unseen buildings that Veo fills in as the video turns the corner are not the actual buildings on that street; Veo comes up with a reasonably decent substitute. Notice one strong artifact: Veo gave the walking individual a blue hat, apparently added erroneously because, in my photograph, the person is walking in front of a blue sign on the building.

3. Person in white boots gets up and off train

Some artifacts are more striking. In a second piece of street photography, I uploaded a picture of someone in white boots sitting in a subway car. I gave the prompt, “The person in the white boots gets up from their seat and gets off the train.” The result was quite striking, and a pretty good approximation of how this figure might move. The person doesn't, however, exit the train.

[Image and video: subway-white-boots. Credit: Tiernan Ray for ZDNET]

When I persisted with a second prompt, “That’s great, but one adjustment. Is it possible to show the doors of the train car opening and the person in the white boots actually walking out the doors to exit the train?”, Veo produced a second version.

This time, the individual at least is shown moving toward an exit, as doors are shown sliding open. However, several artifacts here fail a reality and consistency test. For one thing, no one exits a New York City subway car at the end of the car; passengers exit through the side doors, since that is where the platform is. Second, the paired sliding doors depicted at the end of the car do not exist in New York City subway cars; those end exits have a single sliding door, not two.

Also: You can produce video ads in seconds with Amazon’s new AI tool – here’s how

Third, it's clear in the original still image, based on the light and the details seen through the rear window of the train car, that this is not the last car in the line; there is another car behind it. Yet, when the doors open in the video, we see the platform and tracks, suggesting this car is now the last car in the line. Veo is unable here to infer the overall structure of the environment from the details it is given.

Last but not least, in a fourth inconsistency, we can see through the open doorway that the platform is directly beneath the train, so that the train appears to be riding over the platform rather than the tracks.

4. Thunder and lightning with rain

I submitted a rainy night picture taken on Lexington Avenue in Manhattan and asked for “A video of thunder and lightning and serious rain in this street scene.” The result is rather cartoonish, but it's certainly a fun moment, and Veo clearly understood the intent.

[Image and video: rainy-night. Credit: Tiernan Ray for ZDNET]

5. Dark bathroom selfie

Putting one’s likeness into Veo has its own special creepiness, or amusement, or both, depending on your sense of humor.

Also: The best AI image generators of 2025: Gemini, ChatGPT, Midjourney, and more

I first used a very dark bathroom selfie. I was impressed with the range of imaginative animation. My features, however, seem to morph drastically into someone else’s likeness, and I’m not sure whose. (I’ve been told I look like Thom Yorke of the band Radiohead sometimes.)

[Image and video: tr-bathroom-selfie. Credit: Tiernan Ray for ZDNET]

6. Professional headshot

In another instance, I used my ZDNET headshot and asked Veo, “Can you make a video of this man doing the cha-cha-cha?” I like the resulting movement and the accompanying music, and the very loud boot sounds are quite amusing.

[Image and video: tr-zdnet-headshot. Credit: Tiernan Ray for ZDNET]

However, the creepy part here is that, without further prompting, Veo left my face a rigid, expressionless mask, which doesn't make sense in a dance video. In fact, my head doesn't really move at all; it's fixed.

7. Las Vegas selfie

I uploaded yet another selfie, taken at the Caesars Palace casino and hotel in Las Vegas, and prompted, “Please make a video of this man in the leather jacket dancing tango with the statue of Venus that is in the background.” Veo did not succeed in making us dance, but the resulting floor show by my likeness is amusing, and so is the music. Notice that the sleeves of my leather jacket turn black, for some reason.

[Image and video: las-vegas-talent. Credit: Tiernan Ray for ZDNET]

8. A historical mashup with John C. Calhoun

On the hunch that manipulating historical figures might be disallowed, I tried creating a historical mashup to test the matter. I uploaded a picture of onetime US vice president John C. Calhoun from the US Library of Congress, and requested that Veo make a video of Calhoun dancing the cha-cha-cha.

[Image: john-c-calhoun-full-length. Credit: U.S. Library of Congress]

Veo started to make a video, then quit with the message, “I can’t generate that video. Try describing another idea. You can also get tips for how to write prompts and review our video policy guidelines. Learn more.”

9. Making Scarlett laugh

I then tried uploading a picture of actor/director Scarlett Johansson from her Wikipedia page, and requested “a video of this woman laughing.” Again it started and then quit with the same error message.

[Image: scarlett-johansson-8588. Credit: Harald Krichel]

10. Making myself laugh

I double-checked the matter with my own headshot, as a non-historical, non-famous person, and was able to get Veo to make a video of me laughing (albeit one that looks not at all like the original headshot).

[Image and video: tiernan-ray-headshot. Credit: Tiernan Ray for ZDNET]

That suggests that Veo may be built with safeguards against manipulation of historical or pop culture images, though I cannot be certain.

Should you try Google Veo?

The Veo service, in preview, is certainly not without glitches. 

After my first couple of successes, I repeatedly got a warning that I would have to wait before making more videos, as the service is rate-limited at the moment. There are complaints about this in the Gemini user forums, including from people who have been denied the service for over 24 hours, and a long explanation of the matter by a volunteer product “expert.” Basically, video is bandwidth-, compute-, and memory-intensive, so it's not surprising Google would have to limit usage at the outset.

The most direct solution is to upgrade to the higher level of Gemini, the “Ultra” plan, though this means going from $19.99 a month to $249 a month (discounted to $125 for the first three months). That's a steep price just to get around what seem like rather harsh limits.

Also: Is Google’s $250-per-month AI subscription plan worth it? Here’s what’s included

Even after subscribing to Ultra, I reached a limit after five videos, with an error message saying “something went wrong.” Another explainer post in the user forum suggests that there is no clear limit for the Ultra plan; it’s an obscure matter of AI “credits” in the cloud service.

That sudden shutdown contradicts Google's terms of service, which say, “You'll get a notification when you're close to the limit. The notification will tell you how many videos you have left.” (Learn more in the Gemini apps help section about various Gemini limits.)

The alternative to Ultra is even more complex: using the professional “Flow” tool instead of the Gemini app.

In addition to usage limits, users have complained of technical glitches, such as videos that lack sound.

Also: I tested Google’s Veo 2 image-to-video generator on Android – here’s my verdict

The overall impression is that this is very much a beta product.

You may wonder about the dangers of deepfake videos. Google has posted a number of points about security measures for Gemini apps generally, but there is no clear statement about Veo videos.

Overall, Veo seems to me an interesting trick, though it doesn't hold my interest after the initial fascination has worn off. As a photographer, I'm more interested in a single authentic moment than in 192 inauthentic ones.

For those not involved in the film industry, Veo may provide a window into how AI can increasingly be used to fill in for actors, or extend likenesses to create action without actually employing the actors.

Given stronger algorithms and additional data (scene data, character data, etc.), I can imagine Hollywood could use this technology to produce moving images that serve real stories. It’s an eye-opener about where video is going in an age of AI.




