What I am currently considering is using ControlNet with Stable Diffusion to generate pose-conditioned images and then combine them into a video. Another option may be to generate the video with a text-to-video model first and then use a deepfake/face-swap tool such as ReActor to replace the character.
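For the first approach, the per-pose frames would come from a ControlNet pipeline (e.g. diffusers' StableDiffusionControlNetPipeline conditioned on OpenPose skeletons, one image per target pose). Here is a rough sketch of just the final assembly step, with placeholder images standing in for the generated frames; the frame generator is a hypothetical stand-in, not real model output:

```python
from PIL import Image, ImageDraw

def make_placeholder_frames(n_frames=8, size=(128, 128)):
    # Stand-in for frames that a ControlNet pipeline would produce,
    # one image per target pose. Here we just draw a moving dot.
    frames = []
    for i in range(n_frames):
        img = Image.new("RGB", size, "white")
        draw = ImageDraw.Draw(img)
        x = int(i * size[0] / n_frames)
        draw.ellipse([x, 50, x + 16, 66], fill="black")
        frames.append(img)
    return frames

def frames_to_animation(frames, out_path, fps=8):
    # Save the frame sequence as an animated GIF; for an mp4 you
    # would typically hand the frames to ffmpeg or OpenCV instead.
    frames[0].save(
        out_path,
        save_all=True,
        append_images=frames[1:],
        duration=int(1000 / fps),  # milliseconds per frame
        loop=0,                    # loop forever
    )

frames = make_placeholder_frames()
frames_to_animation(frames, "pose_sequence.gif")
```

In practice the hard part is temporal consistency between frames, which plain per-frame ControlNet generation does not guarantee.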
However, I’m open to exploring other tools or libraries that might better suit this requirement. Since I want the character to keep changing poses, HeyGen may not work.
I have looked into basic image-to-video conversion tools, but I haven’t found many that explicitly support flexible prompting or annotation. Any suggestions would be greatly appreciated.
Thank you in advance for your help!