Media synthesis is pretty awesome. After some experimenting, I found a way to generate decent avatars by combining several machine learning models, so I decided to try building a text-to-video platform.
In practice, this means you can simply enter your text in the user interface, and the platform will generate a unique avatar and make it speak the text you entered!
But first things first, let me show you what it looks like so far:
https://www.youtube.com/watch?v=pKTBFBCvLkE
I also added a few features to round out the project, like customizable video templates for creating great-looking backgrounds for any video. That’s right: you can upload some content, tweak some text, and the video template engine will render an awesome video background.
Right now, the main challenges are getting the best avatar quality possible and reducing render time.
On the quality side, I’m currently working on implementing a new avatar generation process, which involves a lot of work… 😅
And to reduce rendering time, I’m considering compiling my machine learning models with NVIDIA TensorRT, which is pretty cool because it can reportedly accelerate inference by up to 6x!
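As a rough illustration of what that compilation step could look like (the model file names and flags below are hypothetical placeholders, not my actual pipeline): a common TensorRT workflow is to export a model to ONNX, then build an optimized inference engine with the `trtexec` CLI that ships with TensorRT.

```shell
# Build an optimized TensorRT engine from an ONNX export.
# "avatar_model.onnx" is an illustrative file name, not a real artifact.
trtexec --onnx=avatar_model.onnx \
        --saveEngine=avatar_model.plan \
        --fp16
# --fp16 enables half-precision, which is where much of the
# advertised speedup on Tesla T4 GPUs comes from (Tensor Cores).
```

The resulting `.plan` engine is then loaded at serving time instead of running the original framework model, which is how TensorRT cuts per-frame inference latency.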
I’m also working on getting funding to purchase some NVIDIA Tesla T4 GPU instances. AWS and GCP are both good candidates.
Well, of course it still needs some work before I can publish the MVP, but I’m getting there! Right now, what I need most is to get my Kickstarter project funded so I can purchase GPU instances and build out my infrastructure.
Anyway, I hope you’ll like this project as much as I do!