Automating Videos

DaveMc

Newbie
Hey all, I want to experiment with video automation.

I know Corey has some insight into this, and I'd love some tips on where to start with YouTube video automation. I have basic coding knowledge and would like to set something up, but I'm unsure where to begin.

Thanks!
 

Automated & Human Video Workflow Overview


Hey Dave!


I did a ton of automated video in 2024 and pivoted to human content for 2025 with Vidzilla, but I’m still doing a lot of automation—it works. For automated content, I typically target 60-90 seconds in length, prioritizing quality.


Here's an example video:
🔗




Basic Workflow


  1. Script
  2. Audio
  3. Images
  4. Animation/Sync
  5. Render



1. Script


If you're working at scale, you might want to define:


  • Number of paragraphs & sentences
  • Purpose of each sentence

For example:


"Prompt: First paragraph should introduce our product. Sentence one gives the name and a quick overview. Sentence two highlights the benefits. Sentence three explains how easy it is to get started."

This ensures consistency across thousands of videos. Not mandatory but useful.


  • I use GPT to write, but I'm testing DeepSeek. I always use the best model for the task.
  • For 10-20 videos, I hand-edit scripts. For 1,000+, I run a helper script to filter out weaker scripts.
  • I don’t use a brand voice but assign emotion dynamically, e.g.:
    • “For the third sentence, emphasize excitement in plain language.”
    • “Explain it clearly, highlighting family fun.”
  • CTAs (Call-to-Actions) are injected, e.g.:
    • “In the last sentence of the second paragraph, invite users to reach out for assistance.”

At the end, I split the script so TTS can process each paragraph separately.


  • Example: A 5-paragraph script becomes TTS1.txt, TTS2.txt, TTS3.txt, etc.
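A minimal sketch of that splitting step (assuming paragraphs are separated by blank lines; the TTS1.txt naming follows the convention above):

```python
# Split a finished script into per-paragraph text files for TTS.
# Assumes paragraphs are separated by blank lines.
from pathlib import Path

def split_script(script: str, out_dir: str = ".") -> list:
    paragraphs = [p.strip() for p in script.split("\n\n") if p.strip()]
    paths = []
    for i, para in enumerate(paragraphs, start=1):
        path = Path(out_dir) / f"TTS{i}.txt"   # TTS1.txt, TTS2.txt, ...
        path.write_text(para, encoding="utf-8")
        paths.append(path)
    return paths
```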



2. Audio


  • Eleven Labs API for TTS
  • Saved favorite voices & settings for easy recall
  • Highest quality settings → Outputs: TTS1.wav, TTS2.wav, etc. as mono, 48 kHz, 16-bit

Audio Processing


  • Python to trim & reduce large gaps in .wav files
  • Optional VST plugin stack (classic broadcast voiceover chain):
    • 1176 compressor → LA2A compressor → EQ
    • Some Eleven Labs voices need EQ (sibilance or lack of body)
  • Loudness settings:
    • Voice files: -18 LUFS (Adobe Audition)
    • Mixed music files: -14 LUFS (Python & Audition)
  • Python stitches audio files into master.wav
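The stitching step needs nothing beyond the standard library's wave module, assuming every TTS file shares the same format (a rough sketch, not my exact script):

```python
# Concatenate per-paragraph TTS WAV files into master.wav.
# Assumes every input shares the same format (mono, 48 kHz, 16-bit).
import wave

def stitch_wavs(inputs, output="master.wav"):
    params = None
    with wave.open(output, "wb") as out:
        for path in inputs:
            with wave.open(path, "rb") as w:
                if params is None:
                    params = w.getparams()
                    out.setparams(params)   # nframes is fixed up on close
                out.writeframes(w.readframes(w.getnframes()))
```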



3. Images


Models change, so the best options vary.
Current go-to: Ideogram, Recraft, Stability (all via API).


Image Strategy


  • Avoid AI for technical items (it makes up fake objects). Use real photos.
  • Image count: usually 8 per video, with transitions.
  • Consistent image types per video template (e.g., same type of image at each spot).
  • Sometimes, I use Python to analyze scripts & generate 8 matching image prompts.
  • Highly targeted prompts, e.g.:
    • “A local HVAC company owner interacting with his community at a neighborhood picnic.”
  • Focus on emotion & connection, not decoration. Images sell the story.
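To make the "same type of image at each spot" idea concrete, here's an illustrative sketch of per-slot prompt templates (templates and subjects are placeholders, not my actual ones):

```python
# One image prompt per script segment, built from per-slot templates
# so every video gets the same *type* of image at each spot.
# These templates and subjects are illustrative placeholders.
SLOT_TEMPLATES = [
    "A {subject}, candid documentary photo, warm natural light",
    "Close-up of {subject}, shallow depth of field",
    "{subject} in a wide establishing shot, golden hour",
    "{subject}, smiling with customers, community feel",
]

def build_prompts(subjects, templates=SLOT_TEMPLATES):
    # Cycle through the slot templates if there are more subjects than slots.
    return [templates[i % len(templates)].format(subject=s)
            for i, s in enumerate(subjects)]
```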

Thumbnail Creation


  • AI-generated via Ideogram or Python.
  • Python detects product photo size/shape, fits it to a canvas, and overlays text.
  • Python checker script for detecting bad hands & artifacts (MS Phi works well).
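The fitting logic boils down to aspect-ratio math; here's a sketch of just that step, assuming a 1280x720 YouTube canvas (the actual compositing and text overlay would then be done with an imaging library such as Pillow):

```python
# Scale a product photo to fit a thumbnail canvas, preserving aspect
# ratio, and centre it. Canvas size and margin are illustrative.
def fit_to_canvas(img_w, img_h, canvas_w=1280, canvas_h=720, margin=40):
    scale = min((canvas_w - 2 * margin) / img_w,
                (canvas_h - 2 * margin) / img_h)
    new_w, new_h = int(img_w * scale), int(img_h * scale)
    x = (canvas_w - new_w) // 2   # paste position, centred
    y = (canvas_h - new_h) // 2
    return new_w, new_h, x, y
```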

No AI Video Tools


  • AI video doesn't connect with humans.
  • Pexels for real video (avoid Storyblocks—legal risks).



4. Animation & Sync


After Effects driven via its CLI renderer (aerender) → better results than automating the GUI.


Steps


  • Python generates static PNGs for end, side, and image panels separately.
  • End slides animated and exported as MP4; subtract their duration from master.wav's length to get the image-slide durations.
  • Image slides animated into an MP4 (typically using slide-and-blur transitions with gradual scale-up expressions).
  • Optional overlays (e.g., light leaks) for style.
  • Text slides animated to match script segment durations (from split scripts).
  • Final text animation video: textSlides.mp4.
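The duration bookkeeping is simple arithmetic once you can read master.wav's length; a sketch (the even split across image slides is an assumption — you could also weight them per segment):

```python
# Read master.wav's length, then split the time left after the animated
# end slides evenly across the image slides.
import wave

def wav_duration(path):
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def image_slide_durations(master_secs, end_slides_secs, n_images):
    remaining = master_secs - end_slides_secs
    return [remaining / n_images] * n_images
```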



5. Render


Final render components:


  • intro.mp4
  • slides.mp4
  • textSlides.mp4
  • outro.mp4
  • master.wav

Rendering Process


  • Python stacks MP4s (slides.mp4 underneath textSlides.mp4 in Z-order).
  • Scene transitions applied with Python.
  • Final video matches master.wav duration, ensuring smooth sync.
  • YouTube-ready quality settings (balanced file size vs. quality).
  • Python generates SEO-friendly file names for thumbnails & videos.
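One common way to do that stacking from Python is to shell out to ffmpeg's overlay filter; a sketch of the command builder (file names and codec choices are illustrative, and this isn't necessarily my exact pipeline):

```python
# Build an ffmpeg command that stacks an overlay video (text slides)
# on top of a base video (image slides) and muxes in master.wav.
def build_ffmpeg_cmd(base, overlay, audio, output):
    return [
        "ffmpeg", "-y",
        "-i", base,          # bottom layer in Z-order
        "-i", overlay,       # text layer on top
        "-i", audio,
        "-filter_complex", "[0:v][1:v]overlay=0:0[v]",
        "-map", "[v]", "-map", "2:a",
        "-c:v", "libx264", "-c:a", "aac",
        "-shortest",         # end when the shortest stream ends
        output,
    ]
```

Run it with subprocess.run(build_ffmpeg_cmd(...), check=True).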



Bonus: Automating YouTube Upload & SEO


  • AI agent handles YouTube uploads, descriptions, and metadata.
  • Python script generates a unique blog post with video schema markup.
  • AI agents can repurpose content for social media, Vimeo, etc., maximizing reach.
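For the video schema markup, the blog post just needs a JSON-LD VideoObject block; a sketch with placeholder field values:

```python
# Build schema.org VideoObject JSON-LD for embedding in a blog post.
# All field values passed in are placeholders.
import json

def video_schema(name, description, thumbnail_url, upload_date,
                 embed_url, duration_secs):
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,
        "description": description,
        "thumbnailUrl": thumbnail_url,
        "uploadDate": upload_date,            # ISO 8601 date
        "embedUrl": embed_url,
        "duration": f"PT{duration_secs}S",    # ISO 8601 duration
    }, indent=2)
```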



Final Thoughts


Hope this was helpful—just a quick brain dump. Let me know if you have any questions! I formatted this using GPT so I could be more like Lucian! 🚀
 
I know this isn't really totally automated, but...

I've tried inVideo, and you can get something reasonable in pretty quick time.

Whether it's appropriate will depend on what you want to use it for.
 

That is awesome thanks @Corey !
 