Automating Videos

DaveMc

Newbie
Hey all, I want to experiment with video automation.

I know Corey has some insight into this, and I'd love some tips on where to start with YouTube video automation. I have basic coding knowledge and would like to set something up, but I'm unsure where to begin.

Thanks!
 

Automated & Human Video Workflow Overview


Hey Dave!


I did a ton of automated video in 2024 and pivoted to human content for 2025 with Vidzilla, but I’m still doing a lot of automation—it works. For automated content, I typically target 60-90 seconds in length, prioritizing quality.


Here's an example video:
🔗




Basic Workflow


  1. Script
  2. Audio
  3. Images
  4. Animation/Sync
  5. Render



1. Script


If you're working at scale, you might want to define:


  • Number of paragraphs & sentences
  • Purpose of each sentence

For example:


"Prompt: First paragraph should introduce our product. Sentence one gives the name and a quick overview. Sentence two highlights the benefits. Sentence three explains how easy it is to get started."

This ensures consistency across thousands of videos. Not mandatory but useful.


  • I use GPT to write, but I'm testing DeepSeek. I always use the best model for the task.
  • For 10-20 videos, I hand-edit scripts. For 1,000+, I run a helper script to filter out weaker scripts.
  • I don’t use a brand voice but assign emotion dynamically, e.g.:
    • “For the third sentence, emphasize excitement in plain language.”
    • “Explain it clearly, highlighting family fun.”
  • CTAs (Call-to-Actions) are injected, e.g.:
    • “In the last sentence of the second paragraph, invite users to reach out for assistance.”

At the end, I split the script so TTS can process each paragraph separately.


  • Example: A 5-paragraph script becomes TTS1.txt, TTS2.txt, TTS3.txt, etc.
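A minimal sketch of that splitting step (assuming paragraphs are separated by blank lines; the TTS1.txt naming follows the convention above):

```python
# Split a finished script into per-paragraph text files for TTS.
# Assumes paragraphs are separated by blank lines.
from pathlib import Path

def split_script(script: str, out_dir: str = ".") -> list:
    paragraphs = [p.strip() for p in script.split("\n\n") if p.strip()]
    paths = []
    for i, para in enumerate(paragraphs, start=1):
        path = Path(out_dir) / f"TTS{i}.txt"   # TTS1.txt, TTS2.txt, ...
        path.write_text(para, encoding="utf-8")
        paths.append(path)
    return paths
```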



2. Audio


  • Eleven Labs API for TTS
  • Saved favorite voices & settings for easy recall
  • Highest quality settings → Outputs: TTS1.wav, TTS2.wav, etc. as mono, 48 kHz, 16-bit

Audio Processing


  • Python to trim & reduce large gaps in .wav files
  • Optional VST plugin stack (classic broadcast voiceover chain):
    • 1176 compressor → LA2A compressor → EQ
    • Some Eleven Labs voices need EQ (sibilance or lack of body)
  • Loudness settings:
    • Voice files: -18 LUFS (Adobe Audition)
    • Mixed music files: -14 LUFS (Python & Audition)
  • Python stitches audio files into master.wav
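The stitching step needs nothing beyond the standard library's wave module, assuming every TTS file shares the same format (a rough sketch, not my exact script):

```python
# Concatenate per-paragraph TTS WAV files into master.wav.
# Assumes every input shares the same format (mono, 48 kHz, 16-bit).
import wave

def stitch_wavs(inputs, output="master.wav"):
    params = None
    with wave.open(output, "wb") as out:
        for path in inputs:
            with wave.open(path, "rb") as w:
                if params is None:
                    params = w.getparams()
                    out.setparams(params)   # nframes is fixed up on close
                out.writeframes(w.readframes(w.getnframes()))
```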



3. Images


Models change, so the best options vary.
Current go-to: Ideogram, Recraft, Stability (all via API).


Image Strategy


  • Avoid AI for technical items (it makes up fake objects). Use real photos.
  • Image count: usually 8 per video, with transitions.
  • Consistent image types per video template (e.g., same type of image at each spot).
  • Sometimes, I use Python to analyze scripts & generate 8 matching image prompts.
  • Highly targeted prompts, e.g.:
    • “A local HVAC company owner interacting with his community at a neighborhood picnic.”
  • Focus on emotion & connection, not decoration. Images sell the story.
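To make the "same type of image at each spot" idea concrete, here's an illustrative sketch of per-slot prompt templates (templates and subjects are placeholders, not my actual ones):

```python
# One image prompt per script segment, built from per-slot templates
# so every video gets the same *type* of image at each spot.
# These templates and subjects are illustrative placeholders.
SLOT_TEMPLATES = [
    "A {subject}, candid documentary photo, warm natural light",
    "Close-up of {subject}, shallow depth of field",
    "{subject} in a wide establishing shot, golden hour",
    "{subject}, smiling with customers, community feel",
]

def build_prompts(subjects, templates=SLOT_TEMPLATES):
    # Cycle through the slot templates if there are more subjects than slots.
    return [templates[i % len(templates)].format(subject=s)
            for i, s in enumerate(subjects)]
```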

Thumbnail Creation


  • AI-generated via Ideogram or Python.
  • Python detects product photo size/shape, fits it to a canvas, and overlays text.
  • Python checker script for detecting bad hands & artifacts (MS Phi works well).
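The fitting logic boils down to aspect-ratio math; here's a sketch of just that step, assuming a 1280x720 YouTube canvas (the actual compositing and text overlay would then be done with an imaging library such as Pillow):

```python
# Scale a product photo to fit a thumbnail canvas, preserving aspect
# ratio, and centre it. Canvas size and margin are illustrative.
def fit_to_canvas(img_w, img_h, canvas_w=1280, canvas_h=720, margin=40):
    scale = min((canvas_w - 2 * margin) / img_w,
                (canvas_h - 2 * margin) / img_h)
    new_w, new_h = int(img_w * scale), int(img_h * scale)
    x = (canvas_w - new_w) // 2   # paste position, centred
    y = (canvas_h - new_h) // 2
    return new_w, new_h, x, y
```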

No AI Video Tools


  • AI video doesn't connect with humans.
  • Pexels for real video (avoid Storyblocks—legal risks).



4. Animation & Sync


After Effects driven via its CLI renderer (aerender) → better results than automating the GUI.


Steps


  • Python generates static PNGs for end, side, and image panels separately.
  • End slides animated and exported as MP4; subtract their duration from master.wav's length to get the image-slide durations.
  • Image slides animated into an MP4 (typically using slide-and-blur transitions with gradual scale-up expressions).
  • Optional overlays (e.g., light leaks) for style.
  • Text slides animated to match script segment durations (from split scripts).
  • Final text animation video: textSlides.mp4.
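The duration bookkeeping is simple arithmetic once you can read master.wav's length; a sketch (the even split across image slides is an assumption — you could also weight them per segment):

```python
# Read master.wav's length, then split the time left after the animated
# end slides evenly across the image slides.
import wave

def wav_duration(path):
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def image_slide_durations(master_secs, end_slides_secs, n_images):
    remaining = master_secs - end_slides_secs
    return [remaining / n_images] * n_images
```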



5. Render


Final render components:


  • intro.mp4
  • slides.mp4
  • textSlides.mp4
  • outro.mp4
  • master.wav

Rendering Process


  • Python stacks MP4s (slides.mp4 underneath textSlides.mp4 in Z-order).
  • Scene transitions applied with Python.
  • Final video matches master.wav duration, ensuring smooth sync.
  • YouTube-ready quality settings (balanced file size vs. quality).
  • Python generates SEO-friendly file names for thumbnails & videos.
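One common way to do that stacking from Python is to shell out to ffmpeg's overlay filter; a sketch of the command builder (file names and codec choices are illustrative, and this isn't necessarily my exact pipeline):

```python
# Build an ffmpeg command that stacks an overlay video (text slides)
# on top of a base video (image slides) and muxes in master.wav.
def build_ffmpeg_cmd(base, overlay, audio, output):
    return [
        "ffmpeg", "-y",
        "-i", base,          # bottom layer in Z-order
        "-i", overlay,       # text layer on top
        "-i", audio,
        "-filter_complex", "[0:v][1:v]overlay=0:0[v]",
        "-map", "[v]", "-map", "2:a",
        "-c:v", "libx264", "-c:a", "aac",
        "-shortest",         # end when the shortest stream ends
        output,
    ]
```

Run it with subprocess.run(build_ffmpeg_cmd(...), check=True).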



Bonus: Automating YouTube Upload & SEO


  • AI agent handles YouTube uploads, descriptions, and metadata.
  • Python script generates a unique blog post with video schema markup.
  • AI agents can repurpose content for social media, Vimeo, etc., maximizing reach.
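For the video schema markup, the blog post just needs a JSON-LD VideoObject block; a sketch with placeholder field values:

```python
# Build schema.org VideoObject JSON-LD for embedding in a blog post.
# All field values passed in are placeholders.
import json

def video_schema(name, description, thumbnail_url, upload_date,
                 embed_url, duration_secs):
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,
        "description": description,
        "thumbnailUrl": thumbnail_url,
        "uploadDate": upload_date,            # ISO 8601 date
        "embedUrl": embed_url,
        "duration": f"PT{duration_secs}S",    # ISO 8601 duration
    }, indent=2)
```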



Final Thoughts


Hope this was helpful—just a quick brain dump. Let me know if you have any questions! I formatted this using GPT so I could be more like Lucian! 🚀
 
I know this isn't really totally automated, but...

I've tried inVideo, and you can get something reasonable in pretty quick time.

Whether it's appropriate will depend on what you want to use it for.
 

That is awesome thanks @Corey !
 