Screencasts are an increasingly common way of explaining software products. I probably prefer TurboGears to Django because of the 20-minute wiki screencast by Kevin Dangoor. So, this month, let us create a movie. We will build a programmed slide show as a simple illustration. The same ideas can be expanded to create effective and compelling screencasts, or to transform your digital images into an exciting audio/video treat for your parents.
A movie is really a sequence of images displayed at a predefined rate. The images are synchronised with the soundtrack. A group of images along with the corresponding audio is regarded as a ‘scene’ in a movie. Independently created scenes can be pieced together to create the illusion of a longer movie.
Defining the screencast
Start by taking a sequence of screenshots that you wish to elaborate on, then write a script for each screenshot. A text-to-speech application, Festival or eSpeak, will be the ‘actor’ that converts the dialogue in the script into a voice. Each scene will consist of one screenshot displayed for as long as it takes to speak the corresponding dialogue.
Create a set of screenshots for the product you wish to talk about and save them in a directory (the code below expects a subdirectory named screencast), numbering them sequentially, for example PhotoApp00.png, PhotoApp01.png, … PhotoApp05.png.
You can either write the script in a separate text file for each screenshot or in a single file. In this tutorial, we will look at writing it in a single file—a header line followed by the script and a blank line for each of the slides in order. The first two characters of the header will be the image number.
00
Start the python application my_photos from the terminal

01
A new image will be displayed to you.

02
Type the text you would like to appear as a caption in the text box.

03
Once you press enter, the text will be displayed on the image as you can see.

04
Now, click on the save and next button. The image will be saved.

05
And you will be shown the next picture. Repeat the steps until all the photographs are processed. Note, that if you do not wish to put a caption on a picture and save it, you can press the next button.
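To check a script file before any images or audio are involved, the layout described above (a header line, the dialogue lines, then a blank line) can be parsed on its own. The following is only a sketch: `parse_script` is a hypothetical helper name that is not part of the article's program, and unlike the generator shown later it skips stray blank lines instead of stopping at them.

```python
import io

def parse_script(lines):
    """Yield (scene_id, text) pairs from an iterable of script lines."""
    scene_id, parts = None, []
    for line in lines:
        stripped = line.strip()
        if scene_id is None:
            if stripped:                 # header line: keep its first two characters
                scene_id = stripped[:2]
        elif stripped:                   # a dialogue line of the current scene
            parts.append(stripped)
        else:                            # a blank line closes the scene
            yield scene_id, ' '.join(parts)
            scene_id, parts = None, []
    if scene_id is not None:             # last scene with no trailing blank line
        yield scene_id, ' '.join(parts)

# A two-scene sample held in memory instead of a file on disk
sample = io.StringIO(
    "00\nStart the python application my_photos from the terminal\n\n"
    "01\nA new image will be displayed to you.\n"
)
scenes = list(parse_script(sample))
```

Printing `scenes` before recording gives a quick way to confirm that each piece of dialogue is attached to the intended image number.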
The implementation
The core logic of your application will be as follows:
#!/usr/bin/env python
import os
import wave
# Pillow import; the classic PIL used "import Image, ImageDraw"
from PIL import Image, ImageDraw

# scene_data, text_to_speech and animated_scene are listed further on;
# in the actual file they must be defined above this point.
script_file = open('Script.txt')
# iterate over each scene
for scene_id, image, text in scene_data(script_file):
    # writes <scene_id>text.wav and returns its duration in seconds
    duration = text_to_speech(text)
    # create frames assuming 25 frames per sec; range() needs an integer
    for frame_no in range(int(25 * duration)):
        image.save(scene_id + "%03d" % frame_no + ".jpg")
    # convert the frames into a scene
    os.system('mencoder -audiofile ' + scene_id + 'text.wav -oac mp3lame "mf://'
              + scene_id + '*.jpg" -mf fps=25 -o out_'
              + scene_id + '.avi -ovc lavc -lavcopts vcodec=mpeg4')
script_file.close()
# Create an animated scene to end using the last image
animated_scene(image)
# combine the scenes into a single film
os.system('mencoder -ovc copy -oac mp3lame -o output.avi out_*.avi')
The script file is opened. It’s best for the scene ID to be a numeric string of a fixed number of digits. That will ensure the order of scenes is easily maintained.
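The reason for a fixed number of digits shows up as soon as file names are sorted as plain strings, which is roughly what happens when a wildcard such as out_*.avi is expanded. A small illustration, using made-up scene numbers:

```python
scene_numbers = [0, 1, 2, 10]

# Zero-padded to two digits, as recommended above
padded = sorted('out_%02d.avi' % n for n in scene_numbers)

# Without padding, string order no longer matches numeric order
unpadded = sorted('out_%d.avi' % n for n in scene_numbers)

print(padded)
print(unpadded)
```

In the unpadded list, out_10.avi sorts ahead of out_2.avi, so scene 10 would be spliced into the film before scene 2.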
An image and the corresponding text are selected. The text is converted to a speech file. The image is copied as many times as the number of frames that will be needed for the duration of the speech file.
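The number of copies is simply the duration of the speech multiplied by the frame rate. As a sketch, the hypothetical helper below rounds up so that the pictures never end before the sound does; this is a design choice of the sketch, and truncating the product is an equally simple alternative.

```python
import math

FPS = 25  # frames per second, the rate assumed throughout this article

def frames_needed(duration_seconds, fps=FPS):
    """Number of identical frames needed to cover the given duration."""
    return int(math.ceil(duration_seconds * fps))

# A scene whose dialogue lasts two and a half seconds
count = frames_needed(2.5)
```

Rounding up adds at most one frame, that is 1/25 of a second, to a scene.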
The speech file and the images (using the mf://xx*.jpg URL) are combined and converted into an AVI file by using mencoder. The sound file is converted to an MP3. If you are familiar with ffmpeg, you may use that instead of mencoder.
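For readers who prefer ffmpeg, a roughly equivalent command could be assembled as below. This is an untested sketch rather than a drop-in replacement: `ffmpeg_scene_command` is a hypothetical helper, and the command assumes an ffmpeg build that supports glob patterns for image input and includes the libmp3lame encoder.

```python
def ffmpeg_scene_command(scene_id, fps=25):
    """Build an ffmpeg argument list for one scene (assumed equivalent)."""
    return [
        'ffmpeg',
        '-framerate', str(fps),          # input frame rate of the image sequence
        '-pattern_type', 'glob',         # let ffmpeg expand the wildcard itself
        '-i', scene_id + '*.jpg',        # the frames of this scene
        '-i', scene_id + 'text.wav',     # the padded speech file
        '-c:v', 'mpeg4',                 # same video codec as the mencoder call
        '-c:a', 'libmp3lame',            # MP3 audio
        'out_' + scene_id + '.avi',
    ]

command = ffmpeg_scene_command('00')
```

The list can be passed to `subprocess.call(command)`. Because no shell is involved, the wildcard reaches ffmpeg unexpanded, which is what the glob pattern type expects.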
Finally, all the AVI files are combined into a single AVI file.
The code for the generator to fetch the image and the text file will be as follows:
def scene_data(script_file):
    while True:
        # the first two characters in the script are the scene id
        scene_id = script_file.readline()[:2]
        # readline will return an empty string after EOF
        if scene_id.strip() == '':
            break
        # the images are png files in screencast subdirectory
        im_file = 'screencast/PhotoApp' + scene_id + '.png'
        image = Image.open(im_file)
        # JPEG frames cannot store transparency, so convert to RGB first
        frame = image.convert('RGB').resize((640, 480))
        # read lines until an empty line
        text = ''
        while True:
            line = script_file.readline()
            if line.strip() == '':
                break
            # Append replacing new line by a space
            text += line.replace('\n', ' ')
        yield scene_id, frame, text
The script file structure was explained above. The code keeps reading the script file until there is no more data. The first two characters of the header line are the scene ID. The images must be named as per a fixed format with two characters being the scene ID.
The image is resized to a fixed size. The generator yields the values of the scene ID, the resized image and the text associated with that image.
The next step is the code to convert the text to speech.
def text_to_speech(text):
    # scene_id is not a parameter: it is the loop variable of the main
    # script, read here as a module-level name
    # uncomment ESPEAK or FESTIVAL command and system call
    #ESPEAK = 'espeak -w text.wav -s120 "%s"'
    #os.system(ESPEAK % (text))
    FESTIVAL = 'echo "%s" | text2wave -o text.wav -F 44100 -scale 2.0'
    os.system(FESTIVAL % (text))
    win = wave.open('text.wav')
    # modify the wave file to add a short silence
    # before the start and at the end
    wout = wave.open(scene_id + 'text.wav', 'w')
    # create the wave file with same parameters in the input file
    wout.setnchannels(win.getnchannels())
    wout.setsampwidth(win.getsampwidth())
    wout.setframerate(win.getframerate())
    # half a second of silence, as a whole number of frames
    silence_frames = win.getframerate() // 2
    # mono 16bit sound frames: two zero bytes per frame
    silence_data = silence_frames * b'\x00\x00'
    wout.writeframes(silence_data)
    data = win.readframes(win.getnframes())
    wout.writeframes(data)
    wout.writeframes(silence_data)
    # divide the number of frames by frame rate
    duration = float(wout.getnframes()) / wout.getframerate()
    win.close()
    wout.close()
    return duration
The code to get a wave file of the speech is a mere two lines. You can use either the espeak command or the text2wave command from the Festival package; the latter’s voice quality is better. (I needed a sampling frequency of 44100 Hz for the wave file so that the sound for the various scenes stayed synchronised after conversion to MP3 audio.)
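Interpolating the dialogue into a shell command string can break when the text contains quotes or other characters that are special to the shell. One possible alternative, sketched here with hypothetical helper names, is to pass the arguments as a list and to feed text2wave through its standard input; the options are the same ones used in the listing above.

```python
import subprocess

def espeak_command(text, wav_path='text.wav', speed=120):
    """Argument list equivalent to: espeak -w text.wav -s120 "<text>"."""
    return ['espeak', '-w', wav_path, '-s%d' % speed, text]

def festival_speak(text, wav_path='text.wav'):
    """Send the text to text2wave on standard input instead of using echo."""
    cmd = ['text2wave', '-o', wav_path, '-F', '44100', '-scale', '2.0']
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    proc.communicate((text + '\n').encode('utf-8'))
    return proc.returncode

# Quotes in the dialogue need no escaping because no shell is involved
command = espeak_command('Don\'t "panic"')
```

The eSpeak variant would be run with `subprocess.call(command)`; `festival_speak` starts text2wave itself and returns its exit status.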
You can use the wave module to improve the presentation by inserting short silences at the start and the end of the wave file. This makes the presentation sound more natural.
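The padding step can be tried in isolation, without a speech engine, by working on WAV data held in memory. The sketch below uses a hypothetical helper, `pad_with_silence`, and assumes signed PCM samples (such as 16-bit audio), for which zero bytes are silence.

```python
import io
import wave

def pad_with_silence(wav_bytes, seconds=0.5):
    """Return (padded WAV bytes, duration in seconds) for in-memory WAV data."""
    with wave.open(io.BytesIO(wav_bytes), 'rb') as win:
        params = win.getparams()
        data = win.readframes(win.getnframes())
    # One frame holds one sample per channel; zero bytes are silent only
    # for signed PCM data (8-bit WAV is unsigned, so this assumes 16-bit)
    frame_size = params.nchannels * params.sampwidth
    silence = b'\x00' * (int(params.framerate * seconds) * frame_size)
    out = io.BytesIO()
    with wave.open(out, 'wb') as wout:
        wout.setnchannels(params.nchannels)
        wout.setsampwidth(params.sampwidth)
        wout.setframerate(params.framerate)
        wout.writeframes(silence + data + silence)
        duration = wout.getnframes() / float(wout.getframerate())
    return out.getvalue(), duration

# Build one second of mono 16-bit audio at 8000 Hz as test input
source = io.BytesIO()
with wave.open(source, 'wb') as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b'\x01\x00' * 8000)

padded, duration = pad_with_silence(source.getvalue())
```

Computing the silence from the channel count and sample width, rather than hard-coding two bytes per frame, keeps the sketch usable for stereo input as well.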
End with a little animation
On the final image, a red square with ‘The’ written on it moves from left to right, while a green circle with ‘End’ written on it moves from right to left. The two merge at the centre, and the image is then frozen for a second. The logout sound of the desktop is added to the scene. As it is the final scene, you can ignore differences in the duration of the sound file and the video.
def animated_scene(bg_image):
    """A square with 'the' and a circle with 'end'
    float across a background image from opposite sides
    and merge.
    """
    duration = 2
    nframes = 25*duration
    box_size = (100,100)
    # integer step, because paste() needs whole pixel coordinates
    x_step = (640 - 100)//(2*nframes)
    # create a red square 100x100
    im_square = Image.new('RGB', box_size)
    draw_s = ImageDraw.Draw(im_square)
    draw_s.rectangle([(0,0), box_size], fill='RED')
    draw_s.text((25,25),'The')
    # create a green circle with diameter 100
    im_circle = Image.new('RGB',box_size)
    draw_c = ImageDraw.Draw(im_circle)
    draw_c.ellipse([(0,0),box_size], fill='GREEN')
    draw_c.text((25,75),'End')
    # create a mask to show only the (green) circle
    r,mask,b = im_circle.split()
    # create the frames
    scene_id = '99'
    x_s = 0
    x_c = 640 - 100
    # frames 0 .. nframes-1, so the numbering continues without a gap
    for frame_no in range(nframes):
        image = bg_image.copy()
        image.paste(im_square, (x_s,200))
        image.paste(im_circle, (x_c,200), mask)
        x_s += x_step
        x_c -= x_step
        image.save(scene_id + "%03d"%frame_no +".jpg")
    # freeze the final frame for a second
    for frame_no in range(nframes, nframes + 25):
        image = bg_image.copy()
        image.paste(im_square, (270,200))
        image.paste(im_circle, (270,200), mask)
        image.save(scene_id + "%03d"%frame_no +".jpg")
    # convert the frames into a scene. Use the logout sound
    os.system('mencoder -audiofile /usr/share/sounds/logout.wav '
              '-oac mp3lame "mf://' + scene_id + '*.jpg" -mf fps=25 -o out_'
              + scene_id + '.avi -ovc lavc -lavcopts vcodec=mpeg4')
Even a small bit of animation takes some code. The core concept is that you create the foreground images that will appear to move. Make a copy of the original image that will serve as the background. Paste the foreground images at a new location and save the resulting image as a frame.
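The motion itself is only arithmetic on the paste positions, and it can be inspected without drawing anything. The sketch below uses a hypothetical helper, `slide_positions`, that lists the x coordinates of the two shapes for each moving frame, using the same 640-pixel width, 100-pixel shapes and 50 frames as the listing above.

```python
def slide_positions(width=640, box=100, nframes=50):
    """Return (x_square, x_circle) for every frame of the moving part."""
    # whole-pixel step so the shapes cover half the free width in nframes steps
    step = (width - box) // (2 * nframes)
    return [(i * step, width - box - i * step) for i in range(nframes)]

positions = slide_positions()
first, last = positions[0], positions[-1]
```

With these numbers the step is 5 pixels, so the last moving frame has the square at x = 245 and the circle at x = 295. The frozen frames then place both at x = 270, which is where the two appear to merge.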
In our restless world, time is at a premium. So, go ahead and create 30-second just-in-time tutorials for your application, and they are sure to be a hit.