Embed an HTMLMediaElement (<audio>) in the player, backed by a tiny silent audio file equal in duration to the cast

Hello! Thanks for creating asciinema.org (and for open-sourcing it too)!

For years I've wanted to control the asciinema-player with my home-row keys. With the browser extension I use, I press H, J, K and L to seek backward and forward in videos, and the comma key to toggle play/pause. These keys do nothing in the asciinema-player, because it is not a real HTMLMediaElement like <video> or <audio>. As a result, the extension, which controls media element playback, does not work on asciinema.org or with asciinema-player elements anywhere else.

The extension is Video Speed Controller. It is very popular (about 3,000,000 users); many people I know use it, and I suspect you might too.

Instead of submitting a patch/PR to the extension so that it uses the HTMLMediaElement-like properties, methods and events that asciinema-player implements, I think it should be the other way around: asciinema-player should embed an <audio> element playing a tiny silent audio file, smaller than the .cast file itself.

I've done some experiments with ffmpeg and found that FLAC (a codec that, according to MDN, has long been widely supported across browsers) produces the smallest silent audio files that Chrome can play, smaller than MP3, Opus and WAV. The WAV files I generated were actually smaller, only a few hundred bytes for a 4-minute file, but Chrome could not play them (MPV plays them, though).

The smallest silent FLAC file I managed to generate came from this ffmpeg command:

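```
ffmpeg -f lavfi -i anullsrc=r=3000:cl=mono -t 10:00 -b:a 1 file.flac
```
  • anullsrc makes ffmpeg generate silent audio (mono, via cl=mono)
  • r=3000 sets the sample rate to 3000 Hz, because Chrome would not play files with lower sample rates, for some reason
  • -t 10:00 sets the duration to 10 minutes
  • -b:a sets the audio bitrate in bits per second (so 1 means 1 bit/s, not 1 kbps); since FLAC is lossless, this setting probably has little effect on the output size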
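To pre-generate one file per duration on the server, the command above could be parameterised. A minimal sketch in JavaScript (Node.js), assuming ffmpeg is installed; `silentFlacArgs` and the output file names are hypothetical:

```javascript
// Build the ffmpeg arguments for a silent mono FLAC file of `seconds` length,
// mirroring the command above. Hypothetical helper, not part of asciinema.
function silentFlacArgs(seconds, outFile) {
  if (!Number.isInteger(seconds) || seconds <= 0) {
    throw new RangeError("seconds must be a positive integer");
  }
  return [
    "-y",                            // overwrite the output file if it exists
    "-f", "lavfi",                   // read input from a libavfilter source
    "-i", "anullsrc=r=3000:cl=mono", // silent source, 3000 Hz, mono
    "-t", String(seconds),           // duration in seconds
    "-b:a", "1",                     // kept as in the command above
    outFile,
  ];
}
```

With Node's `child_process`, something like `execFileSync("ffmpeg", silentFlacArgs(238, "silence-238.flac"))` would write one file per call; passing an argument array rather than a shell string avoids quoting issues.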

The command generates a 10-minute silent FLAC file. It takes only 283 milliseconds on my laptop, and the file is only 102 kB: about 10.2 kB per minute, or roughly 1.4 kilobits per second. For context, a typical audio stream on YouTube or Spotify is around 128 kilobits per second.
I downloaded a few .cast files and found that casts a few minutes long were, on average, larger than 150 kB, so even a 10-minute silent FLAC file would be much smaller than its accompanying .cast file.
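The per-minute and per-second figures follow from the measured file size. A small sketch to redo that arithmetic for other sizes or durations, assuming 1 kB = 1000 bytes; the function names are illustrative:

```javascript
// Average bitrate in kilobits per second from a file size in bytes.
function averageKbps(bytes, seconds) {
  return (bytes * 8) / seconds / 1000; // bytes -> bits, per second, -> kbit/s
}

// Expected size in kB of a silent FLAC of `seconds` length at a given bitrate.
function expectedSizeKB(seconds, kbps) {
  return (seconds * kbps) / 8; // kbit -> kB
}

const kbps = averageKbps(102000, 600); // 102 kB over 10 minutes
console.log(kbps.toFixed(2));                         // 1.36
console.log(expectedSizeKB(4 * 60, kbps).toFixed(1)); // 40.8 (a 4-minute cast)
```

At that rate a 4-minute cast would get roughly 41 kB of silent audio, compared with the 150 kB or more observed for casts a few minutes long.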

On the server side you wouldn't need to generate an audio file for every cast uploaded to asciinema.org; you could generate one file per cast duration, e.g. all 3:58 casts would share the same 3:58 .flac file.
Generating a file for every whole-second duration up to 10 minutes would take only about 31.5 MB of server storage.
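Picking the shared file for a given cast could look like the sketch below. It assumes files named `silence-<seconds>.flac` (a hypothetical naming scheme) and rounds the cast duration up to whole seconds so the audio is never shorter than the cast:

```javascript
const MAX_SECONDS = 10 * 60; // files are pre-generated up to 10 minutes

// Return the name of the pre-generated silent file for a cast duration
// (in seconds), or null if the cast is longer than the largest file.
function silentFileFor(castDurationSeconds) {
  const seconds = Math.max(1, Math.ceil(castDurationSeconds));
  if (seconds > MAX_SECONDS) return null;
  return `silence-${seconds}.flac`;
}

console.log(silentFileFor(237.4)); // silence-238.flac (the 3:58 file)
```

How casts longer than 10 minutes would be served isn't covered here; returning `null` only marks that case.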

The 31.5 MB figure was calculated with this Python script:
```python
KBPS = 1.4             # measured bitrate of the silent FLAC files, in kilobits per second
MAX_SECONDS = 10 * 60  # one file per whole-second duration, up to 10 minutes

# Total audio length across all files: 1 s + 2 s + ... + 600 s
total_seconds_sum = sum(range(1, MAX_SECONDS + 1))
print("Total seconds:", total_seconds_sum)
# Total seconds: 180300

# kilobits -> kilobytes (/ 8) -> megabytes (/ 1000)
total_mb = total_seconds_sum * KBPS / 8 / 1000
print(f"Total storage requirement server-side: {total_mb:.2f} MB")
# Total storage requirement server-side: 31.55 MB
```
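The script sums every duration from 1 s to 600 s, so the total also has a closed form (using 1 kB = 1000 bytes and the rounded 1.4 kbit/s bitrate):

$$
\sum_{s=1}^{600} s = \frac{600 \cdot 601}{2} = 180\,300\ \text{s}
$$

$$
180\,300\ \text{s} \times \frac{1.4\ \text{kbit/s}}{8\ \text{bit/byte}} = 31\,552.5\ \text{kB} \approx 31.55\ \text{MB}
$$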

Another benefit of having an actual media element in the player is that, once it is playing, you could attach session controls to it through the Media Session API:

JS example from MDN showing how the Media Session API is used together with media element playback, started from a UI event:

```javascript
// Code taken from https://developer.mozilla.org/en-US/docs/Web/API/Media_Session_API
playButton.addEventListener("pointerup", (event) => {
  const audio = document.querySelector("audio");
  // User interacted with the page. Let's play audio!
  audio
    .play()
    .then(() => {
      /* Set up media session controls here (see the MDN page). */
    })
    .catch((error) => {
      console.error(error);
    });
});
```

The media session controls would look something like this:

[Image: browser media session controls]

Edit: I forgot to mention that the point of having an actual HTMLMediaElement is that you could listen for its events (playback, seeking, etc.) and forward them to the asciinema-player accordingly. I also suspect that having an actual media element would improve accessibility for people using assistive technology such as screen readers.
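A minimal sketch of that forwarding, in JavaScript. It assumes the player object exposes `play()`, `pause()` and `seek(seconds)` (asciinema-player's JS API appears to offer methods with these names, but treat them as assumptions), and that `audio` is a hidden <audio> element whose file is as long as the cast, so its `currentTime` maps directly onto cast time:

```javascript
// Mirror a media element's playback events onto the player.
// `audio` can be any HTMLMediaElement-like EventTarget with `currentTime`.
function forwardMediaEvents(audio, player) {
  const onPlay = () => player.play();
  const onPause = () => player.pause();
  // After a seek (e.g. an extension changing currentTime), move the player
  // to the same position; relies on the audio matching the cast's length.
  const onSeeked = () => player.seek(audio.currentTime);

  audio.addEventListener("play", onPlay);
  audio.addEventListener("pause", onPause);
  audio.addEventListener("seeked", onSeeked);

  // Return a function that detaches the listeners again.
  return () => {
    audio.removeEventListener("play", onPlay);
    audio.removeEventListener("pause", onPause);
    audio.removeEventListener("seeked", onSeeked);
  };
}
```

If the player also drives the element (for example calling `audio.play()` when the cast starts), the resulting `play` event is forwarded back to the player; that is harmless as long as calling `play()` on an already-playing player is a no-op, which would be worth checking.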

Edit 2: thinking more about it, I don't see why the audio file needs to match the cast's duration. It could be 1 second long: the player would keep it looping for as long as the asciinema-player is playing and pause it when paused, and you would still be able to listen for events on the element and forward them accordingly.
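For the 1-second variant the direction is mostly reversed: the player drives a looping element. A sketch, assuming the player emits `play`, `pause` and `ended` events through `addEventListener` (again an assumption about its API):

```javascript
// Keep a short looping silent media element in step with the player, so
// media-element tools see something "playing" while the cast plays.
function mirrorPlayerState(player, audio) {
  audio.loop = true; // a 1-second file restarts indefinitely
  player.addEventListener("play", () => {
    // play() returns a promise that rejects if autoplay is blocked.
    Promise.resolve(audio.play()).catch((err) => console.warn(err));
  });
  player.addEventListener("pause", () => audio.pause());
  player.addEventListener("ended", () => audio.pause());
}
```

One trade-off: setting `currentTime` past a media element's duration clamps it to the end, so with a 1-second file a seek made through the element (e.g. an extension skipping ahead several seconds) only moves within that second, and forwarding it to the player would likely lose most of the offset. Play/pause forwarding is unaffected.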

Thanks for the comprehensive overview.

I've looked into the MediaSession API docs on MDN, and it seems to me that this API alone, without an additional audio element, could be enough to support controlling the player via Video Speed Controller or the standard operating system UI controls. I haven't tested this, but I think playbackState + setActionHandler(...) could do the trick.
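An untested sketch of the playbackState + setActionHandler(...) idea, in JavaScript. `mediaSession` would be `navigator.mediaSession` in a browser; the player methods (`play`, `pause`, `seek`, `getCurrentTime`, `getDuration`) are assumptions about asciinema-player's API, and the 5-second default skip is arbitrary. `await` is used so the code works whether the getters return numbers or promises:

```javascript
// Wire Media Session actions to the player and keep playbackState in sync.
function bridgeMediaSession(mediaSession, player, skipSeconds = 5) {
  // Seek relative to the current position, clamped to [0, duration].
  const seekBy = async (offset) => {
    const now = await player.getCurrentTime();
    const duration = await player.getDuration();
    const target = Math.min(Math.max(now + offset, 0), duration);
    await player.seek(target);
  };

  mediaSession.setActionHandler("play", () => {
    player.play();
    mediaSession.playbackState = "playing";
  });
  mediaSession.setActionHandler("pause", () => {
    player.pause();
    mediaSession.playbackState = "paused";
  });
  // seekOffset is optional in the action details; fall back to skipSeconds.
  mediaSession.setActionHandler("seekbackward", (details) =>
    seekBy(-(details.seekOffset ?? skipSeconds)));
  mediaSession.setActionHandler("seekforward", (details) =>
    seekBy(details.seekOffset ?? skipSeconds));
  mediaSession.setActionHandler("seekto", (details) => player.seek(details.seekTime));
}
```

Usage would be `bridgeMediaSession(navigator.mediaSession, player)`. These handlers only update `playbackState` when playback goes through them; if the cast is started from the player's own UI, its play/pause events would need to update `playbackState` too. Two things worth testing: whether the browser routes OS media keys to a page with no playing media element, and whether Video Speed Controller, which looks for media elements, reacts to Media Session handlers at all.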