
CREATING A LIVE STREAMING PROTOTYPE

Joe Esteves · 1 month ago · 77 views · Professional · English

#livestreaming #engineering #plaudere


Background

Transferring video and audio in real time is a challenging task, especially without a dedicated framework. As an experiment to add live streaming to Plaudere, a basic streaming prototype was built as a way to learn the fundamentals and to progressively enhance the feature.

How Plaudere streams audio and video

In this solution, the main idea is to capture media on the streamer's client using the Media Recorder API, with the details depending on the browser. For audio captured from Chrome and similar browsers, the format used is webm with the Opus codec. Compression is maximised in the Media Recorder to reach approximately 50 to 100 kilobits per second, while noise suppression is disabled to avoid a noticeable loss of audio quality. In Safari, audio is recorded in mp4 format with the mp4a.40.2 codec. Unfortunately, the resulting files are significantly larger, typically around 150 to 200 kilobits per second, and efforts to optimise audio capture in Safari, including codec and bit rate handling, are still ongoing.

Audio data is created in blobs of 250 milliseconds, extended up to 500 milliseconds to allow a fade-out and fade-in effect during playback. Early versions had some gaps between blobs, which were minimised by increasing the initial blob duration. These audio blobs are transferred via WebSockets by encoding the data as text in a JSON array for easier transmission.
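As a rough illustration, a minimal sketch of this capture path on Chrome could look like the following; the WebSocket endpoint, bit rate, blob duration and message fields are assumptions for the sake of the example, not the exact values used in Plaudere.

```javascript
// Minimal sketch: capture microphone audio as webm/opus blobs and send them
// over a WebSocket as JSON. Endpoint, bit rate and field names are assumptions.
const socket = new WebSocket('wss://example.com/stream');

async function startAudioCapture() {
  // Disable noise suppression (and echo cancellation) to preserve audio quality.
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: { noiseSuppression: false, echoCancellation: false }
  });

  const recorder = new MediaRecorder(stream, {
    mimeType: 'audio/webm;codecs=opus',
    audioBitsPerSecond: 64000 // roughly within the 50-100 kbit/s target
  });

  let order = 0;
  recorder.ondataavailable = async (event) => {
    // Encode the small binary blob as base64 text so it fits in a JSON message.
    const buffer = await event.data.arrayBuffer();
    const base64 = btoa(String.fromCharCode(...new Uint8Array(buffer)));
    socket.send(JSON.stringify({
      type: 'audio',
      order: order++,
      createdAt: Date.now(),
      data: base64
    }));
  };

  // Emit a blob roughly every 250 ms (the article mentions 250-500 ms parts).
  recorder.start(250);
}

socket.onopen = startAudioCapture;
```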

For video, media capture is performed by taking snapshots of the camera feed displayed in the browser during streaming. Initially, video segments were recorded with the Media Recorder, but this approach struggled to handle packet loss smoothly. The current method therefore captures four snapshots per second. This is a significant limitation, but testing suggests it is a reasonable compromise to keep streaming lightweight. This matters especially when hosting on servers with limited memory, a common scenario for early-stage websites such as Plaudere, where such optimisation helps reduce overhead. When more users stream and watch, the server dynamically reduces the quality of video snapshots. On average, video snapshots contribute approximately 200 kilobits per second to the overall stream alongside audio.
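A minimal sketch of the snapshot approach could look like this; the element selector, WebSocket endpoint, JPEG quality factor and message fields are illustrative assumptions.

```javascript
// Minimal sketch: take four snapshots per second of the <video> element showing
// the camera feed and send them as JPEG data URLs over a WebSocket.
const socket = new WebSocket('wss://example.com/stream');
const video = document.querySelector('video#camera');
const canvas = document.createElement('canvas');
const context = canvas.getContext('2d');

let snapshotOrder = 0;
setInterval(() => {
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  context.drawImage(video, 0, 0, canvas.width, canvas.height);

  // The JPEG quality factor is a knob the congestion handling could lower.
  const snapshot = canvas.toDataURL('image/jpeg', 0.6);
  socket.send(JSON.stringify({
    type: 'video',
    order: snapshotOrder++,
    createdAt: Date.now(),
    data: snapshot
  }));
}, 250); // four snapshots per second
```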

Dealing with congestion

There are two main scenarios to consider:

High number of streaming users:
When many users stream or watch, the server’s memory and CPU resources may reach their limits. To mitigate resource strain, the server reduces the quality of video snapshots according to demand. If usage surpasses a defined threshold, streaming stops, and users are informed that the limit has been reached. The key performance indicator (KPI) to monitor here is the number of users streaming and watching.

Bandwidth constraints for streamer or audience:
For the streamer, upload bandwidth limits can cause some video snapshots or audio parts to be lost during transmission. The server only keeps the last five seconds of video and audio to save memory. Packet loss impacts the audience’s experience. Two KPIs help decide client-side actions:

  • Time from media creation to server confirmation
  • Number of lost media parts

If these KPIs exceed certain thresholds, the streamer’s client starts skipping video snapshots and eventually pauses audio transfer until bandwidth improves. Once conditions allow, transfers resume gradually. For the audience, a similar system applies: clients request media parts, and the server delivers them to keep playback live. The KPIs measured are the time between each media request and its arrival, as well as the number of media parts lost during transmission. If limits are exceeded, the client requests fewer media parts to adapt to bandwidth constraints, first limiting video snapshots and then stopping media reception until bandwidth allows recovery.
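As an illustration, the streamer-side part of this logic might be sketched as follows; the threshold values and helper names are assumptions, not the prototype's actual figures.

```javascript
// Sketch of KPI-driven throttling on the streamer's client. Thresholds and the
// points where these functions are called are assumptions.
const kpi = { ackDelayMs: 0, lostParts: 0 };

// Called when the server confirms receipt of a media part.
function onServerAck(part, ackTime) {
  kpi.ackDelayMs = ackTime - part.createdAt;
}

// Called when a media part is reported lost in transmission.
function onPartLost() {
  kpi.lostParts++;
}

// Checked before each transfer: drop video snapshots first under strain.
function shouldSendVideo() {
  return kpi.ackDelayMs < 1000 && kpi.lostParts < 5;
}

// Audio is the last thing to give up: only pause it under heavy loss.
function shouldSendAudio() {
  return kpi.ackDelayMs < 3000 && kpi.lostParts < 20;
}
```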

Sources combination and challenges:
Additionally, one goal was to allow combining different media sources, such as live devices and audio or video files. A streamer can select files and devices to create a combined stream. This remains transparent to the audience, who see a single continuous stream. A challenge is synchronising these sources, because the Media Recorder API does not precisely align the capture timing of audio and video, and delays between device capture and file playback vary. Measuring this delay automatically is complex; one possible method is analysing volume fluctuations, but this requires more client-side processing. For now, streamers can manually adjust synchronisation by listening to playback and setting a delay in milliseconds. Once set, the delay can be reused as long as the device does not change. This manual step is especially important for musicians or anyone needing precise audio sync; for casual users, delays between 30 and 100 milliseconds are usually acceptable.
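A minimal sketch of applying such a manually entered delay when scheduling a file source with the Web Audio API might look like this; the variable names and the default value are assumptions.

```javascript
// Sketch: shift an audio-file source by a manually measured offset so it lines
// up with the live device capture. Names and the default value are assumptions.
const audioContext = new AudioContext();
let manualDelayMs = 60; // typical casual-use range is roughly 30-100 ms

function scheduleFilePart(audioBuffer, deviceStartTime) {
  const source = audioContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(audioContext.destination);
  // Start the file source later (or earlier) by the configured offset.
  source.start(deviceStartTime + manualDelayMs / 1000);
}
```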

Playback

After media parts reach the audience, the challenge is to play them as a single continuous stream.

Audio playback:
Audio parts may arrive out of order. They are temporarily stored in an array, and playback begins only after enough parts have arrived to cover a few seconds. The earliest media part is decoded from JSON to an audio buffer and played using the Web Audio API, with playback timing based on metadata such as the part’s order and creation time. To prevent overloading the client device, played parts are removed from the buffer. Lost audio parts cause gaps, which are monitored by a KPI tracking the difference between scheduled and actual playback time. If delays grow too large, the buffer size is automatically increased to wait for missing parts, or the buffer is restarted if it grows beyond a certain limit.
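A simplified sketch of this playback path, assuming the JSON message fields from the earlier capture sketches, could look like the following; in the real prototype, playback only begins once a few seconds of parts have been buffered.

```javascript
// Sketch of audience-side audio playback with the Web Audio API. Message
// fields (order, data) follow the earlier capture sketches and are assumptions.
const audioContext = new AudioContext();
const pending = [];   // decoded parts waiting to be scheduled, sorted by order
let nextPlayTime = 0; // start time of the next part, in AudioContext time

async function onAudioMessage(message) {
  const part = JSON.parse(message);
  const bytes = Uint8Array.from(atob(part.data), c => c.charCodeAt(0));
  part.buffer = await audioContext.decodeAudioData(bytes.buffer);
  pending.push(part);
  pending.sort((a, b) => a.order - b.order);
}

// Periodically schedule whatever has arrived, in order; scheduled parts are
// removed from the array so the client device is not overloaded.
setInterval(() => {
  while (pending.length > 0) {
    const part = pending.shift();
    const source = audioContext.createBufferSource();
    source.buffer = part.buffer;
    source.connect(audioContext.destination);
    if (nextPlayTime < audioContext.currentTime) {
      nextPlayTime = audioContext.currentTime; // playback fell behind schedule
    }
    source.start(nextPlayTime);
    nextPlayTime += part.buffer.duration;
  }
}, 100);
```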

Video playback:
Video consists of snapshots displayed on an HTML canvas in sync with audio playback. Video timing is aligned with the audio playback logic, scheduling snapshot display along with audio parts.
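A short sketch, reusing the audioContext clock from the audio playback sketch above, might schedule snapshots like this; the element selector and field names are assumptions.

```javascript
// Sketch: draw a snapshot on the canvas when the audio clock reaches the
// snapshot's scheduled time, so video stays aligned with audio playback.
const canvas = document.querySelector('canvas#player');
const context = canvas.getContext('2d');

function scheduleSnapshot(snapshot, playTime) {
  const image = new Image();
  image.src = snapshot.data; // JPEG data URL received from the server
  // Wait until the shared audio clock reaches the snapshot's play time.
  const delayMs = Math.max(0, (playTime - audioContext.currentTime) * 1000);
  setTimeout(() => {
    context.drawImage(image, 0, 0, canvas.width, canvas.height);
  }, delayMs);
}
```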

Streaming challenges

This prototype shows that a simple live streaming website is possible, but many factors require careful design.

Memory and CPU management:
Despite efficient bandwidth use and adaptive streaming, server resources must be closely monitored. The number of connected users directly impacts memory and CPU usage. If resources are limited, media processing such as encoding, decoding, and formatting must be reduced to prevent overload, and the number of media parts held on the server and the client should be minimised.

Stream distribution:
The prototype uses a basic architecture with a single server instance serving all users. As streaming scales, this model becomes unsuitable. WebSockets cannot easily share media across multiple server instances. Using a fast in-memory database like Redis could solve this by storing short-term media parts, allowing horizontal scaling. Media servers like Icecast or Shoutcast offer better performance but are more expensive. Ultimately, balancing cost and scalability depends on the expected user volume.
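As a hedged illustration of that suggestion (Redis is not part of the current prototype), short-lived media parts could be stored with a time-to-live matching the five-second retention window, for example with Node.js and the ioredis client.

```javascript
// Sketch (Node.js with ioredis, not used in the current prototype): keep each
// media part in Redis for a few seconds so any server instance can serve it.
const Redis = require('ioredis');
const redis = new Redis();

async function storeMediaPart(streamId, part) {
  const key = `stream:${streamId}:part:${part.order}`;
  // Expire after 5 seconds, matching the short retention window on the server.
  await redis.set(key, JSON.stringify(part), 'EX', 5);
}

async function fetchMediaPart(streamId, order) {
  const raw = await redis.get(`stream:${streamId}:part:${order}`);
  return raw ? JSON.parse(raw) : null;
}
```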

Cross-browser testing:
Compatibility issues were found, especially with Safari, which does not always play webm media parts created by Chrome. Fortunately, current Safari versions support webm playback, the Media Recorder API, and the Web Audio API, making the solution feasible. That said, media parts recorded in Safari require more bandwidth than those from Chrome, which affects overall performance. Also, interactions such as file selection and audio playback require user gestures in browsers, which can limit programming options.
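One way to handle the format differences is to probe supported recording types at runtime rather than hard-coding them; MediaRecorder.isTypeSupported is available for this, and the sketch below shows the idea.

```javascript
// Sketch: pick a supported audio recording format per browser at runtime.
function pickAudioMimeType() {
  if (MediaRecorder.isTypeSupported('audio/webm;codecs=opus')) {
    return 'audio/webm;codecs=opus';     // Chrome and similar browsers
  }
  if (MediaRecorder.isTypeSupported('audio/mp4;codecs=mp4a.40.2')) {
    return 'audio/mp4;codecs=mp4a.40.2'; // Safari
  }
  return ''; // let the browser choose its default format
}
```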

Worker:
Running JavaScript code in background threads (Web Workers) is very useful for processing streaming data and KPIs. However, APIs that interact directly with the webpage (such as Media Recorder and Web Audio) cannot run in workers. Therefore, the best approach is to capture media on the main thread, then transfer it to a worker for storage, metadata processing, and server transmission.
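A minimal sketch of this split, reusing the recorder from the audio capture sketch, could look like the following; the worker file name and message shape are assumptions.

```javascript
// Sketch: capture on the main thread, then hand the raw bytes to a worker for
// metadata handling and transmission. File name and fields are assumptions.
const worker = new Worker('stream-worker.js');

recorder.ondataavailable = async (event) => {
  const buffer = await event.data.arrayBuffer();
  // Transfer ownership of the buffer so it is moved, not copied, to the worker.
  worker.postMessage({ type: 'audio', createdAt: Date.now(), buffer }, [buffer]);
};

// Inside stream-worker.js (illustrative): package the part and send it on.
// self.onmessage = (event) => {
//   const { type, createdAt, buffer } = event.data;
//   socket.send(JSON.stringify({ type, createdAt, data: toBase64(buffer) }));
// };
```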

Further discussion

This article explains how a simple live streaming prototype can be integrated into small or medium websites by small teams serving limited audiences. It also opens the door to gradually replacing parts of this solution with third-party services to build a more robust streaming platform.

Joe Esteves

Plaudere © 2025
