Joe Esteves

about 8 months ago

303 views

Professional

English

#livestreaming #engineering #plaudere


Background

In the late 1990s, websites built around exchanging text as useful information for the community began to appear online. Some advertised services, while others were born from a pure passion for sharing information. Attempts to share multimedia content soon followed. Multimedia has always fascinated me, ever since my favourite singers started uploading clips of their latest concerts in the early 2000s. Back then, we had to wait for buffering to watch just a few seconds of video, thinking it was top-notch technology and dreaming of a future where we would turn off our televisions and watch everything online.

Shortly after, streaming technology improved, and the exchange of multimedia content became a better experience alongside the increase in internet speeds. The most common application was peer-to-peer communication using a camera and microphone, with apps that allowed for chat and video calls. Another type of application started to host online videos, available to everyone without the need to save the files to the device. Both types of streaming became very popular and started shaping the internet we know today.

I also remember some technical entrepreneurs dreaming of having a digital version of a concert online, not for professionals, but for amateur artists. It required dedicated servers to share the stream across many viewers. I witnessed these types of applications in the early 2010s, and since I had performed gigs in the past with musician colleagues, I was curious about the possibilities this technology could offer. I remember connecting my camera and microphone, inviting a friend to perform music at home, and using an online service to stream it to a few people, including loyal friends who supported us. However, this was a one-to-many type of streaming, similar to peer-to-peer communication but involving many viewers.

The Challenge of Synchronisation

Other streaming applications I researched involved performers in different locations, connected via the internet and streaming to the same audience. This is a very challenging type of streaming. The main challenge for applications that combine more than one remote streamer into a single transmission is the natural delay between the moment a capture device (such as a microphone or camera) picks up a signal and the moment that signal reaches the streamer's device and, later, the server. The delay the server adds when receiving the stream and transmitting it to the audience is also a significant factor.

In fact, the sum of all these delays must stay below 30 milliseconds for streamers to react to each other naturally in real time. In the early 2010s, the necessary ingredients were difficult or costly to obtain: high-speed internet bandwidth, high-quality audio devices with the lowest possible delay, and a powerful server able to receive and retransmit streaming content almost instantly. To be honest, I tested many services with musician colleagues from my school days, hoping to perform an online gig for friends and people far away, and it was not possible. I trust that advances in the 2020s have made this approach feasible, but it remains costly even with powerful cloud services. For this reason, during the pandemic, I tried to solve the problem with a less expensive approach, and I made several discoveries worth sharing.

Physical vs. Digital Distance

In real life, if you grab a guitar and play while a friend sings for an audience, you have your gig. If you amplify both with microphones, it is still easy. However, imagine that you, the guitarist, and your friend, the singer, are more than 12 to 15 metres apart. Even if both are amplified separately, staying in time becomes hard: because of the speed of sound, each musician hears the other more than 30 milliseconds late, and the audience would hear the two out of time. Stage monitors solve this, since the audio travels electronically far faster than sound travels through air, so both musicians hear each other practically instantly. But if the musicians are so far apart that they cannot connect to the same audio system, they must rely on the internet, which is the problem we want to address.
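
The figures above can be checked with a short calculation. The sketch below is only an illustration: it assumes a speed of sound of about 343 m/s (air at roughly 20 °C), and the function names are hypothetical.

```typescript
// Speed of sound in air at about 20 °C, in metres per second (an approximation).
const SPEED_OF_SOUND_M_PER_S = 343;

// Time, in milliseconds, for sound to travel a given distance through air.
function acousticDelayMs(distanceMetres: number): number {
  return (distanceMetres / SPEED_OF_SOUND_M_PER_S) * 1000;
}

// Distance, in metres, at which the acoustic delay reaches a given budget.
function distanceForDelayMetres(delayMs: number): number {
  return (delayMs / 1000) * SPEED_OF_SOUND_M_PER_S;
}

const delayAt12m = acousticDelayMs(12); // about 35 ms
const delayAt15m = acousticDelayMs(15); // about 44 ms
const distanceAt30ms = distanceForDelayMetres(30); // about 10.3 m
```

Under this assumption the 30 millisecond budget is already used up at a little over 10 metres, so musicians 12 to 15 metres apart are beyond it.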

Indeed, with high-speed internet, a geographic distance of 1,500 km to 2,000 km, and signals travelling through optical fibre at roughly 200,000 km/s, we can try to have streamers playing together with a total delay close to 30 milliseconds once audio interfaces, buffers, and routers are accounted for. However, many factors can still push the stream out of sync. After analysing the problem, I was convinced that it was necessary to look for a less expensive and less sophisticated approach to achieve the same result.
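
To see how such a budget might add up, here is a rough sketch. Only the fibre speed and the distances come from the text; the per-stage overheads (5 ms each) are illustrative assumptions rather than measurements, and all names are hypothetical.

```typescript
// Approximate signal speed in optical fibre, in km/s (the figure quoted in the text).
const FIBRE_SPEED_KM_PER_S = 200000;

// Per-stage overheads in milliseconds. The values used below are assumptions
// chosen for illustration only.
interface StageOverheadsMs {
  capture: number;   // audio interface input
  buffering: number; // software and network buffers
  routing: number;   // routers and switches along the path
  playback: number;  // audio interface output
}

// One-way propagation time through fibre, in milliseconds.
function propagationDelayMs(distanceKm: number): number {
  return (distanceKm / FIBRE_SPEED_KM_PER_S) * 1000;
}

// One-way latency: propagation plus every stage overhead.
function oneWayLatencyMs(distanceKm: number, o: StageOverheadsMs): number {
  return propagationDelayMs(distanceKm) + o.capture + o.buffering + o.routing + o.playback;
}

const assumedOverheads: StageOverheadsMs = { capture: 5, buffering: 5, routing: 5, playback: 5 };
const latencyAt1500Km = oneWayLatencyMs(1500, assumedOverheads); // about 27.5 ms
const latencyAt2000Km = oneWayLatencyMs(2000, assumedOverheads); // about 30 ms
```

Propagation alone takes only 7.5 to 10 ms at these distances. In this sketch the equipment and network stages consume the rest of the budget, which is why a small degradation in any of them can break the synchronisation.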

A Less Expensive Approach to Live Streaming

Imagine a school colleague you used to play with. The connection between two musicians is magical when they collaborate, singing the hits of the time and feeling the emotion of their first fans. However, life is complex, and it is often impossible to maintain that link or find free time to play together. Geographical distance and busy lives usually mean the end for friends who shared that special musical connection. I performed an interesting experiment in the early 2010s: I asked my friend to call my phone and sing only his part of a song while imagining I was playing the guitar (or using a karaoke track in his earphones). I received his audio via the phone, hearing only his voice. I held my phone close to a microphone and accompanied my friend's voice by playing the guitar live as I listened to him. My microphone picked up both his phone-call voice and my guitar, and we transmitted the combined sound through a single-streamer service. Our friends were very surprised that we were able to play together from different locations.

This experiment became an obsession: creating a service that allowed the same type of collaborative streaming without needing an mp3 player, earphones, and phone calls. I wanted one single service to do it all. I began researching the possibility of building this prototype in the early 2020s, when the need for joint transmissions returned to my mind. As I had no experience developing applications, I ruled out native mobile and PC development and chose web development instead. The web is compatible with practically all devices and accessible through a browser, with no installations or updates. The learning curve was also gentler, so I started coding.

As web development was not my domain, I started with the fundamentals: HTML, JS, and CSS, as well as server and database coding. I decided to use Node.js as the backend runtime because it runs JavaScript, allowing me to develop both frontend and backend logic in one language. I created many prototypes to weigh the pros and cons of server-side versus client-side processing, as well as the limits of NoSQL databases like MongoDB. Powerful browser APIs became the backbone of my solution: the Web Audio API for audio processing, the MediaRecorder API to record audio and video chunks, and WebSockets for instant communication.

Audio Streaming Experiments

A "walkie-talkie" approach was my first experiment. The website connects to the microphone and records an audio message using the MediaRecorder API. Once finished, the recording is sent to the server and then requested by a consumer, who plays the audio; WebSockets make this exchange possible. I then explored how to send continuous audio by repeating this approach, recording chunks of a fixed duration to be played in order. However, several things can go wrong, such as lost connections or limited bandwidth. This was a particular challenge, as the system had to respond to these situations and always present the most recent content to the consumer, even after a disconnection.
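
The consumer side of this idea can be sketched as a small buffer that plays chunks in order and jumps forward when it falls too far behind. This is a minimal illustration, not the code used in the service. It assumes each chunk carries a sequence number; the `ChunkPlayer` class and its `maxLag` parameter are hypothetical names.

```typescript
// One recorded chunk, identified by a sequence number assigned by the streamer.
interface Chunk {
  seq: number;
  data: Uint8Array;
}

// Consumer-side buffer: returns chunks in order and, when it has fallen more
// than `maxLag` chunks behind the newest one (for example after a
// disconnection), skips ahead so that playback stays close to live.
class ChunkPlayer {
  private pending = new Map<number, Chunk>();
  private nextSeq = 0;
  private newestSeq = -1;
  private maxLag: number;

  constructor(maxLag: number) {
    this.maxLag = maxLag;
  }

  // Store a chunk received from the server; chunks may arrive out of order.
  receive(chunk: Chunk): void {
    if (chunk.seq < this.nextSeq) return; // already played or skipped
    this.pending.set(chunk.seq, chunk);
    if (chunk.seq > this.newestSeq) this.newestSeq = chunk.seq;
  }

  // Return the next chunk to play, or undefined if it has not arrived yet.
  next(): Chunk | undefined {
    if (this.newestSeq - this.nextSeq > this.maxLag) {
      // Too far behind the live edge: drop stale chunks and jump forward.
      const target = this.newestSeq - this.maxLag;
      this.pending.forEach((_chunk, seq) => {
        if (seq < target) this.pending.delete(seq);
      });
      this.nextSeq = target;
    }
    const chunk = this.pending.get(this.nextSeq);
    if (chunk === undefined) return undefined;
    this.pending.delete(this.nextSeq);
    this.nextSeq += 1;
    return chunk;
  }
}
```

With `maxLag` set to 2, a consumer that played chunks 0 and 1, lost its connection, and then received chunks 8, 9, and 10 would resume at chunk 8 instead of waiting for the missing ones. A smaller `maxLag` keeps playback closer to live at the cost of skipping more content.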

Once I created a service capable of audio streaming while handling interruptions and web constraints (like requiring a user gesture to start multimedia), I returned to the initial problem: combining two streamers. As explained, connecting streamers to listen to each other in real-time online is a far-fetched dream due to potential delays. For this reason, streaming audio from the first streamer to the second, and then combining both in the second streamer's transmission, is a cheaper and less complicated approach. However, the first streamer does not hear the second streamer.

Due to the nature of the proposed solution, audio flows from the first streamer to the second without feedback. To keep the first streamer engaged, they must rely on a background track that simulates what the second streamer will perform. This way, the first streamer reacts live to a recording, and the second streamer reacts live to the first, fixing any musical errors and ensuring both transmissions combine perfectly for the audience.

Streams are currently based on microphones and background tracks, but the mix can also include other sources, such as karaoke files, to enrich the content. However, when two streams are combined using the MediaRecorder and Web Audio APIs, one source can arrive later than the other in the mix. For this reason, a delay must be defined to sync the sources. Fortunately, the Web Audio API can delay the signal of one node to synchronise it with another. Measuring the exact delay needed to keep both streams in sync is complex, so it must be exposed as a parameter in the service options.
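
The effect of such a delay parameter can be illustrated on raw samples. The sketch below is not the Web Audio API itself; it is a plain function, with hypothetical names, that shifts one mono source by a configurable number of samples before summing it with another. In the browser, a comparable shift could be obtained with a `DelayNode`, whose `delayTime` is expressed in seconds.

```typescript
// Convert a delay in milliseconds to a whole number of samples.
function delayToSamples(delayMs: number, sampleRate: number): number {
  return Math.round((delayMs / 1000) * sampleRate);
}

// Mix two mono sources after delaying the second by `delaySamples`
// (assumed to be zero or positive). Samples are floats in [-1, 1];
// the sum is clamped to that range to avoid overflow.
function mixWithDelay(a: Float32Array, b: Float32Array, delaySamples: number): Float32Array {
  const length = Math.max(a.length, b.length + delaySamples);
  const out = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    const sampleA = i < a.length ? a[i] : 0;
    const j = i - delaySamples; // index into the delayed source
    const sampleB = j >= 0 && j < b.length ? b[j] : 0;
    out[i] = Math.max(-1, Math.min(1, sampleA + sampleB));
  }
  return out;
}
```

For example, at a sample rate of 48,000 Hz a user-chosen delay of 10 ms corresponds to `delayToSamples(10, 48000)`, that is, 480 samples. Exposing the value in milliseconds lets the streamer tune it by ear, which fits the observation that the exact delay is hard to measure.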

Video Streaming

I also explored audio-visual transmissions. The constraint is that video is difficult to handle: if a chunk is lost due to limited bandwidth, the transmission can become unstable. To solve this, instead of recording video chunks, I decided to take snapshots (an array of images) while recording the audio. Their quality and frame rate can be adjusted to the constraints, and they are displayed using a canvas. If the connection is slow, the system asks for fewer images per second, prioritising the audio. Because server capacity is limited, quality may also decrease to balance the load. Unfortunately, this model does not yet prioritise the first user to enter the transmission; others entering later can cause quality to drop for everyone. Improving this remains a goal for the future.
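
One way to express "audio first, images with whatever is left" is a simple budget calculation. The following sketch is an assumption about how such a rule could look, not the rule used in the service; the option names and the numbers in the usage note are hypothetical.

```typescript
interface VideoBudgetOptions {
  audioKbps: number;  // bandwidth reserved for audio before anything else
  frameKbits: number; // approximate size of one snapshot, in kilobits
  maxFps: number;     // upper bound on snapshots per second
}

// Snapshots per second that fit in the bandwidth left after audio is served.
// Returns 0 (audio only) when nothing is left for images.
function snapshotsPerSecond(availableKbps: number, o: VideoBudgetOptions): number {
  const leftoverKbps = availableKbps - o.audioKbps;
  if (leftoverKbps <= 0) return 0;
  return Math.min(o.maxFps, Math.floor(leftoverKbps / o.frameKbits));
}
```

With 64 kbps reserved for audio, snapshots of about 80 kilobits, and a cap of 15 images per second, a 1,000 kbps connection would get 11 snapshots per second, a 5,000 kbps connection would be capped at 15, and a 50 kbps connection would receive audio only.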

Potential of the Solution and Conclusions

This solution covers single-streamer cases where device streams are combined with pre-recorded content, adjusting to network restrictions via client-side operations. The server is used to store and share chunks, dropping frames if necessary to maintain the stream in real-time with the lowest delay possible. Furthermore, the two-streamer system allows the first streamer to send a stream to a second, who then combines both for the audience.

Implemented in Plaudere, this effort broadens the possibilities for expression available to users. Whether you are a writer or a musician, you can create a space to share content or invite a colleague to join a transmission, giving the illusion that you are in the same place. Development in Plaudere is currently being optimised so that network use reflects only what is strictly needed to guarantee the transmission.

Plaudere © 2025
