Plaudere Engineering

Background:

In the late 1990s, the web focused primarily on exchanging text as useful information for the community. Some sites were born as commercial or advertising services, while others stemmed from a pure passion for sharing knowledge. Multimedia, however, is what captured my fascination, and it has held it ever since. I vividly remember the early 2000s, when my favourite artists began uploading their latest concert video clips; we would wait for minutes of buffering just to watch a few seconds of footage, convinced we were witnessing top-notch technology. Even back then, we were already dreaming of a future where we would turn off our televisions to start watching everything online.

Shortly afterwards, improvements in broadband speeds transformed the experience, but the early attempts to stream live video hold a special place in the history of streaming. Back in 1994, The Rolling Stones performed their first live musical broadcast over the internet, marking one of the most significant technological milestones in web history. Even though only the first 20 minutes were streamed, watching live video at barely 10 frames per second in black and white was history in the making, despite only 200 people being able to connect due to the limitations of the time. In fact, another band made up of engineers, Severe Tire Damage, played a set just before the Stones using the same setup. I found this story absolutely fascinating, and it always made me want to build my own application capable of streaming live audio and video, knowing full well that, while many solutions exist today, there is something unique about building it yourself.

Article about the first live internet broadcast by The Rolling Stones.

While streaming technology for both live and on demand content seemed viable to the average user back in the 90s and matured over time, it was far from being a mass market solution and was riddled with flaws. The main issue was that scalability was a territory that early tech enthusiasts had not yet fully considered. However, as the internet became more mainstream and broadband started to support much higher data throughput, web applications gained a broader reach and the scalability problem finally emerged. It was no longer just about making the application viable for its intended purpose, such as streaming video, but also about managing the sheer number of users connecting simultaneously and the resources that had to be allocated to the network.

Critical challenges appeared, such as the varying connection qualities among users and, above all, the management of delay or latency, which can make the experience feel far from satisfactory. This is particularly noticeable if there is a group chat during the stream, where any lag makes the interaction feel disjointed and clunky. To truly work, streaming technology requires a series of robust elements such as redundancies to prevent downtime, Content Delivery Networks (CDNs) to distribute the load, and specialised media servers for data transmission. Furthermore, it is essential to handle packet management, ACKs (acknowledgement of receipt) to ensure data integrity, and the constant analysis of usage statistics and error logs to determine exactly whether the transmission was successful or not.

As time went on, and particularly from the mid-2000s, developers began incorporating streaming technologies to varying degrees, and entrepreneurs emerged working on many different fronts. Applications for video on demand, audio on demand and videoconferencing appeared, among others, but some entrepreneurs explored the possibilities of interlacing simultaneous live transmissions. This is not a priority in any of the aforementioned streaming applications, but it is crucial when trying to help musicians rehearse musical pieces together over the internet. This latter case struck me as a fascinating area for research, although I was fully aware from the start that it is a highly sophisticated problem to solve. It involves a specific type of streaming that requires extreme technical conditions to allow for a feedback loop between transmitting users with a delay not exceeding 30 milliseconds. This is the critical latency threshold that musicians can tolerate for a joint performance to remain viable while playing over the internet and broadcasting to their audience.

Actually, I studied music during my childhood and had fellow musicians with whom, over time, I stopped playing due to the difficulty of synchronising schedules or the hassle of travelling to a rehearsal space, tasks made nearly impossible by studies, work and life responsibilities. I remember that making music with others, or performing rehearsed pieces, is a true delight, even more so when that music is shared live with an audience. When everything goes right, a human connection is achieved that has been developing for centuries, from the virtuosos of classical music to the massive spectacles created by Beatlemania. Music is a very special human bond and, with the advancement of the internet and streaming techniques, it seemed possible to recreate that experience in a virtual environment. It is indeed possible, but it requires addressing many technical nuances. I have had to explore different paths to achieve a seamless experience, prioritising intelligent architecture over raw hardware power. The result is Plaudere, where I have managed to make this remote musical connection viable, at least in part, by focusing on efficiency and synchronisation.

The Challenge of Synchronisation:

The primary challenge when connecting performers in different locations lies in the inherent latency, which is the cumulative delay between capture devices, server processing and the final reception by the audience. For a musical interaction to feel natural, the total delay must be kept strictly below 30 milliseconds. Back in the 2010s, achieving this level of synchronisation required high end hardware and extremely powerful servers, infrastructure that was costly and well beyond the reach of most. After personally testing numerous services with musician colleagues from my school days in an attempt to perform online rehearsals, I concluded that the technology of that time simply did not make it possible.

Even today, in the era of cloud computing, maintaining this technical standard in a stable way remains a complex and expensive challenge, especially when the application must combine multiple data streams into a single broadcast. This technical barrier is not just a matter of bandwidth, but of managing the delay introduced by every node in the network, from capture to distribution. For this reason, during the pandemic, I decided to approach the problem from a different perspective, seeking a more efficient architecture that could eliminate the dependency on prohibitive infrastructure. It was during this research that I made interesting technical discoveries which I am now finally sharing through Plaudere.

Physical vs. Digital Distance:

In a live musical performance, if the guitarist and the singer are separated by more than about 10 metres, the speed of sound (roughly 343 metres per second) creates a delay exceeding 30 milliseconds, breaking the natural synchrony. On stage, this is resolved by using audio monitors that transmit the signal electronically, allowing both performers to hear each other instantly. However, when the physical distance is so great that it prevents a direct electronic connection, we must rely on internet infrastructure to bridge the gap between musicians and allow them to continue making music together. In this scenario, the network is no longer just a data channel, but the environment where we must fight against time to preserve the synchrony of the performance. Although fibre optic cables allow for high speed transmission, the journey through various routers, device buffers and audio interfaces introduces a cumulative delay that desynchronises the data stream.
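The delays involved can be checked with a little arithmetic. The following is a minimal sketch, assuming a speed of sound of about 343 m/s in air and a signal speed of about 200,000 km/s in optical fibre (roughly two thirds of the speed of light in a vacuum). The function names and the sample distances are illustrative only and are not part of Plaudere.

```javascript
// Speed of sound in air at roughly 20 °C, in metres per second (assumption).
const SPEED_OF_SOUND_M_PER_S = 343;
// Approximate signal speed in optical fibre, in kilometres per second (assumption).
const FIBRE_SPEED_KM_PER_S = 200000;

// Time sound needs to travel a given distance through air, in milliseconds.
function acousticDelayMs(distanceMetres) {
  return (distanceMetres / SPEED_OF_SOUND_M_PER_S) * 1000;
}

// Largest distance sound can cover within a given delay budget, in metres.
function maxAcousticDistanceMetres(budgetMs) {
  return (budgetMs / 1000) * SPEED_OF_SOUND_M_PER_S;
}

// One-way propagation time over a fibre run, ignoring routers and buffers.
function fibrePropagationMs(distanceKm) {
  return (distanceKm / FIBRE_SPEED_KM_PER_S) * 1000;
}

// Delay added by a single audio buffer of `frames` samples.
function bufferDelayMs(frames, sampleRateHz) {
  return (frames / sampleRateHz) * 1000;
}

console.log(acousticDelayMs(12).toFixed(1));           // 35.0
console.log(acousticDelayMs(15).toFixed(1));           // 43.7
console.log(maxAcousticDistanceMetres(30).toFixed(2)); // 10.29
console.log(fibrePropagationMs(1000).toFixed(1));      // 5.0
console.log(bufferDelayMs(128, 48000).toFixed(2));     // 2.67
```

Even under these idealised assumptions, 1,000 km of fibre already uses about a sixth of the 30 millisecond budget before any router, device buffer or audio interface has added its share.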

This is where conventional streaming models fail. Attempting to emulate physical immediacy through prohibitively expensive hardware is not the ultimate solution, as the problem persists due to network architecture and packet management. After thoroughly analysing this phenomenon, I came to the conclusion that trying to overcome physics through sheer brute force was a misguided path. Instead, it was necessary to adopt a more creative software engineering approach, one capable of optimising the transmission chain from source to destination without relying on extremely expensive equipment.

A Less Expensive Approach to Live Streaming:

Many musical bonds are broken due to geographical distance and the responsibilities of adult life, making it nearly impossible to find time to rehearse or maintain that magical connection from the past. In the early 2010s, I performed an experiment to challenge this limitation: I asked a colleague to call my phone and sing his part of a song while I, listening to him through the handset, played the guitar live in front of a microphone. My microphone captured both sources, his voice filtered through the phone and my acoustic guitar, and sent them to a conventional streaming service. Those listening were surprised by the coordination achieved from different locations, which planted a technical obsession in me: to create a service that allowed this collaboration without depending on external calls, earphones or additional hardware.

In the early 2020s, driven by the need to resume these joint broadcasts, I decided to build a prototype. As I did not have prior experience in developing native applications for mobile or desktop, I opted for web development. It is a universal technology, fully compatible with any device and accessible via a browser without the need for installations or constant updates. Although web development was not my initial domain, learning the fundamentals of HTML, CSS and JavaScript allowed me to progress quickly towards server logic and databases, always seeking an approach that was cost effective yet technically sound.

To streamline the development cycle, I chose Node.js for the backend, which unified the development language across the entire stack and allowed me to manage both frontend and backend logic with JavaScript. I implemented MongoDB for data persistence and created multiple prototypes to analyse the pros and cons of server side versus client side processing. For the application architecture, I utilised high performance APIs that became the backbone of my solution: the Web Audio API for signal processing, the Media Recorder API for capturing audio and video chunks, and WebSockets to ensure instantaneous bidirectional communication.

Audio Streaming Experiments:

My first technical experiment involved a walkie-talkie approach using the Media Recorder API. The application captured an audio message, sent it to the server and distributed it to a consumer via WebSockets. From there, I explored continuous streaming by sending audio chunks of fixed duration to be played in sequence. The primary challenge was managing continuity amidst bandwidth fluctuations or lost connections, ensuring the application responded dynamically to always present the most recent content to the listener, even after a disconnection.
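One way to keep a listener close to the live edge is to queue incoming chunks on the consumer side and discard the oldest ones whenever a backlog builds up, for example after a reconnection. The following is a minimal sketch of that idea, not Plaudere's actual code. The `LiveChunkQueue` class, the `seq` field on each chunk and the limit of three pending chunks are all assumptions made for illustration.

```javascript
// Consumer-side queue that favours recent chunks over complete playback.
// Assumes every chunk received over the socket carries an increasing `seq` number.
class LiveChunkQueue {
  constructor(maxPending = 3) {
    this.maxPending = maxPending; // how many chunks may wait for playback
    this.pending = [];            // chunks not yet played, ordered by seq
    this.lastPlayedSeq = -1;      // seq of the most recently played chunk
  }

  // Call this for every chunk that arrives from the network.
  push(chunk) {
    // Ignore chunks that are already played or already queued.
    if (chunk.seq <= this.lastPlayedSeq) return;
    if (this.pending.some((queued) => queued.seq === chunk.seq)) return;

    this.pending.push(chunk);
    this.pending.sort((a, b) => a.seq - b.seq);

    // After a stall or reconnection a burst may arrive: drop the oldest
    // chunks so that playback resumes near the most recent content.
    while (this.pending.length > this.maxPending) {
      this.pending.shift();
    }
  }

  // Returns the next chunk to play, or null when nothing is waiting.
  next() {
    const chunk = this.pending.shift();
    if (!chunk) return null;
    this.lastPlayedSeq = chunk.seq;
    return chunk;
  }
}

// Usage: two chunks arrive normally, then a burst arrives after a reconnection.
const queue = new LiveChunkQueue(3);
queue.push({ seq: 0, data: "chunk-0" });
queue.push({ seq: 1, data: "chunk-1" });
console.log(queue.next().seq); // 0
for (let seq = 2; seq <= 9; seq += 1) {
  queue.push({ seq, data: `chunk-${seq}` });
}
console.log(queue.next().seq); // 7
```

The trade-off is deliberate: the listener misses some material after a disconnection, but never drifts further and further behind the performer.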

Once I achieved a stable broadcast that overcame browser constraints, such as the requirement for user interaction to trigger multimedia, I returned to the problem of combining two performers. Since mutual real-time monitoring remains a far-fetched dream due to latency, I opted for a more efficient and less complex approach: a unidirectional audio flow. In this model, the audio travels from the first musician to the second, and it is at this second node where both signals are combined for the final transmission.

Due to the nature of this solution, audio flows without feedback towards the first performer. To maintain musical cohesion, the first streamer relies on a backing track that simulates what the second performer will play. In this way, the first musician reacts live to a recording, and the second musician reacts live to the first streamer's feed. This architecture allows for the correction of musical errors in real time and ensures both sources integrate perfectly for the audience, with the added possibility of enriching the mix with karaoke files, background audio or other external sources.
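The combination step at the second node can be pictured as a small Web Audio graph in which each source passes through its own gain stage before reaching a shared output. The following is a minimal sketch under that assumption and does not claim to reproduce Plaudere's internals. The `buildMixGraph` function, its parameter names and the default levels are hypothetical; `createGain`, `createMediaStreamDestination` and `connect` are standard Web Audio API calls.

```javascript
// Builds a two-input mixer on an existing AudioContext.
// `firstPerformerNode` and `localNode` are AudioNodes that already produce
// sound (for example, a media element source and a microphone source).
function buildMixGraph(ctx, firstPerformerNode, localNode, levels = { first: 1, local: 1 }) {
  // One gain stage per source so each level can be balanced independently.
  const firstGain = ctx.createGain();
  const localGain = ctx.createGain();
  firstGain.gain.value = levels.first;
  localGain.gain.value = levels.local;

  // The destination exposes the mixed signal as a MediaStream (`mixOutput.stream`).
  const mixOutput = ctx.createMediaStreamDestination();

  firstPerformerNode.connect(firstGain);
  firstGain.connect(mixOutput);
  localNode.connect(localGain);
  localGain.connect(mixOutput);

  return { firstGain, localGain, mixOutput };
}

// Browser usage (not executed here; `remoteAudioElement` is a hypothetical
// <audio> element playing the first performer's feed):
//   const ctx = new AudioContext();
//   const micStream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   const mic = ctx.createMediaStreamSource(micStream);
//   const remote = ctx.createMediaElementSource(remoteAudioElement);
//   const { mixOutput } = buildMixGraph(ctx, remote, mic, { first: 0.8, local: 1 });
//   const recorder = new MediaRecorder(mixOutput.stream);
```

Further sources such as a backing track could be added in the same way, each with its own gain node feeding the shared destination.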

However, when combining streams using the Media Recorder and Web Audio APIs, synchronisation issues can arise in the final mix. To mitigate this, it is essential to parameterise a delay to sync the multiple sources. Fortunately, the Web Audio API allows for the implementation of programmable delay nodes to align one signal with another. Since measuring the exact delay automatically is extremely complex, this solution allows for manual synchronisation adjustments within the service options, achieving a precise balance between technical simplicity and artistic quality.
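A manually adjustable offset of this kind could be wired up with a `DelayNode`, as in the minimal sketch below. The two-second ceiling, the function names and the idea of reading the offset from a settings control are assumptions for illustration. `createDelay`, `delayTime` and `setTargetAtTime` are standard Web Audio API members.

```javascript
// Upper bound for the manual offset, in seconds (arbitrary choice for this sketch).
const MAX_OFFSET_SECONDS = 2;

// Converts a user-supplied offset in milliseconds into a safe delay in seconds.
function clampOffsetSeconds(offsetMs, maxSeconds = MAX_OFFSET_SECONDS) {
  const seconds = Number(offsetMs) / 1000;
  if (!Number.isFinite(seconds)) return 0;
  return Math.min(Math.max(seconds, 0), maxSeconds);
}

// Inserts a delay between `sourceNode` and `targetNode` so that the earlier
// signal can be held back until it lines up with the later one.
function insertAlignmentDelay(ctx, sourceNode, targetNode, offsetMs) {
  const delay = ctx.createDelay(MAX_OFFSET_SECONDS);
  delay.delayTime.value = clampOffsetSeconds(offsetMs);
  sourceNode.connect(delay);
  delay.connect(targetNode);
  return delay;
}

// Applies a new offset while the stream is running. The short time constant
// smooths the change instead of jumping abruptly to the new value.
function updateAlignment(ctx, delayNode, offsetMs) {
  delayNode.delayTime.setTargetAtTime(clampOffsetSeconds(offsetMs), ctx.currentTime, 0.05);
}

console.log(clampOffsetSeconds(120));  // 0.12
console.log(clampOffsetSeconds(-50));  // 0
console.log(clampOffsetSeconds(5000)); // 2
```

In practice, the performer would nudge the offset by ear until the two parts sit together, which matches the manual adjustment described above.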

Video Streaming:

After consolidating the audio, I explored audio-visual transmissions, where video presents critical bandwidth challenges. If a video chunk is lost during transmission, the session can become unstable and out of sync. To resolve this, instead of recording video chunks, I opted to capture snapshots (an array of images) synchronised with the audio recording. This technique allows for adaptive performance, where image quality and frame rate are dynamically adjusted using a canvas. If the connection weakens, the application reduces the image frequency to always prioritise the integrity and continuity of the audio.
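The adaptive behaviour can be illustrated with a small policy function that lowers the snapshot rate and image quality when the outgoing socket is backed up, and raises them again when it is idle. The following is a minimal sketch rather than the application's real logic. The limits, the thresholds and the use of the socket's `bufferedAmount` as the congestion signal are assumptions. `drawImage` and `toBlob` are standard canvas calls.

```javascript
// Illustrative bounds for the snapshot stream (assumed values).
const FRAME_LIMITS = { minFps: 2, maxFps: 15, minQuality: 0.3, maxQuality: 0.8 };

// Avoids floating point drift such as 0.7000000000000001.
function roundTo2(value) {
  return Math.round(value * 100) / 100;
}

// Decides the next frame rate and JPEG quality from the number of bytes
// still waiting to be sent on the socket (for example, `socket.bufferedAmount`).
function adaptFrameSettings(current, bufferedBytes, highWaterBytes = 256 * 1024) {
  if (bufferedBytes > highWaterBytes) {
    // Congested: back off quickly so that audio keeps flowing.
    return {
      fps: Math.max(FRAME_LIMITS.minFps, Math.floor(current.fps / 2)),
      quality: roundTo2(Math.max(FRAME_LIMITS.minQuality, current.quality - 0.1)),
    };
  }
  if (bufferedBytes === 0) {
    // Idle: recover slowly to avoid oscillating.
    return {
      fps: Math.min(FRAME_LIMITS.maxFps, current.fps + 1),
      quality: roundTo2(Math.min(FRAME_LIMITS.maxQuality, current.quality + 0.05)),
    };
  }
  return current;
}

// Browser-only helper: draws the current video frame onto a canvas and
// resolves with a JPEG Blob at the requested quality.
function captureSnapshot(video, canvas, quality) {
  const context2d = canvas.getContext("2d");
  context2d.drawImage(video, 0, 0, canvas.width, canvas.height);
  return new Promise((resolve) => canvas.toBlob(resolve, "image/jpeg", quality));
}

console.log(adaptFrameSettings({ fps: 10, quality: 0.8 }, 500000)); // { fps: 5, quality: 0.7 }
console.log(adaptFrameSettings({ fps: 5, quality: 0.7 }, 0));       // { fps: 6, quality: 0.75 }
```

Because each snapshot is an independent image, skipping one affects only that frame, which is what makes this kind of throttling safe for the audio.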

Currently, the application manages quality globally to balance server load, though this presents significant challenges. One of the current limitations is that the model does not prioritise any user, regardless of when they joined the transmission or whether they are the ones actually performing, meaning that new viewers joining can degrade the broadcast quality for everyone involved. Improving this load balancing and ensuring the stability of the main performers against the audience remains one of my priority goals for the future of Plaudere. I have found that the path to true artistic collaboration on the internet does not depend on the brute force of hardware, but on an architecture capable of adapting to the human and technological constraints of our current network.

Potential of the Solution and Conclusions:

This technical solution covers everything from single streamer cases, where device feeds are combined with pre-recorded content, to complex dual collaboration arrangements. In Plaudere, the server is used to store and distribute data chunks, dropping frames if necessary to always prioritise real time delivery with the lowest possible delay. Through operations executed on the client side, the application manages to adapt to network restrictions, allowing the first performer's stream to travel towards the second to be integrated into a single coherent signal for the audience.
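On the server, a frame-dropping policy of this kind might look like the minimal sketch below. It is an illustration under stated assumptions, not Plaudere's implementation: messages are assumed to carry a `kind` of either `"audio"` or `"image"`, each client object is assumed to expose `send` and `bufferedAmount` as WebSocket objects commonly do, and the 512 KiB backlog limit is arbitrary.

```javascript
// Backlog above which image frames are skipped for a client (assumed value).
const BACKLOG_LIMIT_BYTES = 512 * 1024;

// Audio is always forwarded; image frames are skipped for clients that
// still have too much unsent data queued.
function shouldForward(message, client, backlogLimitBytes = BACKLOG_LIMIT_BYTES) {
  if (message.kind === "audio") return true;
  return client.bufferedAmount <= backlogLimitBytes;
}

// Sends one message to every client and reports how many deliveries were skipped.
function broadcast(message, clients) {
  let dropped = 0;
  for (const client of clients) {
    if (shouldForward(message, client)) {
      client.send(message.payload);
    } else {
      dropped += 1;
    }
  }
  return dropped;
}

// Usage with stand-in clients: one keeping up, one lagging behind.
const fastClient = { bufferedAmount: 0, sent: 0, send() { this.sent += 1; } };
const slowClient = { bufferedAmount: 1000000, sent: 0, send() { this.sent += 1; } };

console.log(broadcast({ kind: "image", payload: "frame" }, [fastClient, slowClient])); // 1
console.log(broadcast({ kind: "audio", payload: "chunk" }, [fastClient, slowClient])); // 0
```

A policy like this keeps the delay low for every listener, at the cost of a lower image rate for those on slower connections.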

The implementation of this technological effort in Plaudere multiplies the possibilities for user expression. Whether you are a musician seeking that lost connection or a writer sharing your creative process, the platform allows you to invite a colleague and create the illusion of being in the same physical space. Currently, we continue to optimise the development so that network usage reflects only what is strictly necessary, guaranteeing a stable and accessible broadcast. Ultimately, Plaudere is the result of understanding that technology should not be a barrier, but the bridge that allows us to reclaim the magic of creating together, regardless of the distance.