<p dir="ltr">Research background</p><p dir="ltr">Live audio-visual performance is an type of art performance that has proliferated in the art and entertainment market. While such performances have taken place with electronics and projections since at least the 1970s and historical precedent through theater goes back thousands of years, recent technologies have increased the possibilities for performers to improvise such performances and achieve new types of expression (Cooke 2010). The visual component of performances adds important information in communication of the meaning of a performance (Platz and Kopiez 2012). One mode of interaction with real-time systems is live coding, a practice which begun in the late 20<sup>th</sup> century and one which has also seen increasing numbers of practitioners and tools (Blackwell et al. 2022). One relatively new system is Konduktiva, for live coding in the JavaScript programming language (Bell et al. 2024). Considering the advantages it can provide, some practitioners have sought to use it for audio-visual performances. One the methods for generation of visual material which has become more practical in the 2020s is generative AI based on machine learning (Epstein and Hertzmann 2023). Stable Diffusion is one of those systems (Zhang et al. 2023). It is then possible to consider the combination of these two and to ask: How can live coding be integrated as a control mechanism for the generation of realtime video produced through generative AI?</p><p><br></p><p dir="ltr">Research contribution</p><p><br></p><p dir="ltr">In this performance, Beyond 1&0 Opening Performance, new systems are used for both the generation of music and graphics. This research presents one technical approach to answering the research question described above. A real-time system has been programmed in Touch Designer by Giang Nguyen Hoang to use Stable Diffusion. Stable Diffusion generates images based on prompts generated by a custom algorithm written by Giang and Renick Bell in a JavaScript live coding session using Konduktiva on another device. Those prompts are sent to Touch Designer over OSC rhythmically according to the rhythms defined in the live coding system. The images are then styled and composed with the text of the prompt and other messages from the live coder in Resolume and displayed for the audience along with the live coded musical accompaniment by Bell. By developing this system, the researchers have shown one possible technical solution for the combination of the approaches presented above in the Research Background section. Such a workflow is novel, as the authors are unaware of work by others combining these technologies in this way. Through this implementation, a new latent space of possibility is identified for future work to explore, in which variations of prompt algorithms and frequency of prompting can generate a range of visual outputs, all of which depend on the particular model being prompted. 
<p dir="ltr">A page for the performance exists here: <a href="https://www.rmit.edu.vn/students/student-news-and-events/student-events-2024/beyond-1-0-showcase-celebrating-10-years-of-digital-media" target="_blank">https://www.rmit.edu.vn/students/student-news-and-events/student-events-2024/beyond-1-0-showcase-celebrating-10-years-of-digital-media</a></p><p dir="ltr">Research significance</p><p dir="ltr">Real-time control of image generation with Stable Diffusion through live coding is a novel approach which, to our knowledge, has not previously been explored. These technical approaches are very new, and few researchers have attempted to use them together. By presenting this approach, we have created a foundation for further research in this area. We also present this workflow as a model for others to follow and iterate on in order to find new or improved methods for similar purposes. The curators of the exhibition selected this performance, which was presented alongside an exhibition of work by faculty and staff judged by the curators to display excellence.</p><p><br></p><p dir="ltr">References</p><p dir="ltr">Bell, R., Wang, S., Huang, Y., Chen, R., 2024. Live Coding Melody and Harmony in JavaScript, in: Proceedings of the 19th International Audio Mostly Conference: Explorations in Sonic Cultures, AM ’24. Association for Computing Machinery, New York, NY, USA, pp. 362–372. <a href="https://doi.org/10.1145/3678299.3678336" rel="noreferrer noopener" target="_blank">https://doi.org/10.1145/3678299.3678336</a></p><p dir="ltr">Blackwell, A.F., Cocker, E., Cox, G., McLean, A., Magnusson, T., 2022. Live coding: a user’s manual. MIT Press. <a href="https://doi.org/10.5281/zenodo.7383847" rel="noreferrer noopener" target="_blank">https://doi.org/10.5281/zenodo.7383847</a></p><p dir="ltr">Cooke, G., 2010. Start making sense: Live audio-visual media performance. International Journal of Performance Arts and Digital Media 6, 193–208. <a href="https://doi.org/10.1386/padm.6.2.193_1" rel="noreferrer noopener" target="_blank">https://doi.org/10.1386/padm.6.2.193_1</a></p><p dir="ltr">Epstein, Z., Hertzmann, A., The Investigators of Human Creativity, 2023. Art and the science of generative AI. Science 380, 1110–1111. <a href="https://doi.org/10.1126/science.adh4451" rel="noreferrer noopener" target="_blank">https://doi.org/10.1126/science.adh4451</a></p><p dir="ltr">Platz, F., Kopiez, R., 2012. When the Eye Listens: A Meta-analysis of How Audio-visual Presentation Enhances the Appreciation of Music Performance. Music Perception 30, 71–83. <a href="https://doi.org/10.1525/mp.2012.30.1.71" rel="noreferrer noopener" target="_blank">https://doi.org/10.1525/mp.2012.30.1.71</a></p><p dir="ltr">Zhang, Chenshuang, Zhang, Chaoning, Zhang, M., Kweon, I.S., 2023. Text-to-image diffusion models in generative AI: A survey. arXiv preprint arXiv:2303.07909.</p>