The Future of Music in the Age of Spiritual Machines

October 13, 2003

Highlights of the Richard C. Heyser Memorial Lecture to the 115th Annual Convention of the Audio Engineering Society on Oct. 11, 2003. Published on KurzweilAI.net Oct. 13, 2003.

Music technology is about to be radically transformed. Communication bandwidths, the shrinking size of technology, our knowledge of the human brain, and human knowledge in general are all accelerating. Three-dimensional molecular computing will provide the hardware for human-level "strong" AI well before 2030. The more important software insights will be gained in part from the reverse-engineering of the human brain, a process well under way. Once nonbiological intelligence matches the range and subtlety of human intelligence, it will necessarily soar past it because of the continuing acceleration of information-based technologies, as well as the ability of machines to instantly share their knowledge.

The impact of these developments will deeply affect all human endeavors, including music. Music will remain the communication of human emotion and insight through sound from musicians to their audience, but the concepts and process of music will be transformed once again.

The Coming Revolution in Intellectual Property

The issue of protecting intellectual property goes far beyond music and audio technologies, but the crisis has started in the music industry. Already, music recording industry revenues are down sharply, despite an overall increase in the distribution of music. The financial crisis has caused music labels to become cautious and conservative, investing in proven artists, with less support available for new and experimental musicians.

The breakdown of copyright protection is starting to impact musical instruments themselves. Synthesizers, samplers, mixers, and audio processors can all be emulated in software. It has been estimated that at least 90 percent of the copies of “Reason,” one of the emulation software leaders, are pirated.

Music controllers still require hardware, but when full-immersion visual-auditory virtual reality environments become ubiquitous, which I expect by the end of this decade, we’ll be using virtual controllers that are essentially composed of “just” software. When we have the full realization of nanotechnology-based assembly in the 2020s, we will be creating actual hardware at almost no cost from software.

We are not far from that reality today, and for the recording industry it is already clear that the principal product – music – is pure information. In all industries, the portion of products and services represented by their information content is rapidly increasing. By the time we get to the nanotechnology era, most products will be essentially information.

With file sharing, we’ve seen a breakdown of copyright protection. With streaming and remote access technologies, the problem will become even worse because existing copyright law doesn’t even cover these situations. If I call up a friend on the phone and play a new CD that I purchased, that’s not a violation of copyright law, nor should it be. But what is a phone call? It’s a streaming connection. File sharing networks will evolve into file streaming networks.

So if you want to listen to a song, the network finds a machine with that file and it is played on that machine. You listen in on a streaming connection. No files or information are ever copied. Copyright law is based entirely on the concept of copying, so if we bypass copying, there is no violation. We can extend this concept to all forms of software, including interactive software. In this case, the user effectively uses someone else’s machine using remote access software (such as pcAnywhere or Microsoft’s Remote Desktop). With continued acceleration in hardware power, running software on someone else’s machine is likely to occupy only a small fraction of the power of the computers involved.

Clearly, intellectual property licenses, and copyright law itself, can be amended to try to deal with this situation, but there are still problems. How do you define what is to be proscribed? Playing songs or demonstrating software to friends should still be allowed. Vast sharing networks plainly go beyond friendship, so the law will need to define what constitutes a friend, and there are some very slippery slopes here.

The educational challenge will be even greater. If consumers today understand copyright at all, they understand it in terms of making copies of information. How is the public to understand the concept if no actual copying takes place?

There are workable schemes for protecting software by building in locks that prevent software from working on machines other than authorized ones. These rely on means to identify what computer is being used, and these systems work reasonably well today. But the streaming approach bypasses this form of protection.

Having cited some of the difficulties, we need to recognize that protection of intellectual property is critical, otherwise we destroy the business model that provides for the capital formation required to create the intellectual property in the first place.

We could discuss at length various technical means for protecting information such as music files, but the bottom line is that all of these systems are easily breakable if that is what the public wants to do.

It may seem obvious that this is indeed what the public wants to do, but that does not need to be the case. Educating consumers on the value to them of protecting intellectual property is feasible, and without such a social compact, technical approaches will inevitably fail.

Is such a social expectation feasible? We do have a successful example: the cell phone industry. Unlike the recording industry, this communications industry did not stick with the business model of the 1950s and 1960s, which included very high charges for a long-distance call. The cost of a long-distance call has fallen from tens of dollars to pennies. Had that not been the case, you can be sure that people would be routinely breaking cell phone network access just as readily as they now share music files. Although there are people who do break cell phone access codes, this is not considered a cool thing to do.

In the recording industry, the fault lies primarily with the industry for not having budged from a business model of charging tens of dollars for an album, a pricing model that existed when my father was a child.

The current lawsuits may have an educational effect, but the industry is being disingenuous in the extreme by launching these suits before it has provided a viable legitimate system of file downloading. However, this is all about to change. Downloading services have been launched by Musicmatch and Roxio (Napster 2.0). And Apple Computer is expected to announce on October 16 that it will expand its online music service to Windows-based computers. Yahoo and Amazon.com are also expected to jump into this market.

As we’ve seen in the case of cell phones, people won’t go to the trouble of breaking technical protection schemes if an industry provides a system of access and competitive pricing that the public views as tolerable and fair.

With the entire economy headed towards the complete dominance of information, this remains a critical challenge.

New Ways to Create Music

Musical expression also offers new challenges. It has always used the most advanced technologies available: from ancient drums, through the cabinet-making crafts of the eighteenth century, the mechanical linkages of the nineteenth century, the analog electronics of the mid-twentieth century, and the digital technology of the 1980s and 1990s, to the artificial intelligence coming in the twenty-first century.

With digital samplers and synthesizers, we were able for the first time in human history to create sounds that had the complexity of acoustic sounds, but that did not originate from purely acoustic instruments. For example, we could start with piano samples and modify them with a variety of digital synthesis techniques to create sounds that had the richness of the piano, but were impossible with acoustic means alone.

A particular challenge that we dealt with in creating the Kurzweil 250 was how to recreate the inharmonic overtones of a piano. Most instruments have harmonic overtones; that is, the overtones are exact integer multiples of the fundamental frequency. In a piano, the overtones deviate slightly from integer multiples, and this is one of the features that gives a piano its unique timbre. Conventional samplers at the time looped the last waveform and applied a decay envelope. But their piano samples sounded like organ samples (at the point of looping) because the overtones of the looped waveform were simple multiples of the fundamental frequency, lacking the subtlety of the complex waveforms generated by the piano and other natural instruments.
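To make the difference concrete, here is a minimal sketch that compares ideal harmonic overtones with slightly stretched, piano-like ones. It assumes the common stiff-string approximation f_n = n * f0 * sqrt(1 + B * n^2); the function names and the inharmonicity value B = 0.0004 are illustrative assumptions, not measurements or methods from the Kurzweil 250. A looped single-cycle waveform is periodic, so it can only contain the B = 0 case.

```python
import math

def partial_frequencies(f0, count, b=0.0):
    """Frequencies of the first `count` partials of a string.

    b is an inharmonicity coefficient (assumed, illustrative).
    b = 0 gives exact integer multiples, as in a looped waveform.
    """
    return [n * f0 * math.sqrt(1.0 + b * n * n) for n in range(1, count + 1)]

def cents(frequency, reference):
    """Interval between two frequencies in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(frequency / reference)

if __name__ == "__main__":
    f0 = 220.0   # fundamental in Hz (illustrative)
    b = 0.0004   # assumed inharmonicity coefficient, not a measured value
    harmonic = partial_frequencies(f0, 8)
    stretched = partial_frequencies(f0, 8, b)
    for n, (h, s) in enumerate(zip(harmonic, stretched), start=1):
        print(f"partial {n}: harmonic {h:8.2f} Hz, "
              f"stretched {s:8.2f} Hz, offset {cents(s, h):5.2f} cents")
```

The offset grows with the partial number, which is why the upper overtones of a piano note drift progressively sharp of the harmonic series.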

In recent years, we’ve seen the emergence of software-based samplers, synthesizers, mixers, and sound processors. Although there still are significant performance benefits in using hardware DSP-based devices, software-based systems such as Reason are adequate to create professional recordings, such as movie soundtracks.

The next wave of instruments will be based on physical modeling, actually simulating the interaction of sound with the strings, curved wood, and other components of physical instruments. It is then possible, of course, to create simulated instruments that would be impossible to render physically. The concept of physical modeling has been around for over a decade, but available systems can only build instruments from restricted sets of building blocks.
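As a rough illustration of the idea, the sketch below uses the classic Karplus-Strong plucked-string algorithm, one of the simplest forms of physical modeling. It is a stand-in chosen for brevity, not a system described in the lecture: a delay line plays the role of the string, a burst of noise plays the role of the pluck, and an averaging filter stands in for energy loss. The function name and the default damping value are assumptions for illustration.

```python
import random
from collections import deque

def karplus_strong(frequency, duration, sample_rate=44100, damping=0.996, seed=0):
    """Synthesize a plucked-string tone as a list of samples in [-1, 1].

    The delay-line length sets the pitch; `damping` (assumed value)
    controls how quickly the simulated string loses energy.
    """
    rng = random.Random(seed)
    period = max(2, int(round(sample_rate / frequency)))
    # The "pluck": fill the string with random displacement.
    line = deque(rng.uniform(-1.0, 1.0) for _ in range(period))
    samples = []
    for _ in range(int(duration * sample_rate)):
        first = line.popleft()
        samples.append(first)
        # Average adjacent samples (a gentle low-pass) and feed back.
        line.append(damping * 0.5 * (first + line[0]))
    return samples

if __name__ == "__main__":
    tone = karplus_strong(440.0, 0.5)
    print(len(tone), "samples; peak", max(abs(s) for s in tone))
```

Changing the delay length, the filter, or the excitation amounts to changing the "instrument," including ones with no physical counterpart.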

Future physical modeling systems will allow detailed emulation of highly complex shapes and materials, including, for example, the special resins used to create fine violins. The state of the art in physical modeling requires high-end DSP chips today, but software-based physical modeling synthesizers will be ubiquitous within five years. However, PCs will increasingly include DSPs, particularly since they are targeted at applications with audio and image processing that can benefit from DSPs. Intel experimented with this with a special version of the Pentium (Pentium MMX). This is likely to continue to happen. Microprocessors used in synthesizers and consumer products will also increasingly include DSP functionality.

We are also moving towards an era of intelligent accompanists. For many years we’ve had “autoplay” features on home pianos for beginning students, but these are largely unsatisfactory because they require the human player to keep up with the automated players. What is needed in an intelligent accompanist is a system that follows the user, not the other way around. With such a system, a student could be playing a simple one-line melody, and the system would fill in with appropriate walking bass lines, rhythmic patterns, and harmonic progressions.
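One small ingredient of "following the user" is tracking the player's tempo rather than imposing one. The sketch below is a toy under stated assumptions: it takes note onset times in seconds, assumes one note per beat, and smooths the intervals between them to predict when the next beat should fall. The function names and the smoothing constant are hypothetical, and a real accompanist would also need pitch and harmony tracking.

```python
def follow_tempo(onsets, smoothing=0.5):
    """Return smoothed beat-period estimates (seconds) from onset times.

    Each new inter-onset interval is blended with the running estimate,
    so the estimate drifts toward the player's current pace.
    """
    if len(onsets) < 2:
        return []
    period = onsets[1] - onsets[0]
    estimates = [period]
    for previous, current in zip(onsets[1:], onsets[2:]):
        interval = current - previous
        period = (1.0 - smoothing) * period + smoothing * interval
        estimates.append(period)
    return estimates

def predict_next_beat(onsets, smoothing=0.5):
    """Predict the time of the next beat, or None with too little input."""
    estimates = follow_tempo(onsets, smoothing)
    if not estimates:
        return None
    return onsets[-1] + estimates[-1]

if __name__ == "__main__":
    slowing_player = [0.0, 0.5, 1.0, 1.6, 2.3]  # illustrative onset times
    print("period estimates:", follow_tempo(slowing_player))
    print("next beat expected at:", predict_next_beat(slowing_player))
```

When the student slows down, the predicted beat moves later with them, which is the opposite of an autoplay feature that forces the student to keep up.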

Tod Machover has developed a series of interactive instruments that he calls hyperinstruments. They effectively provide the serious musician with intelligent accompanists. Although the human player stays in control, a single player can match the richness and intricacy of an entire ensemble.

Music is a means of communicating human feelings and ideas from composers and performers to an audience. It is a language, or we might say a set of languages, that allows us to communicate emotions ranging from humor to sorrow. Machines can amplify our ability to communicate musically by providing richer palettes of sounds and means of manipulating and controlling them.

Machines can also provide narrow forms of intelligence that work in close concert with human intelligence. The closeness of this connection will grow over time, reflecting the overall growing intimacy between humans and their machines.