Abstracts of Selected Publications by Stephen T. Pope
Best-sellers
Automatic Labeling and Control of Audio Algorithms by Audio Recognition
(with Jason LeBoeuf)
U. S. Patent 9,031,243
Disclosed is a method for controlling a multimedia software application using
high-level metadata features and symbolic object labels derived from an audio
source, wherein a first pass of low-level signal analysis is performed,
followed by a stage of statistical and perceptual processing, and then by a
symbolic machine-learning or data-mining processing component. This
multi-stage analysis system delivers high-level metadata features, sound
object identifiers, stream labels or other symbolic metadata to the
application scripts or programs, which use the data to configure processing
chains, or map it to other media. Embodiments of the invention can be
incorporated into multimedia content players, musical instruments, recording
studio equipment, installed and live sound equipment, broadcast equipment,
metadata-generation applications, software-as-a-service applications, search
engines, and mobile devices.
Get the PDF file
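As an illustration of the three-stage pipeline described above, here is a minimal Python sketch. Every name, threshold, and the toy label rule is a hypothetical stand-in; the patent specifies the architecture, not this code.

```python
# Hypothetical sketch of the three-stage analysis pipeline in the abstract;
# none of these names or values come from the patent itself.
import numpy as np

def low_level_features(signal: np.ndarray, sr: int) -> dict:
    """Stage 1: first-pass low-level signal analysis."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)
    return {
        "rms": float(np.sqrt(np.mean(signal ** 2))),
        "centroid": float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)),
    }

def perceptual_features(low: dict) -> dict:
    """Stage 2: statistical/perceptual processing (placeholder mapping)."""
    return {"loudness": low["rms"] ** 0.6, "brightness": low["centroid"] / 1000.0}

def symbolic_label(perceptual: dict) -> str:
    """Stage 3: symbolic labeling; a real system would use a trained model."""
    return "bright" if perceptual["brightness"] > 2.0 else "dark"

sr = 44100
signal = np.sin(2 * np.pi * 3000 * np.arange(sr) / sr)   # 3 kHz test tone
label = symbolic_label(perceptual_features(low_level_features(signal, sr)))
print(label)   # an application script can map this label to a processing chain
```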
Method and apparatus for analyzing animal vocalizations, extracting
identification characteristics, and using databases of these
characteristics for identifying the species of vocalizing animals (with
Tom Stephenson)
U. S. Patent 9,177,559
A method for capturing and analyzing audio, in particular the vocalizations
of animals including birds, frogs, and mammals, which uses the resulting
analysis parameters to establish a database of identification characteristics
for the vocalizations of known species. This analysis can then be applied to
recordings of unknown species to identify the species producing a given
vocalization type. The method uses a unique multi-stage method of analysis
that includes first-stage analysis followed by segmentation of a
vocalization into its structural components, such as Parts, Elements, and
sections. Further analysis of the individual Parts, Elements, sections and
other song structures produces a wide range of parameters which are then
used to assign groups of identical, known species a diagnostic set of
structural and qualitative criteria. Subsequently, the vocalizations of
unknown species can be similarly analyzed and the resulting parameters can
be used to match the unknown data sample to the database of similarly
analyzed audio data features from a plurality of known species.
Get the PDF file
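The segmentation-and-matching idea above can be sketched in a few lines. The function names, the three toy parameters (duration, peak, mean), the distance metric, and the database entries are all illustrative assumptions of mine, not the patent's actual analysis.

```python
# Hedged sketch of the matching stage: per-Element parameters from an unknown
# vocalization are compared against a database of known-species parameters.
import numpy as np

def segment_elements(env: np.ndarray, thresh: float = 0.1):
    """Split an amplitude envelope into Elements at below-threshold gaps."""
    active = env > thresh
    edges = np.flatnonzero(np.diff(active.astype(int)))
    bounds = np.concatenate(([0], edges + 1, [len(env)]))
    return [(a, b) for a, b in zip(bounds[:-1], bounds[1:]) if active[a]]

def element_params(env, span):
    a, b = span
    return np.array([b - a, env[a:b].max(), env[a:b].mean()])  # dur, peak, mean

def identify(unknown_params, database):
    """Nearest-neighbor match of parameter vectors to known species."""
    return min(database, key=lambda sp: np.linalg.norm(database[sp] - unknown_params))

database = {"Song Sparrow": np.array([40, 0.9, 0.5]),     # made-up entries
            "Pacific Wren": np.array([12, 0.6, 0.3])}
env = np.concatenate([np.hamming(40) * 0.9, np.zeros(20), np.hamming(40) * 0.8])
spans = segment_elements(env)
print(identify(element_params(env, spans[0]), database))
```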
Method and system for scalable multi-stage music cover-song detection
(with D. Della Santa and J. Trevino)
Provisional patent application U.S. 62/944,798 filed December 6, 2019
A cover-song detection (CSD) method and system for determining whether one
musical selection is a variation or “cover” of another. A computer processor
is presented with a plurality of musical selections in digital form (the
song database), and these data files are analyzed to generate one or more
multi-valued feature vectors for each. Later, a musical selection (called
the query, assumed not to be one of the plurality of musical selections) is
presented to the system, and similar analysis is used to generate one or
more multi-valued feature vectors for it.
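A minimal sketch of the query stage just described: database songs and the query are reduced to feature vectors and ranked by similarity. The vector contents and the cosine metric are placeholder assumptions; the provisional application defines the actual features and stages.

```python
# Toy query stage: one feature vector per song, cosine-similarity ranking.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

song_db = {   # song id -> multi-valued feature vector (placeholder values)
    "original": np.array([0.9, 0.1, 0.4, 0.2]),
    "unrelated": np.array([0.1, 0.8, 0.1, 0.7]),
}
query = np.array([0.85, 0.15, 0.5, 0.25])   # analyzed the same way as the db

ranked = sorted(song_db, key=lambda sid: cosine(song_db[sid], query), reverse=True)
print(ranked[0])   # the best cover-song candidate for this query
```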
The Big MAT Book: Courseware for Audio & Multimedia Engineering (in
3 volumes)
MAT/CREATE, 2008, 665 pages
Multimedia engineering is a broad and complex topic. It is also one of the
fastest-growing and most valuable fields of research and development within
electronic technology. The book before you is an anthology of curriculum
materials developed over the space of 12 years at the University of
California, Santa Barbara for students in UCSB’s Graduate Program in Media
Arts and Technology.
The Big MAT Book consists of the presentation slides for eleven ten-week
courses, amounting to almost 500 hours of presentation time. For each of the
eleven courses, the presentation slides are accompanied by the tables of
contents of the course readers, and an overview of the example code
archives. These resources are available for download from the MAT or
HeavenEverywhere web sites (see http://HeavenEverywhere.com/TheBigMATBook).
The multimedia engineering courses included here cover theory and practice,
hardware and software, visual and audio media, and arts as well as
entertainment applications. Some of the courses (the first two chapters) are
required of all MAT graduate students, and thus must target less-technical
and also non-audio-centric students. The bulk of this material, though,
consists of elective courses that have somewhat higher-level prerequisites
and assume basic knowledge of acoustics and some (minimal) programming
experience in mainstream programming languages.
Get the PDF file
The Allosphere: An Immersive Multimedia Instrument for Scientific Data
Discovery and Artistic Exploration (with Xavier Amatriain, JoAnn
Kuchera-Morin and Tobias Hollerer)
IEEE Transactions on Multimedia, 2008.
The UCSB Allosphere is a 3-story-high spherical space in which fully
immersive environments can be experienced. It allows for the
exploration of large-scale data sets in an environment that is at the same
time multimodal, multimedia, multi-user, immersive, and interactive. The
Allosphere is being used for research into scientific
visualization/auralization and data exploration but also as a research
environment for behavioral/cognitive scientists and artists. The facility
consists of a perforated aluminum sphere, ten meters in diameter, suspended
inside a near-anechoic cube. The Allosphere is being equipped with
high-resolution active stereo projectors, a complete 3D sound system with
hundreds of speakers and novel interfaces. Once fully equipped it will
enable seamless immersive projection and 3D audio. In this article we give
an overview of the purpose of the instrument as well as the systems that are
being put in place to equip such a unique environment. We also review the
first results and experiences in developing and using the Allosphere in
several prototype projects.
Get the PDF file
“The Acoustics of a large 3D Immersive Environment: The Allosphere at
UCSB,” (with D. Conant, T. Hoover and K. McNally)
Proc. 2008 ASA-EAA Joint Conference on
Acoustics. Paris.
The Allosphere is a new audio/visual immersion space for the California
Nanosystems Institute at the University of California, Santa Barbara, used
for both scientific and performing-arts studies. This 3-story sphere with
central-axis catwalk permits an unusually large experiential region. The
huge perforated-metal visual projection sphere, with its principal listening
locations centered inside the sphere, introduces multiple considerations and
compromises, especially since the ideal acoustical environment is
anechoic. Video projection requires opaque light reflectivity of the concave
projection surface, while audio demands extreme sound transmissibility of
the screen plus full-range sound absorptivity outside the sphere. The design
requires high-fidelity spatialization of a large number of simulated sound
sources over a large region near the core, and support of vector-based
amplitude panning, Ambisonic playback, and wave-field synthesis. This paper
discusses considerations that both conform to, and lie outside of,
traditional acoustical analysis methodologies, and briefly reviews the
electroacoustic systems design.
Get the PDF file
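One textbook building block behind the panning requirement named above is constant-power amplitude panning across a speaker pair, the two-speaker special case of vector-based amplitude panning. This generic sketch is not the Allosphere's electroacoustic design.

```python
# Constant-power panning for a speaker pair; generic textbook math only.
import math

def pair_gains(azimuth: float, spread: float = 45.0):
    """Gains for a pair of speakers at +/-spread degrees around the listener."""
    theta = azimuth / spread                 # -1..1 position within the pair
    angle = (theta + 1) * math.pi / 4        # map to 0..pi/2
    return math.cos(angle), math.sin(angle)  # left, right gains

gl, gr = pair_gains(0.0)                     # source straight ahead
print(round(gl, 3), round(gr, 3), round(gl**2 + gr**2, 3))  # 0.707 0.707 1.0
```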
“Interchange Formats for Spatial Audio”
(invited position paper) Proc. 2008 Int’l
Computer Music Conference (ICMC), Belfast.
Space has been a central parameter in electroacoustic music composition and
performance since its origins. Nevertheless, the design of a standardized
interchange format for spatial audio performances is a complex task that
poses a diverse set of constraints and problems. This position paper
attempts to describe the current state of the art in terms of what can be
called “easy” today, and what areas pose as-yet unsolved technical or
theoretical problems. The paper ends with a set of comments on the process
of developing a widely usable spatial sound interchange format.
Get the PDF file
Scripting and Tools for Analysis/Resynthesis of Audio
Proceedings of the 2007 International
Computer Music Conference.
Software tools for audio analysis, signal processing and synthesis come in
many flavors; in general they fall into one of two categories: interactive
tools with limited extensibility, or non-graphical scripting languages. It
has been our attempt to combine the best features of these two worlds into
one framework that supports both (a) the easy development of GUI-based
applications for digital audio signal processing (DASP), and (b) an
extensible text-based scripting language with built-in libraries for DASP
applications. The goal is to combine the good performance of optimized
low-level code for the signal processing number-crunching, with a powerful,
flexible scripting language and GUI construction tools for application
development. We investigate the solutions to this dilemma on the basis of
four concrete examples in which DASP tools have been used together with the
Siren music/sound package for Smalltalk.
Get the PDF file
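The division of labor argued for above, a thin expressive scripting layer over optimized low-level number-crunching, can be suggested with a small Python sketch (standing in for Smalltalk/Siren); NumPy's compiled FFT plays the role of the optimized DSP kernel. The names are illustrative, not Siren's API.

```python
# Script-level analysis/resynthesis driver over compiled numeric kernels.
import numpy as np

def analyze(signal: np.ndarray, frame: int = 1024, hop: int = 512) -> np.ndarray:
    """Script-level STFT driver; the inner loops run in compiled code."""
    frames = [signal[i:i + frame] * np.hanning(frame)
              for i in range(0, len(signal) - frame, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def resynthesize(mags: np.ndarray) -> np.ndarray:
    """Toy resynthesis: inverse FFT of magnitudes with zero phase."""
    return np.fft.irfft(mags, axis=1).ravel()

sr = 22050
sig = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
print(resynthesize(analyze(sig)).shape)
```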
Teaching Digital Audio Programming: Notes on a Two-year Course Sequence
Proceedings of the 2007 International
Computer Music Conference.
The MAT 240 Digital Audio Programming course sequence is a six-quarter
(i.e., two-year) practical workshop class devoted to teaching digital audio
processing techniques and software development at the graduate level. It has
been delivered through several complete iterations at UCSB since 2000. In
this paper, we will introduce the course sequence topics, describe what
students actually do and learn in the course, and evaluate our challenges,
successes and failures.
Get the PDF file
Immersive Audio and Music in the Allosphere (with Xavier Amatriain,
Tobias Hollerer, and JoAnn Kuchera-Morin)
Proceedings of the 2007 International
Computer Music Conference.
The UCSB Allosphere is a 3-story-high spherical instrument in which virtual
environments and performances can be experienced in full immersion. It is
made of a perforated aluminum sphere, ten meters in diameter, suspended
inside an anechoic cube. The space is now being equipped with
high-resolution active stereo projectors, a 3D sound system with several
hundred speakers, and with tracking and interaction mechanisms. The
Allosphere allows for the exploration of large-scale data sets in an
environment that is at the same time multimodal, multimedia, multi-user,
immersive, and interactive. This novel and unique instrument will be used
for research into scientific visualization/auralization and data
exploration, and as a research environment for behavioral and cognitive
scientists. It will also serve as a research and performance space for
artists exploring new forms of art. In particular, the Allosphere has been
carefully designed to allow for immersive music applications. In this paper,
we give an overview of the instrument, focusing on the audio subsystem. We
present first results and our experiences in developing and using the
Allosphere in several prototype projects.
Get the PDF file
The Siren 7.5 Package for Music and Sound in Smalltalk
MAT/CREATE Internal Report, 2007
Siren is a programming framework for developing music/sound applications in
the Smalltalk programming system. It has been under development for more
than 20 years, and the newest version (7.5) has a collection of major
updates and new subsystems. This paper briefly introduces Siren, and then
concentrates on the significant new features, interfaces, and applications
in Siren 7.5.
Get the PDF file
Software Models and Frameworks for Sound Composition, Synthesis, and
Analysis: The Siren, CSL, and MAK Music Languages
Anthology, June, 2005, updated May, 2007,
462 pages
Music is an undeniably complex phenomenon, so the design of abstract
representations, formal models, and description languages for music-related
data can be expected to be a rich domain. Music-making consists of a variety
of diverse activities, and each of these presents different requirements for
developers of new abstract and concrete data formats for musician users.
The topic of this work is the design of formal models and languages for a
set of common musical activities including (but not limited to) composition,
performance and production, and semantic analysis. The background of this
work is the 50-year history of computer music programming languages, which
began with low-level and (by today’s standards) simplistic notations for
signal synthesis routines and compositional algorithms. Over these 50 years,
many generations of new ideas have been applied to programming language
design, and the topics of formal modeling and explicit knowledge
representation have arisen and taken an important place in computer science,
and thus in computer music.
The three concrete systems presented in this anthology have been developed
and refined over a period of 25 years, and address the areas, respectively,
of (a) music composition (Siren), (b) sound synthesis and processing (CSL),
and (c) music data analysis for information retrieval (MAK). In each
successive generation of refinement of these concrete languages, the
underlying models and metamodels have been considered and incrementally
merged, so that the current generations (Siren 7, CSL 4, and MAK 4)
share both superficial and deep models and expressive facilities. This
allows the user (assumed to be a composer, performer, or musicologist) to
share data and functionality across these domains, and, as will be
demonstrated, to extend the models and frameworks into new areas with
relative ease.
The significant contributions of this work to the literature can be found in
(a) the set of design criteria and trade-offs developed for music language
developers, (b) the new object-oriented design patterns for computer music
systems, and (c) the trans-disciplinary design of the three specific
languages for composers, performer/producers, and musicologists presented
here.
Get the PDF file
MODE & Siren: Smalltalk and Music
The Siren 7.5 Package for Music and Sound in Smalltalk
MAT/CREATE Internal Report, 2007
(See the above abstract.)
Get the PDF file
Metamodels and Design Patterns in CSL4 (with Xavier Amatriain, Lance
Putnam, Jorge Castellanos, and Ryan Avery)
Proceedings of the 2006 International
Computer Music Conference
The task of building a description language for audio synthesis and
processing consists of balancing a variety of conflicting demands and
constraints such as easy learning curve, usability, flexibility,
extensibility, and run-time performance. There are many alternatives as to
what a modern language for describing signal processing patches should look
like. This paper describes the object-oriented models and design patterns
used in version 4 of the CREATE Signal Library (CSL), a full rewrite that
included an effort to use concepts from the “4MS” metamodel for multimedia
systems, and to integrate a set of design patterns for signal processing. We
refer the reader to other publications for an introduction to CSL, and will
concentrate on design and implementation choices in CSL4 that simplify the
kernel classes, improve their performance, and ease their extension while
using best-practice software engineering techniques.
Get the PDF file
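To make the pattern talk concrete, here is a hedged sketch of a unit-generator graph built with the Composite/Decorator idea: every node answers the same buffer-generation message, so processors nest uniformly. This mimics the general approach only; it is not CSL4's actual C++ class design.

```python
# Unit-generator graph as a Composite: any node answers next_buffer().
import numpy as np

class UnitGenerator:
    def next_buffer(self, n: int) -> np.ndarray:
        raise NotImplementedError

class Sine(UnitGenerator):
    def __init__(self, freq, sr=44100):
        self.freq, self.sr, self.phase = freq, sr, 0.0
    def next_buffer(self, n):
        t = self.phase + np.arange(n)
        self.phase += n
        return np.sin(2 * np.pi * self.freq * t / self.sr)

class Gain(UnitGenerator):
    """A processor wraps another generator (Decorator-style composition)."""
    def __init__(self, source: UnitGenerator, amount: float):
        self.source, self.amount = source, amount
    def next_buffer(self, n):
        return self.amount * self.source.next_buffer(n)

patch = Gain(Sine(440.0), 0.5)    # graphs compose by nesting constructors
print(patch.next_buffer(64)[:4])
```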
Recent Developments in Siren: Modeling, Control, and Interaction for
Large-scale Distributed Music Software (with Chandrasekhar Ramakrishnan)
Proceedings of the 2003 International Computer Music Conference.
This paper describes recent advances in platform-independent object-oriented
software for music and sound processing. The Siren system is the result of
almost 20 years of continuous development in the Smalltalk programming
language; it incorporates an abstract music representation language,
interfaces for real-time I/O in several media, a user interface framework,
and connections to object databases. To support ambitious compositional and
performance applications, the system is integrated with a scalable realtime
distributed processing framework. Rather than presenting a system overview
(Siren is exhaustively documented elsewhere), we discuss the new features of
the system here, including its integration with new DSP frameworks, new I/O
interfaces, and its use in several recent compositions.
Get the PDF file
Music and Sound Processing in Squeak Using Siren
Invited Chapter in Squeak: Open Personal Computing and Multimedia
edited by Mark Guzdial and Kim Rose. Prentice-Hall, 2002.
The Siren system is a general-purpose music composition and production
framework integrated with Squeak Smalltalk (1); it is a Smalltalk class
library of about 200 classes for building musical applications. Siren runs
on a variety of platforms with support for real-time MIDI and multi-channel
audio I/O. The system's source code is available for free on the Internet;
see the Siren home page at the URL http://www.create.ucsb.edu/Siren. This
chapter concentrates on (a) the Smoke music description language, (b) the
real-time MIDI and sound I/O facilities, and (c) the GUIs for the 2.7
version of Siren. It is intended for a Squeak programmer who is interested
in music and sound applications, or for a computer music enthusiast who is
interested in Squeak applications.
Get the PDF file
The Musical Object Development Environment (MODE)--Ten Years of Music
Software in Smalltalk
Proceedings of the 1994 International Computer Music Conference.
The author has developed a family of software tool kits for composers with
the Smalltalk-80 programming system over the last decade. The current MODE
Version 2 system supports structured composition, flexible graphical editing
of high- and low-level musical objects, real-time MIDI I/O, software sound
synthesis and processing, and other tasks. This poster will introduce the
MODE and SmOKe, its representation language, and survey the various end-user
applications it includes. The discussion will evaluate the system's
performance and requirements.
Get the PDF file
The Interim DynaPiano: An Integrated Tool and Instrument for Composers
Computer Music Journal 16:3, Fall, 1992, 21 p.
The Interim DynaPiano (IDP) is an integrated computer hardware/software
configuration for music composition, production, and performance based on a
Sun Microsystems Inc. SPARCstation computer and the Musical Object
Development Environment (MODE) software. The IDP SPARCstation is a powerful
hardware-accelerated color graphics RISC-based (reduced instruction set
computer) workstation running the UNIX operating system. It is
augmented by large RAM and disk memories and coprocessors and interfaces for
real-time sampled sound and MIDI I/O. The MODE is a large hierarchy of
object-oriented software components for music written in the Smalltalk-80
language and programming system. MODE software applications in IDP support
flexible structured music composition, sampled sound recording and
processing, and real-time music performance using MIDI or sampled
sounds. The motivation for the development of IDP is to build a
powerful, flexible, and portable computer-based composer's tool and musical
instrument that is affordable by a professional composer (i.e., around the
price of a good piano or MIDI studio). The hardware and low-level software
of the system consist entirely of off-the-shelf commercial components. The
goal of the high-level and application software is to exhibit good
object-oriented design principles and elegant modern software engineering
practice. The basic configuration of the system is consistent with a whole
series of "intelligent composer's assistants" based on a core technology
that has been stable for a decade. This article presents an overview of the
hardware and software components of the current IDP system. The background
section discusses several of the design issues in IDP in terms of
definitions and a set of examples from the literature. The hardware system
configuration is presented next, and the rest of the article is a
description of the MODE signal and event representations, software
libraries, and application examples.
Get the PDF file
The SmOKe Music Representation, Description Language, and Interchange
Format
Proceedings of the 1992 International Computer Music Conference.
The Smallmusic Object Kernel (SmOKe) is an object-oriented representation,
description language and interchange format for musical parameters, events,
and structures. The author believes this representation, and its proposed
linear ASCII description, to be well-suited as a basis for: (1) concrete
description interfaces in other languages, (2) specially-designed binary
storage and interchange formats, and (3) use within and between interactive
multimedia and hypermedia applications in several application domains. The
textual versions of SmOKe share the terseness of note-list-oriented music
input languages, the flexibility and extensibility of "real" music
programming languages, and the non-sequential description and annotation
features of hypermedia description formats. This description
defines SmOKe's basic concepts and constructs, and presents examples of the
music magnitudes and event structures. The intended audience for this
discussion is programmers and musicians working with digital-technology-based
multimedia tools who are interested in the design issues
related to music representations, and are familiar with the basic concepts
of software engineering. Two other documents ([Smallmusic 1992] and [Pope
1992]), describe the SmOKe language, and the MODE environment within which
it has been implemented, in more detail.
Get the PDF file
Modeling Musical Structures as EventGenerators
Proceedings of the 1989 International Computer Music Conference.
There is a broad range of music description languages. The common terms for
describing musical structures define a vocabulary that every musician learns
as part of his or her training. The terms we take for granted in describing
music can be used for building generative software description languages.
This paper describes recent work modeling higher-level musical structures in
terms of objects that understand specialized sub-languages for the creation
of, and interaction with, musical structures. The goal is to provide tools for
composers to describe compositions by incrementally refining the behaviors
of a hierarchical collection of structure models.
Get the PDF file
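The notion of structure models as event generators can be suggested with a toy sketch: generator objects produce concrete events from higher-level parameters, and generators nest so a composition can be refined incrementally. Class and field names are hypothetical, not the paper's Smalltalk protocol.

```python
# Toy event generators: structure models that emit concrete events.
from dataclasses import dataclass

@dataclass
class Event:
    start: float   # seconds
    dur: float
    pitch: int     # MIDI note number

class Cluster:
    """Generates a chord of events around a center pitch."""
    def __init__(self, center: int, width: int, dur: float):
        self.center, self.width, self.dur = center, width, dur
    def events(self, start: float):
        return [Event(start, self.dur, p)
                for p in range(self.center - self.width, self.center + self.width + 1)]

class Sequence:
    """Higher-level structure: lays out child generators end to end."""
    def __init__(self, children):
        self.children = children
    def events(self, start: float):
        out, t = [], start
        for child in self.children:
            evs = child.events(t)
            out.extend(evs)
            t = max(e.start + e.dur for e in evs)
        return out

piece = Sequence([Cluster(60, 2, 1.0), Cluster(67, 1, 2.0)])
for e in piece.events(0.0):
    print(e)
```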
T-R Trees in the MODE (A Tree Editor Based Loosely on Fred's Theory)
Proceedings of the 1991 International Computer Music Conference.
The T-R Trees software system is a set of software tools for the graphical
and programmatic manipulation of expressive and structural hierarchies in
music composition. It is loosely based on the hierarchies described in Fred
Lerdahl and Ray Jackendoff's landmark book A Generative Theory of Tonal
Music--weighted grouping and prolongational reduction trees (also
called tension-relaxation or T-R trees). This article describes T-R tree
derivation, editing, and application in score representation and management.
Get the PDF file
Distributed Processing
The Distributed Processing Environment for High-Performance Distributed
Multimedia Applications (with Andreas Engberg, Frode Holm, and Ahmi Wolf)
Proc. 2001 IEEE Multimedia Technology and Applications Conference
Our group is involved in implementing large-scale multimedia software for
application areas ranging from multi-user virtual worlds to complex
real-time sound synthesis. We call this class of system High-Performance
Distributed Multimedia (HPDM) software. The Distributed Processing
Environment (DPE) is an infrastructure for configuring and managing HPDM
software. It consists of several components that allow the start-up,
monitoring, and shut-down of software services on a network. This report
describes the design and implementation of the prototype DPE system, which
we built for the ATON project.
Get the PDF file
The Real-time (Multimedia) Interface Description Language: RIDL (with
Andreas Engberg and Frode Holm)
Proc. 2001 IEEE Multimedia Technology and Applications Conference
The Real-time Multimedia Interface Description Language—RIDL—is an extension
of the CORBA IDL for use in building distributed real-time multimedia
software systems. We designed RIDL to integrate quality-of-service (QoS)
information, as well as configuration requirements, into the IDL interface
descriptions of our software components. We have built a flexible
first-generation RIDL compiler and associated repositories.
Get the PDF file
All About CRAM: The CREATE Real-time Application Manager
CREATE Internal Report
The CREATE Real-time Applications Manager (CRAM) is a framework for
developing, deploying, and managing distributed real-time software. It has
evolved in our group at UCSB through three implementations over the space of
five years. The background of CRAM is the work done since the early 1990s on
distributed processing environments (DPEs), which started in the
telecommunications industry (see Appendix 1). CRAM is unusual among DPEs in
that it is very light-weight and efficient, but also fault-tolerant, and
that it supports both planning-time and run-time load balancing as required
by real-time applications. Its main application areas to date are
large-scale music performance systems and distributed virtual environments.
Get the PDF file.
ATON Report 2001.06.1: ATON/UCSB Final Report
CREATE Internal Report
The ATON Project was an ambitious, large-scale, multi-year R&D effort
undertaken by three teams collaborating across several disciplines. The
original project description (see the ATON web site
http://www.create.ucsb.edu/ATON/overview.html) stated, “The project involves
topics as diverse as robotics, computer vision, distributed multimedia
processing, and virtual reality.” For the ATON system, we needed to build a
virtual environment (VE) that allows one or more users to control robots and
video cameras located anywhere in the state of California, and to “see
through the eyes” of the robots to manage traffic incidents. This implies a
kind of wide-area distributed real-time multimedia system that we call
High-Performance Distributed Multimedia (HPDM) software. This report
summarizes the work carried out in the CREATE Lab at UCSB as part of the
DiMI ATON Project between 1999 and 2001. We describe the background of the
ATON Project, and discuss our efforts, relating them to our published
reports and concrete deliverables.
Get the PDF file.
Computer Music and Music Composition
Producing Kombination XI: Using Modern Hardware and Software Systems for
Composition
Leonardo Music Journal, 2(1): 23-28, 1992.
This article discusses two topics related to the realization of my
composition "Kombination XI: A Ritual Place for Live and Processed Voices."
These are (1) the score's structure representation language and the software
tools for manipulating it using graphical structure editors, and (2) the
process of realization using several different digital signal processing
software and hardware systems. The reason for focusing on the first issue is
the attempt to build a notation and set of software tools based on weighted
trees that span the expressive and structural domains of music. The second
topic is of interest as an example of the possibility of using several types
of computer hardware and software in consort as one instrument. Numerous
score and structure description and editing examples, and documentation of
the realization process are presented.
Get the PDF file
Fifteen Years of Computer-assisted Composition
Proceedings of the 2nd Brazilian Symposium on Computer Music, 1995.
This paper describes several generations of computer music systems and the
music they have enabled. It will introduce the software tools used in some
of my music compositions realized in the years 1979-94 at a variety of
studios using various software and hardware systems and programming
languages. These tools use a wide range of compositional methods, including
(among others): high-level graphical notations, limited stochastic
selection, Markov transition tables, forward-chaining expert systems,
non-deterministic Petri networks, and hierarchical rule-based knowledge
systems. The paper begins by defining several of the terms that are
frequently used in the computer music literature with respect to
computer-aided composition and realization, and introduces several of the
categories of modern models of music composition. A series of in-depth
examples is then drawn from my works of the last 15 years, giving
descriptions of the models, the software tools, and demonstrating the
resulting music.
Get the PDF file
Computer Music Workstations I Have Known and Loved
Proceedings of the 1995 International Computer Music Conference.
This paper introduces a set of design criteria and points of current
debate in the development of computer music workstations. It surveys the
systems of the last ten years and makes several subjective comments on the
design and implementation of computer-based tools for music composition,
production, and live performance. The intent is to focus the reader's
attention on the issues of hardware architecture and software support in
defining computer-based tools and instruments.
Get the PDF file
Why is Good Electroacoustic Music So Good? Why is Bad Electroacoustic
Music So Bad?
(expanded version of the Editor's Note in CMJ 18:3 with responses). YLEM
Newsletter 15:4 (July/August, 1995), 4 p.
Get the ASCII text file
Real-Time Performance via User Interfaces to Musical Structures
Proceedings of the Int'l Workshop on Man-Machine Interaction in Live
Performance, Pisa, Italy, June, 1991. Reprinted in Interface
22(3): 195-212. 9 p.
This informal and subjective presentation will introduce and compare several
software systems written by myself and others for computer music
composition and performance based on higher-level abstractions of musical
data structures. I will then evaluate a few of the issues in real-time
interaction with structural descriptions of musical data. The premise
is that very interesting live-performance software environments could be
based on existing technology for structural music description, but that much
of the current real-time performance-oriented software for music is rather
limited in that it supports only very low-level notions of musical
structures. The examples will demonstrate various systems for graphical
interaction with procedural, knowledge-based, hierarchical and/or stochastic
music description systems that could be used for live performance.
Get the PDF file (without figures). Read the HTML version (*with* figures).
Web.La.Radia: Social, Economic, and Political Aspects of Music and
Digital Media
Invited Paper, Salzburg Symposium on New Media Technology and Networking
for Creative Applications (1997). Reprinted in Proceedings of the 1997
International Computer Music Conference, Thessaloniki. Reprinted in
Computer Music Journal 23:1, Spring, 1999, 10 p.
This informal essay addresses the current status and trajectory of media art
and media technology. In formulating my ideas on these topics, I found
myself being drawn away from my usual technical concerns, and increasingly
to the sociology, economics, and political relationships of electronic media
art and its modes of production and dissemination. There are several rather
bold statements below on the subject of new media art and art-making on the
world-wide web, and I rely heavily on a series of quotes taken from the
literature to make my points, without the implication that I necessarily
agree with every one of them. I take a critical stance in these comments,
but still do not wish to be considered a “web-Luddite.” I use the web daily,
and it is a major component of my research. On the other hand, I am very
concerned by several trends I see in the web culture and feel that it is
necessary to draw attention to them.
Get the PDF File
Music Information Retrieval and Databases
Automatic Labeling and Control of Audio Algorithms by Audio Recognition
(with Jason LeBoeuf)
U. S. Patent Application 20110075851, 2010
(See the above abstract.)
Get the PDF file
Feature Extraction and Database Design for Music Software (with Frode
Holm and Alexandre Kouznetsov)
Proceedings of the 2004 International Computer Music Conference
Persistent storage and access of sound/music meta-data is an increasingly
relevant topic to the developers of multimedia software. This paper focuses
on the design of music signal analysis tools and database formats for modern
applications. It is partly tutorial in nature, and partly a discussion of
design issues. We begin with a high-level overview of the dimensions of
music database (MDB) software, and then walk through the common feature
extraction techniques. A requirements analysis of several application
categories will allow us to carefully determine which features might be most
useful for them. This leads us to suggest concrete architectural and design
criteria, and to close by introducing several of our recently implemented
systems. The authors believe that much current MDB software suffers due to
ad-hoc design of analysis systems and feature vectors, which often
incorporate only low-level features and are not tuned for the application at
hand. Our goal is to advance the state of the art of music meta-data
extraction and database design by fostering a better engineering practice in
the construction of high-level feature vectors and analysis engines for
music software.
Get the PDF file
The FASTLab Music Analysis Kernel
FASTLab Internal Report
The FASTLab Music Analysis Kernel (FMAK) is a software package for building
and using music and sound databases. It consists of four main interfaces:
analysis, segmentation, clustering, and classification. The FMAK analyzer
computes both low-level and high-level features (called feature vectors or
meta-data) from musical selections. The segmenter takes these feature
vectors and finds the phrase, verse, and section breaks in music, thus
discovering the musical form and allowing us to reduce the number of feature
vectors we need to store. The clustering functions support data mining in
large databases of feature vectors by grouping the data into well-defined
genre clusters. The classifier adds customizable database pruning and
run-time distance metrics for using genre databases. These four components
can be used in a variety of ways to build software applications that
process large volumes of multimedia data.
Get the PDF file
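A skeleton of the four FMAK stages named above, reduced to toy form; the function bodies below are placeholders of my own, not FASTLab code.

```python
# Toy versions of the four stages: analyze, segment, cluster, classify.
import numpy as np

def analyze(windows):                    # stage 1: feature vector per window
    return np.array([[w.mean(), w.std()] for w in windows])

def segment(features, thresh=0.5):       # stage 2: cut where features jump
    jumps = np.linalg.norm(np.diff(features, axis=0), axis=1)
    return np.flatnonzero(jumps > thresh) + 1

def cluster(features, k=2, iters=10):    # stage 3: tiny k-means for "genres"
    centers = features[:k].copy()
    for _ in range(iters):
        labels = np.argmin(((features[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return centers

def classify(vector, centers):           # stage 4: nearest cluster center
    return int(np.argmin(((centers - vector) ** 2).sum(-1)))

windows = [np.random.default_rng(i).normal(i % 2, 1, 512) for i in range(8)]
feats = analyze(windows)
print(segment(feats), classify(feats[0], cluster(feats)))
```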
Expert Mastering Assistant (EMA) Version 2.0 Technical Documentation
(with Alex Kouznetsov)
FASTLab Internal Report
This document describes the design and implementation of the “Expert
Mastering Assistant” (EMA) tool, version 2.0, developed by the UCSB Center
for Research in Electronic Art Technology (CREATE) and FASTLab Inc. for the
Panasonic Spin-Up Fund. EMA is a prototype artificial-intelligence-based
software tool that “listens” to a set of musical selections and gives expert
advice to a mastering engineer, suggesting parameters for the signal
processing modules: equalization, compression, reverberation, etc. The EMA
suite consists of two major components: the interactive EMA application,
which analyzes and processes individual songs with real-time interactivity,
and a number of development applications that are required as part of the
expert-system training process (Figure 1).
Get the PDF file
The Open Music Network Infrastructure (OMNI)
CREATE Internal Report
This proposal describes the Open Music Network Infrastructure (OMNI), an
Internet-based music service that aims to provide music content providers
with a new forum in which to attract music consumers, enabling the so-called
“second music industry.” The OMNI system consists of content-provider
interfaces, a large-scale artificial-intelligence-assisted “smart”
music/sound database, and listener services that allow users to select
musical selections based on their personal taste. The most distinctive
feature of OMNI relative to other web-based music services is its use of a
smart indexing and search component in the database, which helps
little-known musicians find an audience that would like their songs. This
document is aimed at a semi-technical reader.
Get the PDF file
Content Analysis and Queries in a Sound and Music Database
Proceedings of the 1999 International Computer Music Conference.
The Paleo database project at CREATE aims to develop and deploy a
large-scale integrated sound and music database that supports several kinds
of content and analysis data and several domains of queries. The basic
components of the Paleo system are: (1) a scalable general-purpose object
database system, (2) a comprehensive suite of sound/music analysis (feature
extraction) tools, (3) a distributed interface to the database, and (4)
prototype end-user applications. The Paleo system is based on a rich set of
signal and event analysis programs for feature extraction from sound and
music data. The premise is that, in order to support several kinds of
queries, we need to extract a wide range of different kinds of features from
the data as it is loaded into the database, and possibly to analyze still
more in response to queries. The results of these analyses will be very long
“feature vectors” (or multi-level indices) that describe the contents of the
database. To be useful for a wide range of applications, the Paleo system
must allow several different kinds of queries, i.e., it needs to manage
large and changing feature vectors. As data in the database is used,
the feature vectors can be simplified. This might mean discarding spectral
analysis data for speech sounds, or metrical grouping trees for unmetered
music. This is what sets Paleo apart from most other media database
projects: the use of complex and dynamic feature vectors and indices.
This paper introduces the Paleo system's architecture, and then focuses on
three issues: the signal and event analysis routines, the use of constraints
in analysis and queries, and the object storage layer and formats. Some
examples of Paleo usage are also given.
Get the PDF file of the text. Get the PDF file of the presentation slides.
Spatial and 3-D Sound Systems
Immersive Audio and Music in the Allosphere (with Xavier Amatriain,
Tobias Hollerer, and JoAnn Kuchera-Morin)
Proceedings of the 2007 International
Computer Music Conference.
(See the above abstract.)
Get the PDF file
Audio in the UCSB CNSI AlloSphere
MAT/CNSI Internal Report
The UCSB AlloSphere is a joint effort of the California NanoSystems
Institute (CNSI) and the graduate program in Media Arts and Technology (MAT)
at the University of California Santa Barbara (UCSB). It is currently under
construction, with completion scheduled for the first half of 2006. The
AlloSphere is designed as an immersive computational interface for 10 to 20
users, featuring surround-sound data sonification and immersive
visualization (i.e., 3D audio and video projection) on a spherical surface.
It will provide interactive control by means of microphone arrays,
cameras, and mechanical and magnetic input tracking. The actual shape of
the AlloSphere can be described as two hemispheres with 16-foot radii pulled
8 feet apart, placed in a 3-story anechoic chamber. A 7-foot-wide bridge
runs across the center, supporting the users. This document describes the
requirements for the audio component of the AlloSphere, introduces the three
prevalent spatial sound processing technologies in use today, and outlines
the AlloSphere audio input and projection design and implementation plan,
from low-level transducer elements to high-level network protocols.
Get the PDF file
The State of the Art in Sound Spatialization
There are several aspects to the field of spatial sound, each of which poses
different challenges and offers different potential applications. Although our
understanding of aural perception is still incomplete, we are able to both
synthesize and record spatial sound fields, and to render sound such that
the fidelity of localization is very high (for a specific listener). There
are several well-known and effective techniques for creating the perceptual
cues that our brains use to localize sound, but the systems that scale well
to large spaces or to many listeners are not the same ones that give the
best localization fidelity. The formal study of spatial sound performance
in larger spaces (e.g., concert halls) is still in its (relative) infancy.
Most work in this area has been ad hoc, treating the spatial sound
performance situation more as an instrumental performance than as a
controlled experiment. This presentation will explore the aspects of
aural perception that contribute to the difficulties, and the potential, in
the recording and playback of spatial sound, and will survey the current
techniques used in this area.
Get the PDF File
Building Sound into a Virtual Environment: An Aural Perspective Engine
for a Distributed Interactive Virtual Environment (An APE for a DIVE).
(with Lennart E. Fahlén)
Report of the Distributed Systems Laboratory of the Swedish Institute for
Computer Science, Stockholm, August, 1992.
We have investigated the addition of spatially-localized sound to an
existing graphics-oriented synthetic environment (virtual reality system).
To build "3-D audio" systems that are robust, listener-independent,
real-time, multi-source, and able to give stable sound localization is
beyond the current state of the art, even using expensive special-purpose
hardware. The "auralizer" or "aural renderer" described here was built as a
test-bed for experimenting with the known techniques for generating sound
localization cues based on the geometrical models available in a synthetic
3-D world. This paper introduces the psychoacoustical background of sound
localization, and then describes the design and usage of the DIVE auralizer.
We close by evaluating the system's implementation and performance.
Get the PDF file
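Two of the standard localization cues such an aural renderer derives from scene geometry can be computed directly: inverse-distance attenuation and an interaural time difference. The formulas below are textbook approximations, not the DIVE auralizer's implementation.

```python
# Geometry-derived localization cues: distance gain and interaural delay.
import math

SPEED_OF_SOUND = 343.0   # m/s
HEAD_RADIUS = 0.0875     # m, average adult head

def distance_gain(distance_m: float, ref_m: float = 1.0) -> float:
    return ref_m / max(distance_m, ref_m)        # inverse-distance law

def itd_seconds(azimuth_deg: float) -> float:
    """Woodworth's spherical-head approximation to the interaural delay."""
    a = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (a + math.sin(a))

print(distance_gain(4.0), itd_seconds(90.0))     # 0.25, ~0.00066 s
```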
The Use of 3-D Audio in a Synthetic Environment (with Lennart E. Fahlén)
Proceedings of the 1993 AIMI Colloquium,
Milan, Italy.
(See the above abstract.)
Get the PDF file
Machine Tongues--Computer Music Journal Survey or
Tutorial Articles
Machine Tongues XI: Object-oriented Software Design
Computer Music Journal 13(2):9-22, Summer, 1989
Object-oriented programming is a term that represents a collection of new
techniques for problem-solving and software engineering. Two previous
articles in this "Machine Tongues" series have introduced object-oriented
programming, presenting tutorials to this technology, and describing its
application to music modeling and software development (Krasner 1980,
Lieberman 1982). This paper discusses the new problem-solving techniques
that constitute the object-oriented design methodology. Object-oriented
analysis, synthesis, design and implementation are presented, while
stressing the issues of design by analytical modeling, design for reuse, and
the development of software packages in terms of frameworks, toolkits and
customizable applications. Numerous object-oriented software description
examples and architectural structures are presented, including music
modeling, representation and interactive applications. This essay will
outline object-oriented problem-solving and software design in a
language-independent manner. Examples will be taken primarily from the Smalltalk-80
(TM of ParcPlace Systems) programming system, but the reader need only refer
to some of the other articles in this issue of Computer Music Journal for
descriptions of systems based on other languages and programming
environments. No basic introduction to the terms or techniques of
object-oriented languages will be presented here.
Get the PDF file
Machine Tongues XV: Three Packages for Software Sound Synthesis
Computer Music Journal 17(2): 23-54, Summer, 1993
The origin of the technology and methodology of modern computer music is
certainly the Music V family of software sound synthesis systems developed
since the late 1950s. In the "old days," this consisted of batch computer
processing of musical programs expressed in terms of instrument definitions
(programs) and score note lists (input data), generating sampled sound
output data to off-line storage for later performance. The noticeable
rekindling of interest in programs and languages for software sound
synthesis (SWSS) and software digital audio signal processing (DSP) using
general-purpose computers is due to a number of factors, not least among
them the dramatic increase in the power of personal workstations over the
last five years. There are currently three widely-used, portable, C-language
SWSS tools: (in alphabetical order) cmix (Lansky 1990), cmusic (Moore 1990),
and Csound (Vercoe 1991). This article will discuss the technology of SWSS
and then present and compare these three systems. It is divided into three
parts; the first introduces SWSS in terms of progressive examples. Part two
compares the three systems using the same two instrument/score examples
written in each of them. The final section presents informal benchmark tests
of the systems run on two different hardware platforms (a Sun Microsystems
SPARCstation-2 IPX and a NeXT Computer Inc. TurboCube machine) and subjective
comments on various features of the languages and programming environments
of state-of-the-art SWSS software.
Get the PDF file
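The Music V paradigm described above, instruments as programs plus scores as note lists rendered off-line, can be miniaturized as follows; this toy imitates the paradigm only and is none of cmix, cmusic, or Csound.

```python
# An "instrument" is a program, a "score" is a note list; render off-line.
import numpy as np

SR = 22050

def sine_instrument(freq, dur, amp):
    t = np.arange(int(dur * SR)) / SR
    env = np.minimum(1.0, 10 * np.minimum(t, dur - t))  # linear attack/decay
    return amp * env * np.sin(2 * np.pi * freq * t)

score = [(0.0, 0.5, 440.0, 0.5),    # (start, dur, freq, amp) note list
         (0.5, 0.5, 660.0, 0.4)]

out = np.zeros(int(SR * max(s + d for s, d, _, _ in score)))
for start, dur, freq, amp in score:
    i = int(start * SR)
    note = sine_instrument(freq, dur, amp)
    out[i:i + len(note)] += note     # mix each note into the output buffer
print(len(out) / SR, "seconds rendered")
```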
Machine Tongues XVIII. A Child's Garden of Sound File Formats (with
Guido Van Rossum)
Computer Music Journal 19(1): 25-63, Spring, 1995.
This article introduces a few of the many ways that sound data can be stored
in computer files, and describes several of the file formats that are in
common use for this purpose. This text is an expanded and edited version of
a "frequently asked questions" (FAQ) document that is updated regularly by
one of the authors (van Rossum). Extensive references are given here to
printed and network-accessible machine-readable documentation and source
code resources.
Get the PDF file
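As one concrete, well-documented example of the formats the FAQ surveys, here is a minimal RIFF/WAVE writer using only the Python standard library's wave module.

```python
# Write one second of A440 as a mono, 16-bit linear PCM RIFF/WAVE file.
import math, struct, wave

SR = 8000
samples = [int(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / SR))
           for n in range(SR)]

with wave.open("tone.wav", "wb") as f:
    f.setnchannels(1)      # mono
    f.setsampwidth(2)      # 16-bit samples
    f.setframerate(SR)
    f.writeframes(struct.pack("<%dh" % len(samples), *samples))
```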
Object-Oriented Programming and Design Patterns
Metamodels and Design Patterns in CSL4 (with Xavier Amatriain, Lance
Putnam, Jorge Castellanos, and Ryan Avery)
Proceedings of the 2006 International
Computer Music Conference.
(See the above abstract.)
Get the PDF file
The Well-Tempered Object: Musical Applications of Object-Oriented
Software Technology -- A Structured Anthology on Software Science and
Systems based on Articles from Computer
Music Journal 1980-89
Compiled and edited by Stephen Travis Pope. Published by MIT Press, 1991
See the Well-Tempered Object Web Page
A Description of the Model-View-Controller User Interface Paradigm in
the Smalltalk-80 System (The MVC Cookbook) (with Glenn Krasner)
Journal of Object-Oriented Programming 1(3):26-49
This essay describes the Model-View-Controller (MVC) programming paradigm and
methodology used in the Smalltalk-80 (TM) programming system. MVC programming
is the application of a three-way factoring, whereby objects of different
classes take over the operations related to the application domain, the
display of the application's state, and the user interaction with the model
and the view. We present several extended examples of MVC implementations
and of the layout of composite application views. The Appendices provide
reference materials for the Smalltalk-80 programmer wishing to understand
and use MVC better within the Smalltalk-80 system.
Get the PDF file
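A deliberately tiny rendering of the three-way factoring the essay describes: the model holds state, a view observes and displays it, and a controller turns user input into model operations. The dependents list below stands in for Smalltalk-80's changed/update dependency protocol; it is an illustration, not the paper's code.

```python
# Minimal MVC: model state, observing view, input-handling controller.
class CounterModel:
    def __init__(self):
        self.value, self.dependents = 0, []
    def changed(self):                      # broadcast to all dependents
        for view in self.dependents:
            view.update(self)
    def increment(self):
        self.value += 1
        self.changed()

class CounterView:
    def __init__(self, model):
        model.dependents.append(self)       # register as an observer
    def update(self, model):
        print("view shows:", model.value)

class CounterController:
    def __init__(self, model):
        self.model = model
    def handle(self, key: str):             # map user input to model actions
        if key == "+":
            self.model.increment()

model = CounterModel()
CounterView(model)
CounterController(model).handle("+")        # prints "view shows: 1"
```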
Presentation Slides
Keynote Speech from the CWU Symposium on Undergraduate Research and
Creative Expression (SOURCE)
Get the slides as a PDF file. See also STP's SOURCE Links.
The State of the Art in "Sound and Music Computing"
Slides for a presentation given at the weekly computer science
colloquium, UCSB, Feb. 7, 1996.
Get the PDF file
Composition by Refinement
Presentation at the AIMI Conference, 1989.
Description of the use of the HyperScore ToolKit for composition.
Get the PDF file
Building Large-scale Interactive Systems with OSC, Siren, CSL, and CRAM
UC Berkeley AudioIcon Workshop, 2003
Get the PDF file
CREATE White Papers and Project Reports
Distributed Multimedia Systems R&D at CREATE
Since 1996, the UCSB Center for Research in Electronic Art Technology
(CREATE) has been the home of a series of projects on distributed software
systems for real-time and multimedia applications. Several aspects of our
work are relevant to new classes of applications as more and more systems
are built using distributed object software technology for real-time
services. This white paper describes our previous projects and innovations
in this area and our plans for the future.
Get the PDF file.
Research on Spatial and Surround Sound at CREATE
Researchers at the UCSB Center for Research in Electronic Art Technology
(CREATE) have been developing spatial sound performance systems and
multichannel surround sound rendering software for several years. We use
these systems as components of immersive user interfaces for a variety of
applications, as well as for the performance of spatialized music. This
white paper surveys our previous work in the field and describes our plans
for the future.
Get the PDF file.
Research on Music/Sound Databases at CREATE
Large-scale storage of sound and music has only become possible in the last
decade. With this, and the new possibility for wide-area distribution of
multimedia over the Internet, there arose a new requirement for flexible and
powerful databases for musical and audio data. Since 1996, our work at
CREATE has focused on database frameworks for multimedia applications, and
on analysis and feature extraction techniques for music and sound databases.
This white paper describes our results and presents several of our plans for
future applications.
Get the PDF file.
Application and User Interface Development at CREATE
The history of computer applications in music reaches back into the 1950s.
Only recently, however, has it been possible to control complex musical
processes such as algorithmic composition or sophisticated sound synthesis
programs in real-time. Advanced software and hardware technology also allow
us to develop user interfaces that allow non-musicians (and even
non-readers) to be musically creative. These two domains of application
development and user interface construction have been important tasks at
CREATE for ten years. We present examples of tools we've developed below,
and discuss what features they introduce that might be useful to other
application areas.
Get the PDF file.
The CREATE Signal Library (“Sizzle”): Design, Issues, and Applications
(with Chandrasekhar Ramakrishnan)
Proceedings of the 2003 International Computer Music Conference
The CREATE Signal Library (CSL) is a portable general-purpose software
framework for sound synthesis and digital audio signal processing. It is
implemented as a C++ class library to be used as a standalone synthesis
server, or embedded as a library into other programs. The first section of
this paper describes the overall design of CSL version 3 and gives a series
of progressive code examples. We also present CSL's facilities for network
I/O of control and sample streams, and the development and deployment of
distributed CSL systems. What is more interesting is the discussion that
follows of the design issues we faced in implementing CSL, and the
presentation of a few of the applications in which we've used CSL over the
last year.
Get the PDF file.
See Also
Full Bibliography
List of Musical Compositions
Example Reviews of My Music
Computer Music Journal WWW/FTP Archives (many music-related links)
Return to home page
For more detailed information, mail a letter to STP.
[Stephen Travis Pope, stp@create.ucsb.edu]