In this paper, we present PODIUM (POstech Distributed virtual Music environment), a distributed virtual environment that allows users to participate in a shared space and play music collaboratively with other participants. In addition to playing virtual instruments, users can communicate and interact in various ways to enhance the collaboration and, thus, the quality of the music played together. Musical messages are generated note by note through interaction with the keyboard, mouse, and other devices, and are transmitted among participants over an IP-multicast network. Beyond this note-level information, additional messages for visualization and interaction are supported. Real-world-based visualization was chosen over, for instance, an abstract music-world visualization to promote "co-presence" (e.g., recognizing and interacting with other players), which is deemed important for collaborative music production. Beyond entertainment, we hope that a distributed virtual music environment (DVME) such as PODIUM will find use in casual practice sessions, even for professional performers, orchestras, and bands. Since even a slight interruption in the flow of the music, or graphics and sound that are out of sync, would dramatically decrease the utility of the system, we employ various techniques to minimize the network delay. An adapted server-client architecture and UDP are used to ensure fast packet delivery and to reduce the data bottleneck problem. Time-critical messages such as MIDI messages are multicast among clients, while less time-critical and infrequently updated messages are sent through the server. Predefined avatar animations are invoked by interpreting the musical messages. By using the latest graphics and sound processing hardware and by maintaining an appropriate scene complexity and a frame rate sufficiently higher than the rate implied by the shortest note duration, the time constraint for graphics and sound synchronization can be met.
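As an illustration of how a note-level message might be sent to other clients over UDP multicast, the following minimal Python sketch packs one MIDI-style note event into a fixed-size datagram. The wire format, the field names, the multicast group address, and the port are all assumptions made for this example; they are not PODIUM's actual message format.

```python
import socket
import struct

# Hypothetical wire format (an assumption, not PODIUM's actual format):
# participant id (uint16), tick (uint32), MIDI status, note, velocity (uint8 each),
# in network byte order with no padding.
NOTE_FORMAT = "!HIBBB"

MCAST_GROUP = "239.1.2.3"  # assumed multicast group address
MCAST_PORT = 5004          # assumed port


def pack_note(participant_id, tick, status, note, velocity):
    """Serialize one note event into a 9-byte payload."""
    return struct.pack(NOTE_FORMAT, participant_id, tick, status, note, velocity)


def unpack_note(payload):
    """Inverse of pack_note; returns a 5-tuple of integers."""
    return struct.unpack(NOTE_FORMAT, payload)


def send_note(payload, group=MCAST_GROUP, port=MCAST_PORT, ttl=1):
    """Send one datagram to the multicast group. UDP gives no delivery guarantee."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # A TTL of 1 keeps the datagram on the local network segment.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(payload, (group, port))
    finally:
        sock.close()


# Example: participant 7 plays middle C (note 60) at tick 1234.
payload = pack_note(7, 1234, 0x90, 60, 100)
```

Keeping the payload small and fixed-size means that each note fits in a single datagram, which suits the low-latency goal; a lost packet is simply dropped rather than retransmitted.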
However, we expect that network delay could cause considerable problems when the system is scaled up to many users and must process simultaneous notes (for harmony). To assess scalability, we carried out a performance analysis of our system model to derive the maximum number of simultaneous participants. For example, according to our data, about 50 participants should be able to play together without significant disruption in a typical PC/LAN environment, each using one track with five simultaneous notes and playing a musical piece at a speed of 16 ticks per second. To enhance the feeling of co-presence among participants, a simple sound localization technique computes panning and relative volumes from the positions and orientations of the participants. This reduced sound localization model also minimizes the computational cost and the network traffic. Participants can send predefined messages by interacting with the keyboard, mouse, and other input devices. All of the predefined messages are mapped to simple avatar motions, such as playing various types of instruments (players), applauding (audience), and making conducting gestures (conductors). We believe that, for coordinated music performance, indirect interaction will be the main interaction method: for example, exchanging particular gestures, signals, and voice commands to synchronize the music, confirming and reminding one another of the expression of the upcoming portion of the music, and simply exchanging glances to enjoy each other's emotions. In this view, there would be three main groups of participants, namely conductors, players, and the audience, playing different roles but creating co-presence together through mutual recognition. We ran a simple experiment comparing the music performance of two groups of participants, one provided with co-presence cues and the other without, and found no performance edge for the group with the co-presence cues.
Such a result can serve as one guideline for building music-related VR applications.
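The simple sound localization mentioned above, which derives panning and relative volume from participant positions and orientations, might be sketched as follows. This is a minimal illustration under stated assumptions: a 2D floor plane, inverse-distance attenuation, and a constant-power pan law. The function name `localize` and the parameter `ref_dist` are hypothetical, and the actual model used in PODIUM may differ.

```python
import math


def localize(listener_pos, listener_heading, source_pos, ref_dist=1.0):
    """Return (left_gain, right_gain) for a source heard by a listener.

    Positions are (x, y) pairs in a 2D floor plane. listener_heading is the
    facing direction in radians, measured counterclockwise from the +x axis.
    All modeling choices here are assumptions made for illustration.
    """
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    dist = math.hypot(dx, dy)

    # Inverse-distance attenuation, clamped so that sources closer than
    # ref_dist do not exceed unit gain.
    volume = ref_dist / max(dist, ref_dist)

    # Angle of the source relative to the facing direction;
    # a positive angle means the source is on the listener's left.
    rel = math.atan2(dy, dx) - listener_heading

    # Pan position in [-1, +1]: -1 is fully left, +1 is fully right.
    pan = -math.sin(rel)

    # Constant-power pan law: left^2 + right^2 == volume^2.
    theta = (pan + 1.0) * math.pi / 4.0
    return volume * math.cos(theta), volume * math.sin(theta)


# Listener at the origin facing +y; one source straight ahead, one to the right.
ahead = localize((0.0, 0.0), math.pi / 2, (0.0, 2.0))
right = localize((0.0, 0.0), math.pi / 2, (1.0, 0.0))
```

Because only two gains per source are computed locally from positions and orientations, no extra audio data needs to cross the network. Note that this reduced model cannot distinguish a source in front of the listener from one directly behind.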