In service computing, online services and the Internet environment evolve continually, so service composition must be adaptive. At the same time, it must remain efficient when the number of candidate services is very large. This paper therefore presents a new model for large-scale, adaptive service composition based on multi-agent reinforcement learning. The model integrates on-policy reinforcement learning with game theory: the former provides adaptability and good online performance in a highly dynamic environment, while the latter enables multiple agents to cooperate on a common task (i.e., composition). In particular, we propose a multi-agent SARSA (State-Action-Reward-State-Action) algorithm that is expected to outperform single-agent reinforcement learning methods within our composition framework. An experimental evaluation demonstrates the features of our approach.
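For readers unfamiliar with SARSA, the sketch below shows the standard single-agent, tabular SARSA update applied to a toy sequential composition. It is not the multi-agent algorithm proposed here, and it omits the game-theoretic coordination. The mapping used (state = index of the abstract task being bound, action = index of a candidate service, reward = a scalar QoS score) is an illustrative assumption, as are the helper names `epsilon_greedy`, `run_episode`, `greedy_plan` and the QoS values. What makes it "on-policy" is that the update bootstraps from Q(s', a') for the action a' actually chosen by the same epsilon-greedy policy, rather than from the maximum over a' as Q-learning would.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Ties are broken in favour of the lowest-numbered action.
    return max(actions, key=lambda a: q[(state, a)])


def run_episode(q, qos, rng, alpha=0.1, gamma=0.9, epsilon=0.1):
    """One tabular SARSA episode over a sequential workflow (hypothetical setup).

    State t is the index of the abstract task being bound, action c is the index
    of a candidate service, and qos[t][c] is an assumed scalar QoS reward.
    """
    n_tasks = len(qos)
    s = 0
    a = epsilon_greedy(q, s, list(range(len(qos[s]))), epsilon, rng)
    while True:
        r = qos[s][a]
        s_next = s + 1
        if s_next == n_tasks:
            # Terminal step: there is no next state to bootstrap from.
            q[(s, a)] += alpha * (r - q[(s, a)])
            return
        # On-policy: pick a' with the same epsilon-greedy policy, then
        # bootstrap from Q(s', a') instead of max over a'.
        a_next = epsilon_greedy(q, s_next, list(range(len(qos[s_next]))), epsilon, rng)
        q[(s, a)] += alpha * (r + gamma * q[(s_next, a_next)] - q[(s, a)])
        s, a = s_next, a_next


def greedy_plan(q, qos):
    """Read off the greedy candidate index for each task."""
    return [max(range(len(cands)), key=lambda c: q[(t, c)]) for t, cands in enumerate(qos)]


qos = [[0.2, 0.9], [0.8, 0.1]]  # two tasks, two candidates each (made-up values)
q = defaultdict(float)
rng = random.Random(0)
for _ in range(2000):
    run_episode(q, qos, rng)
print(greedy_plan(q, qos))  # [1, 0]
```

Because the agent keeps exploring while it acts, its value estimates track the policy it is actually executing, which is the property the abstract relies on for good online performance in a changing environment.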