This paper describes a new group-oriented distributed shared memory (GDSM) computation model and its hardware support for improving performance in large-scale shared-memory multiprocessors. The GDSM model is based on the concept of a group as a basic computational unit: a set of parallel threads that cooperate and communicate by sharing an address space to solve a large problem. These threads are assumed to run in parallel on separate processors in a globally distributed shared-memory system. We consider parallel loops with inter-iteration data and control dependencies to be natural computational groups in scientific applications. The goals of introducing the group-oriented computation model are to 1) expose a spatial (logical group) dimension in the creation and execution of a parallel program on global heterogeneous systems and 2) exploit locality and predictability in data-sharing and processor-communication patterns by relaxing the release memory consistency model and providing multi-protocol communication within groups. This paper shows how GDSM features can be integrated into existing cache memory systems to tolerate remote memory access latency. An example of using the group-oriented computation model for parallel calculation of the Fast Fourier Transform (FFT) is given.
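To make the group concept concrete, the following is a minimal sketch (not the paper's implementation) of the FFT use case: the parallel loop over each stage's butterflies forms a computational group, its iterations are partitioned among a fixed set of threads sharing one array, and the only synchronization is a group-wide barrier at stage boundaries, loosely mirroring the relaxed, group-scoped consistency described above. Python threads stand in for processors; the function name `parallel_fft` and the worker count are illustrative choices, not from the paper.

```python
import cmath
import threading

def parallel_fft(x, num_workers=2):
    """Iterative radix-2 FFT over a shared array.

    Each stage's independent butterfly blocks are split round-robin
    among a 'group' of worker threads; a barrier between stages is
    the group's only synchronization point (a stand-in for the
    group-scoped release consistency discussed in the abstract).
    Assumes len(x) is a power of two.
    """
    n = len(x)
    bits = n.bit_length() - 1  # log2(n)
    # Bit-reversal permutation into the shared array.
    a = [x[int(format(i, f'0{bits}b')[::-1], 2)] for i in range(n)]
    barrier = threading.Barrier(num_workers)

    def worker(wid):
        for s in range(1, bits + 1):
            m = 1 << s  # butterfly span at this stage
            for bi, k in enumerate(range(0, n, m)):
                if bi % num_workers != wid:
                    continue  # block belongs to another group member
                for j in range(m // 2):
                    w = cmath.exp(-2j * cmath.pi * j / m)
                    t = w * a[k + j + m // 2]
                    u = a[k + j]
                    a[k + j] = u + t
                    a[k + j + m // 2] = u - t
            barrier.wait()  # group-wide synchronization between stages

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a
```

Within a stage no two threads touch the same block, so members of the group need not observe each other's writes until the barrier, which is exactly the communication pattern a relaxed, group-scoped protocol can exploit.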