The memory system is a major bottleneck in achieving high performance and energy efficiency on a wide range of processing platforms. This thesis aims to improve the memory performance and energy efficiency of data-intensive applications through a two-pronged approach that combines a formal representation framework with a hardware substrate that can efficiently reorganize data in memory. The proposed formal framework enables representing and systematically manipulating data layout formats, address mapping schemes, and memory access patterns through permutations in order to exploit the locality and parallelism in memory.

Driven by the implications of the formal framework, this thesis presents the HAMLeT architecture for highly concurrent, energy-efficient, and low-overhead data reorganization performed completely in memory. Although data reorganization simply relocates data in memory, it is costly on conventional systems, mainly due to inefficient access patterns, limited data reuse, and round-trip data traversal through the memory hierarchy. HAMLeT pursues a near-data processing approach that exploits 3D-stacked DRAM technology. Integrated into the logic layer and interfaced directly to the local controllers, it takes advantage of internal fine-grain parallelism, high bandwidth, and locality that are otherwise inaccessible. Its parallel streaming architecture extracts high throughput within stringent power, area, and thermal budgets.

The thesis evaluates the efficient data reorganization capability provided by HAMLeT through several fundamental use cases. First, it demonstrates software-transparent data reorganization performed in memory to improve memory access efficiency. A proposed hardware monitoring mechanism detects inefficient memory usage and issues a data reorganization to adopt a data layout and address mapping optimized for the observed memory access patterns. This mechanism operates transparently and requires no changes to the user software; HAMLeT handles the remapping and its side effects entirely in hardware. Second, HAMLeT provides an efficient substrate for explicitly reorganizing data in memory, which enables offloading and accelerating common data reorganization routines found in high-performance computing libraries (e.g., matrix transpose, scatter/gather, permutation, and pack/unpack). Third, explicitly performed data reorganization makes the data layout and address mapping part of the algorithm design space. Exposing these memory characteristics to the algorithm design space creates opportunities for algorithm/architecture co-design, where co-optimized computation flow, memory accesses, and data layout lead to new algorithms that are conventionally avoided.