As process technologies have scaled, the increasing number of processor cores and memories on a single die has driven the need for more complex on-chip interconnection networks. Crossbar switches are primary building blocks in such networks-on-chip, as they can be used as fast single-stage networks or as the core of the router switch in multi-stage networks. While crossbars offer non-blocking, single-hop, all-to-all communication, they tend to scale poorly with the number of nodes due to the latency and energy of the long wires and high-radix multiplexor structures needed. In this work, we investigate how to improve crossbar performance, energy efficiency, and scalability. To better understand the design space and scaling limitations, we have developed an on-chip switch modeling tool calibrated using circuit-level simulations. The tool enables a design space exploration showing how area, power, and performance vary across radix, data width, wire parameters, and circuit implementation. In addition to conventional design options, we examined capacitively coupled low-swing signaling to reduce the energy consumption of the I/O wires. This exploration shows that the main bottlenecks are the long I/O wires and that the key to improving performance and efficiency is to minimize area. Using these insights, we present modular crossbar switches that can perform better at high radices than monolithic designs. The modular sub-blocks are arranged in a controlled flow-through, pipelined scheme to eliminate global connections and maintain linear performance scaling and high throughput. Modularity also enables energy savings via deactivation of unused I/O wires. To evaluate our design, we implemented a prototype radix-64 modular crossbar switch testchip in a 40nm bulk CMOS process. The testchip operates at 2.38GHz at 1V nominal supply voltage and consumes 1.2W of power. It offers 2.2X better throughput and 2.4X better energy efficiency than published state-of-the-art designs.
We further evaluated modular crossbar networks with the proposed crossbar switches using BookSim2, a network-on-chip evaluation tool. The proposed design achieves more than 90% saturation throughput with an internal speedup of 1.5, supports high data line rates, and offers lower average network latency compared to conventional crossbars. Evaluation results show that modular crossbars are scalable to high radices while still offering high performance, energy efficiency, and one-hop simplicity.