Machine learning approaches have been widely adopted in recent years due to their ability to learn from data rather than relying on manually hand-tuned features. We investigate two important aspects of machine learning methods: (i) applying machine learning to computing system optimization, and (ii) optimizing machine learning algorithms, especially deep convolutional neural networks, so that they can train and infer efficiently.

As power emerges as the main constraint for computing systems, controlling power consumption under a given Thermal Design Power (TDP) while maximizing performance becomes increasingly critical. Meanwhile, applications must satisfy performance constraints to ensure Quality of Service (QoS). Learning approaches have drawn significant attention recently due to their ability to adapt to the ever-increasing complexity of systems and applications. In this thesis, we propose On-line Distributed Reinforcement Learning (OD-RL) based algorithms for many-core system performance improvement under both power and performance constraints. The experiments show that, compared to state-of-the-art algorithms, our approach 1) produces up to 98% less power-budget overshoot, 2) achieves up to 23% higher energy efficiency, and 3) runs two orders of magnitude faster on systems with hundreds of cores, while an improved version better satisfies performance constraints. To further improve the sample efficiency of RL algorithms, we propose a novel Bayesian Optimization approach that speeds up reinforcement-learning-based Dynamic Voltage and Frequency Scaling (DVFS) control by 37.4x while maintaining the performance of the best rule-based DVFS algorithm.

Convolutional Neural Networks (CNNs) have shown unprecedented capability in visual learning tasks. This accuracy, however, comes at a cost: CNNs are computationally intensive and energy-demanding on modern computer systems. We propose Virtual Pooling (ViP), a model-level approach with a provable error bound that improves the inference speed and energy consumption of CNN-based image classification and object detection. We show the efficacy of ViP through extensive experiments. For example, ViP delivers a 2.1x speedup with less than 1.5% accuracy degradation in ImageNet classification on VGG-16, and a 1.8x speedup with 0.025 mAP degradation in PASCAL VOC object detection with Faster R-CNN. ViP also reduces mobile GPU and CPU energy consumption by up to 55% and 70%, respectively.

We further propose training CNNs with fine-grain labels, which improves not only test accuracy but also training-data efficiency. For example, a CNN trained with fine-grain labels and only 40% of the training data can achieve higher accuracy than a CNN trained with the full training dataset and coarse-grain labels.
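To make the OD-RL approach concrete, below is a minimal sketch of a per-core tabular Q-learning DVFS agent in Python. The frequency levels, state encoding, reward weights, and all names are illustrative assumptions rather than the exact OD-RL formulation evaluated in the thesis:

import random
from collections import defaultdict

# Available DVFS frequency steps in GHz (assumed values for illustration).
FREQ_LEVELS = [0.8, 1.2, 1.6, 2.0]

class CoreAgent:
    """Tabular Q-learning agent controlling one core's frequency level."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(lambda: [0.0] * len(FREQ_LEVELS))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy action selection over frequency levels.
        if random.random() < self.epsilon:
            return random.randrange(len(FREQ_LEVELS))
        qs = self.q[state]
        return max(range(len(qs)), key=qs.__getitem__)

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        td_target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (td_target - self.q[state][action])

def reward(ips, power, budget, penalty=10.0):
    # Reward throughput (instructions per second) and heavily penalize
    # overshooting the core's share of the TDP budget; the penalty weight
    # is an assumed knob, not a value from the thesis.
    return ips - penalty * max(0.0, power - budget)

In a distributed setting such as OD-RL's, one lightweight agent of this kind runs per core, which is what allows the approach to scale to systems with hundreds of cores.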
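One plausible way to read the Bayesian Optimization speedup is as a sample-efficient search over the RL controller's configuration: a Gaussian-process surrogate proposes promising settings so that far fewer full training runs are needed. The sketch below uses scikit-optimize's gp_minimize; the tuned hyperparameters, their ranges, and the placeholder objective are assumptions for illustration, not the thesis's formulation:

from skopt import gp_minimize
from skopt.space import Real, Integer

def train_and_evaluate(params):
    alpha, gamma, bins = params
    # Placeholder objective: in practice, train the RL-based DVFS controller
    # with these hyperparameters and return a cost to minimize, e.g. the
    # negative energy efficiency averaged over a benchmark suite.
    return (alpha - 0.1) ** 2 + (gamma - 0.95) ** 2 + abs(bins - 16) / 16.0

space = [
    Real(0.01, 0.5, name="alpha"),    # learning rate
    Real(0.80, 0.999, name="gamma"),  # discount factor
    Integer(4, 32, name="bins"),      # state-discretization granularity
]

# Each objective call is expensive (a full RL training run), so the GP
# surrogate's ability to pick informative points is what yields the
# wall-clock savings over grid or random search.
result = gp_minimize(train_and_evaluate, space, n_calls=30, random_state=0)
print("best parameters:", result.x, "best cost:", result.fun)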
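Virtual Pooling's core mechanism can be sketched in a few lines of PyTorch: compute a convolution only at strided output locations, then interpolate the skipped ones to recover a full-resolution feature map. The stride-2 setting and bilinear upsampling below are stand-ins; the thesis derives the actual interpolation scheme and its error bound:

import torch
import torch.nn.functional as F

def vip_conv2d(x, weight, bias=None, padding=1):
    # Evaluate the convolution on a stride-2 grid, computing only ~1/4 of
    # the output locations.
    coarse = F.conv2d(x, weight, bias, stride=2, padding=padding)
    # Interpolate the skipped locations back to full resolution (assumes
    # 'same'-style padding so the full output matches the input size).
    return F.interpolate(coarse, size=x.shape[-2:], mode="bilinear",
                         align_corners=False)

x = torch.randn(1, 64, 56, 56)   # dummy feature map
w = torch.randn(128, 64, 3, 3)   # 3x3 convolution weights
y = vip_conv2d(x, w)             # full-size output from ~4x fewer MACs
print(y.shape)                   # torch.Size([1, 128, 56, 56])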
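Finally, the fine-grain-label result can be illustrated with a toy end-to-end sketch: supervise the network with fine labels during training, then evaluate at the coarse granularity by aggregating fine-class probabilities into their coarse parents. The 10-fine/2-coarse mapping, tiny linear model, and aggregation rule are hypothetical stand-ins for the datasets and networks used in the thesis:

import torch
import torch.nn as nn

NUM_FINE, NUM_COARSE = 10, 2
# Hypothetical mapping: fine class i belongs to coarse class fine_to_coarse[i].
fine_to_coarse = torch.tensor([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, NUM_FINE))
criterion = nn.CrossEntropyLoss()          # supervised with fine labels

def coarse_predict(logits):
    # Sum fine-class probabilities within each coarse class, then pick the
    # highest-scoring coarse class.
    probs = logits.softmax(dim=1)
    coarse = torch.zeros(logits.size(0), NUM_COARSE)
    coarse.index_add_(1, fine_to_coarse, probs)
    return coarse.argmax(dim=1)

x = torch.randn(4, 3, 32, 32)              # dummy image batch
fine_labels = torch.randint(0, NUM_FINE, (4,))
loss = criterion(model(x), fine_labels)    # a training step would backprop this
pred_coarse = coarse_predict(model(x))     # coarse-level evaluation

The intuition is that fine labels carry more information per example, so the network needs fewer examples to separate the coarse classes.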