<p dir="ltr">Mobile robots have made great progress and are beginning to have significant real-world impact. For example, mobile robots explore environments in search-and-rescue scenarios where humans have limited access, quadrupedal robots and drones navigate around and inspect infrastructure, and, more recently, humanoid robots move and perform tasks in factories and even in household environments. We are witnessing robots gaining ever more mobility and dexterity. However, significant challenges remain when deploying mobile robots in real-world environments. </p><p dir="ltr">When deployed in a new environment, a robot explores it, incrementally acquires new information, and decides on subsequent exploration goals. Classic robot exploration methods usually rely on purely geometric search-based or sampling-based planners and typically have no prior information about the environment to be explored. These geometric approaches often ignore other important factors, such as semantic information, and they make it hard to weight the importance of different exploration tasks, leaving the overall method inflexible. In our work, we augment classic geometric exploration with learning-based components, making robotic exploration more intelligent and flexible. Our method outperforms frontier-based and other learning-based methods by 10-20% in real-world exploration tests. </p><p dir="ltr">Exploration and navigation are essential, but they are not the whole story of a mobile robot completing tasks specified by humans. We therefore explore the possibility of combining mobile robot navigation and manipulation in a unified framework. It is essential to first select a suitable representation for mobile manipulation tasks. Given that navigation and manipulation require representations at different levels of granularity, a coarse-to-fine representation is arguably a good choice. 
In our work, we present a scene-level neural feature field based on a generalizable NeRF that acts as a unified representation for both navigation and manipulation and runs in real time. To do so, we treat generative novel view synthesis as a pre-training task, and then align the resulting rich scene priors with natural language via CLIP feature distillation. We demonstrate the effectiveness of this approach by deploying the representation on a quadrupedal robot equipped with a manipulator. Our approach improves real-world navigation and pick-and-place success rates by about 30% compared with baseline methods. </p><p dir="ltr">With the advancement of radiance fields, 3D Gaussian Splatting (3DGS) is becoming increasingly popular thanks to its fast training and high-quality rendering. In our previous work, the feature field had to be pre-trained for nearly 5 days on a large-scale dataset to achieve generalization; with Gaussian Splatting, we can instead build the semantic field within a few minutes on top of the explicit 3D reconstruction. Based on this representation, we aim to make navigation and manipulation work more synergistically. In the future, we plan to explore further possibilities for mobile robot exploration using pure visual cues, and to tackle more dexterous tasks beyond mobile pick and place.</p>
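<p dir="ltr">For readers unfamiliar with the frontier-based baseline referred to above, the core idea is to mark free cells of an occupancy grid that border unknown space as candidate exploration goals. The following is a minimal illustrative sketch, not code from our system; the function name, cell labels, and toy map are assumptions for exposition.</p>

```python
import numpy as np

# Illustrative occupancy-grid labels (assumed, not from our system).
FREE, OCCUPIED, UNKNOWN = 0, 1, -1

def find_frontiers(grid):
    """Return (row, col) of frontier cells: free cells with an unknown 4-neighbor."""
    frontiers = []
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] != FREE:
                continue
            # A free cell touching unknown space is a frontier candidate.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr, nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break
    return frontiers

# Toy 4x4 map: left two columns explored free, right two columns unknown.
grid = np.full((4, 4), UNKNOWN)
grid[:, :2] = FREE
print(find_frontiers(grid))  # → [(0, 1), (1, 1), (2, 1), (3, 1)]
```

<p dir="ltr">A purely geometric planner then picks the nearest or largest such frontier as the next goal; our learning-based components instead re-weight these candidates with semantic and task-specific cues.</p>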