Convolutional neural networks (CNNs), such as U-Net, have shown competitive performance in the automatic extraction of buildings from very high-resolution (VHR) remotely sensed imagery. However, due to unstable multi-scale context aggregation, insufficient fusion of multi-level features, and a lack of attention to semantic boundaries, most existing CNNs produce incomplete segmentations of large buildings and yield highly uncertain predictions at building boundaries. This paper presents a novel network embedded with a dedicated boundary-aware loss, called the Boundary-aware Refined Network (BARNet), to address these gaps. The distinguishing components of BARNet are the gated-attention refined fusion unit (GARFU), the denser atrous spatial pyramid pooling (DASPP) module, and the boundary-aware (BA) loss. The performance of BARNet is evaluated on two popular benchmark datasets that cover diverse urban scenes and building patterns. Experimental results demonstrate that the proposed method outperforms several state-of-the-art (SOTA) approaches in both visual interpretation and quantitative evaluation.
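To make the multi-scale context aggregation idea concrete, the following is a minimal PyTorch sketch of an atrous spatial pyramid pooling module with a denser set of dilation rates. The rates, channel widths, and class name here are illustrative assumptions, not the exact DASPP configuration used in BARNet.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenserASPP(nn.Module):
    """ASPP-style context aggregation with a denser set of dilation rates.

    Assumption: the rates (1, 2, 4, 8, 12) and channel widths are illustrative,
    not the configuration reported for BARNet's DASPP module.
    """

    def __init__(self, in_ch: int, out_ch: int, rates=(1, 2, 4, 8, 12)):
        super().__init__()
        # One 3x3 atrous branch per rate; denser rates give finer scale coverage.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # Image-level pooling branch for global context, as in standard ASPP.
        self.global_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.ReLU(inplace=True),
        )
        # Project the concatenated branches back to out_ch channels.
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        g = self.global_pool(x)
        g = F.interpolate(g, size=(h, w), mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [g], dim=1))


# Example: aggregate multi-scale context from a 512-channel backbone feature map.
if __name__ == "__main__":
    x = torch.randn(1, 512, 32, 32)
    print(DenserASPP(512, 256)(x).shape)  # torch.Size([1, 256, 32, 32])
```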