Machine learning, particularly through advanced imaging techniques such as three-dimensional Magnetic Resonance Imaging (MRI), has significantly improved medical diagnostics. This is especially critical for diagnosing complex conditions like Alzheimer's disease. Our study introduces Triamese-ViT, an innovative Tri-structure of Vision Transformers (ViTs) that incorporates a built-in interpretability function, it has structure-aware explainability that allows for the identification and visualization of key features or regions contributing to the prediction, integrates information from three perspectives to enhance brain age estimation. This method not only increases accuracy but also improves interoperability with existing techniques. When evaluated, Triamese-ViT demonstrated superior performance and produced insightful attention maps. We applied these attention maps to the analysis of natural aging and the diagnosis of Autism Spectrum Disorder (ASD). The results aligned with those from occlusion analysis, identifying the Cingulum, Rolandic Operculum, Thalamus, and Vermis as important regions in normal aging, and highlighting the Thalamus and Caudate Nucleus as key regions for ASD diagnosis.