Corpus linguistics, with its methodological orientation towards the empirical analysis of language based on large text collections, has the potential to offer significant tools for addressing real-world problems across various social science domains, including climate change, criminology, healthcare and policy making. Despite this potential, the integration of corpus linguistics into social science disciplines (beyond linguistics) remains hampered by fundamental differences in epistemology, definitions and methodological approaches. This article explores the relationship between corpus linguistics and the social sciences. It argues that epistemology, or the theory of knowledge, represents a primary barrier to integration, since much corpus linguistics research aligns with positivist and naturalist epistemologies. By contrast, many social science disciplines embrace more interpretive, conventionalist approaches that account for the dynamic nature of social phenomena. Considering the role of naturalism and conventionalism within both corpus linguistics and the social sciences, this article illustrates how these epistemological stances are likely to influence the acceptance and use of corpus methods in social science research. These challenges notwithstanding, areas of convergence (e.g. the shared use of data processing tools and the acknowledgement of the central role of language in social processes) provide opportunities for cross-disciplinary collaboration. As a means of bridging the epistemological divide, this article advocates a critical realist approach and concludes by calling on users of corpus linguistic methods to be reflexive and transparent about their epistemological stances when reporting their research.