Much compiler-oriented work on mapping parallel programs to parallel architectures has ignored the issue of external workload. Because most platforms are not dedicated to a single task at a time, the impact of other jobs needs to be addressed. As mapping is highly dependent on the underlying machine, a technique that is easily portable across platforms is also desirable.
In this paper we develop an approach for predicting the optimal number of threads for a given data-parallel application in the presence of external workload. We achieve 93.7% of the maximum available speedup, which gives an average speedup of 1.66 on 4 cores, a factor of 1.24 better than the OpenMP compiler's default policy. We also develop an alternative cooperative model that minimizes the impact on external workload while still giving an improved average speedup. Finally, we evaluate our approach on a separate 8-core machine, where it gives an average speedup of 1.33 times over the default policy, showing the portability of our approach.