The last section in particular deserves a critical look. SSE and other multimedia extensions have been around for a long time, yet compilers are still far from producing optimal code for them. In fact, the compiler often cannot map arbitrary C code onto such extensions efficiently at all, because the code simply was not written with them in mind. In the end, OpenCL or CUDA also have to be targeted explicitly by the programmer so that the code runs efficiently on the GPU.
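To illustrate the SSE point, here is a minimal sketch in C. The function names `add_scalar` and `add_sse` are made up for this example and do not come from any particular code base. The first version leaves everything to the compiler; the second is written explicitly against SSE using the intrinsics from `<xmmintrin.h>`. On builds without SSE the second version simply falls back to the scalar loop.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Plain C: whether and how well this loop gets vectorized depends
 * entirely on the compiler and its flags. */
void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Hand-written variant (hypothetical example): the programmer decides
 * that four floats are processed per iteration. */
void add_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Remaining elements (or everything, if SSE is unavailable). */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Both functions compute the same result. For a loop this simple, a current compiler may well vectorize the scalar version on its own; the gap tends to show up with more complicated code, where the explicit version has to be designed around the extension from the start.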
Anyone who thinks that putting GPU functionality into the CPU will automatically speed up existing code by large factors (given a recompile) is hugely mistaken.
You can also look at it from a different angle: for years there have been attempts to overhaul existing processor architectures and introduce new concepts, VLIW processors being one example. In the end, none of that potential could be exploited, because the compilers were not able to generate suitable code, and similar problems will turn up here as well.