Intel is facing much stiffer competition from Arm at the mobile, low-power end and from GPU vendors at the high-performance computing (HPC) end. In response, and having bought QLogic's InfiniBand team and Cray's fabric (interconnect) team, Intel has launched the Xeon Phi.
The Xeon Phi has 64 x86 cores (256 hardware threads, four per core), each with a 512-bit vector unit. Because a double is 64 bits wide, each vector unit can execute 8 double-precision SIMD operations per cycle. The Xeon Phi runs at roughly 2 GHz (probably higher in later parts), so its peak is 2 GHz × 64 cores × 8 FLOPs per cycle, or about 1 TFLOPS.
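The peak figure can be checked directly from the numbers above. This is a back-of-the-envelope sketch that assumes one 8-wide double-precision vector operation per core per cycle and counts each lane as one FLOP, exactly as the text does:

```latex
\text{lanes per vector} = \frac{512\ \text{bits}}{64\ \text{bits per double}} = 8

\text{Peak} = f_{\text{clock}} \times N_{\text{cores}} \times \text{FLOPs per cycle per core}
            = (2\times10^{9}\ \text{s}^{-1}) \times 64 \times 8
            = 1.024\times10^{12}\ \text{FLOP/s} \approx 1\ \text{TFLOPS}
```

Real workloads will land below this peak, because it assumes every lane of every core is busy on every cycle.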
Although NVIDIA and AMD GPUs can deliver similar FLOPS, the Xeon Phi should be much easier to program than a GPU driven through CUDA or OpenCL. The same development tools used on regular Xeons are available: OpenMP, Intel's Threading Building Blocks (TBB), MPI and the Math Kernel Library (MKL).