Re: Not an expert on AI
In certain application domains (signal processing is my area of interest), never count on an FPGA being quicker or more efficient than, say, a CPU. The PowerPC 7400, the first one with AltiVec, could do a floating-point 1k complex FFT quicker than whatever Xilinx were selling at the time (Virtex 2?) could do a fixed-point equivalent.
Also, depending on the exact nature of the algorithm being implemented, other factors such as memory bandwidth play a big role. If your algorithm needs to chew through several tens of megabytes at a time, there's a good chance that a CPU will be quicker. Modern Intel / AMD CPUs have fantastic memory systems, and matching that in an FPGA is effectively impossible. FPGAs are OK-ish, so long as the data to be processed fits inside the chip.
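As a rough illustration of the working-set point, here is a back-of-envelope sketch in Python. The function names and every number in it are made-up placeholders, not figures for any real part; plug in values from the datasheets you care about.

```python
def fits_on_chip(working_set_bytes: int, on_chip_bytes: int) -> bool:
    """True if the whole working set can live in on-chip memory."""
    return working_set_bytes <= on_chip_bytes


def stream_time_s(working_set_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Lower bound on the time for one pass over the data from external memory."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return working_set_bytes / bandwidth_bytes_per_s


MiB = 1024 * 1024

# Hypothetical placeholder values, for illustration only.
working_set = 40 * MiB   # "several tens of megabytes"
on_chip = 4 * MiB        # assumed on-chip RAM budget
external_bw = 10e9       # assumed external memory bandwidth, bytes/s

if fits_on_chip(working_set, on_chip):
    print("working set fits on chip")
else:
    t = stream_time_s(working_set, external_bw)
    print(f"does not fit; one pass over the data takes at least {t * 1e3:.2f} ms")
```

This only bounds the raw data movement. It says nothing about access patterns or latency, which is often where a CPU's caches and prefetchers earn their keep.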
In my experience (mainly signal processing), FPGAs are worthwhile only in circumstances where one knows exactly what the algorithm needs to be. In FPGA development there are a ton of things that get in the way of progress; it's just so slow to do things such as place & route. If one is still developing an algorithm, it's almost certainly better not to use an FPGA.
I've noticed another trend recently. I've done systems that started off as pure CPU, and then as time passed and parts of the algorithm became settled it was worthwhile getting an FPGA involved. However, CPUs have made such tremendous strides over the past 10 years that there's no point involving an FPGA any more, simply because the CPUs now have so much performance. There's no point using an FPGA for the hell of it; it's just a wasted chip. That is an application-domain-specific observation, but it's interesting to see other fields beginning to think along similar lines.
And let's not forget ARMs. There are some really, really good ARM SoCs nowadays that have pretty good compute performance at surprisingly low power consumption. For a lot of signal processing applications there's plenty of compute in an ARM. Why go to the effort of gluing down an FPGA when a cheap ARM can do the same job, is far easier to develop for, and gets you a Linux environment?
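To put a number on "plenty of compute", a quick sanity check is to estimate the sustained floating-point rate the job needs and compare it with what the part can deliver. A sketch in Python, using the usual 5*N*log2(N) estimate for a radix-2 complex FFT; the sample rate and the available GFLOPS below are hypothetical placeholders, not measurements of any real SoC.

```python
import math


def fft_flops(n: int) -> float:
    """Approximate real flops for one radix-2 complex FFT of length n (5*n*log2(n))."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    return 5.0 * n * math.log2(n)


def required_gflops(n: int, sample_rate_hz: float) -> float:
    """Sustained GFLOPS needed for back-to-back, non-overlapping FFTs of length n."""
    ffts_per_second = sample_rate_hz / n
    return fft_flops(n) * ffts_per_second / 1e9


# Hypothetical placeholder values, for illustration only.
n = 1024              # a 1k complex FFT
sample_rate = 100e6   # assumed complex samples per second
available = 10.0      # assumed sustained GFLOPS the SoC can actually deliver

need = required_gflops(n, sample_rate)
print(f"need about {need:.1f} GFLOPS, have {available:.1f}")
print("fits the budget" if need <= available else "does not fit the budget")
```

The estimate covers the FFT arithmetic only. Windowing, filtering and memory traffic all eat into the budget, so treat a narrow pass as a fail.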
I know that a lot of FPGAs have ARM cores inside, but they're there to make use of the FPGA logic, not to provide any decent math performance of their own. If an ARM SoC has a decent enough SIMD unit, why bother with the FPGA part?