
What is your background and what do you want out of it?

IMO a good way to start is by buying an FPGA development kit that has a bunch of I/O and cool things to play with.

The FPGA kit will come with some golden designs and instructions to download and install the EDA software. Get comfortable with the FPGA software and make some LEDs blink.

Now you're in a position to download a hardware design to the FPGA and use the pins!

RTL Hardware Design Using VHDL: Coding for Efficiency, Portability, and Scalability is an amazing book for learning how to do RTL logic. Buy the book, read it, and work through the examples!

Now you have a kit and you know RTL! Next you can learn the algorithms you want and implement them!

My workflow used to be something like:

1. Implement the algorithm using fixed-point math in Matlab (or Octave)

2. Develop the VHDL equivalent of (1) and test it with something like ModelSim

3. Synthesize the RTL with the EDA software and download the result to the board!
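For anyone wondering what step (1) looks like in practice, here is a minimal sketch in Python rather than Matlab. The 16-bit word with 15 fractional bits (Q1.15) and the helper names `to_fixed`, `to_float` and `fixed_mul` are my own assumptions for illustration, not something from the workflow above. The idea is to model the exact rounding and saturation the hardware will do, so the software results can later be compared bit for bit with the VHDL simulation.

```python
def to_fixed(x, frac_bits=15, total_bits=16):
    """Quantize a float to a signed fixed-point integer, saturating on overflow."""
    scaled = int(round(x * (1 << frac_bits)))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, scaled))


def to_float(q, frac_bits=15):
    """Convert a fixed-point integer back to a float for inspection."""
    return q / (1 << frac_bits)


def fixed_mul(a, b, frac_bits=15, total_bits=16):
    """Multiply two fixed-point values and rescale to the same format.

    The full-width product has 2 * frac_bits fractional bits, so it is
    shifted right by frac_bits (dropping the low bits) and then saturated.
    """
    product = (a * b) >> frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, product))


half = to_fixed(0.5)
quarter = fixed_mul(half, half)
print(half, quarter, to_float(quarter))
```

Note that 1.0 is not representable in this format and saturates to the largest positive value, which is exactly the kind of detail you want to discover here and not on the board.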

How you test step (3) really depends on what your algorithm is. If you can easily control your inputs and outputs through a computer connected to the FPGA, then that is a good way to go about it! You might also want to develop an RTL module that connects to your design to generate the inputs and capture the outputs, or buy some expensive hardware to verify that your design is working as intended.
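One way to do the computer-driven testing described above is to generate test vectors from the software model, feed them to the design, and diff what comes back. A rough sketch, assuming 16-bit signed samples exchanged as hex strings; the `golden_model` here is a made-up wrapping accumulator standing in for whatever your real algorithm is, and how the vectors actually reach the board (UART, JTAG, a file read by the testbench) is left out.

```python
def to_hex(value, bits=16):
    """Encode a signed integer as a two's-complement hex string."""
    return format(value & ((1 << bits) - 1), "0{}x".format(bits // 4))


def from_hex(text, bits=16):
    """Decode a two's-complement hex string back to a signed integer."""
    raw = int(text, 16)
    return raw - (1 << bits) if raw >= (1 << (bits - 1)) else raw


def golden_model(samples, bits=16):
    """Hypothetical reference design: an accumulator that wraps like a register."""
    outputs = []
    acc = 0
    for sample in samples:
        acc = from_hex(to_hex(acc + sample, bits), bits)  # wrap to `bits` bits
        outputs.append(acc)
    return outputs


def compare(expected, actual):
    """Return (index, expected, actual) for every mismatching output."""
    return [(i, e, a) for i, (e, a) in enumerate(zip(expected, actual)) if e != a]


stimulus = [32767, 1, -5]
expected = golden_model(stimulus)
print([to_hex(s) for s in stimulus])  # vectors to send to the design
print(expected)                       # what the design should send back
```

An empty list from `compare` means the captured outputs match the model for that stimulus, which is evidence, not proof, that the design is right.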

Hopefully that is helpful. It is a vast field, and there are many algorithms, FPGA sizes, etc. A kit like the one I sent should be enough to go from 0 to a decent-size design.

This is very useful, thanks!

And to answer your question on background and aims: I'm a machine learning engineer focusing on probabilistic graphical models (i.e. not deep learning) and would like to better understand how dedicated compute could (or could not) improve computational efficiency.

For ML use cases, because there's so much data to process, GPUs usually give a better ROI from what I have seen, at least if your pipeline looks like: load a lot of data -> process data -> get output. If it is more of a streaming workload with less data, then an FPGA might work out fine.

Also, hardware design is very different than software design. If you don't have a digital hardware engineer in your team it might be tricky to get what you want out of the board.

The other comment about a kit is a good suggestion. FPGA cloud instances are also an option.

Register Transfer Level (RTL) development generally has less pleasant tooling and a much longer feedback cycle than software development. So expect some friction there.

Also, for the algorithms part, you might have something in software that isn’t a great match to hardware and will need some re-architecting. Sorting is a good example of this. Check out “sorting networks” for a popular way of accomplishing sorting in hardware.
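To make the sorting network idea concrete, here is a small software model. It uses odd-even transposition, which I picked only because it is simple to write down; it is one construction among several and not the most efficient one. The function names are mine. The point to notice is that the sequence of compare-and-swap operations is fixed ahead of time and does not depend on the data, and the comparators within a layer touch disjoint wires, so in hardware each layer can run in parallel.

```python
def odd_even_transposition_network(n):
    """Build n layers of comparators (i, i + 1) for an n-input network.

    Even layers compare wires (0,1), (2,3), ...; odd layers compare
    (1,2), (3,4), ... Comparators in one layer never share a wire.
    """
    return [
        [(i, i + 1) for i in range(layer % 2, n - 1, 2)]
        for layer in range(n)
    ]


def apply_network(layers, values):
    """Run the values through the network, one compare-and-swap at a time."""
    wires = list(values)
    for layer in layers:
        for i, j in layer:
            if wires[i] > wires[j]:
                wires[i], wires[j] = wires[j], wires[i]
    return wires


network = odd_even_transposition_network(4)
print(network)
print(apply_network(network, [4, 3, 2, 1]))
```

Compare this with something like quicksort, where what gets compared next depends on earlier results; that data-dependent control flow is the part that maps poorly to hardware.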

I did a master’s thesis on transitioning a graph inference algorithm into an FPGA implementation. Send me an email if you have any specific questions.