ECE 5530 Homeworks

  1. N-Queens on an FPGA
  2. N-Quuens on an FPGA and Softcore
  3. Impulse C Hash Function
  4. Cell Matrix
  5. C-Like HDLs Response
  6. Configurable Device Comparison

N-Queens on an FPGA

Software

I implemented my design in software, nqueens.c. The method I used in software to check board validity (to check how many locations passed) would become a problem in hardware. My software solution is also purposefly primitive since I was trying to model what I could do in hardware.

FPGA Implementation

My hardware implementation is in hw1_topmod.v, and is controlled by nqueens_pl. (Rename this file locally to nqueens.pl to run it; if I indicate it is a Perl script, this server tries to execute it when it is viewed.) The implementation only works when N is a power of 2, and N = 16 takes too long to execute on the DINI boards, so effectively it only works for N = 1, 2, 4 and 8. When N = 8, it takes less than two seconds.

Originally, I wanted to implement the game board as N N-bit counters. I created the modules counter.v and board_generator.v for this purpose. The idea behind board_generator is that the overflow from counter N-1 feeds into the enable input for counter N. (Note that these modules are not parameterized to N, but are hardcoded for 4.) I had difficulty testing counter and board_generator, so I decided I was better off implementing the board validation logic first. Implementing this logic took up the rest of my time. My N N-bit counters simply became a register of N*log(N) bits.

My original design was to increment a counter for every valid comparison of board[i] and board[j], and then compare this counter to an expected value. This works fine in software, but the difficulty in hardware is that the loops are synthesized to combinational logic, and the counter is not incremented in the manner I expected. Eventually I realized that it was simpler to check if two board locations did not pass. Any invalid combination results in failure, so I only need to record one failure. This approach translates better to hardware.


N-Queens on an FPGA and Softcore

Design

I decided to offload board generation to the OpenFire. I did this for two reasons: board checking can be done in parallel in hardware, and in software I can recycle my original C implementation from HW1 that allows N to be numbers other than a power of 2.

The same script from HW1 (nqueens_pl) controls execution.

Software

The C program which runs on the OpenFire is main.c. In order to communicate the board to the hardware, I had to pack an array of integers into a 32-bit value. Since the size of a board is N*log(N) bits, the largest size of N I can pack in 32-bits is 8. Hence, my solution can only handle N=1...8. The only challenge on the software side was figuring out the exact bit-twiddling algorithm that would extract log(N) bits from an integer and assign that to the ith location of a 32-bit value.

Hardware

The board checking logic in hw2_topmod.v remains unchanged from the original, all hardware design. However, I had to change the control logic. The implementation is small, but it took a significant amount of time for me to understand the concepts that required new logic. I had assumed that when I read data from the FSL, that data would both be consumed with queue semantics and that the flag to indicate data on the FSL would become false. As far as I can tell, this is not the case. The extra logic ensures that I only check boards that I have not seen before. In the previous implementation, a board could be checked every clock cycle. In this implementation, some clock cycles will pass where a new board can't be checked because it has not been generated from the OpenFire yet.

This design could be extended to handle N > 8, but it would require a state machine on the hardware side to handle multiple 32-bit FSL transfers.


Impulse C Hash Function

The Hash Function

I chose the DJB hash function, designed by Daniel J. Bernstein, and found on Arash Partow's page. I implemented the simple function in hash.c so that I would have a reference for the other implementations. In all implementations I only compute a hash of the string "Hello World!".

Impulse C

Implementing the hash function in Impulse C was straight forward. I started from the HelloWorld example, as it already did the work of streaming a string to a hardware filter, and reading the results of that filter from software.

In order to go from the example to a hardware hash function that operated on a string, I only had to insert the hash logic into HashImpulse_sw.c and change the output stream type to an unsigned int in all files.

Verilog

I also implemented the hash function using Verilog, in hash_topmod.v, and controlled by hash_pl. This was also a straight forward implementation. The only pitfalls I encountered were Verilog for loops not synthesizing in the sequential manner my intuition expects (a lesson I had to relearn), and I had to reverse the endianness of the register holding the string.

Comparison

I implemented the Impulse C version much faster than the Verilog version. However, it actually took me longer to understand how I was going to structure the Impulse C version - but once I understood the structure, I was able to fall back on my understanding of sequential software. The basic implementation in Verilog was faster, but it took me significantly longer to debug. Debugging Impulse C is significantly easier, since I can inspect state at arbitrary points. However, I had to learn more concepts to understand how the Impulse C version would be implemented. After having designed and debugged the Verilog code above, implementing a simple hash function did not present a problem.

Such a simple task does not illuminate the advantages of using something like Impulse C. First of all, while I did not take advantage of it, the Impulse C implementation could have easily been extended to work on an arbitrary string as input. Achieving this in Verilog would have taken considerable more work (most of it debugging). For this reason, the fact that my Verilog implementation is shorter than my Impulse C implementation is misleading. Further, if I was to implement something more complex, I would prefer to do it in Impulse C. Even something like an MD5 hash, which was designed to be easily implementable in hardware, would still be easier to debug in Impulse C.

An exercise such as this is perhaps most valuable for someone who is already experienced with software, and is adapting to hardware. Since the computation itself is simple, the different thinking required to accomplish that computation in software, software acting as hardware and hardware alone is more apparent. Further, the Impulse C implementation provides a conceptual mapping from software to hardware. It literaly uses software constructs to express hardware implementation, which is useful to teach hardware design to those already experienced with software.


Cell Matrix

I logged into the MOD 88 interactive site, ran through some of the demos and experimented with the commands.


C-Like HDLs Response

I wrote a short response essay to the lecture describing various C-like HDLs titled Why Configurable Computing Doesn't Have a General Language. I anticipated some of the points Stephen Edwards raised in his paper, The Challenges of Synthesizing Hardware From C-Like Languages.


Configurable Device Comparsions

An overview of past and present configurable devices.