ECE 5530 Homeworks
- N-Queens on an FPGA
- N-Quuens on an FPGA and Softcore
- Impulse C Hash Function
- Cell Matrix
- C-Like HDLs Response
- Configurable Device Comparison
N-Queens on an FPGA
I implemented my design in software, nqueens.c. The
method I used in software to check board validity (to check how many locations
passed) would become a problem in hardware. My software solution is also purposefly
primitive since I was trying to model what I could do in hardware.
My hardware implementation is in hw1_topmod.v, and is controlled
by nqueens_pl. (Rename this file locally to nqueens.pl to
run it; if I indicate it is a Perl script, this server tries to execute it when it is
viewed.) The implementation only works when N is a power of 2, and N = 16 takes too long
to execute on the DINI boards, so effectively it only works for N = 1, 2, 4 and 8. When N = 8,
it takes less than two seconds.
Originally, I wanted to implement the game board as N N-bit counters. I created the modules
counter.v and board_generator.v for
this purpose. The idea behind board_generator is that the overflow from counter N-1
feeds into the enable input for counter N. (Note that these modules are not parameterized to
N, but are hardcoded for 4.) I had difficulty testing counter and
board_generator, so I decided I was better off implementing the board validation logic
first. Implementing this logic took up the rest of my time. My N N-bit counters simply became
a register of N*log(N) bits.
My original design was to increment a counter for every valid comparison of board[i] and
board[j], and then compare this counter to an expected value. This works fine in software,
but the difficulty in hardware is that the loops are synthesized to combinational logic, and the
counter is not incremented in the manner I expected. Eventually I realized that it was simpler to
check if two board locations did not pass. Any invalid combination results in failure, so
I only need to record one failure. This approach translates better to hardware.
N-Queens on an FPGA and Softcore
I decided to offload board generation to the OpenFire. I did this for two reasons: board
checking can be done in parallel in hardware, and in software I can recycle my original C
implementation from HW1 that allows N to be numbers other than a power of 2.
The same script from HW1 (nqueens_pl) controls execution.
The C program which runs on the OpenFire is main.c. In order to communicate
the board to the hardware, I had to pack an array of integers into a 32-bit value. Since the size
of a board is N*log(N) bits, the largest size of N I can pack in 32-bits is 8. Hence, my solution
can only handle N=1...8. The only challenge on the software side was figuring out the exact
bit-twiddling algorithm that would extract log(N) bits from an integer and assign that to the ith
location of a 32-bit value.
The board checking logic in hw2_topmod.v remains unchanged from the original,
all hardware design. However, I had to change the control logic. The implementation is small, but it took
a significant amount of time for me to understand the concepts that required new logic. I had assumed that
when I read data from the FSL, that data would both be consumed with queue semantics and that the flag
to indicate data on the FSL would become false. As far as I can tell, this is not the case. The extra
logic ensures that I only check boards that I have not seen before. In the previous implementation, a board
could be checked every clock cycle. In this implementation, some clock cycles will pass where a new board
can't be checked because it has not been generated from the OpenFire yet.
This design could be extended to handle N > 8, but it would require a state machine on the hardware side
to handle multiple 32-bit FSL transfers.
Impulse C Hash Function
The Hash Function
I chose the DJB hash function, designed by Daniel J. Bernstein, and found on
Arash Partow's page. I implemented the simple
function in hash.c so that I would have a reference for the other implementations. In all
implementations I only compute a hash of the string "Hello World!".
Implementing the hash function in Impulse C was straight forward. I started from the HelloWorld example, as
it already did the work of streaming a string to a hardware filter, and reading the results of that filter
In order to go from the example to a hardware hash function that operated on a string, I only had to insert
the hash logic into HashImpulse_sw.c and change the output stream type to
an unsigned int in all files.
I also implemented the hash function using Verilog, in hash_topmod.v, and
controlled by hash_pl. This was
also a straight forward implementation. The only pitfalls I encountered were Verilog for loops not synthesizing
in the sequential manner my intuition expects (a lesson I had to relearn), and I had to reverse the endianness
of the register holding the string.
I implemented the Impulse C version much faster than the Verilog version. However, it actually took me longer
to understand how I was going to structure the Impulse C version - but once I understood the structure, I was
able to fall back on my understanding of sequential software. The basic implementation in Verilog was faster,
but it took me significantly longer to debug. Debugging Impulse C is significantly easier, since I can
inspect state at arbitrary points. However, I had to learn more concepts to understand how the Impulse C version
would be implemented. After having designed and debugged the Verilog code above, implementing a simple
hash function did not present a problem.
Such a simple task does not illuminate the advantages of using something like Impulse C. First of all, while
I did not take advantage of it, the Impulse C implementation could have easily been extended to work on an
arbitrary string as input. Achieving this in Verilog would have taken considerable more work (most of it
debugging). For this reason, the fact that my Verilog implementation is shorter than my Impulse C implementation
is misleading. Further, if I was to implement something more complex, I would prefer to do it in Impulse C.
Even something like an MD5 hash, which was designed to be easily implementable in hardware, would still be
easier to debug in Impulse C.
An exercise such as this is perhaps most valuable for someone who is already experienced with software, and
is adapting to hardware. Since the computation itself is simple, the different thinking required to accomplish
that computation in software, software acting as hardware and hardware alone is more apparent. Further, the
Impulse C implementation provides a conceptual mapping from software to hardware. It literaly uses software
constructs to express hardware implementation, which is useful to teach hardware design to those already
experienced with software.
I logged into the MOD 88 interactive site, ran through some
of the demos and experimented with the commands.
C-Like HDLs Response
I wrote a short response essay to the lecture describing various C-like HDLs titled
Why Configurable Computing Doesn't Have a General Language. I anticipated some of
the points Stephen Edwards raised in his paper,
The Challenges of Synthesizing Hardware
From C-Like Languages.
Configurable Device Comparsions
An overview of past and present configurable devices.