
1. Background

FPGAs (field-programmable gate arrays) and ASICs (application-specific integrated circuits) are both major targets of hardware design. ASICs are custom-designed for specific applications and offer high performance and efficiency, while FPGAs are reprogrammable devices that provide flexibility and rapid prototyping. The choice between them depends on factors such as performance requirements, power consumption, cost, design flexibility, and time-to-market. In practice, the two design flows overlap: for example, many designers first build a demo on an FPGA to check the specifications and to present the design.

Because an ASIC tapeout is expensive, extensive preparation is needed to make sure the design meets its specifications, such as functionality, timing, frequency, and power efficiency. Functionality can be checked by behavioral simulation over many test cases: a testbench applies various stimuli and outputs the relevant signal waveforms. However, simulation is an idealized environment that ignores real-world delay and power effects, so simulation alone is not enough for pre-silicon verification. FPGA emulation provides a more thorough check: it lets designers debug the design on real hardware under realistic conditions before the time-consuming IC back-end process and the expensive fabrication process.

2. Constructing the RTL code

After the specifications and algorithms are confirmed, the next step is to write the RTL code for the design. The classic way to construct RTL code is with a hardware description language (HDL); common HDLs include Verilog, SystemVerilog, and VHDL. RTL can also be produced in other ways, for example with the Vitis High-Level Synthesis (HLS) tool, which transforms C++ designs into RTL.
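As a rough illustration of the HLS route, the sketch below shows the kind of plain C++ function that Vitis HLS can take as a top-level function, whose arguments become RTL ports. The function name adder4_top and the port mapping (ui_in[3:0] and ui_in[7:4] as the two 4-bit operands, uio_in[0] as the carry-in) are assumptions for illustration, borrowed from the golden formula used in host.cpp later in this document. Standard integer types are used so the same file also compiles with an ordinary C++ compiler.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical HLS top-level function. In Vitis HLS the arguments of the
// top function become RTL ports; the port mapping here is an assumption.
uint8_t adder4_top(uint8_t ui_in, uint8_t uio_in) {
    uint8_t a   = ui_in & 0x0F;         // operand A: ui_in[3:0]
    uint8_t b   = (ui_in >> 4) & 0x0F;  // operand B: ui_in[7:4]
    uint8_t cin = uio_in & 0x01;        // carry-in:  uio_in[0]
    // Behavioral description: the HLS tool, not the designer, chooses the
    // adder structure when it generates RTL.
    uint8_t sum = a + b + cin;
    assert(sum <= 31);                  // 15 + 15 + 1 always fits in 5 bits
    return sum;                         // 5-bit result, returned as uo_out
}
```

A real HLS design would more likely use the arbitrary-precision ap_uint<N> types so that the generated ports have exact bit widths.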

3. Functional simulation of the design

After the RTL code is written, the design should be simulated. Many simulation tools, such as VCS, ModelSim, and Icarus Verilog (iverilog), support functional checking and debugging. Behavioral simulation produces waveform files for debugging and functional verification. The results of the RTL design should match both those produced by the C model (cmodel) and the theoretical values from the algorithm.
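A cmodel is a software reference that mirrors the hardware structure. The sketch below is a possible cmodel for a 4-bit carry-lookahead adder (the design checked in Section 5), assuming the usual generate/propagate formulation: g[i] = a[i] & b[i] and p[i] = a[i] ^ b[i], with every carry computed directly from g, p, and the carry-in instead of rippling from bit to bit. check_cmodel_exhaustive compares the model with the theoretical value a + b + cin over all 512 input combinations. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cmodel of a 4-bit carry-lookahead adder (CLA).
// Returns the 5-bit result {c4, s[3:0]}.
uint8_t cla4_cmodel(uint8_t a, uint8_t b, uint8_t c0) {
    assert(a < 16 && b < 16 && c0 < 2);
    uint8_t g = a & b;  // generate:  g[i] = a[i] & b[i]
    uint8_t p = a ^ b;  // propagate: p[i] = a[i] ^ b[i]
    auto bit = [](uint8_t v, int i) { return (v >> i) & 1; };
    int g0 = bit(g, 0), g1 = bit(g, 1), g2 = bit(g, 2), g3 = bit(g, 3);
    int p0 = bit(p, 0), p1 = bit(p, 1), p2 = bit(p, 2), p3 = bit(p, 3);
    // Lookahead: each carry depends only on g, p, and c0 (no rippling).
    int c1 = g0 | (p0 & c0);
    int c2 = g1 | (p1 & g0) | (p1 & p0 & c0);
    int c3 = g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & c0);
    int c4 = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0) | (p3 & p2 & p1 & p0 & c0);
    int carries = c0 | (c1 << 1) | (c2 << 2) | (c3 << 3);  // carry into bit i
    int sum = (p ^ carries) & 0xF;                         // s[i] = p[i] ^ c[i]
    return static_cast<uint8_t>(sum | (c4 << 4));
}

// Compare the cmodel with the theoretical value a + b + cin for all inputs.
// Returns the number of mismatches (0 means consistent).
int check_cmodel_exhaustive() {
    int errors = 0;
    for (int a = 0; a < 16; ++a)
        for (int b = 0; b < 16; ++b)
            for (int cin = 0; cin < 2; ++cin)
                if (cla4_cmodel(a, b, cin) != a + b + cin)
                    ++errors;
    return errors;
}
```

The same cmodel can also generate expected outputs for the RTL testbench, so that the simulation results, the cmodel, and the theoretical values are all checked against one another.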

4. The implementation on an FPGA

The design module is connected to the host over PCIe. The host generates the stimulus data and sends it to the design module over PCIe. The key files are:

  • rtl: the directory containing the RTL design and PCIe files.
  • fpga_top.v: the top-level RTL module. It instantiates our own design module and the other modules (PCIe, ...).
  • Makefile: contains the commands to compile and implement the design with Vivado.

Compilation Flow

  1. Set up the Xilinx environment:
    source /tools/Xilinx/Vivado/2023.1/settings64.sh

  2. Compile:
    vivado -nojournal -nolog -mode batch -source tcl/genbit.tcl

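    ### genbit.tcl
    # Create the project and generate the IP cores (PCIe, clocking, ILA).
    create_project -force fpga_top -part xcu250-figd2104-2-e
    source ./tcl/pcie4_uscale_plus_0.tcl
    source ./tcl/clock.tcl
    source ./tcl/ila.tcl
    # import_files hls_top/solution1/syn/verilog -norecurse -flat
    # Add the RTL sources, the top module, and the constraints.
    import_files rtl -norecurse -flat
    set_property top fpga_top [current_fileset]
    add_files -fileset constrs_1 -norecurse fpga_u250.xdc
    # Run strategies. Setting a strategy overwrites the per-step options,
    # so the post-route phys_opt directive is set after the impl strategy.
    set_property strategy Flow_AlternateRoutability [get_runs synth_1]
    set_property strategy Performance_ExtraTimingOpt [get_runs impl_1]
    set_property STEPS.POST_ROUTE_PHYS_OPT_DESIGN.IS_ENABLED true [get_runs impl_1]
    set_property STEPS.POST_ROUTE_PHYS_OPT_DESIGN.ARGS.DIRECTIVE AggressiveExplore [get_runs impl_1]
    # Synthesis, implementation, then bitstream generation.
    reset_run synth_1
    launch_runs synth_1 -jobs 64
    wait_on_run synth_1
    launch_runs impl_1 -jobs 64
    wait_on_run impl_1
    launch_runs impl_1 -to_step write_bitstream -jobs 64
    wait_on_run impl_1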
  • This step usually takes about an hour.
  • The generated files are in ./fpga_top.runs/impl_1/. The two key files are fpga_top.bit and fpga_top.ltx:
    fpga_top.bit: the bitstream file programmed into the FPGA.
    fpga_top.ltx: the debug probes file, used by the Hardware Manager to display the ILA probes.

Onboard Flow

  1. Program the FPGA:
    vivado -nojournal -nolog -mode batch -source tcl/flash.tcl

    ### flash.tcl
    # Connect to the hardware server and select the first FPGA device found.
    open_hw
    connect_hw_server
    open_hw_target
    current_hw_device [lindex [get_hw_devices] 0]
    refresh_hw_device -update_hw_probes false [current_hw_device]
    # Program the bitstream generated by genbit.tcl.
    set_property PROGRAM.FILE {fpga_top.runs/impl_1/fpga_top.bit} [current_hw_device]
    program_hw_devices [current_hw_device]
    exit
  2. Reboot the host so that the PCIe changes take effect (the host must re-enumerate the reprogrammed PCIe device).
  3. Check the results on the FPGA.
    The checking procedure is described in Section 5.
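After the reboot, the FPGA should reappear as a PCIe device, and host.cpp needs its BAR0 physical address (hardcoded there as BAR0_ADDRESS). On Linux, the address that lspci reports as "Region 0" can also be read from sysfs: /sys/bus/pci/devices/<domain:bus:device.function>/resource lists one "start end flags" line per resource, starting with BAR0. The sketch below parses that file instead of hardcoding the address; the helper names and the example device address are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Return the start address of resource `index` (0 = BAR0) from the text of a
// sysfs PCI "resource" file, whose lines read "start end flags" in hex.
// Returns 0 if the line is missing or empty.
uint64_t parse_bar_start(std::istream& in, int index) {
    assert(index >= 0);
    std::string line;
    for (int i = 0; std::getline(in, line); ++i) {
        if (i != index) continue;
        std::istringstream fields(line);
        std::string start;
        if (!(fields >> start)) return 0;
        return std::stoull(start, nullptr, 16);  // base 16 accepts the "0x" prefix
    }
    return 0;
}

// Hypothetical helper: read BAR0 of the device with the given
// "domain:bus:device.function" address, e.g. "0000:ca:00.0".
uint64_t read_bar0_from_sysfs(const std::string& pci_addr) {
    std::ifstream file("/sys/bus/pci/devices/" + pci_addr + "/resource");
    if (!file) return 0;
    return parse_bar_start(file, 0);
}
```

The device address passed to read_bar0_from_sysfs must match your own lspci output; a return value of 0 means the file could not be read or BAR0 is not assigned.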

5. Checking for the results generated from an FPGA

There are two methods to check the results:

  • Check the ILA (Integrated Logic Analyzer) waveforms.
  • Transfer the stimulus data from the host and check the results automatically in software.
  1. In rtl/core_fpga.v, instantiate an ILA probe module:

    wire [7:0] project_uio_out;
    ila_tiny_tapeout ila_tiny_tapeout_inst(
        .clk(clk),
        .probe0(project_ena),      // 1
        .probe1(project_ui_in),    // 8
        .probe2(project_uio_in),   // 8
        .probe3(project_uo_out),   // 8
        .probe4(project_uio_out),  // 8
        .probe5(project_uio_oe)    // 8
    );
    

    Note: the ILA module is disabled in simulation and enabled only in the on-board build.

  2. In tcl/ila.tcl, add the ILA IP configuration for our design:

    create_ip -name ila -vendor xilinx.com -library ip -version 6.2 -module_name ila_tiny_tapeout
    set_property -dict [list \
      CONFIG.C_DATA_DEPTH {1024} \
      CONFIG.C_NUM_OF_PROBES {6} \
      CONFIG.C_PROBE1_WIDTH {8} \
      CONFIG.C_PROBE2_WIDTH {8} \
      CONFIG.C_PROBE3_WIDTH {8} \
      CONFIG.C_PROBE4_WIDTH {8} \
      CONFIG.C_PROBE5_WIDTH {8} \
    ] [get_ips ila_tiny_tapeout]
    
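    • The name after -module_name and get_ips must be the same as the ILA module instantiated in the RTL, here ila_tiny_tapeout.
    • C_DATA_DEPTH: the number of samples captured for each probe.
    • C_NUM_OF_PROBES: the number of probes. The 4-bit CLA design has six data-related ports, so the value here is 6.
    • C_PROBE1_WIDTH: the data width of probe 1 (likewise for the other C_PROBEn_WIDTH settings). C_PROBE0_WIDTH is left at its default of 1, matching the 1-bit project_ena.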
  3. Bind the design's input and output ports to the PCIe (AXI-Lite) register interface.
    The settings are located in rtl/core_config.v.

    • host -> design
      always @(posedge clk) begin
          if(rst) begin
              m_sw_reset_reg <= 0;
          end else if(axil_state_reg == WCONFIG) begin
              case(awaddr_reg)
              32'd0: begin
                  project_ena <= wdata_reg[0:0];
              end
              32'd4: begin
                  project_ui_in <= wdata_reg[7:0];
              end
              32'd8: begin
                  project_uio_in <= wdata_reg[7:0];
              end
              endcase
          end 
      end
    • design -> host
      always @(posedge clk) begin
          if(rst) begin
              // m_sw_reset_reg <= 0;
              rdata_reg <= 0;
              s_axil_rvalid_reg <= 0;
          end else if(axil_state_reg == RCONFIG) begin
              s_axil_rvalid_reg <= 1;
              rdata_reg <= 0; // clear the bits the selected register does not drive
              case(s_axil_araddr - pf0bar0_offset_reg)
              32'd0: begin
                  rdata_reg[0:0] <= project_ena;
              end
              32'd4: begin
                  rdata_reg[7:0] <= project_ui_in;
              end
              32'd8: begin
                  rdata_reg[7:0] <= project_uio_in;
              end
              32'd12: begin
                  rdata_reg[7:0] <= project_uo_out;
              end
              32'd16: begin
                  rdata_reg[7:0] <= project_uio_out;
              end
              32'd20: begin
                  rdata_reg[7:0] <= project_uio_oe;
              end
              endcase
          end else begin
              s_axil_rvalid_reg <= 0;
          end
      end
  4. Provide the stimulus data from the host.
    The stimulus is generated in host/host.cpp:

    #include <sys/mman.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <stdint.h>
    #include <iostream>
    
    // #include "nvme.h"
    
    // Physical base address of BAR0 (look it up with lspci; it differs per server).
    #define BAR0_ADDRESS 0x96c80000
    // #define BAR2_ADDRESS 0xc0000000
    // #define BAR3_ADDRESS 0xa0000000
    // #define BAR4_ADDRESS 0x28ffe0000000
    
    // #define NVME_BAR 0x96400000
    
    #define BDF(b,d,f) (((b & 0xff) << 8) | ((d & 0x1f) << 3) | (f & 0x7))
    
    int main(int argc, char ** argv){
    
        // /dev/mem exposes physical memory; opening it requires root.
        int fd_mem = open("/dev/mem", O_RDWR);
        if(fd_mem == -1){
            std::cout << "/dev/mem fail\n";
            return 1;
        }
    
        int delay = 100;
        size_t map_size = sysconf(_SC_PAGESIZE) * 1024;
    
        // Map the BAR0 register space into this process.
        void* map = mmap(0, map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd_mem, BAR0_ADDRESS);
        if(map == MAP_FAILED){
            perror("mmap");
            close(fd_mem);
            return 1;
        }
        // volatile: every access must reach the device, in program order.
        volatile int32_t* bar0 = (volatile int32_t*) map;
    
        // bar0[k] is the 32-bit register at byte offset 4*k in core_config.v.
        bar0[0] = 0; usleep(delay);                  // offset 0: project_ena
    
        for(int i=0; i<10; i++){
            bar0[1] = i;                             // offset 4: project_ui_in
            for(int j=0; j<2; j++){
                bar0[2] = j;                         // offset 8: project_uio_in
                int rtl_result = bar0[3];            // offset 12: project_uo_out
                // Golden value: ui_in[3:0] + ui_in[7:4] + carry-in.
                int golden_result = (i & 0xf) + ((i & 0xf0) >> 4) + j;
                if(rtl_result != golden_result)
                    printf("calculation error: i = %d, j = %d, rtl_result = %d, golden_result = %d\n", i, j, rtl_result, golden_result);
            }
        }
    
        // Word indices 0x14-0x58 (byte offsets 0x50-0x160) are not among the
        // registers decoded in core_config.v above.
        bar0[0x14] = 0; usleep(delay);
    
        bar0[0x1c] = 0; usleep(delay);
        bar0[0x20] = 0; usleep(delay);
    
        bar0[0x28] = 1024*4; usleep(delay);
        bar0[0x2c] = 0; usleep(delay);
    
        bar0[0x34] = 0; usleep(delay);
        bar0[0x38] = 0; usleep(delay);
    
        bar0[0x40] = 1; usleep(delay);
    
        bar0[0x48] = 1; usleep(delay);
    
        bar0[0x50] = 0; usleep(delay);
    
        bar0[0x58] = 0; usleep(delay);
    
        munmap(map, map_size);
    
        close(fd_mem);
        return 0;
    }
    
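    • BAR0_ADDRESS: the physical (not virtual) base address of BAR0, i.e. the register space of the FPGA's PCIe endpoint. It can be looked up on the host server with lspci -vvs ca: (the "Region 0: Memory at ..." line), where ca: is the PCIe bus number of the card on this server. There is no other
    • int fd_mem = open("/dev/mem", O_RDWR);
      Opens /dev/mem so that mmap can map the BAR0 physical address range into the program's address space. This requires root privileges.
    • for loop: drives the stimulus through the BAR0 registers and compares each hardware result with a golden value computed in software.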
  5. Compile and run host.cpp (as root, since it opens /dev/mem), then check the results.

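    1. Host checking: implemented in host.cpp. Any mismatch between the hardware result and the golden value is printed as an error message.
    2. ILA checking
      In Vivado, use the Hardware Manager to inspect the probe waveforms (with fpga_top.ltx loaded as the probes file). The Hardware Manager can be opened in any of the following ways (see UG908):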
      https://docs.amd.com/r/en-US/ug908-vivado-programming-debugging/Opening-the-Hardware-Manager?tocId=p5YI1PlPTCyR5NaFJLPQ4g
    • If you have a project open, click the Open Hardware Manager button in the Program and Debug section of the Vivado flow navigator.
    • Select Flow > Open Hardware Manager.
    • In the Tcl Console window, run the open_hw_manager command.
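The loop in host.cpp only sweeps ui_in from 0 to 9, so, according to its golden formula, the second operand ui_in[7:4] is always 0. The sketch below sweeps all 256 ui_in values and both carry-in values. Register access goes through a small hypothetical RegIO interface so that the same check can run against the mapped BAR0 (via make_bar_io) or against a software mock when no board is available. The register offsets come from rtl/core_config.v; the names RegIO, make_bar_io, and check_adder_exhaustive are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <functional>

// Byte offsets of the registers decoded in rtl/core_config.v.
constexpr uint32_t REG_ENA     = 0;
constexpr uint32_t REG_UI_IN   = 4;
constexpr uint32_t REG_UIO_IN  = 8;
constexpr uint32_t REG_UO_OUT  = 12;
constexpr uint32_t REG_UIO_OUT = 16;
constexpr uint32_t REG_UIO_OE  = 20;

// Hypothetical register-access interface: backed either by the mapped
// BAR0 (hardware) or by a software mock (testing without a board).
struct RegIO {
    std::function<void(uint32_t offset, uint32_t value)> write;
    std::function<uint32_t(uint32_t offset)> read;
};

// Wrap a mapped BAR0 pointer; byte offset -> 32-bit word index.
RegIO make_bar_io(volatile uint32_t* bar0) {
    return RegIO{
        [bar0](uint32_t offset, uint32_t value) { bar0[offset / 4] = value; },
        [bar0](uint32_t offset) { return static_cast<uint32_t>(bar0[offset / 4]); }};
}

// Sweep every ui_in value (both 4-bit operands) and both carry-in values,
// comparing uo_out with the golden formula from host.cpp.
// Returns the number of mismatches.
int check_adder_exhaustive(const RegIO& io) {
    assert(io.write && io.read);
    int errors = 0;
    for (uint32_t i = 0; i < 256; ++i) {
        io.write(REG_UI_IN, i);
        for (uint32_t j = 0; j < 2; ++j) {
            io.write(REG_UIO_IN, j);
            uint32_t rtl = io.read(REG_UO_OUT) & 0xFF;
            uint32_t golden = (i & 0xF) + ((i & 0xF0) >> 4) + j;
            if (rtl != golden) {
                std::printf("calculation error: i = %u, j = %u, rtl_result = %u, golden_result = %u\n",
                            (unsigned)i, (unsigned)j, (unsigned)rtl, (unsigned)golden);
                ++errors;
            }
        }
    }
    return errors;
}
```

On hardware, the mapped BAR0 pointer from host.cpp can be cast to volatile uint32_t* and passed to make_bar_io; a return value of 0 from check_adder_exhaustive means every input combination matched.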