This is an excerpt from my MSEE paper, defended at the Faculty of Electrical Engineering, University of Sarajevo, Bosnia-Herzegovina. The paper explores the application of FPGA programmable structures to Digital Image Processing (DIP). FPGA structures offer high flexibility in constructing custom parallel computation datapaths, making them extremely fast and often an ideal choice for Computer Vision (CV) workloads.
The paper discusses fundamental DIP concepts, including Morphological Operations, Filtering, and Edge Detection. The study showcases practical implementations of algorithms in logic structures, with price-performance tradeoffs between the following approaches (two of which are sketched right after this list):
- Time-Domain Multiplexed
- Function-Domain Multiplexed
- Pipelined
- and wide Parallel DSP.
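As a hypothetical illustration of two of these tradeoff points (module names, coefficients, and bit widths below are mine, not taken from the paper), the same 3-tap filter can be built fully parallel, producing one result per clock, or time-domain multiplexed, reusing a single multiplier over three cycles:

```systemverilog
// Wide Parallel: three constant multiplies, one result per clock.
module filter_parallel (
   input  logic        clk,
   input  logic [7:0]  x0, x1, x2,
   output logic [17:0] y
);
   always_ff @(posedge clk)
      y <= 8'd3*x0 + 8'd5*x1 + 8'd3*x2;   // y = 3*x0 + 5*x1 + 3*x2
endmodule

// Time-Domain Multiplexed: one shared multiplier, 3 cycles per result,
// trading throughput for roughly 1/3 of the multiplier cost.
module filter_tdm (
   input  logic        clk,
   input  logic        start,              // pulse to begin a computation
   input  logic [7:0]  x0, x1, x2,
   output logic [17:0] y,
   output logic        done
);
   logic [1:0]  step = 2'd3;               // start idle
   logic [17:0] acc;
   logic [7:0]  xin, coef;

   // Select which tap the single multiplier works on this cycle
   always_comb begin
      unique case (step)
         2'd0:    begin xin = x0; coef = 8'd3; end
         2'd1:    begin xin = x1; coef = 8'd5; end
         default: begin xin = x2; coef = 8'd3; end
      endcase
   end

   always_ff @(posedge clk) begin
      done <= 1'b0;
      if (start) begin
         step <= 2'd0;
         acc  <= '0;
      end else if (step != 2'd3) begin
         acc  <= acc + xin * coef;         // the one shared multiplier
         step <= step + 1'b1;
         if (step == 2'd2) begin
            y    <= acc + xin * coef;      // final accumulation
            done <= 1'b1;
         end
      end
   end
endmodule
```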
In doing so, it demonstrates the FPGA's capacity to significantly boost image processing speeds. Through illustrative examples and empirical results, the paper highlights the distinct advantages of FPGA solutions, contrasting them with traditional CPU and GPU approaches.
This research then goes on to analyze the challenges that arise from configuration demands on FPGA programmable logic, such as:
- Resource Utilization
- Resource Optimization
- Memory Management
The techniques for achieving higher Frames Per Second (FPS) processing rates, both with and without Fmax changes, are outlined. At its culmination, this work explores how an algorithm initially designed for Static Images, whether generated in real-time or retrieved from memory, can be seamlessly extended to the On-the-Fly Processing of Motion Pictures (aka Video).
Throughout this exploration, the study weaves in the unique intricacies associated with the underlying hardware, providing a holistic perspective on the potential of FPGA solutions in the field of digital image and video algorithms.
For a number of good reasons, my experimental system consists of two boards, both with Gowin FPGAs:
- Static Image examples use only the second board (TangNano20K), which drives the LCD screen
- Motion Picture examples also use the camera board (TangNano4K)
Such a full, end-to-end datapath has the video jumping on and off an FPGA four times, while streaming between the two boards.
This is almost like Vector Graphics, naturally to the extent possible with a standard Bit-Map display:
- The image is generated in SystemVerilog RTL, on-the-fly / "chasing the ray" (see the sketch below)
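To give a flavor of what "chasing the ray" means in practice, here is a minimal, hypothetical sketch (not the thesis's actual pattern generator): the pixel color is computed purely from the current raster coordinates, with no stored bitmap at all.

```systemverilog
// "Chasing the ray": compute each pixel's color combinationally from
// the raster position (x, y) at the very moment the scan-out needs it.
// The gradient/XOR pattern here is a placeholder, purely illustrative.
module raygen #(
   parameter int XW = 10,          // enough bits for the display width
   parameter int YW = 10           // enough bits for the display height
)(
   input  logic [XW-1:0] x,        // current horizontal pixel position
   input  logic [YW-1:0] y,        // current vertical pixel position
   output logic [15:0]   rgb565    // pixel generated on-the-fly
);
   wire [4:0] r = x[XW-1 -: 5];    // red ramps along X
   wire [5:0] g = y[YW-1 -: 6];    // green ramps along Y
   wire [4:0] b = x[4:0] ^ y[4:0]; // blue adds an XOR texture
   assign rgb565 = {r, g, b};
endmodule
```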
The video below demonstrates real-time video transfer from camera to screen.
The 1st FPGA device is equipped with a DVP input. It takes in video from an OV Camera SOC and moves it to the 2nd FPGA device via externally exposed wires. The second FPGA device receives this streaming video data, passes it through Block RAM (BRAM) configured as an always-half-full FIFO, and renders it on an LCD. The final display operates at 8 FPS.
Both the Camera-to-FPGA1 and FPGA1-to-FPGA2 data handoffs are implemented using a Source-synchronous, Eye-centered interfacing method.
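Here is a minimal sketch of how such a handoff can be captured on the receive side, assuming made-up signal names and an 8-bit DVP-style bus. The source forwards its clock alongside the data; sampling on the opposite edge is one simple way to land the sampling point near the center of the data eye, away from the transitions.

```systemverilog
// Sketch of a source-synchronous capture stage (signal names assumed,
// not copied from my design). The sender forwards its clock along with
// the data; registering the bus on the opposite (falling) edge places
// the sampling point half a bit-time away from the data transitions,
// i.e. near the center of the eye.
module src_sync_capture #(
   parameter int DW = 8            // DVP-style parallel bus width
)(
   input  logic          clk_fwd,  // clock forwarded by the source
   input  logic [DW-1:0] d_in,     // data launched on the rising edge
   input  logic          vsync_in,
   input  logic          href_in,
   output logic [DW-1:0] d_out,
   output logic          vsync_out,
   output logic          href_out
);
   // Falling-edge capture of rising-edge-launched data = mid-eye sample
   always_ff @(negedge clk_fwd) begin
      d_out     <= d_in;
      vsync_out <= vsync_in;
      href_out  <= href_in;
   end
endmodule
```

In a production design the centering would more likely come from a PLL phase shift or input-delay primitives; the falling-edge trick above just conveys the idea in the fewest lines.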
*(Video: Real-Time Image Transmission via a Two-Board FPGA Video System)*
The camera keeps streaming data into FPGA1 and cannot be backpressured. That's the classic "Push" interface.
On the other hand, the Image/Video Processor takes data out of the Pixel Buffer, which is a "Pull" interface. To make it more efficient in terms of LUT expense, the Image/Video Processor takes advantage of blanking intervals and tries to spread out useful work, at times consuming more than one cycle to process one pixel. This means that, in order not to drop data, the Pixel Buffer must provide a measure of elasticity, to absorb the resulting discrepancy between ingress and egress throughput, despite both sides having the same clock rate and data width. It does so by running Half-Full most of the time, which yields plus-or-minus half the buffer depth of pixel elasticity.
The RGB LCD Backend does not contain a full Store-and-Forward Frame Buffer (FB) either. Instead, it provides the minimal amount of Cut-Through Pixel Buffering, essentially just enough to smooth out the occasional burstiness of video output from the Image/Video Processor.
Substantial memory savings are realized in this way compared to the standard video buffering methods.
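To put a rough number on that saving: assuming, purely for illustration, a 480x272 panel with 16-bit RGB565 pixels, a full Store-and-Forward FB would occupy 480 x 272 x 2 B, which is roughly 255 KB, while a 512-pixel Cut-Through buffer fits in just 1 KB of BRAM, a reduction of more than two orders of magnitude.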
The data transfer between the Camera, the two FPGA devices, and the Screen is governed by the following signals (a timing sketch follows this list):
- Vertical and Horizontal Synchronization Pulses
- Pixel Clock signal
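To make that concrete, here is a bare-bones sketch of the kind of timing generator that produces these signals. The counter limits and sync windows below are placeholders, not the actual panel timings from the thesis:

```systemverilog
// Minimal H/V timing generator sketch. Every pixel-clock tick advances
// the horizontal counter; each completed line advances the vertical one.
// All parameter values are illustrative placeholders.
module video_timing #(
   parameter int H_ACTIVE = 480, H_TOTAL = 525,
   parameter int V_ACTIVE = 272, V_TOTAL = 286
)(
   input  logic clk_pix,
   input  logic rst,
   output logic hsync, vsync, de   // de = data enable (active area)
);
   logic [$clog2(H_TOTAL)-1:0] hcnt;
   logic [$clog2(V_TOTAL)-1:0] vcnt;

   always_ff @(posedge clk_pix) begin
      if (rst) begin
         hcnt <= '0; vcnt <= '0;
      end else if (hcnt == H_TOTAL-1) begin
         hcnt <= '0;
         vcnt <= (vcnt == V_TOTAL-1) ? '0 : vcnt + 1'b1;
      end else begin
         hcnt <= hcnt + 1'b1;
      end
   end

   // Sync pulses land in the blanking region, past the active area
   assign de    = (hcnt < H_ACTIVE) && (vcnt < V_ACTIVE);
   assign hsync = (hcnt >= H_ACTIVE + 2) && (hcnt < H_ACTIVE + 43);
   assign vsync = (vcnt >= V_ACTIVE + 2) && (vcnt < V_ACTIVE + 12);
endmodule
```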
The always-half-full FIFO is the key element that ensures smooth data transfer. That's where the "Camera Push" meets the "Screen Pull", and pixel-level elasticity peacefully resolves the potential clash, as sketched below.
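Here is a minimal single-clock sketch of such a FIFO (names, width, and depth are illustrative, not the exact parameters of my design). The read side is held off until the buffer fills to half depth, so the steady-state occupancy hovers around DEPTH/2. Since ingress and egress share one clock, no CDC synchronization is needed; the FIFO's sole job is elasticity.

```systemverilog
// Minimal sketch of an "always-half-full" pixel FIFO. Single clock for
// both sides, as ingress and egress run at the same pixel rate. Since
// the long-term average rates match, overflow/underflow guards are
// omitted for brevity.
module half_full_fifo #(
   parameter int DW    = 16,      // pixel width (e.g. RGB565)
   parameter int DEPTH = 512      // must be a power of two
)(
   input  logic          clk,
   input  logic          rst,
   input  logic          wr_en,   // camera-side "Push"
   input  logic [DW-1:0] wr_data,
   input  logic          rd_en,   // processor-side "Pull"
   output logic [DW-1:0] rd_data,
   output logic          rd_valid // high once primed to half-full
);
   localparam int AW = $clog2(DEPTH);

   logic [DW-1:0] mem [DEPTH];    // infers Block RAM
   logic [AW:0]   wr_ptr, rd_ptr; // extra MSB disambiguates wrap-around
   logic          primed;

   wire [AW:0] level = wr_ptr - rd_ptr;   // current occupancy

   always_ff @(posedge clk) begin
      if (rst) begin
         wr_ptr <= '0;
         rd_ptr <= '0;
         primed <= 1'b0;
      end else begin
         if (wr_en) begin
            mem[wr_ptr[AW-1:0]] <= wr_data;
            wr_ptr <= wr_ptr + 1'b1;
         end
         // Hold the reader off until half-full, then never look back:
         // from here on, occupancy hovers around DEPTH/2.
         if (level >= DEPTH/2) primed <= 1'b1;
         if (rd_en && primed && level != 0) begin
            rd_data <= mem[rd_ptr[AW-1:0]];
            rd_ptr  <= rd_ptr + 1'b1;
         end
      end
   end

   assign rd_valid = primed;
endmodule
```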
This brief overview is a mere teaser. I've put it together to spark curiosity and open doors for follow-up, deeper conversations.
Be it related to:
- Video
- Audio
- General signal-processing
- or interfacing FPGA to sensors and photonics.
Be it for using:
- Parallel
- Serial: Standard, or High-speed
- Commodity LVDS, or Specialty CML I/O pads
- QSPI, or SerDes
- OV, RPi, or IMX Camera SOCs
I'm always a Challenge Seeker, Problem Analyst, and Problem Solver at heart. Yet, I'm not a lone wolf, but a Team Player within an elaborate community ecosystem.
In that sense, this Master's thesis work has leveraged the following open-source prior art:
- https://github.com/StereoNinja/StereoNinjaFPGA
- https://github.com/AngeloJacobo/FPGA_OV7670_Camera_Interface
- https://github.com/gatecat/CSI2Rx
- https://github.com/westonb/OV7670-Verilog
- https://github.com/circuitvalley/USB_C_Industrial_Camera_FPGA_USB3
- https://github.com/chili-chips-ba/openXC7-TetriSaraj
Just like I have built on top of open-source community contributions, you are free to use my work as a starter or incubator for your own projects.
Then, jump one level up to the root of my repo, where you'll find more cool hardware/software designs to explore and build upon.