Finish documentation
MuellerDominik committed Jan 20, 2020
1 parent 71f424f commit 0b32759
Showing 9 changed files with 121 additions and 94 deletions.
Binary file modified doc/report/graphics/statistics.pdf
3 changes: 1 addition & 2 deletions doc/report/sections/ai/cnn.tex
@@ -20,8 +20,7 @@ \subsection{Convolutional Neural Network}
\clearpage

\paragraph{Pooling Layer}
Pooling layers reduce the spatial size of the convolved features.
Pooling is done by only looking at the portion of the image masked by the kernel.
Maximum pooling yields the maximum value and average pooling yields the average value of the masked portion.
By doing so, the required computational power is significantly decreased and the dominant features are extracted \cite{cnn_tds}.
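
The effect of pooling is easy to reproduce in a few lines of Python. The following is a minimal NumPy sketch of max pooling, not the implementation used in this project:

\begin{verbatim}
import numpy as np

def max_pool2d(image, k=2):
    """Max pooling with a k x k kernel and stride k (minimal sketch)."""
    h, w = image.shape
    h, w = h - h % k, w - w % k               # crop to a multiple of k
    blocks = image[:h, :w].reshape(h // k, k, w // k, k)
    return blocks.max(axis=(1, 3))            # dominant value per block

x = np.arange(16.0).reshape(4, 4)
print(max_pool2d(x))    # [[ 5.  7.] [13. 15.]]
\end{verbatim}

Average pooling would use \texttt{blocks.mean(axis=(1, 3))} instead.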
14 changes: 7 additions & 7 deletions doc/report/sections/board.tex
@@ -2,22 +2,22 @@ \section{MPSoC Development Board}
\label{sec:board}

The hardware is an Ultra96-V2 board, which is distributed by Avnet.
The MPSoC installed on it is a Xilinx Zynq UltraScale+ MPSoC ZU3EG.
This chip belongs to the Zynq UltraScale+ family and is based on the UltraScale architecture.
Integrated is a quad-core ARM Cortex A53 processor to run a complete operating system and a dual-core ARM Cortex R5, which makes the Ultra96-V2 hard real-time capable.
The A53 cores can be clocked at \SI{1.5}{GHz}, the R5 cores at \SI{600}{MHz}.

The FPGA on the MPSoC allows hardware acceleration by up to a factor of 20 compared to the fastest CPUs \cite{acceleration_xilinx}.
Xilinx therefore recommends the board as ideal for the field of high-speed artificial intelligence \cite{ai_resources_xilinx}.
The most important specifications of the MPSoC are listed in table \ref{tab:specs_MPSoC} below.

\begin{table}[h]
\caption{Xilinx Zynq UltraScale+ MPSoC ZU3EG key features \cite{xilinx_zynq}}
\label{tab:specs_MPSoC}
\centering
\begin{tabular}{ll}
\toprule
& \textbf{ZU3EG} \\
\midrule
\textbf{Logic Cells} & \SI{154}{k} \\
\textbf{Flip Flops} & \SI{141}{k} \\
@@ -27,6 +27,6 @@ \section{MPSoC Development Board}
\end{tabular}
\end{table}

The Ultra96-V2 also features two USB 3.0 ports and \SI{2}{GB} low-power double data rate 4 (LPDDR4) RAM, which are essential for fast image processing.
A mini DisplayPort serves as a connection to a monitor \cite{avnet_ultra96v2}.
This guarantees standalone operation.
46 changes: 25 additions & 21 deletions doc/report/sections/camera.tex
@@ -1,5 +1,8 @@
\section{Camera}
\label{sec:camera}

This section documents the camera and lighting requirements.

\subsection{Webcam}
\label{subsec:webcam}
A simple webcam is not capable of capturing fast-moving objects with sufficient quality.
@@ -10,19 +13,18 @@ \subsection{Webcam}
Technically, this means that fewer sensors are used; one sensor therefore processes several pixels one after the other \cite{shuttermode}.

This results in the rolling shutter effect when the image changes.
On the one hand, this effect is noticeable when the light flickers quickly.
For example, the upper part of an image can be brighter than the lower part.
On the other hand, there are problems with moving pictures.
The lower part of the image was taken at a later time, so the object appears distorted \cite{global_rolling_shutter}.

To eliminate this problem, some cameras feature a global shutter.
This captures all pixels at the same time and stores them.
Thus, no distortions occur in the image.

This is best illustrated with an example.
Figure \ref{subfig:rollingshutter} shows the recording of a ball with a rolling shutter and figure \ref{subfig:globalshutter} shows the same object, captured with a global shutter camera.
The distortion from the rolling shutter is clearly visible and deforms the object.
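
The slanted shape can also be reproduced in a toy simulation. The following sketch is our own illustration with arbitrary numbers: each image row is sampled slightly later than the previous one while a vertical edge moves to the right.

\begin{verbatim}
import numpy as np

H, W = 100, 100       # image size in pixels (arbitrary)
v, dt = 2000.0, 1e-4  # edge speed (px/s) and per-row readout delay (s)

img = np.zeros((H, W))
for r in range(H):
    x = int(10 + v * r * dt)  # edge position when row r is read out
    img[r, x] = 1.0           # the vertical edge comes out slanted
# With a global shutter (dt = 0) the edge would remain vertical.
\end{verbatim}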

\begin{figure}[ht]
\centering
@@ -47,41 +49,42 @@ \subsection{Baumer Industrial Camera}

In addition to the global shutter requirement, the camera must also meet other specifications.

To ensure that at least one image of a fast-moving object is captured, the maximum duration between two images is

\begin{equation}
T_\text{max} = \frac{f_\text{w} \cdot 0.75}{v_\text{max}} = \frac{\SI{0.69}{m} \cdot 0.75}{\SI[fraction=sfrac]{30}{\metre\per\second}} = \SI{17.25}{ms}.
\label{eq:needed_T}
\end{equation}

It is assumed that the object is thrown in the last quarter of the booth and the maximum throwing speed is \SI[fraction=sfrac]{30}{\metre\per\second}.
This speed is approximately half the speed of the fastest baseball throw and is roughly attainable for an adult with the objects at hand \cite{speed_baseball}.

Equation \ref{eq:needed_T} shows that the camera needs a frame rate which is higher than

\begin{equation}
f_\text{min} = \frac{1}{T_\text{max}} = \frac{1}{\SI{17.25}{ms}} \approx \SI{58}{fps}.
\label{eq:needed_fps}
\end{equation}

\clearpage
In order to capture the contours of the thrown objects in as much detail as possible, the motion blur must be kept to a minimum.
This can be achieved by short exposure times.
With the assumption that the motion blur should be smaller than or equal to \SI{3}{cm}, the maximum exposure time of the camera is

\begin{equation}
t_\text{exp} = \frac{s_\text{max}}{v_\text{max}} = \frac{\SI{0.03}{m}}{\SI[fraction=sfrac]{30}{\metre\per\second}} = \SI{1}{ms}.
\label{eq:texp}
\end{equation}
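
The three bounds above can be verified with a few lines of Python; the variable names are ours:

\begin{verbatim}
f_w   = 0.69   # field of view width in m
v_max = 30.0   # assumed maximum throwing speed in m/s
s_max = 0.03   # tolerated motion blur in m

T_max = 0.75 * f_w / v_max  # 0.01725 s = 17.25 ms between frames
f_min = 1.0 / T_max         # ~57.97, i.e. at least 58 fps
t_exp = s_max / v_max       # 0.001 s = 1 ms maximum exposure time
print(T_max, f_min, t_exp)
\end{verbatim}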

In this project, a VCXU-13C industrial camera from Baumer is used.
Baumer produces a variety of cameras with different sensors and specifications, such as CMOS sensors.
The VCXU-13C features a global shutter.
Furthermore, it has a USB 3.0 interface for data transfer.
This is required because the Ultra96-V2 does not support an Ethernet interface.
The camera's maximum frame rate is \SI{222}{fps}, which guarantees at least three pictures of each throw.
The minimum exposure time is \SI{20}{\micro s} \cite{baumer_cam}.

The industrial camera requires a lens for proper operation.
A suitable lens is the ZVL-FL-HC0614-2M, which is also manufactured by Baumer.
The aperture is manually operated to focus the images.

@@ -90,17 +93,18 @@ \subsection{Diffuse Lighting}
The more light there is, the shorter the exposure time can be.
This results in a clearer image, which is an advantage in image recognition.
In order for the appearance of the objects to be as independent of the lighting as possible, the lighting must not flicker and the field of view should be illuminated as uniformly as possible.
Therefore, diffuse lighting is required.

The SVL BAR LIGHT LHF300-WHI from Stemmer Imaging meets these requirements.
However, a completely uniformly illuminated image cannot be achieved with a single LED bar.
In the data sheet, Stemmer specifies the illumination of a 1 $\times$ \SI{1}{m} image section, as shown in figure \ref{fig:lighting_LEDBAR}.
The distance from the camera to the measuring surface is \SI{0.5}{m}.
The green frame corresponds to the field of view.
The illumination is therefore very satisfactory.

\begin{figure}[h]
\centering
\includegraphics[width=0.5\textwidth]{graphics/brightness_level.pdf}
\caption{Brightness Distribution according to Stemmer \cite{stemmer_datasheet}}
\label{fig:lighting_LEDBAR}
\end{figure}
2 changes: 1 addition & 1 deletion doc/report/sections/dataset.tex
@@ -60,7 +60,7 @@ \section{Dataset}
\begin{figure}[h]
\centering
\includegraphics[width=\textwidth]{statistics}
\caption{Statistics showing the number of captured frames for each object}
\label{fig:statistics}
\end{figure}
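
Statistics of this kind can be generated with a short script. The following sketch assumes one subdirectory per object containing the captured frames as PNG files, which may differ from the actual layout of our dataset:

\begin{verbatim}
from pathlib import Path
import matplotlib.pyplot as plt

# Assumption: dataset/<object_name>/*.png, one directory per object.
root = Path("dataset")
counts = {d.name: sum(1 for _ in d.glob("*.png"))
          for d in root.iterdir() if d.is_dir()}

plt.bar(list(counts), list(counts.values()))
plt.ylabel("captured frames")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.savefig("statistics.pdf")
\end{verbatim}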

46 changes: 29 additions & 17 deletions doc/report/sections/experiences.tex
@@ -1,25 +1,37 @@
\section{Experiences}
\label{sec:experiences}

The FHNW still has very little experience in the field of artificial intelligence on FPGAs.
Therefore, the first part of the project consisted only of gathering information.
Since Xilinx hardware was provided, research could be focused on this product.
Apart from the hardware definition, the field of AI was still quite unknown to us.
With the help of YouTube videos, other internet resources and books, we slowly worked our way into this unfamiliar territory.
Over time, the topic became more accessible to us, but there are still questions to which we do not yet have the right answers.
For example, how good should the picture quality be?
May pictures be used if the object is only partially visible?
How many pictures of an individual object are enough?
These questions are usually answered by experience.
Due to our lack of experience, we marked images with only partially visible objects in the database, kept the quality as high as possible and informed ourselves about the size of other datasets on the internet.

Choosing a suitable camera without a lot of experience is not trivial.
We had to learn this the hard way.
The plug-and-play version of a webcam sounded attractive, but proved to be useless for moving objects.

The Python scripts turned out to be a reasonable investment.
If we had manually started and stopped the camera for each throw, countless empty images before and after the throw would have had to be removed.
Thanks to the real-time throw detection mechanism, we have saved ourselves this work.
The additional effort of configuring the camera with a program will also benefit us during our thesis.
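
The throw detection can be sketched as a simple frame-differencing check. This is a hedged sketch with assumed threshold values, not our exact script:

\begin{verbatim}
import cv2

THRESHOLD = 25    # per-pixel difference (assumed value)
MIN_PIXELS = 500  # changed pixels needed to count as a throw (assumed)

def is_throw(frame, background):
    """Keep a frame only if it differs enough from the empty booth."""
    diff = cv2.absdiff(frame, background)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, THRESHOLD, 255, cv2.THRESH_BINARY)
    return cv2.countNonZero(mask) > MIN_PIXELS
\end{verbatim}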

At the end of this project, the results are very pleasing.
We are in possession of a throwing booth, a dataset with more than \num{15000} labeled images and newly gained knowledge about AI.
24 changes: 14 additions & 10 deletions doc/report/sections/introduction.tex
@@ -18,17 +18,21 @@ \section{Introduction}

Another challenge in today's world is the increasing data volume. More and more data must be processed in the shortest possible time. This makes fast hardware indispensable.

Project 5, AI High-Performance Solution on FPGA, covers two topics: artificial intelligence and fast data processing.
An eye-catcher for trade fairs is to be developed, in which artificial intelligence is implemented on a field-programmable gate array (FPGA).
The eye-catcher is a throwing booth, which is able to detect throws and recognize certain objects in standalone operation.
It will be shown at exhibitions and represents the \textit{Fachhochschule Nordwestschweiz} (FHNW).

An Ultra96-V2 Development Board serves as hardware.
The multiprocessor system-on-chip (MPSoC) used, an UltraScale+ MPSoC ZU3EG, features ARM-based microprocessors and an FPGA.

The first step is to lay the foundations for designing an AI.
Those are the design and construction of the throwing booth and the collection of a sufficiently large dataset.
The dataset is necessary to train an artificial neural network (ANN).
In addition, project 5 includes a short introduction to artificial intelligence and machine learning.
During the thesis the required convolutional neural network will be trained and implemented on the Ultra96 board.

This technical report describes the theoretical and technical basis necessary for the realization of the project.
25 changes: 13 additions & 12 deletions doc/report/sections/thesis_prospect.tex
@@ -7,21 +7,22 @@ \section{Thesis Prospect}
% OS / application
% Verification -- The accuracy and performance of the CNN model implemented on the FPGA is verified.

This project will be continued as a thesis in the next semester.
The throwing booth will be upgraded with the Fibox and a net construction.
The box can be manufactured according to the finished drawings, but the net construction has to be drawn first.
The net holder will most likely be manufactured with a 3D printer.

The CNN model is developed and trained in Python 3 using TensorFlow 2 with the dataset collected during this project.
In order to be able to use the throw detection mechanism on the quad-core ARM-based processor, parallel computing will probably be necessary.
The reason for this is the lower clock speed of the ARM Cortex-A53 processors for the same amount of data.
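
A possible starting point for the model is a small Keras network. This is only a sketch; the input size, layer depths and number of classes below are assumptions and will be fixed during the thesis:

\begin{verbatim}
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu",
                           input_shape=(128, 128, 3)),  # assumed input size
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(8, activation="softmax"),  # assumed class count
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
\end{verbatim}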

As soon as this is realized, the next step is to implement the trained CNN model on the FPGA.
With sufficient time resources, an additional focus can be placed on speed optimizations.

At the same time a Linux-based operating system (OS) is put into operation on the MPSoC.
The application shows the recognized object to the user at the trade fair.
This function is implemented on the ARM Cortex-A53 as well.

To measure the quality of the developed system, the accuracy of the convolutional neural network is evaluated.
Special attention is paid to the differences between the implementation on the computer and the implementation on the embedded system.