<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<title>Curriculum Vitae</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
</style>
<link rel="stylesheet" href="stylesheets/styles.css" />
<!--[if lt IE 9]>
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
<![endif]-->
</head>
<body>
<div class="wrapper">
<!-- Compilation Instructions
pandoc \-\-columns=160 cv.md -s -c stylesheets/styles.css -o cv.html \-\-metadata pagetitle="Curriculum Vitae"
pandoc -V geometry:margin=1in -f markdown+hard_line_breaks cv.md -o cv.pdf
-->
<header>
<h1 id="curriculum-vitae">Curriculum Vitae</h1>
Nachiket Kapre<br> Electrical and Computer Engineering <br> University of Waterloo<br> Canada <br> Email: nachiket at uwaterloo dot ca<br>
</header>
<section>
<h2 id="education">Education</h2>
<p><strong>Ph.D., California Institute of Technology (USA),</strong> Computer Science<br> Dissertation: <em>SPICE<sup>2</sup> - A Spatial Parallel Architecture for Accelerating the SPICE Circuit Simulator</em> <br> Degree Conferred: September 2010 <a href="./publications/phd-thesis_2010.pdf">[Link]</a><br> <!--Committee: André DeHon (UPenn), Shuki Bruck (Caltech), Dan Meiron (Caltech), Alain Martin (Caltech) and Steven Trimberger (Xilinx).--></p>
<p><strong>M.S., California Institute of Technology (USA),</strong> Computer Science<br> Degree Conferred: June 2006 <br> Thesis: <em>Packet-Switched FPGA-Overlay Networks</em> <a href="./publications/ms-thesis_2006.pdf">[Link]</a></p>
<p><strong>M.S., California Institute of Technology (USA),</strong> Electrical Engineering<br> Degree Conferred: June 2005</p>
<p><strong>B.E., University of Pune (India),</strong> Electronics and Telecommunication Engineering<br> Project: <em>FPGA based testing system for Siemens Railway Signalling Relays</em> <a href="./publications/be_project.pdf">[Link]</a> <a href="./publications/be_oral.swf">[slides]</a> <a href="./publications/be_guide.swf">[guide]</a><br> Degree Conferred: August 2002</p>
<h2 id="research-interests">Research Interests</h2>
<p>Concurrent and Spatial Architectures, Parallel Processing, Heterogeneous Architectures and Compilation Tools, Communication-Centric Design</p>
<h2 id="grants">Grants</h2>
<p>NSERC Discovery Grant (2017) CAD$33K/yr (5 years) <br> AcRF Tier 1 Grant (Nov 2015) S$100K (1 year) <br> Delta Electronics Grant (Co-PI) (August 2015) S$100K (2 years) <br> MIT SMART Innovation Grant (Co-PI) (August 2015) S$50K (2 years) <br> NTU CELT Excellence in Education Grant (November 2014) S$37K (1 year) <br> AcRF Tier 1 Grant (March 2014) S$150K (2 years) <br> NTU CELT Excellence in Education Grant (October 2013) S$40K (1 year) <br> NTU CELT Excellence in Education Grant (March 2013) S$30K (1 year) <br> NTU CoE Competitive Seed Grant S$50K (Jan-March 2013) <br> NTU SCE Startup Grant S$100K (3 years) <br></p>
<h2 id="journal-publications">Journal Publications</h2>
<p><a href="./publications/caffepresso_tecs2017.pdf">[PDF]</a> <strong>“CaffePresso: Accelerating Convolutional Networks on Embedded SoCs”</strong> <br> Gopalakrishna Hegde, Siddhartha, <u>Nachiket Kapre</u> <br> <em>ACM Transactions on Embedded Computing Systems</em> (Special Issue on ESWEEK 2016) <br></p>
<p><a href="./publications/hoplite_trets2017.pdf">[PDF]</a> <strong>“Hoplite: A Deflection-Routed Directional Torus NoC for FPGAs”</strong> <strong><font color="red">(Best Paper Award)</font></strong> <br> <u>Nachiket Kapre</u>, Jan Gray <br> <em>ACM Transactions on Reconfigurable Technology and Systems</em> (Special Issue FPL 2015) <br></p>
<p><a href="./publications/soft-vector_trets2016.pdf">[PDF]</a> <strong>“Optimizing Soft Vector Processing in FPGA-based Embedded Systems”</strong> <br> <u>Nachiket Kapre</u> <br> <em>ACM Transactions on Reconfigurable Technology and Systems</em> (Special Issue FPL 2014), Published: 2016<br></p>
<p><a href="./publications/case-for-graphs_superfri2016.pdf">[PDF]</a> <strong>“A Case for Embedded FPGA-based SoCs in Energy-Efficient Acceleration of Graph Problems”</strong> <br> Pradeep Moorthy, <u>Nachiket Kapre</u> <br> <em>Supercomputing Frontiers and Innovations</em> (Special Best Papers Issue from 2015 Supercomputing Frontiers Conference), Published: 2016 <br></p>
<p><a href="./publications/comm-avoid-itersolve_tpds2013.pdf">[PDF]</a> <strong>“Communication Optimization of Iterative Sparse Matrix-Vector Multiply on GPUs and FPGAs”</strong> <br> Abid Rafique, George Constantinides, <u>Nachiket Kapre</u> <br> <em>IEEE Transactions on Parallel and Distributed Systems</em>, Jan 2015 <br></p>
<p><a href="./publications/spice_trcad2012.pdf">[PDF]</a> <strong>“SPICE<sup>2</sup>: Spatial Processors Interconnected for Concurrent Execution for accelerating the SPICE Circuit Simulator using an FPGA”</strong> <br> <u>Nachiket Kapre</u> and André DeHon <br> <em>Transactions in CAD (Special Issue on Parallel CAD)</em>, Volume 31, Issue 1, January 2012 <br></p>
<p><a href="./publications/graphstep_taas-2011.pdf">[PDF]</a> <strong>“Spatial Hardware Implementation for Sparse Graph Algorithms in GraphStep”</strong> <br> Michael deLorimier, <u>Nachiket Kapre</u>, Nikil Mehta and André DeHon <br> <em>ACM Transactions on Autonomous and Adaptive Systems: Spatial Computing Special Issue</em>, September 2011 <br></p>
<p><a href="http://downloads.hindawi.com/journals/ijrc/2011/745147.pdf">[PDF]</a> <strong>“An NoC Traffic Compiler for efficient FPGA implementation of Sparse Graph-Oriented Workloads”</strong> <br> <u>Nachiket Kapre</u> and André DeHon <br> <em>International Journal of Reconfigurable Computing</em> Volume 2011 Article ID 745147<br></p>
<p><a href="./publications/sataccum_jrnl.pdf">[PDF]</a> <strong>“Pipelined Saturated Accumulation”</strong> <br> Karl Papadantonakis, <u>Nachiket Kapre</u>, Stephanie Chan, and André DeHon <br> <em>IEEE Transactions on Computers</em>, February 2009. <br></p>
<h2 id="conferenceworkshop-publications-full-papers">Conference/Workshop Publications (Full papers)</h2>
<p><a href="./publications/part-systolic_fpt-2019.pdf">[PDF]</a> <strong>“Partitioning Systolic Arrays for Fun and Profit”</strong> <strong><font color="red">(Best Paper Award)</font></strong><br> Harry Chan, Gurshaant Malik, <u>Nachiket Kapre</u> <br> <em>International Conference on Field-Programmable Technology</em>, Dec 2019 <br> (upcoming)</p>
<p><a href="./publications/stc_fpl-2019.pdf">[PDF]</a> <strong>“Scaling the Cascades: Interconnect-aware FPGA implementation of Machine Learning problems”</strong> <br> Ananda Samajdar, Tushar Garg, Tushar Krishna, <u>Nachiket Kapre</u> <br> <em>29th International Conference on Field-Programmable Logic and Applications</em>, Sep 2019 <br></p>
<p><a href="./publications/rapidroute-timing_fpl-2019.pdf">[PDF]</a> <strong>“Timing-aware routing in the RapidWright framework”</strong> <br> Leo Liu, <u>Nachiket Kapre</u> <br> <em>29th International Conference on Field-Programmable Logic and Applications</em>, Sep 2019 <br></p>
<p><a href="./publications/enhanced-bft_fccm-2019.pdf">[PDF]</a> <strong>“Enhancing Butterfly Fat Tree NoCs for FPGAs with lightweight flow control”</strong> <br> Gurshaant Malik, <u>Nachiket Kapre</u> <br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, Apr 2019 <br></p>
<p><a href="./publications/hoplitebuf_fpga-2019.pdf">[PDF]</a> <strong>“HopliteBuf: FPGA NoCs with Provably Stall-Free FIFOs”</strong> <strong><font color="red">(Best Paper Nominee)</font></strong><br> Tushar Garg, Saud Al Wasly, Rodolfo Pellizzoni, <u>Nachiket Kapre</u> <br> <em>International Symposium on Field-Programmable Gate Arrays</em>, Feb 2019<br></p>
<p><a href="./publications/nengo-embedded_fpt-2018.pdf">[PDF]</a> <strong>“Implementing NEF Neural Networks on Embedded FPGAs”</strong> <br> Benjamin Morcos, Terrence Stewart, Chris Eliasmith, <u>Nachiket Kapre</u><br> <em>International Conference on Field-Programmable Technology</em>, December 2018 <br></p>
<p><a href="./publications/dataflow-overlay_fpt-2018.pdf">[PDF]</a> <strong>“DaCO: A High-Performance Token Dataflow Coprocessor Overlay for FPGAs”</strong> <br> Siddhartha, <u>Nachiket Kapre</u><br> <em>International Conference on Field-Programmable Technology</em>, December 2018 <br></p>
<p><a href="./publications/fasttrack_isca-2018.pdf">[PDF]</a> <strong>“FastTrack: Leveraging Heterogeneous FPGA Wires to Design Low-cost High-performance Soft NoCs”</strong><br> <u>Nachiket Kapre</u>, Tushar Krishna <br> <em>International Symposium on Computer Architecture</em>, June 2018 <br></p>
<p><a href="./publications/legup-noc_fccm-2018.pdf">[PDF]</a> <strong>“LegUp-NoC: High-Level Synthesis of Loops with Indirect Addressing”</strong><br> Asif Islam, <u>Nachiket Kapre</u> <br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, April-May 2018 <br></p>
<p><a href="./publications/hopliteq_fccm-2018.pdf">[PDF]</a> <strong>“HopliteQ: Priority-Aware Routing in FPGA Overlay NoCs”</strong> <br> Siddhartha, <u>Nachiket Kapre</u><br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, April-May 2018 <br></p>
<p><a href="./publications/hoplitert_fpt-2017.pdf">[PDF]</a> <strong>“HopliteRT: An Efficient FPGA NoC for Real-Time Applications”</strong> <br> Saud Al Wasly, Rodolfo Pellizzoni, <u>Nachiket Kapre</u><br> <em>International Conference on Field-Programmable Technology</em>, December 2017 <br></p>
<p><a href="./publications/deflection-bft_fpl-2017.pdf">[PDF]</a> <strong>“Deflection Routed Butterfly Fat Trees on FPGAs”</strong> <br> <u>Nachiket Kapre</u><br> <em>27th International Conference on Field-Programmable Logic and Applications</em>, Sep 2017 <br></p>
<p><a href="./publications/hoplite-pr_fpl-2017.pdf">[PDF]</a> <strong>“Enabling Partial Reconfiguration and Low Latency Routing using Segmented FPGA NoCs”</strong> <br> Kizhepatt Vipin, Jan Gray, <u>Nachiket Kapre</u><br> <em>27th International Conference on Field-Programmable Logic and Applications</em>, Sep 2017 <br></p>
<p><a href="./publications/hoplite-bitserial_fccm2017.pdf">[PDF]</a> <strong>“On Bit-Serial NoCs for FPGAs”</strong> <br> <u>Nachiket Kapre</u><br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, May 2017 <br></p>
<p><a href="./publications/hoplite-ultrascale_fccm2017.pdf">[PDF]</a> <strong>“Implementing FPGA overlay NoCs using the Xilinx UltraScale memory cascades”</strong> <br> <u>Nachiket Kapre</u><br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, May 2017 <br></p>
<p><a href="./publications/ebsp_date2017.pdf">[PDF]</a> <strong>“eBSP: Managing NoC traffic for BSP workloads on the 16-core Adapteva Epiphany-III Processor”</strong> <br> Siddhartha, <u>Nachiket Kapre</u><br> <em>Design, Automation, and Test in Europe</em>, March 2017 <br></p>
<p><a href="./publications/multi-level-noc_fpt2016.pdf">[PDF]</a> <strong>“Deflection Routing for Multi-Level FPGA Overlay NoCs”</strong> <br> Chethan Kumar H B, Shubham Agarwal, <u>Nachiket Kapre</u><br> <em>International Conference on Field-Programmable Technology</em>, December 2016 <br></p>
<p><a href="./publications/mosquito-ml_acmdev2016.pdf">[PDF]</a> <strong>“Preventive Detection of Mosquito Populations using Embedded Machine Learning on Low Power IoT Platforms”</strong> <br> Prashant Ravi, Uma Syam, <u>Nachiket Kapre</u><br> <em>Seventh ACM Symposium on Computing and Development</em>, Nov 2016 <br></p>
<p><a href="./publications/caffepresso_cases2016.pdf">[PDF]</a> <strong>“CaffePresso: An Optimized Library for Deep Learning on Embedded Accelerator-based platforms”</strong> <strong><font color="red">(Best Paper Award)</font></strong> <br> Gopalakrishna Hegde, Siddhartha, Nachiappan Ramasamy, <u>Nachiket Kapre</u><br> <em>International Conference on Compilers, Architecture, and Synthesis for Embedded Systems</em>, Oct 2016 <br></p>
<p><a href="./publications/hoplite-dsp_fpl2016.pdf">[PDF]</a> <strong>“Hoplite-DSP: Harnessing the Xilinx DSP48 Multiplexers to efficiently support NoCs on FPGAs”</strong> <br> Chethan Kumar H B, <u>Nachiket Kapre</u><br> <em>26th International Conference on Field-Programmable Logic and Applications</em>, Sep 2016 <br></p>
<p><a href="./publications/intime_fpl2016.pdf">[PDF]</a> <strong>“Boosting Convergence of Timing Closure using Feature Selection in a Learning-Driven Approach”</strong> <br> Que Yanghua, Harnhua Ng, <u>Nachiket Kapre</u><br> <em>26th International Conference on Field-Programmable Logic and Applications</em>, Sep 2016 <br></p>
<p><a href="./publications/rc-dsl-survey_fpl2016.pdf">[PDF]</a> <strong>“Survey of Domain-Specific Languages for FPGA Computing”</strong> <br> <u>Nachiket Kapre</u>, Samuel Bayliss<br> <em>26th International Conference on Field-Programmable Logic and Applications</em>, Sep 2016 <br></p>
<p><a href="./publications/marathon_fccm2016.pdf">[PDF]</a> <strong>“Marathon: Statically-Scheduled Conflict-Free Routing on FPGA Overlay NoCs”</strong> <br> <u>Nachiket Kapre</u><br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, May 2016 <br></p>
<p><a href="./publications/gpu-bitwidth_fpga2016.pdf">[PDF]</a> <strong>“GPU-Accelerated High-Level Synthesis for Bitwidth Optimization of FPGA Datapaths”</strong> <br> <u>Nachiket Kapre</u>, Ye Deheng<br> <em>International Symposium on Field-Programmable Gate Arrays</em>, Feb 2016 <br></p>
<p><a href="./publications/hoplite_fpl2015.pdf">[PDF]</a> <strong>“Hoplite: Building Austere Overlay NoCs for FPGAs”</strong> <strong><font color="red">(Best Paper Award)</font></strong> <br> <u>Nachiket Kapre</u>, Jan Gray<br> <em>25th International Conference on Field-Programmable Logic and Applications</em>, Sep 2015 <br></p>
<p><a href="./publications/green_fpl2015.pdf">[PDF]</a> <strong>“Limits of FPGA Acceleration of 3D Green’s Function Computation for Geophysical Applications”</strong> <br> <u>Nachiket Kapre</u>, Selvakumar Jayakrishnan, Parjanya Gupta, Sagar Masuti, Sylvain Barbot<br> <em>25th International Conference on Field-Programmable Logic and Applications</em>, Sep 2015 <br></p>
<p><a href="./publications/graphproc_asap2015.pdf">[PDF]</a> <strong>“Custom FPGA-based Soft-Processors for Sparse Graph Acceleration”</strong> <br> <u>Nachiket Kapre</u><br> <em>26th IEEE International Conference on Application-specific Systems, Architectures and Processors</em>, July 2015 <br></p>
<p><a href="./publications/graphmmu_raw2015.pdf">[PDF]</a> <strong>“GraphMMU: Memory Management Unit for Sparse Graph Accelerators”</strong> <br> <u>Nachiket Kapre</u>, Han Jianglei, Andrew Bean, Pradeep Moorthy, Siddhartha<br> <em>22nd Reconfigurable Architectures Workshop, 2015</em> (co-located with IPDPS 2015), May 2015 <br></p>
<p><a href="./publications/spice-faulttol_raw2015.pdf">[PDF]</a> <strong>“Enhancing Speedups for FPGA Accelerated SPICE through Frequency Scaling and Precision Reduction”</strong> <br> <u>Nachiket Kapre</u>, Lim Hui Hui<br> <em>22nd Reconfigurable Architectures Workshop, 2015</em> (co-located with IPDPS 2015), May 2015 <br></p>
<p><a href="./publications/zedwulf_fccm2015.pdf">[PDF]</a> <strong>“Zedwulf: Power-Performance Tradeoffs of a 32-node Zynq SoC cluster”</strong> <br> Pradeep Moorthy, <u>Nachiket Kapre</u><br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, May 2015 <br></p>
<p><a href="./publications/intime_fccm2015.pdf">[PDF]</a> <strong>“Driving Timing Convergence of FPGA Designs through Machine Learning and Cloud Computing”</strong> <br> <u>Nachiket Kapre</u>, Bibin Chandrashekaran, Harnhua Ng, Kirvy Teo<br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, May 2015 <br></p>
<p><a href="./publications/opencv-saliency_fccm2015.pdf">[PDF]</a> <strong>“Energy-Efficient Acceleration of OpenCV Saliency Computation using Soft Vector Processors”</strong><br> Gopalakrishna Hegde, <u>Nachiket Kapre</u> <br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, May 2015 <br></p>
<p><a href="./publications/idea-loopback_fpga2015.pdf">[PDF]</a> <strong>“On Data Forwarding in Deeply Pipelined Soft Processor”</strong> <br> Hui Yan Cheah, Suhaib A. Fahmy and <u>Nachiket Kapre</u><br> <em>International Symposium on Field-Programmable Gate Arrays</em>, February 2015 <br></p>
<p><a href="./publications/relax-miracle_hipc2014.pdf">[PDF]</a> <strong>“Relax-Miracle: GPU Parallelization of Semi-Analytic Fourier-Domain solvers for Earthquake Modeling”</strong> <br> Sagar Masuti, Sylvain Barbot, and <u>Nachiket Kapre</u><br> <em>International Conference on High Performance Computing</em>, December 2014 <br></p>
<p><a href="./publications/limits-vector_fpl2014.pdf">[PDF]</a> <strong>“Comparing Soft and Hard Vector Processing in FPGA-based Embedded Systems” <font color="red">(Best Paper Nominee)</font></strong> <br> Soh Jun Jie, and <u>Nachiket Kapre</u><br> <em>International Conference on Field-Programmable Logic and Applications</em>, September 2014 <br></p>
<p><a href="./publications/dataflow-limits_dfm2014.pdf">[PDF]</a> <strong>“Limits of Statically Scheduled Token Dataflow Processing”</strong> <br> <u>Nachiket Kapre</u>, and Siddhartha<br> <em>4th International Workshop on Data-Flow Execution Models for Extreme Scale Computing</em> (co-located with PACT 2014), August 2014 <br></p>
<p><a href="./publications/fpga-driver_fpt-2013.pdf">[PDF]</a> <strong>“System-Level FPGA Device Driver with High-Level Synthesis Support”</strong> <br> Vipin Kizhepatt, Shreejit Shanker, Dulitha Gunasekara, Suhaib A Fahmy, <u>Nachiket Kapre</u><br> <em>International Conference on Field-Programmable Technology</em>, December 2013 <br></p>
<p><a href="./publications/uncertainty_fccm-2013.pdf">[PDF]</a> <strong>“Exploiting Input Parameter Uncertainty for Reducing Datapath Precision of SPICE Device Models”</strong> <br> <u>Nachiket Kapre</u> <br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, April 2013 <br></p>
<p><a href="./publications/appcompose_fccm-2013.pdf">[PDF]</a> <strong>“Application Composition and Communication Optimization of Iterative Solvers using FPGAs” <font color="red"> (HiPEAC Paper Award)</font></strong> <br> Abid Rafique, <u>Nachiket Kapre</u> and George Constantinides<br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, April 2013 <br></p>
<p><a href="./publications/tsqr_fpl-2012.pdf">[PDF]</a> <strong>“Enhancing Performance of Tall-Skinny QR factorization using FPGAs”</strong> <br> Abid Rafique, <u>Nachiket Kapre</u> and George Constantinides<br> <em>International Conference on Field-Programmable Logic and Applications</em>, August 2012 <br></p>
<p><a href="./publications/fxscore_fccm-2012.pdf">[PDF]</a> <strong>“FX-SCORE: A Framework for Fixed-Point Compilation of SPICE Device Models using Gappa++”</strong> <br> Helene Martorell and <u>Nachiket Kapre</u><br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, April 2012 <br></p>
<p><a href="./publications/spice_iterctrl_fpt-2011.pdf">[PDF]</a> <strong>“VLIW-SCORE: Beyond C for Sequential Control of SPICE FPGA Acceleration” <font color="red"> (Best Paper Award)</font></strong> <br> <u>Nachiket Kapre</u> and André DeHon<br> <em>International Conference on Field-Programmable Technology</em>, December 2011 <br></p>
<p><a href="./publications/spice_overview_carl2010.pdf">[PDF]</a> <strong>“SPICE<sup>2</sup> - A Spatial Parallel Architecture for Accelerating the SPICE Circuit Simulator”</strong> <br> <u>Nachiket Kapre</u> and André DeHon<br> <em>The First Workshop on the Intersections of Computer Architecture and Reconfigurable Logic</em>, December 2010 <br></p>
<p><a href="./publications/noc_traffic-engg_recosoc2010.pdf">[PDF]</a> <strong>“An NoC Traffic Compiler for efficient FPGA implementation of Sparse Graph-Oriented Workloads”</strong> <br> <u>Nachiket Kapre</u> and André DeHon<br> <em>Reconfigurable Communication-centric Systems on Chip</em>, May 2010 <br></p>
<p><a href="./publications/spice_matrix-solve_fpt-2009.pdf">[PDF]</a> <strong>“Parallelizing Sparse Matrix-Solve for SPICE Circuit Simulation using FPGAs”</strong> <br> <u>Nachiket Kapre</u> and André DeHon<br> <em>International Conference on Field-Programmable Technology</em>, December 2009 <br></p>
<p><a href="./publications/spice_perf-compare-arch_fpl-2009.pdf">[PDF]</a> <strong>“Performance Comparison of Single-Precision SPICE Model-Evaluation on FPGA, GPU, Cell, and Multi-Core Processors”</strong> <br> <u>Nachiket Kapre</u> and André DeHon<br> <em>International Conference on Field-Programmable Logic and Applications</em>, September 2009 <br></p>
<p><a href="./publications/spice_spatial-model-eval_fccm-2009.pdf">[PDF]</a> <strong>“Accelerating SPICE Model-Evaluation using FPGAs”</strong> <br> <u>Nachiket Kapre</u> and André DeHon <br> <em>IEEE Symposium on Field-Programmable Custom Computing Machines</em>, April 2009 <br></p>
<p><a href="./publications/fp-accum_arith-2007.pdf">[PDF]</a> <strong>“Optimistic Parallelization of Floating-Point Accumulation”</strong> <br> <u>Nachiket Kapre</u> and André DeHon <br> <em>IEEE Symposium on Computer Arithmetic</em>, June 2007. <br></p>
<p><a href="./publications/ps-tm-networks_fccm-2006.pdf">[PDF]</a> <strong>“Packet-Switched vs. Time-Multiplexed FPGA Overlay Networks” <font color="red">(FCCM20 25-most influential papers award winner) </font></strong> <br> <u>Nachiket Kapre</u>, Nikil Mehta, Michael deLorimier, Raphael Rubin, Henry Barnor, Michael Wilson, Michael Wrighton and André DeHon<br> <em>IEEE Symposium on Field-Programmable Custom Computing Machines</em>, April 2006. <br></p>
<p><a href="./publications/graphstep_fccm-2006.pdf">[PDF]</a> <strong>“GraphStep: A System Architecture for Sparse Graph Algorithms”</strong> <br> Michael deLorimier, <u>Nachiket Kapre</u>, Nikil Mehta, Dominic Rizzo, Ian Eslick, Raphael Rubin, Tomas Uribe, Thomas Knight Jr., and André DeHon<br> <em>IEEE Symposium on Field-Programmable Custom Computing Machines</em>, April 2006. <br></p>
<p><a href="./publications/sat-accum_fpt-2005.pdf">[PDF]</a> <strong>“Pipelined Saturated Accumulation”</strong> <br> Karl Papadantonakis, <u>Nachiket Kapre</u>, Stephanie Chan, and André DeHon<br> <em>International Conference on Field-Programmable Technology</em>, December 2005. <br></p>
<p><a href="./publications/des-pat_fccm-2004.pdf">[PDF]</a> <strong>“Design Patterns for Reconfigurable Computing”</strong> <br> André DeHon, Joshua Adams, Michael deLorimier, <u>Nachiket Kapre</u>, Yuki Matsuda, Helia Naeimi, Michael Vanier, and Michael Wrighton <br> <em>IEEE Symposium on Field-Programmable Custom Computing Machines</em>, April 2004. <br></p>
<h2 id="conference-publications-short-papers">Conference Publications (Short Papers)</h2>
<p><a href="./publications/bft-arity4_fccm-2020.pdf">[PDF]</a> <strong>“Exploring the Impact of Switch Arity on Butterfly Fat Tree FPGA NoCs”</strong> <br> Ian Elmor Lang, Ziqiang Huang, <u>Nachiket Kapre</u> <br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, Apr 2020 <br></p>
<p><a href="./publications/rapid-route_fccm-2019.pdf">[PDF]</a> <strong>“RapidRoute: Fast Assembly of Communication Structures for FPGA Overlays”</strong> <strong><font color="red">(Best Short Paper Runner-up)</font></strong> <br> Leo Liu, Jay Weng, <u>Nachiket Kapre</u> <br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, Apr 2019 <br></p>
<p><a href="./publications/openclpipe_iwocl-2017.pdf">[PDF]</a> <strong>“Applying Models of Computation to OpenCL Pipes for FPGA Computing”</strong> <br> <u>Nachiket Kapre</u>, Hiren Patel<br> <em>5th International Workshop on OpenCL</em>, May 2017<br></p>
<p><a href="./publications/mips_fpga2017.pdf">[PDF]</a> <strong>“120-core microAptiv MIPS Overlay for the Terasic DE5-NET FPGA board”</strong> <br> Chethan Kumar H B, Gourav Modi, Prashant Ravi, <u>Nachiket Kapre</u><br> <em>International Symposium on Field-Programmable Gate Arrays</em>, Feb 2017 <br></p>
<p><a href="./publications/dwt-mxp_fpl2016.pdf">[PDF]</a> <strong>“Vector Acceleration of 1-D DWT Computations using Sparse Matrix Skeletons”</strong> <br> Sidharth Maheshwari, Gourav Modi, Siddhartha, <u>Nachiket Kapre</u><br> <em>26th International Conference on Field-Programmable Logic and Applications</em>, Sep 2016 <br></p>
<p><a href="./publications/ml-classifiers_fccm2016.pdf">[PDF]</a> <strong>“Improving Classification Accuracy of a Machine Learning approach for FPGA Timing Closure”</strong> <br> Que Yanghua, <u>Nachiket Kapre</u>, Harnhua Ng, Kirvy Teo<br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, May 2016 <br></p>
<p><a href="./publications/ml-case_fpga2016.pdf">[PDF]</a> <strong>“Case for Design-Specific Machine Learning in Timing Closure of FPGA Designs”</strong> <br> Que Yanghua, Chinnakkannu Adaikkal Raj, Harnhua Ng, Kirvy Teo, and <u>Nachiket Kapre</u><br> <em>International Symposium on Field-Programmable Gate Arrays</em>, Feb 2016 <br></p>
<p><a href="./publications/intime_fpga2015.pdf">[PDF]</a> <strong>“InTime: A Machine Learning Approach for Efficient Selection of FPGA CAD Tool Parameters”</strong> <br> <u>Nachiket Kapre</u>, Harnhua Ng, Kirvy Teo and Jaco Naude<br> <em>International Symposium on Field-Programmable Gate Arrays</em>, February 2015 <br></p>
<p><a href="./publications/fanout-decomp_fpt2014.pdf">[PDF]</a> <strong>“Fanout Decomposition Dataflow Optimizations for FPGA-based Sparse LU Factorization”</strong> <br> Siddhartha, and <u>Nachiket Kapre</u><br> <em>International Conference on Field-Programmable Technology</em>, December 2014 <br></p>
<p><a href="./publications/idea-analysis_fpt2014.pdf">[PDF]</a> <strong>“Analysis and Optimization of a Deeply Pipelined FPGA Soft Processor”</strong> <br> Hui Yan Cheah, Suhaib A. Fahmy and <u>Nachiket Kapre</u><br> <em>International Conference on Field-Programmable Technology</em>, December 2014 <br></p>
<p><a href="./publications/hetero-dataflow_fpl2014.pdf">[PDF]</a> <strong>“Heterogeneous Dataflow Architectures for FPGA-based Sparse LU Factorization”</strong> <br> Siddhartha, and <u>Nachiket Kapre</u><br> <em>International Conference on Field-Programmable Logic and Applications</em>, September 2014 <br></p>
<p><a href="./publications/breakseq_fccm-2014.pdf">[PDF]</a> <strong>“Breaking Sequential Dependencies in FPGA-based Sparse LU Factorization”</strong> <br> Siddhartha, and <u>Nachiket Kapre</u><br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, May 2014 <br></p>
<p><a href="./publications/mixfxscore_fccm-2014.pdf">[PDF]</a> <strong>“MixFX-SCORE: Heterogeneous Fixed-Point Compilation of Dataflow Computations”</strong> <br> Ye Deheng, and <u>Nachiket Kapre</u><br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, May 2014 <br></p>
<p><a href="./publications/timfaults_fccm-2014.pdf">[PDF]</a> <strong>“Timing Fault Detection in FPGA-based Circuits”</strong> <br> Edward Stott, Joshua M. Levine, Peter Y. K. Cheung, and <u>Nachiket Kapre</u><br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, May 2014 <br></p>
<h2 id="posters">Posters</h2>
<p><strong>“Evaluating Embedded FPGA Accelerators for Deep Learning Applications”</strong> <br> Gopalakrishna Hegde, Siddhartha, Nachiappan Ramasamy, Vamsi Buddha, <u>Nachiket Kapre</u><br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, May 2016 <br></p>
<p><strong>“Communication Optimization for the 16-core Epiphany Floating-Point Processor Array”</strong> <br> Siddhartha, <u>Nachiket Kapre</u><br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, May 2016 <br></p>
<p><strong>“Machine-Learning driven Auto-Tuning of High-Level Synthesis for FPGAs”</strong> <br> Li Ting, Harri Sapto Wijaya, and <u>Nachiket Kapre</u><br> <em>International Symposium on Field-Programmable Gate Arrays</em>, Feb 2016 <br></p>
<p><strong>“Sparse Graph Processing using Soft-Processors”</strong> <br> <u>Nachiket Kapre</u><br> <em>International Symposium on Field-Programmable Custom Computing Machines</em>, May 2015 <br></p>
<p><strong>“FPGA Acceleration of Irregular Iterative Computations using Criticality-Aware Dataflow Optimizations”</strong> <br> Siddhartha, and <u>Nachiket Kapre</u><br> <em>International Symposium on Field-Programmable Gate Arrays</em>, February 2015 <br></p>
<p><a href="./publications/timing-errors_selse2014.pdf">[PDF]</a> <strong>“Measuring Timing Errors in FPGA-based Circuits”</strong> <br> Joshua Levine, Edward Stott, and <u>Nachiket Kapre</u><br> <em>The 10th IEEE Workshop on Silicon Errors in Logic - System Effects</em>, April 2014 <br></p>
<h2 id="magazine-articles">Magazine Articles</h2>
<p><a href="./publications/saliency-fpga_ine2004.pdf">[PDF]</a> <strong>“Saliency on a chip: a digital approach with an FPGA”</strong> <br> <u>Nachiket Kapre</u>, Dirk Walther, Christof Koch, and André DeHon<br> <em>The Neuromorphic Engineer, Volume 1, Issue 2, Autumn 2004</em></p>
<h2 id="book-chapters">Book Chapters</h2>
<p><strong>“Accelerating the SPICE Circuit Simulator using an FPGA - A Case Study”</strong> <br> <u>Nachiket Kapre</u> and André DeHon <br> From <em>High-Performance Computing using FPGAs</em>, <br> Pages 389-427, <br> Edited by Wim Vanderbauwhede and Khaled Benkrid, <br> Published by <em>Springer</em>, Copyright 2013, ISBN-13: 978-1-4614-1790-3 <br></p>
<p><strong>“Programming FPGA Applications in VHDL”</strong> <br> <u>Nachiket Kapre</u> and André DeHon <br> From <em>Reconfigurable Computing: The Theory and Practice of FPGA-based Computation</em>, <br> Pages 129-153, <br> Edited by <em>Scott Hauck</em> and <em>André DeHon</em>, <br> Published by <em>Morgan Kaufmann/Elsevier</em>, Copyright 2008, ISBN-13: 978-0-12-370522-8 <br></p>
<h2 id="selected-talks">Selected Talks</h2>
<p>“A Case for Embedded FPGA-based SoCs for Energy-Efficient Acceleration of Graph Problems” <br> Pradeep Moorthy, Siddhartha, <u>Nachiket Kapre</u><br> <em>Supercomputing Frontiers 2015</em>, March 2015 <br></p>
<p>“SPICE<sup>2</sup> - A Spatial Parallel Architecture for Accelerating the SPICE Circuit Simulator - Retrospective and Vision”<br> <em>Talk at Maxeler Inc., University of Glasgow, University of York, Oxford, University of Southampton, National University of Singapore, Mahanakorn University of Technology</em>, 2010-2013. <br></p>
<p>“Spatial SPICE Mapping and Lessons”<br> Vancouver, Canada <em>Talk at the University of British Columbia (UBC)</em>, August 2010. <br></p>
<p>“Accelerating the SPICE Circuit Simulator using FPGAs”<br> Bengaluru (Bangalore), India <em>Invited Talk at the Indian Institute of Science (IISc)</em>, March 2010. <br></p>
<p>“Accelerating the SPICE Circuit Simulator using FPGAs”<br> Austin, USA <em>Invited Talk at IBM Inc.</em>, August 2009. <br></p>
<p>“Accelerating SPICE Model-Evaluation using FPGAs”<br> San Jose, USA. <em>Invited Talk at Xilinx Inc.</em>, February 2009.</p>
<p>“Exploiting Application Structure in On-Chip Network Design”<br> <em>Invited Talk at University of Gent, Belgium and TU Munich, Germany</em>, July-August 2007.</p>
<h2 id="patents">Patents</h2>
<p>“Method and a circuit using an associative calculator for calculating a sequence of non-associative operations”, André DeHon and Nachiket Kapre<br> Publication Number US7991817 B2, Applied Jan 2007, Granted Aug 2011.<br></p>
<h2 id="advising">Advising</h2>
<p><strong>Waterloo PhD</strong><br> Gurshaant Malik (2017-present): <em>TBA</em> <br></p>
<p><strong>Waterloo MASc</strong><br> Benjamin Morcos (2017-present): <em>Neural Networks on FPGAs</em> <br> Tushar Garg (2017-present): <em>TBA</em> <br></p>
<p><strong>NTU PhD</strong><br> Siddhartha (2013-present): <em>Dataflow Computing using FPGAs</em> <br></p>
<p><strong>NTU MSc</strong><br> Chethan Kumar Basavaraju (2015): <em>FPGA NoCs</em> <br> Jayakrishnan Selva Kumar (2014): <em>Maxeler Applications</em> <br> Venugopal Swetha (2014): <em>GPU Monte-Carlo Applications</em> <br> Chinnakkannu Adaikkala Raj (2014): <em>Machine Learning in FPGA CAD</em> <br> Jianrong, Kiran Ganapathi, Kunal Gokhale (2014): Misc Topics<br> Kanchan Kaur, Shipeng Xu (2013): <em>FPGA Placement/Routing</em></p>
<p><strong>NTU UG (Final Year Projects)</strong><br> Shubham Agarwal (2015): <em>FPGA NoCs</em> <br> Que Yanghua (2015): <em>Machine Learning</em> <br> Dakshina Pradeep Moorthy (2014): <em>Parallel Graph Accelerators</em> <br> Han Jianglei (2014): <em>Parallel Graph Accelerators</em> <br> Soh Jun Jie (2013-14): <em>Vectorblox</em> <br> Favian (2013-14): <em>3D Convolution using FPGAs</em> <br> Lim Hui Hui (2013): <em>SPICE Fault Tolerance</em> <br></p>
<p><strong>Imperial PhD, MSc, MEng, BEng and Interns</strong> <br> Andrew Bean (PhD student 2011-2016): <em>Adaptive/Learning Systems using FPGAs</em><br> Abid Rafique (PhD student 2010-2013): <em>Accelerating Semi-Definite Programming with FPGAs, GPUs and Multi-Cores</em> <br> Siddhartha, Dulitha Gunasekara (BEng/MEng students 2011-2012): Different topics <br> Helene Martorell, Emmanouil Spanakis, Fang Zhou, Wei Lizhong (MSc students 2010-2011): Different topics <br> Coryan Wilson-Shah (UROP student 2011): Matrix-Free SPICE <br> Cody Huang (CAPA intern, UC Davis undergraduate 2011): GPU Code-Generation</p>
<p><strong>Caltech Undergraduates and Summer Students</strong> <br> Henry Barnor (2005, now at Altera): <em>VHDL Design of systolic hardware sorter/placer</em> <br> Stephanie Chan (2005, now at NIST): <em>Experiments on saturating accumulator</em> <br> Ravi Teja Sukhavasi (2006, Caltech graduate student): <em>Applying network-coding ideas to message traffic between parallel compute elements</em> <br> Jon Ramirez (2006): <em>Floating-point associative accumulator</em> <br></p>
<p><strong>Corporate Collaborations</strong> <br> Harnhua Ng, Kirvy Teo (Plunify): <em>Machine-Learning for FPGA CAD</em> <br> Jacob Bower (Maxeler): <em>Maxeler Compiler Framework</em> <br> Kumiko Nomura (Toshiba): <em>Architecture analysis of 3D chips</em> <br></p>
<h2 id="teaching-experience">Teaching Experience</h2>
<p><strong>Lecturer</strong><br> Semester 1 2015, Nanyang Technological University, <em>CE4052/ES6152: Embedded System Development</em> <br> Semester 2 2014, Nanyang Technological University, <em>CE4054/ES6154: Programmable Systems-on-Chip</em> <br> Semester 1 2014, Nanyang Technological University, <em>CE4052/ES6152: Embedded System Development</em> <br> Semester 2 2013, Nanyang Technological University, <em>CE4054/ES6154: Programmable Systems-on-Chip</em> <br> Semester 1 2013, Nanyang Technological University, <em>CE7451: Research Methods in Computer Science &amp; Engineering</em> <br> Semester 1 2013, Nanyang Technological University, <em>CE4052/ES6152: Embedded Systems Development</em> <br> Semester 1 2013, Nanyang Technological University, <em>ES7501: Electronic Design Automation</em> <br></p>
<p><strong>Tutorials/Labs</strong><br> Semester 1 2015, Nanyang Technological University, <em>CE3001: Advanced Computer Architecture</em> <br> Semester 1 2015, Nanyang Technological University, <em>CE4052/ES6152: Embedded Systems Development</em> <br> Semester 2 2014, Nanyang Technological University, <em>CE4054/ES6154: Programmable System-on-Chip</em> <br> Semester 1 2014, Nanyang Technological University, <em>CE1005: Digital Logic</em> (3 groups) <br> Semester 1 2014, Nanyang Technological University, <em>CE4052/ES6152: Embedded Systems Development</em> <br> Semester 2 2013, Nanyang Technological University, <em>CE1005: Digital Logic</em> (1 group) <br> Semester 1 2013, Nanyang Technological University, <em>CE4052/ES6152: Embedded Systems Development</em> <br></p>
<p><strong>Guest Lectures</strong><br> Fall 2011, Imperial College London, <em>ISE2: Computer Architecture</em> <br> Winter 2011, Imperial College London, <em>DoC: Custom Computing</em> <br></p>
<p><strong>Teaching Assistant</strong><br> Spring 2007, University of Pennsylvania, Electrical and Systems Engineering, <em>ESE680s2: Computer Organization</em> <br> Winter 2006, California Institute of Technology, Computer Science, <em>CS137: Electronic Design Automation</em><br></p>
<h2 id="professional-experience">Professional Experience</h2>
<p><strong>Nanyang Technological University</strong>, Assistant Professor (Oct 2012-Sep 2016)<br> <strong>Plunify, Inc.</strong>, Chief Technology Officer (July 2014-)<br> <strong>Imperial College London</strong>, Junior Research Fellow (October 2010-September 2012)<br> <strong>Maxeler Inc.</strong>, Consultant (July 2011-July 2012)<br> <strong>University of Pennsylvania</strong>, Visiting Graduate Student (October 2006-present)<br> <strong>Xilinx Inc.</strong>, Summer Intern (Summer 2005)<br> <strong>Koch Lab (Caltech)</strong>, Research Assistant (February 2004 to September 2004)<br> <strong>Paxonet Communications Inc. (now Conexant)</strong>, Employee (August 2002 to August 2003)<br> <strong>Siemens Inc.</strong>, Part-Time Intern (2002).</p>
<p><font color="red"> Copyright for all the PDF papers hosted here belongs to IEEE or ACM, as appropriate.</font></p>
</section>
</div>
</body>
</html>