
Troubleshoot CNTK

Zhou Wang edited this page Sep 14, 2016 · 19 revisions

This page collects some of the most frequent pitfalls users encounter.

Although the model was trained with a larger set than the evaluation set, CNTK runs out of memory during evaluation.

The training configuration usually sets a minibatchSize property. When you evaluate the model with CNTK.exe, make sure the minibatchSize used by the evaluation command is appropriate as well. To quickly determine whether this property is causing the issue, set it to a low value (e.g. minibatchSize=2) in the configuration file for the evaluation command. (cf. Issue #468)

During eval the following error is seen: About to throw exception 'cuDNN failure 8: CUDNN_STATUS_EXECUTION_FAILED ; GPU=0 ; hostname=haha; expr=err'

Set the minibatchSize property to a low value (e.g. minibatchSize=2).

When I compile CNTK using VS2013, I see a compiler error. What is wrong?

You must upgrade VS2013 to Update 5. See Setting up CNTK on Windows.

I enabled Image Reader with zip support and get "Plugin not found: 'ImageReader.dll'" error when running Image Reader unit tests or trying to use the reader. What might be wrong?

Check that you have correctly installed zlib and libzip. In particular, make sure that you have renamed zlib.dll to zlib1.dll.

Evaluating a model returns the same values, regardless of input

There is a bug in ACML that affects some Intel chips. We have removed ACML from the CNTK build and will no longer support it. If you are still using an older version of CNTK that uses ACML, we strongly recommend switching to a CNTK version linked against MKL. If you want to continue using ACML, add ACML_FMA=0 to the system's environment variables to work around this issue (cf. Issues #465, #506, #519). If the issue persists, try different learning rates (for example 0.1, 0.01, 0.001, ...); in some cases a smaller learning rate is the key.
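If changing the system-wide environment is not convenient, the ACML_FMA=0 setting can also be tried for a single run. The following is a minimal Python sketch, not part of CNTK: the helper names acml_workaround_env and run_with_workaround are hypothetical, and it assumes that ACML reads ACML_FMA from the environment of the process that loads it, so setting it only for the child process is sufficient.

```python
import os
import subprocess


def acml_workaround_env(base_env=None):
    """Return a copy of the environment with ACML_FMA=0 added.

    The original mapping is left untouched, so the setting only affects
    processes that are started with the returned copy.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ACML_FMA"] = "0"  # workaround named in the text above
    return env


def run_with_workaround(command):
    """Run `command` (a list of arguments) with ACML_FMA=0 and return its exit code."""
    completed = subprocess.run(command, env=acml_workaround_env())
    return completed.returncode
```

A call such as run_with_workaround(["cntk", "configFile=myconfig.cntk"]) would start an evaluation with the variable set; the configuration file name here is only a placeholder.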

I get errors when using Eval C# library in Azure web apps

The error message could be something like "Could not load file or assembly 'some CNTK dlls'", an exception System.Runtime.InteropServices.SEHException, or "InternalServiceFault: External component has thrown an exception.".

First please make sure that all CNTK dependency dlls are deployed to the Azure web app. Then you have to set your Azure web app to use 64-bit VM. In order to allow the Azure web app to load CNTK unmanaged dlls, you should change the PATH variable by adding the following code in the Application_Start() method in global.asax:

   string pathValue = Environment.GetEnvironmentVariable("PATH");
   string domainBaseDir = AppDomain.CurrentDomain.BaseDirectory;
   string cntkPath = domainBaseDir + @"bin";
   pathValue += ";" + cntkPath;
   Environment.SetEnvironmentVariable("PATH", pathValue);

Please see the "Evaluate a model in an Azure WebApi" page for detailed steps.

I have just downloaded and installed the CNTK binary package and want to run a job, but I get weird errors, like missing CUDA 7.0 libraries (even though I downloaded the CPU-only version!)

Check carefully what is in your PATH, especially on a shared development machine. Such errors are most likely caused by a "forgotten" CNTK.exe file that is reachable through the PATH and comes from a previous release or was built from outdated sources.
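One way to carry out this PATH check is to list every CNTK.exe that the system can reach. The following is a minimal Python sketch, not part of CNTK; the helper name find_on_path is hypothetical. It walks the PATH entries in search order and reports each directory that contains the file.

```python
import os


def find_on_path(filename, path=None):
    """Return the full path of every `filename` found on PATH, in search order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        directory = directory.strip('"')  # PATH entries are sometimes quoted
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate) and candidate not in hits:
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for hit in find_on_path("CNTK.exe"):
        print(hit)
```

The first entry printed is the executable that actually runs when CNTK.exe is invoked without a full path. Any further entries are candidates for stale copies that should be removed from the PATH.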

On Windows I installed a new version of the NVIDIA driver and now the CNTK build fails with errors like ..\Common\BestGpu.cpp(24): fatal error C1083: Cannot open include file: 'nvml.h': No such file or directory

You selected the Perform a clean installation option in the NVIDIA driver installer, which removes the GPU Deployment Kit (GDK). To repair the system, perform the following steps:

  • Launch CUDA Installer
  • Select Custom (Advanced) Installation
  • Unselect all installation options, except GPU Deployment Kit
  • This automatically selects the Graphics Driver option, which is expected. Leave it selected
  • Proceed with CUDA installation
  • After successful CUDA installation launch the installation of the desired Graphics Driver version
  • Select Custom (Advanced) Installation
  • Ensure that Perform a clean installation is NOT selected and proceed with the installation

I'm getting one of the following exceptions: "OS call failed or operation not supported on this OS" or "EXCEPTION occurred: CUSPARSE failure 1".

One possible cause is excessive memory pressure from loading the whole data set into memory with the default (i.e., unlimited) randomization window. Try running your workload with an explicit randomizationWindow value, which limits the amount of input data cached in memory. To do that, add the following parameters to your reader configuration section (10000 is only an example; choose any value that fits in memory and still ensures good randomization):

   randomize=true
   randomizationWindow=10000 #(assuming that 10K samples << total available memory)
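To judge whether a given randomizationWindow fits in memory, a back-of-the-envelope estimate can help. The following Python sketch is only an approximation under stated assumptions: it treats every sample as a dense vector of 4-byte floats and ignores any reader overhead, and the function names window_bytes and max_window as well as the 784-value sample size are hypothetical. Actual memory use depends on the reader and the data format.

```python
def window_bytes(randomization_window, values_per_sample, bytes_per_value=4):
    """Rough memory needed to hold one randomization window of dense samples."""
    return randomization_window * values_per_sample * bytes_per_value


def max_window(memory_budget_bytes, values_per_sample, bytes_per_value=4):
    """Largest window (in samples) that fits in the given memory budget."""
    return memory_budget_bytes // (values_per_sample * bytes_per_value)


if __name__ == "__main__":
    # Hypothetical example: 10000 samples with 784 float values each.
    print(window_bytes(10000, 784))  # 31360000 bytes, roughly 30 MB
    # Largest window that fits in a 1 GiB budget for the same sample size.
    print(max_window(2**30, 784))
```

If the estimate for the chosen window is close to or above the available memory, a smaller randomizationWindow is the safer choice.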

I get errors when using the Eval C# library EvalWrapper.dll in an Azure web app, like the following: "Could not load file or assembly 'some CNTK dlls'", an exception System.Runtime.InteropServices.SEHException, or "InternalServiceFault: External component has thrown an exception.".

The web app may run successfully in a local instance of IIS (or IIS Express) but fail to run in Azure. First make sure that all CNTK dependent dlls are deployed to the Azure web app. Then set your Azure web app to use a 64-bit VM, and change the PATH variable to allow the Azure web app to load the CNTK unmanaged dlls. Please see the How do I run Eval in Azure section for detailed steps.
