forked from Theano/Theano
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathHISTORY.txt
721 lines (609 loc) · 37.7 KB
/
HISTORY.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
.. _HISTORY:
=================
Old Release Notes
=================
Theano 0.5 (23 February 2012)
=============================
Highlights:
* Moved to github: http://github.com/Theano/Theano/
* Old trac tickets moved to assembla tickets: http://www.assembla.com/spaces/theano/tickets
* Theano vision: http://deeplearning.net/software/theano/introduction.html#theano-vision (Many people)
* Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban)
* Faster dot() call: New/Better direct call to cpu and gpu ger, gemv, gemm
and dot(vector, vector). (James, Frédéric, Pascal)
* C implementation of Alloc. (James, Pascal)
* theano.grad() now also works with sparse variables. (Arnaud)
* Macro to implement the Jacobian/Hessian with theano.tensor.{jacobian,hessian} (Razvan)
* See the Interface changes.
Interface Behavior Changes:
* The current default value of the parameter axis of
theano.{max,min,argmax,argmin,max_and_argmax} is now the same as
numpy: None. i.e. operate on all dimensions of the tensor.
(Frédéric Bastien, Olivier Delalleau) (was deprecated and generated
a warning since Theano 0.3 released Nov. 23rd, 2010)
* The current output dtype of sum with input dtype [u]int* is now always [u]int64.
You can specify the output dtype with a new dtype parameter to sum.
The output dtype is the one used for the summation.
There is no warning in previous Theano versions about this.
The consequence is that the sum is done in a dtype with more precision than before.
So the sum could be slower, but will be more resistant to overflow.
This new behavior is the same as numpy. (Olivier, Pascal)
* When using a GPU, detect faulty nvidia drivers. This was detected
when running Theano tests. Now this is always tested. Faulty
drivers result in wrong results for reduce operations. (Frederic B.)
Interface Features Removed (most were deprecated):
* The string modes FAST_RUN_NOGC and STABILIZE are not accepted. They
were accepted only by theano.function().
Use Mode(linker='c|py_nogc') or Mode(optimizer='stabilize') instead.
* tensor.grad(cost, wrt) now always returns an object of the "same type" as wrt
(list/tuple/TensorVariable). (Ian Goodfellow, Olivier)
* A few tag.shape and Join.vec_length left have been removed. (Frederic)
* The .value attribute of shared variables is removed, use shared.set_value()
or shared.get_value() instead. (Frederic)
* Theano config option "home" is not used anymore as it was redundant with "base_compiledir".
If you use it, Theano will now raise an error. (Olivier D.)
* scan interface changes: (Razvan Pascanu)
* The use of `return_steps` for specifying how many entries of the output
to return has been removed. Instead, apply a subtensor to the output
returned by scan to select a certain slice.
* The inner function (that scan receives) should return its outputs and
updates following this order:
[outputs], [updates], [condition].
One can skip any of the three if not used, but the order has to stay unchanged.
Interface bug fix:
* Rop in some case should have returned a list of one Theano variable,
but returned the variable itself. (Razvan)
New deprecation (will be removed in Theano 0.6, warning generated if you use them):
* tensor.shared() renamed to tensor._shared(). You probably want to
call theano.shared() instead! (Olivier D.)
Bug fixes (incorrect results):
* On CPU, if the convolution had received explicit shape information,
they were not checked at runtime. This caused wrong result if the
input shape was not the one expected. (Frederic, reported by Sander
Dieleman)
* Theoretical bug: in some case we could have GPUSum return bad value.
We were not able to reproduce this problem
* patterns affected ({0,1}*nb dim, 0 no reduction on this dim, 1 reduction on this dim):
01, 011, 0111, 010, 10, 001, 0011, 0101 (Frederic)
* div by zero in verify_grad. This hid a bug in the grad of Images2Neibs. (James)
* theano.sandbox.neighbors.Images2Neibs grad was returning a wrong value.
The grad is now disabled and returns an error. (Frederic)
* An expression of the form "1 / (exp(x) +- constant)" was systematically matched to "1 / (exp(x) + 1)"
and turned into a sigmoid regardless of the value of the constant. A warning will be issued if your
code was affected by this bug. (Olivier, reported by Sander Dieleman)
* When indexing into a subtensor of negative stride (for instance, x[a:b:-1][c]),
an optimization replacing it with a direct indexing (x[d]) used an incorrect formula,
leading to incorrect results. (Pascal, reported by Razvan)
* The tile() function is now stricter in what it accepts to allow for better
error-checking/avoiding nonsensical situations. The gradient has been
disabled for the time being as it only implemented (incorrectly) one special
case. The `reps` argument must be a constant (not a tensor variable), and
must have the same length as the number of dimensions in the `x` argument;
this is now checked. (David)
Scan fixes:
* computing grad of a function of grad of scan (reported by Justin Bayer, fix by Razvan)
before: most of the time crash, but could be wrong value with bad number of dimensions (so a visible bug)
now: do the right thing.
* gradient with respect to outputs using multiple taps (reported by Timothy, fix by Razvan)
before: it used to return wrong values
now: do the right thing.
Note: The reported case of this bug was happening in conjunction with the
save optimization of scan that give run time errors. So if you didn't
manually disable the same memory optimization (number in the list4),
you are fine if you didn't manually request multiple taps.
* Rop of gradient of scan (reported by Timothy and Justin Bayer, fix by Razvan)
before: compilation error when computing R-op
now: do the right thing.
* save memory optimization of scan (reported by Timothy and Nicolas BL, fix by Razvan)
before: for certain corner cases used to result in a runtime shape error
now: do the right thing.
* Scan grad when the input of scan has sequences of different lengths. (Razvan, reported by Michael Forbes)
* Scan.infer_shape now works correctly when working with a condition for the number of loops.
In the past, it returned n_steps as the length, which is not always true. (Razvan)
* Scan.infer_shape crash fix. (Razvan)
New features:
* AdvancedIncSubtensor grad defined and tested (Justin Bayer)
* Adding 1D advanced indexing support to inc_subtensor and set_subtensor (James Bergstra)
* tensor.{zeros,ones}_like now supports the dtype param as numpy (Frederic)
* Added configuration flag "exception_verbosity" to control the verbosity of exceptions (Ian)
* theano-cache list: list the content of the theano cache (Frederic)
* theano-cache unlock: remove the Theano cache lock (Olivier)
* tensor.ceil_int_div to compute ceil(a / float(b)) (Frederic)
* MaxAndArgMax.grad now works with any axis (The op supports only 1 axis) (Frederic)
* used by tensor.{max,min,max_and_argmax}
* tensor.{all,any} (Razvan)
* tensor.roll as numpy: (Matthew Rocklin, David Warde-Farley)
* Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban)
* IfElse now allows to have a list/tuple as the result of the if/else branches.
* They must have the same length and corresponding type (Razvan)
* Argmax output dtype is now int64 instead of int32. (Olivier)
* Added the element-wise operation arccos. (Ian)
* Added sparse dot with dense grad output. (Yann Dauphin)
* Optimized to Usmm and UsmmCscDense in some case (Yann)
* Note: theano.dot and theano.sparse.structured_dot() always had a gradient with the same sparsity pattern as the inputs.
The new theano.sparse.dot() has a dense gradient for all inputs.
* GpuAdvancedSubtensor1 supports broadcasted dimensions. (Frederic)
* TensorVariable.zeros_like() and SparseVariable.zeros_like()
* theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.device_properties() (Frederic)
* theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info() return free and total gpu memory (Frederic)
* Theano flags compiledir_format. Keep the same default as before: compiledir_%(platform)s-%(processor)s-%(python_version)s. (Josh Bleecher Snyder)
* We also support the "theano_version" substitution.
* IntDiv C code (faster and allows this elemwise to be fused with other elemwise) (Pascal)
* Internal filter_variable mechanism in Type. (Pascal, Ian)
* Ifelse works on sparse.
* It makes use of gpu shared variable more transparent with theano.function updates and givens parameter.
* Added a_tensor.transpose(axes) axes is optional (James)
* theano.tensor.transpose(a_tensor, kwargs) We were ignoring kwargs, now it is used as the axes.
* a_CudaNdarray_object[*] = int, now works (Frederic)
* tensor_variable.size (as numpy) computes the product of the shape elements. (Olivier)
* sparse_variable.size (as scipy) computes the number of stored values. (Olivier)
* sparse_variable[N, N] now works (Li Yao, Frederic)
* sparse_variable[M:N, O:P] now works (Li Yao, Frederic, Pascal)
M, N, O, and P can be Python int or scalar tensor variables, None, or
omitted (sparse_variable[:, :M] or sparse_variable[:M, N:] work).
* tensor.tensordot can now be moved to GPU (Sander Dieleman,
Pascal, based on code from Tijmen Tieleman's gnumpy,
http://www.cs.toronto.edu/~tijmen/gnumpy.html)
* Many infer_shape implemented on sparse matrices op. (David W.F.)
* Added theano.sparse.verify_grad_sparse to easily allow testing grad of
sparse op. It supports testing the full and structured gradients.
* The keys in our cache now store the hash of constants and not the constant values
themselves. This is significantly more efficient for big constant arrays. (Frederic B.)
* 'theano-cache list' lists key files bigger than 1M (Frederic B.)
* 'theano-cache list' prints an histogram of the number of keys per compiled module (Frederic B.)
* 'theano-cache list' prints the number of compiled modules per op class (Frederic B.)
* The Theano flag "nvcc.fastmath" is now also used for the cuda_ndarray.cu file.
* Add the header_dirs to the hard part of the compilation key. This is
currently used only by cuda, but if we use libraries that are only headers,
this can be useful. (Frederic B.)
* The Theano flag "nvcc.flags" is now included in the hard part of the key.
This means that now we recompile all modules for each value of "nvcc.flags".
A change in "nvcc.flags" used to be ignored for modules that were already
compiled. (Frederic B.)
* Alloc, GpuAlloc are not always pre-computed (constant_folding optimization)
at compile time if all their inputs are constant.
(Frederic B., Pascal L., reported by Sander Dieleman)
* New Op tensor.sort(), wrapping numpy.sort (Hani Almousli)
New optimizations:
* AdvancedSubtensor1 reuses preallocated memory if available (scan, c|py_nogc linker) (Frederic)
* dot22, dot22scalar work with complex. (Frederic)
* Generate Gemv/Gemm more often. (James)
* Remove scan when all computations can be moved outside the loop. (Razvan)
* scan optimization done earlier. This allows other optimizations to be applied. (Frederic, Guillaume, Razvan)
* exp(x) * sigmoid(-x) is now correctly optimized to the more stable form sigmoid(x). (Olivier)
* Added Subtensor(Rebroadcast(x)) => Rebroadcast(Subtensor(x)) optimization. (Guillaume)
* Made the optimization process faster. (James)
* Allow fusion of elemwise when the scalar op needs support code. (James)
* Better opt that lifts transpose around dot. (James)
Crashes fixed:
* T.mean crash at graph building time. (Ian)
* "Interactive debugger" crash fix. (Ian, Frederic)
* Do not call gemm with strides 0, some blas refuse it. (Pascal Lamblin)
* Optimization crash with gemm and complex. (Frederic)
* GPU crash with elemwise. (Frederic, some reported by Chris Currivan)
* Compilation crash with amdlibm and the GPU. (Frederic)
* IfElse crash. (Frederic)
* Execution crash fix in AdvancedSubtensor1 on 32 bit computers. (Pascal)
* GPU compilation crash on MacOS X. (Olivier)
* Support for OSX Enthought Python Distribution 7.x. (Graham Taylor, Olivier)
* When the subtensor inputs had 0 dimensions and the outputs 0 dimensions. (Frederic)
* Crash when the step to subtensor was not 1 in conjunction with some optimization. (Frederic, reported by Olivier Chapelle)
* Runtime crash related to an optimization with subtensor of alloc (reported by Razvan, fixed by Frederic)
* Fix dot22scalar cast of integer scalars (Justin Bayer, Frédéric, Olivier)
* Fix runtime crash in gemm, dot22. FB
* Fix on 32 bit computer: make sure all shapes are int64. (Olivier)
* Fix to deque on python 2.4 (Olivier)
* Fix crash when not using C code (or using DebugMode) (not used by
default) with numpy 1.6*. Numpy has a bug in the reduction code that
made it crash. (Pascal)
* Crashes of blas functions (Gemv on CPU; Ger, Gemv and Gemm on GPU)
when matrices had non-unit stride in both dimensions (CPU and GPU),
or when matrices had negative strides (GPU only). In those cases,
we are now making copies. (Pascal)
* More cases supported in AdvancedIncSubtensor1. (Olivier D.)
* Fix crash when a broadcasted constant was used as input of an
elemwise Op and needed to be upcasted to match the op's output.
(Reported by John Salvatier, fixed by Pascal L.)
* Fixed a memory leak with shared variable (we kept a pointer to the original value) (Ian G.)
Known bugs:
* CAReduce with nan in inputs don't return the good output (`Ticket <https://www.assembla.com/spaces/theano/tickets/763>`_).
* This is used in tensor.{max,mean,prod,sum} and in the grad of PermuteRowElements.
Sandbox:
* cvm interface more consistent with current linker. (James)
* Now all tests pass with the linker=cvm flags.
* vm linker has a callback parameter. (James)
* review/finish/doc: diag/extract_diag. (Arnaud Bergeron, Frederic, Olivier)
* review/finish/doc: AllocDiag/diag. (Arnaud, Frederic, Guillaume)
* review/finish/doc: MatrixInverse, matrix_inverse. (Razvan)
* review/finish/doc: matrix_dot. (Razvan)
* review/finish/doc: det (determinent) op. (Philippe Hamel)
* review/finish/doc: Cholesky determinent op. (David)
* review/finish/doc: ensure_sorted_indices. (Li Yao)
* review/finish/doc: spectral_radius_boud. (Xavier Glorot)
* review/finish/doc: sparse sum. (Valentin Bisson)
* review/finish/doc: Remove0 (Valentin)
* review/finish/doc: SquareDiagonal (Eric)
Sandbox New features (not enabled by default):
* CURAND_RandomStreams for uniform and normal (not picklable, GPU only) (James)
* New sandbox.linalg.ops.pinv(pseudo-inverse) op (Razvan)
Documentation:
* Many updates. (Many people)
* Updates to install doc on MacOS. (Olivier)
* Updates to install doc on Windows. (David, Olivier)
* Doc on the Rop function (Ian)
* Added how to use scan to loop with a condition as the number of iteration. (Razvan)
* Added how to wrap in Theano an existing python function (in numpy, scipy, ...). (Frederic)
* Refactored GPU installation of Theano. (Olivier)
Others:
* Better error messages in many places. (Many people)
* PEP8 fixes. (Many people)
* Add a warning about numpy bug when using advanced indexing on a
tensor with more than 2**32 elements (the resulting array is not
correctly filled and ends with zeros). (Pascal, reported by David WF)
* Added Scalar.ndim=0 and ScalarSharedVariable.ndim=0 (simplify code) (Razvan)
* New min_informative_str() function to print graph. (Ian)
* Fix catching of exception. (Sometimes we used to catch interrupts) (Frederic, David, Ian, Olivier)
* Better support for utf string. (David)
* Fix pydotprint with a function compiled with a ProfileMode (Frederic)
* Was broken with change to the profiler.
* Warning when people have old cache entries. (Olivier)
* More tests for join on the GPU and CPU. (Frederic)
* Do not request to load the GPU module by default in scan module. (Razvan)
* Fixed some import problems. (Frederic and others)
* Filtering update. (James)
* On Windows, the default compiledir changed to be local to the
computer/user and not transferred with roaming profile. (Sebastian
Urban)
* New theano flag "on_shape_error". Defaults to "warn" (same as previous behavior):
it prints a warning when an error occurs when inferring the shape of some apply node.
The other accepted value is "raise" to raise an error when this happens. (Frederic)
* The buidbot now raises optimization/shape errors instead of just printing a warning. (Frederic)
* better pycuda tests (Frederic)
* check_blas.py now accepts the shape and the number of iterations as parameter (Frederic)
* Fix opt warning when the opt ShapeOpt is disabled (enabled by default) (Frederic)
* More internal verification on what each op.infer_shape return. (Frederic, James)
* Improved docstring and basic tests for the Tile Op (David).
Reviewers (alphabetical order):
* David, Frederic, Ian, James, Olivier, Razvan
Theano 0.4.1 (12 August 2011)
=============================
New features:
* `R_op <http://deeplearning.net/software/theano/tutorial/gradients.html>`_ macro like theano.tensor.grad
* Not all tests are done yet (TODO)
* Added alias theano.tensor.bitwise_{and,or,xor,not}. They are the numpy names.
* Updates returned by Scan (you need to pass them to the theano.function) are now a new Updates class.
That allow more check and easier work with them. The Updates class is a subclass of dict
* Scan can now work in a "do while" loop style.
* We scan until a condition is met.
* There is a minimum of 1 iteration(can't do "while do" style loop)
* The "Interactive Debugger" (compute_test_value theano flags)
* Now should work with all ops (even the one with only C code)
* In the past some errors were caught and re-raised as unrelated errors (ShapeMismatch replaced with NotImplemented). We don't do that anymore.
* The new Op.make_thunk function(introduced in 0.4.0) is now used by constant_folding and DebugMode
* Added A_TENSOR_VARIABLE.astype() as a way to cast. NumPy allows this syntax.
* New BLAS GER implementation.
* Insert GEMV more frequently.
* Added new ifelse(scalar condition, rval_if_true, rval_if_false) Op.
* This is a subset of the elemwise switch (tensor condition, rval_if_true, rval_if_false).
* With the new feature in the sandbox, only one of rval_if_true or rval_if_false will be evaluated.
Optimizations:
* Subtensor has C code
* {Inc,Set}Subtensor has C code
* ScalarFromTensor has C code
* dot(zeros,x) and dot(x,zeros)
* IncSubtensor(x, zeros, idx) -> x
* SetSubtensor(x, x[idx], idx) -> x (when x is a constant)
* subtensor(alloc,...) -> alloc
* Many new scan optimization
* Lower scan execution overhead with a Cython implementation
* Removed scan double compilation (by using the new Op.make_thunk mechanism)
* Certain computations from the inner graph are now Pushed out into the outer
graph. This means they are not re-comptued at every step of scan.
* Different scan ops get merged now into a single op (if possible), reducing
the overhead and sharing computations between the two instances
GPU:
* PyCUDA/CUDAMat/Gnumpy/Theano bridge and `documentation <http://deeplearning.net/software/theano/tutorial/gpu_data_convert.html>`_.
* New function to easily convert pycuda GPUArray object to and from CudaNdarray object
* Fixed a bug if you crated a view of a manually created CudaNdarray that are view of GPUArray.
* Removed a warning when nvcc is not available and the user did not requested it.
* renamed config option cuda.nvccflags -> nvcc.flags
* Allow GpuSoftmax and GpuSoftmaxWithBias to work with bigger input.
Bugs fixed:
* In one case an AdvancedSubtensor1 could be converted to a GpuAdvancedIncSubtensor1 insted of GpuAdvancedSubtensor1.
It probably didn't happen due to the order of optimizations, but that order is not guaranteed to be the same on all computers.
* Derivative of set_subtensor was wrong.
* Derivative of Alloc was wrong.
Crash fixed:
* On an unusual Python 2.4.4 on Windows
* When using a C cache copied from another location
* On Windows 32 bits when setting a complex64 to 0.
* Compilation crash with CUDA 4
* When wanting to copy the compilation cache from a computer to another
* This can be useful for using Theano on a computer without a compiler.
* GPU:
* Compilation crash fixed under Ubuntu 11.04
* Compilation crash fixed with CUDA 4.0
Know bug:
* CAReduce with nan in inputs don't return the good output (`Ticket <http://trac-hg.assembla.com/theano/ticket/763>`_).
* This is used in tensor.{max,mean,prod,sum} and in the grad of PermuteRowElements.
* This is not a new bug, just a bug discovered since the last release that we didn't had time to fix.
Deprecation (will be removed in Theano 0.5, warning generated if you use them):
* The string mode (accepted only by theano.function()) FAST_RUN_NOGC. Use Mode(linker='c|py_nogc') instead.
* The string mode (accepted only by theano.function()) STABILIZE. Use Mode(optimizer='stabilize') instead.
* scan interface change:
* The use of `return_steps` for specifying how many entries of the output
scan has been deprecated
* The same thing can be done by applying a subtensor on the output
return by scan to select a certain slice
* The inner function (that scan receives) should return its outputs and
updates following this order:
[outputs], [updates], [condition]. One can skip any of the three if not
used, but the order has to stay unchanged.
* tensor.grad(cost, wrt) will return an object of the "same type" as wrt
(list/tuple/TensorVariable).
* Currently tensor.grad return a type list when the wrt is a list/tuple of
more than 1 element.
Decrecated in 0.4.0(Reminder, warning generated if you use them):
* Dividing integers with / is deprecated: use // for integer division, or
cast one of the integers to a float type if you want a float result (you may
also change this behavior with config.int_division).
* tag.shape attribute deprecated (#633)
* CudaNdarray_new_null is deprecated in favour of CudaNdarray_New
Sandbox:
* MRG random generator now implements the same casting behavior as the regular random generator.
Sandbox New features(not enabled by default):
* New Linkers (theano flags linker={vm,cvm})
* The new linker allows lazy evaluation of the new ifelse op, meaning we compute only the true or false branch depending of the condition. This can speed up some types of computation.
* Uses a new profiling system (that currently tracks less stuff)
* The cvm is implemented in C, so it lowers Theano's overhead.
* The vm is implemented in python. So it can help debugging in some cases.
* In the future, the default will be the cvm.
* Some new not yet well tested sparse ops: theano.sparse.sandbox.{SpSum, Diag, SquareDiagonal, ColScaleCSC, RowScaleCSC, Remove0, EnsureSortedIndices, ConvolutionIndices}
Documentation:
* How to compute the `Jacobian, Hessian, Jacobian times a vector, Hessian times a vector <http://deeplearning.net/software/theano/tutorial/gradients.html>`_.
* Slide for a 3 hours class with exercises that was done at the HPCS2011 Conference in Montreal.
Others:
* Logger name renamed to be consistent.
* Logger function simplified and made more consistent.
* Fixed transformation of error by other not related error with the compute_test_value Theano flag.
* Compilation cache enhancements.
* Made compatible with NumPy 1.6 and SciPy 0.9
* Fix tests when there was new dtype in NumPy that is not supported by Theano.
* Fixed some tests when SciPy is not available.
* Don't compile anything when Theano is imported. Compile support code when we compile the first C code.
* Python 2.4 fix:
* Fix the file theano/misc/check_blas.py
* For python 2.4.4 on Windows, replaced float("inf") with numpy.inf.
* Removes useless inputs to a scan node
* Beautification mostly, making the graph more visible. Such inputs would appear as a consequence of other optimizations
Core:
* there is a new mechanism that lets an Op permit that one of its
inputs to be aliased to another destroyed input. This will generally
result in incorrect calculation, so it should be used with care! The
right way to use it is when the caller can guarantee that even if
these two inputs look aliased, they actually will never overlap. This
mechanism can be used, for example, by a new alternative approach to
implementing Scan. If an op has an attribute called
"destroyhandler_tolerate_aliased" then this is what's going on.
IncSubtensor is thus far the only Op to use this mechanism.Mechanism
Theano 0.4.0 (2011-06-13)
=========================
Change in output memory storage for Ops:
If you implemented custom Ops, with either C or Python implementation,
this will concern you.
The contract for memory storage of Ops has been changed. In particular,
it is no longer guaranteed that output memory buffers are either empty,
or allocated by a previous execution of the same Op.
Right now, here is the situation:
* For Python implementation (perform), what is inside output_storage
may have been allocated from outside the perform() function, for
instance by another node (e.g., Scan) or the Mode. If that was the
case, the memory can be assumed to be C-contiguous (for the moment).
* For C implementations (c_code), nothing has changed yet.
In a future version, the content of the output storage, both for Python and C
versions, will either be NULL, or have the following guarantees:
* It will be a Python object of the appropriate Type (for a Tensor variable,
a numpy.ndarray, for a GPU variable, a CudaNdarray, for instance)
* It will have the correct number of dimensions, and correct dtype
However, its shape and memory layout (strides) will not be guaranteed.
When that change is made, the config flag DebugMode.check_preallocated_output
will help you find implementations that are not up-to-date.
Deprecation:
* tag.shape attribute deprecated (#633)
* CudaNdarray_new_null is deprecated in favour of CudaNdarray_New
* Dividing integers with / is deprecated: use // for integer division, or
cast one of the integers to a float type if you want a float result (you may
also change this behavior with config.int_division).
* Removed (already deprecated) sandbox/compile module
* Removed (already deprecated) incsubtensor and setsubtensor functions,
inc_subtensor and set_subtensor are to be used instead.
Bugs fixed:
* In CudaNdarray.__{iadd,idiv}__, when it is not implemented, return the error.
* THEANO_FLAGS='optimizer=None' now works as expected
* Fixed memory leak in error handling on GPU-to-host copy
* Fix relating specifically to Python 2.7 on Mac OS X
* infer_shape can now handle Python longs
* Trying to compute x % y with one or more arguments being complex now
raises an error.
* The output of random samples computed with uniform(..., dtype=...) is
guaranteed to be of the specified dtype instead of potentially being of a
higher-precision dtype.
* The perform() method of DownsampleFactorMax did not give the right result
when reusing output storage. This happen only if you use the Theano flags
'linker=c|py_nogc' or manually specify the mode to be 'c|py_nogc'.
Crash fixed:
* Work around a bug in gcc 4.3.0 that make the compilation of 2d convolution
crash.
* Some optimizations crashed when the "ShapeOpt" optimization was disabled.
Optimization:
* Optimize all subtensor followed by subtensor.
GPU:
* Move to the gpu fused elemwise that have other dtype then float32 in them
(except float64) if the input and output are float32.
* This allow to move elemwise comparisons to the GPU if we cast it to
float32 after that.
* Implemented CudaNdarray.ndim to have the same interface in ndarray.
* Fixed slowdown caused by multiple chained views on CudaNdarray objects
* CudaNdarray_alloc_contiguous changed so as to never try to free
memory on a view: new "base" property
* Safer decref behaviour in CudaNdarray in case of failed allocations
* New GPU implementation of tensor.basic.outer
* Multinomial random variates now available on GPU
New features:
* ProfileMode
* profile the scan overhead
* simple hook system to add profiler
* reordered the output to be in the order of more general to more specific
* DebugMode now checks Ops with different patterns of preallocated memory,
configured by config.DebugMode.check_preallocated_output.
* var[vector of index] now work, (grad work recursively, the direct grad
work inplace, gpu work)
* limitation: work only of the outer most dimensions.
* New way to test the graph as we build it. Allow to easily find the source
of shape mismatch error:
`http://deeplearning.net/software/theano/tutorial/debug_faq.html#interactive-debugger`__
* cuda.root inferred if nvcc is on the path, otherwise defaults to
/usr/local/cuda
* Better graph printing for graphs involving a scan subgraph
* Casting behavior can be controlled through config.cast_policy,
new (experimental) mode.
* Smarter C module cache, avoiding erroneous usage of the wrong C
implementation when some options change, and avoiding recompiling the
same module multiple times in some situations.
* The "theano-cache clear" command now clears the cache more thoroughly.
* More extensive linear algebra ops (CPU only) that wrap scipy.linalg
now available in the sandbox.
* CUDA devices 4 - 16 should now be available if present.
* infer_shape support for the View op, better infer_shape support in Scan
* infer_shape supported in all case of subtensor
* tensor.grad now gives an error by default when computing the gradient
wrt a node that is disconnected from the cost (not in the graph, or
no continuous path from that op to the cost).
* New tensor.isnan and isinf functions.
Documentation:
* Better commenting of cuda_ndarray.cu
* Fixes in the scan documentation: add missing declarations/print statements
* Better error message on failed __getitem__
* Updated documentation on profile mode
* Better documentation of testing on Windows
* Better documentation of the 'run_individual_tests' script
Unit tests:
* More strict float comparaison by default
* Reuse test for subtensor of tensor for gpu tensor(more gpu test)
* Tests that check for aliased function inputs and assure appropriate copying
(#374)
* Better test of copies in CudaNdarray
* New tests relating to the new base pointer requirements
* Better scripts to run tests individually or in batches
* Some tests are now run whenever cuda is available and not just when it has
been enabled before
* Tests display less pointless warnings.
Other:
* Correctly put the broadcast flag to True in the output var of
a Reshape op when we receive an int 1 in the new shape.
* pydotprint: high contrast mode is now the default, option to print
more compact node names.
* pydotprint: How trunk label that are too long.
* More compact printing (ignore leading "Composite" in op names)
Theano 0.3.1 (2011-02-21)
=========================
Deprecation:
* The theano shared variable attribute `value` is deprecated, use `get_value()` or `set_value()`!
See http://deeplearning.net/software/theano/tutorial/aliasing.html
Bugs fixed:
* The random number generator in theano/sandbox/rng_mrg.py did not always return the same sequence of number on the CPU and GPU.
* In some cases, there was a (possibly large) fraction of non-random garbage in the returned sequence.
* In python mode (not the default mode) when input of elemwise operation was an empty ndarray, we were not returning an empty ndarray.
* Scan cached the number of steps. This caused no problem because each time you called scan the number of steps would got refreshed.
The problem was when you called ScanGrad which would use the cached number of steps without refreshing it.
To be affected by this bug, one would have to compile two graph, one that would contain a Scan and the other the corresponding GradScan, and
call the first function to cache the number of steps, and then call the second function with a different number of steps.
* In GpuConv, errors in conv_patch_stack_reduce when the entire kernel doesn't fit into shared memory.
The error was not found before as the impact was less then the relative tolerance of 1e-3. Now the relative tolerance is 1e-5.
Crash fixed:
* Add a feature to not have an exception that makes Theano crash when taking the gradient on DimShuffle in some particular case.
* Compilation crash for GpuElemwise with tensor with high number of dimensions (~6 or more).
* Disabled C code generator that make gcc crash on complex type.
* Crash in optimization when an Op has no input.
* Output shape is now computed correctly for matrix-vector multiplication on GPU.
* In Scan, when using numbers as inputs, not symbolic variables.
* In GradScan, when there is only 1 inputs in the Scan.
* In GpuSum, bug in calculation of n_blocks for the 10 pattern. (Sum on the row of a matrix)
* Some segfault at exit with GPU code.
Optimization:
* New SpecifyShape op that allow to pass more shape info in the graph.
* Speed up gemv by a work around scipy gemv slowness when the matrix is in C order (the default).
* Remove join of only 1 element.
* During optimization, consider one more case in get_constant_value.
GPU:
* cuda_shared.value = X now works inplace!
* cuda_shared_var.set_value(new_ndarray) will overwrite the old value inplace in the most common case.
* Allow to create a CudaNdarraySharedVariable from a CudaNdarray.
* New init_gpu_device theano flags.
* Fuse GpuElemwise more often (in the case where there are so many inputs that fusing them all would bust the 256 bytes limit of parameter to gpu function).
* CPU join of only 1 element that was not moved to the GPU.
New features:
* tensor.reshape now makes dimensions of length 1 broadcastable.
* tensor.prod now implements the gradient.
* DebugMode now warns if an Op declared itself as returning a view of the input but did not do so.
* This behaviour is a problem, because it can block other Ops from being inplace on the same inputs. This could lower the reuse of memory.
* Sparse.structured_dot now works when both matrices are sparse
* Sparse type is now supported by the shape op, and the ShapeFeature optimizer works correctly with them.
* New 3D convolution ops, with CPU and GPU implementations.
* New colors in pydotprint.
Documentation:
* Documented lib.amdlibm and (new) init_gpu_device config variables.
* A new page (was done for 0.3 but an error was hiding it on the web page) on the memory aliasing contract of Theano.
* Revision to the Windows installation instructions.
* The cuda documentation is now generated on the web server.
* Better documentation of .theanorc and its sections.
Unit tests:
* Stop usage of deprecated functions or syntax in the unit tests.
* Better testing of GPU convolution nets.
* Make more tests able to use different random seeds.
* Tests of sparse now use default mode, not a hard-coded one.
* Remove some tests of unimplemented features.
Other:
* The name of compiledir now includes the Python version to make it easier for people with many Python versions
* Added theano.tensor.std as a shortcut to sqrt(var(input=input, axis=axis)).
* Whitespace, tabulation and indentation clean-up in the code.
* Better detection of memory sharing between variables.
Theano 0.3 (2010-11-23)
=======================
This is the first major release of Theano since 0.1. Version 0.2 development started internally but it was never advertised as a release.
There have been so many changes since 0.1 that we have lost track of many of them. Below is a *partial* list of changes since 0.1.
* GPU code using NVIDIA's CUDA framework is now generated for many Ops.
* Some interface changes since 0.1:
* A new "shared variable" system to allow reusing memory space between Theano functions.
* A new memory contract has been formally written for Theano, for people who want to minimize memory copies.
* The old module system has been deprecated.
* By default, inputs to a Theano function will not be silently downcasted (e.g. from float64 to float32).
* An error is now raised when using the result of logical operation on Theano variable in an 'if' (i.e. an implicit call to __nonzeros__).
* An error is now raised when we receive a non-aligned ndarray as input to a function (this is not supported).
* An error is raised when the list of dimensions passed to dimshuffle() contains duplicates or is otherwise not sensible.
* Call NumPy BLAS bindings for gemv operations in addition to the already supported gemm.
* If gcc is unavailable at import time, Theano now falls back to a Python-based emulation mode after raising a warning.
* An error is now raised when tensor.grad is called on a non-scalar Theano variable (in the past we would implicitly do a sum on the tensor to make it a scalar).
* Added support for "erf" and "erfc" functions.
* The current default value of the parameter axis of theano.{max,min,argmax,argmin,max_and_argmax} is deprecated. We now use the default NumPy behavior of operating on the entire tensor.
* Theano is now available from PyPI and installable through "easy_install" or "pip".
Theano 0.1
==========
*Release date: 2009-04-02*
What works
----------
- building symbolic expression.
- arranging symbolic expressions into Modules so that multiple functions
can work on the same data.
- symbolic gradient descent.
- graph optimization.
- compilation to C for many kinds of expression.
- a debugging mode that checks that your expression results are correct,
using a variety of sanity checks.
What's missing?
---------------
- An algorithm library. We're missing a library of examples and standard
component implementations. Some examples will find their way into
the Theano repo, but standard algorithms will go into the 'pylearn'
project (toolbox style). Now that we have a stable foundation, we
can reach a consensus on style for algorithms.