-
Notifications
You must be signed in to change notification settings - Fork 6
/
Copy path17_adaptive_merge_sort.html
336 lines (287 loc) · 13.3 KB
/
17_adaptive_merge_sort.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width">
<title>17. Adaptive merge sort</title>
<link rel="stylesheet" href="template/style.css">
</head>
<body>
<span style="float: right">
[
<a href="16_optimizing_stable_sort.html">previous</a>,
<a href="index.html">contents</a>,
<a href="18_binary_insertion_sort.html">next</a>
]
</span>
<a name="L17.-Adaptive-merge-sort"></a>
<h1>17. Adaptive merge sort</h1>
<a name="L-22-temporary-22--buffers-in-STL"></a>
<h2>“temporary” buffers in STL</h2>
<p>When I was implementing STL
I said that the OS vendor should know how much extra storage is available.
So, I’m going to have this function called <code>get_temporary_buffer</code>
which allocates as much physical memory is available at this moment.
It’s the only outside hook which STL will use.
But, it is vendor specific, you cannot do it as a client.
There is actually no call in UNIX which tells you how much physical memory
you have, how much is used, it’s just impossible.
But, I needed to ship it, and in order to do that, I couldn’t just require them to add a hook.
So I wrote the following thing:</p>
<pre><code>// ask for n, system gives you as much as it can, but not more than n.
std::pair<void*, size_t> get_temporary_buffer(size_t n) {
// this is bogus code and needs to be replaced
void* buffer = malloc(n);
while (!buffer && n) {
n = n >> 1;
buffer = malloc(n);
}
return std::make_pair(buffer, n);
}
</code></pre>
<p>So it binary searches for a buffer small enough to fit<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>.
Is it a useful piece of code? No.
But, I had to ship.
Guess what happened after that.
You might think they didn’t change it.
You’re wrong, they did change it.
They removed my comment.</p>
<p>Every implementation, UNIX, Microsoft, Apple,
does the same binary search.
I have been telling this story for decades, nothing.
Therefore, my function always returns whatever you ask, because
it just allocates <a href="https://en.wikipedia.org/wiki/Virtual_memory">virtual memory</a><sup id="fnref:2"><a href="#fn:2" rel="footnote">2</a></sup>.
It used to be a problem when we had 16-bit address spaces.
But, we have 64 bit address spaces.</p>
<p>If it was correct, I think it would be useful.
You want as much physical memory as the system has.
There is virtual memory but virtual memory is actually useless unless it’s backed by physical memory.
It’s useful for remapping things<sup id="fnref:3"><a href="#fn:3" rel="footnote">3</a></sup>.
But, it is a figment of imagination.
It does not exist.
As <a href="https://en.wikipedia.org/wiki/Seymour_Cray">Seymour Cray</a> used to say, “you can’t simulate what you do not have”<sup id="fnref:4"><a href="#fn:4" rel="footnote">4</a></sup>.
If your algorithm working set doesn’t fit into physical memory,
it will not just <a href="https://en.wikipedia.org/wiki/Thrashing_(computer_science)">thrash</a>, your program will not terminate,
because your memory starts working at the speed of a disk.
That’s not good enough.</p>
<p>It just shows you how imperfect life is.</p>
<a name="Merge-with-buffer"></a>
<h2>Merge with buffer</h2>
<p>The first thing we are going
to write is <code>merge_with_buffer</code>.
Let’s assume that this buffer is big enough.
Eventually we will have to figure out how to deal with limited buffers.
Right, now let’s assume whoever is going to call is going to assure it.
What we do is copy from the first range into our buffer,
then we merge back into the original buffer.</p>
<p>For now we can rely on <a href="https://en.cppreference.com/w/cpp/algorithm/merge"><code>std::merge</code></a>,
but later we will write a better one.</p>
<pre><code>template <typename I, typename R, typename B>
// requires I is ForwardIterator
// requires R is StrictWeakOrdering
// requires B is ForwardIterator
void merge_with_buffer(I first, I middle, I last, R r, B buffer) {
B buffer_last = std::copy(first, middle, buffer);
std::merge(buffer, buffer_last, middle, last, first, r);
}
</code></pre>
<p>Even though we aren’t worried about it now, we can see the buffer
will need to be big enough to copy the entire left half in,
so about size <code>n/2</code>.</p>
<p>Note that the buffer doesn’t have to match the type of the container.
We will probably use an array for buffer,
but <code>I</code> could be an iterator for a linked list.
This is a general principle, relax type requirements.</p>
<p>Let’s grab our in-place sort from before and modify it.
It’s identical to our other,
except it uses our new <code>merge_with_buffer</code>.
Should this function allocate the buffer?
No because it is recursive.</p>
<pre><code>template <typename I, typename N, typename R, typename B>
// I is ForwardIterator
// N is Integral
// R is WeakStrictOrdering on the value type of I
I sort_inplace_n_with_buffer(I first, N n, R r, B buffer) {
if (!n) return first;
N half = n >> 1;
if (!half) return ++first;
I middle = sort_inplace_n_with_buffer(first, half, r, buffer);
I last = sort_inplace_n_with_buffer(middle, n - half, r, buffer);
merge_with_buffer(first, middle, last, r, buffer);
return last;
}
</code></pre>
<p>Note we put the buffer argument at the end, because
we are extending the interface of the previous sort.</p>
<p>Now to use it in our framework we need a more convenient interface.
We have too many parameters, so we need to somehow get rid of all of them.
We write a wrapper.</p>
<pre><code>template <typename I>
// I is ForwardIterator
inline
void sort_inplace_with_buffer(I first, I last) {
typedef typename std::iterator_traits<I>::value_type T;
typedef typename std::iterator_traits<I>::difference_type N;
N n = std::distance(first, last);
std::vector<T> buffer(n >> 1);
sort_inplace_n_with_buffer(first, n, std::less<T>(), buffer.begin());
}
</code></pre>
<a name="Performance-test"></a>
<h3>Performance test</h3>
<p>Let’s give it a try in the same timing framework we had before<sup id="fnref:5"><a href="#fn:5" rel="footnote">5</a></sup>.</p>
<pre><code>Sorting double from 8 up to 2097152 elements generated with iota at: Mon Jun 28 21:02:41 2021
size stable inplace with_buffer
8 14 20 16
16 8 28 19
32 8 39 19
64 10 48 22
128 10 59 25
256 13 78 29
512 17 107 36
1024 27 136 39
2048 34 147 49
4096 45 172 59
8192 54 192 67
16384 62 209 75
32768 67 227 81
65536 72 245 86
131072 76 263 92
262144 82 282 97
524288 88 303 104
1048576 94 323 109
2097152 98 343 115
</code></pre>
<p>Look at how fast that is.
It’s already within 10% of our goal.
It shows us the spectrum of what’s possible.</p>
<a name="Adaptive-merge"></a>
<h2>Adaptive merge</h2>
<p>We play the copy and paste game with our work from before.
The function <code>merge_inplace_left_subproblem</code>
and the right variant, do not need to be changed,
so they can be included.</p>
<pre><code>template <typename I, typename N, typename R, typename B>
// I is ForwardIterator
// N is Integral
// R is WeakStrictOrdering on the value type of I
void merge_adaptive_n(I f0, N n0,
I f1, N n1, R r, B buffer, N buffer_size) {
// precondition std::distance(f0, f1) == n0
// precondition is_sorted_n(f0, n0, r) && is_sorted(f1, n1, r)
if (!n0 || !n1) return;
if (n0 <= buffer_size) {
I last = f1;
std::advance(last, n1);
merge_with_buffer(f0, f1, last, r, buffer);
return;
}
I f0_0, f0_1, f1_0, f1_1;
N n0_0, n0_1, n1_0, n1_1;
if (n0 < n1) merge_inplace_left_subproblem(f0, n0,
f1, n1,
f0_0, n0_0,
f0_1, n0_1,
f1_0, n1_0,
f1_1, n1_1,
r);
else merge_inplace_right_subproblem(f0, n0,
f1, n1,
f0_0, n0_0,
f0_1, n0_1,
f1_0, n1_0,
f1_1, n1_1,
r);
merge_adaptive_n(f0_0, n0_0, f0_1, n0_1, r, buffer, buffer_size);
merge_adaptive_n(f1_0, n1_0, f1_1, n1_1, r, buffer, buffer_size);
}
</code></pre>
<p>Now we know the drill to turn this into sort.
I will just show the sort so we can see the buffer allocation:</p>
<pre><code>template <typename I>
// I is ForwardIterator
inline
void sort_1_8(I first, I last) {
typedef typename std::iterator_traits<I>::value_type T;
typedef typename std::iterator_traits<I>::difference_type N;
N n = std::distance(first, last);
std::vector<T> buffer(n >> 3);
sort_adaptive_n(first, n, std::less<T>(), buffer.begin(), N(buffer.size()));
}
</code></pre>
<a name="Performance-test"></a>
<h3>Performance test</h3>
<p>How does this one do?
Worse than before,
but we are also using about 10x less memory.</p>
<pre><code>Sorting double from 8 up to 2097152 elements generated with iota at: Mon Jun 28 21:32:08 2021
size stable inplace merging 1_8th
8 13 19 13 20
16 7 24 12 23
32 7 38 17 27
64 9 44 17 25
128 9 54 21 27
256 12 72 25 29
512 15 99 29 34
1024 25 129 37 41
2048 36 158 48 51
4096 46 181 60 62
8192 56 209 71 73
16384 64 225 80 83
32768 93 240 84 88
65536 74 261 90 92
131072 78 280 97 108
262144 96 314 101 104
524288 91 322 107 110
1048576 99 343 113 116
2097152 102 403 120 123
</code></pre>
<a name="Code"></a>
<h2>Code</h2>
<ul>
<li><a href="code/merge.h">merge.h</a></li>
<li><a href="code/test_sort.cpp">test_sort.cpp</a></li>
<li><a href="code/test_sort.h">test_sort.h</a></li>
<li><a href="code/test_temp_buffer.cpp">test_temp_buffer.cpp</a></li>
</ul>
<div class="footnotes">
<hr/>
<ol>
<li id="fn:1">
<code>malloc</code> returns <code>NULL</code> when it fails to allocate of the requested size.
Alex’s <code>get_temporary_buffer</code> function uses that as an indicator that the requested buffer was too large
and continues attempting smaller and smaller buffers.<a href="#fnref:1" rev="footnote">↩</a></li>
<li id="fn:2">
<p>Virtual memory allows programs to allocate more memory than is physically available
by saving and loading portions of memory to disk as needed.
When memory is fully utilized the system starts working slower rather than simply crashing.</p>
<p>Even though the total amount of virtual memory available on a system is very large,
individual memory allocations are typically limited.
For example, when testing this code on Linux, the system only
allows a program to allocate a buffer up to the total physical memory size.</p>
<p>What this means is that Alex’s implementation of <code>get_temporary_buffer</code> is not useful.
It is equivalent to <code>malloc(n)</code> for anything but extremely large allocations.</p>
<p><strong>Exercise:</strong> Experiment with <code>get_temporary_buffer</code> on your machine. How large of an allocation will it give you?</p><a href="#fnref:2" rev="footnote">↩</a></li>
<li id="fn:3">
<p>Memory mapping files is a very useful application of virtual memory.
When a program wants to interact with a file on disk it can instead request that
the system map it to a range in memory.
The file can then be manipulated by reading and writing to pointers as if it was a buffer instead of a file.
In other words, the program can interact with the file, just like other data.
See <a href="https://man7.org/linux/man-pages/man2/mmap.2.html">mmap(2)</a> for details.</p>
<p>Alex has used memory mapped files in his own code.</p><a href="#fnref:3" rev="footnote">↩</a></li>
<li id="fn:4">
I cannot find a reference to this quotation.<a href="#fnref:4" rev="footnote">↩</a></li>
<li id="fn:5">
AMD Ryzen 5 2400G (8 core, 3.6 GHz). GCC 9.3.0<a href="#fnref:5" rev="footnote">↩</a></li>
</ol>
</div>
<span style="float: right">
[
<a href="16_optimizing_stable_sort.html">previous</a>,
<a href="index.html">contents</a>,
<a href="18_binary_insertion_sort.html">next</a>
]
</span>
</body>
</html>