-
Notifications
You must be signed in to change notification settings - Fork 6
/
Copy pathincite.go
508 lines (477 loc) · 21.8 KB
/
incite.go
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
// Copyright 2022 The incite Authors. All rights reserved.
// Use of this source code is governed by an MIT-style
// license that can be found in the LICENSE file.
package incite
import (
"io"
"time"
)
// QuerySpec specifies the parameters for a query operation either
// using the global Query function or a QueryManager's Query method.
type QuerySpec struct {
// Text contains the actual text of the CloudWatch Insights query.
//
// Text must not contain an empty or blank string. Beyond checking
// for blank text, Incite does not attempt to parse Text and simply
// forwards it to the CloudWatch Logs service. Care must be taken to
// specify query text compatible with the Chunk, Preview, and Split
// fields, or the results may be confusing.
//
// To limit the number of results returned by the query, use the
// Limit field, since the Insights API seems to ignore the `limit`
// command when specified in the query text. Note that if the
// QuerySpec specifies a chunked query, then Limit will apply to the
// results obtained from each chunk, not to the global query.
//
// To learn the Insights query syntax, please see the official
// documentation at:
// https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax.html.
Text string
// Groups lists the names of the CloudWatch Logs log groups to be
// queried. It may not be empty.
Groups []string
// Start specifies the beginning of the time range to query,
// inclusive of Start itself.
//
// Start must be strictly before End, and must represent a whole
// number of milliseconds (it cannot have sub-millisecond
// granularity).
Start time.Time
// End specifies the end of the time range to query, exclusive of
// End itself.
//
// End must be strictly after Start, and must represent a whole
// number of milliseconds (it cannot have sub-millisecond
// granularity).
End time.Time
// Limit optionally specifies the maximum number of results to be
// returned by the query.
//
// If Limit is zero or negative, the value DefaultLimit is used
// instead. If Limit exceeds MaxLimit, the query operation will fail
// with an error.
//
// In a chunked query, Limit applies to each chunk separately, so
// up to (n × Limit) final results may be returned, where n is the
// number of chunks.
//
// Note that as of 2021-07-15, the CloudWatch Logs StartQuery API
// seems to ignore the `limit` command in the query text, so if you
// want to apply a limit you must use the Limit field.
Limit int64
// Chunk optionally requests a chunked query and indicates the chunk
// size.
//
// In a chunked query, the query time range is subdivided into
// smaller time chunks of duration Chunk, and each chunk is sent to
// the CloudWatch Logs service as a separate Insights query. This
// can help large queries avoid the CloudWatch Logs Insights
// result size limit, MaxLimit; help queries complete before the
// CloudWatch Logs Insights query timeout of 15 minutes; and can
// increase performance, because multiple chunks can be run in
// parallel.
//
// If Chunk is zero, negative, or greater than the difference
// between End and Start, the query is not chunked. If Chunk is
// positive and less than the difference between End and Start, the
// query is broken into n or n+1 chunks, where n is
// (End-Start)/Chunk, rounded up to the nearest integer value. If
// Chunk is positive, it must represent a whole number of
// milliseconds (cannot have sub-millisecond granularity).
//
// For chunked queries, the following special considerations apply:
//
// • In a chunked query, Limit applies separately to each chunk. So
// a query with 50 chunks and a limit of 1,000 could produce up to
// 50,000 final results.
//
// • If Text contains a sort command, the sort will only apply
// within each individual chunk. If the QuerySpec is executed by a
// QueryManager configured with a parallelism factor above 1, then
// the results may appear to be out of order since the order of
// chunk completion is not guaranteed.
//
// • If Text contains a stats command, the statistical aggregation
// will be applied separately to each chunk in a chunked query,
// meaning up to n+1 versions of each aggregate data point may be
// returned, one per chunk, potentially necessitating further
// aggregation in your application logic.
//
// • In general if you use chunking with query text which implies
// any kind of server-side post-processing, of which sorting and
// statistical aggregation are two examples, you may need to perform
// custom post-processing within your application to put the results
// into the final form you expect.
Chunk time.Duration
// Preview optionally requests preview results from a running query.
//
// If Preview is true, intermediate results for the query are
// sent to the result Stream as soon as they are available. This
// can improve your application's responsiveness for the end user
// but requires care since not all intermediate results are valid
// members of the final result set.
//
// When Preview is true, the query result Stream may produce some
// intermediate results which it later determines are invalid
// because they shouldn't be final members of the result set. For
// each such invalid result, an extra dummy Result will be sent to
// the result Stream with the following structure:
//
// incite.Result{
// { Field: "@ptr", Value: "<Unique @ptr of the earlier invalid result>" },
// { Field: "@deleted", Value: "true" },
// }
//
// The presence of the "@deleted" field can be used to identify and
// delete the earlier invalid result sharing the same "@ptr" field.
//
// If the results from CloudWatch Logs do not contain an @ptr field,
// the Preview option does not detect invalidated results and
// consequently does not create dummy @deleted items. Since the
// `stats` command creates results that do not contain @ptr,
// applications should either avoid combining Preview mode with
// `stats` or apply their own custom logic to eliminate obsolete
// intermediate results.
Preview bool
// Priority optionally allows a query operation to be given a higher
// or lower priority with regard to other query operations managed
// by the same QueryManager. This allows your application to ensure
// its higher priority work runs before lower priority work,
// making the most efficient work of finite CloudWatch Logs service
// resources.
//
// A lower number indicates a higher priority. The default zero
// value is appropriate for many cases.
//
// The Priority field may be set to any valid int value. A query
// whose Priority number is lower is allocated CloudWatch Logs
// query capacity in preference to a query whose Priority number is
// higher, but only within the same QueryManager.
Priority int
// SplitUntil specifies if, and how, the query time range, or the
// query chunks, will be dynamically split into sub-chunks when
// they produce the maximum number of results that CloudWatch Logs
// Insights will return (MaxLimit).
//
// If SplitUntil is zero or negative, then splitting is disabled.
// If positive, then splitting is enabled and SplitUntil must
// represent a whole number of milliseconds (cannot have
// sub-millisecond granularity).
//
// To use splitting, you must also set Limit to MaxLimit and
// Preview must be false.
//
// When splitting is enabled and, when a time range produces
// MaxLimit results, the range is split into sub-chunks no smaller
// than SplitUntil. If a sub-chunk produces MaxLimit results, it is
// recursively split into smaller sub-sub-chunks again no smaller
// than SplitUntil. The splitting process continues until either
// the time range cannot be split into at least two chunks no
// smaller than SplitUntil or the time range produces fewer than
// MaxLimit results.
SplitUntil time.Duration
}
// Stats records metadata about query execution. When returned from a
// Stream, Stats contains metadata about the stream's query. When
// returned from a QueryManager, Stats contains accumulated metadata
// about all queries executed by the query manager.
//
// The Stats structure contains two types of metadata.
//
// The first type of metadata in Stats are returned from the CloudWatch
// Logs Insights web service and consist of metrics about the amount of
// data scanned by the query or queries. These metadata are contained
// within the fields BytesScanned, RecordsMatched, and RecordsScanned.
//
// The second type of metadata in Stats are collected by Incite and
// consist of metrics about the size of the time range or ranges queried
// and how much progress has been on the queries. These metadata can be
// useful, for example, for showing progress bars or other work in
// progress indicators. They are contained within the fields
// RangeRequested, RangeStarted, RangeDone, RangeFailed, and
// RangeMaxed.
type Stats struct {
// BytesScanned is a metric returned by CloudWatch Logs Insights
// which represents the total number of bytes of log events
// scanned.
BytesScanned float64
// RecordsMatched is a metric returned by CloudWatch Logs Insights
// which tallies the number of log events that matched the query or
// queries.
RecordsMatched float64
// RecordsScanned is a metric returned by CloudWatch Logs Insights
// which tallies the number of log events scanned during the query
// or queries.
RecordsScanned float64
// RangeRequested is a metric collected by Incite which tallies the
// accumulated time range requested in the query or queries.
RangeRequested time.Duration
// RangeStarted is a metric collected by Incite which tallies the
// aggregate amount of query time for which the query has been
// initiated in the CloudWatch Logs Insights web service. The value
// in this field is always less than or equal to RangeRequested, and
// it never decreases.
//
// For a non-chunked query, this field is either zero or the total
// time range covered by the QuerySpec. For a chunked query, this
// field reflects the accumulated time of all the chunks which have
// been started. For a QueryManager, this field represents the
// accumulated time of all started query chunks from all queries
// submitted to the query manager.
RangeStarted time.Duration
// RangeDone is a metric collected by Incite which tallies the
// aggregate amount of query time which the CloudWatch Logs Insights
// service has successfully finished querying so far. The value in
// this field is always less than or equal to RangeStarted, and it
// never decreases.
RangeDone time.Duration
// RangeFailed is a metric collected by Incite which tallies the
// aggregate amount of query time which was started in the
// CloudWatch Logs Insights service but which ended with an error,
// either because the Stream or QueryManager was closed, or because
// the CloudWatch Logs service returned a non-retryable error.
RangeFailed time.Duration
// RangeMaxed is a metric collected by Incite which tallies the
// aggregate amount of query time which the CloudWatch Logs Insights
// service has successfully finished querying so far but for which
// Insights returned the maximum number of results requested in the
// QuerySpec Limit field, indicating that more results may be
// available than were returned. The value in this field is always
// less than or equal to RangeDone, and it never decreases.
//
// If RangeMaxed is zero, this indicates that no query chunks
// produced the maximum number of results requested.
//
// RangeMaxed will usually not be increased by queries that use
// dynamic chunk splitting. This is because chunk splitting will
// continue recursively splitting until sub-chunks do not produce
// the maximum number of results. However, when a chunk which
// produces MaxLimit results is too small to split further, its
// duration will be added to RangeMaxed.
RangeMaxed time.Duration
}
func (s *Stats) add(t *Stats) {
s.BytesScanned += t.BytesScanned
s.RecordsMatched += t.RecordsMatched
s.RecordsScanned += t.RecordsScanned
s.RangeRequested += t.RangeRequested
s.RangeStarted += t.RangeStarted
s.RangeDone += t.RangeDone
s.RangeFailed += t.RangeFailed
s.RangeMaxed += t.RangeMaxed
}
// StatsGetter provides access to the Insights query statistics
// returned by the CloudWatch Logs API.
//
// Both Stream and QueryManager contain the StatsGetter interface. Call
// the GetStats method on a Stream to get the query statistics for the
// stream's query. Call the GetStats method on a QueryManager to get the
// query statistics for all queries run within the QueryManager.
type StatsGetter interface {
GetStats() Stats
}
// QueryManager executes one or more CloudWatch Logs Insights queries,
// optionally executing simultaneous queries in parallel.
//
// QueryManager's job is to hide the complexity of the CloudWatch Logs
// Insights API, taking care of mundane details such as starting and
// polling Insights query jobs in the CloudWatch Logs service, breaking
// queries into smaller time chunks (if desired), de-duplicating and
// providing preview results (if desired), retrying transient request
// failures, and managing resources to try to stay within the
// CloudWatch Logs service quota limits.
//
// Use NewQueryManager to create a QueryManager, and be sure to close it
// when you no longer need its services, since every QueryManager
// consumes a tiny amount of system resources just by existing.
//
// Calling the Query method will return a result Stream from which the
// query results can be read as they become available. Use the
// Unmarshal function to unmarshal the bare results into other
// structured types.
//
// Calling the Close method will immediately cancel all running queries
// started with the QueryManager, as if each query's Stream had been
// explicitly closed.
//
// Calling the GetStats method will return the running sum of all
// statistics for all queries run within the QueryManager since it was
// created.
type QueryManager interface {
io.Closer
StatsGetter
Query(QuerySpec) (Stream, error)
}
// Result represents a single result row from a CloudWatch Logs Insights
// query.
type Result []ResultField
// ResultField represents a single field name/field value pair within a
// Result.
type ResultField struct {
Field string
Value string
}
// Stream provides access to the result stream from a query operation
// either using a QueryManager or the global Query function.
//
// Use the Close method if you need to prematurely cancel the query
// operation, releasing the local (in-process) and remote (in the
// CloudWatch Logs service) resources it consumes.
//
// Use the Read method to read query results from the stream. The Read
// method returns io.EOF when the entire results stream has been
// consumed. At this point the query is over and all local and remote
// resources have been released, so it is not necessary to close the
// Stream explicitly.
//
// Use the GetStats method to obtain the Insights statistics pertaining
// to the query. Note that the results from the GetStats method may
// change over time as new results are pulled from the CloudWatch Logs
// web service, but will stop changing after the Read method returns
// io.EOF or any other error. If the query was chunked, the stats will
// be summed across multiple chunks.
type Stream interface {
io.Closer
StatsGetter
// Read reads up to len(p) CloudWatch Logs Insights results into p.
// It returns the number of results read (0 <= n <= len(p)) and any
// error encountered. Even if Read returns n < len(p), it may use
// all of p as scratch space during the call. If some, but fewer
// than len(p), results are available, Read conventionally
// returns what is available instead of waiting for more.
//
// When Read encounters an error or end-of-file condition after
// successfully reading n > 0 results, it returns the number of
// results read. It may return the (non-nil) error from the same
// call or return the error (and n == 0) from a subsequent call. An
// instance of this general case is that a Stream returning a
// non-zero number of results at the end of the input stream may
// return either err == EOF or err == nil. The next Read should
// return 0, EOF.
//
// Callers should always process the n > 0 results returned before
// considering the error err. Doing so correctly handles I/O errors
// that happen after reading some results and also both of the
// allowed EOF behaviors.
//
// Implementations of Read are discouraged from returning a zero
// result count with a nil error, except when len(p) == 0. Callers
// should treat a return of 0 and nil as indicating that nothing
// happened; in particular it does not indicate EOF.
//
// Implementations must not retain p.
//
// As a convenience, the ReadAll function may be used to read all
// remaining results available in a Stream.
//
// If the query underlying the Stream failed permanently, then err
// may be one of:
//
// • StartQueryError
// • TerminalQueryStatusError
// • UnexpectedQueryError
Read(p []Result) (n int, err error)
}
const (
// QueryConcurrencyQuotaLimit contains the default CloudWatch Logs
// query concurrency service limit as documented at
// https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/cloudwatch_limits_cwl.html.
//
// The documented service quota may increase over time, in which
// case this value should be updated to match the documentation.
QueryConcurrencyQuotaLimit = 20
// DefaultParallel is the default maximum number of parallel
// CloudWatch Logs Insights queries a QueryManager will attempt to
// run at any one time.
//
// The default value is set to slightly less than the service quota
// limit to leave some concurrency available for other users even if
// the QueryManager is at maximum capacity.
DefaultParallel = QueryConcurrencyQuotaLimit - 2
// DefaultLimit is the default result count limit used if the Limit
// field of a QuerySpec is zero or negative.
DefaultLimit = 1000
// MaxLimit is the maximum value the result count limit field in a
// QuerySpec may be set to.
MaxLimit = 10000
)
// Config provides the NewQueryManager function with the information it
// needs to construct a new QueryManager.
type Config struct {
// Actions provides the CloudWatch Logs capabilities the QueryManager
// needs to execute Insights queries against the CloudWatch Logs
// service. If this value is nil then NewQueryManager panics.
//
// Normally Actions should be set to the value of an AWS SDK for Go
// (v1) CloudWatch Logs client: both the cloudwatchlogsiface.CloudWatchLogsAPI
// interface and the *cloudwatchlogs.CloudWatchLogs type are
// compatible with the CloudWatchLogsActions interface. Use a
// properly configured instance of one of these types to set the
// value of the Actions field.
Actions CloudWatchLogsActions
// Parallel optionally specifies the maximum number of parallel
// CloudWatch Logs Insights queries which the QueryManager may run
// at one time. The purpose of Parallel is to avoid starving other
// humans or systems using CloudWatch Logs Insights in the same AWS
// account and region.
//
// If set to a positive number then that exact number is used as the
// parallelism factor. If set to zero or a negative number then
// DefaultParallel is used instead.
//
// Parallel gives the upper limit on the number of Insights queries
// the QueryManager may have open at any one time. The actual number
// of Insights queries may be lower either because of throttling or
// service limit exceptions from the CloudWatch Logs web service, or
// because the QueryManager simply doesn't need all the parallel
// capacity.
//
// Do not set Parallel above QueryConcurrencyQuotaLimit unless you
// have received a query concurrency limit increase from AWS
// CloudWatch Logs Insights.
Parallel int
// RPS optionally specifies the maximum number of requests to the
// CloudWatch Logs web service which the QueryManager may make in
// each one-second period for each CloudWatch Logs act. The
// purpose of RPS is to prevent the QueryManager or other humans or
// systems using CloudWatch Logs in the same AWS account and region
// from being throttled by the web service.
//
// If RPS has a missing, zero, or negative number for any required
// CloudWatch Logs act, the value specified in RPSDefaults is
// used instead. The default behavior should be adequate for many
// use cases, so you typically will not need to set this field
// explicitly.
//
// The values in RPS should ideally not exceed the corresponding
// values in RPSQuotaLimits as this will almost certainly result in
// throttling, worse performance, and your application being a "bad
// citizen" affecting other users of the same AWS account.
RPS map[CloudWatchLogsAction]int
// Logger optionally specifies a logging object to which the
// QueryManager can send log messages about queries it is managing.
// This value may be left nil to skip logging altogether.
Logger Logger
// Name optionally gives the new QueryManager a friendly name, which
// will be included in log messages the QueryManager emits to the
// Logger.
//
// Other than being used in logging, this field has no effect on the
// QueryManager's behavior.
Name string
// DisableAdaptation, if true, turns off the QueryManager's adaptive
// capacity utilization behavior. Most users will want to leave this
// flag at the zero value, i.e. false.
//
// Adaptive capacity utilization makes the QueryManager more
// resilient to temporary capacity problems reported by the
// CloudWatch Logs Insights service. The QueryManager achieves this
// resilience by temporarily reducing its usage of CloudWatch Logs
// Insights resources in response to detected limit issues. For
// example, it may temporarily reduce the number of parallel queries
// in flight below Parallel, or it may temporarily reduce its
// service requests per second below RPS. Any temporary reductions
// are phased out when the QueryManager stops getting resource limit
// errors from the Insights service.
DisableAdaptation bool
}