Skip to content

Instrumentation with influxdb backend #368

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

Gnoale
Copy link

@Gnoale Gnoale commented Sep 16, 2019

Description

See comments from this issue for details about the implementation choice of InfluxDB.

I looked for the 2 endpoints referred in the issue 298 RecordActivityTaskHeartbeat and PollForActivityTask, and implement a method inside swf/core.py to send statistic to InfluxDB.

I define the tags region endpoint and hostname.

Considerations for Reviewers

  • This library is used to talk to InfluxDB backend, I assume this part of the code would be non blocking and should not impact the simpleflow behavior, because UDP do not except any answer from the server.

  • Maybe we should use some AWS tag instead of the worker hostname (not human friendly)

  • Maybe we should pay attention if the worker would not be able to resolve the grafana fqdn.

  • Authentication and encryption :

The UDP listener on InfluxDB server doesn't seems to support this, a solutions could consist in writing to a Telegraf proxy agent wich would send the data to Influx over HTTPS.
Or use a local database in the same network.

Test environment/Test case

Considerations for Deployment

In order to implement this solution, the InfluxDB server would have to be setup and the environment secured.

gnoale added 2 commits September 16, 2019 15:36
…skHeartbeat and RecordActivityTaskHeartbeat endpoints
define hostname attribute for metrology purpose

fix hostname parameter string
@@ -127,6 +127,9 @@ def heartbeat(self, task_token, details=None):
task_token,
format.heartbeat_details(details),
)
# instrumentation
self.send_endpoint_usage("RecordActivityTaskHeartbeat")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the instrumentation lines should be put before the actual call to the endpoint. Otherwise if the call makes a 429, an exception is raised and you'll never hit influxdb for this call. Which means in practice that graphs won't ever reach the rate limiting thresholds, whereas we want them to do so. I'm actually unsure if when SWF answers a 429 it's counted in our rate limit, but I think we're safer if we have an exaggerated view than the other way around. Last but not least the line above is never called (it's after a "return").

Also: I'd be happy that we integrate instrumentation in a generic way that doesn't pollute API calls. Basically connection.call_endpoint() should result in the endpoint instrumentation being triggered AND the endpoint being called. But that can come in a second phase I think.


self.hostname = socket.gethostname()

def send_endpoint_usage(endpoint, host="grafana.botify.com", port=8089):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be hardcoded, please push force with no botify value asap, since we're disclosing a private info on a public repository here ;-) You have examples in the repo of using simpleflow settings. You can grep for "SIMPLEFLOW_ENABLE_DISK_CACHE" for instance to get a simple enough example.

@Gnoale Gnoale closed this Sep 16, 2019
@Gnoale Gnoale deleted the task/Monitor-SWF-rate-limits-#298 branch September 16, 2019 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants