[Bug]: High (Prometheus) CPU usage for new unified metrics server endpoint [v1.12.0] #1642

Open
mohammad051 opened this issue Feb 21, 2025 · 16 comments · Fixed by #1644 or #1643
Labels: bug (Something isn't working), manager (The Electron app), server, usage/service metrics (issues related to collecting service metrics)

Comments

@mohammad051

Application

Outline Manager

Describe the bug

Hello,

Today, Outline Manager was automatically updated to version 1.17.0. Since then, all server resources (RAM and CPU) have gone to 100%.

Rebooting the server fixes it, but as soon as I open the management key in Outline Manager, the resources fill up again and the server crashes.

How can I disable automatic updates of Outline Manager and go back to the previous version to avoid the problem in the new version?

Steps to reproduce

1. Open the Outline Manager

What did you expect to happen?

No response

What actually happened?

No response

Outline Version

1.17.0

What operating system are you using?

Windows

Operating System Version

No response

Screenshots and Videos

No response

@mohammad051 added the bug label Feb 21, 2025
@BossyBigBoss

Yeah, bro! Same problem. It drives me crazy. All of my Outline servers went to 100% CPU utilization.
At first I thought it was a server problem.

I've already filed a ticket with Outline support.

Luckily I had a backup of Outline Manager on my flash drive, and it works well.

Waiting for the solution.

@mohammad051
Author

> Yeah, bro! Same problem. It drives me crazy. All of my Outline servers went to 100% CPU utilization. At first I thought it was a server problem.
>
> I've already filed a ticket with Outline support.
>
> Luckily I had a backup of Outline Manager on my flash drive, and it works well.
>
> Waiting for the solution.

Brother, I installed version 1.14.0 on my system, but it updates automatically. What version did you install?

@sbruens
Contributor

sbruens commented Feb 21, 2025

Thanks for the report. We're looking into it. If you can share more details of what you're experiencing, please feel free to share them in here.

@mohammad051
Author

> Thanks for the report. We're looking into it. If you can share more details of what you're experiencing, please feel free to share them in here.

Outline Manager was automatically updated today, and 30 of my servers went down. The servers are Amazon Lightsail instances.

At first I thought it was a server problem, but it wasn't. I rebooted a server and it came back up. After I opened Outline Manager, the CPU went to 100%, and the server went down and did not come back up.

Please check and fix the problem. I really don't know what to do to find a solution.

Thank you.

@sbruens
Contributor

sbruens commented Feb 21, 2025

Thanks @mohammad051. We introduced some new metrics in the Manager UI, the calculation of which I assume is the cause of this high CPU. For my understanding, how many access keys do your servers roughly have?

@mohammad051
Author

> Thanks @mohammad051. We introduced some new metrics in the Manager UI, the calculation of which I assume is the cause of this high CPU. For my understanding, how many access keys do your servers roughly have?

Brother, each of my servers has between 30 and 60 active keys.
I didn't have this problem on the previous version. How can I keep using the previous version without the Manager updating automatically?

@BossyBigBoss

BossyBigBoss commented Feb 21, 2025

> Thanks @mohammad051. We introduced some new metrics in the Manager UI, the calculation of which I assume is the cause of this high CPU. For my understanding, how many access keys do your servers roughly have?

I have the same issue on Amazon servers. The servers have 40-50 active access keys.
Even if I close Outline Manager, my Outline servers remain at 100% CPU utilization. Only a reboot helps.

@BossyBigBoss

> > Yeah, bro! Same problem. It drives me crazy. All of my Outline servers went to 100% CPU utilization. At first I thought it was a server problem.
> > I've already filed a ticket with Outline support.
> > Luckily I had a backup of Outline Manager on my flash drive, and it works well.
> > Waiting for the solution.
>
> Brother, I installed version 1.14.0 on my system, but it updates automatically. What version did you install?

I have a backup of the previous version of Outline Manager for Windows, 1.15.2.
I disabled WiFi on my PC to prevent the update to the new version, ran the backup version, got "Server Unreachable", and then enabled WiFi and clicked Retry to connect to the Outline server.

@sbruens
Contributor

sbruens commented Feb 21, 2025

An old Manager is a workaround, but we completed a rollback of the server back to v1.11.0. Your servers should pick up this change within the hour, when watchtower looks for a new image to pull.

The continued CPU usage is surprising, that implies something is still doing work despite the Manager not asking anything. Can I ask whether you are also experiencing memory issues?
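
One way to answer the CPU and memory question on an affected host is to look at per-container usage. The following is only a minimal sketch: it assumes the output of `docker stats --no-stream --format '{{json .}}'` has been captured as text, and the container names `shadowbox` and `watchtower` in the sample are assumptions about a typical install, not something confirmed in this thread. The helper name `busy_containers` and the sample numbers are hypothetical.

```python
import json


def busy_containers(stats_output: str, cpu_threshold: float = 90.0):
    """Return (name, cpu_percent, mem_usage) for containers at or above the threshold.

    `stats_output` is expected to hold one JSON object per line, in the shape
    printed by: docker stats --no-stream --format '{{json .}}'
    """
    busy = []
    for line in stats_output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        row = json.loads(line)
        cpu = float(row["CPUPerc"].rstrip("%"))  # "99.87%" -> 99.87
        if cpu >= cpu_threshold:
            busy.append((row["Name"], cpu, row["MemUsage"]))
    return busy


# Hypothetical sample data; real values will differ per server.
SAMPLE = "\n".join([
    '{"Name":"shadowbox","CPUPerc":"99.87%","MemUsage":"410MiB / 512MiB"}',
    '{"Name":"watchtower","CPUPerc":"0.02%","MemUsage":"8MiB / 512MiB"}',
])

print(busy_containers(SAMPLE))
```

Running the same check with the Manager closed would show whether something on the server keeps working without any requests from the Manager, and the memory column gives a first hint about memory pressure.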

@mohammad051
Author

mohammad051 commented Feb 21, 2025

> An old Manager is a workaround, but we completed a rollback of the server back to v1.11.0. Your servers should pick up this change within the hour, when watchtower looks for a new image to pull.
>
> The continued CPU usage is surprising, that implies something is still doing work despite the Manager not asking anything. Can I ask whether you are also experiencing memory issues?

After opening the Outline Manager menu, usage goes up so quickly that we don't even have a chance to log in, so we don't know whether memory is involved.

I went back to an old version of Outline Manager, but it immediately updates to the new version and the problems start again.

How can I disable automatic updates to fix the problem?

Please help me.

@sbruens
Contributor

sbruens commented Feb 21, 2025

It's not a Manager issue; it's a server issue, which we rolled back earlier. Are you saying this is still happening for servers running the rolled back v1.11.0 version?
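
To tell whether a given server has already picked up the rollback, the version it reports can be compared against the affected release. This is a hedged sketch, not the project's own tooling: it assumes the server's management API returns a JSON object with a `version` field from `GET <apiUrl>/server` (an assumption here, not stated in the thread), and it treats only v1.12.0 as affected because that is the only version named in the issue title. The names `AFFECTED_VERSIONS` and `is_affected` are hypothetical.

```python
# Only v1.12.0 is named in this issue; other releases are not assessed here.
AFFECTED_VERSIONS = {"1.12.0"}


def is_affected(server_info: dict) -> bool:
    """Return True if the reported server version is the one named in this issue.

    `server_info` is assumed to be the parsed JSON body of the management
    API's server-info response, with an optional "version" string.
    """
    version = str(server_info.get("version", "")).strip().lstrip("v")
    return version in AFFECTED_VERSIONS


# Hypothetical responses for illustration only.
print(is_affected({"name": "My Outline Server", "version": "1.12.0"}))
print(is_affected({"name": "My Outline Server", "version": "1.11.0"}))
```

A server that still reports the affected version after the expected pull window has probably not received the rolled-back image yet.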

@mohammad051
Author

> It's not a Manager issue; it's a server issue, which we rolled back earlier. Are you saying this is still happening for servers running the rolled back v1.11.0 version?

Thank you, brother. The server has now moved to version 1.11.0 and all the problems are solved.

Thank you very much.

@sbruens self-assigned this Feb 21, 2025
@sbruens changed the title from "[Bug]: Using the above resources in Outline Manager version 1.17.0" to "[Bug]: High (Prometheus) CPU usage for new unified metrics server endpoint on server v1.12.0" Feb 21, 2025
@sbruens
Contributor

sbruens commented Feb 21, 2025

Thank you for confirming @mohammad051 and I'm glad to hear that resolved the immediate outage.
I'm just going to move this over to the server repo so we can track the work to fix this over there.

@sbruens transferred this issue from Jigsaw-Code/outline-apps Feb 21, 2025
@sbruens changed the title from "[Bug]: High (Prometheus) CPU usage for new unified metrics server endpoint on server v1.12.0" to "[Bug]: High (Prometheus) CPU usage for new unified metrics server endpoint [v1.12.0]" Feb 21, 2025
@mohammad051
Author

> Thank you for confirming @mohammad051 and I'm glad to hear that resolved the immediate outage. I'm just going to move this over to the server repo so we can track the work to fix this over there.

Thank you very much for your help.

You helped me a lot.

Thank you for your quick answers.

Thank you for solving this problem in the shortest possible time.

@sbruens added the server, manager, and usage/service metrics labels Feb 21, 2025
@sbruens
Contributor

sbruens commented Feb 23, 2025

We have spent some more time on this and confirmed that Prometheus can cause increased CPU issues. d262f52 is mitigating the issue, though we still need to examine a full root cause.

If anyone that ran into this issue is able to do a test run with the new release candidate containing the hotfix, that would help give us more confidence before releasing to a wider audience. New release candidate image:

quay.io/outline/shadowbox:v1.12.2-rc2

@BossyBigBoss

> We have spent some more time on this and confirmed that Prometheus can cause increased CPU issues. d262f52 is mitigating the issue, though we still need to examine a full root cause.
>
> If anyone that ran into this issue is able to do a test run with the new release candidate containing the hotfix, that would help give us more confidence before releasing to a wider audience. New release candidate image:
>
> quay.io/outline/shadowbox:v1.12.2-rc2

I would like to say a big thank you.
Unfortunately, I can't run an experiment on my servers because they are live Outline servers for my users.
Too risky.

3 participants