fix: flyteadmin doesn't shutdown servers gracefully #6289

Open
wants to merge 8 commits into base: master
Changes from 6 commits

59 changes: 51 additions & 8 deletions flyteadmin/pkg/server/service.go
@@ -6,7 +6,10 @@
 	"fmt"
 	"net"
 	"net/http"
+	"os"
+	"os/signal"
 	"strings"
+	"syscall"
 	"time"

 	"github.com/gorilla/handlers"
@@ -386,8 +389,9 @@
 	}

 	go func() {
-		err := grpcServer.Serve(lis)
-		logger.Fatalf(ctx, "Failed to create GRPC Server, Err: ", err)
+		if err := grpcServer.Serve(lis); err != nil {
+			logger.Fatalf(ctx, "Failed to create GRPC Server, Err: %v", err)
+		}

[Codecov / codecov/patch: check warning on line 394 in flyteadmin/pkg/server/service.go. Added lines #L392-L394 were not covered by tests.]
}()

logger.Infof(ctx, "Starting HTTP/1 Gateway server on %s", cfg.GetHostAddress())
@@ -422,11 +426,34 @@
 		ReadHeaderTimeout: time.Duration(cfg.ReadHeaderTimeoutSeconds) * time.Second,
 	}

-	err = server.ListenAndServe()
-	if err != nil {
-		return errors.Wrapf(err, "failed to Start HTTP Server")
+	go func() {
+		err = server.ListenAndServe()
+		if err != nil && err != http.ErrServerClosed {
+			logger.Fatalf(ctx, "Failed to start HTTP Server: %v", err)
+		}

[Codecov / codecov/patch: check warning on line 433 in flyteadmin/pkg/server/service.go. Added lines #L429-L433 were not covered by tests.]
}()

// Gracefully shutdown the servers
sigCh := make(chan os.Signal, 1)
@Sovietaced (Contributor), Mar 14, 2025:

Do we need to have two signal listeners? Should just be able to use one.

Contributor Author:

you mean just SIGINT or SIGTERM?

@Sovietaced (Contributor), Mar 14, 2025:

I mean that in this diff you have two signal channels (one for grpc and one for http). You only need one to know whether or not the app is shutting down.
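
A minimal sketch of the single-listener idea, not the PR's actual code: one `signal.NotifyContext` call produces a context, and everything that must stop hangs off that one context. `namedStop`, `stopOnDone`, and `simulate` are hypothetical names for illustration; the empty stop functions stand in for `grpcServer.GracefulStop` and `server.Shutdown`. The self-delivered SIGTERM via `syscall.Kill` is Unix-only and exists only so the sketch terminates on its own.

```go
package main

import (
	"context"
	"fmt"
	"os/signal"
	"syscall"
	"time"
)

// namedStop pairs a label with a stop function. Both are hypothetical
// stand-ins for grpcServer.GracefulStop and server.Shutdown.
type namedStop struct {
	name string
	stop func()
}

// stopOnDone blocks until ctx is cancelled, then runs every stop function in
// order and returns the names of the ones it ran.
func stopOnDone(ctx context.Context, stops ...namedStop) []string {
	<-ctx.Done()
	ran := make([]string, 0, len(stops))
	for _, s := range stops {
		s.stop()
		ran = append(ran, s.name)
	}
	return ran
}

// simulate registers ONE signal listener, sends this process a SIGTERM to
// stand in for an operator or orchestrator, and reports what was stopped.
func simulate() []string {
	ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer cancel()

	// Safety net so the sketch cannot hang if the signal is never delivered.
	ctx, cancelTimeout := context.WithTimeout(ctx, 5*time.Second)
	defer cancelTimeout()

	// Unix-only assumption: deliver SIGTERM to ourselves.
	if err := syscall.Kill(syscall.Getpid(), syscall.SIGTERM); err != nil {
		cancel()
	}

	return stopOnDone(ctx,
		namedStop{"grpc", func() {}},
		namedStop{"http", func() {}},
	)
}

func main() {
	fmt.Println(simulate())
}
```

With this shape, adding a third server means adding one more entry to the `stopOnDone` call rather than another signal channel.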

signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
<-sigCh
time.Sleep(1 * time.Second)

// force to shut down servers after 10 seconds
timer := time.AfterFunc(10 * time.Second, func() {
Contributor:

May want to make this timeout configurable

Contributor Author:

Like configure the timeout in flyteadmin_config.yaml?

server:
  forceShutdownTimeoutSec: 10
  

Contributor:

Yeah. Something along those lines.

Contributor Author:

updated
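
For readers following the thread, here is a rough sketch of how such a setting could be consumed on the Go side. It assumes a trimmed-down, hypothetical `ServerConfig` struct with a `ForceShutdownTimeoutSec` field named after the YAML key proposed above; the real flyteadmin config struct and the way the PR was actually updated may differ. The fallback to 10 seconds mirrors the value hard-coded in the diff.

```go
package main

import (
	"fmt"
	"time"
)

// defaultForceShutdownTimeoutSec mirrors the 10 seconds hard-coded in the diff.
const defaultForceShutdownTimeoutSec = 10

// ServerConfig is a hypothetical, trimmed-down config struct. The field name
// follows the forceShutdownTimeoutSec key suggested in the review thread.
type ServerConfig struct {
	ForceShutdownTimeoutSec int `json:"forceShutdownTimeoutSec"`
}

// ForceShutdownTimeout converts the configured seconds into a time.Duration,
// falling back to the default when the value is unset or not positive.
func (c ServerConfig) ForceShutdownTimeout() time.Duration {
	sec := c.ForceShutdownTimeoutSec
	if sec <= 0 {
		sec = defaultForceShutdownTimeoutSec
	}
	return time.Duration(sec) * time.Second
}

func main() {
	unset := ServerConfig{}
	custom := ServerConfig{ForceShutdownTimeoutSec: 30}
	fmt.Println(unset.ForceShutdownTimeout(), custom.ForceShutdownTimeout()) // prints "10s 30s"
}
```

The returned duration could then be passed to `time.AfterFunc` in place of the literal `10 * time.Second`.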

logger.Infof(ctx, "Server couldn't stop gracefully in time. Doing force stop.")
server.Close()
grpcServer.Stop()
})
defer timer.Stop()

grpcServer.GracefulStop()

if err := server.Shutdown(ctx); err != nil {
logger.Errorf(ctx, "Failed to gracefully shutdown HTTP server: %v", err)

[Codecov / codecov/patch: check warning on line 453 in flyteadmin/pkg/server/service.go. Added lines #L437-L453 were not covered by tests.]
}

logger.Infof(ctx, "Servers gracefully stopped")

[Codecov / codecov/patch: check warning on line 456 in flyteadmin/pkg/server/service.go. Added line #L456 was not covered by tests.]
return nil
}

@@ -534,10 +561,26 @@
 		ReadHeaderTimeout: time.Duration(cfg.ReadHeaderTimeoutSeconds) * time.Second,
 	}

-	err = srv.Serve(tls.NewListener(conn, srv.TLSConfig))
+	go func() {
+		err = srv.Serve(tls.NewListener(conn, srv.TLSConfig))
+		if err != nil && err != http.ErrServerClosed {
+			logger.Errorf(ctx, "Failed to start HTTP/2 Server: %v", err)
+		}

[Codecov / codecov/patch: check warning on line 568 in flyteadmin/pkg/server/service.go. Added lines #L564-L568 were not covered by tests.]
}()
Collaborator, comment on lines +565 to +570:

Potential race condition in server startup

The HTTP/2 server is now started in a goroutine, but the function immediately proceeds to wait for shutdown signals. This could lead to a race condition where the shutdown sequence begins before the server is fully initialized. Consider adding a small delay or a readiness check before proceeding to the shutdown logic.

Code suggestion
Check the AI-generated fix before applying
 @@ -558,6 +558,13 @@
  	go func() {
  		err = srv.Serve(tls.NewListener(conn, srv.TLSConfig))
  		if err != nil && err != http.ErrServerClosed {
  			logger.Errorf(ctx, "Failed to start HTTP/2 Server: %v", err)
  		}
  	}()
 +
 +	// Give the server a moment to start before proceeding to shutdown logic
 +	time.Sleep(100 * time.Millisecond)
 +
 +	// Log that the server has started
 +	logger.Infof(ctx, "HTTP/2 Server started successfully on %s", cfg.GetHostAddress())
 

Code Review Run #55c786
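
As an alternative to the sleep in the suggestion above, a readiness guarantee can come from binding the listener synchronously and only running `Serve` in the goroutine: once `net.Listen` has returned, incoming connections are queued by the operating system even if `Serve` has not been scheduled yet. The sketch below illustrates that pattern in isolation; `startServer` and `probe` are hypothetical helpers, plain HTTP on a loopback port replaces the TLS listener from the diff, and nothing here is taken from flyteadmin itself.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// startServer binds the listener before returning and only then serves in a
// goroutine. The channel receives Serve's final error, with the expected
// http.ErrServerClosed mapped to nil.
func startServer(srv *http.Server) (net.Addr, <-chan error, error) {
	lis, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, nil, err
	}
	errCh := make(chan error, 1)
	go func() {
		err := srv.Serve(lis)
		if errors.Is(err, http.ErrServerClosed) {
			err = nil
		}
		errCh <- err
	}()
	return lis.Addr(), errCh, nil
}

// probe starts a server, sends a request immediately (no sleep), shuts the
// server down, and returns the response status code.
func probe() (int, error) {
	srv := &http.Server{
		ReadHeaderTimeout: 5 * time.Second,
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.WriteHeader(http.StatusNoContent)
		}),
	}
	addr, errCh, err := startServer(srv)
	if err != nil {
		return 0, err
	}
	resp, err := http.Get("http://" + addr.String())
	if err != nil {
		return 0, err
	}
	resp.Body.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return 0, err
	}
	return resp.StatusCode, <-errCh
}

func main() {
	code, err := probe()
	fmt.Println(code, err)
}
```

The error channel also gives the caller a way to notice a failed `Serve` instead of only logging it from inside the goroutine.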




// Gracefully shutdown the servers
sigCh := make(chan os.Signal, 1)
signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
<-sigCh

[Codecov / codecov/patch: check warning on line 574 in flyteadmin/pkg/server/service.go. Added lines #L572-L574 were not covered by tests.]

-	if err != nil {
-		return errors.Wrapf(err, "failed to Start HTTP/2 Server")
+	// Create a context with timeout for the shutdown process
+	shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+	defer cancel()
+
+	if err := srv.Shutdown(shutdownCtx); err != nil {
+		logger.Errorf(ctx, "Failed to shutdown HTTP server: %v", err)

[Codecov / codecov/patch: check warning on line 581 in flyteadmin/pkg/server/service.go. Added lines #L576-L581 were not covered by tests.]
}

logger.Infof(ctx, "Servers gracefully stopped")

[Codecov / codecov/patch: check warning on line 584 in flyteadmin/pkg/server/service.go. Added line #L584 was not covered by tests.]
return nil
}
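
Since Codecov flags the shutdown paths as untested, the following is one possible way to isolate and exercise the "graceful first, forced after a deadline" behaviour. It is a sketch under assumptions, not code from the PR: `shutdownWithDeadline` and `run` are hypothetical helpers, and a plain HTTP server on a loopback port stands in for the flyteadmin servers. `Shutdown` returns the context's error when the deadline passes with requests still in flight, at which point the helper falls back to `Close`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// shutdownWithDeadline tries a graceful Shutdown and falls back to Close when
// the deadline passes. It reports whether the graceful path succeeded.
func shutdownWithDeadline(srv *http.Server, timeout time.Duration) (bool, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	err := srv.Shutdown(ctx)
	if err == nil {
		return true, nil
	}
	if errors.Is(err, context.DeadlineExceeded) {
		return false, srv.Close()
	}
	return false, err
}

// run starts a server, waits until one request has reached the handler, and
// then shuts down. With block=true the handler hangs until run returns.
func run(block bool, timeout time.Duration) (bool, error) {
	entered := make(chan struct{}, 1)
	release := make(chan struct{})
	defer close(release)

	srv := &http.Server{
		ReadHeaderTimeout: 5 * time.Second,
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			select {
			case entered <- struct{}{}:
			default:
			}
			if block {
				<-release
			}
		}),
	}
	lis, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	go srv.Serve(lis)
	go func() {
		if resp, err := http.Get("http://" + lis.Addr().String()); err == nil {
			resp.Body.Close()
		}
	}()

	select {
	case <-entered:
	case <-time.After(5 * time.Second):
		return false, errors.New("request never reached the handler")
	}
	return shutdownWithDeadline(srv, timeout)
}

func main() {
	idle, _ := run(false, 5*time.Second)
	busy, _ := run(true, 100*time.Millisecond)
	fmt.Println("graceful with quick handler:", idle)
	fmt.Println("graceful with hung handler:", busy)
}
```

Extracting the shutdown sequence into a helper of this kind is one way the uncovered lines could be tested without sending real signals to the process.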