Scale Paralus and Relay replicas for high availability and no-downtime upgrades #264

Open
akshay196 opened this issue Oct 11, 2023 · 3 comments
Labels
enhancement New feature or request triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@akshay196
Member

Briefly describe the feature

Paralus and Relay should be scalable to more than one replica. Relay should keep working properly when the relay server or the relay agent is scaled up.

What problem does this feature solve? Please link any relevant documentation or Issues

Support HA and zero-downtime upgrades.

(optional) What is your current workaround?

None

@akshay196 akshay196 added enhancement New feature or request needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 11, 2023
@akshay196 akshay196 self-assigned this Oct 11, 2023
@akshay196
Member Author

First attempt: I scaled the relay server to 2 replicas with an imported cluster in my local Kind setup. The Paralus server crashed with the error below.

{"AccountID":"","PartnerID":"","OrganizationID":"","Username":"","IsSSO":false,"EnforceSession":false,"SessionType":"","SystemUser":false,"RelayNetwork":false}}
panic: uuid: Parse(): invalid UUID length: 0

goroutine 405 [running]:
github.com/google/uuid.MustParse({0x0, 0x0})
	/go/pkg/mod/github.com/google/uuid@v1.3.0/uuid.go:163 +0xb9
github.com/paralus/paralus/pkg/service.(*accountPermissionService).GetAccount(0xc000593388, {0x239ebc0, 0xc0005413b0}, {0x0, 0x0})
	/build/pkg/service/account_permission.go:91 +0x33
github.com/paralus/paralus/server.(*auditInfoServer).LookupUser(0xc00049c660, {0x239ebc0, 0xc0005413b0}, 0xc0003dcfc0)
	/build/server/audit_info.go:47 +0x367
github.com/paralus/paralus/proto/rpc/sentry._AuditInformationService_LookupUser_Handler({0x1dda6e0, 0xc00049c660}, {0x239ebc0, 0xc0005413b0}, 0xc0000bb080, 0x0)
	/build/proto/rpc/sentry/audit_info_grpc.pb.go:91 +0x170
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0005ee1c0, {0x23d4b70, 0xc000315040}, 0xc0003c5c20, 0xc000aceff0, 0x365f220, 0x0)
	/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1282 +0xccf
google.golang.org/grpc.(*Server).handleStream(0xc0005ee1c0, {0x23d4b70, 0xc000315040}, 0xc0003c5c20, 0x0)
	/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1616 +0xa2a
google.golang.org/grpc.(*Server).serveStreams.func1.2()
	/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:921 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
	/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:919 +0x294
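
The trace shows `uuid.MustParse` receiving an empty string (`{0x0, 0x0}`) inside `GetAccount`, and that is what panics. One possible mitigation is to validate the ID and return an error instead of panicking. The sketch below is only an illustration: `validateAccountID` is a hypothetical helper, not code from the Paralus repository, and it uses a standard-library regular expression as a stand-in for the non-panicking `uuid.Parse` so that it has no external dependencies. It accepts only the canonical 36-character form.

```go
package main

import (
	"fmt"
	"regexp"
)

// canonicalUUID matches only the canonical 8-4-4-4-12 hex form.
// Assumption: this is a simplified stand-in for a real UUID parser.
var canonicalUUID = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// validateAccountID is a hypothetical guard that a lookup such as
// GetAccount could run before parsing, so that a request without an
// identity produces an error rather than a process-wide panic.
func validateAccountID(id string) error {
	if id == "" {
		return fmt.Errorf("account ID is empty: the caller supplied no identity")
	}
	if !canonicalUUID.MatchString(id) {
		return fmt.Errorf("account ID %q is not a canonical UUID", id)
	}
	return nil
}
```

With a guard like this, the gRPC handler could return an error to the relay for the unauthenticated request while the server keeps running.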

It looks like the common name (or the entire peer certificate) is missing from the request coming to the relay server. Auditing handler: https://github.com/paralus/relay/blob/cc8661975750da3f4c6e156d72d8a955d9ccf6cd/pkg/audit/audit.go#L69

@akshay196 akshay196 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 18, 2023
@akshay196
Member Author

After fixing the above issue, scaling up the relay server causes access to the target cluster to fail.

kubectl get pod --kubeconfig kubeconfig-admin@paralus.local.yaml 
E1025 22:24:11.932897  218790 memcache.go:265] couldn't get current server API group list: the server has asked for the client to provide credentials
E1025 22:24:11.934600  218790 memcache.go:265] couldn't get current server API group list: the server has asked for the client to provide credentials
E1025 22:24:11.936381  218790 memcache.go:265] couldn't get current server API group list: the server has asked for the client to provide credentials
E1025 22:24:11.938531  218790 memcache.go:265] couldn't get current server API group list: the server has asked for the client to provide credentials
E1025 22:24:11.940673  218790 memcache.go:265] couldn't get current server API group list: the server has asked for the client to provide credentials
error: You must be logged in to the server (the server has asked for the client to provide credentials)

@akshay196
Member Author

When there is no dialin connection key (dialinsni) in the dialin pool, we look up the peer cache: https://github.com/paralus/relay/blob/cc8661975750da3f4c6e156d72d8a955d9ccf6cd/pkg/tunnel/server.go#L679
But I found no routine that inserts a relay peer into the peer cache. The insert helper is:

func InsertPeerCache(cache *ristretto.Cache, expiry time.Duration, key, value interface{}) bool {
