Implement a way to time out tokens based on (in)activity #17640
Implement a way to time out tokens based on (in)activity #17640openshift-merge-robot merged 4 commits intoopenshift:masterfrom
Conversation
|
/hold |
990e955 to
7026f96
Compare
enj
left a comment
There was a problem hiding this comment.
Here is the diffs I used:
http://pastebin.test.redhat.com/538893 (for everything but the main timeout validator file)
https://www.diffchecker.com/ln16YiA7 (for the validator file)
| AccessTokenTimeoutSeconds: &invalid, | ||
| GrantMethod: oauthapi.GrantHandlerAuto, | ||
| }) | ||
| if err == nil { |
| ) | ||
|
|
||
| const MinTokenLength = 32 | ||
| const MinFlushTimeout = 150 |
There was a problem hiding this comment.
I am actually going to change flush timeout handling a little, and will put here docs on what it means
| allErrs := validation.ValidateObjectMetaUpdate(&newToken.ObjectMeta, &oldToken.ObjectMeta, field.NewPath("metadata")) | ||
| copied := *oldToken | ||
| copied.ObjectMeta = newToken.ObjectMeta | ||
| // allow only TimeoutsIn to be changed |
There was a problem hiding this comment.
Most of my validation additions are missing.
There was a problem hiding this comment.
Ok I think I have it now
| timeoutValidator := authnregistry.NewTimeoutValidator(accessTokenGetter, oauthClientLister, *config.OAuthConfig.TokenConfig.AccessTokenTimeoutSeconds) | ||
| validators = append(validators, timeoutValidator) | ||
| postStartHooks["openshift.io-TokenTimeoutUpdater"] = func(context genericapiserver.PostStartHookContext) error { | ||
| go timeoutValidator.Start(context.StopCh) |
There was a problem hiding this comment.
I don't really like that we have to export the type to call this, but we seem to do this in other places as well so I suppose its fine (granted controllers seem to have a Run method instead of a Start).
I think we want a defer utilruntime.HandleCrash() in the hook though.
There was a problem hiding this comment.
Ok, changed Start() -> Run() and added a HandleCrash call in it
| } | ||
| } | ||
|
|
||
| glog.Infof("Flushed %d tokens out of %d in bucket", flushedTokens, totalTokens) |
| // make sure to kill the timer when we exit | ||
| defer closeTimer() | ||
|
|
||
| updateTimerAndFlush := func() { |
There was a problem hiding this comment.
I think we can reset the timer to save resources and this way the timer is always stooped even when we receive from the channel (I could not tell from the docs if that was required).
resetTimerAndFlush := func() {
closeTimer()
nextTimer.Reset(a.flushTimeout)
nextTimeout = time.Now().Add(a.flushTimeout)
a.flush(nextTimeout.Add(a.safetyMargin))
}| delta := a.clientTimeout(td.token.ClientName) | ||
| newTimeout := int32(0) | ||
| if delta > 0 { | ||
| newTimeout = int32((td.seen.Sub(td.token.CreationTimestamp.Time) + delta) / time.Second) |
There was a problem hiding this comment.
I had some comments explaining this math which I think are useful.
| if oauthClient.AccessTokenTimeoutSeconds == nil { | ||
| return a.defaultTimeout | ||
| } | ||
| a.updateTimeouts(oauthClient) |
There was a problem hiding this comment.
I believe I had some extra glogs in this func and some other places in this file?
There was a problem hiding this comment.
I did not import all of them as some things simply changed, any specific thing you want to generate logs out of ? You can describe the thought process about what you think should be logged too.
| switch { | ||
| case err == nil: | ||
| flushedTokens++ | ||
| case apierrors.IsConflict(err): |
| glog.V(5).Infof("Token timeout for user=%q client=%q scopes=%v was not updated: %v", | ||
| td.token.UserName, td.token.ClientName, td.token.Scopes, err) | ||
| } else { | ||
| flushedTokens++ |
There was a problem hiding this comment.
Should we also count total failures?
There was a problem hiding this comment.
Any token not flushed is a failure, do we need to be very explicit about it ?
|
Ok I should have addressed all comments but the clock one. |
|
Finally #17662 got in and I was able to properly generate the internal API changes, code has been updated |
| "github.com/openshift/origin/pkg/user/apis/user" | ||
| ) | ||
|
|
||
| var ErrTimedout = errors.New("token timed out") |
There was a problem hiding this comment.
nit: I believe this can be made private.
| } | ||
|
|
||
| if config.TokenConfig.AccessTokenTimeoutSeconds != nil { | ||
| if *config.TokenConfig.AccessTokenTimeoutSeconds < oauthvalidation.ReferenceTimeoutInterval { |
There was a problem hiding this comment.
*config.TokenConfig.AccessTokenTimeoutSeconds == 0 is valid
| // to timeout earlier than they should. 0 is the exception, as this value | ||
| // indicates that the token does not expire, and it is an allowed new | ||
| // value. | ||
| if newToken.TimeoutsIn > 0 && newToken.TimeoutsIn < oldToken.TimeoutsIn { |
There was a problem hiding this comment.
I had some related validation in pkg/oauth/registry/oauthaccesstoken/strategy.go that went with this as well.
There was a problem hiding this comment.
Yeah I saw that, but given we can update the field with any (valid) value, I do not see much value in adding this kind of validation just at token creation.
| } | ||
|
|
||
| func (a *tokenData) timeout() time.Time { | ||
| func (a tokenData) timeout() time.Time { |
There was a problem hiding this comment.
You probably want to use *tokenData to avoid copying the struct unnecessarily.
There was a problem hiding this comment.
I was doing that before, but when I added RankedSet the compiler complained, changing it to not be apointer made it work.
There was a problem hiding this comment.
Right because you will have to update everything to use *tokenData otherwise it will not implement the interface.
| ))) | ||
| } | ||
|
|
||
| if config.TokenConfig.AccessTokenTimeoutSeconds != nil { |
There was a problem hiding this comment.
if timeout := config.TokenConfig.AccessTokenTimeoutSeconds; timeout != nil && *timeout != 0 { is a bit easier to read.
| fmt.Sprintf("cannot beless than the current value=%d", oldToken.TimeoutsIn))) | ||
| } | ||
| // negative values are not allowed either | ||
| if newToken.TimeoutsIn < 0 { |
There was a problem hiding this comment.
We want this on create as well.
| allErrs = append(allErrs, ValidateScopeRestriction(restriction, field.NewPath("scopeRestrictions").Index(i))...) | ||
| } | ||
|
|
||
| if client.AccessTokenTimeoutSeconds != nil { |
There was a problem hiding this comment.
Want negative check here as well.
There was a problem hiding this comment.
And same checks on update too I believe.
There was a problem hiding this comment.
yeah about negative, update automatically checks for anything doen at create
|
@simo5 please add unit tests for all the validation bits, should be easy to extend/copy existing tests. |
enj
left a comment
There was a problem hiding this comment.
I do not think we want to log the error in these places since it is likely to leak the token value.
| case err == nil: | ||
| flushedTokens++ | ||
| case apierrors.IsConflict(err) || apierrors.IsServerTimeout(err): | ||
| glog.V(5).Infof("Token update deferred for token=%q, retriable error: %v", |
There was a problem hiding this comment.
Err actually I see you are logging the token name, do not do that xD
| td.token.Name, err) | ||
| retryList = append(retryList, *td) | ||
| default: | ||
| glog.V(5).Infof("Token timeout for user=%q client=%q scopes=%v was not updated: %v", |
| for _, td := range retryList { | ||
| err := a.update(td) | ||
| if err != nil { | ||
| glog.V(5).Infof("Token timeout for user=%q client=%q scopes=%v was not updated: %v", |
There was a problem hiding this comment.
neither here, is the err what you want me drop ? (doing that anyway)
|
|
||
| var errTimedout = errors.New("token timed out") | ||
|
|
||
| // Implements rankedset.Item |
| return nil | ||
| } | ||
|
|
||
| td := tokenData{ |
There was a problem hiding this comment.
Shouldn't you just use a pointer the whole time? The channel having tokenData and the set having *tokenData is weird.
| if *timeout != 0 && *timeout < oauthvalidation.ReferenceTimeoutInterval { | ||
| validationResults.AddErrors(field.Invalid( | ||
| fldPath.Child("tokenConfig", "accessTokenTimeoutSeconds"), | ||
| *config.TokenConfig.AccessTokenInactivityTimeoutSeconds, |
There was a problem hiding this comment.
nit: use *timeout on this line
pkg/oauth/apiserver/auth.go
Outdated
|
|
||
| storage := registrystorage.New(c.ExtraOAuthConfig.OAuthAccessTokenClient, c.ExtraOAuthConfig.OAuthAuthorizeTokenClient, combinedOAuthClientGetter, registry.NewUserConversion()) | ||
| tokentimeout := int32(0) | ||
| if c.ExtraOAuthConfig.Options.TokenConfig.AccessTokenInactivityTimeoutSeconds != nil { |
There was a problem hiding this comment.
nit: if timeout := c.ExtraOAuthConfig.Options.TokenConfig.AccessTokenInactivityTimeoutSeconds; timeout != nil would make it so you do not have to repeat that whole thing on the next line.
|
|
||
| var timeout = flag.Duration("sub.timeout", 5*time.Minute, "Specify the timeout for each sub test") | ||
| var timeoutException = map[string]*time.Duration{ | ||
| "TestOAuthTimeout": flag.Duration("oauth.timeout", 15*time.Minute, "Timeout for the OAuth test"), |
There was a problem hiding this comment.
nit: just make this a map[string]string, don't need the flag / pointer stuff.
There was a problem hiding this comment.
easier this way for me
| // Use the server and CA info | ||
| anonConfig := restclient.AnonymousClientConfig(clientConfig) | ||
|
|
||
| min := int32(150) |
There was a problem hiding this comment.
uh isn't this too small per validation?
There was a problem hiding this comment.
hint the min timeout is reference timeout / 3 ie 100 seconds is the current absolute minimum
| } | ||
| } | ||
|
|
||
| glog.V(5).Infof("Successfully flushed %d tokens out of %d", |
There was a problem hiding this comment.
You may want a.data.Len() as well?
There was a problem hiding this comment.
no, I rally only want to show what we tried to flush, and what succeeded, data.Len() is all the stuff in the queue
| err := a.update(*td) | ||
| switch { | ||
| case err == nil: | ||
| flushedTokens++ |
| glog.V(5).Infof("Token timeout for user=%q client=%q scopes=%v was not updated: %v", | ||
| td.token.UserName, td.token.ClientName, td.token.Scopes, err) | ||
| } else { | ||
| flushedTokens++ |
| return a.defaultTimeout | ||
| } | ||
| if oauthClient.AccessTokenInactivityTimeoutSeconds == nil { | ||
| return a.defaultTimeout |
| if oauthClient.AccessTokenInactivityTimeoutSeconds == nil { | ||
| return a.defaultTimeout | ||
| } | ||
| return timeoutAsDuration(*oauthClient.AccessTokenInactivityTimeoutSeconds) |
|
|
||
| func ValidateAccessTokenUpdate(newToken, oldToken *oauthapi.OAuthAccessToken) field.ErrorList { | ||
| allErrs := validation.ValidateObjectMetaUpdate(&newToken.ObjectMeta, &oldToken.ObjectMeta, field.NewPath("metadata")) | ||
| // since TimeoutsIn can be concurrently updated by multipe masters we do |
There was a problem hiding this comment.
We should error if oldToken.TimeoutsIn == 0 && newToken.TimeoutsIn != 0 to prevent a token that was never supposed to timeout from being updated to timeout. Probably needs a comment describing how this race could happen between multiple masters.
6cd0dc2 to
c8badfe
Compare
|
flake: #17974 |
|
flake: #17731 |
|
flake: #16484 |
|
/retest |
|
/lgtm Manually approving the apps changes since they are generated code. David did not seem too concerned. |
|
[APPROVALNOTIFIER] This PR is APPROVED Approval requirements bypassed by manually added approval. This pull-request has been approved by: enj, simo5 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
|
/retest |
4 similar comments
|
/retest |
|
/retest |
|
/retest |
|
/retest |
|
flake: #17985 |
|
/retest |
1 similar comment
|
/retest |
|
flake: #16994 |
|
/retest |
|
flake: #17697 |
|
/retest |
1 similar comment
|
/retest |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest |
|
flake: #15630 |
|
@simo5: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/retest |
|
/test all [submit-queue is verifying that this PR is safe to merge] |
|
Automatic merge from submit-queue. |
When OAuthClient is configure with accessTokenTimeoutSeconds then tokens obtained with the specific client get a TimeoutsIn field that marks when the token is to be considered timed out.
The timeout is in seconds since token CreationTimestamp
As the token is used it is pushed into a bucket which is regularly flushed (for now).
This replaces #17161 which github inexplicably and unilaterally decided to close after a wrong push (can't be reopened)