Symptoms
K2 authentication fails in one K2 environment while another K2 environment on the same network continues to work. The error message is as follows:
---
Authentication Failed : The system cannot contact a domain controller to service the authentication request. Please try again later
at SourceCode.Hosting.Client.BaseAPI.BaseAPIConnection.WindowsAuthentication(SCConnectionStringBuilder connectionStringBuilder)
at SourceCode.Hosting.Client.BaseAPI.BaseAPIConnection.Authenticate(String connectionString)
at SourceCode.Security.Claims.Web.Shared.ConnectionClass.HandleIdentityImpersonation(Boolean asAppPool, Action action)
at SourceCode.Security.Claims.Web.Shared.ConnectionClass.TryCredentialToken(BaseAPIConnection connection, String credentialToken, Boolean asAppPool)
at SourceCode.Security.Claims.Web.Shared.ConnectionClass.GetPoolConnection(String credentialToken, Boolean asAppPool, Boolean& tokenApplied)
at SourceCode.Security.Claims.Web.Shared.ConnectionClass.OpenConnectionForAuthentication()
at SourceCode.Security.Claims.Web.ClaimsHelper.SaveBootstrapContext(BootstrapContext bootstrapContext, ClaimsIdentity claimsIdentity)
at SourceCode.Security.Claims.Web.ClaimsHelper.OnValidateToken(SecurityTokenHandler tokenHandler, SecurityToken token, ReadOnlyCollection`1 claimsIdentityCollection, Boolean standardMapToWindows)
at SourceCode.Security.Claims.Web.WIFExtensions.SamlSecurityTokenHandler.ValidateToken(SecurityToken token)
at System.IdentityModel.Services.TokenReceiver.AuthenticateToken(SecurityToken token, Boolean ensureBearerToken, String endpointUri)
at System.IdentityModel.Services.WSFederationAuthenticationModule.SignInWithResponseMessage(HttpRequestBase request)
at System.IdentityModel.Services.WSFederationAuthenticationModule.OnAuthenticateRequest(Object sender, EventArgs args)
at System.Web.HttpApplication.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
---
There are no errors in the K2 logs, and at first glance the infrastructure is up and running.
Diagnosis
To check for problems in the environment or infrastructure outside of K2, record a network trace in the affected environment using the "netsh trace" command. It is not always feasible to install network tracing software (which often requires a reboot during installation) in production environments, but the following approach lets you record a trace without installing any tools:
https://blogs.msdn.microsoft.com/canberrapfe/2012/03/30/capture-a-network-trace-without-installing-anything-capture-a-network-trace-of-a-reboot/
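As a rough sketch of the capture step, the Python helper below builds the "netsh trace start" and "netsh trace stop" command lines and runs them through subprocess. The function names, the trace file path, and the size limit are illustrative assumptions, not part of K2 or of the linked article. The commands themselves need Windows and an elevated prompt on the affected server.

```python
import subprocess


def build_start_command(trace_file, max_size_mb=512, persistent=False):
    """Return the argument list that starts a netsh packet capture.

    trace_file  -- path of the .etl file to write (placeholder, pick your own)
    max_size_mb -- upper bound for the trace file size in MB (assumed value)
    persistent  -- True keeps the capture running across a reboot
    """
    return [
        "netsh", "trace", "start",
        "capture=yes",
        f"tracefile={trace_file}",
        f"maxsize={max_size_mb}",
        f"persistent={'yes' if persistent else 'no'}",
    ]


def build_stop_command():
    """Return the argument list that stops the capture and finalizes the file."""
    return ["netsh", "trace", "stop"]


def run(command):
    """Run a netsh command and return its console output.

    Raises CalledProcessError if netsh reports a failure.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


def capture_until_enter(trace_file):
    """Start a capture, wait for the operator, then stop it."""
    print(run(build_start_command(trace_file)))
    input("Reproduce the authentication failure, then press Enter...")
    print(run(build_stop_command()))
```

On the affected server, a call such as capture_until_enter(r"C:\Temp\k2-auth.etl") (a placeholder path) would record the trace while the failing login is reproduced.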
Next, analyze the recorded trace with Microsoft Message Analyzer (the successor to Microsoft Network Monitor) to see whether the affected environment has network-layer or DNS issues.
The trace often contains diagnostic messages or warnings such as the following:
Application DNS: The RCode of the message is NXDomain(3). Please refer to section 4.1.1 in Technical Document RFC 1035.
Application TCP: Segment lost, missing 3-way handshake.
Application MSRPCE: Missing context for the PContId. It might be caused by missing MSRPCE binding messages.
Application RDPEUDP: Congestion Notification
Validation RDPEUDP: Parse TLS message failed due to incomplete data.
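When a trace contains many such messages, it can help to count them per protocol module to see where the problems cluster. The sketch below assumes the diagnostic messages have been copied into a text list, one per line, in the "Type Module: text" shape shown above. That shape is an assumption about how you export them, not a documented Message Analyzer format, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as "Application DNS: The RCode of the message is ..."
LINE_PATTERN = re.compile(r"^(?P<kind>\w+)\s+(?P<module>\w+):\s*(?P<text>.+)$")


def summarize(lines):
    """Count diagnostic messages per (type, module) pair.

    Lines that do not match the assumed "Type Module: text" shape are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match:
            counts[(match["kind"], match["module"])] += 1
    return counts


# The sample messages from this article.
sample = [
    "Application DNS: The RCode of the message is NXDomain(3). "
    "Please refer to section 4.1.1 in Technical Document RFC 1035.",
    "Application TCP: Segment lost, missing 3-way handshake.",
    "Application MSRPCE: Missing context for the PContId. "
    "It might be caused by missing MSRPCE binding messages.",
    "Application RDPEUDP: Congestion Notification",
    "Validation RDPEUDP: Parse TLS message failed due to incomplete data.",
]

for (kind, module), count in summarize(sample).most_common():
    print(f"{module:10} {kind:12} {count}")
```

A high count for DNS or TCP in the affected environment, compared with a trace from the working environment, points toward infrastructure rather than K2.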
Based on the trace, you should be able to see whether there are network or infrastructure issues, such as network congestion leading to datagram loss and an inability to reassemble TCP segments. For example, network congestion can affect Kerberos, which may then report unexpected flag values, probably as a result of data loss or network-level errors. An example of a related warning:
Validation KerberosV5: The flags field in type KERB_EXT_ERROR should be set to 1, not 3 (0x00000003).
Resolution
Use a network trace as described above to verify whether there are any problems with the network or DNS infrastructure.
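As a quick complement to the trace (not a replacement for it), name resolution can be spot-checked from the affected server. The minimal sketch below uses only the Python standard library. The function names are hypothetical, and you need to supply the host names of your own domain controllers. It checks plain host name resolution only, not the SRV records that domain controller location also relies on.

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses a host name resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Covers NXDomain as well as an unreachable DNS server.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def check_hosts(hostnames):
    """Map each host name to the list of addresses it resolves to."""
    return {name: resolve(name) for name in hostnames}
```

Running check_hosts with the same host names in both the working and the failing environment gives a simple side-by-side comparison. An empty list for a domain controller in only one of them suggests a DNS problem.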