
RANGER-4342: Fix new log file creation every time there is an error/exception writing audits to HDFS as Json #277

Merged 12 commits into apache:master on Sep 12, 2023

Conversation

@kumaab (Contributor) commented Aug 8, 2023

What changes were proposed in this pull request?

An exception while writing Ranger audits as JSON to HDFS causes the current log file to be closed. The next successful write should reuse that same file for logging, but instead a new log file is created. This can generate an unusually high number of audit log files.

2023-08-02 13:45:54,795 DEBUG org.apache.hadoop.io.retry.RetryInvocationHandler: [org.apache.ranger.audit.queue.AuditBatchQueue0]: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1996)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1510)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3237)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1174)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:1037)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
, while invoking ClientNamenodeProtocolTranslatorPB.getFileInfo over XXX:<port_number>. Trying to failover immediately.
org.apache.hadoop.ipc.RemoteException: Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1996)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1510)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3237)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1174)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:1037)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)

	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1597) 
	at org.apache.hadoop.ipc.Client.call(Client.java:1543)
	at org.apache.hadoop.ipc.Client.call(Client.java:1440) 
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) 
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) 
	at com.sun.proxy.$Proxy31.getFileInfo(Unknown Source) 
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:957) 
	at jdk.internal.reflect.GeneratedMethodAccessor67.invoke(Unknown Source) 
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
	at java.lang.reflect.Method.invoke(Method.java:566) 
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431) 
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) 
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) 
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) 
	at com.sun.proxy.$Proxy32.getFileInfo(Unknown Source) 
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1701) 
	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1746)
	at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1743) 
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) 
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1758) 
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1741)
	at org.apache.ranger.audit.utils.AbstractRangerAuditWriter.createFileSystemFolders(AbstractRangerAuditWriter.java:118) 
	at org.apache.ranger.audit.utils.AbstractRangerAuditWriter.createWriter(AbstractRangerAuditWriter.java:298) 
	at org.apache.ranger.audit.utils.RangerJSONAuditWriter.getLogFileStream(RangerJSONAuditWriter.java:138) 
	at org.apache.ranger.audit.utils.RangerJSONAuditWriter$1.run(RangerJSONAuditWriter.java:68) 
	at org.apache.ranger.audit.utils.RangerJSONAuditWriter$1.run(RangerJSONAuditWriter.java:65) 
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:423) 
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
	at org.apache.ranger.audit.provider.MiscUtil.executePrivilegedAction(MiscUtil.java:545)
	at org.apache.ranger.audit.utils.RangerJSONAuditWriter.logJSON(RangerJSONAuditWriter.java:65)
	at org.apache.ranger.audit.utils.RangerJSONAuditWriter.log(RangerJSONAuditWriter.java:104) 
	at org.apache.ranger.audit.destination.HDFSAuditDestination.logJSON(HDFSAuditDestination.java:79)
	at org.apache.ranger.audit.destination.HDFSAuditDestination.log(HDFSAuditDestination.java:171) 
	at org.apache.ranger.audit.queue.AuditBatchQueue.runLogAudit(AuditBatchQueue.java:309) 
	at org.apache.ranger.audit.queue.AuditBatchQueue.run(AuditBatchQueue.java:215) 
	at java.lang.Thread.run(Thread.java:829) 
2023-08-02 13:45:54,795 DEBUG org.apache.hadoop.io.retry.RetryUtils: [org.apache.ranger.audit.queue.AuditBatchQueue0]: multipleLinearRandomRetry = null
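The intended reuse behavior can be sketched as follows. This is a hypothetical, simplified model of the fix (the class and field names below are illustrative, not Ranger's actual `RangerJSONAuditWriter` code, and the local filesystem stands in for HDFS): on a write failure the stream is closed, but the path of the current log file is remembered, so the next successful write re-opens the same file in append mode instead of rolling a new one.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Simplified model of the fix: keep the current log file's path across
// failures, and re-open it for append rather than creating a new file.
public class AuditLogSketch {
    private final Path currentLogFile; // remembered across failures
    private PrintWriter writer;        // null after an error closed it

    public AuditLogSketch(Path logFile) {
        this.currentLogFile = logFile;
    }

    // Re-open the remembered file in append mode rather than rolling a new one.
    private PrintWriter getWriter() throws IOException {
        if (writer == null) {
            writer = new PrintWriter(Files.newBufferedWriter(
                currentLogFile,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND));
        }
        return writer;
    }

    public void log(String json) throws IOException {
        try {
            PrintWriter out = getWriter();
            out.println(json);
            out.flush();
        } catch (IOException e) {
            close(); // on error: close the stream, but keep currentLogFile
            throw e;
        }
    }

    public void close() {
        if (writer != null) {
            writer.close();
            writer = null;
        }
    }
}
```

With this shape, closing the writer (as happens after an exception) does not change which file subsequent audits land in; only an explicit roll-over would.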

How was this patch tested?

Added unit tests and also tested the changes on a cluster with this patch.

@kumaab kumaab self-assigned this Aug 8, 2023
@kumaab kumaab added the testing complete PR testing is complete label Sep 7, 2023
@kumaab (Contributor, Author) commented Sep 12, 2023

Updated the patch to fall back to file creation if the append operation fails (for example, due to lease-holder issues that cannot be recovered programmatically).
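The fallback described above can be sketched like this. The method name `openAuditStream` and the `".1"` rolled-file naming are hypothetical (not Ranger code), and `java.nio.file` stands in for the HDFS `FileSystem` API: try to append to the existing log file first, and only create a fresh file when append is refused.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative append-with-create-fallback, mirroring the PR's behavior:
// prefer appending to the previous log file; if that fails (e.g. lease-holder
// problems, or append unsupported), create a new file instead.
public class AppendOrCreate {
    public static PrintWriter openAuditStream(Path logFile) throws IOException {
        try {
            // Preferred path: append to the previously used log file.
            return new PrintWriter(Files.newBufferedWriter(
                logFile, StandardOpenOption.APPEND));
        } catch (IOException appendFailed) {
            // Fallback: create a new file (here, a ".1" sibling) when the
            // append is refused, so auditing can continue.
            Path rolled = logFile.resolveSibling(logFile.getFileName() + ".1");
            return new PrintWriter(Files.newBufferedWriter(
                rolled,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING));
        }
    }
}
```

The key point is the ordering: append is the common case (so files are reused), and creation only happens on the exceptional path, which keeps the file count bounded.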

@kumaab kumaab merged commit 4d9648b into apache:master Sep 12, 2023
1 check failed
@kumaab kumaab deleted the RANGER-4342 branch September 12, 2023 19:54
mneethiraj pushed a commit that referenced this pull request Jun 20, 2024
…riting audits to HDFS as Json (#277)

* Append to last log file in case of errors/exceptions encountered
* Close streams and reset writers
* Add unit tests
* Fallback to create if append fails or is not supported
---------

Co-authored-by: abhishek-kumar <abhishek.kumar@cloudera.com>
(cherry picked from commit 4d9648b)
Labels: testing complete (PR testing is complete)
5 participants