Ensure OutputFormat thread safety #18

Open

kboom opened this issue Sep 22, 2019 · 0 comments

Labels: enhancement (New feature or request), performance

kboom (Owner) commented Sep 22, 2019

If VERTEX_OUTPUT_FORMAT_THREAD_SAFE is set to true and NUM_COMPUTE_THREADS is set to more than one thread (which by default carries over to NUM_OUTPUT_THREADS as well), then we get:

2019-09-22 09:04:53,987 ERROR [org.apache.giraph.utils.LogStacktraceCallable] - Execution of callable failed
java.lang.IllegalStateException: getVertexWriter: IOException occurred
	at org.apache.giraph.io.superstep_output.MultiThreadedSuperstepOutput.getVertexWriter(MultiThreadedSuperstepOutput.java:89)
	at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:153)
	at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:70)
	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:67)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to CREATE_FILE /user/kbhit/1569143060/_temporary/1/_temporary/attempt_1569140710858_0005_m_000001_1/step-0/part-m-00001 for DFSClient_NONMAPREDUCE_-1931685888_1 on 10.164.0.19 because DFSClient_NONMAPREDUCE_-1931685888_1 is already the current lease holder.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2412)
	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:357)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2309)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2230)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:745)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:413)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)

	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1507)
	at org.apache.hadoop.ipc.Client.call(Client.java:1453)
	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
	at com.sun.proxy.$Proxy10.create(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:297)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
	at com.sun.proxy.$Proxy11.create(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:267)
	at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1206)
	at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1148)
	at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:480)
	at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:477)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:477)
	at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:418)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1067)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1048)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:937)
	at org.apache.giraph.io.formats.GiraphTextOutputFormat.getRecordWriter(GiraphTextOutputFormat.java:67)
	at org.apache.giraph.io.formats.TextVertexOutputFormat$TextVertexWriter.createLineRecordWriter(TextVertexOutputFormat.java:116)
	at org.apache.giraph.io.formats.TextVertexOutputFormat$TextVertexWriter.initialize(TextVertexOutputFormat.java:97)
	at edu.agh.iga.adi.giraph.direction.io.StepVertexOutputFormat$IdWithValueVertexWriter.initialize(StepVertexOutputFormat.java:80)
	at org.apache.giraph.io.internal.WrappedVertexOutputFormat$1.initialize(WrappedVertexOutputFormat.java:82)
	at org.apache.giraph.io.superstep_output.MultiThreadedSuperstepOutput.getVertexWriter(MultiThreadedSuperstepOutput.java:87)
	... 7 more
2019-09-22 09:04:53,997 ERROR [org.apache.giraph.worker.BspServiceWorker] - unregisterHealth: Got failure, unregistering health on /_hadoopBsp/giraph_yarn_application_1569140710858_0005/_applicationAttemptsDir/0/_superstepDir/0/_workerHealthyDir/iga-adi-w-1.europe-west4-a.c.charismatic-cab-252315.internal_1 on superstep 0
2019-09-22 09:04:54,000 ERROR [org.apache.giraph.yarn.GiraphYarnTask] - GiraphYarnTask threw a top-level exception, failing task
java.lang.RuntimeException: run: Caught an unrecoverable exception Exception occurred
	at org.apache.giraph.yarn.GiraphYarnTask.run(GiraphYarnTask.java:106)
	at org.apache.giraph.yarn.GiraphYarnTask.main(GiraphYarnTask.java:184)
Caused by: java.lang.IllegalStateException: Exception occurred
	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:274)
	at org.apache.giraph.graph.GraphTaskManager.processGraphPartitions(GraphTaskManager.java:813)
	at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:361)
	at org.apache.giraph.yarn.GiraphYarnTask.run(GiraphYarnTask.java:93)
	... 1 more
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: getVertexWriter: IOException occurred
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:206)
	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:271)
	... 4 more
Caused by: java.lang.IllegalStateException: getVertexWriter: IOException occurred
	at org.apache.giraph.io.superstep_output.MultiThreadedSuperstepOutput.getVertexWriter(MultiThreadedSuperstepOutput.java:89)
	at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:153)
	at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:70)
	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:67)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to CREATE_FILE /user/kbhit/1569143060/_temporary/1/_temporary/attempt_1569140710858_0005_m_000001_1/step-0/part-m-00001 for DFSClient_NONMAPREDUCE_-1931685888_1 on 10.164.0.19 because DFSClient_NONMAPREDUCE_-1931685888_1 is already the current lease holder.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2412)
	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:357)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2309)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2230)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:745)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:413)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)

	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1507)
	at org.apache.hadoop.ipc.Client.call(Client.java:1453)
	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
	at com.sun.proxy.$Proxy10.create(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:297)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
	at com.sun.proxy.$Proxy11.create(Unknown Source)
	at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:267)
	at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1206)
	at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1148)
	at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:480)
	at org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:477)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:477)
	at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:418)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1067)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1048)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:937)
	at org.apache.giraph.io.formats.GiraphTextOutputFormat.getRecordWriter(GiraphTextOutputFormat.java:67)
	at org.apache.giraph.io.formats.TextVertexOutputFormat$TextVertexWriter.createLineRecordWriter(TextVertexOutputFormat.java:116)
	at org.apache.giraph.io.formats.TextVertexOutputFormat$TextVertexWriter.initialize(TextVertexOutputFormat.java:97)
	at edu.agh.iga.adi.giraph.direction.io.StepVertexOutputFormat$IdWithValueVertexWriter.initialize(StepVertexOutputFormat.java:80)
	at org.apache.giraph.io.internal.WrappedVertexOutputFormat$1.initialize(WrappedVertexOutputFormat.java:82)
	at org.apache.giraph.io.superstep_output.MultiThreadedSuperstepOutput.getVertexWriter(MultiThreadedSuperstepOutput.java:87)
	... 7 more
End of LogType:task-3-stdout.log.
***************************************
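
For context, here is a minimal sketch of a configuration that should reproduce the failure. It assumes the standard Giraph option constants referenced above (VERTEX_OUTPUT_FORMAT_THREAD_SAFE, NUM_COMPUTE_THREADS, NUM_OUTPUT_THREADS from GiraphConstants); the class name, thread counts, and everything else about the real job are illustrative placeholders, not the actual setup.

    import org.apache.giraph.conf.GiraphConfiguration;
    import org.apache.giraph.conf.GiraphConstants;

    // Illustrative reproduction sketch; the real job's computation, I/O formats
    // and paths are omitted.
    public class ThreadSafeOutputRepro {
      public static void main(String[] args) {
        GiraphConfiguration conf = new GiraphConfiguration();

        // Declare the vertex output format thread-safe, so every compute thread
        // asks MultiThreadedSuperstepOutput for its own VertexWriter.
        GiraphConstants.VERTEX_OUTPUT_FORMAT_THREAD_SAFE.set(conf, true);

        // Use more than one compute thread; as noted above, the output side then
        // runs with multiple threads as well, so several writers are initialized
        // concurrently within the same task attempt.
        GiraphConstants.NUM_COMPUTE_THREADS.set(conf, 2);
        GiraphConstants.NUM_OUTPUT_THREADS.set(conf, 2);

        // In this configuration every writer resolves to the same part file
        // (.../step-0/part-m-00001), so the NameNode rejects the second create()
        // with AlreadyBeingCreatedException while the first client still holds
        // the lease, which is exactly the trace above.
      }
    }

In other words, ensuring thread safety here means making sure that writers initialized concurrently for the same task attempt never resolve to the same HDFS path; until then, a workaround is presumably to leave VERTEX_OUTPUT_FORMAT_THREAD_SAFE at false.
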
kboom added the enhancement and performance labels on Sep 22, 2019