ERROR namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(390)) – Error: starting log segment 47312001 failed for required journal
Problem description:
2022-04-17 15:27:15,901 ERROR namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(390)) - Error: starting log segment 47312001 failed for required journal (JournalAndStream(mgr=QJM to [192.168.1.3:8485, 192.168.1.4:8485, 192.168.1.5:8485], stream=null))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.startLogSegment(QuorumJournalManager.java:407)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.startLogSegment(JournalSet.java:95)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$1.apply(JournalSet.java:210)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:385)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.startLogSegment(JournalSet.java:207)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegment(FSEditLog.java:1383)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegmentAndWriteHeaderTxn(FSEditLog.java:1395)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:335)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.openForWrite(FSEditLogAsync.java:103)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1253)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1890)
at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1749)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1742)
at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
2022-04-17 15:27:15,904 INFO util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: Error: starting log segment 47312001 failed for required journal (JournalAndStream(mgr=QJM to [192.168.1.3:8485, 192.168.1.4:8485, 192.168.1.5:8485], stream=null))
2022-04-17 15:27:15,907 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(533)) - ==> JVMShutdownHook.run()
2022-04-17 15:27:15,907 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(534)) - JVMShutdownHook: Signalling async audit cleanup to start.
2022-04-17 15:27:15,907 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(504)) - RangerAsyncAuditCleanup: Starting cleanup
2022-04-17 15:27:15,907 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(538)) - JVMShutdownHook: Waiting up to 30 seconds for audit cleanup to finish.
2022-04-17 15:27:15,907 INFO destination.HDFSAuditDestination (HDFSAuditDestination.java:flush(191)) - Flush called. name=hdfs.async.multi_dest.batch.hdfs
2022-04-17 15:27:15,909 INFO queue.AuditAsyncQueue (AuditAsyncQueue.java:stop(106)) - Stop called. name=hdfs.async
2022-04-17 15:27:15,909 INFO queue.AuditAsyncQueue (AuditAsyncQueue.java:stop(110)) - Interrupting consumerThread. name=hdfs.async, consumer=hdfs.async.multi_dest
2022-04-17 15:27:15,909 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(508)) - RangerAsyncAuditCleanup: Done cleanup
2022-04-17 15:27:15,909 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(497)) - RangerAsyncAuditCleanup: Waiting to audit cleanup start signal
2022-04-17 15:27:15,909 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(541)) - JVMShutdownHook: Audit cleanup finished after 2 milli seconds
2022-04-17 15:27:15,909 INFO queue.AuditAsyncQueue (AuditAsyncQueue.java:runLogAudit(155)) - Caught exception in consumer thread. Shutdown might be in progress
2022-04-17 15:27:15,910 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(548)) - JVMShutdownHook: Interrupting ranger async audit cleanup thread
2022-04-17 15:27:15,911 INFO queue.AuditAsyncQueue (AuditAsyncQueue.java:runLogAudit(171)) - Exiting polling loop. name=hdfs.async
2022-04-17 15:27:15,911 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(550)) - <== JVMShutdownHook.run()
2022-04-17 15:27:15,911 INFO provider.AuditProviderFactory (AuditProviderFactory.java:run(501)) - RangerAsyncAuditCleanup: Interrupted while waiting for audit startCleanup signal! Exiting the thread...
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
at org.apache.ranger.audit.provider.AuditProviderFactory$RangerAsyncAuditCleanup.run(AuditProviderFactory.java:499)
at java.lang.Thread.run(Thread.java:748)
2022-04-17 15:27:15,911 INFO queue.AuditAsyncQueue (AuditAsyncQueue.java:runLogAudit(175)) - Calling to stop consumer. name=hdfs.async, consumer.name=hdfs.async.multi_dest
2022-04-17 15:27:15,912 INFO queue.AuditBatchQueue (AuditBatchQueue.java:stop(125)) - Stop called. name=hdfs.async.multi_dest.batch
2022-04-17 15:27:15,913 INFO namenode.NameNode (LogAdapter.java:info(51)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop1/192.168.1.4
************************************************************/
2022-04-17 15:27:15,913 INFO destination.HDFSAuditDestination (HDFSAuditDestination.java:flush(191)) - Flush called. name=hdfs.async.multi_dest.batch.hdfs
2022-04-17 15:27:15,913 INFO queue.AuditBatchQueue (AuditBatchQueue.java:stop(130)) - Interrupting consumerThread. name=hdfs.async.multi_dest.batch, consumer=hdfs.async.multi_dest.batch.hdfs
2022-04-17 15:27:15,913 INFO queue.AuditBatchQueue (AuditBatchQueue.java:stop(125)) - Stop called. name=hdfs.async.multi_dest.batch
2022-04-17 15:27:15,913 INFO queue.AuditBatchQueue (AuditBatchQueue.java:stop(130)) - Interrupting consumerThread. name=hdfs.async.multi_dest.batch, consumer=hdfs.async.multi_dest.batch.solr
2022-04-17 15:27:15,913 INFO queue.AuditAsyncQueue (AuditAsyncQueue.java:runLogAudit(183)) - Exiting consumerThread.run() method. name=hdfs.async
2022-04-17 15:27:15,914 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(281)) - Caught exception in consumer thread. Shutdown might be in progress
2022-04-17 15:27:15,914 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(281)) - Caught exception in consumer thread. Shutdown might be in progress
2022-04-17 15:27:15,914 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(347)) - Exiting consumerThread. Queue=hdfs.async.multi_dest.batch, dest=hdfs.async.multi_dest.batch.hdfs
2022-04-17 15:27:15,914 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(347)) - Exiting consumerThread. Queue=hdfs.async.multi_dest.batch, dest=hdfs.async.multi_dest.batch.solr
2022-04-17 15:27:15,915 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(351)) - Calling to stop consumer. name=hdfs.async.multi_dest.batch, consumer.name=hdfs.async.multi_dest.batch.solr
2022-04-17 15:27:15,914 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(351)) - Calling to stop consumer. name=hdfs.async.multi_dest.batch, consumer.name=hdfs.async.multi_dest.batch.hdfs
2022-04-17 15:27:15,915 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(362)) - Exiting consumerThread.run() method. name=hdfs.async.multi_dest.batch
2022-04-17 15:27:15,915 INFO queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(362)) - Exiting consumerThread.run() method. name=hdfs.async.multi_dest.batch
Solution:
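The NameNode gave up after waiting 20000 ms for a quorum of JournalNodes to respond. Raise the QJM timeouts by adding the following to hdfs-site.xml: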
dfs.qjournal.start-segment.timeout.ms = 90000
dfs.qjournal.select-input-streams.timeout.ms = 90000
dfs.qjournal.write-txns.timeout.ms = 90000
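The three settings above are written as property elements in hdfs-site.xml. A minimal sketch, assuming the standard Hadoop layout in which every property sits inside the file's existing configuration root element; the names and values are the ones listed above:

```xml
<!-- Place inside the existing <configuration> element of hdfs-site.xml -->
<property>
  <!-- Timeout for starting a new edit log segment on the JournalNodes -->
  <name>dfs.qjournal.start-segment.timeout.ms</name>
  <value>90000</value>
</property>
<property>
  <!-- Timeout for selecting edit log input streams from the JournalNodes -->
  <name>dfs.qjournal.select-input-streams.timeout.ms</name>
  <value>90000</value>
</property>
<property>
  <!-- Timeout for writing transactions to the JournalNodes -->
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>90000</value>
</property>
```

The NameNodes will most likely need a restart before the new values take effect.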
Add the following to core-site.xml:
ipc.client.connect.timeout = 90000
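In core-site.xml the same setting takes the property form. A minimal sketch, again assuming the standard Hadoop layout with the property placed inside the existing configuration root element:

```xml
<!-- Place inside the existing <configuration> element of core-site.xml -->
<property>
  <!-- Milliseconds an IPC client waits for a socket connection to a server -->
  <name>ipc.client.connect.timeout</name>
  <value>90000</value>
</property>
```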