ERROR namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(390)) - Error: starting log segment 47312001 failed for required journal

jetty, Big Data

Problem description:

2022-04-17 15:27:15,901 ERROR namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(390)) - Error: starting log segment 47312001 failed for required journal (JournalAndStream(mgr=QJM to [192.168.1.3:8485, 192.168.1.4:8485, 192.168.1.5:8485], stream=null))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
	at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.startLogSegment(QuorumJournalManager.java:407)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.startLogSegment(JournalSet.java:95)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet$1.apply(JournalSet.java:210)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:385)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet.startLogSegment(JournalSet.java:207)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegment(FSEditLog.java:1383)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.startLogSegmentAndWriteHeaderTxn(FSEditLog.java:1395)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:335)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.openForWrite(FSEditLogAsync.java:103)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1253)
	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1890)
	at org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
	at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1749)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1742)
	at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
	at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
2022-04-17 15:27:15,904 INFO  util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: Error: starting log segment 47312001 failed for required journal (JournalAndStream(mgr=QJM to [192.168.1.3:8485, 192.168.1.4:8485, 192.168.1.5:8485], stream=null))
2022-04-17 15:27:15,907 INFO  provider.AuditProviderFactory (AuditProviderFactory.java:run(533)) - ==> JVMShutdownHook.run()
2022-04-17 15:27:15,907 INFO  provider.AuditProviderFactory (AuditProviderFactory.java:run(534)) - JVMShutdownHook: Signalling async audit cleanup to start.
2022-04-17 15:27:15,907 INFO  provider.AuditProviderFactory (AuditProviderFactory.java:run(504)) - RangerAsyncAuditCleanup: Starting cleanup
2022-04-17 15:27:15,907 INFO  provider.AuditProviderFactory (AuditProviderFactory.java:run(538)) - JVMShutdownHook: Waiting up to 30 seconds for audit cleanup to finish.
2022-04-17 15:27:15,907 INFO  destination.HDFSAuditDestination (HDFSAuditDestination.java:flush(191)) - Flush called. name=hdfs.async.multi_dest.batch.hdfs
2022-04-17 15:27:15,909 INFO  queue.AuditAsyncQueue (AuditAsyncQueue.java:stop(106)) - Stop called. name=hdfs.async
2022-04-17 15:27:15,909 INFO  queue.AuditAsyncQueue (AuditAsyncQueue.java:stop(110)) - Interrupting consumerThread. name=hdfs.async, consumer=hdfs.async.multi_dest
2022-04-17 15:27:15,909 INFO  provider.AuditProviderFactory (AuditProviderFactory.java:run(508)) - RangerAsyncAuditCleanup: Done cleanup
2022-04-17 15:27:15,909 INFO  provider.AuditProviderFactory (AuditProviderFactory.java:run(497)) - RangerAsyncAuditCleanup: Waiting to audit cleanup start signal
2022-04-17 15:27:15,909 INFO  provider.AuditProviderFactory (AuditProviderFactory.java:run(541)) - JVMShutdownHook: Audit cleanup finished after 2 milli seconds
2022-04-17 15:27:15,909 INFO  queue.AuditAsyncQueue (AuditAsyncQueue.java:runLogAudit(155)) - Caught exception in consumer thread. Shutdown might be in progress
2022-04-17 15:27:15,910 INFO  provider.AuditProviderFactory (AuditProviderFactory.java:run(548)) - JVMShutdownHook: Interrupting ranger async audit cleanup thread
2022-04-17 15:27:15,911 INFO  queue.AuditAsyncQueue (AuditAsyncQueue.java:runLogAudit(171)) - Exiting polling loop. name=hdfs.async
2022-04-17 15:27:15,911 INFO  provider.AuditProviderFactory (AuditProviderFactory.java:run(550)) - <== JVMShutdownHook.run()
2022-04-17 15:27:15,911 INFO  provider.AuditProviderFactory (AuditProviderFactory.java:run(501)) - RangerAsyncAuditCleanup: Interrupted while waiting for audit startCleanup signal!  Exiting the thread...
java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
	at org.apache.ranger.audit.provider.AuditProviderFactory$RangerAsyncAuditCleanup.run(AuditProviderFactory.java:499)
	at java.lang.Thread.run(Thread.java:748)
2022-04-17 15:27:15,911 INFO  queue.AuditAsyncQueue (AuditAsyncQueue.java:runLogAudit(175)) - Calling to stop consumer. name=hdfs.async, consumer.name=hdfs.async.multi_dest
2022-04-17 15:27:15,912 INFO  queue.AuditBatchQueue (AuditBatchQueue.java:stop(125)) - Stop called. name=hdfs.async.multi_dest.batch
2022-04-17 15:27:15,913 INFO  namenode.NameNode (LogAdapter.java:info(51)) - SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop1/192.168.1.4
************************************************************/
2022-04-17 15:27:15,913 INFO  destination.HDFSAuditDestination (HDFSAuditDestination.java:flush(191)) - Flush called. name=hdfs.async.multi_dest.batch.hdfs
2022-04-17 15:27:15,913 INFO  queue.AuditBatchQueue (AuditBatchQueue.java:stop(130)) - Interrupting consumerThread. name=hdfs.async.multi_dest.batch, consumer=hdfs.async.multi_dest.batch.hdfs
2022-04-17 15:27:15,913 INFO  queue.AuditBatchQueue (AuditBatchQueue.java:stop(125)) - Stop called. name=hdfs.async.multi_dest.batch
2022-04-17 15:27:15,913 INFO  queue.AuditBatchQueue (AuditBatchQueue.java:stop(130)) - Interrupting consumerThread. name=hdfs.async.multi_dest.batch, consumer=hdfs.async.multi_dest.batch.solr
2022-04-17 15:27:15,913 INFO  queue.AuditAsyncQueue (AuditAsyncQueue.java:runLogAudit(183)) - Exiting consumerThread.run() method. name=hdfs.async
2022-04-17 15:27:15,914 INFO  queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(281)) - Caught exception in consumer thread. Shutdown might be in progress
2022-04-17 15:27:15,914 INFO  queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(281)) - Caught exception in consumer thread. Shutdown might be in progress
2022-04-17 15:27:15,914 INFO  queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(347)) - Exiting consumerThread. Queue=hdfs.async.multi_dest.batch, dest=hdfs.async.multi_dest.batch.hdfs
2022-04-17 15:27:15,914 INFO  queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(347)) - Exiting consumerThread. Queue=hdfs.async.multi_dest.batch, dest=hdfs.async.multi_dest.batch.solr
2022-04-17 15:27:15,915 INFO  queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(351)) - Calling to stop consumer. name=hdfs.async.multi_dest.batch, consumer.name=hdfs.async.multi_dest.batch.solr
2022-04-17 15:27:15,914 INFO  queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(351)) - Calling to stop consumer. name=hdfs.async.multi_dest.batch, consumer.name=hdfs.async.multi_dest.batch.hdfs
2022-04-17 15:27:15,915 INFO  queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(362)) - Exiting consumerThread.run() method. name=hdfs.async.multi_dest.batch
2022-04-17 15:27:15,915 INFO  queue.AuditBatchQueue (AuditBatchQueue.java:runLogAudit(362)) - Exiting consumerThread.run() method. name=hdfs.async.multi_dest.batch

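Solution:

The NameNode timed out after 20000 ms waiting for a quorum of JournalNodes while starting the new log segment, then exited. Raise the QJM timeouts by adding the following to hdfs-site.xml: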

dfs.qjournal.start-segment.timeout.ms = 90000
dfs.qjournal.select-input-streams.timeout.ms = 90000
dfs.qjournal.write-txns.timeout.ms = 90000
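
For reference, a sketch of how the three settings above might look as property elements in hdfs-site.xml. It assumes the file already has a <configuration> root element and that none of these keys is defined elsewhere in it; the 90000 values are the ones from this post, not a general recommendation.

```xml
<!-- hdfs-site.xml: place these inside the existing <configuration> element -->
<!-- Each value is in milliseconds. The 20000 ms in the log message above -->
<!-- is the timeout that was in effect when the NameNode gave up. -->
<property>
  <name>dfs.qjournal.start-segment.timeout.ms</name>
  <value>90000</value>
</property>
<property>
  <name>dfs.qjournal.select-input-streams.timeout.ms</name>
  <value>90000</value>
</property>
<property>
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>90000</value>
</property>
```

The NameNode most likely needs a restart to pick up the new values. A longer timeout only gives slow JournalNodes more time to answer; if the nodes at 192.168.1.3, 192.168.1.4 and 192.168.1.5 on port 8485 are unreachable or not running, the same error will still appear, just later.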

Add the following to core-site.xml:

ipc.client.connect.timeout = 90000
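
The same setting written as a property element, as a sketch under the assumption that core-site.xml already has a <configuration> root element and does not define this key yet:

```xml
<!-- core-site.xml: place this inside the existing <configuration> element -->
<!-- Value is in milliseconds: how long an IPC client waits for a socket -->
<!-- connection to a server to be established. -->
<property>
  <name>ipc.client.connect.timeout</name>
  <value>90000</value>
</property>
```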
