Problem with bluemind core (restart impossible)

Aurelien · October 23, 2024, 1:23pm

Hi

My environment:
Debian 12 / 24 GB of RAM / nproc 8

Installation done with bluemind-installer-5.0.7519-bookworm.bin binary

Since last night, it has been impossible to restart my BlueMind instance:

 bmctl all_status
bm-nginx is running.
bm-core.service fail - check systemctl status bm-core.service and journalctl -xe -u bm-core.service
bm-eas.service is running.
bm-elasticsearch.service is running.
bm-iptables.service is running.
bm-keycloak.service is running.
bm-milter.service is running.
bm-node.service is running.
bm-pimp.service is running.
bm-postgresql.service is running.
bm-tika.service is running.
bm-webserver.service is running.
bm-ysnp.service is running.
postgresql.service is running.

BlueMind keeps trying to restart the core, and each time I get errors in core.log:

2024-10-23 15:05:20,611 [bm-hz-connect] [none:anon] n.b.p.BMPoolActivator INFO - Starting connection pool 185.255.28.214/bj-data, schema: null, dbtype: PGSQL
2024-10-23 15:05:20,611 [bm-hz-connect] [none:anon] c.z.h.HikariDataSource INFO - bj-data@185.255.28.214 - Starting...
2024-10-23 15:05:20,617 [bm-hz-connect] [none:anon] c.z.h.p.HikariPool INFO - bj-data@185.255.28.214 - Added connection org.postgresql.jdbc.PgConnection@707a596e
2024-10-23 15:05:20,618 [bm-hz-connect] [none:anon] c.z.h.HikariDataSource INFO - bj-data@185.255.28.214 - Start completed.
2024-10-23 15:05:20,618 [bm-hz-connect] [none:anon] n.b.p.BMPoolActivator INFO - Got DS HikariDataSource (bj-data@185.255.28.214)
2024-10-23 15:05:20,619 [bm-hz-connect] [none:anon] n.b.a.l.ApplicationLauncher INFO - 1 mailbox datasource found, servers: [bm-master]
2024-10-23 15:05:20,620 [bm-hz-connect] [none:anon] n.b.l.v.i.BMModule INFO - BM module created.
2024-10-23 15:05:20,620 [vert.x-eventloop-thread-2] [none:anon] n.b.l.v.i.BMModule INFO - Starting net.bluemind.lib.vertx.internal.BMModule@3f75233...
2024-10-23 15:05:23,168 [vertx-blocked-thread-checker] [none:anon] i.v.c.i.BlockedThreadChecker WARN - Thread Thread[#144,vert.x-eventloop-thread-2,5,main] has been blocked for 2547 ms, time limit is 2000 ms
2024-10-23 15:05:24,168 [vertx-blocked-thread-checker] [none:anon] i.v.c.i.BlockedThreadChecker WARN - Thread Thread[#144,vert.x-eventloop-thread-2,5,main] has been blocked for 3548 ms, time limit is 2000 ms
2024-10-23 15:05:25,168 [vertx-blocked-thread-checker] [none:anon] i.v.c.i.BlockedThreadChecker WARN - Thread Thread[#144,vert.x-eventloop-thread-2,5,main] has been blocked for 4547 ms, time limit is 2000 ms
2024-10-23 15:05:26,173 [vertx-blocked-thread-checker] [none:anon] i.v.c.i.BlockedThreadChecker WARN - Thread Thread[#144,vert.x-eventloop-thread-2,5,main] has been blocked for 5548 ms, time limit is 2000 ms
io.vertx.core.VertxException: Thread blocked
	at org.rocksdb.RocksDB.open(Native Method)
	at org.rocksdb.RocksDB.open(RocksDB.java:325)
	at net.bluemind.retry.support.rocks.RocksQueue.openDb(RocksQueue.java:152)
	at net.bluemind.retry.support.rocks.RocksQueue.<init>(RocksQueue.java:124)
	at net.bluemind.retry.support.RetryQueueVerticle.<init>(RetryQueueVerticle.java:68)
	at net.bluemind.core.auditlogs.client.es.datastreams.AuditQueueFactory$AuditQueue.<init>(AuditQueueFactory.java:77)
	at net.bluemind.core.auditlogs.client.es.datastreams.AuditQueueFactory.<init>(AuditQueueFactory.java:86)
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
	at org.eclipse.core.internal.registry.osgi.RegistryStrategyOSGI.createExecutableExtension(RegistryStrategyOSGI.java:204)
	at org.eclipse.core.internal.registry.ExtensionRegistry.createExecutableExtension(ExtensionRegistry.java:920)
	at org.eclipse.core.internal.registry.ConfigurationElement.createExecutableExtension(ConfigurationElement.java:246)
	at org.eclipse.core.internal.registry.ConfigurationElementHandle.createExecutableExtension(ConfigurationElementHandle.java:63)
	at net.bluemind.eclipse.common.RunnableExtensionLoader.loadExtensions(RunnableExtensionLoader.java:127)
	at net.bluemind.lib.vertx.internal.BMModule.start(BMModule.java:38)
	at io.vertx.core.impl.DeploymentManager.lambda$doDeploy$5(DeploymentManager.java:195)
	at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:277)
	at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:259)
	at io.vertx.core.impl.EventLoopContext.lambda$runOnContext$0(EventLoopContext.java:43)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)

As we can see, there are WARN messages like:
i.v.c.i.BlockedThreadChecker WARN - Thread Thread[#144,vert.x-eventloop-thread-2,5,main] has been blocked for 4547 ms, time limit is 2000 ms
2024-10-23 15:05:26,173 [vertx-blocked-thread-checker] [none:anon] i.v.c.i.BlockedThreadChecker WARN - Thread Thread[#144,vert.x-eventloop-thread-2,5,main] has been blocked for 5548 ms, time limit is 2000 ms
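
To see at a glance how long each event-loop thread stays blocked, the warnings can be summarised per thread. This is only a sketch: the function name `blocked_summary` is made up for illustration, and the log path in the usage comment is an assumption, since the thread does not say where core.log lives.

```shell
#!/bin/sh
# Hypothetical helper: print the longest "blocked for N ms" value per thread
# found in BlockedThreadChecker warnings of the given log file.
blocked_summary() {
    grep 'BlockedThreadChecker WARN' "$1" |
    # Keep only the thread name and the blocked duration in milliseconds.
    sed -n 's/.*Thread\[#[0-9]*,\([^,]*\),.*has been blocked for \([0-9]*\) ms.*/\1 \2/p' |
    # Track the maximum duration seen for each thread name.
    awk '{ if ($2 > max[$1]) max[$1] = $2 } END { for (t in max) print t, max[t] }'
}

# Possible use (log path is an assumption, adjust to your installation):
#   blocked_summary /var/log/bm/core.log
```

A value that keeps growing for the same thread, as in the lines above, suggests a single call that never returns rather than many short stalls.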

With these repeated restarts, the server load climbs to 100% (same for RAM usage).

I tried restoring a backup and restarting from it, but the problem is the same. (No upgrade has been done on this server for many days.)

Thanks for your help

tchu · October 23, 2024, 2:33pm

Hi,

Based on the information provided, let me know if this command resolves the incident:

mv /var/cache/bm-core/retry-rocks-audit /var/cache/bm-core/retry-rocks-audit-old && bmctl restart
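
A slightly more defensive variant of the same idea, as a sketch only: it renames the directory to a timestamped name so that a second run does not collide with an earlier `-old` copy, and it refuses to continue if the directory is missing. The helper name `set_aside` is hypothetical; the path and the `bmctl` calls are the ones used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: rename a directory to a timestamped sibling so the
# original data is kept for later inspection instead of being deleted.
set_aside() {
    dir="$1"
    # Stop early if there is nothing to move.
    [ -d "$dir" ] || { echo "nothing to move: $dir" >&2; return 1; }
    target="${dir}-old-$(date +%Y%m%d-%H%M%S)"
    # Print the new location on success so it can be logged or removed later.
    mv -- "$dir" "$target" && echo "$target"
}

# Intended use on the server:
#   set_aside /var/cache/bm-core/retry-rocks-audit && bmctl restart
#   bmctl all_status
```

Once the instance is confirmed healthy, the set-aside copy can be removed to free disk space.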

Best regards,

Aurelien · October 23, 2024, 2:56pm

Great, it works!
Thanks for this magical command.

Within 5 seconds my instance was back up:

 bmctl all_status
bm-nginx is running.
bm-core.service is running.
bm-eas.service is running.
bm-elasticsearch.service is running.
bm-iptables.service is running.
bm-keycloak.service is running.
bm-milter.service is running.
bm-node.service is running.
bm-pimp.service is running.
bm-postgresql.service is running.
bm-tika.service is running.
bm-webserver.service is running.
bm-ysnp.service is running.
postgresql.service is running.

How can we explain this issue?
(No maintenance has been carried out on this server, no updates have been made, the problem appeared in the middle of the night when no specific action was supposed to take place.)

Thank you again for your valuable help

Aurélien

tchu · October 24, 2024, 8:07am

Hi Aurélien,

Thanks for the feedback.
It’s good to know that everything’s back to normal.

In version 5.2 of BlueMind, this problem will be a thing of the past with the implementation of KeyDB to replace RocksDB for better performance.

Best regards,