ceph-mon goes CrashLoopBackOff in Rocky Linux 9.2 (Blue Onyx) #12687
Unanswered
gowthamsadasivam asked this question in Q&A
Replies: 2 comments
-
The mon restart seems to be caused by something outside the mon. This is commonly caused by the liveness probe. Does a
Since the third mon is not restarting, and assuming they are all on the same platform, it wouldn't seem specific to Rocky Linux 9.2.
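One way to check the liveness-probe theory is to look for probe failures in the pod's events. The helper below is only a sketch: the function name is made up for illustration, and it assumes the kubelet reports failures with the usual "Liveness probe failed" event message.

```shell
# Count "Liveness probe failed" events in `kubectl describe pod` output.
# Reads from stdin; prints the number of matching lines (0 if none).
# count_liveness_failures is a hypothetical helper name, not part of kubectl or Rook.
count_liveness_failures() {
  # grep -c exits non-zero when the count is 0, so mask the exit status.
  grep -c 'Liveness probe failed' || true
}

# Usage (replace <mon-pod> with a real pod name):
#   kubectl -n rook-ceph describe pod <mon-pod> | count_liveness_failures
```

A non-zero count would suggest the kubelet is restarting the mon container rather than the mon exiting on its own.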
-
I have the same problem. How's it going?
-
I am unable to get a rook-ceph cluster working in my new Kubernetes cluster. The ceph-mon pods keep crashing with a CrashLoopBackOff error. The exact same setup with Rocky Linux 8.8 works fine.
I have a Kubernetes cluster up and running with the following configuration:
Host OS: Rocky Linux 9.2
Linux Kernel: 5.14.0-284.25.1.el9_2.x86_64
Number of Nodes: 5
Kubernetes version: 1.24.15
Docker version: 20.10.24
Kubernetes Distro: RKE
I followed the official documentation to install and deploy rook-ceph:
helm install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph
# k -n rook-ceph get pods -l "app=rook-ceph-operator"
Output:
I then created a Ceph cluster using the official example YAML file:
https://github.com/rook/rook/blob/master/deploy/examples/cluster.yaml
k apply -f ./cluster.yaml
Checking status of the cluster creation:
# k get cephclusters.ceph.rook.io -n rook-ceph
Output:
Checking the pods in the rook-ceph namespace, I found that the monitor daemons are in CrashLoopBackOff:
# kubectl get po -n rook-ceph | grep mon
After a few minutes:
# k get po -n rook-ceph | grep mon
I also noticed that a couple of the OSDs were restarting:
# k get po -n rook-ceph | grep osd
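To pull only the crash-looping pods out of listings like the ones above, a small filter can help. This is a sketch that assumes the default `kubectl get pods` column layout (NAME READY STATUS RESTARTS AGE). The function name is hypothetical.

```shell
# Print the names of pods whose STATUS column is CrashLoopBackOff.
# Reads `kubectl get pods` output from stdin; assumes STATUS is the third column.
# crashlooping_pods is a hypothetical helper name.
crashlooping_pods() {
  awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Usage:
#   kubectl -n rook-ceph get pods | crashlooping_pods
# Then fetch the logs of the previous (crashed) container instance:
#   kubectl -n rook-ceph logs <pod-name> --previous --tail=50
```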
Last few lines of the logs from the crashing ceph-mon pods:
Not sure if the issue is caused by this line:
Last few lines of ceph-operator logs :
Has anybody faced a similar issue with Rocky Linux 9.2 or a similar alternative OS? As I mentioned at the beginning of the post, I have the exact same setup working well with Rocky Linux 8.8 host nodes.
I found a similar issue (a year old): #10110
I tried the solution provided there, i.e., setting the LimitNOFILE property to 1048576 instead of infinity. I then reloaded the systemd daemon, restarted the Docker systemd service, and redeployed the Kubernetes cluster.
But even after this change, I end up with the same issue described above.
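For reference, the LimitNOFILE change can be sketched as a systemd drop-in. The drop-in directory, file name, and function name below are assumptions for illustration. The post does not say which file was edited.

```shell
# Write a systemd drop-in that sets LimitNOFILE for a unit.
# $1: drop-in directory, e.g. /etc/systemd/system/docker.service.d (assumed path)
# $2: limit value; defaults to 1048576, the value mentioned in the post
# write_nofile_dropin and limit-nofile.conf are hypothetical names.
write_nofile_dropin() {
  dir=$1
  limit=${2:-1048576}
  mkdir -p "$dir" || return 1
  printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/limit-nofile.conf"
}

# Usage (run as root), then reload systemd and restart Docker:
#   write_nofile_dropin /etc/systemd/system/docker.service.d
#   systemctl daemon-reload
#   systemctl restart docker
#   systemctl show docker --property=LimitNOFILE   # confirm the effective value
```

Checking the effective value with `systemctl show` confirms whether the new limit was actually picked up by the service before the cluster is redeployed.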
I would really appreciate any help pointing out what went wrong with the above setup.
Thanks in advance.