Calico-Node Error on Ubuntu20.04 after upgrade to tigera-operator Helmchart v3.27.2 #8833

Closed
Klemmerik opened this issue May 16, 2024 · 6 comments

Comments

Klemmerik commented May 16, 2024

After upgrading our Kubernetes clusters to a tigera-operator version higher than v3.27.2, we encountered network issues and high CPU and RAM load on cluster nodes running Ubuntu 20.04 (kernel version: 5.4.0-182-generic). On cluster nodes running Ubuntu 22.04 we did not see any issues. We upgraded the affected Ubuntu 20.04 nodes to a newer kernel version (5.15.0-107-generic), which seems to have solved the issue.

Expected Behavior

After installing the new tigera-operator, we expected the cluster network to continue working as before.

Current Behavior

The newly installed version of Calico produced the following systemd-udevd error logs:

  • calico_tmp_A: Could not generate persistent MAC: No data available
  • calico_tmp_B: Could not generate persistent MAC: No data available
  • calico_tmp_A: Failed to get link config: No buffer space available
  • calico_tmp_B: Failed to get link config: No buffer space available

The following log could be found in the calico-node pod:

[WARNING][65] felix/int_dataplane.go 1747: failed to wipe the XDP state error=failed to load BPF program (/usr/lib/calico/bpf/filter.o): stat /sys/fs/bpf/calico/xdp/prefilter_v1_calico_tmp_A: no such file or directory   

The CPU and RAM usage was very high. The cluster network did not function properly.

Possible Solution

As a workaround we upgraded the kernel version from 5.4.0-182-generic to 5.15.0-107-generic.
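
To find out which hosts are still on a kernel older than the one that worked here, something like the following could be run on each node. This is only a sketch: the 5.15.0 threshold is taken from the workaround above and is not a documented minimum, the function name `kernel_older_than` is made up for illustration, and `sort -V` assumes GNU coreutils (as shipped with Ubuntu).

```shell
# Returns 0 (true) if kernel release $1 is strictly older than version $2.
# Hypothetical helper; the threshold used below is an assumption taken from
# the workaround in this issue, not an official Calico requirement.
kernel_older_than() {
    release="${1%%-*}"   # "5.4.0-182-generic" -> "5.4.0"
    [ "$release" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$release" "$2" | sort -V | head -n1)" = "$release" ]
}

if kernel_older_than "$(uname -r)" "5.15.0"; then
    echo "kernel $(uname -r) is older than 5.15.0"
else
    echo "kernel $(uname -r) is 5.15.0 or newer"
fi
```

For a cluster-wide view, `kubectl get nodes -o wide` also lists each node's kernel version and OS image.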

Context

The upgrade was not successful. The network plugin was not functional on Ubuntu 20.04.

Your Environment

  • Calico version: 3.27.3 and 3.28.0
  • Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes 1.28.8 (kubeadm installation)
  • Operating System and version: Ubuntu 20.04.6 LTS

Klemmerik changed the title from "Problem with Ubuntu20.04 until tigera-operator Helmchart v3.27.2" to "Calico-Node Error on Ubuntu20.04 after upgrade to tigera-operator Helmchart v3.27.2" on May 16, 2024

@leejoyful

We also encountered the same problem.
Calico version: 3.28.0
Operating System and version: Ubuntu 20.04.6 LTS
kubernetes version: v1.28.8

leejoyful commented May 20, 2024

We checked the host's TCP statistics:

Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts InCsumErrors
Tcp: 1 200 120000 -1 17436779 11769717 917568 6179267 252 1411444179 1502054229 3051851 21 9896414 0
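
One way to read these counters is to relate retransmitted segments to segments sent. The sketch below assumes the two lines come from `/proc/net/snmp` (the format matches, but the comment does not say which command was used); `tcp_retrans_percent` is a made-up helper name.

```shell
# Reads the "Tcp:" header line and the "Tcp:" value line (as found in
# /proc/net/snmp) on stdin and prints RetransSegs as a percentage of OutSegs.
# Hypothetical helper for illustration only.
tcp_retrans_percent() {
    awk '
        $1 == "Tcp:" && !seen {
            # First Tcp: line is the header; remember each column position.
            for (i = 2; i <= NF; i++) col[$i] = i
            seen = 1
            next
        }
        $1 == "Tcp:" && col["OutSegs"] && $(col["OutSegs"]) > 0 {
            printf "%.2f%%\n", 100 * $(col["RetransSegs"]) / $(col["OutSegs"])
        }
    '
}

# On a live Linux host:
#   tcp_retrans_percent < /proc/net/snmp
```

With the values posted above (3051851 retransmitted out of 1502054229 sent) this works out to roughly 0.2%.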

We then tuned the kernel network parameters:

cat <<EOF | sudo tee /etc/sysctl.d/99-sysctl.conf
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 250000
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.ip_local_port_range = 1024 65535
EOF
sudo sysctl -p /etc/sysctl.d/99-sysctl.conf

But this problem still occurs:

May 20 11:06:57 ts-cpu-11 systemd-udevd[1476180]: calico_tmp_B: Could not generate persistent MAC: No data available
May 20 11:06:57 ts-cpu-11 systemd-udevd[1476170]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
May 20 11:06:57 ts-cpu-11 systemd-udevd[1476170]: calico_tmp_A: Could not generate persistent MAC: No data available
May 20 11:06:57 ts-cpu-11 networkd-dispatcher[1476201]: ERROR:Unknown interface index 45206407 seen even after reload
May 20 11:06:57 ts-cpu-11 networkd-dispatcher[1476201]: WARNING:Unknown index 45206408 seen, reloading interface list
May 20 11:06:57 ts-cpu-11 networkd-dispatcher[1476201]: ERROR:Unknown interface index 45206408 seen even after reload
May 20 11:06:57 ts-cpu-11 networkd-dispatcher[1476201]: WARNING:Unknown index 45206408 seen, reloading interface list
May 20 11:06:57 ts-cpu-11 networkd-dispatcher[1476201]: ERROR:Unknown interface index 45206408 seen even after reload
May 20 11:06:57 ts-cpu-11 networkd-dispatcher[1476201]: WARNING:Unknown index 45206407 seen, reloading interface list
May 20 11:06:57 ts-cpu-11 networkd-dispatcher[1476201]: ERROR:Unknown interface index 45206407 seen even after reload
May 20 11:06:57 ts-cpu-11 networkd-dispatcher[1476201]: WARNING:Unknown index 45206409 seen, reloading interface list
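
To gauge how often the udev messages about `calico_tmp_A` / `calico_tmp_B` recur, the journal can be filtered and counted. A minimal sketch, assuming the messages keep the wording seen in this thread; `count_calico_tmp_errors` is a made-up helper name and the time window is an arbitrary choice.

```shell
# Counts udev messages about the calico_tmp_A / calico_tmp_B interfaces in
# log text read from stdin. The patterns are copied from the log lines quoted
# in this issue; other wordings would not be counted.
count_calico_tmp_errors() {
    # grep -c prints 0 but exits non-zero when nothing matches; "|| true"
    # keeps the function usable under "set -e".
    grep -cE 'calico_tmp_[AB]: (Could not generate persistent MAC|Failed to get link config)' || true
}

# Example on a host with systemd:
#   journalctl -u systemd-udevd --since "10 min ago" --no-pager | count_calico_tmp_errors
```

Comparing the count before and after a kernel upgrade would show whether the messages actually stop.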

@kingnarmer

Ran into this issue. Is there any workaround?

@Klemmerik
Author

> Ran into this issue. Is there any workaround?

We updated the kernel version (from 5.4.0-182-generic to 5.15.0-107-generic), and that appears to have fixed the CPU and RAM issues.

@tomastigera
Contributor

@mazdakn this looks like a dupe of #8856

@caseydavenport
Member

Going to keep discussion of this one in #8856 for now
