Introduction

Have you recently upgraded FortiGate and noticed after a week that Memory Utilization is much higher than before? You might be facing a memory leak.

I'll describe the issue using an example process: WAD, which is responsible for Explicit Proxy and WAN Optimization.

Investigation

You have noticed that RAM usage is much higher than expected. How do you find out which process is eating all the memory?

Remember that FortiGate is essentially a Unix-like box, and some familiar troubleshooting commands are ported to FortiOS. I'll use top-summary, which resembles htop:

FGT# diag sys top-summary
   CPU [||||||||||||                            ]  30.7%
   Mem [||||||||||||||||||||||||||||||||||||    ]  90.3%  1761M/1863M
   Processes: 20 (running=2 sleeping=134)

   PID      RSS   CPU% ^MEM%   FDS     TIME+  NAME
 * 153     451M    0.0 24.1   461  53:57.80  wad [x10]
   136     367M    0.0 19.7   442  01:32.16  ipsmonitor [x6]
   156      46M  100.8  2.5    22  19:51.69  updated [x2]
   133      42M    0.8  2.3    25  02:49.33  httpsd [x4]
   155      31M    0.0  1.7    47  15:39.36  scanunitd [x5]
   107      29M    0.0  1.6    14  11:25.40  cmdbsvr
   143      29M    0.0  1.6    25  56:54.89  forticron
   31121    24M    0.0  1.3    12  00:00.86  pyfcgid [x4]
   183      21M    0.0  1.1    26  10:12.94  cw_acd
(...)

In my example, the wad process (10 instances of it, to be specific) is eating around 24% of memory. Depending on the config and the appliance's total memory, it can eat as much as 90% of RAM if a memory leak is present.

NOTE: I ignored ipsmonitor because in my configuration that process constantly uses about 20% of memory, and the value was similar before the upgrade.

I prefer top-summary over top because it summarizes all instances of a single process, which gives you a nice overview of how much RAM each feature is using.
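
For comparison, the per-instance view comes from plain top. A minimal sketch, assuming your FortiOS build accepts the refresh interval and the number of lines as arguments (the values 5 and 20 are only illustrative):

```text
# Refresh every 5 seconds and show the top 20 processes.
# While it runs: Shift+M sorts by memory, Shift+P sorts by CPU, q quits.
FGT# diagnose sys top 5 20
```

In that view each wad instance is listed on its own line, so you have to add up the memory figures yourself.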

Fix

Check if there is new software available. Usually, memory leak issues are quickly tracked down and fixed in the next minor release. Read the Release Notes! If you are already running the latest available software in the main branch, there are 2 options:

  1. downgrade - I try to avoid it, as it's a messy solution
  2. schedule auto-restart of a process

I'll focus on the second solution. In many cases, you can use it until a new software version is released.

I'll write a simple script that is executed every 12 hours:

config system auto-script
  edit restart_wad
    set interval 43200
    set repeat 356
    set start auto
    set script 'diag test app wad 99'
  next
end

That script will automatically restart the wad process every 12 hours. Simple, but effective. Remember to remove it after upgrading to the version that resolves the bug.
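
Once the fixed version is installed, the cleanup could look like the sketch below. It assumes the script name restart_wad from the example above. The execute auto-script subcommands are written as I remember them from the FortiOS CLI reference, so confirm them with ? on your version before relying on them:

```text
# Optional: check which auto-scripts are running and what the last run printed.
FGT# execute auto-script status
FGT# execute auto-script result restart_wad

# Remove the script once it is no longer needed.
config system auto-script
  delete restart_wad
end
```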

Summary

If the bug is present in the latest available software, make sure that you open a case with Fortinet Support. If they are not aware of the bug, it won't be fixed.

The same procedure applies to other processes as well. If you can't restart a process using a diag command, try killing it (FortiOS should restart it automatically):

FGT# diagnose sys kill 9 <pid>
  • <pid> - replace it with the PID number obtained from the top or top-summary output.
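
If you only know the process name, a PID lookup before the kill might look like this. It's a sketch that assumes your FortiOS version provides diagnose sys process pidof (I believe recent releases do, but verify it on yours). The httpsd name is used purely as an illustration:

```text
# Look up the PID(s) of the process by name (httpsd is only an example).
FGT# diagnose sys process pidof httpsd

# Kill it using a PID returned by the previous command; FortiOS should respawn it.
FGT# diagnose sys kill 9 <pid>
```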