It seems my #Perl memory leak issue isn't over. While I managed to fix one big leak, it turns out there's still a *slight* heap growth going on.

This time I'm going to post about debugging it in real-time, so this will be more of a story of exploration.

It starts with a monitoring graph, in which I noticed there was still a slight upwards trend. I restarted the process partway through, with some extra debug modules loaded.

#perl
in reply to Paul Evans

Looking on the more detailed graph, it seems there's one object class in particular of note that seems to correlate to this upwards growth: Future::Exception.

This seems reasonable: judging by the staircase shape of the graph, the growth comes in occasional bursts, separated by longer, otherwise-stable periods.

in reply to Paul Evans

The reason for that restart in the middle was so I could load `Devel::MAT::Dumper` into the process. This let me take two heapdump files; one shortly after startup, and one much later after an accumulation of these Future::Exception objects.

I did this by using the `-dump_at_SIGQUIT` option, and sending a SIGQUIT to the running program each time.
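As a minimal sketch of that setup, assuming `Devel::MAT::Dumper` is installed from CPAN, the option can be passed on the `use` line. The `sleep` loop is only a hypothetical stand-in for the real long-running program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: Devel::MAT::Dumper is installed from CPAN.
# The -dump_at_SIGQUIT import option installs a SIGQUIT handler that
# writes a heap dump file when the signal arrives.
use Devel::MAT::Dumper -dump_at_SIGQUIT;

print "Running as PID $$; try: kill -QUIT $$\n";

# Hypothetical stand-in for the real long-running program.
sleep 1 while 1;
```

The module's documentation also describes a `-file` option for choosing where the dump is written, which may help when keeping several dumps from one process.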

I now have two `.pmat` files, so I can compare them for differences.

metacpan.org/pod/Devel::MAT::D…

in reply to Paul Evans

First port of call is the `pmat-counts` script, which prints a summary of how many SVs of each type are found in the dump files. When given more than one file, it also prints how each count differs from the previous file.

Here we can definitely see the second file has grown in various categories, including adding a bunch of blessed ARRAYs, which seems to agree with the graphs.

in reply to Paul Evans

Next, loading the second file into the `pmat` tool shell lets us poke at it in more detail. Since we already have a good idea it's related to these `Future::Exception` instances, let's start by grabbing a list of those.

Seems like there's quite a few.

in reply to Paul Evans

Picking one of these at random, we can inspect it to see its contents.

The message says it's about a timeout that happened, which again seems to correlate with the occasional staircase shape. These timeouts were rare, so it seems these exception values have been retained by something somewhere.

in reply to Paul Evans

Next we can try using the `identify` command on this example object, to see what is retaining it, and hopefully gain some insight into the actual bug.

Unfortunately for us, this doesn't lead anywhere. The ARRAY is being held by a scalar REF instance, but nothing in the heapdump can find what is holding on to that.

in reply to Paul Evans

At this point there are no more immediate steps we can take, so we will have to try a few other ideas.

Knowing the surrounding code, I know there are a couple of XS-implemented modules around Future handling, so it seems likely one of those has forgotten to `SvREFCNT_dec()` an exception reference.

One of those modules, `Future::XS`, is an optional performance improvement and can be skipped. By restarting the program without it we can wait and see if the problem continues.

in reply to Paul Evans

It also turns out that I forgot to actually document the way to skip loading the `Future::XS` module, so I can't link to any docs of that for this thread ;)

While I leave this running overnight again to see whether or not it accumulates any more leaks, I can spend a moment writing that up ;)

I think that's enough for today, I'll wait and see what this yields tomorrow...

in reply to Paul Evans

I left this running without `Future::XS` over the weekend to see what would happen.

It did use a constant amount more memory than with `Future::XS` loaded (that's to be expected), but it didn't grow at all over that time. Moreover, the per-class breakdown doesn't show any growing collection of `Future::Exception` objects at all. It seems we are on the right path.

in reply to Paul Evans

Knowing the source of the leak is in `Future::XS` helps narrow it down. That's not a big distribution, and not many lines of its code deal with `Future::Exception` objects anyway.

It didn't take me long to manage to write a test using `Test::MemoryGrowth` that indicates failure:

metacpan.org/pod/Test::MemoryG…
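The actual test isn't reproduced in this thread, but the general shape of a `Test::MemoryGrowth` test can be sketched as follows, assuming the module is installed from CPAN. The `exercise_code_under_test` function and the `Example::Exception` class are hypothetical stand-ins; a real test would instead call the `Future::XS` failure-handling code suspected of leaking.

```perl
use strict;
use warnings;
use Test::More;

# Assumption: Test::MemoryGrowth is installed from CPAN.
use Test::MemoryGrowth;

# Hypothetical stand-in for the code suspected of leaking. It builds an
# exception-like blessed ARRAY and then drops it again.
sub exercise_code_under_test {
    my $e = bless [ "timeout", "example" ], "Example::Exception";
    return;
}

# no_growth runs the block many times and fails the test if the
# process's memory usage keeps growing as it does so.
no_growth { exercise_code_under_test() }
    calls => 10_000,
    'creating and dropping an exception does not grow memory';

done_testing;
```

This stand-in doesn't leak, so the sketch passes as written; the point is only to show where the suspect code goes.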

in reply to Paul Evans

For an extra bit of confirmation that this seems to be the bug we're looking for, we can also inspect the `.pmat` files that the failing test wrote.

Running both through the `pmat-diff` tool compares the gained and lost SVs on each side, and tries to associate them together. It then draws a summary of what it finds.

Here we indeed find an ARRAY blessed as `Future::Exception`, holding five other SVs. Seems to match what we'd expect.

in reply to Paul Evans

Knowing the shape of the code around handling of `Future::Exception` instances, it didn't take me too long to poke around and find a place where a reference is accidentally retained.

After a nested call to `Future::Exception->new`, the result is `SvREFCNT_inc()`ed to retain it over the `FREETMPS; LEAVE;` at the end of the method call, but we forgot to re-mortalize it when pushing it to the stack for the next call.

An additional `sv_2mortal()` is needed.
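The real bug and fix live in XS code, which isn't shown in this thread. As a rough pure-Perl illustration of the effect only, the sketch below simulates an unbalanced reference count using `Internals::SvREFCNT`, a core function intended for internal use and used here purely for demonstration. `Example::Exception` is a hypothetical stand-in class.

```perl
use strict;
use warnings;

package Example::Exception {
    our $destroyed = 0;
    sub new     { my ($class, @fields) = @_; return bless [@fields], $class }
    sub DESTROY { $destroyed++ }
}

# Balanced case: the only reference goes away, so the object is freed.
{
    my $e = Example::Exception->new("timeout");
}
my $freed_when_balanced = $Example::Exception::destroyed;

# Leaky case: bump the ARRAY's reference count by one with nothing owning
# that count, similar in effect to an SvREFCNT_inc() that is never undone.
{
    my $e = Example::Exception->new("timeout");
    my $count = Internals::SvREFCNT(@$e);
    Internals::SvREFCNT(@$e, $count + 1);
}
my $freed_when_leaky = $Example::Exception::destroyed - $freed_when_balanced;

print "freed when balanced: $freed_when_balanced\n";
print "freed when leaky:    $freed_when_leaky\n";
```

In the leaky case the object outlives its last visible reference, which is also why a tool like `identify` can find nothing holding on to it.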

in reply to Paul Evans

With this one-line fix in place, the newly-added unit test passes at least.

I'll put this code on my real machine and see how it holds up. If that appears to fix it, then I'll ship it to CPAN as the next version of `Future::XS` tomorrow.

in reply to Paul Evans

Success 😀 Having left this version of the code running for over 24 hours, it indeed hasn't shown any signs of a leak.

I think we're onto a winner here, I'll pop this up on CPAN.

in reply to Paul Evans

And here it is.

metacpan.org/release/PEVANS/Fu…

That concludes this particular story. I have at least two follow-up thoughts to add, but I'll write those in separate threads. I think this one has gone on long enough now.

in reply to Rue Mohr

@RueNahcMohr Well the failure itself comes from this line here:

metacpan.org/dist/Device-Seria…

but knowing this isn't directly any help. The problem is that something somewhere is retaining these exception objects indefinitely and not throwing them away. That's the part I need to investigate.

in reply to Paul Evans

Maybe it's not a matter of someone holding on to them, but that nobody is throwing them away.

;]

in reply to Rue Mohr

@RueNahcMohr Quite so. As I said, I suspect a missing `SvREFCNT_dec()` call somewhere in one of my XS modules. The trick now is identifying which module and where it should go.

Missing code is always the hardest kind to find. ;)

in reply to Paul Evans

is the whole object being replicated? or just the string?