Hi! Please consider following me on twitter: @hanekomu.

Perl benchmarks: Nested data structures

The conclusion first: Nested arrays and hashes in Perl are slow. They get slower the more levels you nest. For most applications this won't be a problem, but if you cache values in a hash, you might want to use as few levels of nesting as possible.

First, let's benchmark reading from and writing to nested array elements. We use different arrays for six different levels of nesting, and all element indices are constants.

use Benchmark qw(:all);

our (@array1, @array2, @array3, @array4, @array5, @array6);

cmpthese(timethese(10_000_000, {
    L1w => sub { $array1[1] = 1 },
    L2w => sub { $array2[1][2] = 1 },
    L3w => sub { $array3[1][2][3] = 1 },
    L4w => sub { $array4[1][2][3][4] = 1 },
    L5w => sub { $array5[1][2][3][4][5] = 1 },
    L6w => sub { $array6[1][2][3][4][5][6] = 1 },
    L1r => sub { $array1[1] },
    L2r => sub { $array2[1][2] },
    L3r => sub { $array3[1][2][3] },
    L4r => sub { $array4[1][2][3][4] },
    L5r => sub { $array5[1][2][3][4][5] },
    L6r => sub { $array6[1][2][3][4][5][6] },
}));

The results:

Benchmark: timing 10000000 iterations of L1r, L1w, L2r, L2w, L3r, L3w, L4r, L4w, L5r, L5w, L6r, L6w...
       L1r:  0 wallclock secs ( 0.20 usr + -0.00 sys =  0.20 CPU) @ 50000000.00/s (n=10000000)
            (warning: too few iterations for a reliable count)
       L1w:  1 wallclock secs ( 0.72 usr +  0.01 sys =  0.73 CPU) @ 13698630.14/s (n=10000000)
       L2r:  1 wallclock secs ( 1.19 usr +  0.01 sys =  1.20 CPU) @ 8333333.33/s (n=10000000)
       L2w:  1 wallclock secs ( 2.01 usr +  0.02 sys =  2.03 CPU) @ 4926108.37/s (n=10000000)
       L3r:  2 wallclock secs ( 1.82 usr +  0.02 sys =  1.84 CPU) @ 5434782.61/s (n=10000000)
       L3w:  3 wallclock secs ( 2.67 usr +  0.03 sys =  2.70 CPU) @ 3703703.70/s (n=10000000)
       L4r:  3 wallclock secs ( 2.63 usr +  0.01 sys =  2.64 CPU) @ 3787878.79/s (n=10000000)
       L4w:  4 wallclock secs ( 3.54 usr +  0.02 sys =  3.56 CPU) @ 2808988.76/s (n=10000000)
       L5r:  3 wallclock secs ( 3.04 usr +  0.00 sys =  3.04 CPU) @ 3289473.68/s (n=10000000)
       L5w:  4 wallclock secs ( 4.13 usr +  0.04 sys =  4.17 CPU) @ 2398081.53/s (n=10000000)
       L6r:  4 wallclock secs ( 3.76 usr +  0.01 sys =  3.77 CPU) @ 2652519.89/s (n=10000000)
       L6w:  5 wallclock secs ( 4.46 usr +  0.02 sys =  4.48 CPU) @ 2232142.86/s (n=10000000)
          Rate   L6w   L5w   L6r   L4w   L5r   L3w   L4r  L2w  L3r  L2r  L1w  L1r
L6w  2232143/s    --   -7%  -16%  -21%  -32%  -40%  -41% -55% -59% -73% -84% -96%
L5w  2398082/s    7%    --  -10%  -15%  -27%  -35%  -37% -51% -56% -71% -82% -95%
L6r  2652520/s   19%   11%    --   -6%  -19%  -28%  -30% -46% -51% -68% -81% -95%
L4w  2808989/s   26%   17%    6%    --  -15%  -24%  -26% -43% -48% -66% -79% -94%
L5r  3289474/s   47%   37%   24%   17%    --  -11%  -13% -33% -39% -61% -76% -93%
L3w  3703704/s   66%   54%   40%   32%   13%    --   -2% -25% -32% -56% -73% -93%
L4r  3787879/s   70%   58%   43%   35%   15%    2%    -- -23% -30% -55% -72% -92%
L2w  4926108/s  121%  105%   86%   75%   50%   33%   30%   --  -9% -41% -64% -90%
L3r  5434783/s  143%  127%  105%   93%   65%   47%   43%  10%   -- -35% -60% -89%
L2r  8333333/s  273%  248%  214%  197%  153%  125%  120%  69%  53%   -- -39% -83%
L1w 13698630/s  514%  471%  416%  388%  316%  270%  262% 178% 152%  64%   -- -73%
L1r 50000000/s 2140% 1985% 1785% 1680% 1420% 1250% 1220% 915% 820% 500% 265%   --

Next we do the same for hashes, again with constant hash keys:

use Benchmark qw(:all);

our (%hash1, %hash2, %hash3, %hash4, %hash5, %hash6);

cmpthese(timethese(10_000_000, {
    L1w => sub { $hash1{L1} = 1 },
    L2w => sub { $hash2{L1}{L2} = 1 },
    L3w => sub { $hash3{L1}{L2}{L3} = 1 },
    L4w => sub { $hash4{L1}{L2}{L3}{L4} = 1 },
    L5w => sub { $hash5{L1}{L2}{L3}{L4}{L5} = 1 },
    L6w => sub { $hash6{L1}{L2}{L3}{L4}{L5}{L6} = 1 },
    L1r => sub { $hash1{L1} },
    L2r => sub { $hash2{L1}{L2} },
    L3r => sub { $hash3{L1}{L2}{L3} },
    L4r => sub { $hash4{L1}{L2}{L3}{L4} },
    L5r => sub { $hash5{L1}{L2}{L3}{L4}{L5} },
    L6r => sub { $hash6{L1}{L2}{L3}{L4}{L5}{L6} },
}));

The results:

Benchmark: timing 10000000 iterations of L1r, L1w, L2r, L2w, L3r, L3w, L4r, L4w, L5r, L5w, L6r, L6w...
       L1r:  0 wallclock secs ( 0.63 usr + -0.00 sys =  0.63 CPU) @ 15873015.87/s (n=10000000)
       L1w:  0 wallclock secs ( 1.53 usr +  0.01 sys =  1.54 CPU) @ 6493506.49/s (n=10000000)
       L2r:  2 wallclock secs ( 1.58 usr + -0.00 sys =  1.58 CPU) @ 6329113.92/s (n=10000000)
       L2w:  2 wallclock secs ( 2.35 usr +  0.02 sys =  2.37 CPU) @ 4219409.28/s (n=10000000)
       L3r:  3 wallclock secs ( 2.58 usr +  0.01 sys =  2.59 CPU) @ 3861003.86/s (n=10000000)
       L3w:  4 wallclock secs ( 3.67 usr +  0.02 sys =  3.69 CPU) @ 2710027.10/s (n=10000000)
       L4r:  4 wallclock secs ( 3.07 usr +  0.02 sys =  3.09 CPU) @ 3236245.95/s (n=10000000)
       L4w:  5 wallclock secs ( 4.69 usr +  0.01 sys =  4.70 CPU) @ 2127659.57/s (n=10000000)
       L5r:  5 wallclock secs ( 4.67 usr +  0.02 sys =  4.69 CPU) @ 2132196.16/s (n=10000000)
       L5w:  5 wallclock secs ( 5.20 usr +  0.02 sys =  5.22 CPU) @ 1915708.81/s (n=10000000)
       L6r:  6 wallclock secs ( 5.77 usr +  0.01 sys =  5.78 CPU) @ 1730103.81/s (n=10000000)
       L6w:  7 wallclock secs ( 6.35 usr +  0.01 sys =  6.36 CPU) @ 1572327.04/s (n=10000000)
          Rate  L6w  L6r  L5w  L4w  L5r  L3w  L4r  L3r  L2w  L2r  L1w  L1r
L6w  1572327/s   --  -9% -18% -26% -26% -42% -51% -59% -63% -75% -76% -90%
L6r  1730104/s  10%   -- -10% -19% -19% -36% -47% -55% -59% -73% -73% -89%
L5w  1915709/s  22%  11%   -- -10% -10% -29% -41% -50% -55% -70% -70% -88%
L4w  2127660/s  35%  23%  11%   --  -0% -21% -34% -45% -50% -66% -67% -87%
L5r  2132196/s  36%  23%  11%   0%   -- -21% -34% -45% -49% -66% -67% -87%
L3w  2710027/s  72%  57%  41%  27%  27%   -- -16% -30% -36% -57% -58% -83%
L4r  3236246/s 106%  87%  69%  52%  52%  19%   -- -16% -23% -49% -50% -80%
L3r  3861004/s 146% 123% 102%  81%  81%  42%  19%   --  -8% -39% -41% -76%
L2w  4219409/s 168% 144% 120%  98%  98%  56%  30%   9%   -- -33% -35% -73%
L2r  6329114/s 303% 266% 230% 197% 197% 134%  96%  64%  50%   --  -3% -60%
L1w  6493506/s 313% 275% 239% 205% 205% 140% 101%  68%  54%   3%   -- -59%
L1r 15873016/s 910% 817% 729% 646% 644% 486% 390% 311% 276% 151% 144%   --

I hadn't expected the differences to be so dramatic...

Tags: .

Write a comment | Bookmark and Share

posted at: 16:38 | path: /benchmarks | permalink | 5 comments | 0 trackbacks

Leon Timmermans wrote at 2009-02-23 18:42:

I disagree. they aren't slow at all. Surely, nesting has a price, but even at six levels of nesting you can still do it millions of times per second. By the time that starts to become your bottleneck, you already have a pretty fast application.

Ask Bjørn Hansen wrote at 2009-02-24 01:01:

yeah, I agree with Leon. It'd be quite a particular application where this would ever show up as a bottleneck.

Leon Brocard wrote at 2009-02-24 08:22:

I think the problem is that you are comparing the number of times per second instead the amount of time it takes to do it. Of course it gets slower the more work you do, it's just that comparing the inverses makes it look bad when it's not.

Geoff Broadwell wrote at 2009-02-24 21:40:

If I'm reading that right, you're doing all your reads for each array/hash before the corresponding writes. You are thus measuring time spent in the "no such key, return undef" code path, rather than the "key found, return value" code path.

Carl Johnstone wrote at 2009-02-25 15:58:

Geoff, correct - reading becomes quicker if you make sure the keys exist beforehand. Of course like the other comments say, at the end of the day 1,000,000 reads/sec is hardly slow.

Comments are closed for this story.

Trackbacks are closed for this story.