Hi! Please consider following me on twitter: @hanekomu.

Perl benchmarks: Nested data structures

The conclusion first: Nested arrays and hashes in Perl are slow. They get slower the more levels you nest. For most applications this won't be a problem, but if you cache values in a hash, you might want to use as few levels of nesting as possible.

First, let's benchmark reading from and writing to nested array elements. We use different arrays for six different levels of nesting, and all element indices are constants.

use Benchmark qw(:all);

our (@array1, @array2, @array3, @array4, @array5, @array6);

cmpthese(timethese(10_000_000, {
    L1w => sub { $array1[1] = 1 },
    L2w => sub { $array2[1][2] = 1 },
    L3w => sub { $array3[1][2][3] = 1 },
    L4w => sub { $array4[1][2][3][4] = 1 },
    L5w => sub { $array5[1][2][3][4][5] = 1 },
    L6w => sub { $array6[1][2][3][4][5][6] = 1 },
    L1r => sub { $array1[1] },
    L2r => sub { $array2[1][2] },
    L3r => sub { $array3[1][2][3] },
    L4r => sub { $array4[1][2][3][4] },
    L5r => sub { $array5[1][2][3][4][5] },
    L6r => sub { $array6[1][2][3][4][5][6] },
}));

The results:

Benchmark: timing 10000000 iterations of L1r, L1w, L2r, L2w, L3r, L3w, L4r, L4w, L5r, L5w, L6r, L6w...
       L1r:  0 wallclock secs ( 0.20 usr + -0.00 sys =  0.20 CPU) @ 50000000.00/s (n=10000000)
            (warning: too few iterations for a reliable count)
       L1w:  1 wallclock secs ( 0.72 usr +  0.01 sys =  0.73 CPU) @ 13698630.14/s (n=10000000)
       L2r:  1 wallclock secs ( 1.19 usr +  0.01 sys =  1.20 CPU) @ 8333333.33/s (n=10000000)
       L2w:  1 wallclock secs ( 2.01 usr +  0.02 sys =  2.03 CPU) @ 4926108.37/s (n=10000000)
       L3r:  2 wallclock secs ( 1.82 usr +  0.02 sys =  1.84 CPU) @ 5434782.61/s (n=10000000)
       L3w:  3 wallclock secs ( 2.67 usr +  0.03 sys =  2.70 CPU) @ 3703703.70/s (n=10000000)
       L4r:  3 wallclock secs ( 2.63 usr +  0.01 sys =  2.64 CPU) @ 3787878.79/s (n=10000000)
       L4w:  4 wallclock secs ( 3.54 usr +  0.02 sys =  3.56 CPU) @ 2808988.76/s (n=10000000)
       L5r:  3 wallclock secs ( 3.04 usr +  0.00 sys =  3.04 CPU) @ 3289473.68/s (n=10000000)
       L5w:  4 wallclock secs ( 4.13 usr +  0.04 sys =  4.17 CPU) @ 2398081.53/s (n=10000000)
       L6r:  4 wallclock secs ( 3.76 usr +  0.01 sys =  3.77 CPU) @ 2652519.89/s (n=10000000)
       L6w:  5 wallclock secs ( 4.46 usr +  0.02 sys =  4.48 CPU) @ 2232142.86/s (n=10000000)
          Rate   L6w   L5w   L6r   L4w   L5r   L3w   L4r  L2w  L3r  L2r  L1w  L1r
L6w  2232143/s    --   -7%  -16%  -21%  -32%  -40%  -41% -55% -59% -73% -84% -96%
L5w  2398082/s    7%    --  -10%  -15%  -27%  -35%  -37% -51% -56% -71% -82% -95%
L6r  2652520/s   19%   11%    --   -6%  -19%  -28%  -30% -46% -51% -68% -81% -95%
L4w  2808989/s   26%   17%    6%    --  -15%  -24%  -26% -43% -48% -66% -79% -94%
L5r  3289474/s   47%   37%   24%   17%    --  -11%  -13% -33% -39% -61% -76% -93%
L3w  3703704/s   66%   54%   40%   32%   13%    --   -2% -25% -32% -56% -73% -93%
L4r  3787879/s   70%   58%   43%   35%   15%    2%    -- -23% -30% -55% -72% -92%
L2w  4926108/s  121%  105%   86%   75%   50%   33%   30%   --  -9% -41% -64% -90%
L3r  5434783/s  143%  127%  105%   93%   65%   47%   43%  10%   -- -35% -60% -89%
L2r  8333333/s  273%  248%  214%  197%  153%  125%  120%  69%  53%   -- -39% -83%
L1w 13698630/s  514%  471%  416%  388%  316%  270%  262% 178% 152%  64%   -- -73%
L1r 50000000/s 2140% 1985% 1785% 1680% 1420% 1250% 1220% 915% 820% 500% 265%   --

Next we do the same for hashes, again with constant hash keys:

use Benchmark qw(:all);

our (%hash1, %hash2, %hash3, %hash4, %hash5, %hash6);

cmpthese(timethese(10_000_000, {
    L1w => sub { $hash1{L1} = 1 },
    L2w => sub { $hash2{L1}{L2} = 1 },
    L3w => sub { $hash3{L1}{L2}{L3} = 1 },
    L4w => sub { $hash4{L1}{L2}{L3}{L4} = 1 },
    L5w => sub { $hash5{L1}{L2}{L3}{L4}{L5} = 1 },
    L6w => sub { $hash6{L1}{L2}{L3}{L4}{L5}{L6} = 1 },
    L1r => sub { $hash1{L1} },
    L2r => sub { $hash2{L1}{L2} },
    L3r => sub { $hash3{L1}{L2}{L3} },
    L4r => sub { $hash4{L1}{L2}{L3}{L4} },
    L5r => sub { $hash5{L1}{L2}{L3}{L4}{L5} },
    L6r => sub { $hash6{L1}{L2}{L3}{L4}{L5}{L6} },
}));

The results:

Benchmark: timing 10000000 iterations of L1r, L1w, L2r, L2w, L3r, L3w, L4r, L4w, L5r, L5w, L6r, L6w...
       L1r:  0 wallclock secs ( 0.63 usr + -0.00 sys =  0.63 CPU) @ 15873015.87/s (n=10000000)
       L1w:  0 wallclock secs ( 1.53 usr +  0.01 sys =  1.54 CPU) @ 6493506.49/s (n=10000000)
       L2r:  2 wallclock secs ( 1.58 usr + -0.00 sys =  1.58 CPU) @ 6329113.92/s (n=10000000)
       L2w:  2 wallclock secs ( 2.35 usr +  0.02 sys =  2.37 CPU) @ 4219409.28/s (n=10000000)
       L3r:  3 wallclock secs ( 2.58 usr +  0.01 sys =  2.59 CPU) @ 3861003.86/s (n=10000000)
       L3w:  4 wallclock secs ( 3.67 usr +  0.02 sys =  3.69 CPU) @ 2710027.10/s (n=10000000)
       L4r:  4 wallclock secs ( 3.07 usr +  0.02 sys =  3.09 CPU) @ 3236245.95/s (n=10000000)
       L4w:  5 wallclock secs ( 4.69 usr +  0.01 sys =  4.70 CPU) @ 2127659.57/s (n=10000000)
       L5r:  5 wallclock secs ( 4.67 usr +  0.02 sys =  4.69 CPU) @ 2132196.16/s (n=10000000)
       L5w:  5 wallclock secs ( 5.20 usr +  0.02 sys =  5.22 CPU) @ 1915708.81/s (n=10000000)
       L6r:  6 wallclock secs ( 5.77 usr +  0.01 sys =  5.78 CPU) @ 1730103.81/s (n=10000000)
       L6w:  7 wallclock secs ( 6.35 usr +  0.01 sys =  6.36 CPU) @ 1572327.04/s (n=10000000)
          Rate  L6w  L6r  L5w  L4w  L5r  L3w  L4r  L3r  L2w  L2r  L1w  L1r
L6w  1572327/s   --  -9% -18% -26% -26% -42% -51% -59% -63% -75% -76% -90%
L6r  1730104/s  10%   -- -10% -19% -19% -36% -47% -55% -59% -73% -73% -89%
L5w  1915709/s  22%  11%   -- -10% -10% -29% -41% -50% -55% -70% -70% -88%
L4w  2127660/s  35%  23%  11%   --  -0% -21% -34% -45% -50% -66% -67% -87%
L5r  2132196/s  36%  23%  11%   0%   -- -21% -34% -45% -49% -66% -67% -87%
L3w  2710027/s  72%  57%  41%  27%  27%   -- -16% -30% -36% -57% -58% -83%
L4r  3236246/s 106%  87%  69%  52%  52%  19%   -- -16% -23% -49% -50% -80%
L3r  3861004/s 146% 123% 102%  81%  81%  42%  19%   --  -8% -39% -41% -76%
L2w  4219409/s 168% 144% 120%  98%  98%  56%  30%   9%   -- -33% -35% -73%
L2r  6329114/s 303% 266% 230% 197% 197% 134%  96%  64%  50%   --  -3% -60%
L1w  6493506/s 313% 275% 239% 205% 205% 140% 101%  68%  54%   3%   -- -59%
L1r 15873016/s 910% 817% 729% 646% 644% 486% 390% 311% 276% 151% 144%   --

I hadn't expected the differences to be so dramatic...

Tags: .

Write a comment | Bookmark and Share

posted at: 16:38 | path: /benchmarks | permalink | 5 comments | 0 trackbacks

Benchmarking immutable Mouse

Dann has added benchmarks for immutable Mouse to App::Benchmark::Accessors; I've also updated the benchmarks page.

Tags: , .

Write a comment | Bookmark and Share

posted at: 13:14 | path: /moose | permalink | 0 comments | 0 trackbacks

Dissecting the Moose Part 5 - Accessor Generator Benchmarks Updated

The results of the accessor generator benchmarks generated by App::Benchmark::Accessors now have a permanent page. I will update the page from time to time as new versions of the accessor generators are released.

Tags: , .

Write a comment | Bookmark and Share

posted at: 21:12 | path: /moose | permalink | 0 comments | 0 trackbacks

Persuading the Cloud to run your benchmarks

I've had quite a bit of feedback about the accessor generator benchmarks; mostly requests to add this or that module to the benchmarks. Adding the code, rerunning the benchmarks and updating the blog entries took quite a bit of time, so I thought I could automate the process.

I've made the accessor generator benchmarks into a CPAN distribution called App::Benchmark::Accessors. That way you can run them yourself and the prerequisite modules will install automatically as well (if you use the CPAN shell).

At first I thought about simply having the benchmark programs in the bin/ directory, but then Ricardo Signes (rjbs) had a really cool idea: Run the benchmarks as part of the test suite and report them in a way that doesn't interfere with the TAP output.

That way the CPAN testers will run your benchmarks for you and you will get nice results for many different platforms and different perl versions. For example, take a look at the benchmark reports for App::Benchmark::Accessors.

The foundation for running benchmarks in test suites is in App::Benchmark, so you can use that to easily write your own cloudy benchmarks. However, be nice to the CPAN testers and don't run benchmarks that will take forever or use all of their machines' resources.

Tags: .

Write a comment | Bookmark and Share

posted at: 14:12 | path: /dev | permalink | 0 comments | 0 trackbacks