Hi! Please consider following me on twitter: @hanekomu.
2009年02月23日
Perl benchmarks: Nested data structures
The conclusion first: Nested arrays and hashes in Perl are slow. They get slower the more levels you nest. For most applications this won't be a problem, but if you cache values in a hash, you might want to use as few levels of nesting as possible.
First, let's benchmark reading from and writing to nested array elements. We use different arrays for six different levels of nesting, and all element indices are constants.
use Benchmark qw(:all); our (@array1, @array2, @array3, @array4, @array5, @array6); cmpthese(timethese(10_000_000, { L1w => sub { $array1[1] = 1 }, L2w => sub { $array2[1][2] = 1 }, L3w => sub { $array3[1][2][3] = 1 }, L4w => sub { $array4[1][2][3][4] = 1 }, L5w => sub { $array5[1][2][3][4][5] = 1 }, L6w => sub { $array6[1][2][3][4][5][6] = 1 }, L1r => sub { $array1[1] }, L2r => sub { $array2[1][2] }, L3r => sub { $array3[1][2][3] }, L4r => sub { $array4[1][2][3][4] }, L5r => sub { $array5[1][2][3][4][5] }, L6r => sub { $array6[1][2][3][4][5][6] }, }));
The results:
Benchmark: timing 10000000 iterations of L1r, L1w, L2r, L2w, L3r, L3w, L4r, L4w, L5r, L5w, L6r, L6w...
L1r: 0 wallclock secs ( 0.20 usr + -0.00 sys = 0.20 CPU) @ 50000000.00/s (n=10000000)
(warning: too few iterations for a reliable count)
L1w: 1 wallclock secs ( 0.72 usr + 0.01 sys = 0.73 CPU) @ 13698630.14/s (n=10000000)
L2r: 1 wallclock secs ( 1.19 usr + 0.01 sys = 1.20 CPU) @ 8333333.33/s (n=10000000)
L2w: 1 wallclock secs ( 2.01 usr + 0.02 sys = 2.03 CPU) @ 4926108.37/s (n=10000000)
L3r: 2 wallclock secs ( 1.82 usr + 0.02 sys = 1.84 CPU) @ 5434782.61/s (n=10000000)
L3w: 3 wallclock secs ( 2.67 usr + 0.03 sys = 2.70 CPU) @ 3703703.70/s (n=10000000)
L4r: 3 wallclock secs ( 2.63 usr + 0.01 sys = 2.64 CPU) @ 3787878.79/s (n=10000000)
L4w: 4 wallclock secs ( 3.54 usr + 0.02 sys = 3.56 CPU) @ 2808988.76/s (n=10000000)
L5r: 3 wallclock secs ( 3.04 usr + 0.00 sys = 3.04 CPU) @ 3289473.68/s (n=10000000)
L5w: 4 wallclock secs ( 4.13 usr + 0.04 sys = 4.17 CPU) @ 2398081.53/s (n=10000000)
L6r: 4 wallclock secs ( 3.76 usr + 0.01 sys = 3.77 CPU) @ 2652519.89/s (n=10000000)
L6w: 5 wallclock secs ( 4.46 usr + 0.02 sys = 4.48 CPU) @ 2232142.86/s (n=10000000)
Rate L6w L5w L6r L4w L5r L3w L4r L2w L3r L2r L1w L1r
L6w 2232143/s -- -7% -16% -21% -32% -40% -41% -55% -59% -73% -84% -96%
L5w 2398082/s 7% -- -10% -15% -27% -35% -37% -51% -56% -71% -82% -95%
L6r 2652520/s 19% 11% -- -6% -19% -28% -30% -46% -51% -68% -81% -95%
L4w 2808989/s 26% 17% 6% -- -15% -24% -26% -43% -48% -66% -79% -94%
L5r 3289474/s 47% 37% 24% 17% -- -11% -13% -33% -39% -61% -76% -93%
L3w 3703704/s 66% 54% 40% 32% 13% -- -2% -25% -32% -56% -73% -93%
L4r 3787879/s 70% 58% 43% 35% 15% 2% -- -23% -30% -55% -72% -92%
L2w 4926108/s 121% 105% 86% 75% 50% 33% 30% -- -9% -41% -64% -90%
L3r 5434783/s 143% 127% 105% 93% 65% 47% 43% 10% -- -35% -60% -89%
L2r 8333333/s 273% 248% 214% 197% 153% 125% 120% 69% 53% -- -39% -83%
L1w 13698630/s 514% 471% 416% 388% 316% 270% 262% 178% 152% 64% -- -73%
L1r 50000000/s 2140% 1985% 1785% 1680% 1420% 1250% 1220% 915% 820% 500% 265% --
Next we do the same for hashes, again with constant hash keys:
use Benchmark qw(:all); our (%hash1, %hash2, %hash3, %hash4, %hash5, %hash6); cmpthese(timethese(10_000_000, { L1w => sub { $hash1{L1} = 1 }, L2w => sub { $hash2{L1}{L2} = 1 }, L3w => sub { $hash3{L1}{L2}{L3} = 1 }, L4w => sub { $hash4{L1}{L2}{L3}{L4} = 1 }, L5w => sub { $hash5{L1}{L2}{L3}{L4}{L5} = 1 }, L6w => sub { $hash6{L1}{L2}{L3}{L4}{L5}{L6} = 1 }, L1r => sub { $hash1{L1} }, L2r => sub { $hash2{L1}{L2} }, L3r => sub { $hash3{L1}{L2}{L3} }, L4r => sub { $hash4{L1}{L2}{L3}{L4} }, L5r => sub { $hash5{L1}{L2}{L3}{L4}{L5} }, L6r => sub { $hash6{L1}{L2}{L3}{L4}{L5}{L6} }, }));
The results:
Benchmark: timing 10000000 iterations of L1r, L1w, L2r, L2w, L3r, L3w, L4r, L4w, L5r, L5w, L6r, L6w...
L1r: 0 wallclock secs ( 0.63 usr + -0.00 sys = 0.63 CPU) @ 15873015.87/s (n=10000000)
L1w: 0 wallclock secs ( 1.53 usr + 0.01 sys = 1.54 CPU) @ 6493506.49/s (n=10000000)
L2r: 2 wallclock secs ( 1.58 usr + -0.00 sys = 1.58 CPU) @ 6329113.92/s (n=10000000)
L2w: 2 wallclock secs ( 2.35 usr + 0.02 sys = 2.37 CPU) @ 4219409.28/s (n=10000000)
L3r: 3 wallclock secs ( 2.58 usr + 0.01 sys = 2.59 CPU) @ 3861003.86/s (n=10000000)
L3w: 4 wallclock secs ( 3.67 usr + 0.02 sys = 3.69 CPU) @ 2710027.10/s (n=10000000)
L4r: 4 wallclock secs ( 3.07 usr + 0.02 sys = 3.09 CPU) @ 3236245.95/s (n=10000000)
L4w: 5 wallclock secs ( 4.69 usr + 0.01 sys = 4.70 CPU) @ 2127659.57/s (n=10000000)
L5r: 5 wallclock secs ( 4.67 usr + 0.02 sys = 4.69 CPU) @ 2132196.16/s (n=10000000)
L5w: 5 wallclock secs ( 5.20 usr + 0.02 sys = 5.22 CPU) @ 1915708.81/s (n=10000000)
L6r: 6 wallclock secs ( 5.77 usr + 0.01 sys = 5.78 CPU) @ 1730103.81/s (n=10000000)
L6w: 7 wallclock secs ( 6.35 usr + 0.01 sys = 6.36 CPU) @ 1572327.04/s (n=10000000)
Rate L6w L6r L5w L4w L5r L3w L4r L3r L2w L2r L1w L1r
L6w 1572327/s -- -9% -18% -26% -26% -42% -51% -59% -63% -75% -76% -90%
L6r 1730104/s 10% -- -10% -19% -19% -36% -47% -55% -59% -73% -73% -89%
L5w 1915709/s 22% 11% -- -10% -10% -29% -41% -50% -55% -70% -70% -88%
L4w 2127660/s 35% 23% 11% -- -0% -21% -34% -45% -50% -66% -67% -87%
L5r 2132196/s 36% 23% 11% 0% -- -21% -34% -45% -49% -66% -67% -87%
L3w 2710027/s 72% 57% 41% 27% 27% -- -16% -30% -36% -57% -58% -83%
L4r 3236246/s 106% 87% 69% 52% 52% 19% -- -16% -23% -49% -50% -80%
L3r 3861004/s 146% 123% 102% 81% 81% 42% 19% -- -8% -39% -41% -76%
L2w 4219409/s 168% 144% 120% 98% 98% 56% 30% 9% -- -33% -35% -73%
L2r 6329114/s 303% 266% 230% 197% 197% 134% 96% 64% 50% -- -3% -60%
L1w 6493506/s 313% 275% 239% 205% 205% 140% 101% 68% 54% 3% -- -59%
L1r 15873016/s 910% 817% 729% 646% 644% 486% 390% 311% 276% 151% 144% --
I hadn't expected the differences to be so dramatic...
Tags: benchmarks.
posted at: 16:38 | path: /benchmarks | permalink | 5 comments | 0 trackbacks
2009年02月16日
Benchmarking immutable Mouse
Dann has added benchmarks for immutable Mouse to App::Benchmark::Accessors; I've also updated the benchmarks page.
Tags: benchmarks, Moose.
posted at: 13:14 | path: /moose | permalink | 0 comments | 0 trackbacks
2009年02月15日
Dissecting the Moose Part 5 - Accessor Generator Benchmarks Updated
The results of the accessor generator benchmarks generated by App::Benchmark::Accessors now have a permanent page. I will update the page from time to time as new versions of the accessor generators are released.
Tags: benchmarks, Moose.
posted at: 21:12 | path: /moose | permalink | 0 comments | 0 trackbacks
2008年11月04日
Persuading the Cloud to run your benchmarks
I've had quite a bit of feedback about the accessor generator benchmarks; mostly requests to add this or that module to the benchmarks. Adding the code, rerunning the benchmarks and updating the blog entries took quite a bit of time, so I thought I could automate the process.
I've made the accessor generator benchmarks into a CPAN distribution called App::Benchmark::Accessors. That way you can run them yourself and the prerequisite modules will install automatically as well (if you use the CPAN shell).
At first I thought about simply having the benchmark programs in the bin/ directory, but then Ricardo Signes (rjbs) had a really cool idea: Run the benchmarks as part of the test suite and report them in a way that doesn't interfere with the TAP output.
That way the CPAN testers will run your benchmarks for you and you will get nice results for many different platforms and different perl versions. For example, take a look at the benchmark reports for App::Benchmark::Accessors.
The foundation for running benchmarks in test suites is in App::Benchmark, so you can use that to easily write your own cloudy benchmarks. However, be nice to the CPAN testers and don't run benchmarks that will take forever or use all of their machines' resources.
Tags: benchmarks.
posted at: 14:12 | path: /dev | permalink | 0 comments | 0 trackbacks