Hi! Please consider following me on twitter: @hanekomu.
2009年02月23日
Perl benchmarks: Nested data structures
The conclusion first: Nested arrays and hashes in Perl are slow. They get slower the more levels you nest. For most applications this won't be a problem, but if you cache values in a hash, you might want to use as few levels of nesting as possible.
First, let's benchmark reading from and writing to nested array elements. We use different arrays for six different levels of nesting, and all element indices are constants.
use Benchmark qw(:all); our (@array1, @array2, @array3, @array4, @array5, @array6); cmpthese(timethese(10_000_000, { L1w => sub { $array1[1] = 1 }, L2w => sub { $array2[1][2] = 1 }, L3w => sub { $array3[1][2][3] = 1 }, L4w => sub { $array4[1][2][3][4] = 1 }, L5w => sub { $array5[1][2][3][4][5] = 1 }, L6w => sub { $array6[1][2][3][4][5][6] = 1 }, L1r => sub { $array1[1] }, L2r => sub { $array2[1][2] }, L3r => sub { $array3[1][2][3] }, L4r => sub { $array4[1][2][3][4] }, L5r => sub { $array5[1][2][3][4][5] }, L6r => sub { $array6[1][2][3][4][5][6] }, }));
The results:
Benchmark: timing 10000000 iterations of L1r, L1w, L2r, L2w, L3r, L3w, L4r, L4w, L5r, L5w, L6r, L6w...
L1r: 0 wallclock secs ( 0.20 usr + -0.00 sys = 0.20 CPU) @ 50000000.00/s (n=10000000)
(warning: too few iterations for a reliable count)
L1w: 1 wallclock secs ( 0.72 usr + 0.01 sys = 0.73 CPU) @ 13698630.14/s (n=10000000)
L2r: 1 wallclock secs ( 1.19 usr + 0.01 sys = 1.20 CPU) @ 8333333.33/s (n=10000000)
L2w: 1 wallclock secs ( 2.01 usr + 0.02 sys = 2.03 CPU) @ 4926108.37/s (n=10000000)
L3r: 2 wallclock secs ( 1.82 usr + 0.02 sys = 1.84 CPU) @ 5434782.61/s (n=10000000)
L3w: 3 wallclock secs ( 2.67 usr + 0.03 sys = 2.70 CPU) @ 3703703.70/s (n=10000000)
L4r: 3 wallclock secs ( 2.63 usr + 0.01 sys = 2.64 CPU) @ 3787878.79/s (n=10000000)
L4w: 4 wallclock secs ( 3.54 usr + 0.02 sys = 3.56 CPU) @ 2808988.76/s (n=10000000)
L5r: 3 wallclock secs ( 3.04 usr + 0.00 sys = 3.04 CPU) @ 3289473.68/s (n=10000000)
L5w: 4 wallclock secs ( 4.13 usr + 0.04 sys = 4.17 CPU) @ 2398081.53/s (n=10000000)
L6r: 4 wallclock secs ( 3.76 usr + 0.01 sys = 3.77 CPU) @ 2652519.89/s (n=10000000)
L6w: 5 wallclock secs ( 4.46 usr + 0.02 sys = 4.48 CPU) @ 2232142.86/s (n=10000000)
Rate L6w L5w L6r L4w L5r L3w L4r L2w L3r L2r L1w L1r
L6w 2232143/s -- -7% -16% -21% -32% -40% -41% -55% -59% -73% -84% -96%
L5w 2398082/s 7% -- -10% -15% -27% -35% -37% -51% -56% -71% -82% -95%
L6r 2652520/s 19% 11% -- -6% -19% -28% -30% -46% -51% -68% -81% -95%
L4w 2808989/s 26% 17% 6% -- -15% -24% -26% -43% -48% -66% -79% -94%
L5r 3289474/s 47% 37% 24% 17% -- -11% -13% -33% -39% -61% -76% -93%
L3w 3703704/s 66% 54% 40% 32% 13% -- -2% -25% -32% -56% -73% -93%
L4r 3787879/s 70% 58% 43% 35% 15% 2% -- -23% -30% -55% -72% -92%
L2w 4926108/s 121% 105% 86% 75% 50% 33% 30% -- -9% -41% -64% -90%
L3r 5434783/s 143% 127% 105% 93% 65% 47% 43% 10% -- -35% -60% -89%
L2r 8333333/s 273% 248% 214% 197% 153% 125% 120% 69% 53% -- -39% -83%
L1w 13698630/s 514% 471% 416% 388% 316% 270% 262% 178% 152% 64% -- -73%
L1r 50000000/s 2140% 1985% 1785% 1680% 1420% 1250% 1220% 915% 820% 500% 265% --
Next we do the same for hashes, again with constant hash keys:
use Benchmark qw(:all); our (%hash1, %hash2, %hash3, %hash4, %hash5, %hash6); cmpthese(timethese(10_000_000, { L1w => sub { $hash1{L1} = 1 }, L2w => sub { $hash2{L1}{L2} = 1 }, L3w => sub { $hash3{L1}{L2}{L3} = 1 }, L4w => sub { $hash4{L1}{L2}{L3}{L4} = 1 }, L5w => sub { $hash5{L1}{L2}{L3}{L4}{L5} = 1 }, L6w => sub { $hash6{L1}{L2}{L3}{L4}{L5}{L6} = 1 }, L1r => sub { $hash1{L1} }, L2r => sub { $hash2{L1}{L2} }, L3r => sub { $hash3{L1}{L2}{L3} }, L4r => sub { $hash4{L1}{L2}{L3}{L4} }, L5r => sub { $hash5{L1}{L2}{L3}{L4}{L5} }, L6r => sub { $hash6{L1}{L2}{L3}{L4}{L5}{L6} }, }));
The results:
Benchmark: timing 10000000 iterations of L1r, L1w, L2r, L2w, L3r, L3w, L4r, L4w, L5r, L5w, L6r, L6w...
L1r: 0 wallclock secs ( 0.63 usr + -0.00 sys = 0.63 CPU) @ 15873015.87/s (n=10000000)
L1w: 0 wallclock secs ( 1.53 usr + 0.01 sys = 1.54 CPU) @ 6493506.49/s (n=10000000)
L2r: 2 wallclock secs ( 1.58 usr + -0.00 sys = 1.58 CPU) @ 6329113.92/s (n=10000000)
L2w: 2 wallclock secs ( 2.35 usr + 0.02 sys = 2.37 CPU) @ 4219409.28/s (n=10000000)
L3r: 3 wallclock secs ( 2.58 usr + 0.01 sys = 2.59 CPU) @ 3861003.86/s (n=10000000)
L3w: 4 wallclock secs ( 3.67 usr + 0.02 sys = 3.69 CPU) @ 2710027.10/s (n=10000000)
L4r: 4 wallclock secs ( 3.07 usr + 0.02 sys = 3.09 CPU) @ 3236245.95/s (n=10000000)
L4w: 5 wallclock secs ( 4.69 usr + 0.01 sys = 4.70 CPU) @ 2127659.57/s (n=10000000)
L5r: 5 wallclock secs ( 4.67 usr + 0.02 sys = 4.69 CPU) @ 2132196.16/s (n=10000000)
L5w: 5 wallclock secs ( 5.20 usr + 0.02 sys = 5.22 CPU) @ 1915708.81/s (n=10000000)
L6r: 6 wallclock secs ( 5.77 usr + 0.01 sys = 5.78 CPU) @ 1730103.81/s (n=10000000)
L6w: 7 wallclock secs ( 6.35 usr + 0.01 sys = 6.36 CPU) @ 1572327.04/s (n=10000000)
Rate L6w L6r L5w L4w L5r L3w L4r L3r L2w L2r L1w L1r
L6w 1572327/s -- -9% -18% -26% -26% -42% -51% -59% -63% -75% -76% -90%
L6r 1730104/s 10% -- -10% -19% -19% -36% -47% -55% -59% -73% -73% -89%
L5w 1915709/s 22% 11% -- -10% -10% -29% -41% -50% -55% -70% -70% -88%
L4w 2127660/s 35% 23% 11% -- -0% -21% -34% -45% -50% -66% -67% -87%
L5r 2132196/s 36% 23% 11% 0% -- -21% -34% -45% -49% -66% -67% -87%
L3w 2710027/s 72% 57% 41% 27% 27% -- -16% -30% -36% -57% -58% -83%
L4r 3236246/s 106% 87% 69% 52% 52% 19% -- -16% -23% -49% -50% -80%
L3r 3861004/s 146% 123% 102% 81% 81% 42% 19% -- -8% -39% -41% -76%
L2w 4219409/s 168% 144% 120% 98% 98% 56% 30% 9% -- -33% -35% -73%
L2r 6329114/s 303% 266% 230% 197% 197% 134% 96% 64% 50% -- -3% -60%
L1w 6493506/s 313% 275% 239% 205% 205% 140% 101% 68% 54% 3% -- -59%
L1r 15873016/s 910% 817% 729% 646% 644% 486% 390% 311% 276% 151% 144% --
I hadn't expected the differences to be so dramatic...
Tags: benchmarks.
posted at: 16:38 | path: /benchmarks | permalink | 5 comments | 0 trackbacks