Testing array_rand
While testing a script I wrote that picks an output link at random (for split testing), I began to suspect that the array_rand function wasn't quite as random as I'd hoped. Since I had just upgraded my servers to PHP 7.3, I wondered whether the randomization had changed, so I decided to test it.
First, I created a simple script that emulates the array format and selection method I was using in my other script. Then I ran the selection 10,000 times in a loop and printed the tallies, using the script below:
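<?php
// Four top-level test groups; test2 holds four URLs, the others one each.
$arr['test1']['urls'][] = 'http://url1.com';
$arr['test2']['urls'][] = 'http://url2.com';
$arr['test2']['urls'][] = 'http://url3.com';
$arr['test2']['urls'][] = 'http://url4.com';
$arr['test2']['urls'][] = 'http://url5.com';
$arr['test3']['urls'][] = 'http://url6.com';
$arr['test4']['urls'][] = 'http://url7.com';

// Tallies, initialised up front so the increments below don't raise undefined-index notices.
$key_selected = [];
$url_selected = [];

for ($i = 0; $i < 10000; $i++) {
    // Pick a group at random, then pick a URL at random within that group.
    $key1 = array_rand($arr, 1);
    $url = $arr[$key1]['urls'][array_rand($arr[$key1]['urls'], 1)];
    $key_selected[$key1] = ($key_selected[$key1] ?? 0) + 1;
    $url_selected[$url] = ($url_selected[$url] ?? 0) + 1;
}

echo "<pre>\n\$key_selected:<br>\n";
print_r($key_selected);
echo "<br>\n\$url_selected:<br>\n";
print_r($url_selected);
echo "</pre>";
?>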
From my own observations of a dozen or so clicks, I kept seeing url1.com showing up much more frequently than the others, which made me wonder just how random this actually was. After viewing the results, I was pleasantly surprised to find that with a larger sample it does indeed appear to be functioning as one would expect:
$key_selected:
Array
(
[test2] => 2455
[test1] => 2466
[test3] => 2523
[test4] => 2556
)
$url_selected:
Array
(
[http://url5.com] => 618
[http://url1.com] => 2466
[http://url2.com] => 624
[http://url6.com] => 2523
[http://url4.com] => 620
[http://url7.com] => 2556
[http://url3.com] => 593
)
From the data above I can see that the top-level keys (test1 through test4) are being chosen roughly uniformly, with each count within about +/-2% of what would be expected (~2,500 each).
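To put a number on that, here is a minimal sketch that takes the key counts from the output above and works out each key's percentage deviation from the expected 2,500. It also computes a chi-square statistic, which is my own addition and not part of the original test. The variable names are just for illustration.

```php
<?php
// Counts copied from the $key_selected output above.
$key_selected = ['test2' => 2455, 'test1' => 2466, 'test3' => 2523, 'test4' => 2556];

$total    = array_sum($key_selected);        // 10000 draws in total
$expected = $total / count($key_selected);   // 2500 per key if the pick is uniform

$deviation  = [];
$chi_square = 0.0;
foreach ($key_selected as $key => $count) {
    // Signed deviation from the expected count, as a percentage.
    $deviation[$key] = round(($count - $expected) / $expected * 100, 2);
    // Chi-square goodness-of-fit term: (observed - expected)^2 / expected.
    $chi_square += (($count - $expected) ** 2) / $expected;
}

print_r($deviation);                          // -1.8, -1.36, 0.92, 2.24
printf("chi-square: %.4f\n", $chi_square);    // chi-square: 2.7384
```

With four categories there are 3 degrees of freedom, and the usual 5% critical value for that is 7.815, so a statistic of about 2.74 gives no reason to doubt that the keys are picked uniformly.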
The URL data also looks good: url1, url6, and url7 each came in around 2,500, while url2 through url5, which all sit under the key test2, came in between 593 and 624 each and add up to that key's total of 2,455.
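The expected per-URL counts follow from the two-step selection: each key has a 1-in-4 chance, and each URL then has a 1-in-n chance within its key. A small sketch, assuming both picks are uniform and using the same array layout as my test script (written with array literals here for brevity):

```php
<?php
// Same structure as the test script: four keys, with test2 holding four URLs.
$arr = [
    'test1' => ['urls' => ['http://url1.com']],
    'test2' => ['urls' => ['http://url2.com', 'http://url3.com', 'http://url4.com', 'http://url5.com']],
    'test3' => ['urls' => ['http://url6.com']],
    'test4' => ['urls' => ['http://url7.com']],
];

$draws    = 10000;
$expected = [];
foreach ($arr as $group) {
    // P(url) = P(key) * P(url | key), assuming both picks are uniform.
    $p = (1 / count($arr)) * (1 / count($group['urls']));
    foreach ($group['urls'] as $url) {
        $expected[$url] = $draws * $p;
    }
}

print_r($expected); // 2500 for url1, url6, url7; 625 for url2 through url5
```

So the four test2 URLs should each land near 625, which is in line with the 593 to 624 seen above.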
So, it appears as though my initial concerns were unfounded, and I simply needed to look at a larger sample size.
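To illustrate why a dozen clicks can be so misleading, here is a rough sketch that treats each click as an independent draw and treats the number of times one key comes up as a binomial count. That model and the helper name binomial_spread are my own assumptions for illustration, not something from the original test.

```php
<?php
// Hypothetical helper: mean and standard deviation of a binomial count
// for $n independent draws, each hitting with probability $p.
function binomial_spread(int $n, float $p): array
{
    $mean = $n * $p;
    $sd   = sqrt($n * $p * (1 - $p));
    return ['mean' => $mean, 'sd' => $sd, 'relative_sd_pct' => $sd / $mean * 100];
}

$p = 0.25; // chance of any one of the four top-level keys
foreach ([12, 10000] as $n) {
    $s = binomial_spread($n, $p);
    printf("n=%d: expect %.1f +/- %.1f (%.1f%%)\n", $n, $s['mean'], $s['sd'], $s['relative_sd_pct']);
}
// n=12: expect 3.0 +/- 1.5 (50.0%)
// n=10000: expect 2500.0 +/- 43.3 (1.7%)
```

With 12 clicks, one standard deviation is half the expected count, so seeing one URL five or six times is not unusual. At 10,000 draws the spread shrinks to under 2%, which matches the variation in the results above.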