RegalCoding.com

A Programming Blog


testing array_rand

While testing a script I had written that selects an output link at random (for split testing), I began to suspect that the array_rand function might not be quite as random as I'd hoped. Since I had just upgraded my servers to PHP 7.3, I wondered whether something in the randomization had changed, so I decided to test it.

First, I created a simple script that emulates the array format and selection method used in my other script. It runs the selection 10,000 times and then displays how often each key and each URL was chosen:

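<?php
// Two-level structure: pick a test key at random, then a URL within that key.
$arr = [];
$arr['test1']['urls'][] = 'http://url1.com';
$arr['test2']['urls'][] = 'http://url2.com';
$arr['test2']['urls'][] = 'http://url3.com';
$arr['test2']['urls'][] = 'http://url4.com';
$arr['test2']['urls'][] = 'http://url5.com';
$arr['test3']['urls'][] = 'http://url6.com';
$arr['test4']['urls'][] = 'http://url7.com';

// Tallies of how often each key and each URL is picked.
$key_selected = [];
$url_selected = [];

for ($i = 0; $i < 10000; $i++) {
	// array_rand($array, 1) returns a single random key.
	$key1 = array_rand($arr, 1);
	$url = $arr[$key1]['urls'][array_rand($arr[$key1]['urls'], 1)];
	// "?? 0" avoids "Undefined index" notices the first time a key or URL is counted.
	$key_selected[$key1] = ($key_selected[$key1] ?? 0) + 1;
	$url_selected[$url] = ($url_selected[$url] ?? 0) + 1;
}

echo "<pre>\n\$key_selected:<br>\n";
print_r($key_selected);
echo "<br>\n\$url_selected:<br>\n";
print_r($url_selected);
echo "</pre>";
?>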


From my own observations of a dozen or so clicks, I kept seeing url1.com showing up much more frequently than the others, which made me wonder just how random this actually was. After viewing the results, I was pleasantly surprised to find that with a larger sample it does indeed appear to be functioning as one would expect:

$key_selected:
Array
(
    [test2] => 2455
    [test1] => 2466
    [test3] => 2523
    [test4] => 2556
)


$url_selected:
Array
(
    [http://url5.com] => 618
    [http://url1.com] => 2466
    [http://url2.com] => 624
    [http://url6.com] => 2523
    [http://url4.com] => 620
    [http://url7.com] => 2556
    [http://url3.com] => 593
)


From the data above, I can see that the top-level keys (test1 to test4) are chosen about equally often: the counts range from 2455 to 2556, within roughly 2% of the expected 2500 each.
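As a rough check (a sketch, assuming each of the four keys is picked independently with probability 1/4 on every iteration), the count for any single key follows a binomial distribution, so both the expected count and its typical spread can be worked out directly:

```latex
\[
E = N p = 10000 \cdot \tfrac{1}{4} = 2500, \qquad
\sigma = \sqrt{N p (1 - p)} = \sqrt{10000 \cdot \tfrac{1}{4} \cdot \tfrac{3}{4}} = \sqrt{1875} \approx 43.3
\]
\[
\frac{2455 - 2500}{2500} = -1.8\%, \qquad
\frac{2556 - 2500}{2500} = +2.24\%
\]
```

The largest deviations (45 below and 56 above the expected count) are only about 1.0 and 1.3 standard deviations, which is well within what chance alone would produce.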

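The URL data also looks good. url1, url6, and url7 each came in around 2500; each is the only URL under its key (test1, test3, and test4), so their counts match those keys exactly. The remaining four URLs all belong to test2; each came in at roughly 600 (593 to 624), and together they add up to 2455, the count for test2.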
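For reference, the expected URL counts follow from multiplying the probabilities at the two levels (again assuming array_rand picks uniformly at each level): a key is chosen with probability 1/4, then a URL within that key with probability 1 over the number of URLs under it.

```latex
\[
P(\text{url1}) = P(\text{test1}) \cdot 1 = \tfrac{1}{4}, \qquad
P(\text{url2}) = P(\text{test2}) \cdot \tfrac{1}{4} = \tfrac{1}{16}
\]
\[
E[\text{url1}] = 10000 \cdot \tfrac{1}{4} = 2500, \qquad
E[\text{url2}] = 10000 \cdot \tfrac{1}{16} = 625
\]
```

The same holds for url6 and url7 (like url1) and for url3 to url5 (like url2). So url1, url6, and url7 are each expected about four times as often as any single test2 URL; that comes from the nested selection rather than from array_rand itself, and may be part of why url1.com seemed so common over a handful of clicks.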

So, it appears as though my initial concerns were unfounded, and I simply needed to look at a larger sample size.

Category: General Coding