Revisions to How to efficiently generate large, uniformly distributed, random integers in bash?

replaced http://unix.stackexchange.com/ with https://unix.stackexchange.com/

Source Link

edited Apr 13, 2017 at 12:36

1

All ideas by the other answerers were great. The answers by terdon terdon, J.F. Sebastian J.F. Sebastian, and jimmij jimmij used external tools to do the task in a simple and efficient manner. However, I preferred a true bash solution for maximum portability, and maybe a little bit, simply out of love for bash ;)

Ramesh Ramesh's and l0b0 l0b0's answers used /dev/urandom or /dev/random in combination with od. That's good, however, their approaches had the disadvantage of only being able to sample random integers in the range 0 to 2⁸ⁿ-1 for some n, since this method samples bytes, i.e., bitstrings of length 8. These are quite big jumps with increasing n.

Finally, Falco Falco's answer describes the general idea how this could be done for arbitrary ranges (not only powers of two). Basically, for a given range {0..max}, we can determine what the next power of two is, i.e., exactly how many bits are required to represent max as a bitstring. Then we can sample just that many bits and see whether this bistring, as an integer, is greater than max. If so, repeat. Since we sample just as many bits as are required to represent max, each iteration has a probability greater or equal than 50% of succeeding (50% in the worst case, 100% in the best case). So this is very efficient.

All ideas by the other answerers were great. The answers by terdon, J.F. Sebastian, and jimmij used external tools to do the task in a simple and efficient manner. However, I preferred a true bash solution for maximum portability, and maybe a little bit, simply out of love for bash ;)

Ramesh's and l0b0's answers used /dev/urandom or /dev/random in combination with od. That's good, however, their approaches had the disadvantage of only being able to sample random integers in the range 0 to 2⁸ⁿ-1 for some n, since this method samples bytes, i.e., bitstrings of length 8. These are quite big jumps with increasing n.

Finally, Falco's answer describes the general idea how this could be done for arbitrary ranges (not only powers of two). Basically, for a given range {0..max}, we can determine what the next power of two is, i.e., exactly how many bits are required to represent max as a bitstring. Then we can sample just that many bits and see whether this bistring, as an integer, is greater than max. If so, repeat. Since we sample just as many bits as are required to represent max, each iteration has a probability greater or equal than 50% of succeeding (50% in the worst case, 100% in the best case). So this is very efficient.

All ideas by the other answerers were great. The answers by terdon, J.F. Sebastian, and jimmij used external tools to do the task in a simple and efficient manner. However, I preferred a true bash solution for maximum portability, and maybe a little bit, simply out of love for bash ;)

Ramesh's and l0b0's answers used /dev/urandom or /dev/random in combination with od. That's good, however, their approaches had the disadvantage of only being able to sample random integers in the range 0 to 2⁸ⁿ-1 for some n, since this method samples bytes, i.e., bitstrings of length 8. These are quite big jumps with increasing n.

Finally, Falco's answer describes the general idea how this could be done for arbitrary ranges (not only powers of two). Basically, for a given range {0..max}, we can determine what the next power of two is, i.e., exactly how many bits are required to represent max as a bitstring. Then we can sample just that many bits and see whether this bistring, as an integer, is greater than max. If so, repeat. Since we sample just as many bits as are required to represent max, each iteration has a probability greater or equal than 50% of succeeding (50% in the worst case, 100% in the best case). So this is very efficient.

replaced http://askubuntu.com/ with https://askubuntu.com/

Source Link

edited Apr 13, 2017 at 12:22

Community Bot

1

My script is basically a concrete implementation of Falco's answer, written in pure bash and highly efficient since it uses bash's built-in bitwise operations to sample bitstrings of the desired length. It additionally honors an idea by Eliah Kagan Eliah Kagan that suggests to use the built-in $RANDOM variable by concatening bitstrings resulting from repeated invocations of $RANDOM. I actually implemented both the possibilities to use /dev/urandom and $RANDOM. By default the above script uses $RANDOM. (And ok, if using /dev/urandom we need od and tr, but these are backed by POSIX.)

This uses the idea idea mentioned by Eliah Kagan. Basically, since $RANDOM samples a 15-bit integer, we can use $((RANDOM<<15|RANDOM)) to sample a 30-bit integer. That means, shift a first invocation of $RANDOM by 15 bits to the left, and apply a bitwise or with a second invocation of $RANDOM, effectively concatening two independently sampled bitstrings (or at least as independent as bash's built-in $RANDOM goes).

Inspired by this this, I might try to use dieharder to test and benchmark this PRNG, and put my findings in here. :-)

My script is basically a concrete implementation of Falco's answer, written in pure bash and highly efficient since it uses bash's built-in bitwise operations to sample bitstrings of the desired length. It additionally honors an idea by Eliah Kagan that suggests to use the built-in $RANDOM variable by concatening bitstrings resulting from repeated invocations of $RANDOM. I actually implemented both the possibilities to use /dev/urandom and $RANDOM. By default the above script uses $RANDOM. (And ok, if using /dev/urandom we need od and tr, but these are backed by POSIX.)

This uses the idea mentioned by Eliah Kagan. Basically, since $RANDOM samples a 15-bit integer, we can use $((RANDOM<<15|RANDOM)) to sample a 30-bit integer. That means, shift a first invocation of $RANDOM by 15 bits to the left, and apply a bitwise or with a second invocation of $RANDOM, effectively concatening two independently sampled bitstrings (or at least as independent as bash's built-in $RANDOM goes).

Inspired by this, I might try to use dieharder to test and benchmark this PRNG, and put my findings in here. :-)

My script is basically a concrete implementation of Falco's answer, written in pure bash and highly efficient since it uses bash's built-in bitwise operations to sample bitstrings of the desired length. It additionally honors an idea by Eliah Kagan that suggests to use the built-in $RANDOM variable by concatening bitstrings resulting from repeated invocations of $RANDOM. I actually implemented both the possibilities to use /dev/urandom and $RANDOM. By default the above script uses $RANDOM. (And ok, if using /dev/urandom we need od and tr, but these are backed by POSIX.)

This uses the idea mentioned by Eliah Kagan. Basically, since $RANDOM samples a 15-bit integer, we can use $((RANDOM<<15|RANDOM)) to sample a 30-bit integer. That means, shift a first invocation of $RANDOM by 15 bits to the left, and apply a bitwise or with a second invocation of $RANDOM, effectively concatening two independently sampled bitstrings (or at least as independent as bash's built-in $RANDOM goes).

Inspired by this, I might try to use dieharder to test and benchmark this PRNG, and put my findings in here. :-)

fixed a small bug when using `/dev/urandom`, as mentioned in a comment by @J.F.Sebastian

Source Link

edited Sep 27, 2014 at 18:27

Malte Skoruppa

1.7k
1
15
20

#!/usr/bin/env bash
#
# Generates a random integer in a given range

# computes the ceiling of log2
# i.e., for parameter x returns the lowest integer l such that 2**l >= x
log2() {
  local x=$1 n=1 l=0
  while (( x>n && n>0 ))
  do
    let n*=2 l++
  done
  echo $l
}

# uses $RANDOM to generate an n-bit random bitstring uniformly at random
#  (if we assume $RANDOM is uniformly distributed)
# takes the length n of the bitstring as parameter, n can be up to 60 bits
get_n_rand_bits() {
  local n=$1 rnd=$RANDOM rnd_bitlen=15
  while (( rnd_bitlen < n ))
  do
    rnd=$(( rnd<<15|$RANDOM ))
    let rnd_bitlen+=15
  done
  echo $(( rnd>>(rnd_bitlen-n) ))
}

# alternative implementation of get_n_rand_bits:
# uses /dev/urandom to generate an n-bit random bitstring uniformly at random
#  (if we assume /dev/urandom is uniformly distributed)
# takes the length n of the bitstring as parameter, n can be up to 56 bits
get_n_rand_bits_alt() {
  local n=$1
  local nb_bytes=$(( (n+7)/8 ))
  local rnd=$(od --read-bytes=$nb_bytes --address-radix=n --format=uformat=uL /dev/urandom | tr --delete " ")
  echo $(( rnd>>(nb_bytes*8-n) ))
}

# for parameter max, generates an integer in the range {0..max} uniformly at random
# max can be an arbitrary integer, needs not be a power of 2
rand() {
  local rnd max=$1
  # get number of bits needed to represent $max
  local bitlen=$(log2 $((max+1)))
  while
    # could use get_n_rand_bits_alt instead if /dev/urandom is preferred over $RANDOM
    rnd=$(get_n_rand_bits $bitlen)
    (( rnd > max ))
  do :
  done
  echo $rnd
}

# MAIN SCRIPT

# check number of parameters
if (( $# != 1 && $# != 2 ))
then
  cat <<EOF 1>&2
Usage: $(basename $0) [min] max

Returns an integer distributed uniformly at random in the range {min..max}
min defaults to 0
(max - min) can be up to 2**60-1  
EOF
  exit 1
fi

# If we have one parameter, set min to 0 and max to $1
# If we have two parameters, set min to $1 and max to $2
max=0
while (( $# > 0 ))
do
  min=$max
  max=$1
  shift
done

# ensure that min <= max
if (( min > max ))
then
  echo "$(basename $0): error: min is greater than max" 1>&2
  exit 1
fi

# need absolute value of diff since min (and also max) may be negative
diff=$((max-min)) && diff=${diff#-}

echo $(( $(rand $diff) + min ))

get_n_rand_bits_alt() {
  local n=$1
  local nb_bytes=$(( (n+7)/8 ))
  local rnd=$(od --read-bytes=$nb_bytes --address-radix=n --format=uformat=uL /dev/urandom | tr --delete " ")
  echo $(( rnd>>(nb_bytes*8-n) ))
}

#!/usr/bin/env bash
#
# Generates a random integer in a given range

# computes the ceiling of log2
# i.e., for parameter x returns the lowest integer l such that 2**l >= x
log2() {
  local x=$1 n=1 l=0
  while (( x>n && n>0 ))
  do
    let n*=2 l++
  done
  echo $l
}

# uses $RANDOM to generate an n-bit random bitstring uniformly at random
#  (if we assume $RANDOM is uniformly distributed)
# takes the length n of the bitstring as parameter, n can be up to 60 bits
get_n_rand_bits() {
  local n=$1 rnd=$RANDOM rnd_bitlen=15
  while (( rnd_bitlen < n ))
  do
    rnd=$(( rnd<<15|$RANDOM ))
    let rnd_bitlen+=15
  done
  echo $(( rnd>>(rnd_bitlen-n) ))
}

# alternative implementation of get_n_rand_bits:
# uses /dev/urandom to generate an n-bit random bitstring uniformly at random
#  (if we assume /dev/urandom is uniformly distributed)
# takes the length n of the bitstring as parameter, n can be up to 56 bits
get_n_rand_bits_alt() {
  local n=$1
  local nb_bytes=$(( (n+7)/8 ))
  local rnd=$(od --read-bytes=$nb_bytes --address-radix=n --format=u /dev/urandom | tr --delete " ")
  echo $(( rnd>>(nb_bytes*8-n) ))
}

# for parameter max, generates an integer in the range {0..max} uniformly at random
# max can be an arbitrary integer, needs not be a power of 2
rand() {
  local rnd max=$1
  # get number of bits needed to represent $max
  local bitlen=$(log2 $((max+1)))
  while
    # could use get_n_rand_bits_alt instead if /dev/urandom is preferred over $RANDOM
    rnd=$(get_n_rand_bits $bitlen)
    (( rnd > max ))
  do :
  done
  echo $rnd
}

# MAIN SCRIPT

# check number of parameters
if (( $# != 1 && $# != 2 ))
then
  cat <<EOF 1>&2
Usage: $(basename $0) [min] max

Returns an integer distributed uniformly at random in the range {min..max}
min defaults to 0
(max - min) can be up to 2**60-1  
EOF
  exit 1
fi

# If we have one parameter, set min to 0 and max to $1
# If we have two parameters, set min to $1 and max to $2
max=0
while (( $# > 0 ))
do
  min=$max
  max=$1
  shift
done

# ensure that min <= max
if (( min > max ))
then
  echo "$(basename $0): error: min is greater than max" 1>&2
  exit 1
fi

# need absolute value of diff since min (and also max) may be negative
diff=$((max-min)) && diff=${diff#-}

echo $(( $(rand $diff) + min ))

get_n_rand_bits_alt() {
  local n=$1
  local nb_bytes=$(( (n+7)/8 ))
  local rnd=$(od --read-bytes=$nb_bytes --address-radix=n --format=u /dev/urandom | tr --delete " ")
  echo $(( rnd>>(nb_bytes*8-n) ))
}

#!/usr/bin/env bash
#
# Generates a random integer in a given range

# computes the ceiling of log2
# i.e., for parameter x returns the lowest integer l such that 2**l >= x
log2() {
  local x=$1 n=1 l=0
  while (( x>n && n>0 ))
  do
    let n*=2 l++
  done
  echo $l
}

# uses $RANDOM to generate an n-bit random bitstring uniformly at random
#  (if we assume $RANDOM is uniformly distributed)
# takes the length n of the bitstring as parameter, n can be up to 60 bits
get_n_rand_bits() {
  local n=$1 rnd=$RANDOM rnd_bitlen=15
  while (( rnd_bitlen < n ))
  do
    rnd=$(( rnd<<15|$RANDOM ))
    let rnd_bitlen+=15
  done
  echo $(( rnd>>(rnd_bitlen-n) ))
}

# alternative implementation of get_n_rand_bits:
# uses /dev/urandom to generate an n-bit random bitstring uniformly at random
#  (if we assume /dev/urandom is uniformly distributed)
# takes the length n of the bitstring as parameter, n can be up to 56 bits
get_n_rand_bits_alt() {
  local n=$1
  local nb_bytes=$(( (n+7)/8 ))
  local rnd=$(od --read-bytes=$nb_bytes --address-radix=n --format=uL /dev/urandom | tr --delete " ")
  echo $(( rnd>>(nb_bytes*8-n) ))
}

# for parameter max, generates an integer in the range {0..max} uniformly at random
# max can be an arbitrary integer, needs not be a power of 2
rand() {
  local rnd max=$1
  # get number of bits needed to represent $max
  local bitlen=$(log2 $((max+1)))
  while
    # could use get_n_rand_bits_alt instead if /dev/urandom is preferred over $RANDOM
    rnd=$(get_n_rand_bits $bitlen)
    (( rnd > max ))
  do :
  done
  echo $rnd
}

# MAIN SCRIPT

# check number of parameters
if (( $# != 1 && $# != 2 ))
then
  cat <<EOF 1>&2
Usage: $(basename $0) [min] max

Returns an integer distributed uniformly at random in the range {min..max}
min defaults to 0
(max - min) can be up to 2**60-1  
EOF
  exit 1
fi

# If we have one parameter, set min to 0 and max to $1
# If we have two parameters, set min to $1 and max to $2
max=0
while (( $# > 0 ))
do
  min=$max
  max=$1
  shift
done

# ensure that min <= max
if (( min > max ))
then
  echo "$(basename $0): error: min is greater than max" 1>&2
  exit 1
fi

# need absolute value of diff since min (and also max) may be negative
diff=$((max-min)) && diff=${diff#-}

echo $(( $(rand $diff) + min ))

get_n_rand_bits_alt() {
  local n=$1
  local nb_bytes=$(( (n+7)/8 ))
  local rnd=$(od --read-bytes=$nb_bytes --address-radix=n --format=uL /dev/urandom | tr --delete " ")
  echo $(( rnd>>(nb_bytes*8-n) ))
}

fixed a small bug in the definition of the function log2, which returned 1 for $(log2 1) instead of returning 0. This did not affect correctness of the script as a whole in any way, and is a rather cosmetic fix.

Source Link

edited Sep 27, 2014 at 17:55

Malte Skoruppa

1.7k
1
15
20

Loading

Source Link

answered Sep 26, 2014 at 23:25

Malte Skoruppa

1.7k
1
15
20

Loading

Stack Exchange Network

Return to Answer