I’m excited to announce that peruse 0.3.0 is now available on CRAN!

Install peruse from CRAN with:

install.packages("peruse")

Alternatively, if you need the development version from GitHub, install it with:

devtools::install_github("jacgoldsm/peruse")

Changes

Release 0.3.0 has significant changes to the way that Iterators are implemented, but almost all existing code will still work. In addition, the new implementation will allow for more flexibility in the code that you can write. The two main changes are:

Example of New Functionality

Suppose we want to investigate the question of how many trials it takes for a random walk with drift to reach a given threshold. We know that this would follow a Negative Binomial distribution, but how could we use the Iterator to look at this empirically in a way that easily allows us to adjust the drift term and see how the result changes? We might do something like this:

p_success <- 0.5
threshold <- 100

expr <- "
          set.seed(seeds[.iter])
          n <- n + sample(c(1,-1), 1, prob = c(p_success, 1 - p_success))
        "
iter <- Iterator(expr, list(n = 0, seeds = 1000:1e6), n)
sequence <- yield_while(iter, "n <= threshold")

plot(sequence, main = "How many iterations does it take to get to 100?")
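For readers who want to see roughly what this loop amounts to, here is a minimal base R sketch of the same idea. It is not how peruse implements Iterators. The helper name walk_until and its max_steps guard are hypothetical, and it sets the seed once instead of once per iteration as the expression above does.

```r
# Hypothetical helper (base R only, not part of peruse):
# take +1/-1 steps until n exceeds `threshold`, recording every value of n.
walk_until <- function(p_success, threshold, seed = 1000, max_steps = 1e6) {
  set.seed(seed)  # seeded once here; the Iterator example reseeds every step
  n <- 0
  path <- numeric(0)
  while (n <= threshold && length(path) < max_steps) {
    n <- n + sample(c(1, -1), 1, prob = c(p_success, 1 - p_success))
    path <- c(path, n)  # keep the value reached after this step
  }
  path
}

path <- walk_until(p_success = 0.75, threshold = 20)
length(path)  # number of steps it took to pass the threshold
```

Changing p_success in the call is the base R analogue of adjusting the drift term; the Iterator version keeps the same flexibility without writing the loop by hand.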

How would we apply this same function to a grid of probabilities? We could do something like this:

probs <- seq(0.5,0.95, by = 0.01)
exprs <- rep(NA, length(probs))
num_iter <- rep(NA, length(probs))
threshold <- 20
seeds <- 1000:1e6

for (i in seq_along(probs)) {
  exprs[i] <- glue::glue(
    "
      set.seed(seeds[.iter])
      n <- n + sample(c(1,-1), 1, prob = c({probs[i]}, 1 - {probs[i]}))
    "
    )

  iter <- Iterator(exprs[i],
                   list(n = 0),
                   yield = n)
  num_iter[i] <- length(yield_while(iter, "n <= threshold"))
}

plot(x = probs,
     y = log(num_iter),
     main = "Probability of Success vs How long it takes to get to 20 (Log Scale)",
     xlab = "Probability of Success",
     ylab = "Log Number of Iterations")

This illustrates a few useful features of Iterators:

  • We can use environment variables in either our expression or our while condition to represent constants. In this case, threshold doesn’t change between iterations or between parameters. If you are creating many Iterators, it can be faster to use environment variables, since you don’t have to make a new object for each new Iterator.

  • We can use glue::glue() to generate a range of expressions that we can then fill in to create an Iterator with a range of parameters.

  • We can refer to the current iteration number in yield_while(), yield_more(), or their silent variants with the environment variable .iter.
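To make the glue::glue() point concrete, here is a small sketch of what a single filled-in expression looks like and how it can be evaluated as ordinary R code. The names p and expr_text are illustrative only, and the eval(parse()) call is just a way to show that the generated string is valid R; it is not a claim about how Iterator evaluates it internally.

```r
# Fill a probability into a code template with glue (assumes glue is installed).
p <- 0.75
expr_text <- glue::glue(
  "n <- n + sample(c(1, -1), 1, prob = c({p}, 1 - {p}))"
)

# The braces are replaced by the value of p, leaving plain R code as text.
cat(expr_text, "\n")

# Evaluate the generated code once, starting from n = 0.
set.seed(1)
n <- 0
eval(parse(text = as.character(expr_text)))
n  # either 1 or -1 after a single step
```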

Basic Introduction

peruse has two main distinct capabilities, related by the idea that they ‘peruse’ a sequence:

Iterator

In R, sequences are normally represented as atomic vectors. For example, here is how we might represent a weighted sequence of 50 1s and -1s, with 1 having 75% probability and -1 having 25% probability:

sample(c(-1L, 1L), size = 50L, prob = c(0.25, 0.75), replace = TRUE)
#>  [1]  1  1  1  1 -1  1  1  1  1  1 -1  1  1 -1  1  1  1 -1 -1 -1  1  1 -1  1  1
#> [26]  1  1 -1  1 -1  1  1  1  1  1  1  1 -1  1  1 -1  1 -1  1  1 -1  1 -1  1 -1

From the perspective of the R user, all these values are generated at once. This brings up two issues:

The Iterator object in peruse is made to solve these problems. For example, suppose we want to simulate a random walk with drift that has two end conditions: success is if/when it reaches 50, and failure is if/when it reaches -50. To be efficient, we want to stop the simulation when the sequence reaches either of the end conditions.

expr <-  "
           set.seed(seeds[.iter])
           n <- n + sample(c(-1L, 1L), size = 1L, prob = c(0.25, 0.75))
         "
rwd <- Iterator(result = expr,
                initial = list(n = 0, seeds = 1:1e3),
                yield = n)


Value <- yield_while(rwd, "n != 50L & n != -50L")

plot(Value, main = "The Value of the Iterator after a Given Number of Iterations")
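For contrast, here is a sketch of the eager approach in base R, where the whole sequence is generated up front and trimmed afterwards. The length of 1000 steps is an arbitrary guess made for this illustration: too small and the walk may never reach an end condition, too large and most of the draws are wasted.

```r
# Eager version: draw a fixed number of steps first, then look for the stop point.
set.seed(1)
steps <- sample(c(-1L, 1L), size = 1000L, replace = TRUE, prob = c(0.25, 0.75))
walk <- cumsum(steps)

# Index of the first time the walk touches either end condition (NA if never).
hit <- which(walk == 50L | walk == -50L)[1]

# Keep only the part of the walk up to the stopping point, if there is one.
walk_trimmed <- if (is.na(hit)) walk else walk[seq_len(hit)]
length(walk_trimmed)
```

Everything after position hit was computed and then thrown away, which is the inefficiency the Iterator avoids by stopping as soon as the condition fails.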

This scenario illustrates the capabilities of the Iterator:

Set Building

peruse provides a simple API for set comprehension. R already makes it easy to build simple sets, like getting all the even numbers from 1 to 100:

(1:100)[which(1:100 %% 2 == 0)]
#>  [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38
#> [20]  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76
#> [39]  78  80  82  84  86  88  90  92  94  96  98 100

But more complex sets require comparing a set of elements to another set and only including an element if it matches a condition. For example, the set of prime numbers can be written as \(\{\, i \in \mathbb{N},\ i > 1 \mid \forall m \in \mathbb{N} \setminus \{1, i\},\ i \not\equiv 0 \pmod{m} \,\}\). How do we represent that in R? The set-builder API can help!

Here, we use set comprehension to generate the prime numbers between 1 and 100:

2:100 %>%
    that_for_all(range(2, .x)) %>%
    we_have(~.x %% .y != 0)
#>  [1]  2  3  5  7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
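As a point of comparison, the same set can be sketched in base R with Filter(). This is only an illustration of the definition, not peruse's implementation; the names candidates, is_prime, and primes_base are made up for this example.

```r
candidates <- 2:100

# x is kept if no m in 2..(x - 1) divides it evenly.
is_prime <- function(x) {
  divisors <- setdiff(seq_len(x), c(1L, x))  # drop 1 and x itself
  all(x %% divisors != 0)                    # vacuously TRUE when x is 2
}

primes_base <- Filter(is_prime, candidates)
primes_base
```

The set-builder pipeline expresses the same condition, with .x playing the role of x and .y the role of each divisor.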

This doesn’t, however, help us if we want to generate a certain number of prime numbers, regardless of what interval they are in. Of course, we could generate a vector and then subset it, but that would be inefficient! We want to only generate what we need.
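In plain base R, one way to sketch "generate only what we need" is a loop that tests successive integers and stops once enough primes have been found. The helper first_primes is hypothetical and exists only for this illustration; it checks each candidate against the primes already collected.

```r
# Hypothetical helper (base R only): collect the first k primes.
first_primes <- function(k) {
  found <- integer(0)
  candidate <- 2L
  while (length(found) < k) {
    # A number is prime if no smaller prime divides it evenly.
    if (all(candidate %% found != 0)) {
      found <- c(found, candidate)
    }
    candidate <- candidate + 1L
  }
  found
}

first_primes(10)
```

The loop never needs an upper bound on the interval, but the stopping logic and the bookkeeping are written by hand each time.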

We can bring together the set-builder and Iterator capabilities to do that, for example with the first 100 primes:

# 10,000 is just a number that we can be pretty sure is sufficiently high
primes <- 2:10000 %>%
          that_for_all(range(2, .x)) %>%
          we_have(~.x %% .y != 0, "Iterator")


sequence <- yield_more(primes, 100)

sequence
#>   [1]   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61
#>  [19]  67  71  73  79  83  89  97 101 103 107 109 113 127 131 137 139 149 151
#>  [37] 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251
#>  [55] 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359
#>  [73] 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463
#>  [91] 467 479 487 491 499 503 509 521 523 541

This illustrates a few things:

Feedback welcome!

peruse is a new package and needs help! If you run into a bug or think of a new feature that would work well in peruse, please open an issue.

Acknowledgments

Big thank you to Hadley Wickham, from whose book Advanced R I learned to do the stuff I did in the package, and whose book R Packages was invaluable in getting peruse published on CRAN. Also, the piped set builder workflow was made possible by the magrittr pipe, so thank you to the developers: Stefan Milton Bache and Hadley Wickham.