R and probability noob here. I'm looking to create a histogram that shows the distribution of how many attempts it took to return a heads, repeated over 1000+ simulated runs on the equivalent of an unfairly weighted coin (0.1 heads, 0.9 tails).

From my understanding, this is not a geometric distribution or binomial distribution (but might make use of either of these to create the simulated results).

The real-world(ish) scenario I want to model is a speedrun of the game Zelda: Ocarina of Time. One of the goals in this speedrun is to obtain an item from a character who has a 1 in 10 chance of giving it to the player on each attempt, so the player stops attempting as soon as they receive the item. Every run, runners and viewers keep track of how many attempts it took to receive the item, as this affects the time it takes the runner to complete the game.

This is an example of what I'm looking to create:

(though with more detailed labels on the x axis if possible). For this, I manually flipped a virtual coin with a 1/10 chance of heads over and over. Once I got a successful result, I recorded how many attempts it took in a vector in R, and repeated this about 100 times. I then plotted this vector as a histogram to visualise the distribution of the number of attempts it usually takes to get a successful result. Basically, I'd like to automate this simulation instead of having to manually flip the virtual unfair coin, write down how many attempts it took before heads, and enter it into R myself.

  • Why do you think the geometric distribution is inappropriate? – duckmayr 2 days ago

I'm not sure if this is quite what you're looking for, but if you create a function for your manual coin flipping, you can just use replicate() to call it many times:

foo <- function(p = 0.1) {
    i <- 0
    failure <- TRUE
    while ( failure ) {
        i <- i + 1
        if ( sample(x = c(TRUE, FALSE), size = 1, prob = c(p, 1-p)) ) {
            failure <- FALSE
        }
    }
    return(i)
}

set.seed(42)
number_of_attempts <- replicate(1000, foo())
hist(number_of_attempts, xlab = "Number of Attempts Until First Success")

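Since the question also asks for more detailed x-axis labels, here is a minimal sketch of one way to get a labelled bar for every attempt count, using table() and barplot() instead of hist(). The helper name flip_until_heads and the plot settings are just illustrative choices, not anything from the question:

```r
# Hypothetical helper: flip a coin with success chance p until the first
# success and return the number of attempts that took.
flip_until_heads <- function(p = 0.1) {
    n <- 1
    while (runif(1) >= p) {   # runif(1) < p happens with probability p
        n <- n + 1
    }
    n
}

set.seed(42)
attempts <- replicate(1000, flip_until_heads())

# Count how many runs needed 1, 2, 3, ... attempts. Going through factor()
# keeps attempt counts that never occurred as zero-height bars.
counts <- table(factor(attempts, levels = 1:max(attempts)))

# One bar per attempt count, each labelled on the x axis.
barplot(counts,
        xlab = "Number of Attempts Until First Success",
        ylab = "Number of runs",
        las = 2, cex.names = 0.6)
```

With many distinct attempt counts the labels get crowded, so las and cex.names may need adjusting for your plot size.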

As I alluded to in my comment though, I'm not sure why you think the geometric distribution is inappropriate. It "is used for modeling the number of failures until the first success" (from the Wikipedia article on it). So, we can just sample from it and add one, since rgeom() counts the failures before the first success rather than the attempts; the approaches are equivalent, but this will be faster when your number of samples is high:

number_of_attempts2 <- rgeom(1000, 0.1) + 1
hist(number_of_attempts2, xlab = "Number of Attempts Until First Success")

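To see how common each number of attempts is, the simulated proportions can be compared with the exact geometric probabilities from dgeom(). This is only a rough sketch; the sample size of 1e5 and the choice to look at the first 10 attempt counts are arbitrary:

```r
set.seed(42)
p <- 0.1
attempts <- rgeom(1e5, p) + 1   # failures before first success, plus the success

k <- 1:10                       # attempt counts to inspect

# Share of simulated runs that needed exactly k attempts.
empirical <- as.numeric(table(factor(attempts, levels = k))) / length(attempts)

# Exact probability that the first success comes on attempt k:
# (1 - p)^(k - 1) * p, i.e. k - 1 failures followed by one success.
theoretical <- dgeom(k - 1, p)

round(cbind(k, empirical, theoretical), 4)
```

The probabilities shrink by a factor of 0.9 with each extra attempt, which is why 1 attempt is the single most common outcome even though the average is 1/p = 10 attempts.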

  • Thanks for the quick and helpful reply. I think you're right that my question is answered by the geometric distribution. I think I was having a bit of a conceptual misunderstanding. The basic question I was looking to answer was how common each number of attempts is (e.g. 1 attempt is the most common, 2 attempts second most common), but a geometric distribution answers this precisely I think. Being shown both ways really cleared that up for me, I really appreciate it. – Conor yesterday
  • @Conor No worries! Glad it helped, cheers – duckmayr yesterday

I would use the 'rle' function, since it lets you run a lot of simulations in a short period of time. Use this to count the run of tails before a head:

> n <- 1e6

> # generate a long string of flips with unfair coin
> flips <- sample(0:1, 
+                 n, 
+                 replace = TRUE, 
+                .... [TRUNCATED] 

> counts <- rle(flips)

> # now pull out the "lengths" of "0" which will be the tails before
> # a head is flipped
> runs <- counts$lengths[counts$values == 0]

> sprintf("# of simulations: %d  max run of tails: %d  mean: %.1f\n",
+         length(runs),
+         max(runs),
+         mean(runs))
[1] "# of simulations: 90326  max run of tails: 115  mean: 10.0\n"

> library(ggplot2)
> ggplot()+
+   geom_histogram(aes(runs),
+                  binwidth = 1,
+                  fill = 'blue')

and you get a chart like this (image: histogram of runs).
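Because the sample() call in the transcript above is cut off, here is a separate, self-contained sketch of the same rle idea. The prob argument and the 0 = tails / 1 = heads coding are my assumptions about what the truncated call did, not something the original shows:

```r
set.seed(42)
n <- 1e5
p <- 0.1   # assumed chance of heads, as in the question

# ASSUMPTION: 0 = tails, 1 = heads, weighted 0.9 / 0.1.
flips <- sample(0:1, n, replace = TRUE, prob = c(1 - p, p))

# rle() collapses the sequence into runs of identical values.
counts <- rle(flips)

# Keep only the lengths of the runs of tails.
runs <- counts$lengths[counts$values == 0]

mean(runs)
```

One thing to keep in mind: rle() never reports a run of length zero, so two heads in a row contribute nothing to runs. Each run of tails has length k with probability 0.9^(k - 1) * 0.1, which is the same distribution as the number of attempts until the first head, and that is why the mean comes out near 10 rather than 9. The final run can also be cut short by the end of the sequence.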


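I would tabulate the cumsum. The running count of heads stays constant until the next head arrives, so tabulating it gives the number of tosses from each head up to the toss just before the next one, i.e. the number of attempts needed to get the next head.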

p <- 0.1
N <- 1e8

set.seed(42)
tosses <- sample(0:1, N, replace = TRUE, prob = c(1 - p, p))

attempts <- tabulate(cumsum(tosses))

length(attempts)
# [1] 10003599

hist(attempts, freq = FALSE, col = "#F48024")

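To make the tabulate(cumsum()) trick easier to follow, here is a tiny hand-checkable example. The ten tosses are made up purely for illustration:

```r
# 1 = heads, 0 = tails; heads land on tosses 3, 7, 8 and 10.
tosses <- c(0, 0, 1, 0, 0, 0, 1, 1, 0, 1)

# Running count of heads so far.
running <- cumsum(tosses)       # 0 0 1 1 1 1 2 3 3 4

# How often each positive running count occurs. tabulate() ignores the
# leading zeros, so the tosses before the first head are dropped.
gaps <- tabulate(running)       # 4 1 2 1

# Same numbers, computed directly as distances between successive heads.
head_gaps <- diff(which(tosses == 1))   # 4 1 2
```

All but the last entry of gaps match the distances between heads. The last entry only counts the tosses from the final head to the end of the sequence, so it is an incomplete observation; with millions of entries that single value hardly matters.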
