suggestion: add reference table of nb shares probabilities to the share calculator #11

Open
opened 2023-12-14 04:39:25 +00:00 by orzo · 3 comments

I added a section on the distribution of the number of blocks/shares for a given cumulative effort to the pastebin I shared previously (https://pastebin.com/EXXKm50D, copy-pasted below for future reference). In it I derive the table of probabilities of the number of shares for a given cumulative effort. This is nothing but the Poisson distribution. This could be of interest to add to the observer's share calculator. The cumulative effort can of course be written back in terms of elapsed mining time

elapsed time T = cumulative effort * difficulty / hashrate

which is probably more meaningful to someone using the calculator since the observer doesn't report the cumulative effort, only the effort for individual shares/blocks.
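
As a sketch of that conversion (a hypothetical example; the variable names and numbers are mine, not the observer's):

def elapsed_time(cumulative_effort, difficulty, hashrate):
    # elapsed mining time in seconds; effort as a plain ratio (15.0 = 1500%),
    # difficulty in hashes, hashrate in H/s
    return cumulative_effort * difficulty / hashrate

# hypothetical miner: 1500% effort at 100 kH/s against a 3.6 GH share difficulty
print(elapsed_time(15.0, 3.6e9, 100e3) / 3600)  # -> 150.0 hours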

Here's the reference table together with the HTML code (nbshares_distribution.html). The table itself is fixed; the only thing that varies is the time, as a function of the hash rate and difficulty.

nbshares_distribution.png
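
For reference, a minimal sketch of how such a fixed table could be generated from the Poisson probabilities derived below (the effort grid and k range here are arbitrary choices, not necessarily what the attached HTML uses):

import math

def poisson_pmf(k, f):
    # P(k shares | effort f) = f^k/k! exp(-f), evaluated in log space
    return math.exp(k * math.log(f) - f - math.lgamma(k + 1))

efforts = [0.5, 1.0, 2.0, 5.0, 10.0, 15.0]   # rows: cumulative effort (1.0 = 100%)
print("effort | " + " ".join(f"k={k:2d}  " for k in range(11)))
for f in efforts:
    row = " ".join(f"{poisson_pmf(k, f):6.3f}" for k in range(11))
    print(f"{f * 100:5.0f}% | {row}")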

Distribution of the number of shares/blocks for a given cumulative effort

Given once again the probability p = 1/diff of hitting a share, the probability of a single typical run i of n trials out of which k shares were found could look something like this

Q_i(k | n) = (1 - p)(1 - p) p (1 - p) p (1 - p)(1 - p) p p (1 - p)...(1 - p)(1 - p) = p^k (1-p)^(n-k)
           \____k successes/shares with prob p, n-k failures with prob 1 - p____/

To get the total probability of finding k shares within n attempts, we must account for all possible permutations i of failures and successes, i.e. all possible arrangements of the n-k failure probabilities (1 - p) and the k success probabilities p. There are n_choose_k of them, namely as many as there are ways of picking k positions among n possible positions. The probability of hitting k shares within n attempts is therefore given by the binomial distribution with success probability p:

                                   n!
P_exact(k shares | n hashes) = ---------- p^k (1 - p)^(n - k).
                               k!(n - k)!
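
This exact pmf is easy to evaluate directly as a sanity check (math.comb keeps the binomial coefficient exact; for very large n you would work in log space instead). Holding f = n*p fixed shows the limit taken next:

import math

def p_exact(k, n, p):
    # binomial: exact probability of k shares in n hashes, p = 1/diff
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# convergence as n grows with f = n*p fixed at 150% effort:
for n in (10, 100, 10_000, 1_000_000):
    print(n, p_exact(2, n, 1.5 / n))  # -> 1.5^2/2! exp(-1.5) ~ 0.2510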

Since the cumulative effort f = np, we can write p^k = f^k/n^k. For n much larger than k we have n!/(n-k)!/n^k -> 1, and together with large difficulty, that is p << 1, we also find (1-p)^(n-k) -> exp(-f). Putting those limits together, we find to a good approximation

                         f^k
P(k shares | effort f) = ---- exp(-f)
                          k!

which is nothing but the Poisson distribution. The mode of the Poisson distribution is floor(f), the mean is f, and the standard deviation is sqrt(f). Using the CDF of the Poisson distribution, we find the probability of the d-th deviate

                                         
P(f - d*sqrt(f) <= k <= f + d*sqrt(f)) = 

Gamma(floor(f + d*sqrt(f) + 1), f)     Gamma(floor(f - d*sqrt(f)), f)
----------------------------------  -  ------------------------------
  Gamma(floor(f + d*sqrt(f) + 1))       Gamma(floor(f - d*sqrt(f)))

where Gamma(s, x) is the upper incomplete gamma function. To give a practical example, this means that for a cumulative effort f = 1500% one will likely have found roughly 15 ± 4 blocks/shares with 63% probability (1st deviate), and 15 ± 8 blocks/shares with 95% probability (2nd deviate). As the cumulative effort f becomes large, those deviate probabilities converge to the normal deviates, namely 68.3% at 1 standard deviation and 95.4% at 2 standard deviations. For d-th deviates with d >= sqrt(2 + 1/f + f) the lower bound in the above expression falls outside the support of the Poisson distribution, i.e. k < 0, and should instead be interpreted as 0. Computationally we simply replace every instance of f - d*sqrt(f) with max(0, f - d*sqrt(f)) to keep the above expression valid for all d > 0.
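
A sketch of this in scipy: gammaincc(s, f) is exactly the regularized ratio Gamma(s, f)/Gamma(s) appearing above. Note the lower edge below uses the ceiling, so that only integers inside the band are counted; that is the boundary convention that reproduces the 63% and 95% figures:

import math
from scipy.special import gammaincc  # regularized: gammaincc(s, x) = Gamma(s, x)/Gamma(s)

def deviate_probability(f, d):
    # P(f - d*sqrt(f) <= k <= f + d*sqrt(f)) for k ~ Poisson(f)
    lo = max(0.0, f - d * math.sqrt(f))
    hi = f + d * math.sqrt(f)
    # Poisson CDF: P(K <= n) = gammaincc(n + 1, f)
    upper = gammaincc(math.floor(hi) + 1, f)
    lower = gammaincc(math.ceil(lo), f) if lo > 0 else 0.0
    return upper - lower

print(deviate_probability(15.0, 1))  # ~0.63
print(deviate_probability(15.0, 2))  # ~0.95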

We can of course rewrite all the expressions above in terms of the local hash rate h and elapsed mining time T by substituting n = hT, such that f = hT/diff and thus

                     (hT/diff)^k 
P(k shares | h, T) = ----------- exp(-hT/diff).
                          k!
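
Or, as a small function in the same hypothetical units as before (H/s, seconds, hashes):

import math

def p_shares(k, hashrate, seconds, difficulty):
    # P(k shares | h, T) with f = h*T/diff
    f = hashrate * seconds / difficulty
    return math.exp(k * math.log(f) - f - math.lgamma(k + 1))

# probability of still having zero shares after 10 hours at 100 kH/s, 3.6 GH difficulty
print(p_shares(0, 100e3, 10 * 3600, 3.6e9))  # exp(-1) ~ 0.368
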
Owner

This looks pretty good. I will probably implement this, maybe the colors as well, although it won't match p2pool.io behavior.

DataHoarder self-assigned this 2024-02-25 14:17:55 +00:00
Owner

@orzo Within this

P(f - d*sqrt(f) <= k <= f + d*sqrt(f)) = 

Gamma(floor(f + d*sqrt(f) + 1), f)     Gamma(floor(f - d*sqrt(f)), f)
----------------------------------  -  ------------------------------
  Gamma(floor(f + d*sqrt(f) + 1))       Gamma(floor(f - d*sqrt(f)))

Is Gamma regularized or not? What is the second parameter to the denominator, 0?

Got a partial implementation working, at least without the bulk.

Author

Is Gamma regularized or not? What is the second parameter to the denominator, 0?

Got a partial implementation without the Bulk working at least.

It's the un-regularized incomplete gamma function Gamma(s, x). The ratios, on the other hand, give you the regularized incomplete gamma function: the denominators are complete gamma functions, and Gamma(s) equals Gamma(s, 0), so you can indeed read the second parameter there as 0.
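
In scipy terms, for instance, the distinction looks like this (just a sketch):

from scipy.special import gamma, gammaincc

s, f = 19, 15.0
regularized = gammaincc(s, f)            # Gamma(s, f)/Gamma(s), the ratio in the formula
unregularized = regularized * gamma(s)   # Gamma(s, f) itself
print(gammaincc(s, 0.0))                 # -> 1.0, since Gamma(s, 0) = Gamma(s)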

A more straightforward way to get the same answer is to do the sum explicitly, i.e.

P(f - d*sqrt(f) <= k <= f + d*sqrt(f)) = sum_k f^k/k! exp(-f)

for k between max(0, f - d*sqrt(f)) and f + d*sqrt(f) (inclusive)

You just have to mind the factorial, because 13! > 2^32, so you want to sum over exp(k * log(f) - f - lngamma(k+1)).
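
In Python that log-space trick is just (math.lgamma(k + 1) being log(k!)):

import math

def poisson_pmf(k, f):
    # f^k/k! exp(-f) without ever forming f^k or k!
    return math.exp(k * math.log(f) - f - math.lgamma(k + 1))

# the naive f**k / math.factorial(k) * math.exp(-f) overflows long before this does
print(poisson_pmf(200, 150.0))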

A different way to look at the bulk is to look at the quantiles by lining up the cumulative probabilities

  cumsum([exp(-f),   f * exp(-f),   f^2/2! exp(-f),   f^3/3! exp(-f) ... ])
= [exp(-f),
   exp(-f) + f * exp(-f),
   exp(-f) + f * exp(-f) + f^2/2! exp(-f),
   exp(-f) + f * exp(-f) + f^2/2! exp(-f) + f^3/3! exp(-f),
   ...]

and check which entries are between, say, 0.25 and 0.75, or 0.1 and 0.9. Those constitute the rough ~50% or ~80% bulk of the distribution. You can then color them, and do the exact sum of the entries (pre cumsum()) to find the exact probability represented by this version of the bulk. It's never going to be exactly 50% or 80% because the Poisson distribution is discrete.

So you can decide that the bulk is everything within d standard deviations of the mean (first way), or the middle X percent bounded by cumulative probabilities (1 - X/100)/2 and 1 - (1 - X/100)/2 (second way). Both are valid and more or less equivalent, and in the screenshot you can see that the first way (with d=1) also bounds roughly ~80% of the distribution.
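
A sketch of the second way (the quantile bulk), with the 0.1/0.9 bounds as an example:

import math

def bulk(f, lower=0.10, upper=0.90):
    # k-values whose cumulative probability falls between the two quantiles,
    # plus the exact probability mass they carry (the "bulk")
    kmax = int(f + 10 * math.sqrt(f)) + 10
    pmf = [math.exp(k * math.log(f) - f - math.lgamma(k + 1)) for k in range(kmax + 1)]
    cdf = []
    total = 0.0
    for p in pmf:
        total += p
        cdf.append(total)
    ks = [k for k, c in enumerate(cdf) if lower <= c <= upper]
    return ks, sum(pmf[k] for k in ks)

ks, mass = bulk(15.0)
print(ks, mass)  # k = 10..19 carry ~80% of the distribution for f = 15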
