Journal #1: Literature Review

STAT375

Ryan McShane

Overview

In this journal, I’ll be walking you through the process of doing a literature review (or at least the first steps thereof). In a literature review, we are seeking to understand what work has been published so far, what points the existing literature makes, and what the agreed-upon facts are. In this process, we can learn whether a problem has been solved or not, what aspects of the question remain, and perhaps whether the literature is useful.

In this case, we will be digging into the claims made in a 538 Podcast hosted by Neil Paine and determining whether they were appropriate to make.

General Instructions

To start, create a separate .RMD file with output: pdf_document in the YAML header. Do not include any of the actual prompts below. Instead, respond to each of the prompts, and connect your thoughts from prompt to prompt.

When you write a summary of an article or other source, include the research question, to what extent they answered the question, the highlights of their process, and the results of the analysis. Additionally, add critiques of the work as necessary. Perhaps most importantly, focus on the topics which connect the research articles when writing your summaries.

Please be aware that part of this assignment is practicing your communication skills. This should be a formally written document which uses LaTeX when needed, has minimal grammatical and spelling errors, and cites its sources properly.

First Pass

538 Podcast

Start by listening to this 538 podcast on Apple or Android/a browser. Write a paragraph summary (i.e. 3 to 8 sentences, say) of the podcast.

Charles Reep, in Memoriam

Then, read Charles Reep (1904-2002): pioneer of notational and performance analysis in football (also available on Moodle). Write a paragraph summary (i.e. 3 to 8 sentences, say) of the paper.

Create a .bib file with all of the sources you’ve read.

At minimum, you should include title, author, year, DOI or URL, as well as any other fields relevant to the source. Whenever you write about these sources, use inline references, as described here. To learn more about .bib files, read here. As you read other sources, include them as well.

Another Perspective; a Blog Post

Read this blog post History of Performance Analysis: The Controversial Pioneer Charles Reep, which is also available on Moodle in a printer-friendly version. Write a paragraph summary (i.e. 3 to 8 sentences, say) of the post here.

Skill and Chance in Ball Games

Revisit the Charles Reep memoriam paper. Who wrote it? Find (on Google Scholar) the 1971 article, Skill and Chance in Ball Games by Reep and others, which the memoriam paper references. Who are the other two co-authors on this paper? (Look to page 623!)

Write a summary of Skill and Chance in Ball Games. Note where and why it references Greenwood and Yule (1920) and Reep and Benjamin (1968).

Reep and Benjamin (1968)

On Google Scholar, find the Reep and Benjamin (1968) paper, read it, and write a summary of it.

Greenwood and Yule (1920)

On Google Scholar, find the Greenwood and Yule (1920) paper, read Section I and Section IV, and write a summary of those sections. Report the left-most and right-most parts of equations 51 and 52.

Synthesize what you’ve read so far

How is it all connected? Have we gotten to the bottom of Paine and Pollard’s perhaps contradictory claims?

A Closer Look at Reep et al (1971)

On Wikipedia, in a bullet point just below the previous link, the binomial coefficient is rewritten in terms of the continuous Gamma function. Remember that for positive integers \(a\), \(\Gamma(a) = (a-1)!\), and that the Gamma function extends the factorial to non-integer values of \(a\). So, we get this re-expression of the binomial coefficient:

\[\binom{x+r-1}{x} = \frac{(x + r -1)(x+r-2) \dotsm (r)}{x!} = \frac{(x+r - 1)!}{x!(r-1)!} = \frac{\Gamma (x + r)}{x!\Gamma(r)}\]
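As a quick numerical check of this identity, the sketch below compares the three forms in R for one integer and one non-integer value of \(r\). The helper name nb_coef_forms and the values of \(x\) and \(r\) are arbitrary illustrations, not anything taken from the papers.

```r
# Compare three equivalent forms of the binomial coefficient choose(x + r - 1, x).
# nb_coef_forms is a hypothetical helper; x and r are illustrative values only.
nb_coef_forms <- function(x, r) {
  c(
    # base R choose() accepts a real-valued first argument
    choose  = choose(x + r - 1, x),
    # (r)(r + 1)...(x + r - 1) / x!
    product = prod(seq(from = r, by = 1, length.out = x)) / factorial(x),
    # Gamma-function version
    gamma   = gamma(x + r) / (factorial(x) * gamma(r))
  )
}

nb_coef_forms(x = 2, r = 3)    # integer r
nb_coef_forms(x = 2, r = 1.5)  # non-integer r: (r - 1)! is undefined, Gamma(r) is not
```

The second call is the reason the Gamma form matters here: a fitted \(r\) need not be an integer, so the factorial form cannot be used directly.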

So, in the notes we saw that the Negative Binomial PMF (where \(r =\) the number of successes, \(x =\) the number of failures before the \(r\)th success, and \(p =\) the probability of a success) is

\[f(x;r, p) = \binom{r+x-1}{x}p^r(1-p)^{x} \text{ on } x\in \{0, 1, ...\},\]

which we can now re-express as

\[f(x;r, p) = \frac{\Gamma (x + r)}{x!\Gamma(r)}p^r(1-p)^{x} \text{ on } x\in \{0, 1, ...\}.\]
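R's built-in dnbinom(x, size, prob) uses this same parameterization (x failures, size \(= r\), prob \(= p\) as the probability of a success), so the Gamma form can be checked against it directly. This is only a sketch: nb_pmf is a hypothetical helper and the values of \(r\) and \(p\) are made up.

```r
# Negative Binomial PMF written with the Gamma function (hypothetical helper).
nb_pmf <- function(x, r, p) {
  gamma(x + r) / (factorial(x) * gamma(r)) * p^r * (1 - p)^x
}

r <- 1.5   # hypothetical, deliberately non-integer
p <- 0.4   # hypothetical probability of a success
x <- 0:5

# The two probability columns should agree up to floating-point error.
cbind(x = x,
      by_hand = nb_pmf(x, r, p),
      dnbinom = dnbinom(x, size = r, prob = p))
```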

On Wikipedia, we see that the Negative Binomial distribution was derived as a Gamma-Poisson mixture here. In fact, this section references Greenwood and Yule (1920), which is the same paper that Reep et al (1971) references. However, it should be noted that the derivation here effectively uses \(p =\) the probability of failure, but this is irrelevant to identifying what Reep et al (1971) mean by \(r\) and \(c\).

So, compare the Wikipedia Gamma-Poisson mixture derivation to equations 46A through 52 in Greenwood and Yule (1920). In Greenwood and Yule (1920), then, we see that \(c_\text{Yule} = \dfrac{p}{1-p}\) and \(r_\text{Yule} = r\) (i.e., in terms of the parameters we’re familiar with from the PMF above).

Now, we have equations 51 and 52 in Greenwood and Yule (1920), which are indeed the equations Reep et al (1971) use to fit their negative binomial distribution. Greenwood and Yule’s \(M\) is the \(E[X]\) from our notes, and their \(\mu_2\) is the \(Var(X)\) from our notes. Re-express Greenwood and Yule’s equations 51 and 52 using the parameters \(r\) and \(p\) and the quantities \(E[X]\) and \(Var(X)\), substituting the correspondence from the previous step (\(c_\text{Yule} = \dfrac{p}{1-p}\) and \(r_\text{Yule} = r\)), which gives you a system of equations.

Solve that system of equations for Reep’s “Mean” (our \(E[X]\)) and Reep’s “Variance” (our \(Var(X)\)). (You’ll know your solution to the system of equations is correct when you are able to replicate Reep et al (1971)’s Table 1.)
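To check your algebra, you can compare against the standard method-of-moments result for the PMF above, which has \(E[X] = r(1-p)/p\) and \(Var(X) = r(1-p)/p^2\). The sketch below is not a transcription of Greenwood and Yule's equations; nb_from_moments is a hypothetical helper name, and the round-trip values are made up rather than taken from Table 1.

```r
# Method-of-moments solution for the Negative Binomial in the (r, p) parameterization,
# using E[X] = r(1 - p)/p and Var(X) = r(1 - p)/p^2.
# nb_from_moments is a hypothetical helper name.
nb_from_moments <- function(mean_x, var_x) {
  stopifnot(var_x > mean_x)          # the NB needs variance greater than the mean
  p <- mean_x / var_x
  r <- mean_x^2 / (var_x - mean_x)
  list(r = r, p = p)
}

# Round trip with made-up parameters (not Reep's values):
r0 <- 2
p0 <- 0.4
nb_from_moments(mean_x = r0 * (1 - p0) / p0,
                var_x  = r0 * (1 - p0) / p0^2)
```

If your own solution is correct, feeding it a mean and variance computed from known \(r\) and \(p\) should return those same two values.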

Record the four “Actual” and “Expected” columns in Reep et al (1971)’s Table 1 as columns in an appropriately labeled data.frame, and print it with kable.
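One possible layout is sketched below. It is a skeleton only: every NA is a placeholder for a value you read from Table 1, the column names are suggestions rather than names used in the paper, and the row categories assume the 0 through 8 and "9 or more" grouping used later in this journal (adjust them if a column of the table is grouped differently).

```r
library(knitr)

# Skeleton only: replace each NA with the value read from Table 1.
# Column names and row labels are suggestions (assumptions), not the paper's.
table1 <- data.frame(
  passes     = c(0:8, "9+"),
  actual_1   = NA_real_,
  expected_1 = NA_real_,
  actual_2   = NA_real_,
  expected_2 = NA_real_
)

kable(table1,
      col.names = c("Passes", "Actual (1)", "Expected (1)",
                    "Actual (2)", "Expected (2)"),
      caption = "Layout for the Actual and Expected columns of Table 1")
```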

Plug the first set of mean and variance he calculated (from the “Original data”) into your previous solutions to find \(r\) and \(p\) and state the \(PMF\) of the Negative Binomial distribution he found.

From this Negative Binomial distribution, create a kable table which shows the PMF of your NB distribution (i.e., \(P(X= 0), P(X = 1), ...\)), and make the last row \(P(X \ge 9)\).
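A minimal sketch of such a table follows, assuming made-up values of \(r\) and \(p\) (substitute the ones you found). The last row uses the upper tail \(P(X \ge 9) = 1 - P(X \le 8)\) so that the column sums to 1.

```r
library(knitr)

r <- 1.5   # hypothetical
p <- 0.4   # hypothetical

# P(X = 0), ..., P(X = 8), then the upper tail P(X >= 9) = 1 - P(X <= 8).
pmf_table <- data.frame(
  x    = c(0:8, ">= 9"),
  prob = c(dnbinom(0:8, size = r, prob = p),
           pnbinom(8, size = r, prob = p, lower.tail = FALSE))
)

kable(pmf_table, digits = 4, col.names = c("x", "Probability"))
```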

From here, using \(n = 23,805\) (note: Reep et al should have used \(n = 23,754\)), find the expected counts. Do they match what Reep et al found? If not, describe what errors you see and their magnitude.

Now, revisit Reep and Benjamin (1968) and find the table that Reep et al (1971) replicates. What transcription error was made?

It turns out that we’ll need to calculate \(E[X]\) and \(Var(X)\) ourselves from the “Actual” column. Consider \(X \ge 9\) to be \(X = 9\) for the purposes of \(E[X]\) and \(Var(X)\) calculations.
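The calculation from a frequency table can be sketched as follows. The counts are made up for illustration and are not Reep's data. The sketch divides by \(n\) when computing the variance, which is an assumption; dividing by \(n - 1\) is the other common choice, and with a sample this large the two barely differ.

```r
# Hypothetical frequency table (made-up counts, not Reep's data).
x      <- 0:9                                  # the last value stands in for "9 or more"
counts <- c(40, 30, 15, 7, 4, 2, 1, 1, 0, 0)

n      <- sum(counts)
mean_x <- sum(x * counts) / n
# Variance with divisor n (an assumption; use n - 1 for the sample variance).
var_x  <- sum(counts * (x - mean_x)^2) / n

c(n = n, mean = mean_x, variance = var_x)
```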

Using your earlier system of equations, solve for \(r\) and \(p\) again. Now, using \(n = 23,805\), find the expected counts. Now do they match Reep et al (1971) (after correcting the transcription error you found)? Perform the \(\chi^2\) Goodness of Fit test that Reep et al (1971) does. Bracketing a \(p_\text{val}\) as Reep et al (1971) did was common practice at the time, when probability tables were used for calculations like this. Is your \(p_\text{val}\) consistent with theirs? What does this \(p_\text{val}\) mean in this hypothesis test, at an \(\alpha = 0.05\) level (i.e., what can you/can’t you conclude)?
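The mechanics of the expected counts and the test can be sketched as below, with made-up observed counts and made-up fitted parameters. The degrees of freedom here are the number of categories minus 1, minus 2 for the estimated \(r\) and \(p\); this is the usual convention, and whether Reep et al (1971) pooled categories or counted degrees of freedom this way is something to check against the paper, not something this sketch assumes.

```r
# Made-up observed counts for categories 0, 1, 2, 3, 4, and "5 or more".
observed <- c(52, 24, 12, 6, 3, 3)
r <- 0.8    # hypothetical fitted parameter
p <- 0.45   # hypothetical fitted parameter
n <- sum(observed)

# Category probabilities under the fitted NB, with the upper tail as the last cell.
probs    <- c(dnbinom(0:4, size = r, prob = p),
              pnbinom(4, size = r, prob = p, lower.tail = FALSE))
expected <- n * probs

chi_sq <- sum((observed - expected)^2 / expected)
# Degrees of freedom: categories - 1 - number of estimated parameters (r and p).
df     <- length(observed) - 1 - 2
p_val  <- pchisq(chi_sq, df = df, lower.tail = FALSE)

c(chi_sq = chi_sq, df = df, p_val = p_val)
```

Note that chisq.test(observed, p = probs) would report a p-value on categories minus 1 degrees of freedom, since it does not know that two parameters were estimated from the data, so the manual calculation is the safer route here.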

Now, replicate the second set of expected values and \(\chi^2\) GOF test.

Using the second fitted distribution, what is the probability of a pass movement of 2 or fewer passes (i.e., \(P(X\le 2)\))?
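A cumulative probability like this comes straight from pnbinom. The sketch below uses made-up values of \(r\) and \(p\) (substitute your second fitted pair) and confirms the result against the sum of the first three PMF values.

```r
r <- 1.5   # hypothetical
p <- 0.4   # hypothetical

# P(X <= 2) from the CDF ...
p_le_2 <- pnbinom(2, size = r, prob = p)
# ... which should equal P(X = 0) + P(X = 1) + P(X = 2).
all.equal(p_le_2, sum(dnbinom(0:2, size = r, prob = p)))

p_le_2
```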

Final Words

Read Pollard’s Letter to OSSJ

Find Pollard’s letter to The Open Sports Sciences Journal, “Invalid Interpretation of Passing Sequence Data to Assess Team Performance in Football: Repairing the Tarnished Legacy of Charles Reep”, published in 2019. Read it and summarize it.

Read Hughes and Franks’ 2005 article

Find the 2005 Hughes and Franks article which is the target of Pollard (2019)’s ire (hint: it’s the first reference!). Read it and summarize it. If you’d like, read the earlier work these authors reference and discuss that as well!

Compare and Contrast Reep and Benjamin (1968), Reep et al (1971), Pollard (2019), and Hughes and Franks (2005)

Conclusion

So, what do you make of these two positions (Pollard/Reep vs Paine/Hughes/Franks)? Who’s right? Who’s wrong? Is there common ground? What questions remain? What can we learn from this? What insight might this offer into science in academia, in general?

Coda

How are these works connected? Are there any articles that appear in multiple references sections of the works you read that you didn’t take a look at? If so, what avenues of inquiry might you follow by reading these shared references?