Talk:Statistical Analysis: an Introduction using R

New Book
I'm filling in this text in order to teach my partner some statistics before she starts medical school. A brief intro is on the Cover page. Feel free to add stuff: this is my first wikibook, and I hope I've started this in the right way. Discuss stuff here if you aren't sure. Yan 12:17, 15 May 2007 (UTC)

R code
If you want to add stuff, note that a lot of use is made of Template:Code, in particular Template:Code:Interpreted. Some notes on usage are at Template:Code. HYanWong (talk) 23:01, 16 November 2008 (UTC)

To do

 * Ask in new users forum about uploading text files, to use titanic data
 * Ask about toggling sections (to globally hide R topics)
 * Correct links in /R folder
 * Get in fit state for contributions (comments on each chapter)!

Classic data sets
Try to use "classic data sets" in the main text: these usually make good examples and are not software dependent. R-specific datasets can be used in topic boxes. Here are some suggestions: please add any you think fit. Many are in the R package "HistData".
 * von Bortkiewicz's deaths by horse kicks in the Prussian army ("prussian" in R package "pscl")
 * Titanic death/survival ("Titanic" in R "datasets", or, better, "titanic3" in "PASWR")
 * Student's criminal data ("crimtab" in R "datasets")
 * Fisher's iris data ("iris" in R "datasets")
 * Bumpus' sparrow data
 * Michelson's speed of light data ("morley" in R "datasets")
 * John Snow's cholera spatial data
 * Richard Doll's smoking in doctors dataset (try "breslow" in R "boot")
 * Anscombe's quartet ("anscombe" in R "datasets")
 * ? Old Faithful data ("faithful" in R "datasets")
 * ? Data from Gosset's "The Probable Error of a Mean"
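
As a rough check on which of the suggestions above are available without installing anything, here is a minimal sketch that loads the ones listed as being in the base "datasets" package and reports their class and dimensions. The helper names (`classic`, `env`, `overview`) are just for illustration. Note that the Titanic object is spelled `Titanic`, with a capital T, in "datasets". The data sets in "pscl", "PASWR", "boot" and "HistData" are left out because those packages may need installing first.

```r
# Classic data sets that ship with the base "datasets" package.
classic <- c("Titanic", "crimtab", "iris", "morley", "anscombe")

# Load them into a separate environment so the workspace is not cluttered.
env <- new.env()
data(list = classic, package = "datasets", envir = env)

# Summarise the class and dimensions of each loaded object.
overview <- data.frame(
  name  = classic,
  class = vapply(classic, function(n) class(get(n, envir = env))[1], character(1)),
  dims  = vapply(classic, function(n) paste(dim(get(n, envir = env)), collapse = " x "),
                 character(1)),
  row.names = NULL
)
print(overview)
```

Some of these are contingency tables rather than data frames, which may matter when deciding how to present them in the main text.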

General points

 * Aim to introduce the "why" behind statistics - this is not a book of statistical recipes.
 * Should aim to introduce the idea of statistical models asap

Chapter ideas

 * Intro: Basic principles of data analysis, plus plots.
 * Chapter 0: We should provide a "getting started in R" chapter so that users can install R and see how it works. Since this is not statistics-specific, it is considered a pre-chapter.
 * Chapter 1: To provide the background to statistics, have a "philosophy of analysis" type chapter at the very start.
 * Chapter 2: We cannot advance into modelling without some prior knowledge of what we are trying to model: the atoms of the system, as it were. Hence the next chapter has to introduce data types.
 * Now we can introduce modelling. There are two distinct parts to this:
   1. What is a statistical model (examples, etc.)?
   2. How can we come up with statistical models?
 * What order should these be done in? On the principle that it is useful to have an example in mind when discussing a topic, (1) has been taken first. Hence:
   * Chapter 3: Statistical models
   * Chapter 4: IDA/EDA data exploration

Conventions for markup used in this book
Some general suggestions for formatting. Are these sensible?


 * Variables in italics.
 * Factor levels in 'single quotes' (reserve double quotes for clarifying prose).
 * Topic boxes
   * No R code outside topic boxes.
   * R code that is interspersed with discussion text within a topic box should be on its own line with an indent, to give <pre>-type formatting.
   * Named R objects in discussion text should be surrounded by <code> tags. Function names should be followed by parentheses, e.g. <code>mean()</code>.
 * Headings etc.
   * The TOC on each page shows header levels down to h4 (but not h5 or h6).
   * Main text uses header levels h2, h3, and h4.
   * Figures and tables are labelled with <h5> (so they are not put in the TOC).
   * Topic boxes are titled using <h4>-level tags (so they are included in the TOC).