Wikibooks talk:Analytics


 * This should probably be discussed before being voted on. I have no opinion on the proposal other than to note that real page view stats would be nice to have. -- xixtas talk 05:01, 19 July 2007 (UTC)


 * Having Google provide stats for our site requires us to have an account with them. This sort of thing shouldn't just be opened by a user here; realistically, it should be linked to an account that services all Wikimedia wikis. This needs to be discussed somewhere with a wider audience, most likely at Meta with the board or some other developer community. I don't feel this is our decision to make, and the entire community should be involved. I also expect that those groups will not approve it, and I don't support it here unless they do. We shouldn't be running some one-off product. -within focus 12:24, 19 July 2007 (UTC)


 * The text that goes on a given page is:

  _uacct = "UA-XXXXXXX-X";
  urchinTracker();


 * In the first instance, a template could be provided that allows any value of "UA-XXXXXXX-X" to be substituted, so that any interested editor can insert an analytics script. There is no need to get permission from on high for something like this (it is no worse than including a Google search option on a page). RobinH 10:43, 21 September 2007 (UTC)


 * What are the ways to raise the proposal to the board or to the entire community, if this page is not the right place? Sutambe 12:32, 19 July 2007 (UTC)


 * I personally do not know, but I'm sure someone more comfortable with the goings-on at Meta can answer this shortly. -within focus 12:34, 19 July 2007 (UTC)
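A minimal sketch of the template idea above, assuming the template's only job is to substitute an editor-supplied account ID into the tracking code quoted earlier. `buildTrackingSnippet` and its ID-format check are hypothetical illustrations, not part of any existing Wikibooks template or Google script.

```javascript
// Hypothetical helper: fills an editor-supplied account ID into the legacy
// tracking lines, as a template parameter substitution might.
function buildTrackingSnippet(accountId) {
  // Assumed ID shape: "UA-", digits, "-", digits. Reject anything else so a
  // malformed parameter cannot inject arbitrary script text.
  if (!/^UA-\d+-\d+$/.test(accountId)) {
    throw new Error("Unexpected account ID format: " + accountId);
  }
  // JSON.stringify produces a correctly quoted JavaScript string literal.
  return "_uacct = " + JSON.stringify(accountId) + ";\nurchinTracker();";
}

console.log(buildTrackingSnippet("UA-1234567-1"));
// prints:
// _uacct = "UA-1234567-1";
// urchinTracker();
```

Validating the parameter before substitution is a design choice: because the template would emit script text, rejecting anything that is not a plain account ID stops an editor from injecting other code through it.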

I just want to point out that the MediaWiki software has built-in statistics on how many people have viewed a given page, but they are turned off for some reason on the Wikimedia projects. So there is no need to use an external service. --dark lama  15:16, 19 July 2007 (UTC)


 * I went through this loop a couple of years ago. Hit counting is considered an unacceptable performance overhead, full stop. I proposed a way round this by sampling, but ran into a technical problem even worse than the performance overhead. If we are ever going to have hit counting, it will need to be external to MediaWiki. RobinH 08:58, 21 September 2007 (UTC)


 * Just to clarify: I have looked up some of the original comments, and the most important technical impasse is at http://bugzilla.wikimedia.org/show_bug.cgi?id=5667


 * "We cant count page hits on a WikiMedia hosted wiki. This is because most of the pages are served by the squids, thus never hit the apaches and the counter doesnt reflect the reality."


 * Incidentally, this means that Google or some other external service is the only way we can do hit counting. RobinH 09:12, 21 September 2007 (UTC)


 * Noting a reason here for why these stats are turned off: these stat-gathering portions of MediaWiki are very CPU-intensive, and they also need to record the stats in some way, usually by updating a database field each time a page is accessed. This means that data has to be written to the hard drive (much slower than a memory write) for most pages that are retrieved. For smaller wikis of just a couple hundred pages, it may be a good idea to turn this feature on, but for the massive volume of people who read Wikibooks (or, much worse, Wikipedia), this would require a server farm several times the size of the one currently used by the Wikimedia Foundation.


 * Google has the server bandwidth and the rationale to process this sort of data, but my concern is the commercial aspect of adding this data. One of the overriding issues that has affected nearly all of the Wikimedia projects is a small but very vocal (perhaps larger, but numbers are hard to get here) group that opposes nearly any commercialization of Wikimedia projects, including banner ads and other such nonsense. This proposal needs to be made in enough detail to demonstrate clearly that it is not going to turn into a huge advertisement for Google as opposed to other statistics-gathering businesses, and to explain why Google is offering this service (they are a for-profit public company; they offer this for more than just to feel good about society).


 * I do question the gathering of geographical data from just an IP address, as there are many known flaws in that approach, and there are other holes in the whole data-gathering concept as well. This is something that can be "gamed" as much as anything else. Adding external links that allow a third party to gather what amounts to private information (the IP addresses of everyone who views a page on Wikibooks, including special pages like edit pages or talk pages) also needs to be addressed, along with whether this information ought to be released in this manner at all.


 * I will point out, however, that this sort of data could help in fixing up Wikibooks, for example in working on the "Main Page" or in finding what other high-traffic pages there are on this site. Perhaps some pages are very busy due to external links that "bypass" the Main Page. But caution is in order before this happens. --Rob Horning 19:27, 23 July 2007 (UTC)

It certainly would be nice to have at least some basic page view stats. If CPU load is a major (the major?) concern, would it be possible to turn stats gathering on for only the main page of each book? --Tdvorak 20:54, 6 August 2007 (UTC)
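Restricting counting to each book's main page could be done with a title filter applied before anything is recorded. This is a sketch under assumptions: it relies on the Wikibooks convention that subpages are named "Book/Chapter", it takes the non-main namespace prefixes as a parameter, and `isBookMainPage` is a hypothetical name.

```javascript
// Hypothetical filter: record a hit only when the page is a book's main page.
// Assumes subpages are named "Book/Chapter" and that pages outside the main
// namespace start with one of the given prefixes followed by ":".
function isBookMainPage(title, namespacePrefixes) {
  const inOtherNamespace = namespacePrefixes.some((ns) => title.startsWith(ns + ":"));
  return !inOtherNamespace && !title.includes("/");
}

const prefixes = ["Talk", "User", "Wikibooks", "Special"]; // illustrative subset
console.log(isBookMainPage("Calculus", prefixes));        // true
console.log(isBookMainPage("Calculus/Limits", prefixes)); // false
console.log(isBookMainPage("Talk:Calculus", prefixes));   // false
```

A book whose own title contains a slash would be misclassified as a subpage, so a real filter would need a better signal than the title alone.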


 * If performance is the only concern, there are ways to make it work. You don't need to update the database on each hit; just keep an in-memory counter for each page and increment it on each hit (original + new hits for that server). At the end of the day, or once an hour, or whatever, refresh the counter from the database, add the in-memory hits, and write it back. Being able to search all the wikis by popularity would be invaluable. 8:59, 16 August 2007.
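The batching scheme described above can be sketched as follows, with a `Map` standing in for the database. `BatchedHitCounter` and its methods are hypothetical, and the timer that would call `flush()` hourly or daily is left out.

```javascript
// Sketch of batched hit counting: hits accumulate in memory and are written
// to the (here simulated) database only on a periodic flush, not per hit.
class BatchedHitCounter {
  constructor(database) {
    this.database = database; // assumed: Map from page title to stored total
    this.pending = new Map(); // hits seen by this server since the last flush
  }

  recordHit(title) {
    this.pending.set(title, (this.pending.get(title) || 0) + 1);
  }

  // In a real deployment a timer would call this hourly or daily.
  flush() {
    for (const [title, hits] of this.pending) {
      const stored = this.database.get(title) || 0; // refresh from the database
      this.database.set(title, stored + hits);      // add in-memory hits, write back
    }
    this.pending.clear();
  }
}

const db = new Map([["Calculus", 100]]);
const counter = new BatchedHitCounter(db);
counter.recordHit("Calculus");
counter.recordHit("Calculus");
counter.recordHit("Physics");
counter.flush();
console.log(db.get("Calculus"), db.get("Physics")); // 102 1
```

If several servers flush to the same database, the separate read and write in `flush()` can lose updates when two flushes overlap; a real deployment would want the database to do the addition atomically (for example, a single `UPDATE ... SET count = count + n` statement).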

I believe that page views are the single most important metric for Wikibooks. They allow authors to discover how people are reading the books. For instance, if a book gets 10 hits a day on the first page, 5 on the second, and 2 on the third, there is clearly a problem with style and presentation; if one part of a book gets zero hits when the rest of the book gets 50, that section is obviously irrelevant. I believe that counting page hits will be an incredibly valuable tool and we should press for it to be available. I cannot see a problem with using Google; they count page hits anyway (in my Google searches, my own usage rate of sites is displayed). RobinH 13:37, 16 August 2007 (UTC)
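The reasoning in the 10/5/2 example amounts to computing page-to-page retention from hit counts. A small sketch; `retention` is a hypothetical name, and it returns `null` where the previous page had no hits.

```javascript
// Page-to-page retention: the fraction of one page's hits that the next page
// in the book receives, from hit counts for consecutive pages.
function retention(hitsPerPage) {
  const ratios = [];
  for (let i = 1; i < hitsPerPage.length; i++) {
    const prev = hitsPerPage[i - 1];
    ratios.push(prev === 0 ? null : hitsPerPage[i] / prev);
  }
  return ratios;
}

console.log(retention([10, 5, 2])); // [ 0.5, 0.4 ]
```

Hits are not unique readers, and readers can enter a book part-way through from search results or external links, so these ratios are only a rough signal.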


 * I agree with the need for a simple page hit counter. It doesn't have to be updated in realtime, but some form of feedback to authors is essential. While Google hits might be a start, it's also important to know what pages people visit after starting to read a book. Selden 11:34, 5 September 2007 (UTC)


 * I believe that Google Analytics can be applied to specific pages within Wikibooks; it is a script that is placed on a particular page, so the hits for any chosen page can be counted. RobinH 10:09, 21 September 2007 (UTC)

Toolserver's Page counter
Sorry I haven't responded in a while. RobinH's explanation of why the built-in page counting is not enabled sounds the most plausible. I just want to say that Google is still not needed to achieve this, when we could be using the page counter on the Wikimedia toolserver. We would just have to add the proper JavaScript, just like for Google. --dark lama  12:52, 21 September 2007 (UTC)


 * Given the choice between the two, and all else being equal, I would prefer to offload this task onto the Google servers, not onto Wikimedia-owned servers. Google may make some sort of profit from the venture, but part of that would go back into maintaining their own hardware. On the other hand, if we used the toolserver, nobody would make any profit from it, and it would actually cost the entire Wikimedia community, because the toolserver is a shared resource. Also, we already use Google in a number of places, including a Google option on the search page and the "search this book" option in the toolbox. I would like a more in-depth analysis of the costs of implementing this (increased load times, slower browser-side response, etc.) compared to the benefits (what features we get, accuracy, availability, etc.). I would also like to see (although I know this might be a long shot) some kind of "opt-out" mechanism that individual users could use to be exempted from these counts. --Whiteknight (Page) (Talk) 13:10, 21 September 2007 (UTC)


 * The counter is used by English Wikinews, English Wikiversity, English Wiktionary, and even English Wikipedia, so you can judge its performance impact for yourself. I should also note that the JavaScript uses random sampling, so it doesn't send the information on every page load, and the tool only updates the database once an hour. --dark lama  13:16, 21 September 2007 (UTC)


 * Why not give Google a spin on a trial basis? The template that activates the JavaScript can always be withdrawn. RobinH 15:44, 21 September 2007 (UTC)


 * I think we're better off using the version available from the toolserver. If it was a problem, I don't think it would be available on the toolserver and be actively used on Wikipedia. --dark lama  15:49, 21 September 2007 (UTC)


 * Where is the counter on the toolserver? I have looked at http://meta.wikimedia.org/wiki/Ruwiki_pgcounter but it seems like it would load the server. RobinH 16:36, 21 September 2007 (UTC)


 * It only loads the server if you don't use the right JavaScript for the size of the wiki. There is also WikiCharts, which works in basically the same way. That's the one used on WP and the other wikis. --dark lama  17:03, 21 September 2007 (UTC)


 * Will this work if we want to monitor pages from books that are only viewed once a day? In some configurations it could take 500 days to register 1 hit. RobinH 18:14, 21 September 2007 (UTC)
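The arithmetic behind the 500-day figure, assuming each page view is reported independently with probability 1/N. The actual sampling rates of the toolserver and WikiCharts scripts are not given in this discussion; N = 500 is taken from the comment above, and the function names are hypothetical.

```javascript
// Client side: report this page view with probability 1/sampleEvery.
// `random` is injectable so the decision can be tested deterministically.
function shouldReport(sampleEvery, random = Math.random) {
  return random() < 1 / sampleEvery;
}

// Server side: scale recorded samples back up to an estimated view count.
function estimateViews(recordedSamples, sampleEvery) {
  return recordedSamples * sampleEvery;
}

// On average it takes sampleEvery views before one is reported, so a page
// with viewsPerDay views a day waits roughly this many days for its first hit.
function expectedDaysToFirstSample(viewsPerDay, sampleEvery) {
  return sampleEvery / viewsPerDay;
}

console.log(expectedDaysToFirstSample(1, 500)); // 500
console.log(estimateViews(3, 500));             // 1500
```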

(indention reset)

Maybe they will allow us to use a more customized configuration: for example, limiting the counting to pages in the Main, Cookbook, and Wikijunior namespaces; using different random sampling for registered and anonymous users; and/or using different random sampling for the main page of each book and its subpages, in exchange for a configuration that could take 30 days to register 1 hit. From what I've seen, Wikipedia uses the medium-sized configuration, so I don't think it should be much of an issue. --dark lama 22:52, 21 September 2007 (UTC)
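One way the customized configuration suggested above might look. Every number here is an illustrative assumption, not a rate used by the toolserver or WikiCharts, and `sampleEveryFor` is a hypothetical name; it returns N in "report 1 in N views", or `null` for pages that are not counted.

```javascript
// Hypothetical configuration: only these namespaces are counted ("" = main).
const COUNTED_NAMESPACES = new Set(["", "Cookbook", "Wikijunior"]);

// Illustrative "report 1 in N views" values. Subpages and registered users
// are assumed to see less traffic, so they are sampled more densely (smaller N).
const SAMPLE_EVERY = {
  mainPage: { registered: 10, anonymous: 50 },
  subpage: { registered: 5, anonymous: 20 },
};

function sampleEveryFor(namespace, isRegistered, isMainPage) {
  if (!COUNTED_NAMESPACES.has(namespace)) {
    return null; // this namespace is not counted at all
  }
  const group = isMainPage ? SAMPLE_EVERY.mainPage : SAMPLE_EVERY.subpage;
  return isRegistered ? group.registered : group.anonymous;
}

console.log(sampleEveryFor("", false, true));         // 50
console.log(sampleEveryFor("Cookbook", true, false)); // 5
console.log(sampleEveryFor("Talk", false, false));    // null
```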


 * Why not just use Google and get accurate figures? You could permit it on a trial basis and withdraw the facility if there is a problem. RobinH 18:39, 23 September 2007 (UTC)


 * I can already think of one possible problem. I think it would violate the Wikimedia Foundation's privacy policy unless it were made completely optional, meaning readers would have to opt in, which would make this pointless. However, no personally identifiable information is sent with the toolserver option. Surely random sampling gives enough information to get a general idea of how often a page is visited? Why not just use what is already being provided and used by other Wikimedia projects? --dark lama 20:04, 23 September 2007 (UTC)


 * The privacy issue can be answered by permitting only one Wikibooks analytics ID and allowing only certain admins to examine the statistics (certain admins can look up IP addresses at the moment). I would be in favour of using a Wikimedia counter if it can provide actual statistics of page reads on a weekly basis; it would then be a tool that authors can use to improve books. Perhaps the Wikimedia tool could be used on a trial basis, configured so that it provides actual (not averaged) data, and the impact this has on service levels could be monitored. I would like to make the following proposal for a trial:


 * 1. Design a template that can be used with either counter.


 * 2. Monitor server load due to Wikibooks for a week.


 * 3. Set up the template to use either the Google or Wikimedia method, the Google method being limited to a single analytics user.


 * 4. Tell everyone how to install the counters.


 * 5. Monitor server load for a week.


 * 6. At the end of the week, make the usage stats public and assess whether either method could be used.


 * 7. Deactivate the counter and monitor server load for a week.


 * 8. Set the template to activate the other method.


 * 9. Go to step 5.


 * 10. Repeat steps 5-9 until a clear, stable pattern emerges or server usage becomes too high.


 * RobinH 09:12, 24 September 2007 (UTC)


 * I think limiting who can view the stats also defeats any purpose and benefit of keeping stats. Also, if this means certain users have to share an account, that's going to be a problem too; a previous suggestion of account sharing was met with disapproval and rejected. If the toolserver method is used, it would be installed globally, so nobody would need to install it; we would just need to tell everyone how to opt out of it. Why are exact numbers necessary? Average daily or weekly page view stats can still be a useful tool for helping contributors improve books.


 * The admin who controls the Google account would copy the usage information to Wikibooks, ensuring that private info is excluded. This answers your concern about privacy.


 * Exact numbers are necessary because particular books may only have 10 or 20 hits a week and subpages within a book might only be hit 2 or 3 times. This would be the case with new and partially complete books in particular.  The reason authors need hit counts is to refine their books, not just for the vanity of seeing that someone is reading them.


 * What sort of template are you suggesting be created and what would it be used for?


 * The script to activate Google or Wikimedia counting could be made available using a template so that the javascript is under the control of the admin responsible for the hit counter. Users would be able to add the template to any page that they wished to monitor.


 * I think any page counting method used needs to have stats accessible to anyone, must not send any identifiable information, and should have little impact on browser performance. I haven't given my opinion on the Google script yet: I noticed that it was quite long, perhaps 5 or more times as long as the JavaScript needed for the toolserver option. This means the Google option will have more of an impact on download time and run time than the toolserver option. --dark lama 11:02, 24 September 2007 (UTC)


 * Privacy. Google ranked 'worst' on privacy. http://news.bbc.co.uk/1/hi/technology/6740075.stm Also consider amending the Privacy policy even if just testing. E5ricky 06:15, 20 October 2007 (UTC)


 * The stats would be accessible to everyone on a weekly basis, copied into Wikibooks from the Google account.


 * The Google option operates at the user's browser so the extra run time would be slight. Authors should only place scripts (via templates) on particular pages for ad hoc monitoring in any case.


 * So, to summarise the Google option: for the sake of privacy, only 1 or 2 admins would have access to the stats on Google. They would copy the public component of these to a page in Wikibooks that anyone can view. Authors would be able to add a template to a page to activate a Google count on that page (it may be necessary to copy the URL of the page to another, admin, page for inclusion in Google Analytics). At the end of each week, an admin would log on to the Google account and copy hit data to a report in Wikibooks. Of course, if it were decided that readers' IP addresses need not be private, each interested author could have their own Google account.


 * The issue isn't really that some admin on the wiki can see the data, but that in the process personally identifiable information is passed to Google. This information is then kept, possibly forever, and used by Google for whatever they choose. Look at it this way: if a commercial partner came to the wiki and said "we'll give you $$$$$ to see your user traffic, including members' and guests' personally identifiable information", do you think members would be happy to say yes? Well, by using Analytics you are giving that information away (for free!) to be used by Google or third parties. E5ricky 12:34, 22 October 2007 (UTC)


 * The Wikimedia option would only be of value if it could give meaningful stats for low-usage pages (i.e., if it is used in a non-sampling mode). The value of hit counting is that it will let editors improve navigation, style and content by showing whether users are put off by certain pages and navigation routes. I would be in favour of a trial of the Wikimedia option, just to see if it really does burden the servers as much as we fear. If the trial shows that the burden is slight, then the Wikimedia option should be used.


 * Given that a trial can always be terminated if it causes problems why not start one? RobinH 11:48, 24 September 2007 (UTC)
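The case for exact counts on low-traffic pages can be made concrete. Assuming each view is reported independently with probability 1/N, the chance that a page records no samples at all in a period is (1 - 1/N)^views. N = 500 is an illustrative value, and `probabilityOfZeroSamples` is a hypothetical name.

```javascript
// Probability that a page with `views` views in a period records no samples
// at all, if each view is reported independently with probability 1/sampleEvery.
function probabilityOfZeroSamples(views, sampleEvery) {
  return Math.pow(1 - 1 / sampleEvery, views);
}

console.log(probabilityOfZeroSamples(3, 500).toFixed(3));  // 0.994
console.log(probabilityOfZeroSamples(20, 500).toFixed(3)); // 0.961
console.log(probabilityOfZeroSamples(3, 1));               // 0 (exact counting)
```

So a subpage read 3 times in a week would almost always show zero under 1-in-500 sampling, whereas exact counting (N = 1) records all 3 views.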

Just a thought: why not implement a form of the Wikimedia counter that has an expiry date? Editors would add a template with the start date as a parameter, and the code would only run if today's date is no more than 14 days after the start date. This would allow editors to intensively monitor use of specific pages without overloading the servers. RobinH 12:07, 24 September 2007 (UTC)
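The expiry check could be as simple as the following sketch. It assumes the template passes the start date as an ISO "YYYY-MM-DD" string and that dates are compared in UTC; it also treats a start date in the future as inactive, which the proposal does not specify. `counterIsActive` is a hypothetical name.

```javascript
// Sketch of the expiring counter: counting runs only from the start date
// up to windowDays days after it. Dates are compared in UTC.
const DAY_MS = 24 * 60 * 60 * 1000;

function counterIsActive(startDate, today, windowDays = 14) {
  const start = Date.parse(startDate + "T00:00:00Z");
  if (Number.isNaN(start)) {
    return false; // malformed template parameter: never count
  }
  const elapsedDays = Math.floor((today.getTime() - start) / DAY_MS);
  return elapsedDays >= 0 && elapsedDays <= windowDays;
}

const today = new Date(Date.UTC(2007, 9, 8)); // 8 October 2007 (months are 0-based)
console.log(counterIsActive("2007-09-24", today)); // true  (day 14)
console.log(counterIsActive("2007-09-20", today)); // false (day 18)
```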


 * Both options involve running in the browser and the browser sending information somewhere else. If a page is low usage, that could be an indication of a problem with the page itself. I think the only way to resolve the privacy issues with the Google suggestion would be to ask about it on the foundation mailing list or to contact the Wikimedia Foundation some other way. We could do all kinds of things with the WikiCharts counter once it's set up, including implementing an expiry date. I've tried to notify the person responsible for WikiCharts and am still waiting for a response.
 * Another possibility might be for one of us to try to get an account on the toolserver and set up our own counter. --dark lama 12:48, 2 October 2007 (UTC)

Google Analytics Test
I've installed Google Analytics in my own personal JavaScript, and after the first 24 hours there is some good data. We can see things like page views and other related information. Also, I can't find any personally identifiable information in the reports, certainly nothing like what a CU would see (no IP addresses, etc.). Some concerns have already been raised by other people about security problems associated with this script, and I'm going to do some research into this issue. I suggest that we do not install Google Analytics globally until we hammer out these problems. However, if other people would like to participate in this test with me, let me know and I can tell you how to add the script to your account. --Whiteknight (Page) (Talk) 17:56, 25 September 2007 (UTC)


 * Excellent work. It will be interesting to see the results after the trial. If it does work it will be fascinating to get an insight into Wikibooks usage and usage within the books that I have written.  Is my writing style so bad that people stop after the first 10 pages?  Was that section on indicial notation as riveting for the reader as it was for me? Is the "advanced" part of the book of any interest to anyone? etc. etc. RobinH 08:32, 26 September 2007 (UTC)

A couple of early questions:

1. What is the performance impact on the servers?

2. Is it possible to put the script on 10 featured book front pages themselves for a while to see how big a shock this is to the system? RobinH 11:07, 2 October 2007 (UTC)


 * Whiteknight says above 'no IP addresses, etc'. This gives the wrong impression: Google does collect IPs using Analytics. http://www.roirevolution.com/blog/2006/09/view_visitor_ip_address_in_google_analytics.html E5ricky 11:56, 22 October 2007 (UTC)


 * It's a moot point, because the foundation said we couldn't use Google Analytics anyway. Google did collect the IP addresses, but I didn't have access to them. --Whiteknight (Page) (Talk) 12:08, 22 October 2007 (UTC)
 * Yeah, I realised you didn't have access to them, but they are collected. I'm not surprised the foundation said no, as it doesn't comply with the privacy policy. E5ricky 12:50, 22 October 2007 (UTC)

Google Webmaster tools
Just had a thought: you can already find out a lot by using the webmaster tools. http://www.google.com/webmastertools You have to verify that you are the webmaster of a domain, but once you do, it shows you what people type into the Google search engine to reach your domain; you get a top 20. If you then enter a search term into Google search and pick the appropriate option, you'll see where the majority of visitors are landing. So in effect you can see the top 20 Wikibooks pages visited by people using Google. E5ricky 16:04, 22 October 2007 (UTC)
 * PS It's free, simple and doesn't invade anyone's privacy. E5ricky 16:06, 22 October 2007 (UTC)


 * It's interesting, but we can't verify. That means that we can only get the most basic level of information, which is the kind of stuff that we can get from google anyway. --Whiteknight (Page) (Talk) 00:26, 23 October 2007 (UTC)
 * Why can't someone here verify? E5ricky 00:47, 23 October 2007 (UTC)
 * There are two possible ways to verify. 1) Add a file to the root, but there seems to be no Wikibooks root, as everything redirects to /wiki/?
 * The second method is to add a meta tag; this is a temporary thing. Does no one have access to add a meta tag? E5ricky 00:53, 23 October 2007 (UTC)


 * "Somebody" has access, but nobody here. We could ask the developers, but they will probably freak out. I'll ask, but no guarantees. --Whiteknight (Page) (Talk) 01:01, 23 October 2007 (UTC)