User:Xbony2/badnumbers

This page talks a bit about how comparing numbers like edit counts can be bad/misleading.

Wiki vs. wiki
"Which wiki is better, the official wiki or the unofficial wiki?"

How much this matters is another question, particularly if you aren't an editor (I have a rather predictable answer in my FAQ if you want to know my opinion, which also happens to link here). But one of the first ways someone might attempt to answer this question is by asking another question: "which wiki is bigger?" The most obvious way to do that is to look at the article count or go to Special:Statistics. However, each statistic can be inflated for various reasons or be generally bad for comparisons; that's more or less what this page is about. Let's go through each statistic and see what's up with it.
 * Content pages (which is the article count) can be inflated/bad for comparing for various reasons:
 * Here on the FTB Wiki we have material pages, such as Aluminium, and material form pages, such as Ingots. Instead of having an article for Aluminium Ingot, Aluminium Plate, etc., we simplify and centralize by having articles on just each material and each material form, and leave redirects behind. On the wiki that shall not be named, they sorta have material pages, but they also have an article for each item, which here we would consider unnecessary. It's the same amount of information but with a different number of articles. As of the current revision of the Aluminium page, if we did one article per item there, we'd have 62 (!) articles. The biggest material article that I can think of at the moment is probably the Copper article, and if we did one article per item there, we'd have 150 (!) articles. The average material page probably has a lot fewer, but you can see how we would have many more articles if we did it a different way, even though that way would be much more complicated and hard.
 * But how much would the difference really be? You might be surprised. The popular Chinese modded Minecraft wiki-like website www.mcmod.cn has 17,255 (!!!!) pages in its GregTech 6 section. Yes, they're mainly components. Most of them were made via bot (I believe; there's no way they did it manually). That's not even up-to-date with all of the stuff GregTech 6 adds. As of the latest version of GregTech 6, it looks like it would be 155,078 pages. So, um, yeah, if we did that, our article count would be multiplied by a significant factor. It would be pretty crazy. We would also have to disambiguate these for GregTech 5, Thermal Foundation, etc. Disambiguation between GregTech 4 and 5 is partially what recently caused the wiki that shall not be named to have like a thousand more disambiguation pages than us (which count as articles, btw).
 * In a similar vein, we more often keep dyed items together. For example, see Magenta Mana Petal on the unofficial wiki and Magenta Mana Petal here. Rather than have 16 different articles, we just have one for Mana Petals since they're all the same.
 * On this here wiki, we allow translations, and this of course makes the article count larger. Chances are most people only care about English articles. Should Avaritia/zh-cn count as an article? We'll leave that up to you. However, because of the way our translation extension functions, every page marked for translation automatically generates an "/en" page (for lack of a better name), such as Avaritia/en. This page contributes nothing and is only required for that technical reason. There are hundreds of /en pages that are counted as articles.
 * Quality! I'm sure you've all heard of "quality over quantity." For example, we currently have about 400 articles marked as stubs, whereas the unofficial wiki has 1700 (!) articles marked as stubs. I suspect they use that category as a catch-all for articles with spelling and formatting issues and whatnot, and some of those probably aren't very stub-y, but my point still stands. A complete article in theory contributes more than a stub.
 * Popularity/relevance. This is a form of quality, one could argue. Some time ago, I documented this mod called Steel Sheep. It was a funny little mod, and it was fun to document. But how many people will find documentation on it useful? Probably not a lot. It might be argued that if I had contributed towards Thaumcraft 4 (or whatever) instead, more people would have seen it and used it, meaning my contribution would have been more valuable. The popularity/relevance of mods changes over time of course; for example, here we have more information suited towards newer versions, which is becoming more and more relevant, and on the wiki that shall not be named there is more information suited towards 1.7, which is becoming less and less relevant.
 * Outdated information. We have plenty of it, and the wiki that shall not be named also has plenty of it. Most of it isn't marked. I'd like to think we're more up-to-date (since we've been described that way a few times), but nobody's been keeping track.
 * Pages can also be inflated/bad for various reasons:
 * See "Uploaded files" below. Files are probably the majority of the pages on the unofficial wiki.
 * Because of the way they're set up, translated pages make a lot of pages in the "Translations:" namespace. See here. This is probably the majority of the pages on this wiki.
 * This isn't huge, but a lot of the user pages on the wiki that shall not be named are Wikia leftovers, like this one. There are somewhere between 500 and 750 of these. These user pages were automatically generated when Wikia users visited or logged into the wiki, back when the wiki that shall not be named was on the Wikia network.
 * Any reason that applies to articles applies here too.
 * Uploaded files. Woah, dude! The wiki that shall not be named has a shit ton of files! What's up with that, dude! Well, the majority of their files (and the majority of their pages in fact) are grid files, like this one here. On this wiki, we use tilesheets. Basically all of the grid/tile images for each mod are part of one image, like File:Tilesheet GT5 16.png. At the moment we have about 300,000 tiles registered in our database. That's a lot of images that we would hypothetically upload if we did something similar to what the wiki that shall not be named has done. Our tilesheet system saves a lot of time, and it also allows navboxes to load faster since only one image is loaded instead of possibly hundreds. Anyway, moving along...
 * Page edits since Feed The Beast Wiki was set up (edits for short >.>) can be inflated for the same reasons the number of articles can be inflated.
 * Quality (again). This applies to edits even more than articles. Not all edits are equal; not even close. For example, this edit (the creation of the page) took much longer to do than this edit (which took like two seconds). And bot edits really don't take that many seconds at all to do! Like the time I typed one command to help delete that category. Typing that command took three seconds (or so) and it modified ~400 pages, adding 400 edits to our count. According to my count, if it may interest you, bot edits make up around 30% of the edits on this wiki. That's a large amount. It does take effort to program, configure and execute bots, so they're not worth nothing, but the average bot action takes less time per edit than the average action made by a user. How it all adds up is a mystery, though.
 * I'm not sure how it is on the wiki that shall not be named, although one of the members there implied that they use their own accounts for some bot tasks related to disambiguation. As such, I suspect it would be difficult to measure what percentage are bot edits. But regardless, most edits on both wikis are minor edits, such as spelling fixes, disambiguation and link corrections. This is how it is on most wikis, in fact.
 * Translations. Because of the extension we use, translating a page such as GregTech 5 would take about 20 edits. This is a lot compared to the number of edits that making that page would take. It's a bit weird like that.
 * Tilesheets. Back to the uploaded files; on the wiki that shall not be named, each file upload is a page creation and therefore an edit. They also have to disambiguate files apparently, which makes extra edits (and if you remember, the differences between GT4 and GT5 have caused a lot of edits because of the disambiguation in the mainspace; this also applies to the file namespace). This also seems to include some amount of reuploads and changes to the file description. How many edits does this cause? Well, a lot of (maybe even the majority of, I'm not really sure) the edits of RZR0, the bureaucrat of the wiki that shall not be named, come from uploaded images. He has over 100,000 (!) edits. I don't want to say he hasn't contributed a lot to the community and his wiki, because that's not true, he's done a lot, but because of all of these uploaded files, his edit count is amplified. And hey, so I don't sound unbalanced, so is mine for various reasons.
 * Average edits per page might sound like a decent measure of quality. It's not, for the same reasons not all edits are of the same quality.
 * Registered users is hella amplified on our end, because of us being part of Gamepedia. The number of registered user accounts isn't everyone on Gamepedia (as the Minecraft Wiki's is higher), but I think it is pretty much everybody who already has a Gamepedia or Twitch account and visits the wiki, which is a lot of people. But back to the count itself, it's important to keep in mind it isn't the number of people who have edited each wiki, and it isn't the number of viewers or anything like that. It's kinda meaningless. A lot of accounts without any contributions are bots, btw. I don't know why, but a lot of accounts are generated by, say, scripts in New Zealand (and stuff like that). They can't seem to do anything, but they are there. This affects both us and the wiki that shall not be named, although if I had to guess it probably affects us more because Gamepedia is much higher in Alexa. This might be different now since you can't sign up normally, but its legacy remains.
 * Active users is probably my favorite measure, but it's still not that great. For one thing, coming back to quality, some editors contribute more in a month than others. Don't get me wrong; we truly appreciate all contributions, but we'd be lying if we said some guy fixing a spelling mistake was equal in value to someone who documented an entire Thaumcraft or so over the course of the month. It's not a contest, but my point stands: it doesn't factor in how much each user actually does. Also, this doesn't affect it much, but anonymous users aren't counted in this stat. Oh, and most importantly, ours has been stuck at 37 for at least a year now due to bugs in the software.
 * Bots. It's interesting to me that the wiki that shall not be named has far fewer bots than us. However, only about three of our 12 bots (not including 2 (maybe more?) bots that had their user group/rights stripped) are actively used. Some were never used or hardly used. But on the wiki that shall not be named, bots appear not to be used at all. No serious activity has happened during or after 2017 (I will not count "DeathCamel57 bot", which has 12 edits, 7 actually being bot edits). I have heard that the admins sometimes use automation utilities on their own accounts, but I have no idea to what extent. I doubt anyone would argue more bots equals better, but someone might argue that we're more automated (whatever that means) or something.
 * Administrators. We have 7, but really we only have 3; the others are bots/Gamepedia. The wiki that shall not be named has 28 (!), but really they only have like 2 that are active. On our wiki, we're less liberal with assigning admin rights; we'd rather give "editor" rights (or staff rights in the old days) since that covers most of what a serious editor needs. Also, we generally remove admin rights from those who become inactive (which is around 6 people I believe, maybe I missed one or two though). Some people think more admins means more editors or something; there's certainly a correlation when comparing wikis, but I don't think there's one in this particular case (one might argue the wiki that shall not be named is more active than us, which probably isn't true, but it is more arguable than specifically saying they are 4 times as active as us).
 * Bureaucrats. We have 2, the wiki that shall not be named has 1. The one there is the site owner; I suspect he doesn't plan to give bureaucrat status to anyone else. We are conservative for a different reason; we probably won't appoint any more bureaucrats unless we need one. Saying so might be frowned upon, but I'm the most likely to become bureaucrat if/when Retep or Santa retire (I would prefer to have at least two to keep balance, and I think most people would agree with me). This measurement is unuseful as it's difficult to spell.
 * Curse, banhammer, Gamepedia Staff, etc... all roles that are wiki-specific. I don't think I need to go into why these aren't good for measurement.
 * Editor and Contributor are roughly equivalent in idea. Both are groups granting most of the permissions that an editor/contributor would need, both are not difficult to obtain (on the wiki that shall not be named you pretty much just have to request it; here we have to have a vote, but it's usually a short and uncontroversial one; for both you just have to make a few edits over a few days to be trusted), and both are for life. It's important to keep in mind, however, that we did not start using the Editor group the way we do now until quite recently, whereas the wiki that shall not be named has been using theirs since early 2013. Basically they've had a 3-4 year lead on us :P Before the Editor right we just had staff rights, which were harder to obtain and are now abolished. Some of our Editors are former staff, but most former staff do not have this right.
 * Semantic statistics... obviously we don't have SMW, so this isn't relevant.
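
To put rough numbers on the material-page point above, here's a minimal Python sketch. It assumes a made-up dictionary that maps each material page to the number of items it covers; the two figures are the ones quoted above for Aluminium and Copper, the function names are hypothetical, and nothing here queries either wiki. It also ignores material form pages like Ingots, so treat it as an illustration rather than a real count.

```python
# Items covered by each material page (the two figures quoted in the text).
# The dictionary and both function names are illustrative assumptions.
ITEMS_PER_MATERIAL = {"Aluminium": 62, "Copper": 150}


def articles_one_per_material(items_per_material):
    """Article count when each material gets a single page."""
    return len(items_per_material)


def articles_one_per_item(items_per_material):
    """Article count when every item gets its own page."""
    return sum(items_per_material.values())


per_material = articles_one_per_material(ITEMS_PER_MATERIAL)
per_item = articles_one_per_item(ITEMS_PER_MATERIAL)

# Same information either way; only the article count differs.
print(f"one page per material: {per_material} articles")
print(f"one page per item: {per_item} articles")
```

Two pages versus a couple hundred, for identical content, is the whole problem with comparing article counts.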

Other measurements

 * Article wiki depth. That's this. (My calculator is here). It's an interesting way to measure quality and collaborativeness... except it can be flawed for the same reasons that the number of pages and edits can be amplified. Also, since a lot, probably even most of the collaboration and discussion goes on IRC and Discord (or in the case of the wiki that shall not be named, they also do some wiki discussion on their forums), that isn't counted anywhere, although it would be counted in wikis that solely use talk pages or noticeboards.
 * I invented another measurement, "balanced" wiki depth, which is the same thing as wiki depth, but it doesn't count files. This was developed for the funz to compare the Minecraft Wiki and us, as they have one image for each block/item while we use tilesheets, but it's still shit since edits can be amplified for other reasons as I explained earlier.
 * RC activity, as in looking at Special:RecentChanges and saying "hey, there's more edits today/this week than on the other wiki." It's important to keep in mind that it might look like there's more activity when there's really not much more. For example, maybe on the wiki that shall not be named, the owner uploaded a bunch of the grid images from one mod, and there's a bunch of entries, even though it's not really that hard to upload a bunch of images. And maybe here, the main activity of the day is some user writing a guide, which is only a few edits, but it's a few big edits that take hours to do. And maybe another day some loser does a bunch of small edits that take a few seconds apiece, over the course of an hour, and that's the main activity of the day. But on the wiki that shall not be named, some guy writes a few articles over the course of a couple hours, and it looks like less in the RC. You see the point. Also, you can't see any bot edits by default; bot edits are arguably "less activity" but probably still some amount of activity.
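
For the curious, here's a rough Python sketch of the two depth measures mentioned above. The first function follows the depth formula as I understand it from Wikimedia's Meta-Wiki: (edits / articles) × (non-articles / articles) × (1 − articles / total pages). The second is only my guess at how a "balanced" version could work, assuming that "doesn't count files" means subtracting the file count from the total page count; the function and parameter names are hypothetical, and this is not the calculator linked above.

```python
def wiki_depth(edits: int, articles: int, total_pages: int) -> float:
    """Wiki depth: (edits/articles) * (non-articles/articles) * (1 - stub ratio)."""
    if articles <= 0 or total_pages <= 0:
        raise ValueError("articles and total_pages must be positive")
    non_articles = total_pages - articles
    stub_ratio = articles / total_pages
    return (edits / articles) * (non_articles / articles) * (1 - stub_ratio)


def balanced_wiki_depth(edits: int, articles: int, total_pages: int, files: int) -> float:
    """Same formula, but with file pages left out of the total page count.

    Assumption: "doesn't count files" means total_pages - files. Edits that
    were made to file pages are NOT subtracted here.
    """
    return wiki_depth(edits, articles, total_pages - files)


# Made-up numbers, purely to show how much files alone can move the result.
plain = wiki_depth(edits=1000, articles=100, total_pages=400)
balanced = balanced_wiki_depth(edits=1000, articles=100, total_pages=400, files=200)
print(plain, balanced)
```

With these invented inputs, dropping the files cuts the depth to a fraction of the plain value, which is why a wiki that uploads one image per item looks "deeper" than one that uses tilesheets.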

User vs. user
Everything that's wrong with wiki edit count can be wrong with user edit count, basically. Except when applied to me of course, I'm the best wiki editor there is. That's sarcasm btw, everyone knows it's who is the greatest editor ever.

TL;DR
When comparing two wikis or two users (or more than two), built-in statistics and other measurements may be useful, but they are far from perfect for a wide variety of reasons, particularly quality.