UESPWiki:Administrator Noticeboard/Archive 8

This is an archive of past UESPWiki:Administrator Noticeboard discussions. Do not edit the contents of this page, except for maintenance such as updating links.


Blocking Rogue IPs at the Server

Given that the site is generally struggling so much, it really irks me to notice IPs that are clearly trying to systematically download large fractions of the site (even though they keep getting blocked by 503 errors), or that are repeatedly trying to post spam (even though the attempts are all getting blocked by captcha). Yet these unquestionably bot-controlled IPs keep showing up in the server logs. For example, 72.165.35.198 was denied access by the server 353 times during a 12-hour period today; some of the articles that this IP was so interested in obtaining included <sarcasm>highly popular</sarcasm> pages such as Category:Oblivion-Factions-Nine_Divines-Primate and Special:Recentchangeslinked/Oblivion:Esbern (each was denied 6 separate times). And these were just the requests denied by mod_limitipconn (denied because the IP was trying to open too many connections at the same time).
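For context, that per-connection limiting is handled by a small apache module and only takes a few lines of configuration. A minimal sketch (the location and the limit shown here are illustrative guesses, not our actual settings) looks something like:
   <IfModule mod_limitipconn.c>
      <Location /wiki>
         # Refuse any request from an IP that already has this many
         # simultaneous connections open to the wiki.
         MaxConnPerIP 5
      </Location>
   </IfModule>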

Using iptables, it is possible to completely block certain IPs. This is a block at the server level, not just at the wiki, and completely denies the IP all access to the site. The IP would no longer be able to view a single wiki page, view any of the old site, view the forums, or anything else. If used against a legitimate user, that user would have no way to contact the site to point out the mistake. It's a pretty extreme measure, but one that has been used in a few past cases (as documented at Bad Addresses).
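To give a concrete sense of what such a block involves, the commands are of roughly this form (using the IP above as the example and run as root on the server; the exact rules we'd use may differ):
   # Drop all traffic from the offending IP before it ever reaches apache:
   iptables -A INPUT -s 72.165.35.198 -j DROP
   # ...and later, to lift the block again:
   iptables -D INPUT -s 72.165.35.198 -j DROP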

So what I'd like to throw open for debate is: Should we start blocking a few more of these IPs? And if we want to start doing it more widely, should there be a protocol in place to prevent the possibility of an IP used by a real reader from getting blocked?

A few ideas:

  • Before blocking an IP at the server level, add a message to the IP's talk page. For example "Unusual server activity has been reported for this IP, as a result of which we believe that this IP is being used by a bot to monopolize system resources. To protect the site, this IP address is about to be completely blocked from any further access to UESP. If you have been directed to this page because you are using this IP address, please post a message here immediately to tell us that a legitimate reader is using this IP."
  • If after an hour (?) no responses appear on the IP talk page, and the IP is clearly continuing to download site content, then proceed to block.
  • Keep track of the IP, date, and time of all such blocks on Bad Addresses (tweak the table format perhaps, or add a new table to mark the start of this new protocol).
  • After sufficient time has elapsed (one week? one month?), lift the block, again recording the info at Bad Addresses.
  • As long as the IP resumes its suspicious activity, continue to reinstate blocks. I'm really reluctant to impose such an extreme block on a permanent basis. I think it's worth the small amount of extra effort to lift any such blocks periodically, even if the block just needs to be reinstated again the next day.

As for what types of behaviour would trigger this, unfortunately, I'm not sure that it's easy to come up with a clear set of rules. I think it will ultimately have to be a judgment call on the part of the person who makes the block. However, an IP would have to trigger numerous error messages (hundreds) over a period of several hours. We clearly want to avoid at all costs blocking a legitimate user who just hit refresh too many times while trying to get a page to load when the site was busy. Also, I'd say the downloaded pages would have to appear "unusual"... which is where the judgment comes in.

At the moment, the only person who can do iptables blocks is Daveh. If we wish to move forward with this, I'd like to request that I also be given permissions to add/delete IPs. If other admins notice highly suspicious behaviour from an IP in the server logs, they could post the user talk page warning and add a request (e.g., at UESPWiki talk:Bad Addresses); then Daveh or I could take care of the actual block.

Until we try it, it's hard to say whether this will have a noticeable effect on site performance. Worst case, it will at least reduce the frustration of seeing bots show up in the server logs when you're unable yourself to connect to the site. Even in the best case, I doubt it will fix all the server slowdowns (I'd like to believe that the majority of the connections to the site are coming from legitimate users rather than bots!), but maybe it can at least make it so that the site no longer refuses to respond to anything for 15 minutes at a time.

(P.S., I've also been posting a series of other more mundane/technically obscure suggestions for performance tweaks at UESPWiki talk:Upgrade History. So this isn't the only option for how to improve the site's performance.) --NepheleTalk 03:20, 17 January 2008 (EST)

Support: Not sure if this is a voting one but hey... As we discussed earlier, I'm in favour of this. I'm not going to deny that such an extreme measure makes me feel a bit nervous but I can't think of anything else that's going to have the desired effect and the safeguards you've mentioned seem adequate. My only remaining concern is that it's yet more work being loaded on to you and Daveh. –RpehTCE 04:44, 17 January 2008 (EST)
As an addendum to that, I'd suggest that any IP already blocked, say as a nonsense bot or for spam, can be added immediately without the hour waiting period. If they're blocked, a legitimate user would already have appealed. I'm seeing several known nonsense bots accessing the site and it seems a waste of time to ask them if they'll be inconvenienced :-) –RpehTCE 06:13, 17 January 2008 (EST)
What would a blocked IP see if they tried to access the site? If it's some sort of error message (404, 503, etc.), can we customize that error message to explain to them exactly why they've been blocked, and maybe give them a means of contacting someone to contest it? I mean, I'm all for going gung-ho against bots whenever possible, as anyone knows who's seen some of my more extreme suggestions for dealing with them, but leaving people without any explanation or way to contest a block makes even me a bit nervous. I know it's possible to make your own 404, 503, etc. error messages instead of using the browser-default, and it seems to me that this would be one way to at least leave some sort of recourse on the off-chance that a legit user is somehow affected. (It's possible that a legit user might have a trojan that is running from their IP, or that a proxy could fake its IP from another location, or even that certain dynamic IPs which get moved around and used by many separate locations might be affected in this way.) All of our other methods of blocking, such as those used on Nonsense/Spam bots and other open proxies, still allow the blocked IP to post on the talk page if they wish to contest the block, but this would prevent any such chance, and has the potential to affect legitimate users if we're not extra careful about it. --TheRealLurlock Talk 13:43, 17 January 2008 (EST)
I have seen scripts that automatically IP block an address at the server level by monitoring server logs for DoS-like events (like the ones Nephele was talking about). This sort of block results in no error page (that I'm aware of)...it's just like the server does not exist (the web server never sees the request). The 503 error page results from the web server DoS module kicking in but if the client is running some sort of download software (or whatever) it probably wouldn't make any difference. Perhaps a temporary automatic IP block (for a few days) is more appropriate in such an event. -- Daveh 13:56, 17 January 2008 (EST)
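As a rough sketch of what such a script might look like (the log path, the message it greps for, and the threshold are all guesses rather than our actual setup), something along these lines run periodically from cron would block any IP racking up an excessive number of recent denials:
   #!/bin/sh
   # Hypothetical example: count recent connection-limit denials per IP in the
   # apache error log and drop any IP with more than 200 of them.
   tail -n 5000 /var/log/httpd/error_log \
     | grep -i 'too many connections' \
     | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' \
     | sort | uniq -c \
     | awk '$1 > 200 {print $2}' \
     | while read ip; do
         iptables -A INPUT -s "$ip" -j DROP
       done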
It's quite clear from the error logs that whatever bots are involved here are basically ignoring the 503 error message. They just keep trying again and again until they get the page they're trying to access. So it seems likely that ultimately limitipconn isn't doing anything to limit the number of IP connections; in fact, it's really doing the opposite since the bots will now make 5 or 10 HTTP requests instead of 1 to obtain a single page. To the extent that it's true that the bots keep trying, it may not be doing anything to limit the bandwidth use either, because they still get the document in the end. Not to say that limitipconn is doing nothing. At least it's slowing down their requests: the downloads are spread out over a longer period of time, and in the meantime more regular users can get in (hopefully).
I'm also concerned, although I haven't been able to confirm it yet, that when the bots are blocked by limitipconn, the bots are somehow forcing the 503 connection to stay open until apache forcibly times out the connection. It is clear that when the site gets busy there is a problem with incoming "R" requests hanging in "R" mode for a full 300 seconds; when one quarter of the site's connections are stuck open for 5 minutes at a time that's definitely going to have an impact on site accessibility. Unfortunately, the apache server status doesn't allow you to see the IP address of these "R" requests so I can't confirm where they're coming from. All I can say is that times when a lot of "R's" show up in the server status reports do correspond to times when a lot of limitipconn blocks show up in the error logs (which admittedly could also just be that when the site is busy, there's more of everything going on).
In any case, staring at the logs too much over the last few days does make me think that we need something that's more effective against these bots. Even if it's only a short-term measure until we can find other ways to improve the site performance: if the site were running smoothly 100% (or even 95%!) of the time, I wouldn't really care about them. But right now, it seems to me very likely that legitimate readers (and editors) of the site are being denied access because of these bots every time there's a site slowdown. I'd much rather take the (small) chance of locking out a real person with a bot-infested computer than continue to very certainly turn away real users day after day.
More specific comments:
  • It's true that IPs who have already been blocked as nonsense bots on the wiki probably don't need an extra message. But as I've been pondering the feedback, I think it may still be worth adding the extra message just in case there is a legit user who never cared about the wiki block, but suddenly notices the problem when he loses all access. In this case, it wouldn't necessarily be an ahead-of-time warning, but more of an after-the-fact explanation once the user gets access again (yes, the wording of the message would also need to be tweaked accordingly... assuming we go with the manual approach instead of some newfangled automatic apache mod!). Also, just to be clear, I don't think we need to go through and do a server-level block on every IP that's ever been used by a nonsense bot. I'd say that only bots that continue to show up in the logs need further action (and, again, only with temporary blocks that get reinstated as long as activity continues).
  • We could customize the 503 error messages that are currently being displayed when IPs get blocked. Which might in fact be helpful, since it's clear that most editors don't know what the messages mean when they first see them.
  • The whole point of a server-level block is to completely prevent our computer from having to do any work at all. Apache (the web server that provides all HTTP responses) never even needs to see the connection, and therefore apache doesn't need to waste any of its resources deciding how to respond. Therefore, it's not really possible to provide a friendly explanation message. Thus the caution about extended length blocks and trying to notify ahead of time.
--NepheleTalk 19:57, 17 January 2008 (EST)
I too have been wondering about those lingering 'R' connections which I don't recall seeing before, at least in the amounts there have been lately. If you're familiar with netstat you can log in to the server and get more specific lists of IPs connected to the server. While I haven't noticed anything recently I have used it in the past to catch 'bad' addresses DoSing the server in some manner. For example:
   netstat -an | grep ESTABLISHED | sort -k5 | more
lists all established connections sorted by IP. Note that it's not too unusual to see a few IPs with a dozen connections since the OB/SI maps can easily generate a dozen server requests for each view. -- Daveh 09:22, 23 January 2008 (EST)
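A variant of that command which may be handy for spotting a single IP hogging connections is to count established connections per remote address (this assumes the usual netstat column layout, with the foreign address in column 5):
   netstat -an | grep ESTABLISHED | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head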
OK, I just happened to catch one of the particularly suspicious clusters in action. On server-status 20 R connections appeared within 10 seconds of each other and, when I noticed them, had been lingering for 199-208 seconds. The server was otherwise pretty quiet (only 17 other active requests) and had been quiet for a while, so it's unlikely that these were triggered by the server getting bogged down. When I used the netstat command, lo and behold, there were 20 established connections from 89.128.216.85. Then in the process of writing this, even more R's appeared, and netstat is showing a huge burst of connections from both 24.201.104.51 and 89.128.216.85. Neither of these IPs is being reported by server-status (i.e., the connections do seem to correspond to the unidentified Rs). In netstat, both the sendQ and recvQ columns are 0 for all of these connections which (if I'm reading the man pages properly) says that neither direction claims that more data needs to be sent. Most of the other established connections had non-zero values in the sendQ column.
The final interesting piece of the puzzle is checking the error_log file for apache. Just doing a grep on the last 5000 lines of the error log, 89.128.216.85 is only showing up once as being blocked by apache for exceeding the connection limit; 24.201.104.51 is showing up 6 times, but all from 4 hours ago. (Both do come up more times as I scan deeper back into the error log). Which means that I'm not sure that limitipconn is doing anything about these connections. I'm guessing that limitipconn is waiting for the IP to send a request before trying to block or get rid of them (since the limitipconn criteria are all based upon which files are being requested). As long as they just hang there, the server's letting them monopolize our connections until finally the connection times out.
Everything seems to confirm that the lingering R connections can be tied to one or two IPs that are misbehaving. And our current measures aren't doing much to control these IPs. --NepheleTalk 02:58, 24 January 2008 (EST)
Last night I specifically checked during a time when the site was quiet to be sure there weren't other extenuating factors; on the other hand, it meant that having two IPs block 40 of our 100 connections for 5 minutes at a time wasn't really interfering with any other readers. Today I figured I'd snoop as the site got busy and confirm whether the same activity is happening when the site starts to bog down.
In the server-status snapshot, all 100 connections on the server are now busy. 55 of those connections are lingering R connections that are more than 2 minutes old. netstat shows 28 established connections from IP 71.180.214.84 and 27 established connections from IP 81.214.45.167. Neither IP is visible in server-status, so these two IPs are indeed responsible for all 55 R connections. With them blocking more than half of our connections from legitimate users, it's no real surprise that we're all having trouble accessing the site. In the time it's taken me to write this, all of those 55 connections timed out. But now 71.180.214.84 is back again with 65+ connections, from that one IP alone. Needless to say, server-status is completely clogged up using all 100 connections, but the vast majority are Rs.
Just to do some quick math: the server averages more than 30 requests per second. If one of these IPs ties up 20 connections for 300 seconds, that's nearly 10,000 incoming requests that have to fight over a reduced pool of connections each time one of these IPs attacks us. And from what I've seen in the logs, these IPs keep doing it time and time again for hours. We really need to find a way to get rid of these pests. --NepheleTalk 13:06, 24 January 2008 (EST)

Privacy Policy

I just happened to spot a user accessing UESPWiki:Privacy Policy and, not having read it before, thought I'd better take a look. Well... see for yourself. This clearly needs fixing since it's one of the links at the bottom of the page. The talk page has a link to an old Booyah Boy sandbox here, which contains a proper policy that seems to have been on the verge of being accepted. I propose we basically put that live. Three little things need to be fixed:

  • Bread Crumb Trail,
  • List of users with CheckUser (since all admins have it now), and
  • The IRC section - I'd suggest keeping that to a minimum and linking to the IRC page.

If nobody has any objections, I'll go ahead and make the change. –RpehTCE 07:09, 17 January 2008 (EST)

I have no objections. --Mankar CamoranTCE 07:46, 17 January 2008 (EST)
Nor I. Full speed ahead? Muthsera 09:01, 17 January 2008 (EST)
Have fun rewriting rpeh. --Ratwar 09:15, 17 January 2008 (EST)

Okay, I tweaked it, Nephele tweaked it some more and I've just put it up at UESPWiki:Privacy Policy as a preliminary proposal. If nobody raises any concerns in a week, I'll take the "proposal" tag off. –RpehTCE 16:54, 17 January 2008 (EST)

Thanks, Rpeh, for noticing this and dealing with it :) Although I might have to go and check the locks on my bathroom doors now ;) --NepheleTalk 17:28, 17 January 2008 (EST)
I'll stay out of this one. But keep in mind that we have ads and the adservers place cookies. Cookies which most likely track users across multiple sites, etc. So our policy should probably not say that we're not doing that. :lol: --Wrye 18:08, 17 January 2008 (EST)
That's a very good point. It's too late at night for me to think properly about that but the policy needs to mention it. Again, I'd say a brief mention and a link to a page on Google. I'll do that tomorrow unless anybody wants to jump in first. Do you have any other concerns though Wrye? You seem a bit equivocal, and I'd definitely welcome your opinions. –RpehTCE 18:26, 17 January 2008 (EST)
At this time the ads don't appear to set any cookies, but being externally loaded data that might change as decided by Google. -- Daveh 19:47, 17 January 2008 (EST)

Douglas Goodall Interview

Before Oblivion's release, an interview appeared on The Imperial Library, only to be removed a little bit later. I now have a copy of the interview, and wish to post it. It can currently be found in my Sandbox. I've already asked Daveh about it here. If anyone has any objections, let me know. --Ratwar 09:15, 17 January 2008 (EST)

Just to be official... no problems here. It's an interesting read. It would be interesting to hear the other side of the story though. –RpehTCE 09:19, 17 January 2008 (EST)
No objections from me either. --Mankar CamoranTCE 11:27, 17 January 2008 (EST)
Yep, I also think it would be useful for the site's readers to be able to read the interview. I can even think of a few pages that might want to link to the interview (e.g., some of the Dwemer books for confirmation that they're just random). And if there are other developers (or ex-developers) who want to provide additional information, I'm sure we could find a way to accommodate more interviews/developer feedback. --NepheleTalk 17:34, 17 January 2008 (EST)


Templates and Performance

At the moment, templates are a performance burden on the site. However, I believe we can clean up some of these templates to gain some performance. For example, there are some templates which are only used as placeholders for other templates that are not used anywhere else, like Template:stub and [[Template:stub-mw]]. These templates could be merged into a single one to save the server from wasting resources by including another template. One notable template that calls many other templates is Template:NPC Summary, but I don't really know to what extent this can be flattened.

There are also templates which are just a redirect to other templates, like Template:Linkable Entry and Template:LE. These templates could be deleted and replaced by the root template, again saving the server from having to follow a redirect. In the particular case of LE, this is quite a drag, as it is usually used many times (dozens) in a single page. If shorthands are needed, there are a few other options: a) have NepheleBot replace shorthand templates with the root template every now and then; b) install some kind of hook to replace the code before saving the page; c) put a link to the root template in MediaWiki:Edittools.

Well, I hope I have caught your attention, as I have seen this issue popping up in many places on the wiki, and I think it deserves discussion. These are my first ideas, but I hope you have some more. Any comments? --DrPhoton 08:55, 23 January 2008 (EST)

I've thought about template performance and, while I haven't explicitly measured it, I'm not sure it's a major cause of the site performance issues. Users who aren't logged in always see the cached version of a page, so it doesn't matter how many templates the page has. Templates can slow down the display of a page to a logged-in user (I think), or the first time the page is rendered, or if a commonly used template is changed.
Of course, having said that, I think it still might be worthwhile to look at any template optimizations that can be easily done. -- Daveh 09:27, 23 January 2008 (EST)
I'd agree that having templates redirecting to other templates isn't a great state of affairs and I'll also agree that there are a number of essentially useless templates around on the site. In terms of useless ones I think there are three categories:
  • Templates like SI, OB, BM etc. are shortcuts that only save keystrokes at the cost of hurting server performance, and so should be removed.
  • Templates like [[Template:Quest Link Short|Quest Link Short]] and [[Template:Place Link Short|Place Link Short]] (the latter of which has finally been proposed for deletion), which are at least as long as the commands they replace. There are dozens of such templates around, and the rationale for many has been lost over time.
  • The last set is best represented by LE. Yes it's just a shortcut but it can have a huge effect in reducing the length of pages. We'd have to judge whether the performance benefit of smaller pages is offset by the reduced clarity of the name if we completely removed Linkable Entry.
I'll admit to being a bit over-keen to create new templates - everyone always likes to play with a new toy - but I don't think it's the new templates that are hurting performance. If that were true I'd expect to see the site slowdowns occur only when the processor usage is very high, and that's not happening. Even so, we may need to go through and look at each one and see if it's really necessary. That's a big job though. –RpehTCE 10:13, 23 January 2008 (EST)
I also don't think that templates are responsible for the site slowdowns and inaccessibility problems; my apologies if some of the comments that I've made have suggested otherwise. Furthermore, I think that templates are a very useful tool that should continue to be used widely on the site wherever appropriate. On the other hand, there are several specific bugs/limitations with templates that do cause problems, in particular on pages with large numbers of templates. And templates are not an efficient way to provide editing shortcuts. So I agree that it is useful to try to limit unnecessary template use. Some specific thoughts:
  • I agree that shortcuts like SI, etc. are problematic. On the other hand, I know that a fair number of editors have adopted these templates. One possible compromise with these templates is to keep the templates in place and available for editors, but have NepheleBot periodically go through in off hours and expand the templates to the non-shortcut equivalents. It allows editors to have the convenience while eliminating any longterm impact on the server.
  • Templates like Quest Link Short really aren't needed any more; they were useful back when quest pages were on quests subpages (e.g, Oblivion:Quests/Find_the_Heir), but various rounds of reorganization have made them unnecessary. If anyone finds templates like these that are being used on more than a handful of pages, please add them to NepheleBot's list (at User talk:NepheleBot#Update Links) and the bot can clean them up very easily.
  • I'm not sure that there's any template overhead from template redirects. My understanding is that when the server sees LE on a page, it directly inserts the contents of Template:Linkable Entry, just as if Linkable Entry were being used on the page (note this is very different from nested templates, such as for example on Stub right now; this only applies when the template page is of the form #redirect [[Template:Linkable Entry]]). On the other hand, I don't know that the page length reduction caused by replacing Linkable Entry with LE has any benefits. That page length reduction is only relevant for the version of the page seen by wiki editors, so there's a small difference when an editor loads the page, and there's a space reduction somewhere deep in the database (and we're not short on disk space). The length of the HTML page is completely unaffected, and it's the length of the HTML page that has any impact on the site overall, in terms of the bandwidth needed every time a reader views the page. So in my opinion, the only advantage from templates such as LE is the editor shortcut (although the disadvantage is far less than that for other shortcuts such as SI).
  • NepheleBot has already finished cleaning up all of the stub, needs image, and cleanup templates. So now all that's needed is for someone to merge the namespace-specific text/images into the main template and prod the others.
  • I don't think the complexity of templates such as Template:NPC Summary is fundamentally a concern. The template is complex because it includes a lot of different features, and it calls other templates because those templates provide an efficient and/or standardized way to do certain tasks (e.g., to figure out what colour to use for each race). But given that the template is not used hundreds of times on a page, the complexity does not cause problems. Even on a page such as Oblivion:Knights of the Nine, which was just changed to call NPC Summary nine separate times, the template parsing statistics are fine (Pre-expand include size: 469855 bytes; Post-expand include size: 91472 bytes; Maximum: 2097152 bytes. In other words, it's only 20% of the way to where problems start to occur). It's important with templates like these to use /Doc subpages, but I don't think we want to start sacrificing usefulness.
Overall, I think it's worth keeping our eyes open for cases where templates can be improved. But I don't know that we need to systematically go through and examine every template on the site. --NepheleTalk 13:21, 23 January 2008 (EST)
Well, it seems I misunderstood how templates work on the server side. As I understand it now, templates are rendered every time a new page cache is built, right? So, does this mean that they can have an impact on editing and previewing pages, and consequently on the server?
As far as template redirects go, I did a little test to check whether they have any impact. I opened two edit windows with Morrowind:Base Weapons and on one of them I changed all LE to Linkable Entry. Then I previewed both edits and I didn't notice any appreciable difference in the time they were rendered. I then changed the text on both pages a bit to force the server to build a new cache, and again no difference. So it seems redirects have no impact on the server.
I don't want to start a site-wide template optimization/revamping unless it's necessary. So if you think templates aren't a concern, let's just leave them alone (except for normal maintenance/cleanup). --DrPhoton 03:29, 24 January 2008 (EST)

Random Page Link

Has somebody done something to affect the Random Page link? If you hit that link, it shows a page at random, but then shows the same page the next few times you hit the link, before eventually changing to a new random page. --Gaebrial 03:43, 25 January 2008 (EST)

I'm guessing it's a side effect of the new squid cache that was implemented yesterday to improve site performance. Given that wikipedia uses squid caching and their random page link seems to update every time, it must be possible to specify somewhere that random page bypasses the cache... but I have no idea where at this point since I have almost no idea how our fancy new setup is working ;) Hopefully Daveh can look into it this weekend. --NepheleTalk 12:31, 25 January 2008 (EST)
It was just working fine for me, both logged in and not. Try again and see if it is still broken for you. It may have been a temporary side effect of changing the site's DNS entry. -- Daveh 13:10, 25 January 2008 (EST)
I just tried it and it didn't work for me. The same page appeared eight times before a new one showed up. --Mankar CamoranTCE 13:15, 25 January 2008 (EST)
I get the same thing. Also, while we're on the subject, I've started getting talk pages from it too - I'm sure it always used to give out only articles in subject spaces. –RpehTCE 13:35, 25 January 2008 (EST)
I'm getting that too, where it gives me the same page several times in a row before picking a new one. Also, I've been getting talk pages from it for ages...maybe one of us is crazy? ;) --Eshetalk16:07, 25 January 2008 (EST)
While we are at it, it might also be better to rename it to "Random article". --Mankar CamoranTCE 16:37, 25 January 2008 (EST)
I've always gotten talk pages along with regular ones, as well as short sub-pages (e.g. /Description and /Author pages). That's one reason I've not made very much use of the feature. I very rarely get any actual article pages with it. I'd be just as happy if the feature were removed entirely, at least the way it is now. If it could be programmed to omit Talk pages and sub-pages, it might be worth keeping around, but as it is, it's nearly useless, I think. --TheRealLurlock Talk 16:44, 25 January 2008 (EST)
I believe that idea was brought up a while back but I was never able to look into it more. It should be possible once I have some time to do it. -- Daveh 18:06, 25 January 2008 (EST)
Ok, I get the same thing now. It seems that IE and Firefox have this issue but not Opera which is why I didn't see it when I tested it last. This is most likely an issue with the Squid caching the page when it isn't supposed to. -- Daveh 16:41, 25 January 2008 (EST)
I'm getting the same thing with the latest Opera too now. And having downloaded it I notice all the things that don't work about our layout - breadcrumb trails overlapping the dividing line, the main logo being partly overlapped by the big column etc etc... –RpehTCE 18:12, 25 January 2008 (EST)
UPDATE: I was testing the Random page link again, and it brought me to the Imperial Guard Talk page or something like that. I read it for about 20 seconds, and then I hit random page again, then I was brought to the Wabbajack page! It worked! So I hit it again, then it didn't work. What happened to make it work once, then fail? --Playjex 12:10, 27 January 2008 (EST)
I think there's a brief span after you hit it for the first time where all it wants to do is give you the same page. (Hope you all can navigate my complicated technical lingo ;).) I was doing that last night, too, and if you just wait a bit it works again. Also, the problem seems to come and go, for me at least--at some points during the day, I wasn't having any trouble with it at all. --Eshetalk12:33, 27 January 2008 (EST)

Thank you for replying. I do not know if this has to do with this problem, but what type of Internet service do you have? (Verizon, Optimum, IE, Firefox etc.) --Playjex 13:47, 27 January 2008 (EST)

Someone should look into coding the Random page to only go to non-discussion pages. :P --24.59.255.2 16:12, 27 January 2008 (EST)
Well, it doesn't take you to a Talk/Discussion page EVERY time. --Playjex 19:03, 27 January 2008 (EST)
Is it me, or is it FULLY working again? --Playjex 14:31, 20 February 2008 (EST)
It's working for me too. I imagine one of Daveh's recent changes fixed it. –RpehTCE 14:39, 20 February 2008 (EST)

Pages Missing

It has been brought up on the forums that a couple of pages show up completely blank. I have talked with Nephele about this problem before; it was mostly affecting IE users, but has now been noticed in Firefox as well. Both Oblivion:Places and http://www.uesp.net/wiki/Oblivion:Merchants show up as a blank white page. It would be nice if someone could check into this as it has been a few days since the initial report. Thank You! --Bear 24.230.191.109 12:41, 27 January 2008 (EST)

I started a related discussion about this at Problems Saving Pages so that hopefully our editors can recognize some of the symptoms and fix the problem before readers notice it. At this point, I'm pretty sure that the issue is not browser specific, but rather is related to whether or not the reader is logged in: editors viewing the pages anonymously will get blank pages if the cached copy of the page has been corrupted, but logged-in editors don't get the cached copy of the page (not that logging in is really a way to fix the problem: at this point being logged in means you're more likely to see delays in viewing pages, and you're just as likely to get corrupted pages; the only difference is that if you get problems when you're logged in they only affect you and you generally get some type of error message). I think I fixed Oblivion:Places already last night. I'll try to do Oblivion:Merchants right now, too. Although with the site being busy it might not be possible until later. --NepheleTalk 13:46, 27 January 2008 (EST)
Update: I was able to fix Merchants earlier today, but I couldn't fix Places until just now. But I have now confirmed that both pages are working for anonymous readers. (Until the next time they glitch, that is...) --NepheleTalk 03:03, 28 January 2008 (EST)
I'm sorry, but both Oblivion:Places and Oblivion:Merchants are again blank for me as anonymous reader. No idea if someone can fix it, or that more serious steps are needed. --ErwinF, 82.95.216.132 14:16, 29 January 2008 (EST)
I've just purged both pages so you should be okay now. Honestly your best bet is to get an account! –RpehTCE 14:25, 29 January 2008 (EST)
Well, getting an account isn't really a great solution either: the pages are equally likely to fail if you have an account; the only advantage with an account is that you'll get an error message (after waiting 5 minutes) instead of a blank page. And for the server, it just means more work.
Obviously we need to come up with a better fix for this problem. Even when we fix the cached page, it is apparently very liable to get broken again next time the wiki decides to update it (and I'm not sure at this point whether we even have control over those updates... some are triggered by the page being modified, some are triggered by any embedded templates being modified, but it's also possible that the wiki automatically refreshes all pages once per day or at some similar interval). And then having 15 different editors simultaneously request Oblivion:Places every time someone reports a problem like this really doesn't help the server either (and, yes, that's what just happened in the server logs). In fact, I think right now we're just tripping over each other and making things worse: 15 people simultaneously purge the page, half of the requests work and half fail. So it's just a crapshoot as to whether or not the page ends up actually getting fixed.
Oblivion:Places needs to be revamped to make it somewhat smaller... although there's only so much that can be done, because I'd guess that many readers rely upon having a single page that lists all of the dungeons in a single place. Similarly, Oblivion:Merchants could perhaps be tweaked, but there's only a bit that can be done without starting to fundamentally make the page less useful to readers. Increasing the server timeout might help (especially now that UESP's content server is somewhat isolated from incoming requests, the timeout might not be as important for controlling rogue requests). Maybe there are some ways to tell the wiki and/or the squid that certain pages should never be automatically refreshed (then if we can get a stable cache of the page we can hopefully prevent it from being overwritten with a bad copy behind our backs).
And in the meantime, could I perhaps suggest that only admins try to purge a page when one of these requests comes in? I know everybody else is just trying to help fix the problem, and there is a risk of a slower response if we need to wait for an admin. But right now we're running into a far more real problem: having the server get shut down for half an hour with a ton of redundant requests, all simultaneously asking the server to do one of the most CPU-intensive requests possible. If only admins are trying, it should hopefully limit the server to only two or three redundant requests. Thanks :) --NepheleTalk 14:52, 29 January 2008 (EST)
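(For anyone unfamiliar with purging: the standard MediaWiki way to force the cached copy of a page to be rebuilt is to request it with action=purge appended, along the lines of:
   http://www.uesp.net/w/index.php?title=Oblivion:Places&action=purge
That request forces the server to re-parse the whole page, which is why repeated purges of pages this large are so expensive.)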
Update: I've made some key changes to the two main templates being used on these problem pages: Template:Merc on Oblivion:Merchants and Template:Place Link on Oblivion:Places. The upshot of which is that the server needs to make somewhere from 75% to 90% fewer template expansions when parsing these pages. I've confirmed that there's been a significant reduction in the required processing by looking at the parsing statistics. For example, whereas Oblivion:Places' pre-expand size used to be 1359976, it's now only 496372. And qualitatively, saving the page seemed to take place much more quickly just now.
I'm going to keep an eye on these pages over the next few days to confirm whether those changes do a better job of actually fixing the pages. If anyone notices that they're still blank, post an update here. If we need to go to plan C, then we'll figure out what plan C should be ;) --NepheleTalk 04:04, 30 January 2008 (EST)

Talk Page Capitalization Inconsistency

I just noticed this. In the Main, User, UESPWiki, Category, and Image namespaces (basically all the ones that come with the wiki software by default, I guess), the corresponding Talk pages use a lower-case 't' in the word "talk". On the other hand, the Talk pages for all the added namespaces, including all the gamespaces, use a capital 'T'. I don't know if it's worth doing anything about, or how you'd go about changing that sort of thing, but it does seem like something that we might want to think about? I don't know. I just get annoyed by inconsistency sometimes... --TheRealLurlock Talk 22:42, 27 January 2008 (EST)

It's something that can only be changed by Daveh. As far as I can tell, the "talk" for the traditional wiki namespaces is fixed (there is a $wgMetaNamespaceTalk setting to override "talk" but the notes state that this variable is only respected on non-English wikis). So our only option would be to change everything to "talk" instead of "Talk". That can be done in the LocalSettings.php file by editing all of the names specified in the $wgExtraNamespaces array.
On the other hand, I'm inclined to say it shouldn't really be a high priority to get fixed. The wiki software automatically adjusts the capitalization so User talk:Nephele and User Talk:Nephele are both valid links; Oblivion talk:Oblivion and Oblivion Talk:Oblivion also both work. The only time I'd previously noticed it was when creating a link to a page from outside of the wiki, but talk pages rarely get linked to.
In other words, it wouldn't be hard for Daveh to fix, but I don't know that it needs to be fixed. --NepheleTalk 18:20, 28 January 2008 (EST)
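For illustration, the change would amount to editing entries of roughly this form in LocalSettings.php (the namespace numbers below are made up for the example; the real IDs would need to be checked):
   // Hypothetical LocalSettings.php excerpt -- namespace IDs are examples only.
   $wgExtraNamespaces[100] = "Oblivion";
   $wgExtraNamespaces[101] = "Oblivion_talk";  // currently defined with a capital "T"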
I suspected as much, though I was hoping it'd be possible to change them all to "Talk" rather than "talk", only because there'd be far fewer pages to change, since most content (and therefore most talk pages) is in the non-default namespaces. I guess it doesn't matter if the auto-capitalization can figure it out either way. Just an odd thing I noticed... --TheRealLurlock Talk 22:22, 28 January 2008 (EST)

Squid Assessment

Since we just made it through our first weekend with our new squid two-server setup, I thought it might be a good time to discuss how it's working and what (if anything) needs to be done next to improve our server's responsiveness.

First, for those reading this and wondering what on earth "squid" means: last week Daveh added a second server to UESP, so we now have twice the computing power. The change was basically invisible to readers and editors because the second server is a squid cache. The new machine (squid1) receives all of UESP's requests, responds directly whenever the request is for a cached (i.e., saved and unchanged copy of a) page, or else transparently forwards the request to our main server (content1). In other words, you just type in "www.uesp.net," then the computers figure out which one needs to do the work, and you get a response without any idea that there are now multiple UESP computers.

The good news is that the squid setup is working ;) There have been a couple of issues reported, though:

  • The Random Page Link is a relatively minor issue; while it would be nice to get it fixed, I don't think it's a high priority while the site is still having performance issues.
  • One site outage, which is worrisome, but so far it's just a single case and might possibly have been a side effect of the switch to squid.

And the new server has improved site responsiveness overall. Pages have been loading faster, and our site's downtimes have been less prolonged and/or less severe. Server status shows that the content server's workload has decreased substantially: content1 rarely has more than 10 requests at a time, it's responding to incoming requests very quickly, and its CPU load is great (1.2% right now).

The bad news is that I really don't think that squid itself is enough to fix the site's problems. Over the weekend, the site was better than it has been on past weekends. In other words, I didn't just walk away and give up on trying to use the site for 12 hours at a time. Nevertheless, performance was poor. It took minutes to access pages most of the time. And at one point on Sunday afternoon, I was unable to access anything (even a server status) for nearly half an hour. I finally gave up and restarted apache on content1, which prompted the site to start responding again. As I'm typing this, the site is clearly getting busy again, and it's taking a couple minutes to load pages. One issue that's unclear right now, though, is to what extent these slowdowns are affecting the typical (anonymous) reader, or to what extent they only affect logged-in readers/editors (with the squid cache, it's possible that anonymous editors who only view cached pages could get good responses while logged-in editors who always view freshly generated pages get poor responses). From a few (possibly non-representative) tests I've done while not logged in, the slowdowns seem to affect anonymous readers, too.

So I think more tweaks are needed if we really want to have a site where readers and editors aren't constantly frustrated by inaccessibility problems. Unfortunately, one side effect of the switch to squid is that it is now very difficult to diagnose performance problems. I don't know of any ways to find out what's happening on squid1, i.e., if squid1 doesn't respond, what's going on? And from content1 there's no way to keep track of who is making the requests (the immediate IP source is always squid1), so it's not really possible to monitor for bogus or problematic requests. Which means that I don't know how to go about figuring out what types of tweaks are needed.

For me to help more with diagnosing and recommending what would be useful, I'd like to start by requesting some ability to access squid1. Even just being able to login to squid1 and run netstat would provide some useful information; if there are other tools available on the server to monitor incoming requests (e.g., number of requests from a given IP, types of requests, etc.) then those would also help.

Also, logic tells me that the same problems that we had with rogue IPs are probably still happening now. There's no reason why the IPs would suddenly disappear overnight just because our servers were reconfigured; it seems far more likely that those IPs are still bombarding the site with useless requests but the requests are now effectively invisible to the available monitoring tools (because the requests are all showing up on squid1 not content1). Having access to squid1 will help to confirm or deny this theory. But I think implementing some tools to deal with these IPs will be needed. The simplest short term solution would be for me to have access to iptables on squid1 and therefore have the ability to block the IPs for a week or a month at a time. Or else a better long term solution would probably be some type of apache module that does this automatically.

Any feedback? --NepheleTalk 16:45, 28 January 2008 (EST)

I was planning on getting Nephele access to squid1; I just haven't had the time (had to work some over the weekend plus still away from home). The issue on the weekend was strange. The site had just about the same traffic on Saturday/Sunday yet there were no issues on Saturday. I spent a little time on Sunday trying to track down the issue but couldn't find anything obvious (no huge DoS or other clients abusing the site, it seemed). I'm not entirely sure the caching is caching everything it should be, and the bottom line may be that site traffic is still too much for two servers anyway (more logged-in users, who bypass the cache).
Another thing to keep in mind is that poor site performance is self-limiting to a point. When performance gets very bad people will end up aborting the web request, which reduces load on the server a little bit. Even though we've introduced a cache we may still hit the peak performance of the server, albeit while serving more requests. This weekend had a slightly higher number of requests than the usual weekend but not by a huge margin (the previous weekend was a bit higher).
There are still a bunch of things Nephele suggested a while ago that we can try to get even better performance but they'll take time to do. I prefer to just change one thing at a time and see what happens over a few days rather than do them all at once and hope nothing breaks. I should have some time this week even though I'm away. -- Daveh 17:01, 28 January 2008 (EST)
One more issue that needs to be fixed was just pointed out: the forum software now sees every single contributor as coming from the squid IP address. Is it possible to set up the forum software to be more squid-aware (obviously the wiki software is still able to access the original IP address; can phpbb do something similar)? Or is there some other way to fix this problem? Because at the moment the forum moderators have lost one of their useful tools for monitoring/controlling spammers and other miscreants. --NepheleTalk 14:20, 29 January 2008 (EST)
Sounds terrific to me. Are there, like, any other "updated" versions of the "Squid" Daveh uploaded? If so, do you think we should give it a try once we've tested the current one out? And what is this "phpbb" that Nephele mentioned recently? I see you two are working very hard, and you deserve it when I say thank you for doing so much to the site for us other users. Thank You, and I will keep in touch with this discussion later. --Playjex 15:03, 29 January 2008 (EST)
What do you mean by 'updated'? I installed whatever the latest stable/release Squid package was (2.6.?). This is the same version as Wikipedia uses. The easiest fix to the forum issue is to either figure out how to properly forward IPs (even if possible) or to have the forums avoid use of the Squid completely (e.g., a separate subdomain forums.uesp.net). -- Daveh 21:27, 29 January 2008 (EST)
For the record, phpbb is the software that runs the forums part of the site. --TheRealLurlock Talk 21:37, 29 January 2008 (EST)
For now the forums are accessed via forums.uesp.net, which bypasses the cache (they can still be accessed from the old link, which does not bypass the cache). A quick search doesn't reveal any easy solution to the IP issue...this is exactly how a Squid cache is supposed to work. There is the X-Forwarded-For header but it requires the app in question to specifically check for and use it (i.e., I'd have to modify phpbb, assuming it's even possible). -- Daveh 22:25, 29 January 2008 (EST)
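For what it's worth, the change needed is conceptually small; a hedged sketch (plain PHP, not actual phpbb code, and it should only trust the header when the request really does come from squid1) would look something like:
   <?php
   // Hypothetical sketch: prefer the client IP that squid passes along in
   // X-Forwarded-For over the connecting (squid1) address.
   $client_ip = $_SERVER['REMOTE_ADDR'];
   if (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
       // The header can be a comma-separated chain of proxies; the first
       // entry is the original client.
       $chain = explode(',', $_SERVER['HTTP_X_FORWARDED_FOR']);
       $client_ip = trim($chain[0]);
   }
   ?>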
Correction -- Looking more closely we did actually experience significantly higher traffic last weekend, about 20% above a normal weekend (which is itself 10-20% higher than a typical weekday). It was actually the highest number of page requests we've seen in at least several months. -- Daveh 21:27, 29 January 2008 (EST)
Haha, I apologize Daveh. I just thought that maybe there was an other version of it. Sounds good to me (even though I'm not an admin). Thank you for replying. -Playjex 14:10, 30 January 2008 (EST) P.S. Thanks for specifying what a phpbb is ;]
There is Squid 3.0 which was just released in December, but 2.6 is fine for now. -- Daveh 17:14, 30 January 2008 (EST)
We have a new problem reported with the squid server: it is not allowing non-logged-in users to navigate through the category pages properly. For example, on Category:Oblivion-Quests there's a "next 200" link that is supposed to take you to the next page and show you the rest of the quests. If you're logged in, the link works; if you're not logged in, then the link just gives you the exact same page (starting from entry 1 again instead of starting from entry 201).
I'm guessing the problem is that the squid server is not recognizing that these two links are different:
  • http://www.uesp.net/w/index.php?title=Category:Oblivion-Quests
  • http://www.uesp.net/w/index.php?title=Category:Oblivion-Quests&from=To+Serve+Sithis
In other words, it isn't recognizing that the "from" keyword causes the content of the HTML page to change, so it just keeps dishing out the same version of the page sitting in its cache instead of requesting the correct modified version of the page. --NepheleTalk 17:13, 20 February 2008 (EST)
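If that guess is right, one blunt workaround (along the lines of the stock squid 2.6 configuration; I haven't checked what our squid.conf currently does, so treat this as an assumption) would be to tell squid not to cache dynamic index.php URLs with query strings at all, trading a bit of cache efficiency for correctness:
   # Possible squid.conf excerpt (untested against our setup): never cache
   # requests whose URL contains a query string.
   acl QUERY urlpath_regex cgi-bin \?
   cache deny QUERY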