Fur Affinity cracks down on 'watchbot' accounts
Furry art community Fur Affinity has announced restrictions on the use of automated watching scripts, which they termed "watchbots".
While staff had been "addressing botters on a one-on-one basis for several weeks", to the tune of "roughly two dozen" accounts, they faced a growing number of users who were unaware of their position. Some also became concerned upon being watched by "TheNSA".
The trend appears to have been started by Mishka Burr, who claims to have used a script on a Raspberry Pi to watch over 160,000 users. Several other accounts running a published watch script inspired by Mishka's work had over 40,000 users on their watchlists before these were cleared.
Basically, when it was one or two people (namely Valdyrburr) we could look the other way. It was not an issue. But then it became a handful of accounts, and then a few dozen. And everyone took an "If Valdy can do it, so can we" [attitude].
We initially reached out to discuss the issue with those responsible, and kept things in check. But things got out of hand, and we started taking action against individuals. And then it spread further, and here we are today. [Fender]
One or two mass-watchers were seen as a curiosity, but as others sprang up in their wake, their actions increasingly became an annoyance to regular users:
The one website whose user base is not primarily comprised of automatically generated accounts that pester real people and contribute nothing, and people find a way to make themselves into useless pesty robots. [skrimpf]
Mass-watching can also increase the volume of submission and journal notifications, which previously caused major issues for Fur Affinity - although staff indicated that the bots did not "create a serious strain".
Some users asked for the ability to remove users who did not contribute in certain ways from their watchlist. Staff did not seem keen to add this feature.
It's not all bad news for bots on FA, with site leader Dragoneer expressing an interest in using them for administration.
About the author
GreenReaper (Laurence Parry), a developer, editor and Kai Norn from London, United Kingdom, interested in wikis and computers
Small fuzzy creature who likes cheese & carrots. Founder of WikiFur, lead admin of Inkbunny, and Editor-in-Chief of Flayrah.
Comments
Oh my god what have I done...
Also, HAI
Who's to blame if FurAffinity has that idiotic system where notifications are actual entries in the database that have to be generated and then deleted all the time? Look at Tumblr, where your dashboard entries come from a simple, real-time DB query. And that's a much bigger service.
Seriously though, there's doing an experiment and then there's straining the servers for the lulz. I kind of like FA working, peeps. Tone it down, will you? :)
To be fair to FA, they haven't raised $85 million in funding - at least, it'd be news to me!
Inkbunny uses a similar scheme, as do many other sites. It works well enough when you have one database server and the indexes (if not the whole database) fit into RAM on one box. It's relatively simple to code, gives the user flexibility about keeping some things around for a while, and it's cheap to notify them when something new pops up - or to display a list of notifications (because you've done most of the relevant filtering ahead of time).
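For readers unfamiliar with the "inbox" approach, here is a minimal sketch using Python's standard sqlite3 module. The schema (watches, submissions, notifications) is hypothetical and is not FA's or Inkbunny's actual design. Posting a submission writes one notification row per watcher inside a single transaction. Reading a user's notifications is then one short range scan on (recipient, created).

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical inbox-style schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        -- Keyed by artist first so the fan-out can find all watchers quickly.
        CREATE TABLE watches (watcher INTEGER, artist INTEGER,
                              PRIMARY KEY (artist, watcher));
        CREATE TABLE submissions (id INTEGER PRIMARY KEY, artist INTEGER,
                                  created INTEGER);
        -- One row per (recipient, submission): the stored "inbox".
        CREATE TABLE notifications (recipient INTEGER, submission INTEGER,
                                    created INTEGER,
                                    PRIMARY KEY (recipient, created, submission));
    """)
    return db

def post_submission(db: sqlite3.Connection, artist: int, created: int) -> int:
    """Insert a submission and fan it out to every watcher in one transaction."""
    with db:  # commits on success, rolls back on error
        cur = db.execute(
            "INSERT INTO submissions (artist, created) VALUES (?, ?)",
            (artist, created))
        sub_id = cur.lastrowid
        # Fan-out on write: one notification row per watcher of this artist.
        db.execute(
            """INSERT INTO notifications (recipient, submission, created)
               SELECT watcher, ?, ? FROM watches WHERE artist = ?""",
            (sub_id, created, artist))
    return sub_id

def inbox(db: sqlite3.Connection, user: int, limit: int = 36) -> list[int]:
    """Newest-first submission ids for one user, read from the stored inbox."""
    rows = db.execute(
        """SELECT submission FROM notifications WHERE recipient = ?
           ORDER BY created DESC, submission DESC LIMIT ?""",
        (user, limit))
    return [row[0] for row in rows]
```

The trade-off discussed in this thread shows up directly. The write cost of post_submission grows with the artist's watcher count. The read in inbox does roughly the same small amount of work however many artists a user watches.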
It's best when you have a low write count - or failing that, cheap transactions - and you clear old entries out regularly. It was that last bit that killed FA; they fell behind on writes. Probably didn't help that they were using spinning disks (I'm guessing it was too big for an SSD; that's why I've been so gung-ho about database size).
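Clearing old entries out regularly is the step the comment says FA fell behind on. One way to stop that job from becoming a single huge delete is to remove expired rows in small batches, each in its own short transaction. This sketch rests on several assumptions. It uses sqlite3 and a hypothetical notifications table with a numeric created column. The batching strategy is only an illustration, not something FA or Inkbunny is known to use.

```python
import sqlite3

def prune_notifications(db: sqlite3.Connection, cutoff: int,
                        batch_size: int = 1000) -> int:
    """Delete notifications created before `cutoff`, `batch_size` rows at a time.

    Each batch is committed separately, so no single transaction stays open
    for long. Returns the total number of rows deleted.
    """
    total = 0
    while True:
        with db:  # one short transaction per batch
            cur = db.execute(
                """DELETE FROM notifications WHERE rowid IN (
                       SELECT rowid FROM notifications
                       WHERE created < ? LIMIT ?)""",
                (cutoff, batch_size))
        if cur.rowcount == 0:  # nothing older than the cutoff remains
            return total
        total += cur.rowcount
```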
The alternative is essentially glorified polling. It may make sense if you know you want most of the data that you're pulling (say, because you're looking at a dashboard of the latest posts), but in terms of computational effort it can be vastly more expensive. It works because once you get to that point, you're already planning to shard the database. Whether you see it as a form of Vectored I/O or MapReduce, it's the same concept - have several nodes query up to a certain date, and then do a final operation on those results.
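The scatter-gather idea in this paragraph fits in a few lines of Python. Plain lists stand in for database nodes, and the names (Post, query_shard, dashboard) are hypothetical. Each node returns its newest matching posts older than a cutoff, which is the "query up to a certain date" step. heapq.merge then combines the already-sorted partial results into one newest-first page.

```python
import heapq
from itertools import islice
from typing import NamedTuple

class Post(NamedTuple):
    created: int  # timestamp; larger means newer
    author: int
    post_id: int

def query_shard(shard: list[Post], following: set[int],
                before: int, limit: int) -> list[Post]:
    """Per-node step: newest posts by followed authors, older than `before`."""
    hits = [p for p in shard if p.author in following and p.created < before]
    hits.sort(key=lambda p: p.created, reverse=True)
    return hits[:limit]

def dashboard(shards: list[list[Post]], following: set[int],
              before: int, limit: int = 20) -> list[Post]:
    """Final step: merge the newest-first partial lists and keep the top `limit`."""
    partials = [query_shard(s, following, before, limit) for s in shards]
    merged = heapq.merge(*partials, key=lambda p: p.created, reverse=True)
    return list(islice(merged, limit))
```

Every node returns at most `limit` rows, so the merged top `limit` is still correct. To fetch the next page, pass the oldest returned timestamp as the new `before` (ignoring ties on timestamps).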
Edit: And, you know what? Tumblr says an inbox system is the future (or was in 2012 - scroll to "Cell Design for Dashboard Inbox"). The architecture for furry art sites is effectively a one-cell system.
So you mean when Blotch posts a piece of artwork there are 43,000 entries generated at that very moment? Yikes.
Yes, exactly. Put that way it sounds pretty bad, doesn't it? And to be fair, Tumblr likely uses caching, along with some tricks of their own -- I've seen some funny stuff happening. Also, GreenReaper makes good points above. This stuff isn't straightforward, and I haven't worked on services nearly that big.
The reason I'm being critical is twofold: one, I don't care how tricky the problem is, "clever" approaches will only make it worse; and two, this system has caused trouble for FA before (as pointed out in the article). Maybe, just maybe, it's a sign that this approach doesn't quite work for them?
Yeah, and if you call it 50 bytes with row headers and padding, that ends up as ~2 MB per update just for notifications.
That sounds like a lot - and it is a fair bit, though the image is probably 2 MB as well. You might even have to multiply it a few times for indexing purposes. But databases are used to dealing with lots of data; and for regular users, an update would be closer to 8 KB. More importantly, it's useful data in that lots of people want to know when Blotch posts an update - which is, like, every three months at this point.
(In fact, most of those listed on Popufur.com tend to post maybe once a day max. Because the most-watched artists tend to do good work, which takes time, their impact on the database may be minimal. Writing 2 MB in one block costs less than writing 256 separate 8 KB blocks.)
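The storage figures in this exchange are simple multiplication. Here is a quick check in Python, taking the commenters' estimates as given: 50 bytes per notification row and 43,000 watchers for one very popular artist.

```python
ROW_BYTES = 50             # commenters' estimate per notification row
POPULAR_WATCHERS = 43_000  # watcher count cited for one popular artist

bytes_per_popular_update = POPULAR_WATCHERS * ROW_BYTES  # 2,150,000 bytes, about 2 MB
watchers_for_8kb = (8 * 1024) // ROW_BYTES               # 163 watchers fill 8 KB
bytes_in_256_small_writes = 256 * 8 * 1024               # 2,097,152 bytes, also about 2 MB

print(f"{bytes_per_popular_update:,} bytes per popular update")
print(f"{watchers_for_8kb} watchers fill an 8 KB update")
print(f"{bytes_in_256_small_writes:,} bytes in 256 separate 8 KB writes")
```

One 2 MB write and 256 separate 8 KB writes move about the same amount of data. The cost difference the comment points to comes from the overhead of each individual write.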
In theory you could instead do a query over the submissions of the people you're watching and not store any additional data. But doing this all the time costs a lot, too - you end up having to scan back through oodles of large submission records, rather than go directly to the ones you're interested in. This increases your I/O read requirements, and means you need to spend more on RAM and CPU to hold those large datasets - or, be prepared to write them temporarily to disk, which will instantly cost far more than an update by anyone. At that point, looking up the last 36 50-byte entries that are already in an ordered list starts looking a lot more sensible.
Um, then what are indexes good for? Or query caches, for that matter? And it's not about the size of those inserts, but about the overhead of performing inserts in the first place -- lots and lots of them -- only for those records to be retrieved once or twice and then deleted as each user gets rid of older notifications. Then there are all the things that can go wrong with this system -- all the extra moving parts. That's no theory; things are regularly going wrong over there. So I can't help but wonder.
Indexes are great! A well-constructed set of indexes, designed in collaboration with developers, can save you a lot of time. But no matter how good they are, if you're watching 5000 users and you do a query looking for the latest submissions of all of them, that either means 5000 index lookups that you have to merge, or scanning backwards through all submissions comparing them to ones created by people you're watching (which might be an index-only scan if you were smart enough to index {create_datetime, user_id}). Storing notifications, clustered or indexed by date (as appropriate for your DBMS), turns that into one very short lookup with no filter.
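For comparison, here is the pull-model query this comment describes, sketched with sqlite3. The schema and function name are hypothetical. It defines both kinds of index mentioned above: a per-artist index, and a {created, artist} index that can be walked backwards by time. The SQLite query planner decides which one to use. Either way, the work grows with the size of the watch list or with sitewide posting volume, whereas a stored-notification table keyed by (recipient, created) answers the same question with one short range scan.

```python
import sqlite3

SCHEMA = """
CREATE TABLE watches (watcher INTEGER, artist INTEGER, PRIMARY KEY (watcher, artist));
CREATE TABLE submissions (id INTEGER PRIMARY KEY, artist INTEGER, created INTEGER);
-- Per-artist index: one lookup per watched artist, results merged afterwards.
CREATE INDEX sub_by_artist ON submissions (artist, created);
-- Time-first index: scan newest-first, discarding rows by unwatched artists.
CREATE INDEX sub_by_time ON submissions (created, artist);
"""

def latest_by_pull(db: sqlite3.Connection, user: int, limit: int = 36) -> list[int]:
    """Pull model: compute the feed at read time from the user's watch list."""
    rows = db.execute(
        """SELECT s.id FROM submissions AS s
           WHERE s.artist IN (SELECT artist FROM watches WHERE watcher = ?)
           ORDER BY s.created DESC, s.id DESC LIMIT ?""",
        (user, limit))
    return [row[0] for row in rows]
```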
Query caches aren't particularly useful for data which can change all the time, like the latest submissions by your watched artists. At peak load, Fur Affinity has eight to ten submissions a minute. Each insert will invalidate cached queries based on that table. You could make a smart per-user cache that was only invalidated by updates by people they were watching - and at that point you've effectively recreated notifications.
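The "smart per-user cache" mentioned here might look like the following hypothetical sketch, in which compute_feed stands in for the expensive pull query. To invalidate only the right entries, the cache must know, for each artist, who is watching them. That follower map is the same data a notification fan-out walks, which illustrates the point that such a cache effectively recreates notifications.

```python
from collections import defaultdict

class FeedCache:
    """Hypothetical per-user feed cache, invalidated only by watched artists' posts."""

    def __init__(self, compute_feed):
        self.compute_feed = compute_feed   # callable: user_id -> feed (e.g. list of ids)
        self.followers = defaultdict(set)  # artist_id -> set of watcher ids
        self.cache = {}                    # user_id -> cached feed

    def watch(self, user: int, artist: int) -> None:
        self.followers[artist].add(user)
        self.cache.pop(user, None)         # the user's watch list changed

    def feed(self, user: int):
        # Recompute only on a cache miss.
        if user not in self.cache:
            self.cache[user] = self.compute_feed(user)
        return self.cache[user]

    def on_post(self, artist: int) -> None:
        # Invalidate just this artist's watchers, not every cached feed.
        for user in self.followers[artist]:
            self.cache.pop(user, None)
```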
You're right, write I/O is a concern. But it's not impossible to deal with. And it can scale, if needed - you shard the notification database and have each node record notifications for a set of users. This keeps each node's write load manageable. Reading remains fast because each user's query is handled by one node.
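Here is a minimal sketch of the sharding scheme described above, with in-memory dicts standing in for separate notification database nodes. Routing by user id modulo the shard count is an assumption chosen for brevity, and the class and method names are hypothetical. One post's notifications are written to whichever nodes its watchers map to, while each user's read goes to exactly one node.

```python
class ShardedNotifications:
    """Hypothetical sketch: each dict in `shards` stands in for one notification DB node."""

    def __init__(self, num_shards: int) -> None:
        # One dict per node: user_id -> list of (created, post_id) entries.
        self.shards: list[dict[int, list[tuple[int, int]]]] = [
            {} for _ in range(num_shards)
        ]

    def shard_for(self, user: int) -> int:
        # Assumed routing rule: user id modulo the number of shards.
        return user % len(self.shards)

    def notify(self, watchers: list[int], post_id: int, created: int) -> None:
        # One post's fan-out is split across the nodes its watchers map to.
        for user in watchers:
            node = self.shards[self.shard_for(user)]
            node.setdefault(user, []).append((created, post_id))

    def inbox(self, user: int, limit: int = 36) -> list[int]:
        # A user's read touches exactly one node.
        entries = self.shards[self.shard_for(user)].get(user, [])
        newest_first = sorted(entries, reverse=True)[:limit]
        return [post_id for _, post_id in newest_first]
```

Simple modulo routing reassigns most users whenever the shard count changes. Consistent hashing is a common alternative when nodes need to be added without a large migration.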
FA is encountering real issues of scale. It has people doing crazy things, like watching 100,000 people or +faving every 10th submission. Inkbunny does, too; someone likes to scrape our API for every submission and every artist every six hours. (It was great, because I found a missing index which shaved 60ms off submission views, for everyone.) But our issues tend to be 1/10th the size of theirs due to our relative scale.
It's important not to focus on one system that went wrong once and assume it's the source of all problems. Right now, FA's main technical issue seems to be in their thumbnail generation layer, which has been struggling at peak times since mid-April. If you look at other measures of response time, they've improved recently. Not sure if that's because of the reduced watchers, or resolving issues that they found as a result of it.
Tumblr has an advantage in that "notification" is a more limited concept the way it handles things: someone liking you, following you, or reblogging you is a notification, but new posts from people you follow aren't notifications. And, of course, it has no concept of "read" or "unread" notifications: it just spools everything to you on the dashboard in reverse chronological order. They're huge enough that they have serious scaling issues anyway, although they were always designed with the notion that scalability was job #1, which is how they were able to push a fairly standard LAMP stack so damn far.
I'd like to see a furry archive site experiment sometime with a dashboard that's more like Tumblr's and less like the FA model, or perhaps something entirely different -- by 2014 standards, having to explicitly acknowledge every little thing that happens seems archaic, especially on FA's UI.
— Chipotle
The Japanese site Pixiv doesn't seem to have notifications for new work. It shows work by everyone you follow in reverse chronological order. That said, I don't think notifications are archaic. I prefer that system because it lets you manage things when you want to. Otherwise, if you wanted to leave something until later, you would have to page back through all the submissions to find what you wanted. With notifications you just leave it on the list until you want it. That's more manageable for a user.
"If all mankind minus one, were of one opinion, and only one person were of the contrary opinion, mankind would be no more justified in silencing that one person, than he, if he had the power, would be justified in silencing mankind."
~John Stuart Mill~
I suspect there are other ways to accomplish that. For instance: generally speaking, there's only a fraction of what I look at that I want to bookmark for later. Instead of the user experience being built around "explicitly tell me when you don't want to save something," why not "explicitly tell me when you do want to save something?" That could be as simple as one click or one key for any given item with a good UI.
Really, though, I'm just expressing disappointment that so few sites are playing around with the fundamental interaction models we've had for a decade. Everyone who isn't essentially following deviantArt's model (FA, IB, SF, Weasyl) is essentially following 4chan's (fchan, the "booru" sites, etc.). Yes, I know they work, but when push comes to shove, FA isn't going to be disrupted by "pretty much like FA but with some tweaks we think you're gonna love." My impression is that most of the other sites are betting that FA's eternal state of dysfunction is eventually going to cause it to topple and [insert competitor] will be there to fill in the vacuum. But I can't help suspecting this is like Dreamwidth being there to fill the void left by LiveJournal.
— Chipotle
That would be the favorites system you have pretty much everywhere, including FA. Good point that it makes the ability to keep notifications around redundant.
It's generally true for all software. Touchscreen devices have amply demonstrated the inadequacy of the window/menubar/scrollbar/modal dialog system, yet barring a few adjustments we still cling to that dusty paradigm. And on the desktop we haven't even adopted those few adjustments.
Yep... and that never happens either. Technical superiority never wins. Linux versus Windows. Status.Net versus Twitter. It's natural, too: people go after convenience, and then there's the network effect, a.k.a. the "everybody uses X" argument.
I just wish more people understood that when we criticize something, for example FA, it's because we care. The moment we stop caring, that's when you need to start worrying that we'll all go elsewhere.
Favourites and notifications serve totally different purposes. You don't favourite something to deal with it later, you favourite it because you really like it and think it deserves to be shared.
Well, that or you just want to see it later. Notifications go away eventually, or you end up where FA was last year. (I wonder if people are +faving more on FA as a result.)
Those only expire after weeks. It's unlikely someone can't attend to it for that amount of time. Just using favourites as something to address later seems incredibly unfair to the author/artist. When they are told their work has been favourited they don't think that it's been added to someone's to-do list, they think it is actually a favourite.
I think you misunderstood me. Many people like a work, and want to see it again later, but do not care about sharing it with others (in fact, we've had requests for hidden +favs).
Weasyl has that option at this point; however, it's an "all or nothing" approach, so the ability to pick and choose which favs are public and which are private still isn't available anywhere.
What we're often doing now to "deal with something later" is leave it in our notification queue without clearing it -- which is effectively what Tumblr does by just telling us "here are new posts from people you follow, ordered newest backward." Except Tumblr does it in a way that's faster for users to deal with and much lighter-weight on the back end.
And are "save this" and "share this" really best served by combining them into one action? They're very different concepts, aren't they?
Again, my point isn't "FA sucks" -- it has its issues, but there are certain ways in which its user experience is actually better than its newer competitors. (The chief mechanism for discovery on all those sites is seeing what artists you follow -- or who follow you -- are favoriting and who they're following, and when I last checked, FA had fewer "hops" to do that than either SoFurry or Weasyl did.) My point is that we're not seeing anyone seriously experiment with these attributes. Believing that we have already found the Truest And Most Bestest Way That Can Never Be Improved seems dubious.
— Chipotle
Inkbunny isn't looking for FA to topple. That would be a tragedy - if only because we'd have to deal with a horde of 15-year-old drama-bombs. ;-) But seriously, it provides a useful service for much of the community.
I suspect no furry site is ready for the load/rate of growth that'd result if they became a "designated successor", though it might work out if fans were distributed among all the other furry sites. It takes time to identify your bottlenecks and integrate new arrivals.
If other sites can push FA to resolve issues to avoid losing its userbase, while experimenting with new features and providing space for those who want to try something new or who feel unwelcome there for some reason, that's a useful role.
I'm not sure exactly when Valdyrburr started their watchbot project as I didn't notice them until this spring, but Weasyl dev foximile (wweber) was running one over the FA account winkingskeever starting in November 2013.
For the sake of documentation, Mishka said on FurCast that he was also TheNSA (~18:30).
Oh my god what have I done...