[WEB SECURITY] Twitter XSS worms
Arian J. Evans
arian.evans at anachronic.com
Tue Apr 14 14:23:06 EDT 2009
First, to Steve's question: output encoding should *absolutely* be an
"average" practice. It solves most XSS in plain-vanilla use-cases. In legacy
webapps that is often enough to squash all XSS.
Those of us who got it have been encouraging output encoding for many years
now. We did so essentially to enforce a boundary between data and function
for whatever the target interpreter is. (browser, consumer web-service,
I got into an argument with some academics at Stanford about output encoding
a few years ago because they said it didn't work. At first I thought they
just didn't understand encoding (many don't) but realized we were debating a
design philosophy using the wrong words.
Quite a bit of "web 2.0" software is being built explicitly to allow users
to cross the boundary: their user data and user memory space can now be
turned into system-wide code with emergent behaviors. Which means that
"output encoding" becomes complicated, limited, and situational. This means
that, whether or not it should be, output encoding is not a "best practice".
(Unless we decide that all this "web 2.0" design turns out to be EPIC FAIL.)
Also -- Jim, did you fall asleep on me?
I had a pretentious mouthful of a quote: "and throw in some international
unicode transcodings". I wrote that myself!
But I did have a serious point here:
The Unicode Consortium keeps changing forms they said they'd never
change, and making new ones (KC, KD, etc.):
The recently added "security considerations" more or less ignore the entire
world of Syntax attacks (XSS, SQLi) with the exception of ../ which they
cover by recommending you filter out "/" characters:
*The security focus is primarily on normalizing usernames and domain names,
to avoid stalkers and phishers and spammers. The goal: normalize identical
visual charset maps and such. The cool thing about this is that, done properly, it
leads to Language Charset A --> transcode to --> Language Charset B --> ==
For you guys over in Israel last year that asked me about the
Hebrew/u0590->u001 exploits I found; sorry for the really lame answers as to
what I thought were the reasons behind normalization, and thanks for
I now suspect this is the real reason: tr36 <-- reason for observed
transcodings in dating site catering to Hebrew and English users.
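To illustrate why normalization matters for syntax attacks, here is a minimal Python sketch. It is not the Hebrew transcoding described above; it assumes a hypothetical filter (`naive_filter`) that runs before the application applies NFKC normalization, and it uses fullwidth angle brackets as the example characters.

```python
import unicodedata


def naive_filter(value: str) -> bool:
    """Hypothetical check: accept input only if it has no ASCII angle brackets."""
    return "<" not in value and ">" not in value


# U+FF1C and U+FF1E are the fullwidth forms of "<" and ">".
payload = "\uff1cscript\uff1e"

# The filter sees no ASCII metacharacters, so the input is accepted.
accepted = naive_filter(payload)

# A later normalization step (here NFKC) folds the fullwidth forms to ASCII.
normalized = unicodedata.normalize("NFKC", payload)

print(accepted)    # True
print(normalized)  # <script>
```

The ordering is the point of the sketch: if validation or encoding happens before normalization, the data that reaches the interpreter is not the data that was checked.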
Concluding comments inline:
On Mon, Apr 13, 2009 at 8:48 PM, Jim Manico <jim at manico.net> wrote:
> > Output Encoding is an "average" practice and does not always work or
> solve for modern XSS weaknesses that result from "web 2.0" use-cases.
> > I think you will find that promoting "output encoding" as a "best
> practice" in "Web 2.0" is a challenge... it breaks more and more business
> cases I see.
> Encoding only breaks stuff when you do it wrong. Sure, silly stuff like
> "HTML Entity Encode everything" is an "average" but "flat out wrong"
> practice. The best practice is to encode all user output within the proper
> HTML context. Please take a look at
> https://www.owasp.org/index.php?title=XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet<https://www.owasp.org/index.php?title=XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet> for
> a complete discourse of defensive encoding.
OWASP puts out great resources for developers these days. Completely agreed.
The only problem I see IRL is that not only "encoding only breaks stuff when
you do it wrong" but "encoding only fixes stuff when you do it right." Most
folks programming don't do a good job at either.
But we've known this for a long time now. Remember Spolsky's rant?:
"In this article I'll fill you in on exactly what *every working
programmer* should know. All that stuff about "plain text = ascii =
characters are 8 bits" is not only wrong, it's hopelessly wrong, and if you're still
programming that way, you're not much better than a medical doctor who
doesn't believe in germs. Please do not write another line of code until you
finish reading this article. Before I get started, *I should warn you that
if you are one of those rare people who knows about internationalization,
you are going to find my entire discussion a little bit oversimplified*."
(italics at the end are mine)
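As a rough illustration of "encode within the proper context", here is a minimal Python sketch using only the standard library. The helper names are hypothetical, not taken from OWASP, and the set of contexts is deliberately incomplete (no CSS, no unquoted attributes).

```python
import html
import json
from urllib.parse import quote


def encode_html_text(value: str) -> str:
    """Encode for an HTML text node: escapes &, < and >."""
    return html.escape(value, quote=False)


def encode_html_attr(value: str) -> str:
    """Encode for a quoted HTML attribute value: also escapes " and '."""
    return html.escape(value, quote=True)


def encode_url_param(value: str) -> str:
    """Percent-encode for a single URL query parameter value."""
    return quote(value, safe="")


def encode_js_string(value: str) -> str:
    """Produce a quoted JavaScript string literal for an inline script block.

    json.dumps handles quotes, backslashes and control characters; the extra
    replacements keep "</script>" and "<!--" from ending or altering the
    enclosing script element.
    """
    literal = json.dumps(value)
    return (literal.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))


payload = "\"><script>alert('x')</script>"
print(encode_html_text(payload))
print(encode_html_attr(payload))
print(encode_url_param(payload))
print(encode_js_string(payload))
```

The same input comes out four different ways, which is why "HTML entity encode everything" fails: the encoder has to match the context the data lands in.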
> If you need to accept rich HTML content as user input, use a library like
> AntiSamy to provide whitelist HTML validation based on a specific policy.
Makes sense. You'd think more folks would build "web 2.0" software this way!
But they keep using all this other stuff too, like different HTML tags and
Flash widgets. Maybe OWASP could build a blacklist to filter out all of that?
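For readers who have not seen whitelist validation of rich content, here is a toy Python sketch of the idea. It is not AntiSamy and does not use AntiSamy's policy format; the class name, the tag list, and the choice to drop every attribute are assumptions made for illustration only.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy: the only tags a user may contribute.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowed tags, with no attributes, and escape all text."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # attributes are dropped entirely

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))


def sanitize(markup: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(sanitize('<b onclick="steal()">hi</b><script>alert(1)</script>'))
```

The design choice is that nothing from the input is copied through verbatim: output is rebuilt from a known-safe vocabulary, so tags and attributes the policy has never heard of simply disappear. A real policy engine has to handle far more (URLs in attributes, CSS, malformed markup) than this sketch does.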
> As a very active web-2.0 heavy-ajax web application developer, I do not
> understand the comment that encoding (or anti-samy) does not work in the web
> 2.0 world.
I said "encoding", not Anti-Samy. Jeremiah Grossman yelled "anti-samy" at me
from across the room right after I posted! Dummy me. :( I just don't like
Samy: he reminds me of Arshan; so I have a mental block.
Anti-Samy probably works great. I've never used it myself. But encoding
safely all the time...*it works even better*, as long as you know what you
are encoding and where and why, and don't make any mistakes.
> Blacklisting will never work as a long term robust solution to XSS.
> Attackers can, will and have bypassed all kinds of blacklist filtering
> technology. Encoding when done correctly stops XSS dead - but it's not easy
> and requires programmers to do things differently.
I could not have said it better. People who use blacklisting from what I
have seen *clearly use it as a short term solution to XSS*, that sometimes
they keep on using. It's easy and allows their programmers who already
blacklist "bad words" to keep on doing things similarly to what they are
already doing without changing their behavior.
In the long run, they will be forced to take long-term, robust approaches to
XSS. But for those who are using blacklists today it's safe to say they
aren't there yet.
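A minimal Python sketch of one way such a blacklist gets bypassed, assuming a hypothetical request handler where the input is URL-decoded once before the filter and once more afterwards (a double-decode). The "bad words" list and function names are invented for the example.

```python
from urllib.parse import unquote

# Hypothetical "bad words" list of the kind described above.
BLACKLIST = ("<script", "javascript:", "onerror=")


def blacklist_allows(value: str) -> bool:
    """Return True when none of the blacklisted substrings appear."""
    lowered = value.lower()
    return not any(bad in lowered for bad in BLACKLIST)


def handle_request(raw_param: str) -> str:
    once = unquote(raw_param)  # first decode, e.g. by the app server
    if not blacklist_allows(once):
        raise ValueError("rejected by blacklist")
    return unquote(once)       # second decode deeper in the app (the bug)


# Single-encoded input is caught by the filter...
try:
    handle_request("%3Cscript%3Ealert(1)%3C/script%3E")
    single_blocked = False
except ValueError:
    single_blocked = True

# ...but double-encoded input passes the filter and is decoded afterwards.
result = handle_request("%253Cscript%253Ealert(1)%253C/script%253E")

print(single_blocked)  # True
print(result)          # <script>alert(1)</script>
```

The filter is checking a representation of the data that is not the one the browser eventually receives, which is the general reason blacklists keep needing one more patch.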
> What I think is best is doing both - blacklist/waf-like tech for tactical
> purposes - alongside deeper contextual encoding within the codebase for
> strategic purposes.
Makes sense, though I never thought of encoding as strategic.
I guess this means I am more of a visionary than I realized. Billy Hoffman
would normally have my back here, but HP started rejecting email from my
domain so I'm not sure he'll get the chance to come to my defense!
> - Jim
> ----- Original Message -----
> *From:* Arian J. Evans <arian.evans at anachronic.com>
> *To:* Steven M. Christey <coley at linus.mitre.org>
> *Cc:* Hoffman, Billy <billy.hoffman at hp.com> ; Chris Eng <ceng at veracode.com> ;
> robert at webappsec.org ; websecurity at webappsec.org
> *Sent:* Monday, April 13, 2009 2:46 PM
> *Subject:* Re: [WEB SECURITY] Twitter XSS worms
> 1. No novelty here. Ajax attack == non-novel.
> "ajax-as-an-attack-vector-novelty" is a separate question from "what is the
> key XSS fundamental weakness? Or are there more than one?"
> 2. What Chris said. :) This twitter example is textbook <XSS>.
> Output Encoding is an "average" practice and does not always work or solve
> for modern XSS weaknesses that result from "web 2.0" use-cases.
> As for "best practice" CWE mapping though -- wait! There's more. :)
> "Best" and "Average" programming practices for modern webapps need a tree of
> options depending on the data, the use-case, and the various data
> transformations. Web app issues (at least syntax attacks like XSS) are not
> as black and white as buffer protections are. "Blacklist" and "Blacklist
> Escaping" seem to be becoming more and more common as practices.
> Fundamentally XSS is a data/function boundary problem. It seems identical
> to SQL Injection and Buffer/heap overflows. Except we have no "stack
> canaries" or "parameterized values" in the web world for most intents. (I am
> ignoring checksumming of values that shouldn't be user-tainted like .NET
> ViewState hashes).
> Robust output encoding normalizes metacharacters used to escape
> data/function boundaries to an escaped form that is safe for the target interpreter.
> I think you will find that promoting "output encoding" as a "best practice"
> in "Web 2.0" is a challenge... it breaks more and more business cases I see.
> Web "2.0" applications today handle user-tainted data in ways *unique* to
> the web world vs. unmanaged code because of:
> 2.1. Limitations of modern implementation level languages. Limited escaping
> and encoding libraries in most languages; no parameterization or "safe
> sandboxing" in web code.
> 2.2. Limitations of interpreters to "sandbox" data and memory management
> -nx ability with the user agents (browser also has no separate data/control
> channel at the protocol level, let alone sandboxing of functions at the
> document level)
> 2.3. Unique goals of extensibility in web code -- webapp "2.0" businesses
> *want* users to *extend* the code. Awesomeness! :)
> Web 2.0 apps, by business-goal and design, take user-tainted "function" as
> "data" and try to use this data as "limited but extensible function"...
> This is the reverse kind of problem we normally see in unmanaged code, and
> complicates simple solutions.
> You will see this all over social networks.
> They go down the slippery slope of allowing "user-tainted function" which
> completely invalidates the "best practice" game.
> control...at this point mitigation becomes tactical at best, and often
> Output Encoding only solves the "weakness" of allowing user-tainted data to
> cross the data-function boundary.
> Once you jump that boundary in "Web 2.0 land"...What "Best Practice"
> - Input Validation (type, length)?
> - Whitelist "allow safe functions"?
> - Blacklist "known dangerous functions"?
> (these two approaches are often combined in Drupal/PostNuke style CMS
> systems, with radio-button and menu-option combo filters)
> - Escape the dangerous metacharacters?
> And to heap on the pile -- not all data types are used equally.
> Example string: userIdDisplayArea=<SAFETAG:attribute>
> 1. In the URI, Hex-escaped for protocol type-safety:
> 2. The app server also un-escapes Hex-URI; you could have a double-decode
> canonicalization issue, but is it a weakness?:
> 3. Base64 in the personalization cookie:
> \x27\uu0027\r\u0085 etc.
> 5. And then various unicode and transcoded interpretations passed around
> internal to the application. You see the output on international software
> that normalizes things like usernames at the database for visual
> These different encodings and representations make it challenging to
> specify a universally safe type of "output" in a webapp. "what output
> Which goes back to a "best practice" tree to select use-case, and
> data-type, and how you want it to behave in the document. :)
> As an aside -- I've found a good way to measure the approaches of "web 2.0"
> sites is to type in something like (alert) and 'alert' and \x27alert\x27,
> and throw in some international unicode transcodings, and see how they
> handle the input and output. If they blacklist those, it gives you an idea
> what they are doing, and NOT doing.
> But that's another story for another worm.
> Arian Evans
> On Mon, Apr 13, 2009 at 12:59 PM, Steven M. Christey <
> coley at linus.mitre.org> wrote:
>> For those who speak fluent XSS, how obscure was the attack vector and the
>> attack technique? Actually, what I'm really wondering is, would "best
>> practices" or even "average practices" have prevented this attack from
>> succeeding? either for the XSS or the CSRF angles. Is
>> Ajax-as-an-XSS-attack-vector still novel?
>> - Steve