Neuralyte


Neuralyte.GrimeApe r1.16 - 31 Oct 2009 - 17:12 - TWikiGuest


What is GrimeApe?

GrimeApe is an HTTP proxy which enriches every web page you visit with the power of Greasemonkey Userscripts. In other words, it's Greasemonkey for all browsers!

GrimeApe currently works in Konqueror, Chrome and Firefox. It fails in IE6. It has not been tested in Opera, Safari, IE7, or IE8.

How does it work?

The proxy delivers most files as normal, but when an HTML file is requested, the proxy injects an extra Javascript script to load GrimeApe.
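The injection step can be sketched roughly as follows. The real proxy is written in Java; this Javascript version only illustrates the string handling. The path `/__grimeape__/grimeape.js` and both function names are assumptions made for the example, not GrimeApe's actual names.

```javascript
// Hypothetical reserved path for the loader script (an assumption).
const LOADER_URL = '/__grimeape__/grimeape.js';

// Decide from the Content-Type header whether a response is HTML.
// Everything else is passed through untouched.
function shouldInject(contentType) {
  return /^text\/html\b/i.test(contentType || '');
}

// Return the HTML with one extra script tag that loads GrimeApe.
function injectLoader(html, loaderUrl = LOADER_URL) {
  const tag = '<script type="text/javascript" src="' + loaderUrl + '"></script>';
  // Prefer to insert just before </head>, matched case-insensitively.
  const match = /<\/head\s*>/i.exec(html);
  if (match) {
    return html.slice(0, match.index) + tag + html.slice(match.index);
  }
  // No </head> found: fall back to appending at the end of the document.
  return html + tag;
}
```

Injecting a script reference rather than the script source keeps the modified page small and lets the browser cache the loader.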

A little monkey appears in the corner of your browser and he loads up the Userscripts you desire.

[Screenshot: GrimeApe at work in Konqueror, showing userscripts running on a Google results page, and the GrimeApe menu]

Included in the Javascript is an implementation of the Greasemonkey API, to provide the usual library functions Greasemonkey scripts make use of. Our implementation communicates directly with the proxy (using a reserved URI path), in order to perform advanced operations such as maintaining state (GM_setValue/GM_getValue), and making arbitrary HTTP requests (GM_xmlhttpRequest).
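A minimal sketch of how the state functions could be layered over a reserved path is shown here. The prefix `/__grimeape__/`, the action names `setValue`/`getValue`, the query parameters and the `send` transport are all assumptions for illustration; the real protocol between the injected script and the proxy is not described on this page.

```javascript
// Hypothetical reserved path prefix which the proxy would intercept.
const RESERVED_PREFIX = '/__grimeape__/';

// Build the URL for one special request, e.g. action "setValue".
function specialUrl(action, params) {
  const query = Object.keys(params)
    .map((k) => encodeURIComponent(k) + '=' + encodeURIComponent(String(params[k])))
    .join('&');
  return RESERVED_PREFIX + action + (query ? '?' + query : '');
}

// Create GM_setValue / GM_getValue on top of a transport function.
// send(url) must perform a synchronous request and return the response
// text (in a browser this could be a synchronous XMLHttpRequest).
function makeValueApi(scriptName, send) {
  return {
    GM_setValue(name, value) {
      // Values are serialised as JSON so that types survive the round trip.
      send(specialUrl('setValue', { script: scriptName, name, value: JSON.stringify(value) }));
    },
    GM_getValue(name, defaultValue) {
      const text = send(specialUrl('getValue', { script: scriptName, name }));
      // In this sketch an empty reply means "no value stored".
      return text === '' ? defaultValue : JSON.parse(text);
    },
  };
}
```

Passing the transport in as a parameter keeps the sketch testable outside a browser; a userscript would only ever see the two GM_ functions.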

I was careful to stick to the GM API, without adding any extra special features. I want GrimeApe and Greasemonkey userscripts to be interchangeable. (I often wish that GM had some extra API functions, but this project has been made easier thanks to GM's API having been kept small!)

Status

We are around 0.9.3. The API is 97% complete, and there is a friendly UI. Any remaining bugs in the proxy or the javascript are well hidden. Most of the remaining work now involves tidying things up, porting to other browsers, and completing the compatibility javascript to make non-Mozilla browsers work like Mozilla.

I wrote the proxy with Konqueror as the target browser. After a few changes it also worked for Firefox. I know it doesn't work for IE right now, and I haven't tested it at all on Safari or Opera. Support for those browsers should follow, although I believe Opera already has Greasemonkey, and IE has its own userscripts implementation.

Keep-alive is now working pretty well. The leftover "0" chunk footer problem was fixed. Still, for 100% guaranteed behaviour, you can set HTTPStreamingTools.KEEP_CLIENT_SOCKETS_ALIVE and KEEP_REMOTE_SOCKETS_ALIVE to false, to disable all keep-alive, but this can make pages load more slowly. (Not yet implemented: POST with keep-alive + chunking. Currently we POST with "Connection: close".)

HTTPS is not currently supported. This is a bug. We should at least allow direct CONNECT, even if we don't manage to inject into the connection.

Try It

You can try GrimeApe right now by using the public proxy:

  • hwi.ath.cx:7152

This proxy is single-user; in other words, all users share the same config. It would be possible to make a multi-user version. But it actually makes a lot of sense for individuals to run GrimeApe on their local machine anyway, for speed. There are scripts to run GrimeApe as an xinetd service, or as a persistent service. (I am interested in an xinetd service which would wake when needed and close when idle, but I fear this may require two processes, unless we can get xinetd to release its listen port to the proxy app.)

Download

GrimeApe is written in Java and Javascript. You can download the latest build from here:

http://hwi.ath.cx/code/java/web/SuperProxy/grimeape_builds?M=D

Browse the code

Excluding libraries, GrimeApe is essentially only three files:

Two other Javascript files are injected: one provides the XPathResult object for browsers which lack it, and the other is Base2, a library which improves browser standards-compliance.

I don't think I got a working fallback for old browsers which fail to provide XMLHttpRequest.

Get the source

If you want to get a full copy of the source, you need these three projects:

svn co https://javawebtools.svn.sourceforge.net/svnroot/javawebtools/CommonUtils   # library

svn co https://javawebtools.svn.sourceforge.net/svnroot/javawebtools/SimpleProxy   # library

wget "http://hwi.ath.cx/cgi-bin/viewcvs.cgi/java/web/SuperProxy.tar.gz?view=tar"   # contains GrimeApe

I keep the projects in separate folders and use Eclipse to compile. However you build, you should run GrimeApe from the SuperProxy folder, since it needs to be able to read the images and javascript folders, and be able to create subfolders in the userscripts folder and the persistent storage file grimeape_registry.nap.

If you are feeling lucky, you can grab or update a copy of my bleeding-edge source like this:

(TODO: We should exclude the various ?M=A indexes from wget's crawl, and the grimeape_build folder!)

GrimeApe is (c) Paul Clark 2009, released under the GNU General Public License.

Known Bugs

TOP TIP: If you find your GrimeApe menu has reset to default settings (he has a sad face, and there are only the 9 default userscripts in the menu), then Javascript conflicts on the current page have prevented your config from loading. DO NOT interact with the menu to edit your config or enable/disable GA at this stage, or this fresh config WILL OVERWRITE your saved config. Instead, just navigate away from the page as normal, and you should find that your original settings are still intact. Please send a bug report for the problem page, and we may be able to fix it. GrimeApe now pops up an alert box with a brief summary of this warning, although it may be gentler to leave the warning until a save/edit attempt, rather than showing it immediately when the config fails to load.

There are some things we will never be able to fix. Security could be improved, but I don't think we can ever make a jail as secure as Greasemonkey's, since all our scripts must run in the web page. We could attempt to hide our API library from the page, making it visible only to userscripts. But still the special proxy hook addresses will be exposed.
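The gentler behaviour suggested in the tip (warn only when a save is attempted, and refuse to overwrite) might look something like this. It is only a sketch: `makeConfigStore`, `defaultConfig` and the shape of the config object are made up for the example and do not reflect GrimeApe's real code.

```javascript
// Hypothetical defaults; the real default config lists 9 userscripts.
function defaultConfig() {
  return { enabled: true, scripts: [] };
}

// loadFn() returns the saved config or throws; saveFn(config) stores it;
// warn(message) shows a message to the user (e.g. an alert box).
function makeConfigStore(loadFn, saveFn, warn) {
  let loadedOk = false;
  let config = defaultConfig();
  return {
    load() {
      try {
        config = loadFn();
        loadedOk = true;
      } catch (e) {
        // Fall back to defaults for display, but remember the failure.
        config = defaultConfig();
        loadedOk = false;
      }
      return config;
    },
    save(newConfig) {
      if (!loadedOk) {
        // Never let the fallback defaults replace the user's saved config.
        warn('Saved config failed to load on this page; not overwriting it.');
        return false;
      }
      saveFn(newConfig);
      config = newConfig;
      return true;
    },
  };
}
```

With a guard like this, interacting with the menu on a problem page would produce a warning instead of silently replacing the saved settings.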

There is an open security concern regarding GM_xmlhttpRequest and Cookies, for browsers which send multiple cookie headers to the current page. (In GrimeApe.handleSpecialRequest() we need to remove the other styles of cookie headers.)

Konqueror's stability has improved since introducing Socket connection and read timeouts in the proxy, and fixing the chunking bug: now the spinner almost always stops. Occasionally Konqueror is not happy: the Spinner won't stop, the "Runaway Script" dialog pops up, the Error Dialog Window won't close, or Konqueror may crash.

The nastiest errors I can get currently seem to be on pages with many child frames. Konqueror can get locked up if multiple frames are popping up alert boxes which are not closed quickly!

Some pages interfere with the monkey's smooth operation. For example the scripts provided with Web Archive results re-target the Grimeape icon image, causing it to disappear, and somehow they prevent the config from loading!

The GA menu can inherit unwanted style properties from the page. I don't know a neat way to reset our style to the defaults, other than overwriting every CSS property I know. Do you know a solution? (This recommendation seems rather heavy.) If so I think it should become GM_clearStyle(), since GM scripts often like to pop up extra floats. The GA menu could be made more user-friendly in other ways too; maybe you can help with that.
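For reference, the brute-force approach of overwriting known properties could be packaged as below. `clearStyle` is a hypothetical helper (no GM_clearStyle exists in the Greasemonkey API), and the property list is a short illustrative subset, not a complete reset.

```javascript
// A small, deliberately incomplete table of properties and neutral values.
const NEUTRAL_STYLES = {
  margin: '0',
  padding: '0',
  border: 'none',
  background: 'transparent',
  color: 'black',
  fontFamily: 'sans-serif',
  fontSize: '12px',
  fontWeight: 'normal',
  fontStyle: 'normal',
  textAlign: 'left',
  textDecoration: 'none',
  lineHeight: 'normal',
  letterSpacing: 'normal',
  cssFloat: 'none',
};

// Overwrite each listed property inline on the element, so that rules
// inherited from the page's own stylesheet no longer show through.
function clearStyle(element, styles = NEUTRAL_STYLES) {
  for (const prop of Object.keys(styles)) {
    element.style[prop] = styles[prop];
  }
  return element;
}
```

Inline styles set this way still lose to page rules marked `!important`, so this sketch reduces the problem rather than solving it.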

Security concerns

Here are the security issues that the original Greasemonkey had to deal with: http://commons.oreilly.com/wiki/index.php/Greasemonkey_Hacks/Getting_Started#Avoid_Common_Pitfalls

We might be avoiding #1 by adding script references rather than the script source (to check). We are guilty of adding our API functions to the window object (concern #2). This may be addressed by only including the API with userscripts, not globally. But this would mean either: 1) providing the API once for every userscript served (inefficient), or 2) providing the API and all userscripts together (a parse error in one userscript could break the others). Concern #2 may not be relevant anyway (see next paragraph). I believe we are safe from concern #3 (until someone "upgrades" the proxy code, so we should make an explicit check there!).

Is there really much point hiding the API? It has only content privilege in GA. The real danger is the exposure of the reserved URLs on the proxy. That could be exploited even if the API is hidden, simply by copying what the API does.

Proposal for securing our reserved URLs: We could deliver, in closure with the GM API + userscript, a one-time key for making a special proxy request. When a special request is made, the key should be checked, and the next key should be returned to the Javascript. This should prevent any Javascript delivered by the website from messing with our reserved stuff! But this will only be safe if the API really is in closure, i.e. unavailable to all parts of the page except the calling userscript. (If we are not worried about interception of our request data, we can use one constant key instead of returning a new one-time key on each request. To avoid interception of returned keys, https would be preferable, but shared secret hashing could work.)
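The one-time key proposal could be prototyped along these lines. This is a sketch under assumptions: `KeyChain`, `makeClient` and the reply shape `{ ok, nextKey }` are invented for the example, it is written for Node (using its `crypto` module) although the real proxy side would be Java, and it covers only the rotation logic, not the delivery of the first key.

```javascript
const nodeCrypto = require('crypto');

// Proxy side: remembers the single key that is currently valid.
class KeyChain {
  constructor(makeKey = () => nodeCrypto.randomBytes(16).toString('hex')) {
    this.makeKey = makeKey;
    this.current = this.makeKey();
  }

  // The key embedded in the closure delivered with the API + userscript.
  initialKey() {
    return this.current;
  }

  // Check the key sent with a special request. On success, rotate and
  // return the next key; on failure return null and keep the valid key.
  redeem(key) {
    if (key !== this.current) return null;
    this.current = this.makeKey();
    return this.current;
  }
}

// Client side: the key lives only in this closure, so scripts delivered
// by the page cannot read it (provided the closure itself is not exposed).
function makeClient(firstKey, sendSpecial) {
  let key = firstKey;
  return function request(action) {
    const reply = sendSpecial(action, key); // expected shape: { ok, nextKey }
    if (reply.ok) key = reply.nextKey;
    return reply.ok;
  };
}
```

Because each key is valid for one request only, a key observed by the page after use cannot be replayed; a key intercepted in transit before use is a separate problem, as noted above.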

In GrimeApe, any functions we embed in the page, either by window or unsafeWindow, will be readable and callable, which is undesirable. The page will not have direct access to the GM_ API functions however. (We may be able to fix readability by intercepting added events and wrapping them in a fn that calls the now hidden fn.) Unlike GM, all scripts in GA run in the content window (on the page). In GA's favour, that means untrusted scripts cannot leave the page, they can only attack our special proxy hooks. But against GA, attack can come through any of the functions we add to the page (either by window or unsafeWindow, unless we can build a real safeWindow). Maybe the worst the page can do to a normal script is to pass fake events with unexpected properties. In this case the worst our code is likely to do is set unpleasant values in the proxy (although if carelessly evaled that could be dangerous - escape dropout).

Let's not forget that the page can redefine certain commonly used functions to have a different meaning. It might be possible to circumvent this risk by forcing our own overrides: We could grab references of all the default functions at the start of page load, and check them again before we run our userscripts.
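One way to try the "grab references and check again" idea is sketched here. The helper names are hypothetical, and the snapshot is only trustworthy if it is taken before any script from the page has run.

```javascript
// Record the current value of each named property on root (e.g. window).
function snapshotFunctions(root, names) {
  const snapshot = {};
  for (const name of names) {
    snapshot[name] = root[name];
  }
  return snapshot;
}

// List the names whose value on root no longer matches the snapshot.
function findTampered(root, snapshot) {
  return Object.keys(snapshot).filter((name) => root[name] !== snapshot[name]);
}

// Put the original references back before running userscripts.
function restoreFunctions(root, snapshot) {
  for (const name of Object.keys(snapshot)) {
    root[name] = snapshot[name];
  }
}
```

In a page this might be called as `snapshotFunctions(window, ['setTimeout', 'parseInt', 'escape'])` from the injected loader, with `findTampered` run just before each userscript starts. It does not cover methods on prototypes, which the page could also redefine.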

Hmm there may be some security concerns remaining. Maybe next we should build some example exploits and then try to prevent them.

Future

Fix all remaining bugs.

Split up config so changed config can be uploaded in smaller chunks.

There are some inefficiencies in the core. The de-chunking algorithm we employ reads the whole content stream into memory, when often we could be streaming it directly to the client. Our pooled server connection handler is implemented too deep in the request handler, so it does not know when the client has finished reading the response stream (making the socket free again for the pool). The current workaround for this is, again, streaming into memory before sending back out to the client. Ideally we would pass chunked data straight to the client if we have no intention of manipulating the stream, but perform auto-dechunking if we call getContentAsString/StringBuffer(). I believe chunking may allow clients to stop reading the response when the end of the content is reached, even in the absence of a Content-Length header.
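To illustrate how chunked data can be decoded incrementally rather than buffered whole, here is a small state machine in Javascript (the proxy itself is Java, and `ChunkDecoder` is a name invented for this sketch). It treats input as strings where one character is one byte, ignores chunk extensions, and discards trailer headers.

```javascript
class ChunkDecoder {
  constructor(onData) {
    this.onData = onData; // called with each piece of decoded payload
    this.buffer = '';     // input received but not yet parsed
    this.remaining = 0;   // payload bytes still expected in this chunk
    this.state = 'SIZE';  // SIZE | DATA | DATA_END | TRAILER | DONE
  }

  // Feed the next piece of the stream. Returns true once the terminating
  // zero-size chunk and final blank line have been seen.
  feed(text) {
    this.buffer += text;
    for (;;) {
      if (this.state === 'DONE') return true;
      if (this.state === 'DATA') {
        if (this.buffer.length === 0) return false;
        // Pass on whatever payload has arrived, without waiting for more.
        const piece = this.buffer.slice(0, this.remaining);
        this.buffer = this.buffer.slice(piece.length);
        this.remaining -= piece.length;
        this.onData(piece);
        if (this.remaining === 0) this.state = 'DATA_END';
        continue;
      }
      // Every other state consumes one CRLF-terminated line.
      const eol = this.buffer.indexOf('\r\n');
      if (eol < 0) return false;
      const line = this.buffer.slice(0, eol);
      this.buffer = this.buffer.slice(eol + 2);
      if (this.state === 'SIZE') {
        const size = parseInt(line.split(';')[0], 16);
        if (Number.isNaN(size)) throw new Error('bad chunk size: ' + line);
        this.remaining = size;
        this.state = size === 0 ? 'TRAILER' : 'DATA';
      } else if (this.state === 'DATA_END') {
        if (line !== '') throw new Error('missing CRLF after chunk data');
        this.state = 'SIZE';
      } else if (line === '') {
        this.state = 'DONE'; // blank line ends the (ignored) trailers
      }
    }
  }
}
```

The return value of `feed` is what lets a reader know the body is complete without a Content-Length header: the zero-size chunk marks the end, after which the socket could be returned to the pool.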

It might be useful to start a moz_konq_compat.js fortifier library, and do the same for other browsers. Since GM scripts were written for Mozilla, our browser should act as much as possible like Mozilla. I am looking for an implementation of XPathResult in pure Javascript. One example problem is that the mouseout event which fires on the document element in Mozilla fails to fire in Konqueror (which is breaking my ReclaimCPU userscript). The fix would need to detect the mouseout as well as possible, and then fire the event itself.

Major refactoring for efficiency: Currently the browser loads the injected GrimeApe libraries, then GrimeApe makes an xmlhttpRequest to load its config, and then decides which userscripts to load. This process could be significantly faster if performed by the proxy itself, which could inject the final script URLs (or less securely, the scripts' sources) directly into the page during delivery. However this method is more complex, since handling of the config data must now be performed by both Java and Javascript. Currently the config is held in JSON format, so Java would need a JSON parser, or a Javascript interpreter. Or possibly better, we could re-create the config as a Java object, providing the Javascript with only atomic (XHR) access to it. Some development in this direction is especially desirable, because the size of the config data grows with each added userscript, whilst the entire config is sent to the client with every page/frame, and sent from the client on every config change! (In the meantime, we could try loading the config as a Javascript include, rather than making an xmlhttpRequest for it, so that the browser can cache the "file" if it has not changed. We will need to provide a Last-Modified or ETag header for the browser. Such caching is already working for userscripts.)
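The interim idea of serving the config as a cacheable Javascript include could work roughly as follows. This is an illustrative sketch of the proxy-side decision only: `makeConfigResource`, the variable name `GA_config` and the revision-counter ETag are assumptions, and request header names are assumed to be lower-cased already.

```javascript
// Holds the config text plus a revision counter used as a cache validator.
function makeConfigResource(initialJson) {
  let json = initialJson;
  let revision = 1;
  const etag = () => '"ga-config-' + revision + '"';

  return {
    // Called whenever the client uploads a changed config.
    update(newJson) {
      if (newJson !== json) {
        json = newJson;
        revision += 1;
      }
    },

    // Answer a request for the config include.
    respond(requestHeaders) {
      if (requestHeaders['if-none-match'] === etag()) {
        // The browser's cached copy is current: send no body.
        return { status: 304, headers: { ETag: etag() }, body: '' };
      }
      return {
        status: 200,
        headers: { ETag: etag(), 'Content-Type': 'text/javascript' },
        body: 'var GA_config = ' + json + ';',
      };
    },
  };
}
```

With this in place the full config would cross the wire only after a change; on every other page load the browser would revalidate and receive a short 304 response.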

Questions / Discussion

What more does GrimeApe need?

Related Projects

-- JoeyTwiddle - 17 May 2009


Copyright © 1999-2010 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.