#115 ✓resolved
nickg

Cleanup of Envjs.uri

Reported by nickg | March 17th, 2010 @ 08:50 AM

Envjs.uri currently mixes in java. Ideally we'd like envjs.uri to be pure-javascript, as it will allow alternative JS engines to be used easier. I added some unit-tests for Envjs.uri and there were a lot of problems (for instance relative URLs with ".." in them did not work).

From some previous work, I more-or-less duplicated python's urlparse module http://docs.python.org/library/urlparse.html here:

http://code.google.com/p/stringencoders/source/browse/trunk/javascr...

with 1,000,000 unit tests here:

http://code.google.com/p/stringencoders/source/browse/trunk/javascr...

Using the function urljoin I think Envjs.uri could just focus on special cases (javascript:, about:blank) and leave the heavy lifting to generic code.

I'm working on a patch and tests now. Just a heads up. My guess it the final version will move the code from xhr to somewhere else (? platform/core/util?). Thoughts?

--nickg

Comments and changes to this ticket

  • nickg

    nickg March 17th, 2010 @ 02:39 PM

    Hi again,

    actually chris, I need some guidance on where the urlparse file should live... in specs/xhr? That way location could use it as well.?

    --nickg

  • nickg

    nickg March 18th, 2010 @ 04:50 AM

    • Tag set to url

    Work in progress.

    Using urlsplit/unsplit/join the function becomes this. The only java used is to get the current directory (which should be Envjs.cwd or something)

    This doesn't take into account ticket #118 yet

    Envjs.uri = function(path, base){
        //console.log('constructing uri from path %s and base %s', path, base);                                                                      
    
        // Semi-common trick is to make an iframe with src='javascript:false'                                                                        
        //  (or some equivalent).  By returning '', the load is skipped                                                                              
        if (path.indexOf('javascript') === 0) {
            return '';
        }
    
        // if base not passed in, try to get it from document                                                                                        
        // Ideally I would like the caller to pass in document.baseURI to                                                                            
        //  make this more self-sufficient and testable                                                                                              
        if (!base) {
            if (document) {
                base = document.baseURI;
                if (base === 'about:blank'){
                    base = null;
                }
            }
        }
    
        // if base is still empty, then we are in QA mode loading local                                                                              
        // files.  convert relative path to fully qualified file:// url.                                                                             
        if (!base) {
            var parts = urlparse.urlsplit(path, 'file');
            if (parts.path && parts.path.length && parts.path[0] != '/') {
                // TBD "get current working directory" need to be made a                                                                             
                parts.path = java.lang.System.getProperty("user.dir") + '/' + parts.path;
            }
            return urlparse.urlunsplit(parts);
        }
    
        // handles all cases if path is abosulte or relative to base                                                                                 
        return urlparse.urljoin(base, path);
    };
    
  • nickg

    nickg March 18th, 2010 @ 01:02 PM

    ok, it's getting close and handles normalization of file w.r.t to query strings and whatnot.

    So far I have 185 asserts in my unit test for the generic case.
    And maybe a dozen or so for the specific Envjs.uri cases.

    Should have patch ready tonight.

    --nickg

  • nickg

    nickg March 18th, 2010 @ 06:20 PM

    This is not a patch that should be commited to The Master, but it can be used to test with.

    http://github.com/client9/env-js/commit/3fbc94d726681ca9d843ccb5972...

    Chris: I'll need some guidance on how to slide files around. Let's talk this weekend.

  • nickg

    nickg March 18th, 2010 @ 06:42 PM

    argh, ignore that last post. of course the one thing I didnt unit-test had a bug.

    To test, you'll need my copy of src/platform/rhino/xhr.js

    Cut-n-Paste with
    http://github.com/client9/env-js/blob/master/src/platform/rhino/xhr.js

    You can apply the following patches
    http://github.com/client9/env-js/commit/3fbc94d726681ca9d843ccb5972...
    http://github.com/client9/env-js/commit/560757564062f8bc5468472d95e...

  • Steven Parkes

    Steven Parkes March 19th, 2010 @ 08:14 AM

    Rather than doing the indexof stuff, just using a uri parse on the path? Then you can just look at the scheme and see if it's javascript or about?

  • nickg

    nickg March 19th, 2010 @ 12:48 PM

    re:Rather than doing the indexof stuff, just using a uri parse on the path? Then you can just look at the scheme and see if it's javascript or about?

    Yeah, that could work. I was using my generic urlparsing routines which just take strings as input and didn't want to do urlsplit multiple times on the same input.

    / handles all cases if path is abosulte or relative to base
        // 3rd arg is "false" --> remove fragments
        var newurl = urlparse.urlnormalize(urlparse.urljoin(base, path, false));
    

    But maybe the right thing to do is to make urljoin ALSO take the broken down parts.

    I will continue to play.

    --nickg

  • Steven Parkes

    Steven Parkes March 19th, 2010 @ 01:08 PM

    I haven't used the Python url stuff in a while. (Obviously) the Ruby port uses the Ruby classes. They've got some really nice features. They'll do url_a + url_b which does pretty much what you expect: if url b is absolute, with scheme etc, it's just url_b. If not, it handles maintaining the scheme/host/port while handling the relative vs. absolute path stuff. Calling code doesn't have do anything expect pass it URLs (and it handles generic URIs where it doesn't understand the scheme, etc.)

    I actually think this would make a great external JavaScript library, if it doesn't already exist. I find myself needing it all the time and I'm always hacking around it and I never get it right the first time.

    Just thoughts. Certainly don't need to scratch my itches. I'll get to it eventually if no one beats me to it.

  • nickg

    nickg March 19th, 2010 @ 01:15 PM

    Steve, I have scratched your itch! The code i'm using is in javascript .

    url a + url b is urljoin(a,b)!

    urlsplit breaks down the url is all sorts of form (urlsplit is the reverse).

    there is also a urlnormalize which does some additional cleanups.

    I threw the source code of this into another project of mine, but it's urlparse.js and urlparse-test.js over at:

    http://code.google.com/p/stringencoders/source/browse/#svn/trunk/ja...

    I just included 'urlparse.js' directly into the 'rhino.js' in my commit, since i didn't know where to put it (short term hack).

    Please try it out! I'm happy to pull this out as a stand-alone project.

    --nickg

  • Steven Parkes

    Steven Parkes March 19th, 2010 @ 01:44 PM

    Pretty cool ... and a lot of the plumbing.

    I have to admit, I've never been crazy about the Python API for this, though. Just breaking things down to a tuple ... a fixed tuple ... I don't think that's a general solution.

    I suppose this is part of the reason the base class in Ruby is called URI, not URL. And subclasses are used for different schemes, so they can add attributes as appropriate. Ruby uses a hierarchy of classes to represent the difference schemes, and will fall back to URIGeneric if it's clueless about a particular scheme.

    And makes things like 'urlunsplit' the (obvious) uri.to_s in Ruby (or uri.toString() in JavaScript).

    I was wondering if we had an object oriented approach to URIs, we could get rid of Envjs.uri: it's really just doing base && base.plus(path) || path, isn't it?

    A few observations (that might not be true:

    I don't think we should be stripping anything from the right hand operand in the + operation: query args, frag, etc. The file:// fetcher would need to ignore these fields in the filei/o, but that doesn't mean it's not part of window.location (FF and Chrome pass these on).

    I don't actually think there's an official spec for file URIs.

    We should handle javascript: and data: URIs. The data: requires itss own parsing routines ... and then the fetcher (that switches on scheme) will want to handle substituting results from a decode.

  • nickg

    nickg March 19th, 2010 @ 02:02 PM

    re: ... Python API for this, though. Just breaking things down to a tuple ... a fixed tuple ... I don't think that's a general solution.

    Actually it returns a weird tuple AND a dict at the same time. It's oddly bizarre, which is why I didn't do that part of it. The JS code returns a object with basic properties you might expect.

    Other than that I kept to python API, well, it made testing very easy. And I wanted to keep it simple, and be "one file", keep it generic as possible.

    re: spec
    There is very much so a spec for URI: http://labs.apache.org/webarch/uri/rfc/rfc3986.html

    re: data, about, etc.
    But unlikely any spec for "about:", "javascript:", "data:" and in fact probably violate the spec above. That said, you could take my junk, and put a nice javascripty layer on top. THat's kinda what Envjs.uri was attempting to do (or should morph into).

    re: file fetcher should ignore query, frag
    You are probably right, however, envjs, java, and libcurl don't do this. However, fixing this is really easy. move my urlnormalize out of Envjs.uri and let the file fetcher do it.

    Anyways..... now that I have a large volume of test cases, I (we?) can think about a more OO design. Let me know if you want to hack on this.

  • Steven Parkes

    Steven Parkes March 19th, 2010 @ 02:32 PM

    Yeah, I know there's a URI spec. And there's also a URL spec. I was misremembering: it's not that there's isn't a spec for file URLs: that's actually described in the URL spec. It just doesn't do what I want: it doesn't say you can pass query params and fragments. I simply chose to ignore that (as the browsers do) and to conveniently forget the fact.

    So that does mean env.js has to decide what to do. I'd go with what browsers do but it turns out I'm not really using this as much as I used to ...

    FWIW, there are specs, either RFCs or IDs, for both about: and data: There was an ID for javascript: URIs that recently expired.

    Nothing really "violates" the URI spec, since it doesn't require much. It specs what characters can used (which changed with IRIs, if I recall) and the scheme part, but after that, it's up to the particular scheme. There are patterns that are shared by many schemes (the stuff that most URI libraries extract) but not all.

    Anyway, all this notwithstanding, your stuff is great. Whenever I get around to this for my own stuff (probably in the next few weeks), I'll definitely start by just putting a shim on top of your stuff. BSD, right? Just need to get it on github :).

    Thanks for the work!

  • nickg

    nickg March 19th, 2010 @ 02:48 PM

    Thanks, and it's MIT. And sure, I'm happy to post it to github if that's your pref!

    I only went with the python stuff cause, well, it was easiest for me. I have no particular love for it, so go nuts.

  • nickg

    nickg March 21st, 2010 @ 11:30 AM

    • State changed from “new” to “open”
    • Assigned user set to “nickg”

    @steve -- will post that code to github, I promise.

    @all

    I think this all the places that need consolidating.

    • img.src
    • a.href
    • window.location
    • document.location (and the entire Location object)
    • iframe.src
    • link tags (TBD)
  • nickg

    nickg March 21st, 2010 @ 05:28 PM

    found another

    • document.domain
  • nickg
  • nickg

    nickg March 24th, 2010 @ 06:44 PM

    • State changed from “open” to “resolved”

    closing for now, and will reopen for specific bugs/issues.

Please Sign in or create a free account to add a new ticket.

With your very own profile, you can contribute to projects, track your activity, watch tickets, receive and update tickets through your email and much more.

New-ticket Create new ticket

Create your profile

Help contribute to this project by taking a few moments to create your personal profile. Create your profile ยป

a javascript browser environment

People watching this ticket

Tags

Referenced by

Pages