UrlRewriteFilter 4.0.3

Examples

Redirect one url


    <rule>
        <from>^/some/old/page\.html$</from>
        <to type="redirect">/very/new/page.html</to>
    </rule>

    <rule match-type="wildcard">
        <from>/some/old/page.html</from>
        <to type="redirect">/very/new/page.html</to>
    </rule>

Redirect docs in one directory to another using the wildcard engine.


    <urlrewrite default-match-type="wildcard">

    <rule>
        <from>/some/old/*.doc</from>
        <to type="redirect">/very/new/$1.doc</to>
    </rule>

    </urlrewrite>

Tiny/Friendly URL


    <rule>
        <from>^/zebra$</from>
        <to type="redirect">/big/ugly/url/1,23,56,23132.html</to>
    </rule>

Default page as another (requests to / will be redirected)


    <rule>
        <from>^/$</from>
        <to type="redirect">/opencms/opencms/index.html</to>
    </rule>

Perform security checks in a centralised place


    <rule>
        <condition type="user-in-role" operator="notequal">admin</condition>
        <condition type="user-in-role" operator="notequal">bigboss</condition>
        <from>^/admin/(.*)$</from>
        <to>/go-away-please.html</to>
    </rule>

Check that users are using the correct domain name to get to your site, i.e. users going to http://example.com/blah will be redirected to http://www.example.com/blah


    <rule>
        <name>Domain Name Check</name>
        <condition name="host" operator="notequal">www.example.com</condition>
        <from>^(.*)$</from>
        <to type="redirect">http://www.example.com/context$1</to>
    </rule>

Disable access to a directory.


    <rule>
        <name>Disable Directory</name>
        <from>^/notliveyet/.*$</from>
        <set type="status">403</set>
        <to>null</to>
    </rule>

Redirect a directory (for moved content)


    <rule>
        <from>^/some/olddir/(.*)$</from>
        <to type="redirect">/very/newdir/$1</to>
    </rule>

Clean a URL


    <rule>
        <from>^/products/([0-9]+)$</from>
        <to>/products/index.jsp?product_id=$1</to>
    </rule>

e.g. /products/1234 will be passed on to /products/index.jsp?product_id=1234 without the user noticing.


    <rule>
        <from>^/world/([a-z]+)/([a-z]+)$</from>
        <to>/world.jsp?country=$1&amp;city=$2</to>
    </rule>

e.g. /world/unitedstates/newyork will be passed on to /world.jsp?country=unitedstates&city=newyork
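
To make the role of the capture groups concrete, here is a minimal sketch that applies the same pattern with plain java.util.regex. It is an illustration only, not the filter's own code: the class name CleanUrlSketch and the rewrite helper are hypothetical, and the sketch ignores filter settings such as case sensitivity. Note that &amp;amp; in urlrewrite.xml is simply the XML escape for & in the resulting URL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of UrlRewriteFilter.
class CleanUrlSketch {
    // Same expression as the <from> element of the rule above.
    private static final Pattern FROM = Pattern.compile("^/world/([a-z]+)/([a-z]+)$");

    // Returns the rewritten URL, or the input unchanged when the pattern does not match.
    static String rewrite(String url) {
        Matcher m = FROM.matcher(url);
        if (!m.find()) {
            return url;
        }
        // $1 and $2 in <to> correspond to the first and second capture group.
        return "/world.jsp?country=" + m.group(1) + "&city=" + m.group(2);
    }

    public static void main(String[] args) {
        // prints /world.jsp?country=unitedstates&city=newyork
        System.out.println(rewrite("/world/unitedstates/newyork"));
        // prints /world/unitedstates (no second path segment, so no match)
        System.out.println(rewrite("/world/unitedstates"));
    }
}
```

A URL that does not match the pattern is left alone, which is why the anchors ^ and $ matter in the rule.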

Browser detection


    <rule>
        <condition name="user-agent">Mozilla/[1-4]</condition>
        <from>^/some/page\.html$</from>
        <to>/some/page-for-old-browsers.html</to>
    </rule>

e.g. this will pass the request for /some/page.html on to /some/page-for-old-browsers.html, but only for older browsers whose user agent strings match Mozilla/1, Mozilla/2, Mozilla/3 or Mozilla/4.
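
The condition value is a regular expression that only needs to match part of the header, since real user agent strings are much longer than Mozilla/4. The sketch below shows that partial match using plain java.util.regex. It assumes the condition behaves like Matcher.find(); the class BrowserConditionSketch and the sample user agent strings are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of UrlRewriteFilter.
class BrowserConditionSketch {
    // Same expression as the user-agent condition above.
    private static final Pattern OLD_BROWSER = Pattern.compile("Mozilla/[1-4]");

    // Assumption: a partial match anywhere in the header is enough.
    static boolean isOldBrowser(String userAgent) {
        return userAgent != null && OLD_BROWSER.matcher(userAgent).find();
    }

    public static void main(String[] args) {
        // prints true
        System.out.println(isOldBrowser("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"));
        // prints false
        System.out.println(isOldBrowser("Mozilla/5.0 (X11; Linux x86_64)"));
    }
}
```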

Security: block certain HTTP methods from reaching your web application.


    <rule>
        <condition type="method" next="or">PROPFIND</condition>
        <condition type="method">PUT</condition>
        <from>.*</from>
        <to type="redirect">/bad-method.html</to>
    </rule>

Sunday Specials


    <rule>
        <condition type="dayofweek">1</condition>
        <from>^/products/$</from>
        <to>/products/sunday-specials.html</to>
    </rule>

Set the "Cache-Control" HTTP response header for all requests


    <rule>
        <from>.*</from>
        <set type="response-header" name="Cache-Control">max-age=3600, must-revalidate</set>
    </rule>

Forward a request to a servlet


    <rule>
        <from>^/products/purchase$</from>
        <to>/servlets/ProductsServlet</to>
        <set name="action">purchase</set>
    </rule>

e.g. the request /products/purchase will be forwarded to /servlets/ProductsServlet and inside the servlet request.getAttribute("action") will return purchase.

Set an "Expires" HTTP header 6 hours into the future for js, css and gif files


    <rule>
        <from>^.*\.(js|css|gif)$</from>
        <set type="expires">6 hours</set>
    </rule>

Hide jsessionid for requests from googlebot.


    <outbound-rule>
        <name>Strip URL Session ID's</name>
        <note>
            Strip ;jsessionid=XXX from urls passed through response.encodeURL().
            The characters ? and # are the only things we can use to find out where the jsessionid ends.
            The expression in 'from' below contains three capture groups, the last two being optional.
                1, everything before ;jsessionid
                2, everything after ;jsessionid=XXX starting with a ? (to get the query string) up to #
                3, everything after ;jsessionid=XXX and optionally ?XXX starting with a # (to get the target)
            eg,
            from index.jsp;jsessionid=sss?qqq to index.jsp?qqq
            from index.jsp;jsessionid=sss?qqq#ttt to index.jsp?qqq#ttt
            from index.jsp;jsessionid=asdasdasdsadsadasd#dfds to index.jsp#dfds
            from u.jsp;jsessionid=wert.hg to u.jsp
            from /;jsessionid=tyu to /
        </note>
        <condition name="user-agent">googlebot</condition>
        <from>^(.*?)(?:\;jsessionid=[^\?#]*)?(\?[^#]*)?(#.*)?$</from>
        <to>$1$2$3</to>
    </outbound-rule>

Permanently redirect incoming URLs containing jsessionid.


    <rule>
        <name>Strip URL Session ID's</name>
        <note>
            Strip ;jsessionid=XXX from urls passed through response.encodeURL().
            The characters ? and # are the only things we can use to find out where the jsessionid ends.
            The expression in 'from' below contains three capture groups, the last two being optional.
                1, everything before ;jsessionid
                2, everything after ;jsessionid=XXX starting with a ? (to get the query string) up to #
                3, everything after ;jsessionid=XXX and optionally ?XXX starting with a # (to get the target)
            eg,
            from index.jsp;jsessionid=sss?qqq to index.jsp?qqq
            from index.jsp;jsessionid=sss?qqq#ttt to index.jsp?qqq#ttt
            from index.jsp;jsessionid=asdasdasdsadsadasd#dfds to index.jsp#dfds
            from u.jsp;jsessionid=wert.hg to u.jsp
            from /;jsessionid=tyu to /
        </note>
        <condition type="requested-session-id-from-url" operator="equal">true</condition>
        <from>^(.*?)(?:\;jsessionid=[^\?#]*)?(\?[^#]*)?(#.*)?$</from>
        <to type="permanent-redirect">$1$2$3</to>
    </rule>
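
The jsessionid expression used in the two rules above is dense, so a small sketch may help when checking it against your own URLs. It runs the same expression with plain java.util.regex and joins the three capture groups, treating an unmatched optional group as empty. The class SessionIdStripSketch and its strip method are hypothetical names used only for this illustration; the filter's own substitution code may differ in detail.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of UrlRewriteFilter.
class SessionIdStripSketch {
    // Same expression as the <from> element of the jsessionid rules.
    private static final Pattern FROM =
            Pattern.compile("^(.*?)(?:\\;jsessionid=[^\\?#]*)?(\\?[^#]*)?(#.*)?$");

    // Equivalent of <to>$1$2$3</to>, with unmatched groups treated as empty.
    static String strip(String url) {
        Matcher m = FROM.matcher(url);
        if (!m.find()) {
            return url;
        }
        StringBuilder out = new StringBuilder();
        for (int group = 1; group <= 3; group++) {
            String value = m.group(group);
            if (value != null) {
                out.append(value);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints index.jsp?qqq#ttt
        System.out.println(strip("index.jsp;jsessionid=sss?qqq#ttt"));
        // prints u.jsp
        System.out.println(strip("u.jsp;jsessionid=wert.hg"));
    }
}
```

The inputs are the examples listed in the rule's note; a URL without a jsessionid passes through unchanged.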

Method Invocation

The standard servlet mapping done via web.xml is rather limiting: only *.xxx or /xxxx/* patterns are supported, with no ability to do any sort of smart matching. With UrlRewriteFilter, any rule can be set to run method(s) on a class when it matches.

Invoke a servlet directly


    <rule>
        <from>^/products/purchase$</from>
        <run class="com.blah.web.MyServlet" method="doGet" />
    </rule>

This will invoke doGet(HttpServletRequest request, HttpServletResponse response) when the "from" is matched on a request. (Remember, this method needs to be public!)

Use it to delegate cleanly to your methods


    <rule>
        <from>^/pref-editor/addresses$</from>
        <run class="com.blah.web.PrefsServlet" method="runAddresses" />
    </rule>
    <rule>
        <from>^/pref-editor/phone-nums$</from>
        <run class="com.blah.web.PrefsServlet" method="runPhoneNums" />
    </rule>

Browser-based delegation to your methods


    <rule>
        <condition name="user-agent">Mozilla/[1-4]</condition>
        <from>^/content/.*$</from>
        <run class="com.blah.web.ContentServlet" method="runForOldBrowsers" />
    </rule>
    <rule>
        <condition name="user-agent" operator="notequal">Mozilla/[1-4]</condition>
        <from>^/content/.*$</from>
        <run class="com.blah.web.GeneralServlet" method="runRobotMonitor" />
        <run class="com.blah.web.ContentServlet" method="runForNewBrowsers" />
    </rule>

When the method specified in the "run" is invoked it has full control over the request and response as if it were a servlet.

URL Abstraction

Both incoming requests and embedded links in JSPs can be rewritten, allowing full URL abstraction.


    <rule>
        <from>^/tidy/page$</from>
        <to>/old/url/scheme/page.jsp</to>
    </rule>
    <outbound-rule>
        <from>^/old/url/scheme/page.jsp$</from>
        <to>/tidy/page</to>
    </outbound-rule>

Any incoming requests for /tidy/page will be transparently forwarded to /old/url/scheme/page.jsp.

If you use JSTL your JSP page would have something like:

<a href="<c:url value="/old/url/scheme/page.jsp"/>">some link</a>

This will be rewritten upon output to:

<a href="/tidy/page">some link</a>

Or if you use standard JSP:

<a href="<%= response.encodeURL("/old/url/scheme/page.jsp") %>">some link</a>

Will generate output like:

<a href="/tidy/page">some link</a>
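
The inbound rule and the outbound rule are mirror images of each other, which the following sketch tries to show using plain String.replaceFirst. It is only an approximation of the idea under simplifying assumptions (no context path, no conditions), and the class UrlAbstractionSketch with its inbound and outbound methods is hypothetical.

```java
// Hypothetical helper, not part of UrlRewriteFilter.
class UrlAbstractionSketch {
    // Mirrors the <rule>: tidy public URL -> internal JSP.
    static String inbound(String requestUrl) {
        return requestUrl.replaceFirst("^/tidy/page$", "/old/url/scheme/page.jsp");
    }

    // Mirrors the <outbound-rule>: internal JSP link -> tidy public URL.
    static String outbound(String linkUrl) {
        return linkUrl.replaceFirst("^/old/url/scheme/page.jsp$", "/tidy/page");
    }

    public static void main(String[] args) {
        // prints /old/url/scheme/page.jsp
        System.out.println(inbound("/tidy/page"));
        // prints /tidy/page
        System.out.println(outbound("/old/url/scheme/page.jsp"));
    }
}
```

Because each direction undoes the other, visitors only ever see /tidy/page while the application keeps using its old internal path.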

mod_rewrite vs UrlRewriteFilter

Examples of mod_rewrite style conf vs UrlRewriteFilter conf are below; these are all examples copied directly from Apache 2.0's official rewrite guide.



<rule>
    <name>Canonical URLs</name>
    <note>
        On some webservers there is more than one URL for a resource. Usually there are canonical URLs (which
        should be actually used and distributed) and those which are just shortcuts, internal ones, etc. Independent
        of which URL the user supplied with the request he should finally see the canonical one only.

        We do an external HTTP redirect for all non-canonical URLs to fix them in the location view of the Browser
        and for all subsequent requests. In the example ruleset below we replace /~user by the canonical /u/user and
        fix a missing trailing slash for /u/user.

        RewriteRule ^/~([^/]+)/?(.*) /u/$1/$2 [R]
        RewriteRule ^/([uge])/([^/]+)$ /$1/$2/ [R]
    </note>
    <from>^/~([^/]+)/?(.*)</from>
    <to type="redirect">/u/$1/$2</to>
</rule>
<rule>
    <from>^/([uge])/([^/]+)$</from>
    <to type="redirect">/$1/$2/</to>
</rule>


<rule>
    <name>Canonical Hostnames</name>
    <note>
        The goal of this rule is to force the use of a particular hostname, in preference to other hostnames which
        may be used to reach the same site. For example, if you wish to force the use of www.example.com instead of
        example.com, you might use a variant of the following recipe.

        RewriteCond %{HTTP_HOST} !^fully\.qualified\.domain\.name [NC]
        RewriteCond %{HTTP_HOST} !^$
        RewriteRule ^/(.*) http://fully.qualified.domain.name/$1 [L,R]
    </note>
    <condition name="host" operator="notequal">^fully\.qualified\.domain\.name</condition>
    <condition name="host" operator="notequal">^$</condition>
    <from>^/(.*)</from>
    <to type="redirect" last="true">http://fully.qualified.domain.name/$1</to>
</rule>


<rule>
    <name>Moved DocumentRoot</name>
    <note>
        Usually the DocumentRoot of the webserver directly relates to the URL "/". But often this data is not
        really of top-level priority, it is perhaps just one entity of a lot of data pools. For instance at our
        Intranet sites there are /e/www/ (the homepage for WWW), /e/sww/ (the homepage for the Intranet) etc. Now
        because the data of the DocumentRoot stays at /e/www/ we had to make sure that all inlined images and other
        stuff inside this data pool work for subsequent requests.

        We just redirect the URL / to /e/www/. While it seems trivial, it is actually trivial with mod_rewrite only,
        because the typical old mechanisms of URL Aliases (as provided by mod_alias and friends) only used prefix
        matching. With this you cannot do such a redirection because the DocumentRoot is a prefix of all URLs.
        With mod_rewrite it is really trivial:

        RewriteRule ^/$ /e/www/ [R]
    </note>
    <from>^/$</from>
    <to type="redirect">/e/www/</to>
</rule>


<rule>
    <name>Trailing Slash Problem</name>
    <note>
        Every webmaster can sing a song about the problem of the trailing slash on URLs referencing directories.
        If they are missing, the server dumps an error, because if you say /~quux/foo instead of /~quux/foo/ then
        the server searches for a file named foo. And because this file is a directory it complains. Actually it
        tries to fix it itself in most of the cases, but sometimes this mechanism needs to be emulated by you. For
        instance after you have done a lot of complicated URL rewritings to CGI scripts etc.

        The solution to this subtle problem is to let the server add the trailing slash automatically. To do this
        correctly we have to use an external redirect, so the browser correctly requests subsequent images etc. If
        we only did an internal rewrite, this would only work for the directory page, but would go wrong when any
        images are included into this page with relative URLs, because the browser would request an in-lined object.
        For instance, a request for image.gif in /~quux/foo/index.html would become /~quux/image.gif without the
        external redirect!
    </note>
    <from>^/~quux/foo$</from>
    <to type="redirect">/~quux/foo/</to>
</rule>


<rule>
    <name>Move Homedirs to Different Webserver</name>
    <note>
        Many webmasters have asked for a solution to the following situation: They wanted to redirect just all
        homedirs on a webserver to another webserver. They usually need such things when establishing a newer
        webserver which will replace the old one over time.

        The solution is trivial with mod_rewrite (and UrlRewriteFilter). On the old webserver we just redirect all
        /~user/anypath URLs to http://newserver/~user/anypath.

        RewriteRule ^/~(.+) http://newserver/~$1 [R,L]
    </note>
    <from>^/~(.+)</from>
    <to type="redirect" last="true">http://newserver/~$1</to>
</rule>


<rule>
    <name>Structured Homedirs</name>
    <note>
        Some sites with thousands of users usually use a structured homedir layout, i.e. each homedir is in a
        subdirectory which begins for instance with the first character of the username. So, /~foo/anypath is
        /home/f/foo/.www/anypath while /~bar/anypath is /home/b/bar/.www/anypath.

        We use the following ruleset to expand the tilde URLs into exactly the above layout.

        RewriteRule ^/~(([a-z])[a-z0-9]+)(.*) /home/$2/$1/.www$3
    </note>
    <from>^/~(([a-z])[a-z0-9]+)(.*)</from>
    <to>/home/$2/$1/.www$3</to>
</rule>


<rule>
    <name>Redirect Homedirs For Foreigners</name>
    <note>
        We want to redirect homedir URLs to another webserver www.somewhere.com when the requesting user does not
        stay in the local domain ourdomain.com. This is sometimes used in virtual host contexts.

        Just a rewrite condition:

        RewriteCond %{REMOTE_HOST} !^.+\.ourdomain\.com$
        RewriteRule ^(/~.+) http://www.somewhere.com/$1 [R,L]
    </note>
    <condition type="remote-host" operator="notequal">^.+\.ourdomain\.com$</condition>
    <from>^(/~.+)</from>
    <to type="redirect" last="true">http://www.somewhere.com/$1</to>
</rule>


<rule>
    <name>Time-Dependent Rewriting</name>
    <note>
        When tricks like time-dependent content should happen a lot of webmasters still use CGI scripts which do for
        instance redirects to specialized pages. How can it be done via mod_rewrite?

        There are a lot of condition types that, in conjunction with operators, let us do time-dependent redirects:

        RewriteCond %{TIME_HOUR}%{TIME_MIN} >0700
        RewriteCond %{TIME_HOUR}%{TIME_MIN} &lt;1900
        RewriteRule ^foo\.html$ foo.day.html
        RewriteRule ^foo\.html$ foo.night.html
    </note>
    <condition type="hourofday" operator="greater">7</condition>
    <condition type="hourofday" operator="less">19</condition>
    <from>^foo\.html$</from>
    <to>foo.day.html</to>
</rule>
<rule>
    <from>^foo\.html$</from>
    <to>foo.night.html</to>
</rule>


<rule>
    <name>From Old to New (intern)</name>
    <note>
        Assume we have recently renamed the page foo.html to bar.html and now want to provide the old URL for
        backward compatibility. In fact, we do not want users of the old URL to even notice that the page was
        renamed.

        We rewrite the old URL to the new one internally via the following rule:

        RewriteBase /~quux/
        RewriteRule ^foo\.html$ bar.html
    </note>
    <from>^/~quux/foo\.html$</from>
    <to>/~quux/bar.html</to>
</rule>


<rule>
    <name>From Old to New (extern)</name>
    <note>
        Assume again that we have recently renamed the page foo.html to bar.html and now want to provide the old URL
        for backward compatibility. But this time we want the users of the old URL to be pointed to the new one,
        i.e. their browser's Location field should change, too.

        We force an HTTP redirect to the new URL, which changes the browser's and thus the user's view:

        RewriteBase /~quux/
        RewriteRule ^foo\.html$ bar.html [R]
    </note>
    <from>^/~quux/foo\.html$</from>
    <to type="redirect">/~quux/bar.html</to>
</rule>


<rule>
    <name>Browser Dependent Content</name>
    <note>
        At least for important top-level pages it is sometimes necessary to provide the optimum of browser dependent
        content, i.e. one has to provide a maximum version for the latest Netscape variants, a minimum version for
        the Lynx browsers and an average feature version for all others.

        We cannot use content negotiation because the browsers do not provide their type in that form. Instead we
        have to act on the HTTP header "User-Agent". The following config does the following: If the HTTP header
        "User-Agent" begins with "Mozilla/3", the page foo.html is rewritten to foo.NS.html and the rewriting
        stops. If the browser is "Lynx" or "Mozilla" of version 1 or 2 the URL becomes foo.20.html. All other
        browsers receive page foo.32.html. This is done by the following ruleset:

        RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.*
        RewriteRule ^foo\.html$ foo.NS.html [L]

        RewriteCond %{HTTP_USER_AGENT} ^Lynx/.* [OR]
        RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[12].*
        RewriteRule ^foo\.html$ foo.20.html [L]

        RewriteRule ^foo\.html$ foo.32.html [L]
    </note>
    <condition name="user-agent">^Mozilla/3.*</condition>
    <from>^foo\.html$</from>
    <to last="true">foo.NS.html</to>
</rule>
<rule>
    <condition name="user-agent" next="or">^Lynx/.*</condition>
    <condition name="user-agent">^Mozilla/[12].*</condition>
    <from>^foo\.html$</from>
    <to last="true">foo.20.html</to>
</rule>
<rule>
    <from>^foo\.html$</from>
    <to last="true">foo.32.html</to>
</rule>


<rule>
    <name>From Static to Dynamic</name>
    <note>
        How can we transform a static page foo.html into a dynamic variant foo.cgi in a seamless way, i.e. without
        notice by the browser/user.

        We just rewrite the URL to the jsp/servlet and force the correct MIME-type so it gets really run as
        a CGI-script. This way a request to /~quux/foo.html internally leads to the invocation of /~quux/foo.jsp.

        RewriteBase /~quux/
        RewriteRule ^foo\.html$ foo.cgi [T=application/x-httpd-cgi]
    </note>
    <from>^/~quux/foo\.html$</from>
    <to>/~quux/foo.jsp</to>
</rule>

<rule>
    <name>Blocking of Robots</name>
    <note>
        How can we block a really annoying robot from retrieving pages of a specific webarea? A /robots.txt file
        containing entries of the "Robot Exclusion Protocol" is typically not enough to get rid of such a robot.

        We use a ruleset which forbids the URLs of the webarea /~quux/foo/arc/ (perhaps a very deep directory
        indexed area where the robot traversal would create big server load). We have to make sure that we forbid
        access only to the particular robot, i.e. just forbidding the host where the robot runs is not enough.
        This would block users from this host, too. We accomplish this by also matching the User-Agent HTTP header
        information.

        RewriteCond %{HTTP_USER_AGENT} ^NameOfBadRobot.*
        RewriteCond %{REMOTE_ADDR} ^123\.45\.67\.[8-9]$
        RewriteRule ^/~quux/foo/arc/.+ - [F]
    </note>
    <condition name="user-agent">^NameOfBadRobot.*</condition>
    <condition type="remote-addr">^123\.45\.67\.[8-9]$</condition>
    <from>^/~quux/foo/arc/.+</from>
    <set type="status">403</set>
    <to>null</to>
</rule>


<rule>
    <name>Blocked Inline-Images</name>
    <note>
        Assume we have under http://www.quux-corp.de/~quux/ some pages with inlined GIF graphics. These graphics are
        nice, so others directly incorporate them via hyperlinks to their pages. We don't like this practice because
        it adds useless traffic to our server.

        While we cannot 100% protect the images from inclusion, we can at least restrict the cases where the browser
        sends a HTTP Referer header.

        RewriteCond %{HTTP_REFERER} !^$
        RewriteCond %{HTTP_REFERER} !^http://www.quux-corp.de/~quux/.*$ [NC]
        RewriteRule .*\.gif$ - [F]
    </note>
    <condition name="referer" operator="notequal">^$</condition>
    <condition name="referer" operator="notequal">^http://www.quux-corp.de/~quux/.*$</condition>
    <from>.*\.gif$</from>
    <set type="status">403</set>
    <to>null</to>
</rule>

<rule>
    <name>Blocked Inline-Images example 2</name>
    <note>
        RewriteCond %{HTTP_REFERER} !^$
        RewriteCond %{HTTP_REFERER} !.*/foo-with-gif\.html$
        RewriteRule ^inlined-in-foo\.gif$ - [F]
    </note>
    <condition name="referer" operator="notequal">^$</condition>
    <condition name="referer" operator="notequal">.*/foo-with-gif\.html$</condition>
    <from>^inlined-in-foo\.gif$</from>
    <set type="status">403</set>
    <to>null</to>
</rule>
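
One translation in the list above deserves a closer look: the Time-Dependent Rewriting rules compare whole hours, while the mod_rewrite original compares HHMM strings. The sketch below spells out the window that the two hourofday conditions describe. It assumes hourofday is the 24-hour clock value (0 to 23) and that greater and less are strict comparisons; the class HourWindowSketch is a hypothetical name for illustration.

```java
// Hypothetical helper, not part of UrlRewriteFilter.
class HourWindowSketch {
    // Mirrors the two conditions: hourofday greater than 7 AND less than 19.
    static boolean isDay(int hourOfDay) {
        return hourOfDay > 7 && hourOfDay < 19;
    }

    // The first rule applies in the day window, otherwise the second rule applies.
    static String target(int hourOfDay) {
        return isDay(hourOfDay) ? "foo.day.html" : "foo.night.html";
    }

    public static void main(String[] args) {
        // prints foo.night.html
        System.out.println(target(7));
        // prints foo.day.html
        System.out.println(target(8));
    }
}
```

Under these assumptions the day page is served from 08:00 to 18:59, whereas the mod_rewrite conditions cover 07:01 to 18:59, so requests during the 07:xx hour would be handled differently.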

Extending UrlRewriteFilter

To extend the basic rule functionality, make a subclass of org.tuckey.web.filters.urlrewrite.extend.RewriteRule.

For an example of an extended rule see:
src/java/org/tuckey/web/filters/urlrewrite/sample/SampleRewriteRule.java

A "class-rule" is configured by adding it into urlrewrite.xml as follows:

<class-rule class="org.tuckey.web.filters.urlrewrite.TestRuleObj"/>

Or for a method with a name other than "matches":

<class-rule class="org.tuckey.web.filters.urlrewrite.TestRuleObj" method="trial"/>