When is the absoluteURI form from the HTTP request specs used?
I've been looking into the HttpServletRequest API (Java), which has the getRequestURI and getRequestURL methods. That led me to https://tools.ietf.org/html/rfc7230#section-5.3
As I understand it, getRequestURI returns the value from the first line of the HTTP request, which is a relative path to the resource most of the time, unless the origin server is behind an inbound proxy, in which case it must be an absolute URI. I guess the origin servers of most popular websites on the internet belong to that category, which means that the URI in the raw HTTP request should be the absoluteURI (from the HTTP specs), but I haven't managed to find an example of this anywhere.
Can a browser really know whether it is sending its requests to an inbound proxy or directly to the origin server? Is there any practical value in the absoluteURI concept in the HTTP specs, given that the Host header field is always sent in HTTP/1.1 requests? Did that part of the specification have some practical value in the days of HTTP/1.0, when there was no Host header field yet?
I think you might be confused about the type of proxy being discussed. It looks like the RFC is referring to a forward proxy, where you make a request to a server via another one (and the client tells the proxy where to forward the traffic).
With a reverse proxy, you're right, the client doesn't know that a request has been proxied to another server.
Difference between proxy server and reverse proxy server
From the HTTP/1.0 specs (RFC 1945):
The absoluteURI form is only allowed when the request is being made to a proxy. The proxy is requested to forward the request and return the response. If the request is GET or HEAD and a prior response is cached, the proxy may use the cached message if it passes any restrictions in the Expires header field. Note that the proxy may forward the request on to another proxy or directly to the server specified by the absoluteURI. In order to avoid request loops, a proxy must be able to recognize all of its server names, including any aliases, local variations, and the numeric IP address. An example Request-Line would be:
GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.0
The most common form of Request-URI is that used to identify a resource on an origin server or gateway. In this case, only the absolute path of the URI is transmitted (see Section 3.2.1, abs_path). For example, a client wishing to retrieve the resource above directly from the origin server would create a TCP connection to port 80 of the host "www.w3.org" and send the line: GET /pub/WWW/TheProject.html HTTP/1.0 followed by the remainder of the Full-Request. Note that the absolute path cannot be empty; if none is present in the original URI, it must be given as "/" (the server root).
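To make the two forms from the quoted spec concrete, here is a small sketch that looks at a request line and says which form its target uses. The class name RequestTargetForm and the method classify are made up for illustration; they are not part of the servlet API or any HTTP library, and the check is deliberately simplistic.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not part of any servlet or HTTP library.
class RequestTargetForm {

    /** Returns "absolute-form", "origin-form", or "other" for the target of a request line. */
    static String classify(String requestLine) {
        // A request line has the shape: METHOD SP request-target SP HTTP-version
        String[] parts = requestLine.trim().split(" ");
        if (parts.length != 3) {
            return "other";
        }
        String target = parts[1];
        if (target.startsWith("/")) {
            // Path (plus optional query) only: what an origin server normally receives.
            return "origin-form";
        }
        try {
            URI uri = new URI(target);
            // The absolute form carries a scheme and a host, e.g. http://host/path.
            if (uri.getScheme() != null && uri.getHost() != null) {
                return "absolute-form";
            }
        } catch (URISyntaxException e) {
            // Not parseable as a URI: treated as "other" below.
        }
        // For example "*" (OPTIONS) or the host:port target used by CONNECT.
        return "other";
    }

    public static void main(String[] args) {
        System.out.println(classify("GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.0"));
        System.out.println(classify("GET /pub/WWW/TheProject.html HTTP/1.0"));
    }
}
```

The first call is the request line a client would send to a forward proxy, the second is the one it would send straight to www.w3.org.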
So yes, there is a practical point to all this, but only if you know that you are actually sending the request to a proxy. The browser cannot really know that it is submitting information to a proxy, but since that is the most common case, what you always see transmitted is the Host header and the URI path rather than the full absolute URL. Modern (and not so modern) proxies reconstruct the URL from the host, protocol, and port plus the URI.
Take the example below:
GET /standards/ HTTP/1.1
Host: www.w3.org
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://www.w3.org/
Connection: keep-alive
Upgrade-Insecure-Requests: 1
A proxy will reconstruct the URL the client used to make the request. The reconstructed URL contains a protocol, server name, port number, and server path.
In Java something similar is done. If you take a look at the Servlet API specs you will see the same behavior.
So as a rule of thumb, the absoluteURI form is only allowed when the request is being made to a proxy. The request is not necessarily from a browser, but if a proxy does not receive an absolute URI, it constructs the URL from the rest of the data in the request headers, similar to Java's getRequestURL.
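The reconstruction described here can be sketched roughly as follows. This is only an illustration, not how any particular proxy or servlet container does it: EffectiveUrl and reconstruct are hypothetical names, and the scheme is passed in by the caller because it is not part of the request text itself (it depends on whether the connection was plain or TLS).

```java
import java.net.URI;

// Hypothetical helper for illustration only; real proxies and containers do more validation.
class EffectiveUrl {

    /**
     * Rebuilds the full URL from the request target and the Host header.
     * The scheme is an assumption supplied by the caller (e.g. "http" or "https").
     */
    static String reconstruct(String scheme, String hostHeader, String requestTarget) {
        URI target = URI.create(requestTarget);
        if (target.isAbsolute()) {
            // Absolute form already names the full URL, so the Host header is not needed.
            return requestTarget;
        }
        // Origin form: the Host header supplies the server name (and port, if non-default).
        return scheme + "://" + hostHeader + requestTarget;
    }

    public static void main(String[] args) {
        // Values taken from the example request above.
        System.out.println(reconstruct("https", "www.w3.org", "/standards/"));
    }
}
```

With the example request above, the Host value www.w3.org and the target /standards/ are combined into one URL. When the target is already absolute it is returned unchanged.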
Okay Daniel Scott has identified the source of my initial confusion. I will make a note of some points that weren't so clear to me and prevented me from understanding the specs correctly:
Also I want to say that I did an experiment which confirmed what is stated in the http specs.
I googled "free proxy ip and port", went to "https://www.hide-my-ip.com/proxylist.shtml" and configured Windows to use a forward proxy (Control Panel -> Internet Options -> Connections -> Lan Settings -> "Use a proxy server..."). Then I made a request to www.bbc.com and examined the raw HTTP request in the Chrome console network tab: the address in the Request-Line was absolute. Then I removed the proxy and made the same request again. The address in the Request-Line was now just the path.
I'm not sure about the whole reconstruction of the URL by a proxy that Alexius Diakogiannos mentions. It seems very logical that most forward proxies have this as an option for when the client does not send the absolute URL, but from what I can see, Chrome at least sends the absolute URL to the proxy correctly when it knows it is behind one. Of course, I have never managed or run a forward proxy myself, so I wouldn't know.
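For what it's worth, what the experiment shows can be written down as a few lines of Java. This is just a sketch of the rule for plain http requests, under the assumption that the client knows whether a forward proxy is configured. RequestLineBuilder and requestLine are names I made up, and as far as I know https requests through a proxy are normally tunnelled with CONNECT instead, which this does not cover.

```java
import java.net.URI;

// Hypothetical helper for illustration only; covers plain http requests, not CONNECT tunnels.
class RequestLineBuilder {

    /** Builds the request line a client would send for the given URL. */
    static String requestLine(String method, URI url, boolean viaForwardProxy) {
        String path = url.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/"; // the path cannot be empty; "/" stands for the server root
        }
        String query = url.getRawQuery();
        String originForm = (query == null) ? path : path + "?" + query;

        // To a forward proxy: the absolute URL. Directly to the server: just the path.
        String target = viaForwardProxy
                ? url.getScheme() + "://" + url.getRawAuthority() + originForm
                : originForm;
        return method + " " + target + " HTTP/1.1";
    }

    public static void main(String[] args) {
        URI url = URI.create("http://www.bbc.com/");
        System.out.println(requestLine("GET", url, true));   // proxy configured
        System.out.println(requestLine("GET", url, false));  // no proxy
    }
}
```

In both cases the client would still send the Host header; only the target in the request line changes.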