public class URI
extends java.lang.Object
URI character sequence: char
octet sequence: byte
original character sequence: String
A URI is therefore a sequence of characters, held as an array of char; it is not always represented as a sequence of octets, held as an array of byte.
URI Syntactic Components
- In general, written as follows:
  Absolute URI = <scheme>:<scheme-specific-part>
  Generic URI = <scheme>://<authority><path>?<query>
- Syntax
  absoluteURI = scheme ":" ( hier_part | opaque_part )
  hier_part = ( net_path | abs_path ) [ "?" query ]
  net_path = "//" authority [ abs_path ]
  abs_path = "/" path_segments
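The split between hierarchical and opaque URIs in the syntax above can be observed with the JDK's own `java.net.URI`, which follows the same RFC 2396 grammar. This is only a sketch: `java.net.URI` is a stand-in for the class documented here, and the two example URIs and the `describe` helper are hypothetical.

```java
import java.net.URI;

class UriComponentsSketch {
    /** Describes a URI as its scheme plus either hierarchical components or an opaque part. */
    static String describe(String text) {
        // java.net.URI is used as a stand-in; it is not the class documented on this page.
        URI uri = URI.create(text);
        if (uri.isOpaque()) {
            // absoluteURI = scheme ":" opaque_part
            return uri.getScheme() + " | opaque: " + uri.getSchemeSpecificPart();
        }
        // absoluteURI = scheme ":" hier_part, where hier_part = net_path [ "?" query ]
        return uri.getScheme()
                + " | authority: " + uri.getAuthority()
                + " | path: " + uri.getPath()
                + " | query: " + uri.getQuery();
    }

    public static void main(String[] args) {
        // prints "http | authority: example.org | path: /docs/index.html | query: lang=en"
        System.out.println(describe("http://example.org/docs/index.html?lang=en"));
        // prints "mailto | opaque: user@example.org"
        System.out.println(describe("mailto:user@example.org"));
    }
}
```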
The following examples illustrate URIs that are in common use.
ftp://ftp.is.co.za/rfc/rfc1808.txt -- ftp scheme for File Transfer Protocol services
gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles -- gopher scheme for Gopher and Gopher+ Protocol services
http://www.math.uio.no/faq/compression-faq/part1.html -- http scheme for Hypertext Transfer Protocol services
mailto:mduerst@ifi.unizh.ch -- mailto scheme for electronic mail addresses
news:comp.infosystems.www.servers.unix -- news scheme for USENET news groups and articles
telnet://melvyl.ucop.edu/ -- telnet scheme for interactive services via the TELNET Protocol
Please note that there are many modifications from URL (RFC 1738) and relative URL (RFC 1808).
The expressions for a URI
For escaped URI forms
- URI(char[]) // constructor
- char[] getRawXxx() // method
- String getEscapedXxx() // method
- String toString() // method
For unescaped URI forms
- URI(String) // constructor
- String getXXX() // method
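The difference between the escaped and unescaped forms can be illustrated with the gopher example above. The sketch below uses `java.net.URI` from the JDK as a stand-in, so `getRawPath()` and `getPath()` are that class's methods and only analogous to the `getRawXxx()` and `getXXX()` accessors listed here; the class and helper names are hypothetical.

```java
import java.net.URI;

class EscapedFormsSketch {
    /** Returns the path exactly as written in the URI, escapes included. */
    static String escapedPath(String text) {
        return URI.create(text).getRawPath();
    }

    /** Returns the path with "%xx" escapes decoded back to characters. */
    static String unescapedPath(String text) {
        return URI.create(text).getPath();
    }

    public static void main(String[] args) {
        String gopher = "gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles";
        // prints "/00/Weather/California/Los%20Angeles"
        System.out.println(escapedPath(gopher));
        // prints "/00/Weather/California/Los Angeles"
        System.out.println(unescapedPath(gopher));
    }
}
```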
Modifier and Type | Field and Description |
---|---|
protected static java.util.BitSet | abs_path: URI absolute path. |
protected static java.util.BitSet | absoluteURI: BitSet for absoluteURI. |
static java.util.BitSet | allowed_abs_path: Those characters that are allowed for the abs_path. |
static java.util.BitSet | allowed_authority: Those characters that are allowed for the authority component. |
static java.util.BitSet | allowed_fragment: Those characters that are allowed for the fragment component. |
static java.util.BitSet | allowed_host: Those characters that are allowed for the host component. |
static java.util.BitSet | allowed_IPv6reference: Those characters that are allowed for the IPv6reference component. |
static java.util.BitSet | allowed_opaque_part: Those characters that are allowed for the opaque_part. |
static java.util.BitSet | allowed_query: Those characters that are allowed for the query component. |
static java.util.BitSet | allowed_reg_name: Those characters that are allowed for the reg_name. |
static java.util.BitSet | allowed_rel_path: Those characters that are allowed for the rel_path. |
static java.util.BitSet | allowed_userinfo: Those characters that are allowed for the userinfo component. |
static java.util.BitSet | allowed_within_authority: Those characters that are allowed within the authority component. |
static java.util.BitSet | allowed_within_path: Those characters that are allowed within the path. |
static java.util.BitSet | allowed_within_query: Those characters that are allowed within the query component. |
static java.util.BitSet | allowed_within_userinfo: Those characters that are allowed within the userinfo component. |
protected static java.util.BitSet | alpha: BitSet for alpha. |
protected static java.util.BitSet | alphanum: BitSet for alphanum (union of alpha and digit). |
protected static java.util.BitSet | authority: BitSet for authority. |
static java.util.BitSet | control: BitSet for control. |
static java.util.BitSet | delims: BitSet for delims. |
protected static java.util.BitSet | digit: BitSet for digit. |
static java.util.BitSet | disallowed_opaque_part: Disallowed opaque_part before escaping. |
static java.util.BitSet | disallowed_rel_path: Disallowed rel_path before escaping. |
protected static java.util.BitSet | escaped: BitSet for escaped. |
protected static java.util.BitSet | fragment: BitSet for fragment (alias for uric). |
protected static java.util.BitSet | hex: BitSet for hex. |
protected static java.util.BitSet | hier_part: BitSet for hier_part. |
protected static java.util.BitSet | host: BitSet for host. |
protected static java.util.BitSet | hostname: BitSet for hostname. |
protected static java.util.BitSet | hostport: BitSet for hostport. |
protected static java.util.BitSet | IPv4address: BitSet that combines digit and dot for IPv4address. |
protected static java.util.BitSet | IPv6address: RFC 2373. |
protected static java.util.BitSet | IPv6reference: RFC 2732, 2373. |
protected static java.util.BitSet | mark: BitSet for mark. |
protected static java.util.BitSet | net_path: BitSet for net_path. |
protected static java.util.BitSet | opaque_part: URI BitSet that combines uric_no_slash and uric. |
protected static java.util.BitSet | param: BitSet for param (alias for pchar). |
protected static java.util.BitSet | path: URI BitSet that combines absolute path and opaque part. |
protected static java.util.BitSet | path_segments: BitSet for path segments. |
protected static java.util.BitSet | pchar: BitSet for pchar. |
protected static java.util.BitSet | percent: The percent "%" character always has the reserved purpose of being the escape indicator; it must be escaped as "%25" in order to be used as data within a URI. |
protected static java.util.BitSet | port: Port, a logical alias for digit. |
protected static java.util.BitSet | query: BitSet for query (alias for uric). |
protected static java.util.BitSet | reg_name: BitSet for reg_name. |
protected static java.util.BitSet | rel_path: BitSet for rel_path. |
protected static java.util.BitSet | rel_segment: BitSet for rel_segment. |
protected static java.util.BitSet | relativeURI: BitSet for relativeURI. |
protected static java.util.BitSet | reserved: BitSet for reserved. |
protected static java.util.BitSet | scheme: BitSet for scheme. |
protected static java.util.BitSet | segment: BitSet for segment. |
protected static java.util.BitSet | server: BitSet for server. |
static java.util.BitSet | space: BitSet for space. |
protected static java.util.BitSet | toplabel: BitSet for toplabel. |
protected static java.util.BitSet | unreserved: Data characters that are allowed in a URI but do not have a reserved purpose are called unreserved. |
static java.util.BitSet | unwise: BitSet for unwise. |
protected static java.util.BitSet | URI_reference: BitSet for URI-reference. |
protected static java.util.BitSet | uric: BitSet for uric. |
protected static java.util.BitSet | uric_no_slash: URI BitSet for encoding typical non-slash characters. |
protected static java.util.BitSet | userinfo: BitSet for userinfo. |
static java.util.BitSet | within_userinfo: BitSet for characters within the userinfo component, such as user and password. |
Constructor and Description |
---|
URI() |
Modifier and Type | Method and Description |
---|---|
protected static java.lang.String | decode(char[] component, java.lang.String charset): Decodes a URI-encoded string. |
protected static java.lang.String | decode(java.lang.String component, java.lang.String charset): Decodes a URI-encoded string. |
protected static char[] | encode(java.lang.String original, java.util.BitSet allowed, java.lang.String charset): Encodes a URI string. |
public static final java.util.BitSet within_userinfo
public static final java.util.BitSet control
public static final java.util.BitSet space
public static final java.util.BitSet delims
public static final java.util.BitSet unwise
public static final java.util.BitSet disallowed_rel_path
public static final java.util.BitSet disallowed_opaque_part
public static final java.util.BitSet allowed_authority
public static final java.util.BitSet allowed_opaque_part
public static final java.util.BitSet allowed_reg_name
public static final java.util.BitSet allowed_userinfo
public static final java.util.BitSet allowed_within_userinfo
public static final java.util.BitSet allowed_IPv6reference
public static final java.util.BitSet allowed_host
public static final java.util.BitSet allowed_within_authority
public static final java.util.BitSet allowed_abs_path
public static final java.util.BitSet allowed_rel_path
public static final java.util.BitSet allowed_within_path
public static final java.util.BitSet allowed_query
public static final java.util.BitSet allowed_within_query
public static final java.util.BitSet allowed_fragment
protected static final java.util.BitSet percent
protected static final java.util.BitSet digit
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
protected static final java.util.BitSet alpha
alpha = lowalpha | upalpha
protected static final java.util.BitSet alphanum
alphanum = alpha | digit
protected static final java.util.BitSet hex
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f"
protected static final java.util.BitSet escaped
escaped = "%" hex hex
protected static final java.util.BitSet mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
protected static final java.util.BitSet unreserved
unreserved = alphanum | mark
protected static final java.util.BitSet reserved
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
protected static final java.util.BitSet uric
uric = reserved | unreserved | escaped
protected static final java.util.BitSet fragment
fragment = *uric
protected static final java.util.BitSet query
query = *uric
protected static final java.util.BitSet pchar
pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
protected static final java.util.BitSet param
param = *pchar
protected static final java.util.BitSet segment
segment = *pchar *( ";" param )
protected static final java.util.BitSet path_segments
path_segments = segment *( "/" segment )
protected static final java.util.BitSet abs_path
abs_path = "/" path_segments
protected static final java.util.BitSet uric_no_slash
uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
protected static final java.util.BitSet opaque_part
opaque_part = uric_no_slash *uric
protected static final java.util.BitSet path
path = [ abs_path | opaque_part ]
protected static final java.util.BitSet port
protected static final java.util.BitSet IPv4address
IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
protected static final java.util.BitSet IPv6address
IPv6address = hexpart [ ":" IPv4address ]
protected static final java.util.BitSet IPv6reference
IPv6reference = "[" IPv6address "]"
protected static final java.util.BitSet toplabel
toplabel = alpha | alpha *( alphanum | "-" ) alphanum
protected static final java.util.BitSet hostname
hostname = *( domainlabel "." ) toplabel [ "." ]
protected static final java.util.BitSet host
host = hostname | IPv4address | IPv6reference
protected static final java.util.BitSet hostport
hostport = host [ ":" port ]
protected static final java.util.BitSet userinfo
userinfo = *( unreserved | escaped | ";" | ":" | "&" | "=" | "+" | "$" | "," )
protected static final java.util.BitSet server
server = [ [ userinfo "@" ] hostport ]
protected static final java.util.BitSet reg_name
reg_name = 1*( unreserved | escaped | "$" | "," | ";" | ":" | "@" | "&" | "=" | "+" )
protected static final java.util.BitSet authority
authority = server | reg_name
protected static final java.util.BitSet scheme
scheme = alpha *( alpha | digit | "+" | "-" | "." )
protected static final java.util.BitSet rel_segment
rel_segment = 1*( unreserved | escaped | ";" | "@" | "&" | "=" | "+" | "$" | "," )
protected static final java.util.BitSet rel_path
rel_path = rel_segment [ abs_path ]
protected static final java.util.BitSet net_path
net_path = "//" authority [ abs_path ]
protected static final java.util.BitSet hier_part
hier_part = ( net_path | abs_path ) [ "?" query ]
protected static final java.util.BitSet relativeURI
relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
protected static final java.util.BitSet absoluteURI
absoluteURI = scheme ":" ( hier_part | opaque_part )
protected static final java.util.BitSet URI_reference
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
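The first few grammar rules above translate naturally into character sets. The following is a minimal sketch of how `digit`, `alpha`, `alphanum`, `mark`, `unreserved` and `reserved` could be built with `java.util.BitSet`; it is not taken from this class's source, and the class name, the helper methods and the upper-case constant names are all hypothetical. Note that a BitSet can only record which characters may appear, not their order, so sequence rules such as `escaped = "%" hex hex` cannot be fully captured this way.

```java
import java.util.BitSet;

class CharClassSketch {
    /** Returns a BitSet containing every character from first to last, inclusive. */
    static BitSet range(char first, char last) {
        BitSet set = new BitSet(256);
        set.set(first, last + 1); // the upper bound of BitSet.set(from, to) is exclusive
        return set;
    }

    /** Returns a BitSet containing exactly the characters of the given string. */
    static BitSet of(String chars) {
        BitSet set = new BitSet(256);
        for (char c : chars.toCharArray()) {
            set.set(c);
        }
        return set;
    }

    /** Returns the union of the given sets, mirroring the "|" of the grammar. */
    static BitSet union(BitSet... parts) {
        BitSet set = new BitSet(256);
        for (BitSet part : parts) {
            set.or(part);
        }
        return set;
    }

    // Hypothetical constants following the grammar rules listed above.
    static final BitSet DIGIT = range('0', '9');
    static final BitSet ALPHA = union(range('a', 'z'), range('A', 'Z'));
    static final BitSet ALPHANUM = union(ALPHA, DIGIT);
    static final BitSet MARK = of("-_.!~*'()");
    static final BitSet UNRESERVED = union(ALPHANUM, MARK);
    static final BitSet RESERVED = of(";/?:@&=+$,");

    /** Returns true if every character of text is a member of allowed. */
    static boolean matches(String text, BitSet allowed) {
        for (char c : text.toCharArray()) {
            if (!allowed.get(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("rfc1808.txt", UNRESERVED)); // prints "true"
        System.out.println(matches("Los Angeles", UNRESERVED)); // prints "false" (space is not unreserved)
    }
}
```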
protected static char[] encode(java.lang.String original, java.util.BitSet allowed, java.lang.String charset) throws org.apache.http.HttpException
original character sequence -> octet sequence -> URI character sequence
An escaped octet is encoded as a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits representing the octet code. For example, "%20" is the escaped encoding for the US-ASCII space character.
Conversion from the local filesystem character set to UTF-8 will normally involve a two-step process. First convert the local character set to the UCS; then convert the UCS to UTF-8. The first step can be performed by maintaining a mapping table that includes the local character set code and the corresponding UCS code. The next step is to convert the UCS character code to the UTF-8 encoding. Mapping between vendor codepages can be done in a very similar manner.
The only time escape encodings can safely be made is when a URI is being created from its component parts. This method performs the escape and validate steps internally.
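The pipeline described above (original characters, then octets, then URI characters) can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the body of `encode`: it takes a `java.nio.charset.Charset` rather than a charset name, it does not throw `HttpException`, and the class name and the `unreserved()` helper are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

class EncodeSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Converts original to octets in charset, then escapes every octet not in allowed. */
    static char[] encode(String original, BitSet allowed, Charset charset) {
        byte[] octets = original.getBytes(charset); // original character sequence -> octet sequence
        StringBuilder out = new StringBuilder(octets.length);
        for (byte octet : octets) {
            int value = octet & 0xFF; // treat the octet as unsigned
            if (allowed.get(value)) {
                out.append((char) value); // allowed octets pass through unchanged
            } else {
                // escaped = "%" hex hex
                out.append('%').append(HEX[value >> 4]).append(HEX[value & 0x0F]);
            }
        }
        return out.toString().toCharArray(); // octet sequence -> URI character sequence
    }

    /** Hypothetical helper: the unreserved set (alphanum | mark) from the grammar. */
    static BitSet unreserved() {
        BitSet set = new BitSet(256);
        set.set('a', 'z' + 1);
        set.set('A', 'Z' + 1);
        set.set('0', '9' + 1);
        for (char c : "-_.!~*'()".toCharArray()) {
            set.set(c);
        }
        return set;
    }

    public static void main(String[] args) {
        // prints "Los%20Angeles"
        System.out.println(new String(encode("Los Angeles", unreserved(), StandardCharsets.UTF_8)));
        // prints "100%25", because "%" itself is never in the allowed set used here
        System.out.println(new String(encode("100%", unreserved(), StandardCharsets.UTF_8)));
    }
}
```

Because "%" is not a member of the allowed set in this sketch, it is always escaped as "%25", which matches the rule stated for the percent field.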
Parameters:
original - the original character sequence
allowed - those characters that are allowed within a component
charset - the protocol charset
Throws:
org.apache.http.HttpException - null component or unsupported character encoding
protected static java.lang.String decode(char[] component, java.lang.String charset) throws org.apache.http.HttpException
URI character sequence -> octet sequence -> original character sequence
A URI must be separated into its components before the escaped characters within those components can be safely decoded.
Note that URI characters that are not UTF-8 may nonetheless be parsed as valid UTF-8. A recent non-scientific analysis found that EUC-encoded Japanese words had a 2.7% false reading; SJIS had a 0.0005% false reading; other encodings such as ASCII or KOI-8 have a 0% false reading.
The percent "%" character always has the reserved purpose of being the escape indicator; it must be escaped as "%25" in order to be used as data within a URI.
This method performs the unescape step internally.
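The reverse pipeline (URI characters, then octets, then original characters) can be sketched in the same spirit. Again this is an assumption-laden illustration rather than the method's actual body: it takes a `java.nio.charset.Charset`, signals errors with `IllegalArgumentException` instead of `HttpException`, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeSketch {
    /** Turns "%xx" triplets back into octets, then decodes the octets using charset. */
    static String decode(char[] component, Charset charset) {
        ByteArrayOutputStream octets = new ByteArrayOutputStream(component.length);
        for (int i = 0; i < component.length; i++) {
            char c = component[i];
            if (c == '%') {
                if (i + 2 >= component.length) {
                    throw new IllegalArgumentException("incomplete trailing escape pattern");
                }
                int high = Character.digit(component[i + 1], 16);
                int low = Character.digit(component[i + 2], 16);
                if (high < 0 || low < 0) {
                    throw new IllegalArgumentException("invalid escape pattern");
                }
                octets.write((high << 4) | low); // URI character sequence -> octet sequence
                i += 2; // skip the two hex digits just consumed
            } else if (c > 0x7F) {
                // Assumption of this sketch: unescaped URI characters are US-ASCII.
                throw new IllegalArgumentException("non-ASCII character in URI component");
            } else {
                octets.write(c);
            }
        }
        return new String(octets.toByteArray(), charset); // octet sequence -> original characters
    }

    public static void main(String[] args) {
        // prints "Los Angeles"
        System.out.println(decode("Los%20Angeles".toCharArray(), StandardCharsets.UTF_8));
        // prints "100%"
        System.out.println(decode("100%25".toCharArray(), StandardCharsets.UTF_8));
    }
}
```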
Parameters:
component - the URI character sequence
charset - the protocol charset
Throws:
org.apache.http.HttpException - incomplete trailing escape pattern or unsupported character encoding
protected static java.lang.String decode(java.lang.String component, java.lang.String charset) throws org.apache.http.HttpException
URI character sequence -> octet sequence -> original character sequence
A URI must be separated into its components before the escaped characters within those components can be safely decoded.
Note that URI characters that are not UTF-8 may nonetheless be parsed as valid UTF-8. A recent non-scientific analysis found that EUC-encoded Japanese words had a 2.7% false reading; SJIS had a 0.0005% false reading; other encodings such as ASCII or KOI-8 have a 0% false reading.
The percent "%" character always has the reserved purpose of being the escape indicator; it must be escaped as "%25" in order to be used as data within a URI.
This method performs the unescape step internally.
Parameters:
component - the URI character sequence
charset - the protocol charset
Throws:
org.apache.http.HttpException - incomplete trailing escape pattern or unsupported character encoding
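The "false reading" risk mentioned for both decode methods comes from octet sequences that happen to be well formed in more than one charset. A caller who wants to detect at least the malformed cases could decode strictly, as in the sketch below. It uses only the JDK's `CharsetDecoder`; the class and method names are hypothetical, and nothing here is claimed to be part of this class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictCharsetSketch {
    /** Returns true if octets form a well-formed sequence in charset. */
    static boolean isValidIn(byte[] octets, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting U+FFFD
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(octets));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE9};            // "%E9": e-acute in ISO-8859-1
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9}; // "%C3%A9": e-acute in UTF-8

        System.out.println(isValidIn(latin1, StandardCharsets.UTF_8));    // prints "false"
        System.out.println(isValidIn(utf8, StandardCharsets.UTF_8));      // prints "true"
        // The same two octets are also well formed in ISO-8859-1, where they read as
        // two different characters: this is the kind of ambiguity a false reading comes from.
        System.out.println(isValidIn(utf8, StandardCharsets.ISO_8859_1)); // prints "true"
    }
}
```

Strict decoding catches malformed input only; it cannot tell which of two valid readings the sender intended, so the protocol charset still has to be known.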