DEV Community

BC


Day 35: Parse URL - 100DayOfRust

We can use the url crate to parse a URL:

Cargo.toml:

[dependencies]
url = "2.1.1"

Example:

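use url::Url;

fn main() {
    let url = "https://github.com:8443/rust/issues?labels=E-easy&state=open";
    // Url::parse returns Result<Url, ParseError>; unwrap() panics on invalid input.
    let parsed = Url::parse(url).unwrap();
    println!("scheme: {}", parsed.scheme());
    // host() is an Option because some URLs (e.g. "mailto:...") have no host.
    println!("host: {}", parsed.host().unwrap());
    // port() returns Option<u16>, so only print it when present.
    if let Some(port) = parsed.port() {
        println!("port: {}", port);
    }
    println!("path: {}", parsed.path());
    // query() is None when the URL has no "?" part.
    println!("query: {}", parsed.query().unwrap());
}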

Running the code prints:

scheme: https
host: github.com
port: 8443
path: /rust/issues
query: labels=E-easy&state=open

I found one thing that may or may not be a bug in the url library. If I change the URL to:

https://github.com:443/rust/issues?labels=E-easy&state=open

then port() returns None instead of 443. It seems the library ignores the port when it is the default port for the scheme (443 for https).

This differs from Python's behavior:

from urllib.parse import urlparse
parsed = urlparse("https://github.com:443/rust/issues?labels=E-easy&state=open")
print(parsed.port)
# this will print 443 out

In this case, if we still want to get 443, we can use port_or_known_default() instead of port():

    if let Some(port) = parsed.port_or_known_default() {
        println!("port: {}", port);
    }

The caveat is that this method returns 443 even when the URL contains no ":443" at all, because 443 is the known default port for https.

[Update] Apparently this port behavior is intentional, according to this comment in the library's source code:

/// Note that default port numbers are never reflected by the serialization,
/// use the port_or_known_default() method if you want a default port number returned.

But I would still prefer that when the URL contains a port number explicitly, port() always returns it, even if it is the standard port. Returning None is just weird.
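If you need the port exactly as written, one workaround is to inspect the raw input string yourself. This is a rough sketch, not part of the url crate. port_as_written is a hypothetical helper that assumes an absolute scheme://authority URL and does no validation, so it is meant to be used alongside Url::parse, not instead of it. It uses only the standard library.

```rust
/// Hypothetical helper: return the port literally written in `input`, if any.
/// Assumes "scheme://authority..." and performs no validation.
fn port_as_written(input: &str) -> Option<u16> {
    // Skip past "scheme://".
    let rest = &input[input.find("://")? + 3..];
    // The authority ends at the first '/', '?' or '#'.
    let end = rest
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .unwrap_or(rest.len());
    let authority = &rest[..end];
    // Drop any "user:password@" prefix.
    let host_port = match authority.rfind('@') {
        Some(i) => &authority[i + 1..],
        None => authority,
    };
    // For bracketed IPv6 hosts like "[::1]:443", only look after the ']'.
    let after_host = match host_port.rfind(']') {
        Some(i) => &host_port[i + 1..],
        None => host_port,
    };
    let (_, port) = after_host.rsplit_once(':')?;
    port.parse().ok()
}

fn main() {
    for input in [
        "https://github.com:443/rust/issues?labels=E-easy&state=open",
        "https://github.com/rust/issues",
        "https://user:pw@[::1]:8443/",
    ] {
        println!("{} -> {:?}", input, port_as_written(input));
    }
}
```

In practice you would first validate the input with Url::parse and then call port_as_written on the same string; the trade-off is maintaining your own (simplified) authority parsing.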
