Understanding URL Structure: A Developer's Guide

What Is a URL?

A URL (Uniform Resource Locator) is the address of a resource on the web. Every URL follows a structured format defined by RFC 3986. Understanding each component helps you debug network issues, build correct links, handle redirects, and write better API integrations.

The Anatomy of a URL

A full URL looks like this: https://user:pass@api.example.com:8080/v1/users?page=2&sort=name#results

This breaks down into the following components.

Protocol (scheme): The part before the first colon; in web URLs the colon is followed by a double slash. Common values are https, http, ftp, ws, and wss. Browsers and servers use the scheme to decide how to handle the connection.

Username and password: Rarely used in modern URLs. The format user:pass@ places credentials before the hostname. Credentials embedded this way can end up in logs and browser history, and RFC 3986 deprecates the user:password form, so avoid it in practice.

Hostname: The domain name or IP address of the server. When it is a domain name, DNS resolves it to an IP address to find the server. Subdomains (api.example.com) are part of the hostname.

Port: An optional number after the hostname separated by a colon. If omitted, browsers default to 443 for HTTPS and 80 for HTTP. Common API ports include 8080, 3000, and 4000 in development.

Pathname: Everything after the hostname and port, up to the question mark (or up to the hash, if there is no query string). It identifies the specific resource on the server. In REST APIs this often represents a resource path like /v1/users/42.

Query string (search): Everything from the question mark up to the hash, or to the end of the URL if there is no hash. Key-value pairs separated by ampersands encode additional parameters. Spaces are encoded as %20 (or as + in form-encoded query strings), and other special characters are percent-encoded.

Hash (fragment): Everything after the hash symbol. This is processed entirely by the browser and is never sent to the server. It typically scrolls the page to an element with a matching id, or in single-page applications it drives client-side routing.

Origin: The combination of protocol, hostname, and port. Two URLs share an origin only if all three match. The browser enforces the same-origin policy on this basis, and requests to a different origin are subject to CORS checks.
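
The components above map directly onto properties of the standard URL class, available in browsers and Node.js. The sketch below parses the example URL from this guide; the variable names url and parts are only illustrative.

```javascript
// Parse the example URL from this guide with the built-in URL class.
const url = new URL(
  "https://user:pass@api.example.com:8080/v1/users?page=2&sort=name#results"
);

const parts = {
  protocol: url.protocol, // "https:" (includes the trailing colon)
  username: url.username, // "user"
  password: url.password, // "pass"
  hostname: url.hostname, // "api.example.com"
  port: url.port,         // "8080" (always a string)
  pathname: url.pathname, // "/v1/users"
  search: url.search,     // "?page=2&sort=name" (includes the leading "?")
  hash: url.hash,         // "#results" (includes the leading "#")
  origin: url.origin,     // "https://api.example.com:8080"
};

// When the port is the scheme's default, the parser drops it,
// so port comes back as an empty string.
const defaultPort = new URL("https://example.com:443/").port; // ""

console.log(parts, defaultPort);
```

Note that protocol, search, and hash keep their delimiters (":", "?", "#"), which is easy to forget when comparing them against plain strings.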

Query Parameters in Detail

The query string encodes parameters as key=value pairs joined by ampersands. A URL can have multiple parameters: ?name=Alice&age=30&city=London.
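
Rather than splitting on & and = by hand, the built-in URLSearchParams class can read these pairs. A minimal sketch using the example query string above; the variable names are illustrative.

```javascript
// Read key=value pairs from a query string.
// A leading "?" is accepted and ignored by the constructor.
const params = new URLSearchParams("?name=Alice&age=30&city=London");

const userName = params.get("name");        // "Alice"
const age = Number(params.get("age"));      // values are always strings, so convert explicitly
const country = params.get("country");      // null when the key is absent
const asObject = Object.fromEntries(params); // { name: "Alice", age: "30", city: "London" }

console.log(userName, age, country, asObject);
```

Object.fromEntries keeps only the last value for a repeated key; params.getAll(key) returns every value when a key can appear more than once.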

Values must be percent-encoded when they contain characters outside the unreserved set (letters, digits, and a few punctuation marks such as - _ . ~). A space becomes %20, a forward slash becomes %2F, and an at sign becomes %40. JavaScript's encodeURIComponent applies this encoding to a single value.
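
The encodings named above can be checked directly. This sketch also shows the two space conventions side by side: encodeURIComponent produces %20, while URLSearchParams serializes spaces as +. The sample values are made up for illustration.

```javascript
// Percent-encoding individual values.
const encodedSpace = encodeURIComponent("a b");               // "a%20b"
const encodedSlash = encodeURIComponent("docs/intro");        // "docs%2Fintro"
const encodedAt = encodeURIComponent("alice@example.com");    // "alice%40example.com"

// Two ways to build the same query string.
const manual = "?q=" + encodeURIComponent("rock & roll");     // "?q=rock%20%26%20roll"
const viaParams =
  "?" + new URLSearchParams({ q: "rock & roll" }).toString(); // "?q=rock+%26+roll"

// Both forms decode back to the same value.
const fromManual = new URLSearchParams(manual).get("q");       // "rock & roll"
const fromParams = new URLSearchParams(viaParams).get("q");    // "rock & roll"

console.log(manual, viaParams, fromManual === fromParams);
```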

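The order of query parameters usually carries no meaning, and most servers treat ?a=1&b=2 the same as ?b=2&a=1. Some systems are order-sensitive in practice, though: caches that key on the raw URL, request-signing schemes, and handlers that read repeated keys in sequence.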
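
When an order-sensitive system is involved (a cache key, for instance), one option is to sort parameters before comparing or storing the URL. The helper name normalizeQuery is hypothetical; URLSearchParams.sort() is the standard method it relies on.

```javascript
// Sort parameters by key so that equivalent query strings serialize identically.
// normalizeQuery is an illustrative helper, not a standard function.
function normalizeQuery(query) {
  const params = new URLSearchParams(query);
  params.sort(); // stable sort by key; repeated keys keep their relative order
  return params.toString();
}

const first = normalizeQuery("?b=2&a=1");  // "a=1&b=2"
const second = normalizeQuery("?a=1&b=2"); // "a=1&b=2"

console.log(first === second); // true
```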

URL Encoding

Characters that have special meaning in a URL (like ?, &, =, #) must be encoded when they appear as literal values inside a parameter. Without encoding, a value containing & would be misread as a parameter separator.

JavaScript provides two functions: encodeURIComponent encodes a single value, including the characters ?, &, =, #, and /, while encodeURI is meant for a full URL and leaves those structural characters intact.
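
The difference is easiest to see by running both functions on the same input. The sample strings below are made up for illustration.

```javascript
const value = "a&b=c/d?e#f";

// encodeURIComponent escapes the structural characters, so the result is safe
// to use as a single parameter value.
const asComponent = encodeURIComponent(value); // "a%26b%3Dc%2Fd%3Fe%23f"

// encodeURI leaves the same characters untouched.
const asUri = encodeURI(value);                // "a&b=c/d?e#f"

// On a full URL, encodeURI only fixes characters that are never legal, like spaces.
const fullUrl = encodeURI("https://example.com/search?q=a b&lang=en#top");
// "https://example.com/search?q=a%20b&lang=en#top"

console.log(asComponent, asUri, fullUrl);
```

In practice this means encodeURI is the wrong tool for a parameter value: an & inside the value would survive and split the parameter in two.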

Relative vs Absolute URLs

An absolute URL includes the full scheme and host: https://example.com/page. A relative URL is resolved against a base: /page resolves against the current origin, and ../other resolves relative to the current path. Understanding relative URL resolution prevents broken links when sites move between domains or paths.
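
The URL constructor takes an optional base as its second argument and applies the same resolution rules a browser does. A short sketch with an assumed base URL:

```javascript
// The base URL here is an assumption chosen for illustration.
const base = "https://example.com/docs/guide/intro";

const rootRelative = new URL("/page", base).href;    // "https://example.com/page"
const parentRelative = new URL("../other", base).href; // "https://example.com/docs/other"
const sibling = new URL("other", base).href;         // "https://example.com/docs/guide/other"

// An absolute URL ignores the base entirely.
const absolute = new URL("https://other.example/x", base).href; // "https://other.example/x"

// A trailing slash on the base changes what counts as the current directory.
const withSlash = new URL("other", "https://example.com/docs/guide/").href;
// "https://example.com/docs/guide/other"
const withoutSlash = new URL("other", "https://example.com/docs/guide").href;
// "https://example.com/docs/other"

console.log(rootRelative, parentRelative, sibling, absolute, withSlash, withoutSlash);
```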

Practical Tips

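Always validate URLs before using them in your code. The built-in URL constructor throws a TypeError on input it cannot parse, which makes it a more reliable validator than a hand-written regex: call new URL(input) and catch the exception. Keep in mind that it accepts any scheme, so also check url.protocol if only http and https are acceptable.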
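
A minimal sketch of this approach. The helper name parseHttpUrl and the decision to accept only http and https are assumptions for illustration; adjust the allowed schemes to your application.

```javascript
// Return a parsed URL for valid http(s) input, or null otherwise.
// parseHttpUrl is an illustrative helper, not a standard function.
function parseHttpUrl(input) {
  let url;
  try {
    url = new URL(input); // throws TypeError when the input cannot be parsed
  } catch {
    return null;
  }
  // The parser accepts any scheme (mailto:, javascript:, ...), so restrict it here.
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return null;
  }
  return url;
}

console.log(parseHttpUrl("https://example.com/page") !== null); // true
console.log(parseHttpUrl("not a url"));                         // null
console.log(parseHttpUrl("javascript:alert(1)"));               // null
console.log(parseHttpUrl("/page"));                             // null (relative, no base given)
```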

When logging URLs, be careful about leaking credentials in the username:password position or sensitive tokens in query parameters. Logging infrastructure may store or transmit these values in plain text.
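
One way to reduce that risk is to redact URLs before they reach the logger. The sketch below is a starting point, not a complete solution: the helper name redactUrl and the list of sensitive parameter names are assumptions, and a real list depends on your API.

```javascript
// Parameter names treated as sensitive. This list is an assumption for illustration.
const SENSITIVE_KEYS = new Set(["token", "api_key", "password"]);

// Return a copy of the URL with credentials removed and sensitive values masked.
// redactUrl is an illustrative helper, not a standard function.
function redactUrl(input) {
  const url = new URL(input);
  url.username = "";
  url.password = "";
  // Copy the keys first, because set() modifies the collection being iterated.
  for (const key of [...url.searchParams.keys()]) {
    if (SENSITIVE_KEYS.has(key.toLowerCase())) {
      url.searchParams.set(key, "REDACTED");
    }
  }
  return url.toString();
}

const safe = redactUrl("https://user:pass@api.example.com/v1/users?page=2&token=abc123");
console.log(safe); // "https://api.example.com/v1/users?page=2&token=REDACTED"
```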

For SEO, canonical URLs tell search engines which version of a page is authoritative when duplicate content exists across multiple URLs (with and without trailing slash, with and without www, HTTP vs HTTPS).
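
As an illustration of how those variants collapse into one form, the sketch below normalizes a URL under an assumed site policy: HTTPS, no www prefix, no trailing slash, no fragment. The helper name canonicalize and the policy itself are hypothetical; the right canonical form is a per-site decision.

```javascript
// Normalize a URL to one canonical form under an assumed policy:
// https scheme, no "www." prefix, no trailing slash (except the root), no fragment.
// canonicalize is an illustrative helper, not a standard function.
function canonicalize(input) {
  const url = new URL(input);
  url.protocol = "https:";
  url.hostname = url.hostname.replace(/^www\./, "");
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1);
  }
  url.hash = "";
  return url.toString();
}

const variants = [
  "http://www.example.com/page/",
  "https://www.example.com/page",
  "http://example.com/page/",
  "https://example.com/page",
];

// Every variant maps to the same canonical URL.
const canonical = new Set(variants.map(canonicalize));
console.log([...canonical]); // ["https://example.com/page"]
```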