Website Internationalization (i18n): The 2026 Guide
Before you can translate your site into any language, you must first internationalize it. In this article, we cover the 10 technical decisions you must make for a successful internationalization.
2026-04-22
- What is website internationalization (i18n)?
- Internationalization vs. localization vs. globalization
- Why website internationalization matters in 2026
- Is your website already internationalized? A quick checklist
- The 10 technical decisions to make before you implement
- How the internationalization (i18n) process actually works
- Key takeaways
What is website internationalization (i18n)?
Website internationalization (i18n) is the process of preparing a website to support different languages and regional standards. Without it, the site’s structure itself blocks translation: every new language means engineering work, not just translated content.
What does “i18n” mean?
“i18n” is the abbreviation of “internationalization.” The “i” and “n” are the word’s first and last letters; “18” counts the 18 letters between them. The same pattern gives us l10n for localization, g11n for globalization, and a11y for accessibility. The convention started in software engineering in the 1980s and stuck because “internationalization” is tedious to type, harder to read, and easy to misspell.
Consider the following analogy.
A plug designed only for American outlets doesn’t work in Prague. You can buy an adapter, but that’s a hack; it adds bulk, sometimes melts, and you’re always one trip away from being stuck. Internationalization is designing the device to accept any voltage, any socket, from the start. Localization is plugging it in wherever you are.
In plain terms, a website is internationalized if adding a new language is a matter of translating content and configuration—not rewriting code, redesigning layouts, or migrating databases.
Internationalization vs. localization vs. globalization
| Term | What it is | Who does it | When it happens |
|---|---|---|---|
| Internationalization i18n | Preparing the codebase, architecture, and design to support multiple languages and regions | Developers, designers, architects | Before any specific market launch Ideally from day one |
| Localization l10n | Adapting the content, visuals, and experience for a specific target market | Translators, regional marketers, designers, QA | For each market, repeatedly As you expand |
| Globalization g11n | The overall business strategy of operating across multiple markets | Product, marketing, legal, operations, leadership | Ongoing — set up front Revisited as strategy evolves |
The order matters.
Internationalization should come before localization. You can localize a site that hasn’t been properly internationalized (teams do it all the time), but the result is usually messy, brittle, and expensive to maintain.
A properly internationalized site can be localized in days. One that isn’t can take months of engineering work before localization can even start, and every new market compounds the debt.
Globalization sits above both. It’s the decision to go international in the first place, and it shapes which markets get prioritized, which languages come first, what the quality bar looks like, and how much you invest in each region.
- Globalization answers “Should we do this?”
- Internationalization answers “Can we do this?”
- Localization answers “What does it look and feel like in this market?”
A real example is Netflix.
When they decided to expand from a regional streaming service into a global one, that was globalization: the strategic choice to operate across international markets.
To make that possible, Netflix invested in internationalization. It built shared language infrastructure such as a global string repository, internal internationalization tooling, pseudo-localization testing, and search systems tuned for additional languages.
Once that foundation was in place, localization became the repeatable, market-by-market work, i.e., translating the interface, creating subtitles and dubbing, adapting discovery and language options, and improving the experience for viewers in each locale.
When Netflix later deepened support for more languages or launched in additional markets, it was able to build on that existing internationalization layer rather than re-engineer the product from scratch.
At times, you may see them grouped differently. For instance, in the localization industry, “localization” is often used as an umbrella term that includes translation and internationalization.
For our purposes, the most useful framing is the one provided earlier: globalization is the strategic decision, internationalization is the technical preparation, and localization is the market-level execution.
Why website internationalization matters in 2026
The business case for going multilingual isn’t new, but the urgency is higher than ever. The most compelling data still comes from CSA Research’s Can’t Read, Won’t Buy study, which surveyed 8,709 consumers across 29 countries. Its findings were clear:
- 76% of online shoppers prefer to buy products with information in their own language.
- 40% won’t buy from websites in other languages at all.
CSA has since expanded its dataset to roughly 9,909 consumers across 33 markets, reinforcing rather than reversing those numbers.
Even more recently (May 2025), a separate CSA analysis added useful context: only 17 languages each account for at least 1% of global online GDP. In plain terms, digital spending is concentrated in a relatively small set of languages; small, but still one that no English-only business comes close to covering.
Two shifts since that original research make the case even stronger in 2026.
1. AI translation and machine translation are no longer the bottlenecks. AI quality has improved dramatically, to the point where translation itself isn’t the main obstacle for many content types.
Practically, it means that for most teams today, the real obstacle is the underlying architecture. Teams that built internationalization into their codebase early can launch in a new market in weeks. Teams that didn’t are still facing the same retrofit project they’ve been putting off for years.
2. Multilingual support has become a baseline expectation. Offering a translated website used to be a competitive advantage. In many industries today (particularly European SaaS, enterprise software, and regulated sectors) it’s simply expected.
Put simply: a multilingual site isn’t a question of if; it’s when. And the longer you wait, the more it costs to get there—while competitors who already made the investment quietly take the markets you haven’t reached yet.
Is your website already internationalized? A quick checklist
<!-- ⚠ All text below is hard-coded. None of it can be translated
without a developer manually editing this file for each language. -->
<div class="ui-section-hero--content">
<h1>Design better.</h1> <!-- ❌ hard-coded -->
<p class="ui-text-intro">
Design Mobile UI faster and better with our product and produce
professional designs for your business <!-- ❌ hard-coded -->
</p>
<div class="ui-component-cta ui-layout-flex">
<form action="#" class="ui-component-form ui-layout-grid ui-layout-column-4">
<input type="email"
placeholder="Email" <!-- ❌ hard-coded, placeholder won't translate -->
class="ui-component-input ui-component-input-medium"
required>
<button type="submit"
class="ui-component-button ui-component-button-medium ui-component-button-primary">
Join Waitlist <!-- ❌ hard-coded -->
</button>
</form>
<p class="ui-text-note"><small>Available on Android and iOS.</small></p> <!-- ❌ hard-coded -->
<p class="ui-text-free">Free for 3 months</p> <!-- ❌ hard-coded -->
</div>
</div>
A single instance like this isn’t conclusive on its own; it could be a minor leftover. To be sure, run through the checklist below. Each item is a yes/no question. Any “no” is a retrofit project waiting to happen.
Text externalization
- All user-facing text is externalized into systems or files that can be translated without editing application code. A translator can produce a new language version without a developer involved.
Encoding consistency
- UTF-8 is used consistently across HTML, HTTP headers, database storage, and file system storage. No Latin-1, no Windows-1252, no legacy encoding anywhere in the stack.
URL structure
- The URL structure supports multiple language versions. Subdirectories, subdomains, or ccTLDs — not a single domain serving all languages from the same URLs.
- Each language version has its own distinct, crawlable URL that search engines can discover and index separately. Not the same URL serving different content based on headers or cookies.
Locale-aware formatting
- Dates, times, numbers, and currencies are formatted through the Intl API or a locale-aware library, not built by hand. No string concatenation that assumes MM/DD/YYYY or comma thousand separators.
Layout flexibility
- Layouts are flexible enough to absorb 35% text expansion without breaking. No fixed-width text containers, no button labels that assume four-letter words.
- Images don't contain text that needs translation. Any text overlay is HTML/CSS on top of the image, not baked into the image file.
Architecture and codebase
- The CMS, framework, or routing layer can serve and manage different locale versions without code duplication. One codebase, many language versions — not one codebase per language.
- RTL support is at least architecturally possible. CSS uses logical properties (margin-inline-start, not margin-left), or the codebase has a clear path to adopting them. Directional icons are identified.
Full surface coverage and workflow
- Emails, error messages, validation text, metadata, and legal copy are all externalized. Not just the main site copy but the entire user-facing surface area.
- Translation workflow is not copy-paste. There's an export format (JSON, XLIFF, or .po) or a TMS integration — not a spreadsheet being emailed to translators.
A site that ticks every box is internationalized. A site that misses one or two items is in good shape but has a short punch list to close. A site that ticks four or fewer hasn’t really been internationalized yet; it’s been translated, or it hasn’t been touched at all, and the next expansion is going to be expensive.
The 10 technical decisions to make before you implement
Internationalization involves a series of decisions your team has to make before a single line of localizable code gets written. Get them right and adding new languages later is a matter of days. Get them wrong and every new market means retrofitting architecture you already shipped.
1. URL Structure
This is the one decision you can’t easily reverse. Your URL structure tells search engines which version of your site to show which users, determines how SEO authority is distributed across your markets, and shapes the operational complexity of everything downstream.
Migrating from one structure to another is possible but quite painful (expect ranking volatility for months). You have four realistic options:
ccTLDs (country-code top-level domains)
This option means different domains for each market (e.g., yourbrand.de, yourbrand.fr, yourbrand.co.uk). It’s the strongest possible geotargeting signal. Users see a local domain and trust it more, and Google treats it as a site intended for that country.
But it will cost more. Each ccTLD is effectively a separate website. Separate SEO authority, separate content strategy, separate technical infrastructure, separate certificates, and (for some ccTLDs) local registration requirements. For example, a .de domain requires a German administrative contact, and a .cn requires a Chinese business license.
Subdomains (e.g., de.yourbrand.com)
A subdomain setup looks a bit cleaner than separate domains, allows regional hosting, and lets different teams run different regional sites without stepping on each other.
Google’s position, stated repeatedly by John Mueller, is that subdomains and subdirectories are treated roughly equivalently. But in practice, subdomains inherit less authority from the root domain than subdirectories do. They sit in a grey area. More separation than a subfolder, less geotargeting power than a ccTLD.
Subdirectories (e.g., yourbrand.com/de/)
All language versions sit under a single domain and inherit its full SEO authority. This is the default recommendation for most businesses in 2026. Easier to set up, easier to maintain, and every new language version benefits from the root domain’s existing link equity instead of starting from zero.
URL parameters (e.g., yourbrand.com/?lang=de)
Don’t. Google can technically interpret these, but they signal “this is the same page with a language flag” rather than “this is a distinct localized page.” Fine for temporary testing, unsuitable for production.
Note: Language targeting and country targeting are not the same thing. For instance, yourbrand.com/de/ most naturally reads as “the German-language version.” It’s useful for Germans, Austrians, and Swiss Germans alike. yourbrand.de, by contrast, reads as “the Germany-specific site.”
Some businesses need both axes (a Swiss French page that’s distinct from a France French page), some only need language (a single Spanish version serving Spain, Mexico, and Argentina). Whichever axis you’re on shapes which URL structure fits, and we’ll come back to this distinction throughout the next few sections.
2. Hreflang tags
If URL structure is the decision you can’t reverse, hreflang is the decision you can’t afford to get wrong. Independent audits consistently find that roughly two-thirds of international websites have hreflang errors severe enough to break their implementation (LinkGraph’s research puts it at over 65%; Ahrefs puts it at 67%).
An hreflang tag tells search engines which language and region a specific page is meant for. It looks like this:
<!-- Tells search engines: this URL serves English for Australian users -->
<link rel="alternate" hreflang="en-au" href="https://www.example.com/en-au/" />
That tag says “there’s an Australian English version of this page at this URL.” You place one tag per language version on every page that has alternates.
Overall, there are three things about hreflang that you shouldn’t skip:
Return tags must be reciprocal
If your English page declares a French alternate, your French page must declare the English page as an alternate back.
Missing return tags is the single most common hreflang error. In fact, Google ignores the entire set when reciprocity breaks. Self-referencing tags (a page pointing to itself in its own hreflang set) are also required; omitting them is another frequent failure mode.
Canonicals must self-reference
Each language version should canonicalize to itself, not to a “master” version. If your Spanish page’s canonical points to the English page, you’ve told Google “these aren’t really separate pages” (and Google will honor the canonical and ignore the hreflang).
x-default handles everyone else
The hreflang="x-default" value specifies which page to serve when no other language version matches the user’s location or language. It’s meant to appear once per set, typically pointing to a language selector page or your primary language version.
For sites with more than a couple of languages, omitting it means a user whose language and region match nothing you’ve declared gets whichever version Google guesses at. Using x-default on every language version is a separate and equally common mistake.
<!-- One tag per language version, all listed on every page -->
<link rel="alternate" hreflang="en" href="https://www.example.com/" />
<link rel="alternate" hreflang="fr" href="https://www.example.com/fr/" />
<link rel="alternate" hreflang="es" href="https://www.example.com/es/" />
<!-- Fallback for users whose language doesn't match any of the above -->
<link rel="alternate" hreflang="x-default" href="https://www.example.com/" />
Every one of those pages carries the same four tags. The set is identical on every page; each page’s self-reference is simply part of it.
Three places you can declare hreflang: in the HTML <head> (shown above), in your XML sitemap, or via HTTP response headers (for non-HTML files like PDFs).
Pick one method and stick to it. Mixing methods across your site is a frequent source of conflicts.
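To make the “same set on every page” rule concrete, here’s a minimal sketch in JavaScript. The locale-to-URL map and the helper name are illustrative assumptions, not any particular framework’s API; it emits one complete hreflang set for the three-language example above.

```javascript
// Illustrative sketch — the alternates map and helper name are assumptions,
// not a real framework API. Emits one complete hreflang set.
const alternates = {
  en: "https://www.example.com/",
  fr: "https://www.example.com/fr/",
  es: "https://www.example.com/es/",
};

function hreflangTags(alternates, xDefaultUrl) {
  const tags = Object.entries(alternates).map(
    ([lang, href]) => `<link rel="alternate" hreflang="${lang}" href="${href}" />`
  );
  // x-default appears exactly once per set — not once per language version.
  tags.push(`<link rel="alternate" hreflang="x-default" href="${xDefaultUrl}" />`);
  return tags;
}
```

Because every page in the set emits this same array, reciprocity and self-reference are satisfied by construction; there is no per-page variation to get wrong.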
Note: Google’s own native tooling for debugging hreflang is limited. The International Targeting report that used to live in Search Console was deprecated in September 2022 and no longer exists.
For serious hreflang audits, most operators now rely on third-party crawlers. Ahrefs, Screaming Frog, and Sitebulb all have dedicated hreflang audit features that catch missing return tags, invalid codes, and canonical conflicts at scale.
3. Character encoding
If your site is already UTF-8 throughout (HTML, database, server responses, file storage), skip this section.
UTF-8 is the Unicode encoding that can represent every character in every written language, from the Latin alphabet to Arabic, Chinese, Japanese, Korean, Cyrillic, Hebrew, Devanagari, emoji, and mathematical symbols. It’s been the dominant encoding on the web since 2008 and is used by roughly 98% of websites today.
The old alternatives (Latin-1, Windows-1252, Shift-JIS, GB2312) only handle specific language groups and will produce the classic mojibake symptoms the moment a user types a character that the encoding can’t represent. For example, the replacement-character diamond (�) or UTF-8 bytes displayed as Latin-1 (so “café” becomes café).
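You can reproduce that exact failure in a few lines of Node. The Buffer round-trip below is just a demonstration of the byte-level mismatch, not production code:

```javascript
// "café" stored as UTF-8 bytes, then read back as if it were Latin-1.
const utf8Bytes = Buffer.from("café", "utf8"); // 5 bytes: é encodes as 0xC3 0xA9
const misread = utf8Bytes.toString("latin1");  // each byte becomes one Latin-1 char

console.log(misread); // "cafÃ©" — the two bytes of é decode as two characters
```

This is exactly what a UTF-8 browser talking to a Latin-1 database produces: the bytes are stored faithfully, but every read reinterprets them under the wrong encoding.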
Three places UTF-8 must be set consistently:
- HTML meta tag: <meta charset="UTF-8"> as the first element inside <head>.
- HTTP response header: Content-Type: text/html; charset=UTF-8.
- Database and server storage: MySQL’s utf8mb4 (not the older utf8, which only supports three-byte characters and fails on emoji and certain CJK characters), PostgreSQL’s default UTF-8, file system storage in UTF-8.
A common failure mode comes from sites that are UTF-8 in the browser but Latin-1 in the database. Form inputs from non-Latin-alphabet users get mangled on save, and by the time anyone notices, there’s a corrupted history of customer records that’s expensive to repair.
One related trap worth flagging: encoding is not collation. UTF-8 lets you store any character correctly; it doesn’t tell your database how to sort or compare them. German umlauts, Turkish dotted I’s, and Spanish accented letters all have their own rules for alphabetical ordering, case conversion, and “equals” comparisons.
Setting the right encoding but leaving collation at the database default (often a generic utf8_general_ci or equivalent) produces subtle bugs. For instance, surnames sorting in the wrong order, login comparisons failing for accented usernames, and duplicate-key violations that shouldn’t exist. Pair UTF-8 with a locale-appropriate collation.
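Database collation is configured in the database itself, but the same distinction shows up at the application level in JavaScript’s Intl.Collator, which makes the umlaut-sorting bug easy to demonstrate:

```javascript
// Code-point sorting vs. locale-aware collation for German names.
const names = ["Zimmer", "Ärzte", "Abel"];

const naive = [...names].sort();                                 // compares UTF-16 code units
const german = [...names].sort(new Intl.Collator("de").compare); // German collation rules

console.log(naive);  // ["Abel", "Zimmer", "Ärzte"] — Ä sorts after Z
console.log(german); // ["Abel", "Ärzte", "Zimmer"] — Ä sorts alongside A
```

The same principle applies in the database: pair utf8mb4 (or UTF-8) storage with a locale-appropriate collation so sorting and comparison follow the language’s rules, not byte order.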
4. Text externalization
Before a translator can touch your content, that content has to be extractable from your codebase. Text externalization is the practice of storing all user-facing strings in separate resource files (typically JSON, XML, YAML, or the older gettext .po format) rather than hard-coding them inside templates, components, or controllers.
The pattern looks like this. Instead of:
<button>Sign up for free</button> <!-- ❌ hard-coded — untranslatable without editing source -->
You write (in pseudocode, the exact syntax varies by framework):
<button>{ translate("cta.signup") }</button> <!-- ✓ externalized — the key resolves to the correct string for each locale -->
And store the actual text in a resource file:
{
"cta.signup": "Sign up for free"
}
When French is added, a new file fr.json contains "cta.signup": "Inscrivez-vous gratuitement", and the template never changes. The same code renders the correct language based on which resource file is loaded.
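A minimal sketch of what such a translate() helper might look like. The resource shape and fallback behavior here are illustrative assumptions; real libraries like i18next add interpolation, pluralization, and lazy loading on top of the same idea:

```javascript
// Illustrative sketch of a key-based lookup with a default-locale fallback.
const resources = {
  en: { "cta.signup": "Sign up for free" },
  fr: { "cta.signup": "Inscrivez-vous gratuitement" },
};

function translate(key, locale, fallback = "en") {
  // Fall back to the default locale, then to the key itself,
  // so a missing translation never renders as undefined.
  return resources[locale]?.[key] ?? resources[fallback]?.[key] ?? key;
}

translate("cta.signup", "fr"); // "Inscrivez-vous gratuitement"
translate("cta.signup", "de"); // "Sign up for free" — falls back to English
```

The fallback chain is the part worth copying: an untranslated key degrading to the source language is an annoyance; one degrading to a blank button is a bug.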
This is the single most important technical habit for internationalization, and it’s also the one developers skip most often under deadline pressure.
Every hard-coded string is a future bug. And “string” here means more than visible page copy. A complete externalization pass covers:
- Button labels, headings, and body copy (the obvious).
- Form placeholders, validation error messages, and tooltip text.
- Email templates (welcome emails, password resets, receipts, transactional notifications).
- Image alt text and ARIA accessibility labels.
- Metadata: page titles, meta descriptions, Open Graph tags
- Error states, empty states, and loading messages.
- Date, currency, and number format strings (covered in more depth in decision 7).
- Legal and compliance copy: cookie banners, terms snippets, GDPR notices.
The most common retrofit pattern is a team that externalized the homepage in week one and is still finding hard-coded strings in the password-reset flow a year later. Make the list above a checklist, not a discovery.
5. Accommodating text expansion and contraction
Translated text rarely occupies the same space as the source. English is unusually compact. Most languages expand when translated from it and a few contract.
To plan accordingly, use the table below as a heuristic. It’s an expansion guideline based on IBM’s classic design reference, still maintained by the W3C.
| Source language → Target | Typical expansion |
|---|---|
| English → Spanish, French, Italian, Portuguese | +15% to +30% |
| English → German, Dutch | +30% to +35% Sometimes more |
| English → Russian, Polish | +20% to +30% |
| English → Arabic, Hebrew | -5% to +25% Variable |
| English → Chinese, Japanese, Korean | -10% to -40% Contraction |
Averages are misleading, though.
The real problem is shorter strings. The rule of thumb that actually matters: the shorter the source string, the larger the percentage expansion. A paragraph of English might expand 25% in German. A single-word UI label might expand 300%.
What this means for design:
- Don’t fix widths on text-containing elements. Use flexible layouts, minimum widths rather than fixed ones, and let containers grow where possible.
- Test early with pseudo-localization. Before real translations exist, render your UI with inflated placeholder text (e.g., wrap every English string in [!!! … !!!] or generate lengthened variants) to surface overflow bugs. Many modern i18n libraries support this, and teams often bake it into their design QA workflow before real translations arrive.
- Give translators character limits where they matter. A navigation item with a hard 15-character ceiling is a constraint translators can work with; an unconstrained translation followed by a frustrated designer “fixing” it weeks later is not.
- Watch vertical space too. Some languages (Arabic, Thai, Vietnamese) use diacritics and marks that need more vertical room per line. Line-height set tightly for Latin text will clip them.
- Don’t bake text into images. Hero banners, infographics, button graphics, screenshots with embedded labels. Basically, any text living inside an image file has to be re-rendered for every language, manually. What looks like a nice design shortcut in English becomes one of the most expensive parts of a ten-language rollout. Use HTML/CSS text overlaid on images wherever possible, and reserve in-image text for true illustration.
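As a sketch of the pseudo-localization idea from the list above — the bracket markers and the ~40% padding factor are arbitrary choices for illustration, not a standard:

```javascript
// Inflate a source string before real translations exist, so overflow
// bugs surface in design QA. Padding factor is an illustrative assumption.
function pseudoLocalize(str, expansion = 0.4) {
  const padding = "~".repeat(Math.ceil(str.length * expansion));
  return `[!!! ${str}${padding} !!!]`;
}

pseudoLocalize("Save"); // "[!!! Save~~ !!!]"
pseudoLocalize("Join Waitlist");
```

Note how the short string gains proportionally visible bulk: exactly the “short labels expand most” failure mode the rule of thumb above describes. Running the whole resource file through a function like this is the cheapest overflow test you can buy.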
6. Right-to-left language support
Arabic, Hebrew, Farsi (Persian), and Urdu are read right-to-left. That sounds like a simple flip, but an RTL interface isn’t a mirror image of the LTR version; it’s a structural redesign of the layout flow.
Navigation moves to the right. Sidebars swap sides. Arrows and chevrons point the other way. Icons with directional meaning (back/forward, next/previous) flip. Form fields align right. Progress bars fill from right to left.
For a well-designed transition, two levers matter:
The dir attribute
Setting <html dir="rtl" lang="ar"> tells the browser to flip the document’s base text direction.
This handles the text itself correctly—Arabic and Hebrew render right-to-left, numerals and embedded Latin-script words handle their own directionality via Unicode’s bidirectional algorithm. What it doesn’t do, by itself, is reposition your layout.
CSS logical properties
The modern approach to RTL layout is to stop thinking in left and right and start thinking in start and end. Instead of margin-left: 20px, write margin-inline-start: 20px. Instead of padding-right, write padding-inline-end.
The same CSS now works correctly in both LTR and RTL contexts. The browser resolves start and end based on the document direction. No separate stylesheets, no .rtl-override classes, no duplicated rules.
CSS logical properties are supported in every modern browser at this point and are the 2026 best-practice default. Retrofitting them onto an existing LTR-only stylesheet is tedious but mechanical; building them in from the start costs nothing.
The non-CSS parts of RTL support are where most teams underestimate effort:
- Icons with direction. Back arrows, play buttons, chevrons, breadcrumb separators: audit which icons have directional meaning and flip them in RTL contexts. Icons without directional meaning (search, menu, close) stay as they are. This is easier to get wrong than it sounds.
- Mixed-direction content. Arabic text containing English product names, or Hebrew text with Latin numerals, renders with embedded direction changes. The bidirectional algorithm handles most cases automatically, but complex mixed strings sometimes need <bdi> or <bdo> tags to render correctly.
- Animations and transitions. Slide-in animations that enter from the left in LTR should enter from the right in RTL. Otherwise motion feels counterintuitive.
- Number and punctuation handling. Arabic uses its own numeral forms in some contexts and Western numerals in others, depending on register and region. This decision should be made deliberately per market, not left to defaults.
7. Locale-sensitive formatting
Translation handles words. Locale formatting handles everything that isn’t words, and that covers more than you might expect. There are six categories to consider:
Dates
The US writes 04/05/2026 and means April 5th. Most of Europe reads the same string as May 4th. Japan uses 2026/04/05 (year-first, unambiguous). ISO 8601 (2026-04-05) is the internationally safe format for structured data but looks clinical in user-facing interfaces.
Month names need translation. Weekday names need translation. The first day of the week varies too: Sunday in the US and parts of Asia, Monday across most of Europe, Saturday in much of the Middle East. That affects any calendar or scheduling interface.
Times
Time is not expressed the same way across cultures: 12-hour vs. 24-hour clocks, AM/PM vs. morning/afternoon conventions, whether a colon or a period separates hours and minutes (14:30 vs. 14.30).
Time zones need to be explicit. “3pm” in a user interface is meaningless without a zone, and assuming the user’s local zone fails the moment someone is booking a flight, scheduling a meeting, or reading a timestamp from a different country.
Numbers
1,000.50 in the US is 1.000,50 in Germany, 1 000,50 in France, and 1’000.50 in Switzerland. Get this wrong on a price page and a German customer sees what looks like a one-euro price on a thousand-euro product.
Currency
Not just the symbol. The euro goes before the number in Ireland (€100) and after it in France (100 €). Some currencies conventionally use no decimal places (Japanese yen, Korean won).
Some use three (Kuwaiti dinar, Tunisian dinar). Currency abbreviations vary too: kr stands for the Swedish krona, the Norwegian krone, and the Danish krone alike; three different currencies sharing one abbreviation, distinguished only by context.
And when displaying prices across a multi-currency storefront, you have the separate and harder question of how you handle conversion, rounding, and psychological price points (a product priced at $99 usually shouldn’t become €91.37).
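As an illustration of that last point, a conversion step might be followed by an explicit charm-pricing pass. The “round up to the nearest .99” rule below is an assumption for this sketch, not an industry standard; real storefronts usually maintain per-market price tables instead:

```javascript
// Illustrative only: snap a raw currency conversion to a psychological
// price point. The .99-ending rule is an assumption for this example.
function toCharmPrice(rawConverted) {
  // Work in integer cents so floating-point drift can't produce 91.989999…
  const cents = Math.ceil(rawConverted) * 100 - 1;
  return cents / 100;
}

toCharmPrice(91.37); // 91.99 — instead of displaying the raw €91.37 conversion
```

Whatever rule you pick, the point is that it should be a deliberate, per-market business decision applied after conversion, never the raw exchange-rate output shown to customers.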
Units and measurements
Metric vs. imperial is obvious. What is less obvious: the US, UK, Liberia, and Myanmar use different imperial conventions; cooking measurements (US cup vs. UK cup) differ by volume; paper sizes (Letter vs. A4) affect any print-oriented UI; clothing and shoe sizes have region-specific conventions.
Names and addresses
Family-name-first (Japan, China, Korea, Hungary) vs. given-name-first (most of the West) affects form field labels, sort order, and salutations.
Address formats (the order of street, city, region and postal code) vary enormously. A form designed around the US pattern (street, city, state, ZIP) will frustrate or silently fail for users in dozens of countries.
The practical solution for most of this is the browser’s native Intl API (Intl.DateTimeFormat, Intl.NumberFormat, Intl.RelativeTimeFormat), backed by the Unicode CLDR database that powers every major formatting library.
Libraries like FormatJS wrap Intl for framework use. i18next focuses on translation but integrates with these formatters. For React, react-intl (part of FormatJS) is the common pairing.
The underlying principle is simple: never build formatting logic by hand. If your codebase contains a function that manually decides where to put commas in a number or how to format a date string, you’ve built a bug factory. Use the Intl API or a library on top of it, pass it a locale, and let it handle the rules for every region Unicode has mapped.
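A few lines show the principle in action. Every separator, date order, and grouping rule below comes from CLDR data via the built-in Intl API, not from application code:

```javascript
// The same values rendered per-locale by the native Intl API —
// no hand-rolled separators or date-order logic anywhere.
const n = 1000.5;
new Intl.NumberFormat("en-US").format(n); // "1,000.5"
new Intl.NumberFormat("de-DE").format(n); // "1.000,5"

const d = new Date(Date.UTC(2026, 3, 5)); // April 5, 2026
new Intl.DateTimeFormat("en-US", { timeZone: "UTC" }).format(d); // "4/5/2026"
new Intl.DateTimeFormat("de-DE", { timeZone: "UTC" }).format(d); // "5.4.2026"
```

Adding a locale costs nothing here: pass a different locale tag and the same code produces the right output, which is exactly what “internationalized” means at the formatting layer.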
8. CMS and framework choice
If you’re choosing or reconsidering your CMS or framework specifically because of multilingual requirements, you have three genuine questions to answer:
Does the platform genuinely support multilingual content?
Major platforms like WordPress (with WPML or Polylang), Shopify (via Markets or apps like Langify), Webflow, Squarespace, Drupal, Contentful, and Sanity all support multilingual content, but with very different levels of depth.
“Supports multiple languages” can mean anything from “you can create a separate site in each language” (primitive) to “content objects have language-aware fields and the CMS handles routing, fallbacks, and translation workflow” (mature).
Ask vendors to demonstrate adding a third language to an existing site, not just showing you a two-language example.
Can it produce your chosen URL structure?
Not every CMS can cleanly produce subdirectory URLs. Some default to subdomains or URL parameters in ways that are painful to override. If you’ve decided on /de/ subdirectories, verify the CMS can produce exactly that URL pattern—not /?lang=de, not de.yoursite.com.
How do translations get in and out?
A CMS that stores translations internally but has no API or export format is a CMS that will force manual copy-paste translation workflows forever. Look for native integrations with Lokalise, Phrase, or Crowdin, or at minimum a robust API and standard export formats (XLIFF, JSON, CSV).
For JavaScript frameworks, the landscape is clearer. Next.js has first-class internationalized routing built in. Nuxt (Vue), Remix, SvelteKit, and Astro all have mature i18n solutions. React on its own has no routing opinion; pair it with react-i18next or react-intl. Vue has vue-i18n. Angular has built-in i18n via @angular/localize. For Python, Flask uses Flask-Babel; Django uses its built-in translation system. Each of these has its own conventions; the framework-specific tutorials are where to go next when implementation starts.
One signal worth listening to: the framework’s i18n story says a lot about how seriously the framework treats international use. Next.js shipped internationalized routing as a core feature. That’s a different level of commitment than a framework where the i18n guide is a third-party blog post from 2019.
9. Translation workflow and tooling
At some point, actual translators (human, machine, or both) need to turn your externalized strings into other languages. How that happens determines almost everything about how fast you can launch new markets and how consistent your brand stays as you scale.
Three families of tools, roughly ordered by how much your engineering team is involved:
Developer-first TMS platforms
Lokalise, Phrase, and Crowdin all sit here. They integrate with your repository via CLI, API, or direct Git integration; they auto-pull new strings as they appear in your codebase; they push translations back as files your build process can consume.
Translators work in a web interface with translation memory, glossaries, and QA checks. Lokalise leans toward UI copy and mobile apps. Phrase has the strongest developer tooling and enterprise features.
Crowdin has transparent word-based pricing (versus Lokalise’s seat-based model, which can penalize larger teams) and is popular with open-source and community-translated projects.
Smartling and XTM Cloud are the dominant enterprise options. Smartling is known for its visual-context translation, showing translators exactly how a string will appear in the rendered interface, which catches length and meaning errors that spreadsheet-based translation never would.
These platforms assume enterprise scale, enterprise security requirements, and enterprise procurement timelines.
Trados Studio and memoQ are the desktop-based tools professional translators and LSPs have used for decades.
They’re not website-integrated in the way Lokalise or Phrase are, but they’re where the deepest professional translation memory and terminology management happens.
For enterprise localization projects where translation quality is paramount (regulated industries, legal content, highly technical documentation), a workflow that routes through a CAT tool isn't legacy; it's best practice.
Weglot, Localize, and a few others sit in a different category: they proxy your website, detect content automatically, and serve translated versions without requiring developer integration. Fast to set up, useful for content-heavy marketing sites, less suitable when translations need to live alongside application logic.
There’s one decision underneath all of this that the tooling choice can’t answer for you: what’s your mix of AI translation, human translation, and human post-editing of AI output? In 2026, for most content, the question isn’t “AI or human” but rather “which content gets which workflow?”
For instance, raw AI translation is reasonable for user-generated content, long-tail pages, and internal tooling where speed and cost matter more than nuance.
However, human translation from scratch is still the standard for hero copy, brand messaging, legal content, and anything where a mistranslation has real business consequences.
Finally, AI translation with expert post-editing (still widely called MTPE, increasingly called AITPE as LLMs replace older MT engines) is the dominant workflow for most commercial content in 2026. It's faster than pure human translation, more reliable than raw AI, and the baseline most professional translation services now operate from.
The trap is treating AI translation as a finished product because the output looks fluent. Modern LLM-based translation is genuinely impressive and genuinely wrong.
It hallucinates proper nouns, misreads idioms, invents plausible-sounding terminology that doesn’t match your glossary, and doesn’t know your brand voice.
For anything customer-facing, AI-drafted and expert-reviewed is the minimum professional bar. This is where a translation services partner earns its keep: not by producing raw translation (AI has changed those economics), but by owning the review, quality assurance, terminology management, and cultural adaptation that turn AI output into something a native speaker trusts.
10. SEO for multilingual sites
Hreflang (6.2) and URL structure (6.1) are the two biggest SEO decisions, but they’re not the only ones. A complete multilingual SEO checklist also covers:
Each language version should canonicalize to itself. The trap is copying the canonical tag along with the rest of a template, which silently tells Google “the English version is the real page.” Audit canonical tags specifically when auditing hreflang.
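One way to avoid the copied-canonical trap is to always build the canonical URL from the current page's own locale rather than hard-coding it in a template. A minimal sketch (the domain and locale codes are illustrative assumptions):

```python
# Hypothetical helper: each language version canonicalizes to itself.
# The domain and locale codes below are illustrative assumptions.

def canonical_tag(path: str, locale: str, domain: str = "https://yourbrand.com") -> str:
    """Build a self-referencing canonical tag for one language version."""
    return f'<link rel="canonical" href="{domain}/{locale}{path}" />'

# Each version points at itself -- never at the English original.
print(canonical_tag("/products/shoes", "en"))
# <link rel="canonical" href="https://yourbrand.com/en/products/shoes" />
print(canonical_tag("/products/shoes", "de"))
# <link rel="canonical" href="https://yourbrand.com/de/products/shoes" />
```

Because the locale is a parameter rather than template text, copying the template to a new language can't silently point every page back at the English version.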
A multilingual sitemap can declare hreflang relationships inline, which for large sites is more maintainable than putting hreflang in the <head> of every page. Either approach is valid; the mistake is doing both inconsistently.
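A minimal sketch of the inline approach, assuming a hypothetical three-locale site: each `<url>` entry lists an `xhtml:link` alternate for every language version, including the page itself (the "return tag" hreflang audits so often find missing).

```python
# Sketch: sitemap entries declaring hreflang alternates inline, so large
# sites don't need hreflang tags in every page's <head>. The locales and
# domain are illustrative assumptions.

LOCALES = ["en", "de", "fr"]
DOMAIN = "https://yourbrand.com"

def sitemap_url_entry(path: str) -> str:
    """One <url> block with an xhtml:link alternate per language version."""
    alternates = "\n".join(
        f'    <xhtml:link rel="alternate" hreflang="{loc}" '
        f'href="{DOMAIN}/{loc}{path}"/>'
        for loc in LOCALES
    )
    # Every alternate set must also include the page itself (the return tag).
    return (
        f"  <url>\n"
        f"    <loc>{DOMAIN}/en{path}</loc>\n"
        f"{alternates}\n"
        f"  </url>"
    )

print(sitemap_url_entry("/products/shoes"))
```

Generating these entries from one locale list, rather than maintaining hreflang in two places, is what keeps the "doing both inconsistently" failure mode out of reach.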
Should yourbrand.com/en/products/shoes become yourbrand.com/fr/produits/chaussures, or yourbrand.com/fr/products/shoes?
Translated URLs (slugs in the target language) signal relevance more strongly and improve click-through rates in local search results, but they require a routing layer that can resolve translated slugs back to the same underlying content and need careful redirect management when slugs change.
For most sites, translating URL slugs is worth the complexity; for large catalog sites with frequent product changes, many teams leave slugs in English for operational sanity.
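A sketch of what that routing layer has to do, with a hypothetical slug table standing in for the CMS or database: every localized slug resolves back to the same underlying content record, and an unknown slug returns nothing (a 404 or redirect candidate).

```python
# Sketch of a routing layer resolving translated slugs back to one
# underlying content record. The slug table is an illustrative assumption;
# a real site would store this mapping in the CMS or database.

from typing import Optional

SLUG_TABLE = {
    ("en", "products/shoes"): "product-42",
    ("fr", "produits/chaussures"): "product-42",
    ("de", "produkte/schuhe"): "product-42",
}

def resolve(locale: str, slug: str) -> Optional[str]:
    """Map a localized slug to its content ID, or None for a 404/redirect."""
    return SLUG_TABLE.get((locale, slug))

# All three localized URLs land on the same content record.
assert resolve("fr", "produits/chaussures") == resolve("en", "products/shoes")
```

The redirect-management burden the text mentions lives in this table too: when a slug changes, the old entry has to become a 301 to the new one rather than simply disappearing.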
Page titles, meta descriptions, Open Graph tags, Twitter card text, structured data: all of it needs to be translated, and all of it is part of the externalization checklist from 6.4. Missing metadata translation is one of the most common multilingual SEO issues: the page content is translated, but the SERP snippet still shows English.
The search terms people use in German are not translations of English keywords. They’re their own keywords, with their own volume and competitive dynamics.
A German customer searching for running shoes might search “Laufschuhe” (running shoes), “Joggingschuhe” (jogging shoes), or “Sportschuhe” (sports shoes); picking the wrong primary keyword ranks you for traffic that doesn’t convert. Budget keyword research per market as a distinct cost, not a translation deliverable.
International SEO authority isn’t fully transferable. A domain with 10,000 backlinks from English-language sites starts with real advantages in English-speaking markets, and much smaller advantages in Germany, Japan, or Brazil.
Backlink strategy needs a local component per priority market, which is why ccTLDs and subdomains are harder to rank: they don’t inherit root-domain authority and don’t have local backlinks either.
A site hosted only in the US serves pages quickly to US users and slowly to users in Singapore or São Paulo.
A CDN (Cloudflare, Akamai, Fastly) with points of presence in your target markets is nearly mandatory for international SEO at this point. Page speed is a ranking factor, and it’s also a conversion factor.
Automatic redirection based on browser language or IP geolocation is tempting and almost always wrong.
It overrides user intent, breaks for travelers and expats, confuses search engines when Googlebot gets redirected, and frustrates users who want to read a site in a language other than their default.
Let users choose. Remember their choice. Show them the right default on first visit if you can detect it confidently, but never force a redirect.
How the internationalization (i18n) process actually works
Every framework has its own i18n story. And there are many: Flask-Babel, next-intl, react-i18next, vue-i18n, Django’s translation system, Angular’s @angular/localize. The syntax differs, the tooling differs, the file formats differ. However, the process doesn’t.
With that in mind, here are the six universal steps, independent of the stack.
Step 1: Externalize every string
Before anything else, every piece of user-facing text in the codebase has to be identified and moved into resource files.
This is the step that takes the longest on mature codebases, and it’s the step that quietly determines whether the rest of the work goes smoothly.
The externalization checklist from Section 6.4 is the scope: not just body copy, but emails, validation messages, metadata, alt text, error states, and legal copy. Miss any of those, and they'll surface as bugs later.
Step 2: Reference strings through a translation function
Once strings live in resource files, your code needs to reference them through a translation function — t('key'), gettext('key'), __('key'), or whatever your framework calls it.
This is the convention that makes automated extraction possible: tooling can scan the codebase for translation-function calls and build a master list of every string that needs translating.
Step 3: Extract strings into a source-of-truth file
Your tooling extracts every marked string into a single file (usually JSON, YAML, XLIFF, or .po) that becomes the source of truth. For any new language, this file gets duplicated and translated. The source file itself keeps getting updated as new strings enter the codebase.
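A toy version of what that extraction tooling does, assuming the t('key') convention from the previous step (real tools such as pybabel extract or i18next-parser are far more robust):

```python
# Minimal sketch of string extraction: scan source text for translation-
# function calls and collect the master list of keys. The t('key')
# convention is an assumption; real extractors parse the AST, handle
# plurals, and track source locations for translators.

import re

TRANSLATION_CALL = re.compile(r"""\bt\(\s*['"]([^'"]+)['"]""")

def extract_keys(source: str) -> set[str]:
    """Collect every key passed to t('...') in a source string."""
    return set(TRANSLATION_CALL.findall(source))

source = """
    heading = t('home.title')
    button  = t("home.cta")
"""
assert extract_keys(source) == {"home.title", "home.cta"}
```

Run over the whole codebase, the union of these sets is the master string list that gets diffed against existing catalogs on every build.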
Step 4: Translate
This is where the translation workflow from Section 6.9 engages.
Step 5: Integrate translations back
The translated files get committed back into the codebase (or loaded from a TMS via API at build time or runtime). The framework's i18n layer handles the rest: detecting the user's language, loading the right file, rendering the right strings, and falling back when a translation is missing.
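The lookup-with-fallback behavior most i18n layers perform can be sketched in a few lines (the catalog contents are illustrative):

```python
# Sketch of the runtime lookup an i18n layer performs: try the requested
# locale, fall back to the source language when a translation is missing,
# and echo the key as a last resort. Catalog contents are illustrative.

CATALOGS = {
    "en": {"home.title": "Welcome", "home.cta": "Sign up"},
    "de": {"home.title": "Willkommen"},  # "home.cta" not yet translated
}

def translate(key: str, locale: str, fallback: str = "en") -> str:
    """Resolve a key in the requested locale, falling back as needed."""
    for lang in (locale, fallback):
        value = CATALOGS.get(lang, {}).get(key)
        if value is not None:
            return value
    return key  # last resort: show the key rather than crash

assert translate("home.title", "de") == "Willkommen"
assert translate("home.cta", "de") == "Sign up"  # fell back to English
```

The fallback path is exactly the "steady drift" the maintenance step below is designed to catch: it keeps the page rendering, but it also hides untranslated strings unless something measures them.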
This is the step frameworks actually differ on; everything before it is universal.
We'll cover the Flask implementation in detail in our dedicated Flask Internationalization Tutorial with Flask-Babel; React, Next.js, and Vue guides will follow.
Step 6: Maintain
New features mean new strings. Without a maintenance workflow, the first few months of multilingual operation produce a steady drift: English-only strings appearing in translated interfaces whenever a new feature ships.
The healthy pattern is automation: translation files get regenerated on every build, a CI check fails the build if untranslated strings exist, and translators get notified automatically when new strings need their attention.
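The CI check can be as simple as a set difference between the source catalog and each target catalog (file loading is omitted here, and the catalogs are illustrative):

```python
# Sketch of a CI gate: fail the build when a target catalog is missing
# keys present in the source catalog. A real check would load JSON or
# .po files from disk and exit non-zero; the catalogs here are assumptions.

def missing_keys(source: dict, target: dict) -> set[str]:
    """Keys present in the source language but absent from a translation."""
    return set(source) - set(target)

source_en = {"home.title": "Welcome", "home.cta": "Sign up"}
target_de = {"home.title": "Willkommen"}

gaps = missing_keys(source_en, target_de)
if gaps:
    # In CI this would print the list for translators and sys.exit(1).
    print(f"Untranslated strings in de: {sorted(gaps)}")
```

Wiring this into the build is what turns "translators get notified automatically" from a policy into a mechanism.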
What management actually has to decide
Developers can handle steps 1, 2, 3, and 5 without much business input. Steps 4 and 6 need decisions that only the business can make.
- Which languages and in what order? This is a market-prioritization question, not a technical one. Pick based on actual market research, not on which translations seem easy.
- What’s the quality bar per language? Does the French hero copy go through marketing review? Does the German legal copy go to a regulated-industry reviewer? These decisions shape cost and timeline.
- What’s the terminology and brand voice per market? Is “Sign up” translated as the formal or informal “you” in French? In German? In Japanese? These aren’t translation decisions; they’re brand decisions that translators need guidance on.
- Who approves translations before they go live? If there’s no in-market approver, translations ship without review, and errors reach customers.
- What’s the ongoing budget for maintenance translation? New strings get added every sprint. Somebody has to translate them, and somebody has to pay for it.
Key takeaways
- Internationalization (i18n) is the architectural work that makes translation possible. It comes before localization, not after. The goal is a website where adding a new language is a matter of configuration and translation, not engineering.
- Ten decisions drive implementation. URL structure, hreflang, UTF-8, text externalization, text expansion handling, RTL support, locale-sensitive formatting, CMS and framework choice, translation workflow, and multilingual SEO. Each one is easier to get right up front than to retrofit later.
- Subdirectories win for most sites. yourbrand.com/de/ is the 2026 default for international URL structure. ccTLDs make sense only for flagship markets or regulated industries. Avoid subdomains unless you have a specific reason.
- Hreflang fails more often than it works. Roughly two-thirds of international sites have hreflang errors that break their implementation (usually missing return tags, invalid codes, or canonical conflicts). Audit it; don’t assume it’s working.
- Text externalization covers more than body copy. Emails, validation messages, metadata, alt text, error states, and legal copy all need to live outside the codebase. Missing any of these produces the “translated homepage, English password reset email” pattern that takes years to clean up.
- Design for text expansion from the start. English is unusually compact. German expands 30%+; short UI labels can expand 300% or more. Fixed-width buttons and rigid layouts are future bugs.
- Machine translation has raised the floor, not replaced professionals. Raw MT is usable as a draft for most content in 2026. For hero copy, legal text, and brand-sensitive content, human translation or MT with professional post-editing remains the standard. The work a services partner does has shifted toward review, QA, terminology management, and cultural adaptation — which haven’t been automated and probably won’t be soon.
- Forced language redirection is almost always wrong. Detect, suggest, offer, don’t force. Override the user’s choice, and you break the experience for travelers, expats, bilinguals, and search crawlers.
- Retrofit is expensive; prevention is cheap. Internationalizing a mature site is one of the most costly engineering projects a product team takes on. Building it in from day one is a small upfront cost and nothing afterward. This is the cheapest option value a product team can buy.
Web i18n FAQ
What does "i18n" stand for?
i18n stands for internationalization (the "i," then the 18 letters between "i" and "n," then an "n"). The convention started in software engineering in the 1980s and stuck because "internationalization" is tedious to type and easy to misspell. The same pattern gives us l10n for localization, g11n for globalization, and a11y for accessibility.
What is website internationalization?
Website internationalization is the process of preparing a website's code, architecture, and design to support multiple languages and regions, without having to rewrite it for each new market. It's the structural work that makes translation and localization possible later.
A properly internationalized site can be translated into a new language in days. A site that hasn't been internationalized often needs months of engineering work before translation can even begin.
What's the difference between internationalization (i18n) and localization (l10n)?
Internationalization (i18n) is the technical preparation (externalizing text, supporting multiple URL structures, using UTF-8, accommodating text expansion, handling locale-sensitive formatting). It's done once, by developers and designers.
Localization (l10n) is the market-specific adaptation (translation, cultural adjustments, regional marketing copy, and local legal compliance). It's done repeatedly, for each new market, by translators, regional marketers, and cultural reviewers. You internationalize a site; then you localize it for each target market.
Do I need to internationalize my site before I can translate it?
In practice, yes. You can technically translate the content of a site that hasn't been internationalized, but the result is usually brittle and expensive to maintain; every new language compounds the underlying problems.
A properly internationalized site can be localized in days. Skipping i18n is the most common reason localization projects run months over schedule.
How long does website internationalization take?
For a new site built with i18n from day one, the additional effort is small, a consistent set of habits rather than a distinct project. For an existing site being retrofitted, timelines vary widely with codebase maturity and complexity.
Retrofitting is consistently one of the more expensive engineering projects a product team takes on, and the longer the site has been English-only, the more hard-coded strings, embedded text in images, and locale-specific assumptions will need to be unwound.
How much does website internationalization cost?
Cost depends on three things: the size and age of the existing codebase (retrofit is far more expensive than greenfield), how many languages are in scope, and whether translation will be handled in-house, by freelancers, or by a professional services partner.
For most businesses, the largest cost line item isn't translation itself (modern machine translation has reduced raw translation costs substantially) but the engineering work to internationalize the codebase, plus the ongoing cost of translation review, QA, terminology management, and localization maintenance.
Can I internationalize an existing website without rebuilding it?
Yes, you can retrofit internationalization onto an existing site without a full rebuild, and most teams do. The process usually involves auditing and externalizing every hard-coded string, restructuring URLs to support multiple languages, updating layouts to handle text expansion, and introducing a translation workflow.
It's more expensive than building i18n in from day one, but a rebuild is rarely the right answer—the retrofit is a known-scope project; a rebuild introduces new risks and usually takes longer than the retrofit would have.
There's no single best answer. It depends on your existing stack, team, and content model. WordPress handles multilingual content via plugins like WPML or Polylang. Shopify supports it through Markets and third-party apps. Webflow, Squarespace, and Drupal all have native multilingual features.
For headless setups, Contentful and Sanity have strong localization models built in. The things to verify before choosing: does it support your preferred URL structure (subdirectories, subdomains, ccTLDs) without forcing you into URL parameters; does it integrate with a translation management system like Lokalise, Phrase, or Crowdin; and does it store translations as first-class content, or bolt them on as an afterthought.
Which languages should I translate into first?
Pick based on actual market research, not on which translations feel easy. The considerations: which markets represent genuine revenue opportunity, which are regulated or require local-language support, where your existing traffic is coming from (Google Analytics can tell you which countries are already finding you), and where your competitors are or aren't strong.
A common starting pattern is: English as the source, then Spanish, French, German, and Portuguese (Brazilian) for broad Western reach; add Japanese, Chinese, Korean, and Arabic when your specific market warrants it. But defaulting to "the top 5 European languages" without market research is a mistake most teams make and most regret.
How does internationalization affect SEO?
Search engines rely on explicit signals to show the right language version to the right user. The primary signals are URL structure (ccTLDs, subdirectories, or subdomains), hreflang tags that declare the language and region of each page version, and translated metadata that signals relevance for local searches.
Handled correctly, each language version ranks independently in its target market. Handled poorly, language versions compete with each other, duplicate content warnings appear, and search engines serve the wrong version to the wrong user.
Most multilingual SEO problems trace back to broken hreflang or canonical tags fighting hreflang declarations—both covered in Section 6.2.
Can a small business skip internationalization until it goes international?
No. The cost difference between building internationalization from day one and retrofitting it later is significant enough that even small businesses with any ambition to eventually sell internationally benefit from doing the structural work up front. The translation itself can wait; the architecture shouldn't.
Is machine translation good enough for a multilingual website?
For many content types in 2026, yes—as a draft. Raw machine translation is reasonable for user-generated content, long-tail catalog pages, and low-stakes internal tooling.
For customer-facing hero copy, brand messaging, legal content, and regulated industries, machine translation with professional post-editing (MTPE) is the standard. Pure human translation remains appropriate for the highest-stakes content.
The mistake isn't using MT; it's publishing raw MT output to customers without review—modern MT is fluent enough that its errors feel authoritative even when they're wrong.