Excluding Your Webflow Pages from Search Engine Indexing

There are several ways to disable search engine indexing for your Webflow website. Below are the key methods:

  1. Disable Indexing of the Webflow Subdomain
    This can be done with a simple toggle in the site settings, and it is part of our pre-launch checklist.
  2. Disable Indexing of Static Site Pages
    Use the Sitemap indexing toggle found in the Page settings. This toggle adds <meta name="robots" content="noindex"> to your page, preventing it from being indexed by search engines.
  3. Disable Indexing of an Entire Folder
    To do this, create a rule in the robots.txt file.
  4. Disable Publishing of Empty Template Collection Pages
    This is managed through a toggle in the Collection template page settings.
  5. Disable Indexing of Certain Pages in a Collection
    This requires a one-time custom setup but can then be managed easily through an Option field in the Collection; a minimal sketch of the setup follows the list below.
Indexing options in Webflow
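
For method 5, one common setup (a sketch only; the field name and values below are illustrative assumptions, not Webflow defaults) is to add an Option field such as "Search indexing" with the values index and noindex to the Collection, then reference that field in the Collection template page's head custom code so each item controls its own robots meta tag:

<!-- Collection page settings → Custom code → Inside <head> tag -->
<!-- "Search indexing" is inserted as a dynamic field (e.g. via Add Field) -->
<meta name="robots" content="[Search indexing]">

Items whose Option is set to noindex then output <meta name="robots" content="noindex"> and drop out of search results, while items set to index are left alone.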

Understanding the Difference Between robots.txt and the <meta name="robots" content="noindex"> Tag

Both the robots.txt file and the <meta name="robots" content="noindex"> tag control how search engines interact with your website, but they function differently.

  1. robots.txt File
    • Purpose: Used to instruct web crawlers on which parts of your website they should or should not access. It prevents crawlers from reaching specific pages or directories.
    • Disallow Directive: Using "Disallow" in robots.txt prevents search engines from crawling specific URLs. However, if those URLs are linked from other websites, they can still be indexed based on information from the links (such as anchor text), even though crawlers won't access the page content.
  2. <meta name="robots" content="noindex"> Tag
    • Purpose: This tag is placed directly in the HTML of a webpage and instructs search engines not to index the specific page. This ensures the page won't appear in search results, even if crawlers have access to it.
    • Usage: Useful when you want search engines to access a page but not include it in the search index. It can be combined with "nofollow" to prevent the passing of link equity.

The key difference between the two lies in their scope and application. The robots.txt file controls crawling by telling search engines which parts of the site to avoid, but it doesn't guarantee that a page won't be indexed if other pages link to it. The noindex meta tag, on the other hand, specifically prevents a page from being indexed, even though it remains accessible to crawlers. In terms of application, robots.txt is particularly useful for managing access to large sections of your site, such as entire directories, while the noindex meta tag is better suited to controlling the indexation of individual pages.
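
To make the contrast concrete, here is a small illustration (the folder and page names are placeholders): the robots.txt rule keeps crawlers out of an entire directory, as in method 3 above, while the meta tag lets an individual page be crawled but not indexed.

# robots.txt — block crawling of a whole directory
User-agent: *
Disallow: /private-folder/

<!-- In the <head> of an individual page — crawlable, but kept out of the index -->
<meta name="robots" content="noindex, nofollow">

The nofollow value is optional; as noted above, it additionally tells search engines not to follow the links on that page or pass link equity through them.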

In Summary
Use robots.txt when you want to block crawlers from accessing parts of your site, and use the noindex meta tag when you want specific pages not to appear in search engine results. Remember, anyone can read your site's robots.txt file, so listing private URLs there may still help people identify and access that content.

Blocking Query Parameters

To block specific query parameters using robots.txt, use the "Disallow" directive combined with the wildcard character (*). This prevents search engine crawlers from accessing URLs with those parameters, which is useful for managing duplicate content or keeping crawlers away from filtered or sorted versions of the same page.

Example: To block URLs that include the ?filter and ?sort query parameters:

User-agent: *
Disallow: /*?filter=
Disallow: /*?sort=

Explanation:

  • /*?filter=: Blocks all URLs that contain the filter query parameter. For example:
    • https://www.example.com/products?filter=color
    • https://www.example.com/products?filter=size
  • /*?sort=: Blocks all URLs that contain the sort query parameter. For example:
    • https://www.example.com/products?sort=price
    • https://www.example.com/products?sort=popularity

Important Notes:

  • Wildcard Usage: The * wildcard matches any sequence of characters, so /*?filter= blocks any URL path followed by ?filter=. Note that these patterns match the parameter when it starts the query string; to also catch URLs where it appears after another parameter (for example /products?sort=price&filter=color), add a matching rule such as Disallow: /*&filter=.
  • Combining "Allow" and "Disallow": Google applies the most specific (longest) matching rule regardless of order, but some crawlers read rules from top to bottom, so placing an "Allow" rule before the "Disallow" rule it overrides is the safest arrangement; an example follows below.
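
For instance, to block every filtered URL while still allowing one specific filter to be crawled (the paths here are hypothetical):

User-agent: *
Allow: /products?filter=featured
Disallow: /*?filter=

Google picks the Allow rule here because it is the more specific (longer) match; for crawlers that process rules from top to bottom, listing it first produces the same outcome.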

By setting up your robots.txt file with these rules, you can effectively block search engines from crawling URLs with specific query parameters, which in most cases keeps those variations out of the search index and keeps it focused on your site's main content.

Use Password Protection

To prevent the discovery of specific pages on your website, protect them with a password. It's important to note that files uploaded to Webflow are hosted at publicly accessible URLs, though they may not be indexed by search engines if the file isn't on a publicly viewable page or linked elsewhere. Password-protecting the pages that reference those assets makes the assets far less likely to be discovered or indexed.

Written By
Karina Demirkilic
Founder | Lead Developer and Designer