XML Sitemaps landing in WordPress Core

The Google team is working on a feature plugin to add XML sitemaps to the WordPress core:

https://github.com/GoogleChromeLabs/wp-sitemaps

I’ve tested the MVP and it works great.

The URLs can be filtered with this code. That means we can use it for Frontity :slight_smile:

<?php 
function change_urls( $url_list ) {
	for ( $i = 0; $i < count( $url_list ); $i++ ) {
		foreach ( $url_list[ $i ] as $attr => $value ) {
			if ( 'loc' === $attr ) {
				$parsed_url              = wp_parse_url( $value );
				$parsed_url['scheme']    = 'https';
				$parsed_url['host']      = 'frontity-domain.com';
				// unparse_url() is not a PHP or WordPress built-in; it has to be defined separately.
				$url_list[ $i ][ $attr ] = unparse_url( $parsed_url );
			} 
		}
	}
	return $url_list;
}
 
add_filter( 'core_sitemaps_posts_url_list', 'change_urls' );
add_filter( 'core_sitemaps_taxonomies_url_list', 'change_urls' );
add_filter( 'core_sitemaps_users_url_list', 'change_urls' );

Hey @luisherranz thanks for the research, this plugin is going to be really helpful :blush:

Could you elaborate a bit more on why this plugin is compatible with our “direct to Frontity” installation? It’s not totally clear to me.

Sure. With the code I added here we can change the URL of the sitemap links from the WordPress URL to the Frontity URL. For example, a link that is https://wp.domain.com/post-1 will become https://www.domain.com/post-1.

Then, the only thing we would need to do is to add the sitemap URL (https://wp.domain.com/sitemap.xml) to the robots.txt file of the Frontity site:

Sitemap: https://wp.domain.com/sitemap.xml
User-agent:*
Disallow:
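
The host swap described above (https://wp.domain.com/post-1 becoming https://www.domain.com/post-1) can also be sketched in JavaScript. This is only an illustration of the idea, not code from the plugin or from Frontity: `toFrontityUrl` is a hypothetical helper name and the domains are the placeholder ones used in this thread.

```javascript
// Hypothetical helper: point a WordPress URL at the Frontity domain,
// keeping the path, query string and fragment untouched.
function toFrontityUrl(wpUrl, frontityHost) {
  const url = new URL(wpUrl);
  url.protocol = "https:"; // always serve the Frontity site over HTTPS
  url.host = frontityHost; // swap only the host
  return url.toString();
}

console.log(toFrontityUrl("https://wp.domain.com/post-1", "www.domain.com"));
// prints "https://www.domain.com/post-1"
```

Using the built-in `URL` class avoids having to split and reassemble the URL by hand.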

This isn’t the only way to add sitemap support, though. For example, we could add a Frontity package that requests the sitemap from the WordPress site and rewrites the URLs before returning it.

This is a very simplified version, but something like this:

// Settings (for example, in frontity.settings.js):
packages: [
  {
    name: "@frontity/sitemap",
    state: {
      sitemap: {
        origin: "https://wp.domain.com"
      }
    }
  }
]

// Server code of the package:
import { get } from "koa-route";

export const server = ({ app }) => {
  app.use(
    get("/sitemap-*.xml", async ctx => {
      const origin = ctx.settings.state.sitemap.origin;
      const frontityUrl = ctx.settings.state.frontity.url;
      // Get the original sitemap from the WordPress site.
      // ctx.path already starts with a slash.
      const response = await fetch(`${origin}${ctx.path}`);
      const body = await response.text();
      // Replace the WordPress URLs with the Frontity URLs.
      ctx.body = body.replaceAll(origin, frontityUrl);
      // Do not cache this.
      ctx.set("cache-control", "no-cache");
    })
  );
};
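
The replacement step on its own can be tried outside of a server. The sketch below is only an illustration: `rewriteSitemap` is a hypothetical helper (not part of Frontity), and it uses `split`/`join` instead of `replaceAll` on the assumption that the Node.js version in use may not support `String.prototype.replaceAll`.

```javascript
// Hypothetical helper, not part of Frontity: swap every occurrence of the
// WordPress origin in a sitemap body for the Frontity URL.
function rewriteSitemap(xml, origin, frontityUrl) {
  // Drop trailing slashes so "https://wp.domain.com/" and
  // "https://wp.domain.com" behave the same.
  const from = origin.replace(/\/+$/, "");
  const to = frontityUrl.replace(/\/+$/, "");
  // split/join replaces all matches without needing replaceAll.
  return xml.split(from).join(to);
}

const sample =
  "<urlset><url><loc>https://wp.domain.com/post-1/</loc></url>" +
  "<url><loc>https://wp.domain.com/post-2/</loc></url></urlset>";

console.log(rewriteSitemap(sample, "https://wp.domain.com/", "https://www.domain.com"));
```

Note that this rewrites every occurrence of the origin in the body, so links to sub-sitemaps in a sitemap index change as well.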

Now it’s totally clear to me, thanks a lot @luisherranz

My site does not have a sitemap yet. I use Yoast SEO, but I could not find a way to make it work with Frontity. I tried to use this, but without success.
I tried your solution, but I get **Fatal error**: Uncaught Error: Call to undefined function unparse_url.
Can you share that function? Or do you have a good solution for Yoast SEO?

I was able to make the code work by replacing unparse_url( $parsed_url ) with str_replace('//wp.mysite.com/', '//mysite.com/', $value).

And how can I add it to robots.txt?

I created an npm package from your code: https://www.npmjs.com/package/basic-sitemap-for-frontity
But it does not work… :stuck_out_tongue: I always get a 400 error.

Wow @koli14 that package is awesome, but we haven’t finished the server extensibility yet :sweat_smile:


There’s another, easier way to solve this: Use a robots.txt file in the root.

It should be fairly easy to do because it’s simply replicating what we have for the favicon.ico file, but with a robots.txt file. If it’s not present, it should return the default robots.txt file we have defined now.

If you are willing to contribute with this feature, you can start here: https://docs.frontity.org/contributing/code-contribution-guide

And if you want to go ahead with a PR and need help, let us know! :slightly_smiling_face:

Is it possible to reference the main domain in robots.txt instead of a subdomain?
For example, could Node.js fetch the content at wp.domain.com/sitemap.xml and serve it at domain.com/sitemap.xml?

Where can I add this? In functions.php? Can anyone help me with this and the ads?

Yes, you need to add it in functions.php. Then add a robots.txt file that points to the generated sitemap. See: Use a robots.txt file in the root

Hi @prainua

Apologies for the long delay in getting back to you on this.

Support for robots.txt was added in the last release. Please see: Use a robots.txt file in the root

The XML sitemaps feature is about to be released in the upcoming WordPress 5.5 :slightly_smiling_face:

@santosguillamot, now that our plugin is going to know whether Frontity is being used in decoupled mode and what the Frontity URL is, it could add the filters for core sitemaps. Maybe also filters for other popular plugins, like Yoast.

EDIT: By the way, I don’t know if the filter I proposed in the opening post is still valid.

I’ve been taking a look at it and it seems the filters you suggested in the opening post aren’t available anymore. We can find the new filters in the announcement or the plugin description.

I thought wp_sitemaps_posts_pre_url_list could work similarly to the ones you used, but after doing some tests I think its purpose is different: you don’t have the $url_list at that point, so you would have to generate it yourself.

Not sure but maybe we could use the filter wp_sitemaps_posts_entry, but I haven’t tested it yet. I’ll try to do it and keep you posted.

I’ve been taking a deeper look at those filters and I think they are what we should use. I’ve tested them locally and they worked great. The logic is almost the same as the one Luis shared in the opening post; I just had to adapt it to these filters and define the unparse_url() function. With this we can use the Sitemaps functionality and change the URL to point to the Frontity domain. This is the code I was using:

function unparse_url($parsed_url)
{
	$scheme   = isset($parsed_url['scheme']) ? $parsed_url['scheme'] . '://' : '';
	$host     = isset($parsed_url['host']) ? $parsed_url['host'] : '';
	$port     = isset($parsed_url['port']) ? ':' . $parsed_url['port'] : '';
	$user     = isset($parsed_url['user']) ? $parsed_url['user'] : '';
	$pass     = isset($parsed_url['pass']) ? ':' . $parsed_url['pass']  : '';
	$pass     = ($user || $pass) ? "$pass@" : '';
	$path     = isset($parsed_url['path']) ? $parsed_url['path'] : '';
	$query    = isset($parsed_url['query']) ? '?' . $parsed_url['query'] : '';
	$fragment = isset($parsed_url['fragment']) ? '#' . $parsed_url['fragment'] : '';
	return "$scheme$user$pass$host$port$path$query$fragment";
}

function change_urls($entry)
{
	$parsed_url              = parse_url($entry['loc']);
	$parsed_url['scheme']    = 'https';
	$parsed_url['host']      = 'frontity-domain.com';
	$entry['loc'] = unparse_url($parsed_url);
	return $entry;
}

add_filter('wp_sitemaps_posts_entry', 'change_urls');
add_filter('wp_sitemaps_taxonomies_entry', 'change_urls');
add_filter('wp_sitemaps_users_entry', 'change_urls');

As Luis said, since the Frontity plugin will know the frontend URL, the future plugin could take care of this.

EDIT: Out of curiosity I took a look at the Yoast filters and it seems we could make it work with exactly the same code, using the wpseo_sitemap_entry filter. It’d work for posts, taxonomies and users. I’ve tested it and it works great as well. This is the filter in the Yoast code -> https://github.com/Yoast/wordpress-seo/blob/trunk/inc/sitemaps/class-post-type-sitemap-provider.php#L226-233

Hey @SantosGuillamot :wave:

Thank you very much for providing this example.
Have some questions on that:

  1. I use a lot of custom taxonomies. Do you think I have to call add_filter for all the sitemaps? (See screenshot.)

  2. As you can see from the screenshot, my sitemap is generated by Yoast, but I also run WordPress 5.5.1. Do you think I have to use the wpseo_sitemap_entry filter, as you described in your EDIT above?

EDITED:
3. How can I test this in Frontity to make sure it works?

Thank you very much!

If you’re using Yoast, I think the same filter should work for all the post types and for any URL. And if you are using the Yoast sitemap, you should use the filter I mentioned. I think Yoast overrides the default WordPress sitemap, so it shouldn’t be a problem.

This is not part of Frontity, it’s part of your backend. You just have to specify in your robots.txt where your sitemap is, by adding a line like this one:

Sitemap: https://admin.ruthgeorgiev.com/sitemap.xml

You can test it by going to that URL and navigating through the different post types and taxonomies you have.
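
If clicking through every sub-sitemap by hand gets tedious, the check could also be scripted. The following is only a rough sketch: `findWrongHosts` is a hypothetical helper, the sample XML and domains are placeholders, and in practice the XML string would first have to be downloaded from the sitemap URL.

```javascript
// Hypothetical check: return the <loc> URLs in a sitemap body that do NOT
// use the expected (Frontity) host.
function findWrongHosts(xml, expectedHost) {
  // Collect the text of every <loc>...</loc> element.
  const locs = [...xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map((m) => m[1]);
  return locs.filter((loc) => new URL(loc).host !== expectedHost);
}

const sitemapXml =
  "<urlset>" +
  "<url><loc>https://www.domain.com/post-1/</loc></url>" +
  "<url><loc>https://wp.domain.com/post-2/</loc></url>" +
  "</urlset>";

console.log(findWrongHosts(sitemapXml, "www.domain.com"));
// prints [ 'https://wp.domain.com/post-2/' ]
```

An empty array means every entry already points to the Frontity domain.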

As you mentioned, it works well just by switching to the Yoast filter instead:
add_filter('wpseo_sitemap_entry', 'change_urls');

What it does miss, though, is the front page URL and the archive pages for custom post types.

Adding a page for the CPT archive would include it in “page-sitemap.xml”, though, which can be a temporary solution. We would still be able to use the URL as an archive, if that’s preferred.

So to get the front page URL included as well, I created a custom sitemap for Yoast.

function add_sitemap_custom_items( $sitemap_custom_items ) {
	// Append an extra <sitemap> entry to the Yoast sitemap index.
	// <lastmod> must be a valid W3C datetime; the current time is used here.
	$sitemap_custom_items .= '
<sitemap>
<loc>https://wp.mossbjerling.com/frontity_custom-sitemap.xml</loc>
<lastmod>' . gmdate( 'c' ) . '</lastmod>
</sitemap>';
	return $sitemap_custom_items;
}
add_filter( 'wpseo_sitemap_index', 'add_sitemap_custom_items' );

That should cover it for anyone who needs a temporary solution for automated sitemaps.
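
The custom sitemap file itself still has to list the missing URLs. As a rough sketch of what generating it could look like (an assumption, not the code used on the site above): `buildSitemap` and `escapeXml` are hypothetical helpers and the URLs are placeholders.

```javascript
// Escape the characters that are not allowed verbatim in XML text.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical generator: build a small <urlset> for URLs that the
// regular sitemaps miss, such as the front page or CPT archives.
function buildSitemap(entries) {
  const urls = entries
    .map(
      ({ loc, lastmod }) =>
        "  <url>\n" +
        `    <loc>${escapeXml(loc)}</loc>\n` +
        // toISOString() yields a valid W3C datetime in UTC.
        `    <lastmod>${lastmod.toISOString()}</lastmod>\n` +
        "  </url>"
    )
    .join("\n");
  return (
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    `${urls}\n` +
    "</urlset>"
  );
}

console.log(
  buildSitemap([
    { loc: "https://www.domain.com/", lastmod: new Date(Date.UTC(2020, 8, 11, 23, 12, 27)) },
    { loc: "https://www.domain.com/projects/", lastmod: new Date(Date.UTC(2020, 8, 11, 23, 12, 27)) },
  ])
);
```

The output could be saved as a static file or served from whatever endpoint the sitemap index entry points to.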