How can we hide the JSON data from the page source if we are doing full SSR? Everything gets exposed for scraping, for example: https://www.cnbcafrica.com/article/2020/12/15/update-1-nigerias-boko-haram-behind-schoolboys-abduction-audio-message/ Image: https://image.prntscr.com/image/IBuGLBDsSjqJeGYmAlzvSQ.png
The JSON injected in the HTML is the initial value of the state. It contains the information needed to render the page.
Frontity applies Dynamic SSR from WordPress REST API data. The main advantage of this system is that fresh data is available on every request (no need to rebuild the site). It also requires the WordPress REST API to be online and available at all times.
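To make the point about the injected JSON concrete, here is a minimal sketch of the general pattern. It is not Frontity's actual markup or internals: the `__INITIAL_STATE__` id and the `renderPage` and `extractState` names are made up for illustration. It only shows that whatever state the server embeds can be read back by anyone who has the page source.

```typescript
// Hypothetical sketch of state hydration; not Frontity's real implementation.
type State = Record<string, unknown>;

// Server side: serialize the state and embed it next to the rendered HTML.
function renderPage(bodyHtml: string, state: State): string {
  // Escape "<" so a "</script>" inside the data cannot close the tag early.
  const json = JSON.stringify(state).replace(/</g, "\\u003c");
  return (
    `<div id="root">${bodyHtml}</div>` +
    `<script id="__INITIAL_STATE__" type="application/json">${json}</script>`
  );
}

// Any visitor (or scraper) can parse the same JSON back out of the page source.
function extractState(html: string): State | null {
  const match = html.match(
    /<script id="__INITIAL_STATE__" type="application\/json">([\s\S]*?)<\/script>/
  );
  if (!match || match[1] === undefined) return null;
  return JSON.parse(match[1]) as State;
}
```

The browser needs this JSON to take over from the server-rendered HTML, so it cannot simply be left out of the response.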
So, in this context, preventing data scraping is difficult, but there are some measures that can be taken to improve the security of the site.
Maybe @mmczaplinski can add more information on this
There is no way to have your content publicly available on the internet and prevent it from being scraped at the same time, I’m afraid.
The data that you see is the very same data that anyone can access publicly on https://cms.cnbcafrica.com/wp-json
I know you can’t hide everything that is public, but at least we can make it a bit more difficult.
That’s right, and we can restrict the wp-json endpoint to the server IP, right?
Yes, but the WordPress admin & Gutenberg are also using the REST API so in that case you might have to include the IPs of all of the content editors.
So, I think in answer to your question: yes, you could make it a little bit harder to access the raw JSON but personally I would not find it practical to do so. Plus anybody can still scrape the HTML of the site and parse the content from it.
I don’t think so, because these things still work when you disable the REST API. Alternatively, most security plugins can restrict access to your Frontity IP while making sure nothing internal gets blocked from accessing the REST API.
Keep in mind that your Frontity theme needs access to the REST API in order to do “Client Side Rendering”, so you can’t restrict the wp-json endpoint to the server IP. Every client accessing your website will make client-side calls to the REST API to fetch the content needed to render the next pages.
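As a rough illustration of those client-side calls, the sketch below builds the kind of URL a browser would request when navigating to another post. The `/wp/v2/posts` route and its `slug` parameter are part of the core WordPress REST API; the `buildPostUrl` helper is hypothetical, and the exact query Frontity sends may include more parameters.

```typescript
// Hypothetical helper: builds the URL a browser would request to load a post by slug.
// "/wp/v2/posts" and the "slug" query parameter belong to the core WordPress REST API.
function buildPostUrl(apiRoot: string, slug: string): string {
  const root = apiRoot.replace(/\/+$/, ""); // drop trailing slashes
  return `${root}/wp/v2/posts?slug=${encodeURIComponent(slug)}`;
}
```

Calling `buildPostUrl("https://cms.cnbcafrica.com/wp-json", "some-post")` gives a URL that is requested from the visitor's browser, so the request arrives from the visitor's IP, not from the Frontity server.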
damn that’s right
Hey, I need your confirmation on this. Just a rough idea:
Before SSR, call the API, check which site is calling, and send back a token such as a JWT. Then use this token in all requests sitewide.
Make the token refreshable, with a short expiry time.
This will prevent anybody other than my site from accessing my API, right?
Not really, as anyone could get your HTML, extract the token and do a request using it.
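A minimal sketch of why that is, assuming the token ends up somewhere in the page source (the markup, the `token` key, and both function names are hypothetical): recovering it takes a single pattern match, after which it can be replayed in the same header the real site would send.

```typescript
// Hypothetical page source carrying a short-lived token inside its embedded state.
const pageSource =
  '<script type="application/json">{"auth":{"token":"abc.def.ghi"}}</script>';

// A scraper needs nothing more than a pattern match to recover the token.
function extractToken(html: string): string | null {
  const match = html.match(/"token":"([^"]+)"/);
  return match && match[1] !== undefined ? match[1] : null;
}

// The recovered token is then sent exactly as the real site would send it.
function buildReplayHeaders(token: string): Record<string, string> {
  return { Authorization: `Bearer ${token}` };
}
```

A short expiry only means the scraper has to load the page again to pick up a fresh token, which is what it was doing anyway.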
As @mmczaplinski said, if your site is online you are exposed to scraping, no matter if the information is in HTML, RSS or JSON format.
For that reason Google gives extra credit to the first site that published that information online. And if someone is clearly stealing your content, you should be able to take legal action as well.
And that is fine, as the request will have a different IP than my server, so I can block it?
All client requests will always have a different IP than your server. As I said before, you can’t block that or client-side rendering won’t work.
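Here is a small sketch of what such an IP rule would do in practice. The `isAllowed` check and both addresses are made up for illustration (the IPs come from the ranges reserved for documentation); a real firewall or security plugin would apply the same logic at a lower level.

```typescript
// Hypothetical allowlist check, as a firewall or security plugin might apply it.
function isAllowed(requestIp: string, allowlist: readonly string[]): boolean {
  return allowlist.indexOf(requestIp) !== -1;
}

// Assumed addresses, taken from the IP ranges reserved for documentation.
const frontityServerIp = "203.0.113.10";
const visitorIp = "198.51.100.7";
const allowlist = [frontityServerIp];

// The first page is server-side rendered, so the API sees the server's IP.
const ssrRequestPasses = isAllowed(frontityServerIp, allowlist);

// The next page is fetched by the browser, so the API sees the visitor's IP.
const clientRequestPasses = isAllowed(visitorIp, allowlist);
```

With this rule the first page load works and every later in-site navigation fails, because a real visitor's request looks the same as a scraper's from the API's point of view.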