Prevent Google and others from indexing content?


preeddal
Posts: 3
Joined: 29 Feb 2008, 16:05

Prevent Google and others from indexing content?

Post by preeddal »

We noticed that Google is able to get inside issues and read their contents, even though we don't allow access without a username and password and we don't allow account creation on the login page. We're not sure how Google can see the contents of issues. Is there a setting within Mantis that prevents this?
shaitan
Posts: 6
Joined: 18 Jun 2007, 19:43

Re: Prevent Google and others from indexing content?

Post by shaitan »

You can tell Google not to index parts of your website. Check out their instructions on how Google indexes sites and how you can keep it from indexing parts of yours (see the portion about robots.txt):

http://www.google.com/support/webmaster ... swer=35769
preeddal
Posts: 3
Joined: 29 Feb 2008, 16:05

Re: Prevent Google and others from indexing content?

Post by preeddal »

Thanks shaitan. I was familiar with that, but was not sure whether some of the developers might know of other settings within the Mantis core code. Doesn't it seem like a security concern, one that might need to be addressed, that Mantis allows Google access to issue contents?

Even though you can't click through to a specific issue from Google search results (mostly in the blog search tool), the first few lines of the description are still visible in the search results.

My concern is that if Google can get to it, what is to prevent someone else from getting to the content?
preeddal
Posts: 3
Joined: 29 Feb 2008, 16:05

Re: Prevent Google and others from indexing content?

Post by preeddal »

After doing a little searching, it seems that only issues from the 1.1.1 installation are exposed; earlier tracker issues from 1.0.1 do not come up in Google's blog search tool. Should this be submitted as an issue for Mantis 1.1.1?

UPDATE: After some research, I believe we've discovered how and why Google got in. We looked at the server logs to determine which username Google used to index content. Then, after speaking with that person, we determined that the only difference between him and any other user was that he had subscribed to the Mantis RSS feed and was running the Google toolbar in the same browser. Since we had no robots.txt file present, Google must have assumed that the RSS feed was a blog or something like it and that, since someone had subscribed to it, indexing its contents would be helpful.
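
For anyone who wants to run the same check on their own logs, here is a rough sketch in Python. It assumes an Apache "combined" access log and that the feed URL carries the username as a query parameter. The sample lines, the issues_rss.php path, the user-agent string and the helper name usernames_used_by are all made up for illustration, so adjust them to whatever your server actually records.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical access-log lines in Apache "combined" format (not real data).
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Mar/2008:10:00:00 +0000] '
    '"GET /mantis/issues_rss.php?username=jdoe&key=abc123 HTTP/1.1" '
    '200 5120 "-" "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)"',
    '192.0.2.20 - - [01/Mar/2008:10:05:00 +0000] '
    '"GET /mantis/view.php?id=42 HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

# Captures the requested path and the user-agent field of a combined-format line.
LINE_RE = re.compile(
    r'"(?:GET|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def usernames_used_by(agent_substring, lines):
    """Return the 'username' query values seen in requests from a matching agent."""
    found = set()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # line is not in the assumed format
        if agent_substring.lower() not in match.group("agent").lower():
            continue  # request came from some other client
        query = parse_qs(urlsplit(match.group("path")).query)
        found.update(query.get("username", []))
    return found


if __name__ == "__main__":
    print(usernames_used_by("google", SAMPLE_LOG))
```

In practice you would pass the lines of your real access log instead of SAMPLE_LOG and pick the agent substring from what you actually see in the log.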

Word to the wise: if you don't want Google indexing your Mantis issue contents, be sure to explicitly exclude all bots via the robots.txt file. If you don't, and someone subscribes to the RSS feed, Google will use their username and key to access issue contents. At that point the contents become public (at least the first few lines of the description). To solve the problem we've put up a robots.txt forbidding all bots and, for good measure, since only one person was using the RSS feed, we disabled it. We submitted the domain to Google's content removal tool, but who knows how long that will take or whether they'll do it at all.
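
For reference, a "forbid all bots" robots.txt is just the two lines shown in the string below. The small Python sketch uses the standard library's urllib.robotparser to check how a well-behaved crawler would read it. The tracker URL and the helper name is_allowed are placeholders of mine, not anything from Mantis.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt that asks every crawler to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if a crawler honoring robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL; substitute the address of your own tracker.
    url = "https://tracker.example.com/view.php?id=1"
    print(is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Keep in mind that robots.txt is only a request that polite crawlers honor. It does not protect the content, which is why disabling the feed (or otherwise keeping the feed key private) still matters.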

I wouldn't consider this so much a bug in Mantis as a consequence of how RSS works and of Google's drive to index all of the web's content. Just be careful.
vboctor
Site Admin
Posts: 1304
Joined: 13 Feb 2005, 22:11
Location: Redmond, Washington

Re: Prevent Google and others from indexing content?

Post by vboctor »

It is interesting how much spying happens behind the scenes from all these toolbars, add-ons and plugins. I've reported an issue in the Mantis bug tracker to get some brainstorming going about what can be done from within Mantis, at least in terms of alerting and educating administrators.

http://www.mantisbt.org/bugs/view.php?id=8982