myspace question



Beyerstein00
04-28-2007, 04:11 PM
Is there a web crawler (other than Google) that caches Myspace profiles?

Troll
04-28-2007, 05:14 PM
Myspace uses robots.txt to tell automated bots not to crawl its site.

http://www.myspace.com/robots.txt

However, search engines still cache some Myspace pages. Since Google probably has the biggest cache of Myspace pages, I'd only suggest using Google (or a similar search engine, such as MSN).

Archive.org could work (I haven't checked), but as Myspace uses a robots.txt, it's highly unlikely to work.

Sorry I couldn't help much.

Ezekiel
04-28-2007, 05:31 PM
The file Troll mentioned (robots.txt) is a way of telling well-behaved bots not to crawl your website. You can disallow all crawlers, or only those with a specific user agent.

In Myspace's robots.txt, they block ia_archiver. I just Googled it, and it's the user agent string used for www.archive.org. I was going to suggest that as a place to check, but they're blocked from caching Myspace pages.
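
For anyone who wants to check this kind of rule themselves, here is a rough Python sketch using the standard library's urllib.robotparser. The robots.txt text below is a made-up sample, not a copy of Myspace's actual file, and the example.com URL and the allowed() helper are just placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; NOT Myspace's real file.
# It blocks one specific user agent entirely and everyone else from /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent, url, robots_txt=SAMPLE_ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, allowed(agent, "http://example.com/someprofile"))
```

To test a live site instead, RobotFileParser also has set_url() and read() to download the real robots.txt. Either way the file is only a request: a crawler that ignores it can still fetch the pages.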

Other than that, try Coral:

http://www.coralcdn.org/

... or, just search Google for "search engine" and try the cached versions on all the search engines you can find. Yahoo, Live Search, et al.