Copy a website

General Metalworking - All aspects of working with metal. 

Subject: Copy a website
Author: Gunner Asch
Date: 08-30-2008
Posted by on August 31, 2008, 4:27 pm


>
> If you try downloading my website algebra.com, you will get into an
> infinite recursion through millions of pages. That's why I prevent
> most such bots from accessing my site. This would work only on very
> simple sites.

How does your web server differentiate between a bot and a human user
making http requests?

Regards,

Robin

Posted by Lloyd E. Sponenburgh on August 31, 2008, 6:27 pm


robinstoddart@gmail.com fired this volley in
news:b6e6ea8f-a3fe-4f46-a893-6ba3b4f93520@f63g2000hsf.googlegroups.com:

>>
>> If you try downloading my website algebra.com, you will get into an
>> infinite recursion through millions of pages. That's why I prevent
>> most such bots from accessing my site. This would work only on very
>> simple sites.
>
> How does your web server differentiate between a bot and a human user
> making http requests?

Duh! It doesn't. The site has links back to the place where the link
began. It wouldn't appear recursive to a human user, because that
person would choose where he/she viewed. The spider can't tell, and
ends up in recursions it can only abort by "counting out" repeats.
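
The "counting out" of repeats described above can be sketched in a few lines of Python. This is only an illustration, not how any particular spider works: the `SITE` link graph, the `crawl` function, and the `max_pages` cap are all made-up names for the example. The crawler remembers every URL it has already queued, so a link back to where it started is simply skipped.

```python
from collections import deque

# Hypothetical in-memory "site": each page maps to the pages it links to.
# Several pages link back to "/", which is what creates the loops.
SITE = {
    "/": ["/lessons", "/solvers"],
    "/lessons": ["/", "/lessons/1"],
    "/lessons/1": ["/lessons", "/"],
    "/solvers": ["/"],
}

def crawl(start, links, max_pages=1000):
    """Breadth-first crawl that visits each URL at most once."""
    seen = {start}            # every URL already queued or visited
    queue = deque([start])
    order = []
    while queue and len(order) < max_pages:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:   # a repeat: do not follow it again
                seen.add(target)
                queue.append(target)
    return order

print(crawl("/", SITE))
# ['/', '/lessons', '/solvers', '/lessons/1']
```

The `seen` set only helps when the same URL comes around again. If a site generates millions of distinct URLs, nothing ever repeats, and only a hard cap such as `max_pages` stops the crawl.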

LLoyd

Posted by Ignoramus3863 on August 31, 2008, 6:51 pm


On 2008-08-31, Lloyd E. Sponenburgh <lloydspinsidemindspring.com> wrote:
> robinstoddart@gmail.com fired this volley in
> news:b6e6ea8f-a3fe-4f46-a893-6ba3b4f93520@f63g2000hsf.googlegroups.com:
>
>>>
>>> If you try downloading my website algebra.com, you will get into an
>>> infinite recursion through millions of pages. That's why I prevent
>>> most such bots from accessing my site. This would work only on very
>>> simple sites.
>>
>> How does your web server differentiate between a bot and a human user
>> making http requests?
>
> Duh! It doesn't. The site has links back to the place where the link
> began. It wouldn't appear recursive to a human user, because that
> person would choose where he/she viewed. The spider can't tell, and
> ends up in recursions it can only abort by "counting out" repeats.

I actually have some smarts in the server that can tell a bot from a
human. But httrack is blocked on the spot in any case. I am not
against it, as such, but it will not work on my site.
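
The post does not say how httrack is blocked. One common way, shown here purely as an assumption, is to refuse any request whose User-Agent header names the tool. The names `BLOCKED_AGENT_SUBSTRINGS` and `is_blocked` are invented for this sketch, and the sample header strings are only examples.

```python
# Assumption: blocking is done by matching the User-Agent header.
# The thread does not describe the actual mechanism used on the site.
BLOCKED_AGENT_SUBSTRINGS = ("httrack",)

def is_blocked(user_agent):
    """Return True if the User-Agent header names a blocked tool."""
    ua = (user_agent or "").lower()   # a missing header is treated as empty
    return any(name in ua for name in BLOCKED_AGENT_SUBSTRINGS)

# Example header strings (illustrative only).
print(is_blocked("Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"))
print(is_blocked("Mozilla/5.0 (X11; Linux x86_64) Firefox/3.0"))
```

A check like this only stops tools that announce themselves. A copier that sends a browser-like User-Agent passes straight through.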

--
Due to extreme spam originating from Google Groups, and their inattention
to spammers, I and many others block all articles originating
from Google Groups. If you want your postings to be seen by
more readers you will need to find a different means of
posting on Usenet.
http://improve-usenet.org/

Posted by Richard J Kinch on September 1, 2008, 12:19 am


Ignoramus3863 writes:

> I actually have some smarts in the server that can tell a bot from a
> human.

Not a bot attempting to look human. Just bots that advertise their
botness, by honest design or flawed hacking.

Posted by Ignoramus3863 on September 1, 2008, 12:24 am


> Ignoramus3863 writes:
>
>> I actually have some smarts in the server that can tell a bot from a
>> human.
>
> Not a bot attempting to look human. Just bots that advertise their
> botness, by honest design or flawed hacking.

Yes, even for a bot trying to look like a human (i.e., supplying a
Referer and a browser-like User-Agent), I can still detect that it is
a bot.

The way I detect it is with a hidden link that humans cannot see and
cannot click, but that bots will follow. The hidden link is disallowed
by robots.txt, so it catches all non-compliant bots.
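
A rough sketch of the hidden-link trap in Python follows. It is not the code running on the site. The trap path `/trap/`, the `TrapFilter` class, the `compliant_bot_may_fetch` helper, and the idea of banning by client IP are all assumptions made for the example. Only `urllib.robotparser` is a real standard-library module.

```python
from urllib.robotparser import RobotFileParser

TRAP_PATH = "/trap/"   # hypothetical URL of the hidden link
# A page might hide it with something like:
#   <a href="/trap/" style="display:none"></a>
ROBOTS_TXT = "User-agent: *\nDisallow: /trap/\n"

def compliant_bot_may_fetch(path):
    """What a well-behaved bot concludes after reading robots.txt."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("*", path)

class TrapFilter:
    """Ban any client that requests the trap path (assumed: ban by IP)."""

    def __init__(self):
        self.banned = set()

    def handle(self, client_ip, path):
        """Return the HTTP status the server would send for this request."""
        if client_ip in self.banned:
            return 403
        if path.startswith(TRAP_PATH):
            self.banned.add(client_ip)   # followed the hidden link: a bot
            return 403
        return 200

server = TrapFilter()
print(compliant_bot_may_fetch("/trap/page"))      # False
print(server.handle("10.0.0.1", "/index.html"))   # 200
print(server.handle("10.0.0.1", "/trap/page"))    # 403
print(server.handle("10.0.0.1", "/index.html"))   # 403
```

A bot that honors robots.txt never requests the trap, and a human never sees the link, so the only clients that land on it are crawlers ignoring the rules.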
