Posted by on August 31, 2008, 4:27 pm
>
> If you try downloading my website algebra.com, you will get into an
> infinite recursion through millions of pages. That's why I prevent
> most such bots from accessing my site. This would work only on very
> simple sites.
How does your web server differentiate between a bot and a human user
making http requests?
Regards,
Robin
Posted by Lloyd E. Sponenburgh on August 31, 2008, 6:27 pm
robinstoddart@gmail.com fired this volley in news:b6e6ea8f-a3fe-4f46-a893-6ba3b4f93520@f63g2000hsf.googlegroups.com:
>>
>> If you try downloading my website algebra.com, you will get into an
>> infinite recursion through millions of pages. That's why I prevent
>> most such bots from accessing my site. This would work only on very
>> simple sites.
>
> How does your web server differentiate between a bot and a human user
> making http requests?
Duh! It doesn't. The site has links back to the place where the link
began. It wouldn't appear recursive to a human user, because that
person would choose where he/she viewed. The spider can't tell, and
ends up in recursions it can only abort by "counting out" repeats.
LLoyd
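Lloyd's point about a spider "counting out" repeats can be sketched as a crawler that remembers every URL it has already queued. This is a minimal Python sketch, not how HTTrack or any particular spider actually works. The site layout in `links_on` is hypothetical, and the `max_pages`/`max_depth` limits are assumptions: a visited set handles links that point back to earlier pages, but it cannot stop a site that keeps generating new URLs, which is the "millions of pages" case.

```python
from collections import deque
from typing import List


# Hypothetical site: pages link back to the home page, and each
# "/lesson/N" page links to "/lesson/N+1", so new URLs never run out.
def links_on(url: str) -> List[str]:
    if url == "/":
        return ["/lesson/1", "/about"]
    if url == "/about":
        return ["/"]
    if url.startswith("/lesson/"):
        n = int(url.rsplit("/", 1)[1])
        return ["/", f"/lesson/{n + 1}"]
    return []


def crawl(start: str, max_pages: int = 50, max_depth: int = 10) -> List[str]:
    """Breadth-first crawl that never revisits a URL and stops at fixed limits."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # depth limit: do not expand this page's links
        for link in links_on(url):
            if link not in seen:  # skip repeats such as links back to "/"
                seen.add(link)
                queue.append((link, depth + 1))
    return order


pages = crawl("/")
print(len(pages), pages[:4])  # 12 ['/', '/lesson/1', '/about', '/lesson/2']
```

Without the `seen` set the crawler would bounce between "/" and "/about" forever; without the depth or page limit it would follow "/lesson/N" links indefinitely.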
Posted by Ignoramus3863 on August 31, 2008, 6:51 pm
On 2008-08-31, Lloyd E. Sponenburgh <lloydspinsidemindspring.com> wrote:
> robinstoddart@gmail.com fired this volley in news:b6e6ea8f-a3fe-4f46-a893-6ba3b4f93520@f63g2000hsf.googlegroups.com:
>
>>>
>>> If you try downloading my website algebra.com, you will get into an
>>> infinite recursion through millions of pages. That's why I prevent
>>> most such bots from accessing my site. This would work only on very
>>> simple sites.
>>
>> How does your web server differentiate between a bot and a human user
>> making http requests?
>
> Duh! It doesn't. The site has links back to the place where the link
> began. It wouldn't appear recursive to a human user, because that
> person would choose where he/she viewed. The spider can't tell, and
> ends up in recursions it can only abort by "counting out" repeats.
I actually have some smarts in the server that can tell a bot from a
human. But httrack is blocked on the spot in any case. I am not
against it, as such, but it will not work on my site.
--
Due to extreme spam originating from Google Groups, and their inattention
to spammers, I and many others block all articles originating
from Google Groups. If you want your postings to be seen by
more readers you will need to find a different means of
posting on Usenet.
http://improve-usenet.org/
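The thread doesn't say how the server blocks httrack "on the spot". One common, simple approach is to refuse any request whose User-Agent header names the tool. The sketch below assumes that approach; the marker list and the `is_blocked_agent`/`status_for` helpers are hypothetical, and the sample User-Agent strings are illustrative only.

```python
from typing import Dict, Optional

# Hypothetical lowercase substrings marking clients to refuse outright.
BLOCKED_AGENT_MARKERS = ("httrack",)


def is_blocked_agent(user_agent: Optional[str]) -> bool:
    """True if the User-Agent header contains any blocked marker."""
    if not user_agent:
        return False  # no header sent: nothing to match on
    ua = user_agent.lower()
    return any(marker in ua for marker in BLOCKED_AGENT_MARKERS)


def status_for(headers: Dict[str, str]) -> int:
    """HTTP status to send: 403 Forbidden for blocked tools, else 200 OK.

    Assumes header names have already been normalized to "User-Agent".
    """
    return 403 if is_blocked_agent(headers.get("User-Agent")) else 200


print(status_for({"User-Agent": "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"}))  # 403
print(status_for({"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 6.0) Firefox/3.0"}))  # 200
```

A check like this only catches clients that advertise themselves; a bot sending a browser-like User-Agent passes straight through.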
Posted by Richard J Kinch on September 1, 2008, 12:19 am
Ignoramus3863 writes:
> I actually have some smarts in the server that can tell a bot from a
> human.
Not a bot attempting to look human. Just bots that advertise their
botness, by honest design or flawed hacking.
Posted by Ignoramus3863 on September 1, 2008, 12:24 am
> Ignoramus3863 writes:
>
>> I actually have some smarts in the server that can tell a bot from a
>> human.
>
> Not a bot attempting to look human. Just bots that advertise their
> botness, by honest design or flawed hacking.
Yes. Even when a bot tries to look like a human (i.e., supplies a Referer
and a browser-like User-Agent), I can still detect that it is a bot.
The way I detect it is that there is a hidden link that humans cannot
see and cannot click, but bots will follow it. The hidden link is
disallowed in robots.txt, so it catches non-compliant bots: a bot that
honors robots.txt never follows it.
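The hidden-link trap described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the actual server code: the trap path, the hidden anchor markup, the IP-based ban list and the `handle_request` helper are all hypothetical. Python's standard `urllib.robotparser` stands in for a compliant crawler, to show that such a crawler never requests the trap.

```python
from typing import Set
from urllib.robotparser import RobotFileParser

# Hypothetical trap path: disallowed in robots.txt and linked from pages
# by an anchor humans cannot see or click.
TRAP_PATH = "/bot-trap/"
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"
HIDDEN_LINK = f'<a href="{TRAP_PATH}" style="display:none"></a>'  # embedded in each page

banned_ips: Set[str] = set()


def handle_request(ip: str, path: str) -> int:
    """Return an HTTP status; ban any client that fetches the trap path."""
    if path.startswith(TRAP_PATH):
        banned_ips.add(ip)  # only a client ignoring robots.txt gets here
    return 403 if ip in banned_ips else 200


# A compliant crawler reads robots.txt and skips the trap.
rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())
print(rp.can_fetch("*", "http://example.com/bot-trap/"))  # False
print(rp.can_fetch("*", "http://example.com/lessons/"))   # True

# A non-compliant bot follows the hidden link and is refused from then on.
print(handle_request("192.0.2.7", "/bot-trap/"))  # 403
print(handle_request("192.0.2.7", "/"))           # 403
print(handle_request("198.51.100.1", "/"))        # 200
```

The ban here is keyed on the client IP only for simplicity; the choice of key (IP, session, or something else) is an assumption, and the thread doesn't say what the real server uses.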
| Similar Threads | Posted |
| FA: I just upgraded my copy of turboCAD - you may be able to get my old copy cheaply | November 27, 2008, 1:28 am |
| Art Website ! | December 20, 2007, 4:20 am |
| a good website | February 13, 2007, 4:36 pm |
| New website for metallurgist | April 20, 2007, 8:15 am |
| Can someone beautify a website for me? | June 3, 2007, 1:06 am |
| Can someone beautify a website for me? | June 3, 2007, 1:07 am |
| New website for Eastburn | May 1, 2006, 10:53 pm |
| MarMachine website | May 1, 2006, 4:11 pm |
| Kewl website | December 24, 2008, 12:21 pm |
| Tanaka Welding Website | July 27, 2006, 12:29 pm |