How to Make Sure Robots Properly Crawl and Index Your Site

Search engines send out robots that come to your site and take everything there is to take. But because the competition is so fierce, there is no way to get into the search engines unless you pay for ads or hire an SEO (search engine optimization) consultant, right? Wrong!

Even if you pay a lot of money, if the robots search engines use for indexing don't see your site properly, chances are many of your pages will never be indexed.

In this article, I will discuss the importance of having your website structured correctly, the importance of using old-fashioned hyperlinks versus modern Flash menus, scripts, and extensions, and provide you with a very simple and free tool that will let you view your site much the same way most indexing robots do. But first, let's define some of the concepts.

What is a www bot?

A bot is a computer program that automatically reads web pages and checks all the links it finds.

The first robot was developed at MIT and launched in 1993. It was dubbed the World Wide Web Wanderer, and its initial purpose was purely scientific: its mission was to measure the growth of the web. The index generated from the results of the experiment proved to be an incredible tool and effectively became the first search engine. Much of the stuff online that we can't live without today was born as a side effect of some scientific experiment.

What is a search engine?

Generally, a search engine is a program that searches through a database. In the popular sense, referring to the web, a search engine is considered to be a system that has a user search form, which can search through a repository of web pages gathered by a robot.

What is a bot? What is a spider? What is a crawler?

Bot is just a shorter and (to some) cooler version of the word robot. Spiders and crawlers are robots too; the names just sound more interesting in the press and within geek circles. For consistency, I will use the term robot throughout this article when referring to spiders, crawlers, and bots.

Are there other…things crawling around?

Oh yes, but these things are way beyond the scope of this article. Well, for you conspiracy theory buffs, let’s see… we have worms, self-replicating programs, webants (or ants), distributed cooperating robots, autonomous agents, intelligent agents, and many other bots and beasts.

How do robots work?

As with all technical things, I believe the only way you will use a technology to its full potential and to your best advantage is if you understand how it works. By how it works, I don't mean the intricate technical details, but the fundamental processes, the general picture.

In general, robots are nothing more than stripped-down versions of web browsers, programmed to automatically navigate and record information about web pages. There are some very specialized robots: some only search blogs, others only index images. Many (Google's GoogleBot among them) see pages much the way one of the first popular browsers, called Lynx, does. Lynx is a pure text browser, which is why, on today's Internet, it is extremely robust and fast. Basically, if you can program, you can take Lynx, modify it, and make a robot.

So how do these things really work? They start with a list of websites and literally begin to "surf" them. They come to your site, read each page, and follow every link they find, while storing information such as the page title, the actual text of the page, and so on.
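To make that loop concrete, here is a minimal sketch of the read-and-follow process, written in Python using only the standard library. The URL is a hypothetical placeholder; a real robot works from a long list of sites and queues every discovered link for a later visit.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class LinkAndTitleParser(HTMLParser):
    """Collects the <title> text and every href found in plain <a> tags."""
    def __init__(self):
        super().__init__()
        self.links, self.title, self._in_title = [], "", False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self.links.extend(value for name, value in attrs
                              if name == "href" and value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

url = "http://example.com/"  # hypothetical starting point
parser = LinkAndTitleParser()
parser.feed(urlopen(url).read().decode("utf-8", errors="replace"))
print(parser.title)  # stored in the index alongside the page text
print(parser.links)  # a real robot would queue and visit each of these
```

Notice that the robot only ever sees the raw HTML: if a link is not written as an ordinary `<a href>` tag, it simply does not exist as far as this loop is concerned.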

Based on the above, what would happen if, instead of your beloved Internet Explorer, Firefox, Opera, or whatever browser you're attached to, you went searching the Internet and downloaded a version of the venerable Lynx browser?

I’ll tell you what would happen, and some will probably accuse me of revealing one of the secrets the corporate SEO community doesn’t want you to know:

You would be able to see your site almost exactly the way a robot sees it. You'd be able to look for errors on your pages and track down navigational dead ends that could prevent a robot from seeing parts of your site.

In plain language, let's say you've created an attractive site. There is an index page, the first page a visitor sees when entering your site. On that page you have the most amazing Flash navigation system, with a huge button that points to your products and services and the rest of the site. If Lynx goes to your index page and doesn't see a standard link, it won't be able to see the rest of your site. Chances are extremely high that many indexing robots won't see it either.

Then you'll understand why your massive site, with one of the most intricate and functional Flash-based navigation systems on the planet, never makes it to the top of the search engines, even after all your efforts to submit it manually everywhere. It's simply because you forgot to add basic hyperlinks. When you submit a site, even manually, all that really happens is that you say to the search engine, "Hey, Mr. Search Engine, when you can find some time, send your trusty robot over to my site."

Folks, robots generally can't use menu navigation made in Flash, JavaScript, PHP, etc., and so they will not be able to reach your pages. It is as simple as that.
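You can demonstrate this to yourself in a few lines. The sketch below is a hypothetical illustration in Python: it parses two versions of the same index page the way a robot would, one whose entire menu is drawn by a script, and one with an ordinary hyperlink added as a fallback.

```python
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    """Records the plain <a href> links a robot could actually follow."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs
                              if name == "href" and value)

# Hypothetical index page whose whole menu is generated by a script:
script_only = "<html><body><script>buildFancyMenu();</script></body></html>"
# The same page with one ordinary hyperlink added as a fallback:
with_fallback = script_only.replace(
    "</body>", '<a href="/products.html">Products</a></body>')

for page in (script_only, with_fallback):
    counter = LinkCounter()
    counter.feed(page)
    print(counter.links)  # [] for the script-only page, then ['/products.html']
```

The script-only page yields an empty list of links: as far as the robot is concerned, your site ends right there.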

How do I get Lynx?

Lynx began as a UNIX application, written at the University of Kansas as part of its campus-wide information system. It then became a gopher client (gopher was a pre-web information retrieval tool), and later a web browser. The official page for Lynx is http://lynx.isc.org; however, unless you are the kind of Linux geek who is used to playing around with binary distribution files and compiling your own applications (don't worry if none of that means anything to you), you may want to find a version that someone else has already packaged for your computer. For example, if you are a PC user running Windows, check out the links to "Win32 compiled versions." At the time of this writing, one such distribution site is http://csant.info/lynx.htm, where you can download a version that installs on Windows machines in a way that will be familiar to most non-expert users.

After installing the browser, you may want to read the documentation. To get you started and spare you the beginner's frustrations: press the G key (as in "go"), type the full URL of the site you want to browse (starting with "http://"), and press Enter. Use the arrow keys to navigate.
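If you would rather script the check than browse interactively, Lynx can also run non-interactively. The sketch below, which assumes the lynx binary is installed and on your PATH and uses a hypothetical URL, combines Lynx's -dump mode (render the page as plain text instead of opening the interactive browser) with -listonly (print only the links it found) to show exactly which links a text-only client can see on a page.

```python
import subprocess

url = "http://example.com/"  # hypothetical page to check

# -dump renders the page to plain text; -listonly restricts the
# output to the numbered list of links Lynx found on that page.
result = subprocess.run(
    ["lynx", "-dump", "-listonly", url],
    capture_output=True, text=True, check=True,
)
print(result.stdout)  # any page of yours missing here is invisible to a robot
```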

In a nutshell, use Lynx to verify that every page on your site is reachable, and let the robots do the rest of the work for you. You'll save yourself a lot of hassle, and maybe some of the money you would otherwise waste advertising an unindexable site.
