Back before Google, people used websites to link to other websites. People would put a link on their website to another website. Right. But it was hard to know what content was on those other websites. That's where crawlers came in.
Crawlers are programs that go from website to website and look at all the content on those websites. A crawler is just a small piece of code that comes and looks at your website. Crawlers don't use a web browser to look at your website. Instead, they read the code of your website to determine what is supposed to be shown.
In this episode, you will learn more about crawling in SEO: the definition, why it matters, how it works, and how to control what is crawled.
Listen to the Episode
Atiba de Souza: Hey, today on Traffic Keys, we're going to go back in time, just a little bit, and talk about web crawlers. Where did they come from, and are they even still important? Are they something that you should even know about and be considering in 2022 and beyond? Hey, everybody. Welcome to Traffic Keys. I am your host, Atiba. And today we are going to get into web crawlers.
Now let's go back in history for a little bit. Okay. So 20 odd years ago, I'm dating myself a little bit here, but 20 odd years ago, this thing started to become popular called the world wide web. Right. And well, a web has spiders, right? Because spiders and webs kind of go together. And what do spiders do? They crawl around their webs.
And so when the world wide web came out, we had all of these spider-themed terms, and some of them have stuck around for a very, very long time. Okay. One of them that stuck around is this term called a crawler. Now, there was a time when they were called spiders. Now, they're called crawlers. And people ask, and people are curious: what are they? Why do they exist?
So let's dive in and really look at web crawlers and where they fit in the SEO kind of landscape. But before we get there, do me a favor. Hit that like and subscribe button for your boy. Alright. Okay. Here we go. So, this predates Google. Even though people now only think about the Googlebot and it crawling around the web, this predates Google. What happened was the web was created and there were links. Just like there are today, there were hyperlinks. People put a link on one website to another website. Right. That existed way back in the very beginning. And there was a challenge, if you will, of how do we know what content is out there? Heck, how do we know what content a particular website even has? Well, that's where crawlers came in.
So, a crawler is nothing more than a small piece of code. Okay. It's just some code that comes and "visits" your website. Now, I say "visits" your website because it doesn't pull up a web browser and load a web browser and take a look at your website and read it. No, it's actually pulling up the code of your website and interpreting what is supposed to be shown.
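To make the "reading the code, not the rendered page" idea concrete, here is a minimal Python sketch. It is not how any particular search engine works. `PageReader`, `read_page`, and `SAMPLE_HTML` are hypothetical names made up for this illustration, and the only library used is Python's standard `html.parser`.

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Collects the title, headings, and link targets from raw HTML code."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.links = []
        self._current = None  # tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1", "h2"):
            self._current = tag
            if tag != "title":
                self.headings.append("")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current in ("h1", "h2"):
            self.headings[-1] += data

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None


def read_page(html):
    """Return what the page's code says it contains; nothing is rendered."""
    reader = PageReader()
    reader.feed(html)
    return {
        "title": reader.title.strip(),
        "headings": [h.strip() for h in reader.headings],
        "links": reader.links,
    }


# Hypothetical page source, standing in for what a crawler would download.
SAMPLE_HTML = """<html><head><title>About Us</title></head>
<body><h1>Who We Are</h1><p>We help with SEO.</p>
<a href="/blog">Blog</a> <a href="/contact">Contact</a></body></html>"""

page = read_page(SAMPLE_HTML)
print(page)
```

The crawler in this sketch never draws anything on a screen. It only pulls the title, headings, and links out of the markup.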
Now, that's a really key point. Okay. Web crawlers crawl your website. They're just pieces of code that determine what is supposed to be shown on your website. They don't actually look at your actual website. But while they're crawling your site, what they're looking for is, number one, the structure of your site and the content of your site. And I shouldn't even say "your site," but the page that they land on. Okay. Be it your homepage or some other page. So they're looking at that particular page and they're trying to figure out, what is it all about? What are you talking about? And does it make sense based on what the crawler believes it knows about this topic? Then it starts to look, guys, at all of the links that are on that particular page.
And it's going to follow those links. In other words, it's going to act like it clicked on the link and went to whatever page you link to and then read those pages. And that's how a crawler grows its knowledge base. By following link, after link, after link, after link, after link. Now, along came Google and Google kind of made crawlers a little bit more, not a little bit, a lot more intelligent.
And Google also started recognizing that, "Hey, sometimes people update content regularly. And so we should visit these websites regularly, not just one time, learning everything and never come back." And they started talking about how frequently they would visit particular sites to learn if there was new content, new links, et cetera, et cetera.
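The link-following behavior described above can be sketched as a breadth-first crawl. This is a simplified illustration, not Google's algorithm. `crawl`, `LinkCollector`, and `FAKE_SITE` are hypothetical names, and the "website" is an in-memory dictionary so the example runs without any network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Gathers the href of every <a> tag in a page's code."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(start_url, fetch, max_pages=100):
    """Follow link after link, breadth first.

    `fetch(url)` must return the page's HTML, or None if the page is missing.
    Returns the pages that were successfully read, in visiting order.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # a dead link: nothing to read here
        visited.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # "Act like it clicked the link": resolve it against the current page.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical four-page site used in place of real HTTP requests.
FAKE_SITE = {
    "https://example.com/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "https://example.com/about": '<a href="/">Home</a>',
    "https://example.com/blog": '<a href="/blog/post-1">Post</a> <a href="/about">About</a>',
    "https://example.com/blog/post-1": '<a href="/missing.pdf">PDF</a>',
}

order = crawl("https://example.com/", FAKE_SITE.get)
print(order)
```

A real crawler would swap `FAKE_SITE.get` for an HTTP request and rerun the crawl on a schedule to pick up new content and links.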
Okay. So, crawlers are critical to the SEO space because, very simply, that's how search engines learn the content that's on your website so they can rank it. That's how they learn who you are and what you're all about. So, crawlers are super important from that standpoint in the SEO space, but crawlers have another major advantage, a major use, I should say, that most people ignore. And that's in the SEO tools. You have tools like Screaming Frog that actually will crawl your website for you and tell you, this is what the Googlebot or other crawlers are seeing. Here are the problems. Here are the broken links. Here are the missing images.
Here is the content that it's seen. And it gives you a view into what the Googlebot or other search engine crawlers see when they come to your website. Now, this is critical. This is huge. And it's critical when we're looking at auditing a website, because we want to know what's there. And how is it working?
Is it effective? Okay. I remember there was a large national site I did last year, and what we learned was 90% of the PDFs that they were linking to on their website didn't exist anymore. They were all bad links. That's bad. That's really, really bad. Okay. Good crawlers, like Screaming Frog, will even tell you how many pages on a particular website link to one particular page.
So you have, let's say, your About Us page. How many other pages link to that page? Or you have your blog page, you know, a particular blog that you wrote that you think should be ranked really well. Well, how many other pages on your site link to that page? Very often a tool like Screaming Frog and other crawlers can answer those types of questions for you.
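The two audit questions above, which links are broken and how many pages link to a given page, can be answered from crawl results with a few lines of Python. This is only a sketch of the idea, not how Screaming Frog works. `audit_links` and `SITE` are hypothetical, and a link counts as "broken" here simply because its target was not found during the crawl, where a real tool would check the HTTP response.

```python
from collections import Counter


def audit_links(link_graph):
    """`link_graph` maps each crawled page to the list of URLs it links to.

    Returns (inbound, broken): how many distinct pages link to each crawled
    page, and the sorted link targets that were never found in the crawl.
    """
    inbound = Counter()
    broken = set()
    for page, targets in link_graph.items():
        for target in set(targets):  # count each linking page only once
            if target == page:
                continue  # ignore a page linking to itself
            if target in link_graph:
                inbound[target] += 1
            else:
                broken.add(target)
    return inbound, sorted(broken)


# Hypothetical crawl result for a three-page site.
SITE = {
    "/": ["/about", "/blog", "/files/guide.pdf"],
    "/about": ["/", "/blog"],
    "/blog": ["/about", "/files/old-report.pdf"],
}

inbound, broken = audit_links(SITE)
print("Pages linking to /about:", inbound["/about"])
print("Broken links:", broken)
```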
And those are important questions to answer, because those are the questions that Google's asking too. Now, you're probably going to ask at some point the question that everybody does, which is, "Okay. So, is there a way I can control the crawler? Like, does it have to see everything? How does that work?
Right? You said there are links on my pages, so it just follows all links?" Well, yes and no. Yes and no. Number one, and let's start here from a very technical perspective, there are two files. That's four. There are two files that you need to have on your web server for your website. The first one is the robots.txt.
So it's a text file, the robots.txt file. And the other one is your sitemap.xml. Okay. Sitemap.xml. So the sitemap.xml, number one: when a crawler comes to your site, it's going to look for your sitemap.xml, because the sitemap.xml is your opportunity to tell the crawler, "Here are the pages and the hierarchy and structure of my site that I believe are important."
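For a sense of what a sitemap.xml contains, here is a minimal Python sketch that builds one from a list of page URLs. It follows the public sitemaps.org format (`urlset`, `url`, `loc`). `build_sitemap` and the example.com URLs are made up for this illustration, and most CMS tools or plugins will generate the file for you.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML text listing the pages you consider important."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical list of pages for an example site.
xml_text = build_sitemap([
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/blog/post-1",
])
print(xml_text)
```

The output would be saved as `sitemap.xml` at the root of the site, usually with an XML declaration line at the top.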
Okay. So this is your opportunity to tell the crawler what you believe is important. Now, is that all the crawler is going to see? Nope. It will read that, it will go to those pages, but then it will go anywhere else it wants. Unless, unless you add specific lines of code (you can look those up) to your robots.txt file.
So, for example, say your website has an area where people can log in, and then there's a back office too. It's like, do you want that indexed? Of course not. So you can add that into your robots.txt file and tell the crawler, "Don't go to any pages in this subdirectory," or "Don't go to any pages that start with this name or this file name," or what have you.
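As a rough illustration of those "specific lines of code," here is a small robots.txt and a Python check of how a well-behaved crawler would interpret it, using the standard library's `urllib.robotparser`. The paths `/back-office/` and `/login`, the example.com domain, and the helper `allowed` are all hypothetical, so substitute whatever your own site uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler out of the back office and
# the login page, and point crawlers at the sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /back-office/
Disallow: /login

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, user_agent="Googlebot"):
    """True if the rules above let this crawler fetch the given path."""
    return parser.can_fetch(user_agent, "https://example.com" + path)


for path in ["/blog", "/back-office/orders", "/login"]:
    print(path, "->", "crawl" if allowed(path) else "skip")
```

Each `Disallow` line matches by prefix, so `/back-office/` covers every page inside that subdirectory.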
Okay. So, those are the two files that you must know when you want to consider where you want the crawler to go and what you want to tell the crawler is important on your site: the robots.txt, and then the sitemap.xml. Okay. So that's how you control the crawler. Now, this very, very next step is vitally important, especially if you're trying to rank for SEO in Google.
Now, if you're doing it on some other network and that's what you care about, fine, but if you want to rank well on Google, this very next step is crucial. Okay. In your Google Search Console, you can submit your site's sitemap.xml. Now, the beauty of submitting your sitemap.xml to the Google Search Console is that when you do that, you're saying to Google, "Hey, I'd like you to come crawl my site. There's stuff here that you don't know about. I would like you to crawl my site." Now, depending on your website and how your website is structured, whether you're a WordPress site or a Shopify site, or, you know, whatever your CMS is, is what I should have said, your CMS for managing your website, each one of them has different tools. Sometimes native, sometimes there are plugins that you can add on that will help you create your sitemap.xml and submit it to Google Search Console. So check out whatever CMS you're using and the best practices there for how to create that sitemap.xml and submit it.
But it's important that you submit it, especially the first time. Go ahead and, after it's created the very first time, take that URL, which is usually yourdomain.com, or dot whatever you are, /sitemap.xml. And then go to the Google Search Console and you can submit that. And that will kick off the process of you saying to Google, "Hey, send the Googlebot my way. I want you to come index me, send the Googlebot my way." Okay. Especially with a brand new website, super important to do. Alright, everybody. I hope this gave you a little bit of a window into crawlers and why they're important and how they're still important today in 2022 and beyond for your SEO. As always, if you have any questions, drop them down below.
I am Atiba. I'm here for you. I'm here to answer any of your questions about SEO that you may have. I will see you later. Bye.