Chapter 2: How Search Engines Work – Crawling, Indexing, and Ranking
2xx status codes: A class of status codes that indicate the request for a page has succeeded.
4xx status codes: A class of status codes that indicate the request for a page resulted in an error.
5xx status codes: A class of status codes that indicate the server’s inability to perform the request.
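As a rough illustration of the three classes defined above, the following Python sketch maps a numeric status code to its class. The function name `status_class` and the returned labels are hypothetical, chosen only for this example.

```python
def status_class(code: int) -> str:
    """Return a short label for the class an HTTP status code belongs to."""
    if 200 <= code <= 299:
        return "2xx: success"
    if 400 <= code <= 499:
        return "4xx: error"
    if 500 <= code <= 599:
        return "5xx: server could not perform the request"
    # Classes not covered by this glossary (for example 3xx) fall through here.
    return "other"


for code in (200, 404, 503):
    print(code, status_class(code))
```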
Advanced search operators: Special characters and commands you can type into the search bar to further specify your query.
Algorithms: Processes or formulas by which stored information is retrieved and ordered in meaningful ways.
Backlinks: Also called “inbound links,” these are links from other websites that point to your website.
Bots: Also known as “crawlers” or “spiders,” these are what scour the Internet to find content.
Caching: Storing a saved version of your web page.
Caffeine: Google’s web indexing system. Caffeine is the index, or collection of web content, whereas Googlebot is the crawler that goes out and finds the content.
Citations: Also known as a “business listing,” a citation is a web-based reference to a local business’ name, address, and phone number (NAP).
Cloaking: Showing different content to search engines than you show to human visitors.
Crawl budget: The average number of pages a search engine bot will crawl on your site.
Crawler directives: Instructions to the crawler regarding what you want it to crawl and index on your site.
Distance: In the context of the local pack, distance refers to proximity, or the location of the searcher and/or the location specified in the query.
Engagement: Data that represents how searchers interact with your site from search results.
Google Quality Guidelines: Published guidelines from Google detailing tactics that are forbidden because they are malicious and/or intended to manipulate search results.
Google Search Console: A free program provided by Google that allows site owners to monitor how their site is doing in search.
HTML: Hypertext Markup Language is the language used to create web pages.
Index Coverage report: A report in Google Search Console that shows you the indexation status of your site’s pages.
Index: A huge database of all the content search engine crawlers have discovered and deem good enough to serve up to searchers.
Internal links: Links on your site that point to other pages on the same site.
Login forms: Refers to pages that require login authentication before a visitor can access the content.
Manual penalty: Refers to a Google “Manual Action” where a human reviewer has determined certain pages on your site violate Google’s quality guidelines.
Meta robots tag: Pieces of code that provide crawlers instructions for how to crawl or index web page content.
Navigation: A list of links that help visitors navigate to other pages on your site. Often, these appear in a list at the top of your website (“top navigation”), on the side column of your website (“side navigation”), or at the bottom of your website (“footer navigation”).
NoIndex tag: A meta tag that instructs a search engine not to index the page it’s on.
PageRank: A component of Google’s core algorithm. It is a link analysis program that estimates the importance of a web page by measuring the quality and quantity of links pointing to it.
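To make the idea of estimating importance from links more concrete, here is a minimal Python sketch of the simplified, textbook form of PageRank computed by repeated iteration. It is not Google's production system. The damping factor of 0.85, the iteration count, the function name `pagerank`, and the three-page link graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance from a dict mapping each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal importance

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page site used only for this example.
graph = {"home": ["about", "blog"], "about": ["home"], "blog": ["home", "about"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, "home" receives links from both other pages and ends up with the highest score, while "blog" is linked from only one page and ends up with the lowest.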
Personalization: Refers to the way a search engine will modify a person’s results based on factors unique to them, such as their location and search history.
Prominence: In the context of the local pack, prominence refers to businesses that are well-known and well-liked in the real world.
RankBrain: The machine learning component of Google’s core algorithm that adjusts ranking by promoting the most relevant, helpful results.
Relevance: In the context of the local pack, relevance is how well a local business matches what the searcher is looking for.
Robots.txt: Files that suggest which parts of your site search engines should and shouldn’t crawl.
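A short sketch of how a crawler might consult a robots.txt file, using Python's standard `urllib.robotparser` module. The file contents, the bot name `ExampleBot`, and the example.com URLs are made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content asking all crawlers to stay out of /admin/.
robots_txt = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

blocked = parser.can_fetch("ExampleBot", "https://example.com/admin/settings")
allowed = parser.can_fetch("ExampleBot", "https://example.com/blog/")
print(blocked, allowed)
```

Here `can_fetch` reports whether the rules permit crawling a URL. Honoring that answer is up to the crawler, which is why the definition above says robots.txt "suggests" rather than enforces.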
Search forms: Refers to search functions or search bars on a website that help users find pages on that website.
Search Quality Rater Guidelines: Guidelines that human raters who work for Google use to determine the quality of real web pages.
Sitemap: A list of URLs on your site that crawlers can use to discover and index your content.
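As one possible illustration, the sketch below builds a small XML sitemap with Python's standard library. The helper name `build_sitemap` and the example.com URLs are hypothetical; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string listing each URL in a <url><loc> entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap(["https://example.com/", "https://example.com/blog/"])
print(sitemap_xml)
```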
Spammy tactics: Like “black hat,” spammy tactics are those that violate search engine quality guidelines.
URL folders: Sections of a website occurring after the TLD (“.com”), separated by slashes (“/”).
URL parameters: Information following a question mark that is appended to a URL to change the page’s content (active parameter) or track information (passive parameter).
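The two URL terms above can be seen by splitting a URL with Python's standard `urllib.parse` module. The URL is invented for this example; treating `color` as an active parameter and `utm_source` as a passive (tracking) parameter is an assumption about how this hypothetical site behaves.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://example.com/shop/shoes/?color=red&utm_source=newsletter"
parts = urlsplit(url)

# URL folders: the slash-separated sections after the domain.
folders = [segment for segment in parts.path.split("/") if segment]

# URL parameters: the key=value pairs following the question mark.
params = parse_qs(parts.query)

print(folders)
print(params)
```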
X-robots-tag: Like meta robots tags, this HTTP response header provides crawlers instructions for how to crawl or index web page content.
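The meta robots tag, NoIndex tag, and X-robots-tag entries all describe the same kind of instruction delivered in different places. The sketch below, which assumes a simplified reading of those directives, checks both the page's HTML and its response headers for `noindex`. The class name `RobotsMetaParser`, the function `is_noindex`, and the sample inputs are hypothetical, and the headers are assumed to arrive as a plain dictionary keyed by `X-Robots-Tag`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def is_noindex(html, headers):
    """Return True if the meta robots tag or the X-Robots-Tag header says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = headers.get("X-Robots-Tag", "")
    header_directives = {
        part.strip().lower() for part in header_value.split(",") if part.strip()
    }
    return "noindex" in (parser.directives | header_directives)


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(is_noindex(page, {}))
print(is_noindex("<html></html>", {"X-Robots-Tag": "noindex"}))
print(is_noindex("<html></html>", {}))
```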