SEO has become extremely complicated and technical over the years. I’ve heard that organic search went from roughly 200 ranking factors to over 500, possibly more. But that’s speculation – only a few ranking signals were ever officially confirmed by Google. Some were “discovered” in studies, but most are based on assumptions or anecdotes. That creates too much room for uncertainty, speculation, and outright wrong information.

Deep Dive: (for beginners) what is an SEO ranking factor?
An SEO ranking factor is a signal Google uses to rank pages in Google Search.

Google applies “ranking signals” to its index of web documents to return the most relevant results when a user performs a search. It’s important to distinguish between indexing and ranking. Google builds an index of pages by using hyperlinks to crawl the web. Ranking doesn’t happen in this step. Many people think that when Google cannot properly index a page, say because it relies on JavaScript Google cannot render, that is a ranking factor. It’s not: it’s an indexing problem.

Ranking signals take lots of different parameters on and off a web document into account: content, links, structure, etc. Our goal as SEOs is to figure out what ranking factors Google uses, so that we can optimize sites to rank higher in Organic Search.

We need more clarity about what we do know and what we don’t know in SEO to improve our credibility, have better conversations and achieve better results. Google’s use of Machine Learning is already making it harder to understand ranking signals and algorithm updates. It will not get easier and speculation only adds to the noise.

Instead of analogy, we need to reason from first principles.

HOW WE DISCOVER RANKING FACTORS IN SEO

“What ranking factors do we know for certain to be true?” is not a simple question. Google is a black box and won’t tell us the secrets of its $100 billion algorithm [13]. It’s often impossible to create laboratory conditions in which we can isolate a single factor and measure its impact on rank (people have tried [27]). On top of that, ranking factors aren’t as “clear” as they used to be. They have changed a lot over time and now even seem to be weighted differently depending on the query. Yet other systems of a similar nature have been reverse engineered. It’s not impossible.

To advance our understanding, we can draw evidence from 7 sources:

  1. Google’s blog
  2. Public statements by Googlers, e.g. on Twitter, in presentations or in interviews
  3. Ranking factor studies/analyses
  4. The Google Quality Rater Guidelines
  5. Google’s basic SEO guide
  6. Patents Google registered or acquired
  7. Anecdotes (people running tests and drawing conclusions)

None of these sources is perfect, but in combination, they give us the best picture possible. There’s always an angle from which to attack the question. For example, officially confirmed signals still don’t tell us how they’re weighted in the sum of all signals. Statements on Twitter are often very broad. And we even see data that conflicts with some things Google says. But we have to work with what we have ¯\_(ツ)_/¯.

ESTABLISHING FIRST PRINCIPLES OF SEO

First principles are the smallest building blocks; the things and laws we know to be true. Establishing first principles comes with three constraints. First, we have to distinguish between direct and indirect impact. Optimized meta descriptions can positively impact organic traffic, but don’t have a direct impact on rank. Second, the questions “how much” and “in which case” are significant. Not every ranking factor applies to every query in the same way. For example, QDF (“query deserves freshness”) and HTTPS apply only to certain keywords. Third, we have to distinguish between positive and negative ranking factors (for example, 404 errors or “thin content”).

What’s the overarching goal I’m trying to achieve with this article? The goal is to sharpen our sense of proven truths in times of uncertainty. Google’s increasing usage of machine learning makes it harder than ever to understand the algorithm(s). But, by going back to the basics, we should be able to focus on results over speculative minutiae.

OFFICIALLY CONFIRMED RANKING FACTORS

We can put ranking signals into three groups:

  1. Officially confirmed by Google
  2. Discovered through analysis
  3. Speculated

I’m covering only confirmed and discovered signals in this article; covering speculated signals would only amplify the noise.

The order in which the ranking factors are mentioned reflects my personal understanding of their significance. I consider content the most important signal on this list and E-A-T the least important. However, none of the signals are unimportant.

  1. Content
  2. External and internal links
  3. User Intent
  4. CTR
  5. User Experience
  6. Title tag
  7. Page speed
  8. Freshness
  9. E-A-T
  10. SSL encryption

RANKING SIGNAL 1: CONTENT

Returning the most relevant search results is the goal of every search engine. The roll-out of Hummingbird in 2013 was a milestone in getting closer to that goal: Google switched its focus to entities and their relationships, which made it significantly better at understanding context and relevance.

In the early days of search, it was enough to mention a keyword many times on a page to be relevant. Now, content needs to be highly relevant to the query, have informational depth, answer all questions about a topic, and match user intent. So “content as a ranking factor” means the length, depth, and relevance of body content for the targeted query.

Deep Dive: the nuance of content

Content is not only text; it’s also images, videos, GIFs, and more. All these elements play together (more under “User Intent”). Ranking in Google’s image search is not the only benefit of optimizing images. Adding a descriptive alt attribute and file name increases the relevance of your content, especially for search queries that demand more visual results, like “star wars wallpaper”.
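
As a practical illustration, here’s a minimal stdlib-only Python sketch for auditing a page for images that lack descriptive alt text. The `audit_images` helper is my own construction for illustration, not a Google tool.

```python
from html.parser import HTMLParser

class AltAuditParser(HTMLParser):
    """Collects the src of every <img> that has no (or an empty) alt attribute."""
    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            # Treat a missing or whitespace-only alt as "needs work".
            if not (attr_map.get("alt") or "").strip():
                self.missing_alt.append(attr_map.get("src", "(no src)"))

def audit_images(html: str) -> list:
    """Return the image sources on a page that still need descriptive alt text."""
    parser = AltAuditParser()
    parser.feed(html)
    return parser.missing_alt
```

Running this over rendered page HTML gives a quick to-do list of images that are invisible to image search.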

There’s also a difference between main content and supplementary content, i.e. text in the footer, header or parts of the site other than “the body”. It’s easy to see that the topic of “content” is very nuanced, but I’m trying to keep it high-level here.

Lastly, “pruning” low-quality content has been shown to be effective many times. The idea is to decrease the amount of low-quality content on a domain by either improving it or getting rid of it (noindex, 404, or redirecting). This indicates that Google measures content quality on a domain level, at least to a degree. Note that this is not an official ranking factor, but John Mueller addressed the topic in a Webmaster Hangout, saying:

So in general when it comes to low quality content, that’s something where we see your website is providing something but it’s not really that fantastic. And there are two approaches to actually tackling this. On the one hand you can improve your content and from my point of view if you can improve your content that’s probably the the best approach possible because then you have something really useful on your website you’re providing something useful for the web in general. […] cleaning up can be done with no index with a 404 kind of whatever you like to do that.
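
To make the improve-or-remove decision concrete, here is a hedged Python sketch of a pruning triage. The thresholds (300 words, zero organic visits) and the field names are illustrative assumptions of mine, not anything Google has published.

```python
def pruning_action(page: dict) -> str:
    """Suggest one of the clean-up options John Mueller mentions:
    improve, redirect, or noindex/404. All thresholds are assumptions."""
    # Pages with some substance and traffic are left alone.
    if page["word_count"] >= 300 and page["monthly_organic_visits"] > 0:
        return "keep"
    # Thin but on a topic worth covering: improving is the best option.
    if page["topic_still_relevant"]:
        return "improve"
    # No longer relevant but has backlinks: 301 to preserve link equity.
    if page["has_backlinks"]:
        return "redirect"
    # No value, no links: drop it from the index entirely.
    return "noindex_or_404"
```

The point is the order of the checks: improvement first, removal only when a page has neither relevance nor links.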

How do we know this to be true?

Google SEO Starter Guide

Presentations:

Interviews:

Articles:

RANKING SIGNAL 2: EXTERNAL AND INTERNAL LINKS

Links still have a decent influence on rankings, but ranking factor studies and Google statements have shown their decline over time. They still play a role in the ranking and indexation of web documents. And, like “content” as a ranking signal, backlinks are nuanced. Their quality depends on many factors, such as anchor text, the strength of the link source, and matching content relevance between link source and target.

Internal links are powerful ranking signals, too. They pass link equity from page to page. Internal anchor text helps Google understand the topic and context of content, just as the anchor text of external backlinks does. As early as 2008, Google recommended to “keep important pages within several clicks from the homepage“. So URL structure has a positive impact on rankings because it’s an indicator of a clear hierarchy of information (site taxonomy). URL optimization revolves around clean, descriptive directory structures without duplicates or parameters.
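
“Several clicks from the homepage” can be checked directly: a breadth-first search over your internal-link graph gives each page’s minimum click depth. A minimal Python sketch (the dict-based graph format is my own assumption):

```python
from collections import deque

def click_depths(links: dict, homepage: str = "/") -> dict:
    """Breadth-first search over an internal-link graph.
    links maps each URL to the URLs it links to; the result maps every
    reachable URL to its minimum number of clicks from the homepage."""
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit = shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths
```

Pages that end up more than a few clicks deep (or missing from the result entirely, i.e. orphaned) are candidates for better internal linking.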

Deep Dive: age as a quality indicator for links (and content)

I want to call out a patent invented by Matt Cutts (some might remember him) and Jeff Dean (Google’s current head of AI), amongst others. It describes the use of historical information in ranking, but I want to narrow in on the factor of document age and its impact on the quality of a link. A rapid spike in the number of backlinks might indicate a spam attempt, or it might be fine, depending on how old a page/site is.

In implementations consistent with the principles of the invention, the history data may include data relating to: document inception dates; document content updates/changes; query analysis; link-based criteria; anchor text (e.g., the text in which a hyperlink is embedded, typically underlined or otherwise highlighted in a document); traffic; user behavior; domain-related information; ranking history; user maintained/generated data (e.g., bookmarks); unique words, bigrams, and phrases in anchor text; linkage of independent peers; and/or document topics.” [40]

The patent contains all kinds of interesting hints, so give it a read when you have time.

How do we know this to be true?

Patents:

Interviews:

Google SEO Starter Guide [2]

Articles

RANKING SIGNAL 3: USER INTENT

I’ve written about the different types of user intent and how to identify them for a large set of queries in “User Intent mapping on steroids”:

“User intent” is the goal a user is trying to achieve when searching online. Old-school SEO distinguished between “transactional”, “navigational”, and “informational” user intent: people either want to buy, visit a specific page, or find out more about a topic.

That hasn’t changed dramatically, but in the 2017 version of its quality rater guidelines, Google distinguishes between four intents:
– Know
– Do
– Website
– Visit-in-person

Content relevance and user intent are closely related, but not the same. First, if user intent isn’t met, a page won’t rank, whereas content relevance exists on a spectrum. For example, a blog article cannot rank for a query that demands listings, say for jobs or real estate. Or, when you search for “Sushi”, you get local search results: Google understands that, in this case, more users are looking for restaurants than for an explanation or definition. For some queries, images are a better format than text, for example “tattoo inspiration”. In that case, you want to create an image gallery to rank well, not an essay.

RankBrain is the engine behind user intent understanding and the third strongest ranking signal according to Google:

Of the hundreds of “signals” Google search uses when it calculates its rankings (a signal might be the user’s geographical location, or whether the headline on a page matches the text in the query), RankBrain is now rated as the third most useful.

It’s described as assessing “how well a document in the ranking matches a query” (Jeff Dean, head of AI at Google, in a 2016 Wired article [31]).

How do we know this to be true?
Presentations:

Interviews:

Articles:

RANKING SIGNAL 4: CLICK-THROUGH RATE

Click-through rate is the ratio between clicks and impressions in the Google search results. It’s affected by factors like ranking position, the title, and the snippet shown in the SERP.

The exact usage of CTR in ranking is not 100% clear; it often falls between the cracks of general feedback mechanisms in search. The open questions are how strong CTR is compared to other signals, whether it affects rankings in real time (unlikely), and whether there is an accumulation period. Even though Google is unclear about its usage, two papers show strong evidence for Google using CTR to rank pages.

There’s also evidence that Google is able to distinguish between more than just long and short clicks: “[…] rather than simply distinguishing long clicks from short clicks, a wider range of click-through viewing times can be included in the assessment of result quality, where longer viewing times in the range are given more weight than shorter viewing times.” [15]
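
The two ideas above, CTR as a simple ratio and viewing time as a weight on each click, can be sketched in a few lines of Python. The linear weighting and the 180-second cap are my own illustrative assumptions; the quoted patent discloses no concrete formula.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks divided by impressions."""
    return clicks / impressions if impressions else 0.0

def weighted_click_score(viewing_times: list, cap: float = 180.0) -> float:
    """Weight each click by its viewing time (in seconds) instead of a
    binary long/short split, as the patent language suggests.
    The linear weighting and the cap are assumptions for illustration."""
    if not viewing_times:
        return 0.0
    weights = [min(t, cap) / cap for t in viewing_times]
    return sum(weights) / len(weights)
```

Under this sketch, a result that earns many short visits scores lower than one with fewer but longer visits, even at the same raw CTR.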

How do we know this to be true?

Patents:

Presentations:

RANKING SIGNAL 5: USER EXPERIENCE

User experience is one of the blurriest ranking signals of all because it’s so hard to define and overlaps with many other signals. It could entail all touch points a user has with a company, but that’s impossible for a search engine to measure. It’s too soft. Instead, we need to look at hard factors:

A page is accessible when it loads completely, quickly, and without issues. One way to optimize for this particular case is to provide image dimensions to avoid the “jump” when a page loads. Ad pressure and the invasiveness of ads fit into this bucket as well.

Compatibility with different devices, search functionality, and 404 errors are indicators for usability.

What most people have in mind when thinking of “user experience” is design, and it does carry some importance. For example, if a site looks spammy, users bounce, which can have implications for rankings. Important factors for “design” are how easy it is for users to find and consume information and how trustworthy the experience looks. The latter plays into the next signal: E-A-T.

Good indicators for User Experience are user signals (bounce rate, dwell time, pages/visit) and engagement signals (social shares, scroll depth).

How do we know this to be true?

Articles:

RANKING SIGNAL 6: TITLE TAG

The title tag has been one of the stronger ranking signals from the beginning. It’s a strong indicator of relevance and affects CTR. Having the keyword in the title is still a requirement to rank, even though Google understands the context of queries. Google looks at “[…] how often and where those keywords appear on a page, whether in titles or headings or in the body of the text.”
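
The two properties named here, keyword presence and relevance to CTR, lend themselves to a quick automated check. A small Python sketch; the 60-character budget is a common rule of thumb for avoiding truncation in the SERP, not an official Google number:

```python
def title_check(title: str, keyword: str, max_chars: int = 60) -> dict:
    """Sanity-check a title tag: does it contain the target keyword,
    and is it short enough to avoid truncation in the SERP?
    The 60-character budget is a rule of thumb, not a Google spec."""
    return {
        "contains_keyword": keyword.lower() in title.lower(),
        "within_length": len(title) <= max_chars,
    }
```

Run over a crawl export, a check like this surfaces pages whose titles miss the target keyword or are likely to be cut off.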

How do we know this to be true?

Articles:

RANKING SIGNAL 7: PAGE SPEED

Google confirmed page speed to have an impact on rank for the first time in 2010 [22] and for the second time in 2018 [21]. The former relates to desktop devices; the latter refers to mobile search (to no one’s surprise).

Ten years ago, page speed was a simple metric. Nowadays, websites have become much more sophisticated, and we need to measure several metrics to get a good understanding. Page speed tools like Google’s PageSpeed Insights and WebPageTest recommend “Speed Index” as a unifying metric, alongside metrics like TTFB (time to first byte), time to first paint, time to first meaningful paint, and DOMContentLoaded.
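
One practical way to work with several metrics at once is a performance budget: a target per metric, plus a report of whatever exceeds it. The budget numbers below are illustrative assumptions of mine, not official thresholds from Google or any tool:

```python
# Illustrative budgets in milliseconds -- assumptions, not official thresholds.
BUDGETS_MS = {
    "ttfb": 600,                    # time to first byte
    "first_paint": 1000,
    "first_meaningful_paint": 2000,
    "dom_content_loaded": 3000,
}

def over_budget(measured: dict) -> dict:
    """Return only the measured metrics (in ms) that exceed their budget,
    i.e. the first candidates for optimization work."""
    return {metric: value for metric, value in measured.items()
            if metric in BUDGETS_MS and value > BUDGETS_MS[metric]}
```

Feeding in lab measurements then yields a short, prioritized list instead of a wall of numbers.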

How do we know this to be true?

Articles:

RANKING SIGNAL 8: FRESHNESS AND QDF

Fresh results are a top goal of search engines, after relevance. As mentioned in the Google SEO Starter guide:

Traditional search evaluation has focused on the relevance of the results, and of course that is our highest priority as well. But today’s search-engine users expect more than just relevance. Are the results fresh and timely?

“Freshness” in search got a push when Google introduced its new indexing system, “Caffeine”, in 2010. [37] It allowed Google to index (new) pages in a matter of seconds and paved the way for assigning a query “freshness”: a higher relevance for timeliness. The query “Bitcoin” is highly sensitive to news these days, for example, while that wasn’t the case two years ago.

“Query deserves freshness” (QDF) is the ranking signal Amit Singhal, former head of search at Google, talked about as early as 2007: “The QDF solution revolves around determining whether a topic is “hot”. If news sites or blog posts are actively writing about a topic, the model figures that it is one for which users are more likely to want current information. The model also examines Google’s own stream of billions of search queries.” [38]

The difference between “Freshness” and QDF is that the latter measures spiking search volume to indicate whether a query is “hot”. It ranks newer content higher and shows more news integrations in the SERPs as a result. The former refers to keeping content up to date by adding new facts or findings. Search engines always want to return content that’s as up to date as possible, but that’s not the same as a query that suddenly has a high interest. The two vary in intensity.
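
The “spiking search volume” part of QDF can be illustrated with a simple baseline comparison in Python. The 28-day window and the 3x spike factor are purely my own assumptions; neither the patent nor Singhal’s description discloses concrete values.

```python
from statistics import mean

def deserves_freshness(daily_volume: list, window: int = 28,
                       spike_factor: float = 3.0) -> bool:
    """Flag a query as 'hot' when today's search volume spikes well above
    its recent baseline. window and spike_factor are illustrative only."""
    if len(daily_volume) <= window:
        return False                  # not enough history for a baseline
    baseline = mean(daily_volume[-window - 1:-1])
    today = daily_volume[-1]
    return baseline > 0 and today >= spike_factor * baseline
```

A query with flat volume never trips the flag, while a sudden jump over the rolling baseline does, which is exactly the distinction between ordinary freshness and QDF.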

How do we know this to be true?

Patents:

Articles:

Videos:

RANKING SIGNAL 9: E-A-T (EXPERTISE, AUTHORITY, TRUSTWORTHINESS)

E-A-T (“expertise, authority, trustworthiness”) is another broad signal, like user experience. To optimize for E-A-T, you need to add information to your site that helps Google understand whether you’re an authority, for example by adding an “about” page or providing a correct and full address. Your content needs to live up to the required expertise in quality and length. Writing about rocket science sounds and looks a lot different than writing about rap (no judgment). Google will also look at recommendations and endorsements from other, neutral sites. Yes, that also includes links from highly authoritative sites like Wikipedia.

E-A-T includes factors like domain age, reputation, reviews, and ratings. Some of us might remember the days of rel=author, an attempt of Google to measure the expertise of people for specific topics. Google retired authorship, but the idea is the same.

How do we know this to be true?

Articles:

Presentations:

Articles:

RANKING SIGNAL 10: SSL ENCRYPTION

Google confirmed SSL as a ranking signal in 2014, after migrating to HTTPS itself two years earlier. Once again, the question is, and was, how much that signal applies. When Google rolled it out, HTTPS affected about 1% of queries and seemed to carry less weight than content:

For now it’s only a very lightweight signal—affecting fewer than 1% of global queries, and carrying less weight than other signals such as high-quality content—while we give webmasters time to switch to HTTPS. But over time, we may decide to strengthen it, because we’d like to encourage all website owners to switch from HTTP to HTTPS to keep everyone safe on the web.

Encryption is more important in industries like insurance, finance, and e-commerce than in others. It’s also more applicable to the check-out/login part of a site than to the blog, for example. Google seems to give certain queries and parts of a website a higher relevance for HTTPS. That doesn’t make HTTPS unimportant in other cases: Google often emphasizes the benefits of HTTPS for general security.
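
One concrete, checkable piece of an HTTPS migration is mixed content: resources still referenced over plain http. A stdlib-only Python sketch for flagging them (the helper is my own construction, not a Google tool):

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class InsecureRefParser(HTMLParser):
    """Collects href/src values that still use the plain-http scheme."""
    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Relative URLs have no scheme and inherit the page's protocol.
            if name in ("href", "src") and value and urlparse(value).scheme == "http":
                self.insecure.append(value)

def find_insecure_refs(html: str) -> list:
    """Return all http:// references in a page, i.e. mixed content that
    undermines an otherwise complete HTTPS migration."""
    parser = InsecureRefParser()
    parser.feed(html)
    return parser.insecure
```

Browsers warn on (or block) such references even when the page itself is served over HTTPS, so cleaning them up is part of making the migration stick.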

How do we know this to be true?

Articles:

RANKING FACTOR STUDY “META-ANALYSIS”

The strongest evidence in scientific research comes from a meta-analysis. It looks at the data from many different studies on the same topic to form a holistic view. I conducted a “pseudo ranking factor study meta-analysis”, in which I compared the results of 7 studies from the last 2 years by Searchmetrics, SEMrush, and Backlinko*. It’s “pseudo” because I couldn’t get insight into the raw data of the studies, so all the scientists in the audience can calm down ;-). (If any ranking factor study provider wants to grant me access, I’m all ears!)

On the chart, you see the top 10 ranking factors from each study. I grouped them into five bigger fields (colored), so we can see the overlaps.

(links = orange, content = yellow, user behavior = blue, social = green, technical = gray)

When we look at the ranking factors across different studies – I don’t think anyone has done that before – we see one thing above all: a big mess. On second look, I see a slight dominance of content relevance and length, paired with user behavior. Backlinks seem to live on the lower end of the top 10.

When it comes to backlinks, the sheer number of links and linking domains still seems to be the most prominent factor.

We can debate the meaningfulness and interpretation of ranking factor studies for the SEO industry, but I’m always open to learning from large sets of data. This little analysis merely helps to see the bigger picture.

*More caveats: also note the timeliness of the studies. Ranking factors seem to have changed (or adapted?) faster in the last couple of months. Lastly, some studies focused on broad keyword sets while others looked at specific industries. That makes them comparable only to a degree.

ORGANIC SEARCH IS A NON-LINEAR SYSTEM

Organic Search is a non-linear system, meaning the whole is greater than the sum of its parts. Some factors seem to compound, others seem to be driven by thresholds. Having great content, links, and user experience seems to have a stronger effect than each factor added in isolation. Google also seems to measure negative factors against thresholds: a few 404s won’t hurt, but after a certain percentage, Google seems to reinforce negative consequences. I only have observational evidence for this, so I’m curious about your experience!

Fact is, we don’t know the exact relationship between the ranking factors. And if there really are 200 (or more) ranking factors, we must admit that most are unknown to us. That doesn’t mean we cannot speak about them or run experiments, but we must be honest about what we know and what we don’t.

But even without that knowledge, we can focus on the parts we know make a difference – on the first principles of SEO:

  1. Content
  2. External and internal links
  3. User Intent
  4. CTR
  5. User Experience
  6. Title tag
  7. Page speed
  8. Freshness
  9. E-A-T
  10. SSL encryption

You can never do the basics well enough.

References

  1. https://www.google.com/search/howsearchworks/crawling-indexing/
  2. https://support.google.com/webmasters/answer/7451184?hl=en
  3. https://backlinko.com/search-engine-ranking
  4. https://backlinko.com/google-ranking-factors
  5. https://www.semrush.com/ranking-factors/
  6. https://www.searchmetrics.com/knowledge-base/ranking-factors/
  7. https://www.searchmetrics.com/knowledge-base/ranking-factors-finance/
  8. https://www.searchmetrics.com/knowledge-base/ranking-factors-travel/
  9. https://www.searchmetrics.com/knowledge-base/ranking-factors-media/
  10. https://www.searchmetrics.com/knowledge-base/ranking-factors-health/
  11. https://abc.xyz/investor/pdf/2017Q4_alphabet_earnings_release.pdf
  12. http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf
  13. https://patentscope.wipo.int/search/en/detail.jsf?docId=US146703304
  14. http://www.thesempost.com/how-google-uses-clicks-in-search-results-according-to-google/
  15. https://googleblog.blogspot.com/2008/07/technologies-behind-google-ranking.html
  16. https://www.slideshare.net/SearchMarketingExpo/how-google-works-a-ranking-engineers-perspective-by-paul-haahr
  17. https://googleblog.blogspot.com/2011/11/giving-you-fresher-more-recent-search.html
  18. http://webpromo.expert/google-qa-march/
  19. https://webmasters.googleblog.com/2018/01/using-page-speed-in-mobile-search.html
  20. https://webmasters.googleblog.com/2010/04/using-site-speed-in-web-search-ranking.html
  21. https://searchengineland.com/faq-all-about-the-new-google-rankbrain-algorithm-234440
  22. https://www.youtube.com/watch?v=muSIzHurn4U
  23. https://security.googleblog.com/2014/08/https-as-ranking-signal_6.html
  24. https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45562.pdf
  25. https://engineering.purdue.edu/~ychu/publications/wi10_google.pdf
  26. http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=9,659,064.PN.&OS=PN/9,659,064&RS=PN/9,659,064
  27. http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=1&f=G&l=50&d=PALL&S1=08484194&OS=PN/08484194&RS=PN/08484194
  28. https://www.google.com/insidesearch/howsearchworks/assets/searchqualityevaluatorguidelines.pdf
  29. https://www.wired.com/2016/06/how-google-is-remaking-itself-as-a-machine-learning-first-company/?gi=e27d6becfaf8
  30. https://webmasters.googleblog.com/2006/12/better-understanding-of-your-site.html
  31. https://webmasters.googleblog.com/2008/10/importance-of-link-architecture.html
  32. https://webmasters.googleblog.com/2008/10/good-times-with-inbound-links.html
  33. https://www.bloomberg.com/news/articles/2015-10-26/google-turning-its-lucrative-web-search-over-to-ai-machines
  34. https://web.archive.org/web/20111115090558/http://www.google.com/about/corporate/company/tech.html
  35. https://googleblog.blogspot.com/2010/06/our-new-search-index-caffeine.html
  36. https://www.nytimes.com/2007/06/03/business/yourmoney/03google.html
  37. https://www.youtube.com/watch?time_continue=17&v=QyFlIhruda4
  38. https://patents.google.com/patent/US7346839
  39. https://patents.google.com/patent/US6285999B1/en
  40. https://www.seroundtable.com/google-improving-pruning-content-24706.html
  41. https://www.youtube.com/watch?v=cBhZ6S0PFCY&utm_source=wmx_blog
  42. https://support.google.com/webmasters/answer/76329?hl=en&ref_topic=4617741
  43. https://support.google.com/webmasters/answer/114016?hl=en