Episode 6 – Setting SEO Guidelines For A Whole Niche

Get it on iTunes

[These are the show notes for the podcast episode]

Hi and thank you for listening to SEOFightClub.org. I’m Ted Kubaitis and I have 22 years of web development and SEO experience. I have patented web technologies and started online businesses. I am both an engineer and a marketer. My goal is to help you win your SEO fights.

This episode’s FREEBIE

With every episode I love to give away something of high value. This episode's SEO freebie is BOTH the data I collected on over 300 factors for our target search terms and the 2 page template of content tuning guidelines for those search terms. So if you are an SEO data junkie like I am then this download is GOLD.

You can download the freebie at: http://seofightclub.org/episode6

Search Terms:
seo agency
seo analysis
seo audit
seo package
seo pricing
seo services company
seo services packages
seo services pricing
seo services seattle
seo services

Search types:
60.00% Right Side Ads, Top Ads, Related Searches
30.00% Right Side Ads, Top Ads, Related Searches, Local Business Results
10.00% Less Than 10 Results, Right Side Ads, Top Ads, Related Searches, Image Results

Types of Result Pages:
1st: 8 home pages, 1 blog post, 1 category/service page
2nd: 3 home pages, 5 blog posts, 1 category/service page
3rd: 5 home pages, 1 blog post, 4 category/service pages
4th: 1 home page, 1 blog post, 7 category/service pages
5th: 0 home pages, 5 blog posts, 5 category/service pages
6th: 4 home pages, 2 blog posts, 4 category/service pages
7th: 1 home page, 0 blog posts, 9 category/service pages
8th: 0 home pages, 6 blog posts, 4 category/service pages
9th: 6 home pages, 0 blog posts, 4 category/service pages
10th: 0 home pages, 3 blog posts, 7 category/service pages

28 (29%) home pages, 24 (24%) blog posts, 46 (47%) category/service pages

So odds are you should target your services page or home page if possible, and otherwise a blog post.
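
For anyone who wants to reproduce that tally, here is a minimal sketch assuming the per-SERP counts listed above; the labels and the decision to round to whole percents are just illustrative.

    # Tally the page types across the ten SERPs measured above and
    # reproduce the percentages quoted in these notes.
    serps = [
        (8, 1, 1), (3, 5, 1), (5, 1, 4), (1, 1, 7), (0, 5, 5),
        (4, 2, 4), (1, 0, 9), (0, 6, 4), (6, 0, 4), (0, 3, 7),
    ]  # (home pages, blog posts, category/service pages) per SERP

    totals = [sum(col) for col in zip(*serps)]   # [28, 24, 46]
    grand_total = sum(totals)                    # 98 classified results
    labels = ("home pages", "blog posts", "category/service pages")
    for label, count in zip(labels, totals):
        print(f"{label}: {count} ({count / grand_total:.0%})")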

Types of factors:
Zone Matches: Refers to keyword matches in various zones within a page
Page Qualities: Refers to measurements about a page not involving social or keyword matches
Social: Refers to measurements about a page involving social signals or links

Of the top 50 correlations:
19 (38%) were for zone matches
22 (44%) were for page qualities
9 (18%) were for social signals and linking.

Of the 11 factors showing strong correlation we found:

3 (27%) were for zone matches
section tags
og:title
leading matches in H3 tags
6 (55%) were for page qualities
# of site links
length of meta description
# of images
2 (18%) were for social signals and linking.
plus one count
twitter accounts in web page links

Correlation Strength:

Strong Correlation – Statistically significant. The correlation coefficients exceeded the critical values. According to the math geeks (myself being one of them), these are the only factors that "officially" count.

Weak Correlation – These are the factors that almost made the cut. So I find them interesting but not significant.

No Correlation – These are the factors that lacked any evidence of being a factor in ranking.
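
To make the strong/weak/none buckets concrete, here is a minimal sketch of the kind of significance test being described. The sample data, the use of Spearman's rho, and the 0.05/0.20 cut-offs are my own illustrative assumptions, not the exact method used in the episode.

    # Classify one factor by whether its correlation with ranking position
    # is statistically significant (illustrative thresholds).
    from scipy import stats

    positions    = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # Google result positions
    factor_value = [9, 8, 8, 6, 7, 5, 4, 4, 2, 1]    # e.g. matches in a page zone

    rho, p_value = stats.spearmanr(positions, factor_value)
    # A negative rho means higher factor values sit at better (lower-numbered) positions.

    if p_value < 0.05:
        strength = "strong correlation"
    elif p_value < 0.20:   # assumed cut-off for "almost made the cut"
        strength = "weak correlation"
    else:
        strength = "no correlation"

    print(f"rho = {rho:.2f}, p = {p_value:.3f} -> {strength}")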

Of the 39 factors showing weak correlation we found:

16 (41%) were for zone matches

Number of matches in canonical URL
Number of matches in web page sentences
Number of matches in web page alt attributes
Number of matches in web page P tags
Number of matches in web page H4-H6 tags
Number of matches in web page LI tags
Number of exact matches in web page I tags
Number of matches in the top word by keyword density
Number of matches in the top three words by keyword density
Number of matches in the top five words by keyword density
Number of matches in the top ten words by keyword density
Number of exact matches in web page B tags
Number of matches in web page meta og:site_name
Number of matches in web page H4 tags

17 (44%) were for page qualities

Number of pages in bing site search for hostname
Has canonical URL in web page
Size in kilobytes of web page body tag
Size in kilobytes of web page html source
Number of forms in web page
Number of https links in web page
Number of do follow links in web page HTML source
Has schema.org organization in web page HTML source
Number of links in web page
Number of internal links in web page HTML source
Has schema.org article in web page HTML source
Number of sentences in web page html source
Number of words in web page stripped text
Size in kilobytes of web page stripped html text
Length in characters of Google result link text
Number of nofollow links in web page HTML source
Has rewritten title in Google result link text

6 (15%) were for social signals and linking.

Number of comments in facebook api for the URL
Number of likes in facebook api for the URL
Number of shares in facebook api for the URL
Number of likes,shares,comments, and clicks in facebook api for the URL
Number of youtube accounts in web page links
Number of social accounts in web page links

The Degree of Tuning:

When you start measuring factors for the purpose of achieving competitive parity, you quickly realize that not all keyword niches are created equal. I always chuckle when I see SEO advice that says things like "You need a title match, a heading match, and a 3000 word article with X% keyword density," because having pulled the measurements for thousands of different keywords, I can tell you that kind of advice is very specific and applies to a very specific keyword niche. Odds are that anyone following that advice will either invest a whole lot more effort than they need to or invest nowhere near enough. The odds of that kind of advice being correct for any random keyword are a real long shot.

When you need to work with aggregate measurements from across a whole niche of keywords, you really want to consider the scope of your aggregates. Jewelry is a lot more popular than home office decor. If you aggregated the measurements for those two niches together, you would likely end up under-tuning for jewelry and over-tuning for home office decor. But for the sake of saving time and money, you will probably want to aggregate your jewelry keywords together and your office decor keywords together to get the competitive parity estimates for your large scale website. So when I pulled the keywords for "SEO services" in our example, I did so knowing it was an adequately scoped category of very related keywords. The measures I am about to give are for my "SEO services" niche specifically and probably won't be applicable to other keyword niches. But this at least will give an example of a single niche's degree of tuning.
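
As a minimal sketch of what that per-niche aggregation could look like in practice: assume you already have one factor measured across the top results for each keyword in the niche. The field names, the sample numbers, and the choice of the median are illustrative assumptions, not the author's exact formula.

    # Aggregate one factor's competitor measurements across a niche of related keywords.
    from statistics import median

    # {keyword: [factor value measured on each top-10 result]}
    niche = {
        "seo services":         [120, 340, 95, 410, 200, 150, 310, 80, 60, 220],
        "seo pricing":          [90, 130, 75, 60, 300, 110, 45, 180, 95, 70],
        "seo services seattle": [40, 55, 120, 80, 65, 30, 90, 100, 25, 70],
    }

    # Per-keyword competitive parity estimate, then one aggregate for the whole niche.
    per_keyword = {kw: median(values) for kw, values in niche.items()}
    niche_target = median(per_keyword.values())

    print(per_keyword)
    print(f"niche-wide tuning target: about {niche_target:.0f}")

Keeping jewelry keywords and home office decor keywords in separate dictionaries, rather than one, is exactly the scoping decision described above.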

Strong Correlation:

4 site links in Google result
355 image tags with alt text in web page
43 characters of web page meta description
11154 plus ones in google api for the URL
793 matches in web page section tags
617 kilobytes of web page script tags
108 to 143 characters of Google result summary
8 matches in web page meta og:title
9 twitter accounts in web page links
1 leading match in web page H3 tags

Weak Correlation Page Qualities:

12 to 25 pages in bing site search for hostname
Has 1 canonical URL in web page
At least 10 do follow links in web page HTML source
Has schema.org organization in web page HTML source
Has schema.org article in web page HTML source
At least 7 sentences in web page html source
At least 126 words in web page stripped text
36 – 48 characters of Google result link text
350 nofollow links in web page HTML source
Has rewritten title in Google result link text

Weak Correlation Zone Matches:

4 matches in canonical URL
6 matches in web page sentences
12 matches in web page alt attributes
1 match in web page P tags
9 matches in web page H4-H6 tags
1261 matches in web page LI tags
4 matches in the top three words by keyword density
4 matches in web page meta og:site_name
9 matches in web page H4 tags

Weak Correlation Social Matches:

2333 comments in facebook api for the URL
3 youtube accounts in web page links
24 social accounts in web page links
8660 likes in facebook api for the URL
47154 shares in facebook api for the URL
54272 likes,shares,comments, and clicks in facebook api for the URL

A question I get all the time is "aren't the factors you measure for one page usually the same factors you find on other pages?" The answer is: not always. I like to use ecom categories as an example, because sometimes social signals are a factor for ecom categories and sometimes they are not. The niche of the category seems to matter as to whether or not the social factors come into play for rankings. So the reality is that you have to measure for your keywords specifically to know for sure. My heart sinks every time I hear an "Expert SEO" tell someone that they must focus on social signals for SEO. Every business should focus on social because that is where the world is going, but if you are a cash strapped startup then investing in social signals for SEO benefit might end up being a big waste of time and money depending on the characteristics of your keyword niche and audience.

I sell software that collects these measurements for over 300 factors, but you don't need the software to conduct this analysis. Just conduct a search, take some measurements, and plot some graphs. Challenge what you think is true about SEO topics like keyword stuffing or the value of H1 tags or any other SEO advice you've heard so many times you simply assume it is still true. You don't start succeeding in SEO by buying tools. You start to succeed in SEO when you start thinking critically, proving things for yourself, and only making changes to your money sites that have the highest probability of bringing improvement.
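
As a sketch of how simple that DIY check can be: measure one factor for each result on page one and plot it against position. The factor, the numbers, and the plotting choices here are placeholders; swap in your own measurements.

    # Plot one factor against ranking position to sanity-check an SEO assumption.
    import matplotlib.pyplot as plt

    positions  = list(range(1, 11))
    h1_matches = [2, 1, 2, 0, 1, 1, 0, 0, 1, 0]   # keyword matches in H1 tags, per result

    plt.scatter(positions, h1_matches)
    plt.xlabel("Google position")
    plt.ylabel("H1 keyword matches")
    plt.title("Does this factor actually track rankings for your keyword?")
    plt.show()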

I don’t gamble on my revenue sites. I experiment with tunings on small revenue generating test sites. If the experiment’s results are good, then and only then will I consider deploying the tuning to the big sites, and only after evaluating the risks. Nothing in SEO is risk free. If you think white hat is safe then you are a fool. All black hat SEO was white hat at one point in time. It’s a moving target that has changed in the past with little warning.

ENDING:

As always there were a lot of details in this episode. Be sure to download the FREEBIE, which is BOTH the data I collected on over 300 factors for our target search terms and the template of content tuning guidelines for the niche.

Also be sure to come back for next week’s episode which will be all about SEO interview questions. If you’ve seen what passes for SEO interview questions on Google then you know they aren’t worthy of screening the unpaid intern. I’ll be diving into both sides of an SEO interview.

Thanks again, see you next time and always remember the first rule of SEO Fight Club: Subscribe to SEO Fight Club

Episode 5 – Setting SEO Guidelines For A Single Page

Get it on iTunes

Title: Setting SEO Guidelines For A Single Page

Intro:

Hi and thank you for listening to SEOFightClub.org. I’m Ted Kubaitis and I have 22 years of web development and SEO experience. I have patented web technologies and started online businesses. I am both an engineer and a marketer. My goal is to help you win your SEO fights.

This episode’s FREEBIE

With every episode I love to give away something of high value. This episode's SEO freebie is BOTH the data I collected on over 300 factors for our target search term and the 2 page template of content tuning guidelines for the search term. So if you are an SEO data junkie like I am then this download is GOLD.

You can download the freebie at: http://seofightclub.org/episode5

In this episode I’ll be taking measurements and constructing some content tuning guidelines to tune a single page for a single keyword. Obviously you want to write quality content that will pass Google’s quality rating guidelines. Given that we all agree on that point, I just want to add a few extra requirements to help ensure we are achieving competitive parity from a “full text indexing” perspective. In episode 4, “The Truth About Keyword Stuffing,” we did a dive into the fundamentals of “full text indexing” and the basic problem Google is solving at its core. We learned that when all else is equal, there are places on the page where whoever says it more tends to win. It was not true that stuffing your pages in general did anything for you, and we know there are manual penalties for doing that, so don’t be dumb. But there is a concept of competitive parity, and there are zones within a web page that appear to matter more. This isn’t my opinion; it comes from empirical measurements and the math of statistical correlation.

To start, I like to look at all the match words. The match words are the words that Google puts in bold in the search results to highlight them as relevant hits for your search terms. These match words are the terms that appear to be getting credit for search relevance, and when I talk about matches later on, these are the matches I am talking about.

Matching words for “gourmet gift baskets” search:

basket
baskets
gift
gift basket
gift baskets
giftbasket
giftbaskets
gifts
gourmet
gourmet gift baskets
gourmetgiftbaskets

Next I like to characterize the kinds of pages that are ranking well:
Ecom category pages
lots of product tokens and no comment threads

We can see right away that we should be tuning a category page for this keyword.

Next I look at which search features we are competing with in the search results:

Related Searches
Top Ads
Right Ads
Google Shopping Results

These are other channel opportunities to win the business for the traffic on this keyword. In many cases you want to be in all of these zones.

Next I start to look at competitive parity with factor measurements. I create software that measures over 300 factors for each result for your keywords. The software also computes the mathematical correlation, so we have strong clues as to which factors are arguably helping our rankings the most. By bringing this kind of math and empirical method to SEO we can save a lot of time and effort and focus on the areas where there is evidence of benefit or opportunity.
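
You don’t need the full software to get a feel for the per-result measurement step. Here is a minimal sketch, using requests and BeautifulSoup, that counts matches for a term in a few page zones; the zone list is tiny compared to the 300+ factors mentioned above, and the example URL is a placeholder.

    # Count keyword matches in a few zones of one ranking result.
    import requests
    from bs4 import BeautifulSoup

    def zone_matches(url, term):
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        term = term.lower()
        headings = soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"])
        return {
            "title":      soup.title.get_text().lower().count(term) if soup.title else 0,
            "h1_h6":      sum(h.get_text().lower().count(term) for h in headings),
            "alt_text":   sum(img.get("alt", "").lower().count(term) for img in soup.find_all("img")),
            "word_count": len(soup.get_text(" ", strip=True).split()),
        }

    # Run this for every result ranking for your keyword, then correlate each
    # measurement against position (as in the earlier correlation sketch).
    print(zone_matches("https://example.com/", "gift baskets"))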

Strong Correlation:

social:
social accounts in general
twitter accounts
google plus pages
facebook accounts
social pages in general
likes in facebook api
likes,shares,comments, and clicks in facebook api
comments in facebook api for the URL
shares in facebook api for the URL
plus ones in google api for the URL

matches
leading matches in web page H1 to H6 tags

Weak Correlation:
Social
instagram pages in web page
matches in web page meta og:site_name
Website Info
phone numbers in web page HTML source
terms links in web page A tags
privacy links in web page A tags
SERP Context
exact matches in Google result URL
site links in Google result
Matches
matches in web page class attributes
matches in web page H4 tags
matches in web page H4-H6 tags
matches in web page H1-H6 tags
leading matches in web page H1 to H3 tags
matches in web page HTML comments
matches in web page H1-H3 tags
Page Characteristics
kilobytes in web page body tag
do follow links in web page HTML source
words in web page stripped text
internal links in web page HTML source
links in web page
kilobytes in web page stripped html text
kilobytes in web page script tags
image tags with alt text in web page

To summarize:
Very important:
Social signals
social linking to pages and accounts
Keyword matches in headings
Also Important:
Contact Info
Terms of Service
Privacy Policy
Amount of Main Content
Images
Supplemental Content

Degree of Tuning:

Next we need to measure the “Degree of Tuning”. With a higher competition keyword you often have a larger amount of tuning required. The degree of tuning is the amount of work you need to do so your empirical measures meet or exceed your competitors’. This is called “competitive parity”. This changes from one keyword to the next, so the tuning for the search term in this example might be a lot more work than you need for the tuning in your niche. Or maybe it is nowhere near enough tuning when compared to your niche. The only way to know is to make the measurements for your specific keywords.
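
Here is a minimal sketch of one way a tuning target could be derived from competitor measurements, expressed as a range like the guideline values listed below. Using the 25th to 75th percentile of the top results is my own assumption for illustration, not a formula stated in the episode.

    # Turn raw competitor measurements for one factor into a tuning range.
    import numpy as np

    # image tags with alt text measured on each of the top 10 results (illustrative numbers)
    values = [22, 45, 18, 31, 40, 27, 19, 35, 44, 25]

    low, high = np.percentile(values, [25, 75])
    print(f"tune to roughly {low:.0f} to {high:.0f} image tags with alt text")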

Let’s look at the degree of tuning for this example:

Social
accounts
social accounts in web page links with a value of at least 3
twitter accounts in web page links with a value of at least 1
google plus page in web page links with a value of at least 1
facebook accounts in web page links with a value of at least 1
pages
social pages in web page links with a value of at least 2
instagram pages in web page links with a value of 1
signals
likes in facebook api for the URL with a value of at least 116
likes,shares,comments, and clicks in facebook api for the URL with a value of at least 361
comments in facebook api for the URL with a value of at least 42
shares in facebook api for the URL with a value of at least 202
plus ones in google api for the URL with a value of at least 286
meta data
open graph meta tags
Matches
Headings
leading matches in web page H1 to H6 tags with a value of at least 1
leading matches in web page H1 to H3 tags with a value from 1 to 2
matches in web page H4 tags with a value of 1
matches in web page H1-H3 tags with a value from 5 to 20
Website Info
phone numbers with a value from 2 to 4
terms link with a value of 1
privacy link with a value of 1
SERP Context
exact matches in Google result URL with a value greater than 0
site links in Google result with a value of 1
page characteristics
images
image tags with alt text in web page with a value from 18 to 45
links
internal links in web page HTML source with a value from 175 to 416
links in web page with a value from 190 to 433
do follow links in web page HTML source with a value from 190 to 410
page size
words in web page stripped text with a value from 930 to 1373
kilobytes in web page stripped html text with a value from 7 to 10
kilobytes in web page body tag with a value from 85 to 171

Again these aren’t my opinions. These values were measured from the sites that were ranking for our target keyword. The math of statistical correlation told me which factors appear to be the most important. As you can probably sense already, if you measured all of this for a different keyword all these values would likely be different because people are driven to compete at different levels based on the perceived value of the keyword to their business. I can also tell you that when you change types of keywords Google treats them differently with respect to which factors are more or less important. For some search types social signals and privacy policies are critical, but for other search types different factors seem to matter more. You have to measure and calculate to know.

Finally we get to Google’s Quality Rating Guidelines. Google now tells us what they specifically like and don’t like in terms of their manual review of websites. I would wager that some of their guidelines are enforced automatically by the Google algorithms, but we know at a minimum they are enforced manually by human beings who visit your site. We should consider these as well in our recommendations because doing so will help future proof the performance of our page.

http://static.googleusercontent.com/media/www.google.com/en//insidesearch/howsearchworks/assets/searchqualityevaluatorguidelines.pdf

Here are some of the things Google calls out for high quality pages:
A satisfying amount of high quality MC (main content).
The page and website are expert, authoritative, and trustworthy for the topic of the page.
The website has a good reputation for the topic of the page.
A satisfying amount of website information, for example, About Us information, Contact or Customer Service
SC (supplemental content) which contributes to a satisfying user experience on the page and website.
Functional page design which allows users to easily focus on MC and use SC as desired.
A website which is well cared for and maintained.
High quality content takes a significant amount of time, effort, expertise, and talent/skill.
The MC should be prominently displayed “front and center.”
The MC should be immediately visible when a user opens the page
Ads and SC should be arranged so as not to distract from the MC
It should be clear what parts of the page are Ads
About Us information.
Contact or Customer Service information.
Information about who is responsible for the content and maintenance of the website

Here are some of the things Google calls out as low quality:

The quality of the MC is low.
There is an unsatisfying amount of MC for the purpose of the page.
The author of the page or website does not have enough expertise for the topic
The website has a negative reputation.
The SC is distracting or unhelpful for the purpose of the page.
There is an unsatisfying amount of website information.
The page is lacking helpful SC.
The page design is lacking.
The website is lacking maintenance and updates.
Buying papers online or getting someone else to write for them.
Making things up or incorrect facts and information.
Writing quickly with no drafts or editing.
Filling the report with large pictures or other distracting content.
Copying the entire report from an encyclopedia, or paraphrasing content by changing words or sentence structure here and there.
Using commonly known facts, for example, “Argentina is a country. People live in Argentina. Argentina has borders. Some people like Argentina.”
Using a lot of words to communicate only basic ideas or facts, for example, “Pandas eat bamboo. Pandas eat a lot of bamboo. It’s the best food for a Panda bear.”
Many Ads or highly distracting Ads
Repeated insertion of Ads between sections of the MC
Invasive Ads, such as popups that cannot be closed.
A large quantity of Ads with a relatively small amount of helpful MC.
Text ads, placed beside or within the site’s navigation links, which may confuse users
Poor page design

Here are the things Google considers the lowest quality:

Harmful or malicious pages or websites.
True lack of purpose pages or websites.
Deceptive pages or websites.
Pages or websites which are created to make money with little to no attempt to help users.
Pages with extremely low or lowest quality MC.
Pages on YMYL websites with completely inadequate or no website information.
Pages on abandoned, hacked, or defaced websites.
Pages or websites created with no expertise or pages which are highly untrustworthy, unreliable,
unauthoritative, inaccurate, or misleading.
Websites which have extremely negative or malicious reputations
fake content
fake identity
gibberish
copied/stolen content
no website information
spammed comments
Lacking in purpose

Most people are unaware that Google punishes the content in the middle… The Medium Quality Punishments

Nothing wrong, but nothing special
Mixed, but with some redeeming qualities

So using Google’s likes and dislikes and using the data we gathered by direct measurements and our first hand observations we can create some guidelines for how to tune the content for this search term.

So let’s go over the guidelines!

Content Tuning Guidelines For: Gourmet Gift Baskets

This search term benefits an ecommerce category page the most. The search term implies a desire to view a selection. Social signals appear to matter strongly for this search term. Perhaps Google views “Gourmet” as a synonym for “Best” in this context. The category should display products by highest rating, and each product token should use product and average rating schema markup.

The page should be tuned for one single target search term. Related terms should have their own specific pages.

This page should also be targeted in AdWords and Google Product Feeds for the same search term.

The page should have embedded social meta data (open graph meta tags) to facilitate social shares with quality images, a description, and product information.

Links to accounts and pages for Facebook, Twitter, Google Plus, and Instagram are required.

UI components for liking, sharing, and commenting about this page in social channels should be employed.

There should be 5-20 matches in various heading tags on the page. This is likely best accomplished by making the category name an H1 tag and product token names either an H2 (to denote significance) or an H4, because those tags continue to correlate in multiple studies.

The page must display a phone number and provide clear links in the footer to the Terms of Service and Privacy Policy

A dedicated contact page is recommended, but the phone number should be on every page.

The URL should contain an exact match. A RESTful-style URL with a keyword slug is recommended.
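
A tiny sketch of what “keyword slug” means in practice; the /category/ prefix is an assumed URL structure for illustration.

    # Build a RESTful-style URL with a keyword slug.
    import re

    def slugify(term):
        return re.sub(r"[^a-z0-9]+", "-", term.lower()).strip("-")

    print("/category/" + slugify("Gourmet Gift Baskets"))   # -> /category/gourmet-gift-baskets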

The page Title and H1 should have a leading match.

The Category page should display 20-40 product tokens with quality images.

The page should offer supplemental content in the form of the following:
Sorts and Filters
Category Navigation
Site Search
Website Information
Specials and Promotions
Similar & Recommended Products

Social signals should eventually meet or exceed the following:
Shares: about 200
Likes: about 100
Comments: about 40

Product Token images are required to have alt text containing the product name.

Product names should be tuned to at least contain partial matches of the target search term.

The category page should have 900-1400 words of main content. This is best achieved with product names and promotional messaging in the product tokens. Some competitors also describe quality, service, and brand messaging. There is a risk of the non-product-specific text becoming duplicate content, as it tends to be repeated from page to page and from category to category. It is recommended that you find a way to mitigate that kind of duplication.

Every page should have the following (a small audit sketch follows this list):
Unique Title
Unique Meta Description
Unique H1 Heading
Unique URL
Canonical URL Tag
rel next/prev for pagination
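
Here is a rough sketch of checking one page against that list. It only verifies that each element is present on the page, not that it is unique across the site, and the URL is a placeholder.

    # Check one page for the on-page elements listed above.
    import requests
    from bs4 import BeautifulSoup

    def audit(url):
        soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

        def has_link_rel(value):
            for link in soup.find_all("link"):
                rel = link.get("rel") or []
                rel = rel if isinstance(rel, list) else [rel]
                if value in rel:
                    return True
            return False

        return {
            "title":            bool(soup.title and soup.title.get_text(strip=True)),
            "meta description": bool(soup.find("meta", attrs={"name": "description"})),
            "h1":               bool(soup.find("h1")),
            "canonical":        has_link_rel("canonical"),
            "rel next/prev":    has_link_rel("next") or has_link_rel("prev"),
        }

    print(audit("https://example.com/category/gourmet-gift-baskets"))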

The page should avoid 3rd party advertisements.

The page should clearly indicate that it is a category page and provide relevant breadcrumbs if appropriate for category navigation.

The intuitive goal of the page should be to find and click on a desired product token AND/OR to share the page socially.

Following these recommendations adheres to the requirements for a “high quality” rating according to Google’s published guidelines and avoids the hazards found on low quality pages.

ENDING:

If you have a business with only a handful of target search terms then you should be specifying your content tuning on a word by word basis. If you have a business that targets hundreds or thousands of search terms then this isn’t going to work for you. For that you will have to come back for our next episode, where I will walk you through creating SEO content tuning guidelines for lots of pages and keywords. It’s a whole different game.

Please download the FREEBIE which is BOTH the data I collected on over 300 factors for our target search term and the 2 page template of content tuning guidelines for the search terms.

Thanks again, see you next time and always remember the first rule of SEO Fight Club: Subscribe to SEO Fight Club

Episode 4 – Real Negative SEO

Get it on iTunes

Hi and thank you for listening to SEO Fight Club. I’m Ted Kubaitis and I have 22 years of web development and SEO experience. I have patented web technologies and started online businesses. I am both an engineer and a marketer. My goal is to help you win your SEO fights.

This episode’s FREEBIE!

With every episode I love to give away something of high value. This episode's SEO freebie is my own personal 40 point negative SEO checklist. If you are concerned about negative SEO then this checklist will help you conduct periodic audits for suspicious activity. This free download will save you hours of checking your websites for attacks.

You can download the freebie at: http://seofightclub.org/episode4

Negative SEO refers to the practice of using techniques to ruin a competitor’s rankings in search engines, but really just in Google.

Most people think of Negative SEO as unwanted spammy backlinks. To put it simply, those people have a failure of imagination. To understand the dozens of tactics of negative SEO you first have to understand its numerous motivations and intentions.

Here are some of those intentions:

  • Get a site banned from search results.
  • Get a page demoted in the rankings.
  • Steal a website’s customers.
  • Steal a website’s content.
  • Steal a website’s viewers or contributors.
  • Change the topics a webpage ranks for.
  • To hurt the reputation of the website.
  • To hurt the reputation of the author.
  • To hurt the reputation of a product.
  • To disrupt a website’s ad revenue.
  • To squander a website’s ad spend.
  • To ruin a website’s content or data.
  • To disrupt the operation of the website.
  • To cause financial harm to a website.
  • To cause confusion about products and services or to blur the lines differentiating them.
  • To slander or make false claims about a website, business, or person.
  • To bully, harass, or otherwise intimidate a person, website, or business.

I am certain there are more I’m not thinking of. Very quickly people can start to sense how they or people they know have already been hurt by negative SEO. I’m sure you are already sensing the kind of uneasy ground we all stand on now. When a business gets hit by these it can be crippling. It is absolutely devastating when you are hit by multiple of these at the same time.

There are roughly 40 kinds of negative SEO tactics that I know of, and the list seems to grow every year as Google adds new things to punish that can be weaponized by negative SEO practitioners.

  • spammy links
  • false ratings and reviews (worst product ever)
  • rating and review spam (this is ok but my product over here is better)
  • content spam
  • content theft
  • GoogleBot interruption
  • False canonicalization
  • False authorship
  • toxic domain redirection
  • denial of service
  • crippling the site’s speed
  • fraudulent DMCA takedown
  • Cross site scripting
  • Hacking the site
  • Keyword bombing posts and comments
  • Website or Social Identity Theft
  • Fake complaints
  • Injecting Malware
  • Click-spamming ads
  • CTR and other false signal bots
  • faking Email spam to get competitor publicly blacklisted
  • Fake bots that claim to be competitor and behave badly (again to get publicly blacklisted)
  • Submitting alternative URLs or hosts to exploit missing canonical tags
  • Link building a page into a new keyword context
  • Link building incorrect pages into the keyword context for bad experience
  • Flooding a web database with bogus data
  • Posting adult or damaging content
  • Linking from adult sites or other toxic locations
  • Disavow misuse to declare a competitor as spam to google
  • Misuse of Google’s spam report form
  • Inserting grammar, spelling, and content encoding errors
  • Unwanted bot and directory submission
  • Redirecting many domains bot activity onto a target site all at once.
  • Negative and false Press
  • Inclusion in blog networks, link wheels, other linking schemes
  • Content overflow… keep posting to a one page thread until the page is too big
  • Topic Flooding… flood the forum with so many crappy posts the forum becomes unusable
  • Keep replying to bad or outdated posts so it keeps fresher or better content off the main indexes
  • Pretend to be a competitor and ask for link removals
  • Flooding junk traffic to a site so Google gets the wrong idea about the site’s audience location or demographics
  • Domain squatting and hijacking

I’m not sure how many of these are still effective, but I want to tell you about my experience with one of them that was extremely devastating: the GoogleBot interruption attack.

I used to say “negative SEO isn’t real”. My desk is in the engineering bullpen. There are no cubes or offices. This allows everyone to overhear all of the issues of the day. I heard the network admin complaining about very weak denial of service attacks on our websites.

The specific type of denial of service attack my network administrator was battling is called “slow loris”.

Slow Loris Defined

Slowloris is a piece of software written by Robert “RSnake” Hansen which allows a single machine to take down another machine’s web server with minimal bandwidth and side effects on unrelated services and ports.

Slowloris tries to keep many connections to the target web server open and hold them open as long as possible. It accomplishes this by opening connections to the target web server and sending a partial request. Periodically, it will send subsequent HTTP headers, adding to—but never completing—the request. Affected servers will keep these connections open, filling their maximum concurrent connection pool, eventually denying additional connection attempts from clients.

Source: Wikipedia

The attacks didn’t make much sense. We could detect them and block them within minutes, and they would keep appearing every two to three weeks. This went on for months and possibly years. We are not entirely sure when the attacks started.

We didn’t understand the motivation behind these attacks. We could easily counter them. Why was someone working so hard to do this when we could stop it so quickly every time it happened?

Several months went by and I was in a meeting trying to explain the unusual volatility in SEO revenue. While in that meeting I got chills down my spine. I had a thought I just couldn’t shake. Later, I put the chart of the slow loris attacks on top of the chart for SEO revenue, and every drop in SEO revenue followed a slow loris attack. From then on I knew negative SEO was very real and very different from what everyone thought negative SEO was. This was a very effective negative SEO attack. It had absolutely nothing to do with backlinks.

I spent the next few weeks learning what I could about this attack. I learned how it worked. Basically, the attacker was waiting for an indication that Googlebot was crawling our site, then they would launch the attack so our web server would return 500 errors to Googlebot. Googlebot would remove the pages that returned a 500 error from the search results, and it would not retest those pages for days. (That is not the case today; Google now retests pages within hours, but it was the case at the time.) To make things even worse, once Googlebot found working pages again, they would reappear several places lower in the results for about 2 or 3 weeks before recovering to their original positions.

These attacks that we assumed were unsuccessful and weak were totally the opposite. They were both successful and devastating. They had lasting effects and the timing of the attacks was keeping us pinned down in the rankings.

If you were only watching rankings then this would just look like the normal everyday Google dance. No one cares that one week a page went down 6 spots and then two weeks later comes back up. We have thousands of pages across 20 websites, and most of those websites are on shared servers. Google Search Console doesn’t let you see the combined impact across sites. If I wasn’t reporting on SEO revenue, which many SEOs object to, this would have continued undetected.

So now I knew the attack was real, and I knew how it worked. So how do I stop them?

For the interruption attack to be effective the attacker needs to time his attack to coincide with Googlebot’s visit to the website. How can they do this? There are five ways I can think of:

1. Monitor the Google cache, and when a cache date is updated you know Googlebot is crawling.
2. Analyze the cache dates and estimate when Googlebot will come back.
3. Cross-site scripting to see visiting user agents.
4. Attack often and hope for the best.
5. Hack the system and access the logs.

I believed the attacker was probably doing #1.

I put a NOARCHIVE tag on all of our pages. This prevents Google from showing the Cached link for a page. This would stop the attacker from easily monitoring our cache dates.

The attacks stopped for about 4 months following that change. I thought we had won, but I was wrong.

Late in the third quarter of 2014 we were hit hard by an extremely precise attack. Our attacker then went dormant. The attacker had their attack capability back, and they knew we were on to them. We suspected they were picking their timing more carefully now. It was the second time I got chills down my spine. Like most online stores we do most of our sales in the fourth quarter. My suspicion was that the attacker was lying in wait for Black Friday. One of these attacks the week before Black Friday would cripple our top performing weeks of the year.

We scrambled to figure out how they were timing the attacks with GoogleBot. We failed. The week before Black Friday we were hit harder than we were ever hit before. We lost seventy percent of our SEO revenue for the month. I was devastated.

The company accepted that the attacks were amounting to significant losses. We knew the attacks were going to continue. We invested hundreds of thousands of dollars in security appliances that detect and block hundreds of different attacks.

It took six months to get all the websites protected by the new firewalls and to get all the URLs properly remapped. We had finally stopped the onslaught but at a pretty heavy cost in time and money. This year we have seen high double-digit growth in SEO. It is due to stopping the negative SEO attacks.

The attack I just described I call a “GoogleBot Interruption Attack”. Negative SEO is so new these attacks probably don’t have official names yet.

I have seen a number of other attacks too, but none were as crippling as the GoogleBot interruption attack.

Another attack I have encountered is when a black hat takes a toxic domain name that has been penalized into the ground and then points the DNS to your website. Some of those penalties appear to carry over at least for a short while. The worst is when a lot of these toxic domains are pointed all at once at your website.

Another similar attack to that is when an attacker redirects the URLs from the toxic sites to your URLs. This has the effect of giving your website a surplus of bad backlinks. What is scary about this is the attack can recycle those toxic backlinks and change targets over and over again.

Another attack is when the attacker targets a page that is missing a canonical tag by submitting a version of the URL that works but that Google has never seen before. This is done by adding things like a bogus URL parameter or URL anchor. Then they link build to the bogus URL until it outranks the original. The original will fall out of the results as a lower PR duplicate. Then they pull the backlinks to the bogus URL, and they have effectively taken a page out of the Google index until Google recalculates PR again. Just put a canonical tag on every page and you can protect yourself from this one.

Another attack just requires a lot of domains and they don’t have to be toxic. The requirement is that they are indexed and visited by a LOT of bots. The attacker in many cases will point hundreds of these domains at a single website and use the collective of bot activity as a denial of service against the website.

I’m certain there are more kinds of attacks out there. It is limited to the creativity of the attackers, and the bad guys can be pretty creative. I am constantly afraid of what I might find next having run the gauntlet and paid the price already.

Your only hope is to accurately attribute SEO revenue and monitor it regularly. Conversions are good, but if you’re looking for a strong signal on the health of your SEO then revenue is better. Revenue implies good indexation, rankings, traffic, and conversions all in one very sensitive gauge. Conversions are good too, but the needle doesn’t move as much, making it harder to see the signals.

Secondly… sit next to your engineers. The issues they encounter are directly relevant to the effectiveness of your SEO. The frantic firefighting of the network administrator is one of the best indicators. Log serious events and plot them with revenue and other KPIs.

Third… logs. The crawl error logs in Google Search Console and your web server logs tell you about the issues googlebot encounters and the attempts made on your server.

  • Lots of 500 errors might signal a GoogleBot interruption attack (see the log-scan sketch after this list).
  • Lots of 404 errors might be toxic domain redirection.
  • Lots of URLs in errors or duplicate content that make no sense for your website might signal canonical misuse.
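
Here is a minimal sketch of that 500-error check against a web server access log. It assumes the common Apache/Nginx combined log format (status code as the ninth whitespace-separated field) and a file named access.log; adjust both to your environment.

    # Count 5xx responses served to Googlebot, per day, from an access log.
    import re
    from collections import Counter

    errors_by_day = Counter()
    with open("access.log") as log:
        for line in log:
            if "Googlebot" not in line:
                continue
            day = re.search(r"\[(\d{2}/\w{3}/\d{4})", line)   # e.g. [21/Nov/2014
            fields = line.split()
            status = fields[8] if len(fields) > 8 else ""
            if day and status.startswith("5"):
                errors_by_day[day.group(1)] += 1

    # A spike here on the same days SEO revenue dropped is the signature of a
    # GoogleBot interruption attack.
    for day, count in errors_by_day.items():
        print(day, count)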

Following each and every soul-crushing SEO revenue event, I had to pore through the logs and testimony of everything to try and make sense of things. Not every SEO event was an attack. In many cases the events were caused by errors deployed on our websites. Or the marketing team installed a new problematic tracking pixel service. Several times the owners bought domains and pointed them at our sites, not knowing that the previous owners had made them permanently toxic. As an SEO, you need to detect and address these as well. Revenue, logs, and general awareness of daily events were critical to early detection.

When I went to the reader base of many popular SEO forums and blogs, I was ridiculed and called a liar for asking for help with a problem most SEOs had never seen or heard of before. It was all too common that the peanut gallery of SEO professionals would criticize me for not having links and kept saying I had the burden of proof. These were supposedly members of the professional SEO community, but it was just a political flame war. The black hat community actually helped me research the kinds of attacks I was facing, explained how they worked, and suggested ideas for countering them. Google and the SEO community in general were very unsupportive. I’m going to remember that for a very long time.

For some reason we are a big target. It is probably because so many of our products compete with similar products that are often affiliate offerings. If you are an online retailer that does a lot of sales seasonally, you need to be on the lookout. The big threat is solved for my sites for now, but the vast majority of retail sites are unprotected, and many of them aren’t in a position to solve the issue the way we did.

Over the years I’d say we’ve been attacked hundreds of times but it wasn’t until 2014 that we became aware of it, and there were a lot of random events that helped that happen. There is “security by obscurity” for most websites. You have to be a worthy enough target to get this kind of attention.

Detection is paramount. You can’t mitigate problems if you are unaware of them. For false parameters specifically there are several options… you can use canonical tags on every page, which I highly recommend. You can also use URL rewriting to enforce very strict URL formatting. But if you aren’t looking at the logs and if you aren’t looking at your search result URLs closely then you won’t even know about the issue.

Detailed revenue attribution is the big one. Seeing that the losses only come from Google is an important signal. For me, SEO revenue comes from dozens of sources: search engines like Google, Bing, Excite, AOL, Yahoo, etc.; syndicated search like laptop and ISP start pages and meta search engines; safe search like AVG, McAfee, etc.; and finally my SEO experiments.

Having the revenue attribution lets me know the revenue loss only occurred on Google this time, so it can’t be consumer behavior like spring break, because the drop would have been across the board if consumers had just gone on holiday.
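
A minimal sketch of that attribution check: compare revenue by source week over week and flag a drop that is isolated to Google. The numbers and the 30%/10% thresholds are made up for illustration.

    # Flag a Google-only revenue drop (attack suspect) vs. an across-the-board drop (seasonality).
    last_week = {"google": 18200, "bing": 2100, "syndicated search": 900, "safe search": 400}
    this_week = {"google":  9800, "bing": 2050, "syndicated search": 880, "safe search": 410}

    drops = {src: 1 - this_week[src] / last_week[src] for src in last_week}
    google_only = drops["google"] > 0.30 and all(
        drop < 0.10 for src, drop in drops.items() if src != "google"
    )

    print(drops)
    if google_only:
        print("Drop is isolated to Google - investigate for an attack, not consumer behavior.")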

Also keep an eye on your errors, search results, and logs. Also keep an eye on your network administrator’s “Frustration Meter”.

Here are a few specific things to check when looking for negative SEO attacks:

In Google Search Console:

  • Check GSC Messages for penalties.
  • Check GSC Messages for outages.
  • Check GSC Messages for crawl errors.
  • Check Server Errors Tab for Desktop and Mobile
  • Check Not Found Errors for Desktop and Mobile
  • If errors look suspicious then download and archive them.
  • If errors look minor and are no longer reproducible then mark them as fixed so you only see new errors next time.
  • Check the Index Status page and make sure your total number of pages looks correct.
  • Check your content keywords and make sure nothing looks spammy or out of place there.
  • Check Who Links to your site the most under search traffic
  • Make sure your link count hasn’t abnormally grown since the last check. Update your link count spreadsheet.
  • Check your Manual Actions
  • In search analytics check countries and make sure you’re not suddenly popular in Russia
  • In search analytics check CTR and Position and make sure the chart looks ok… no drastic events
  • In Search Appearance investigate your duplicate title and meta description pages. Check the URLs to make sure they aren’t bogus
  • Check Security Issues

In your web server logs:

  • Check Server Logs: SQL injection
  • Check Server Logs: Vulnerability Testing
  • Check Server Logs: 500-503 Errors
  • Check Server Logs: Outrageous requests per second
  • Check Server Logs: Bad Bots
  • Check Server Logs: Large volume 404 errors from 1 referring domain

In the Google Search Results:

  • Check For Bizarre URLs
  • Check For Domains matching your content
  • Check For Unusual Sub-domains
  • Check For Odd URL parameters and URL anchors
  • Check For Negative mentions of domain or products

On your website and servers:

  • Check Ratings and Reviews
  • Check For Comment or Post Spam
  • Check Content Indexes For Over-abundance of old or bad topics
  • Check for profile spam
  • Actively run anti-virus on server
  • Routinely back up server
  • Periodically run vulnerability testing on your site to close security vulnerabilities
  • Patch your server regularly
  • Update your web platform and plugins regularly
  • Double check your WordPress Security plugin for any loose ends if applicable.
  • Periodically change your admin passwords and account names
  • Use strong passwords
  • Don’t share accounts or email credentials
  • Use a version control system and check the update status of deployed code for changes regularly.
  • Check your domain name expiration date

There is a lot to consider in this episode. Please download the FREEBIE which is my own personal 40 point negative SEO checklist. If you are concerned about negative SEO then this checklist will help you conduct periodic audits for suspicious activity. This free download will save you hours on checking your websites for attacks.

Please subscribe and come back for our next episode where we will be “Setting SEO Guidelines For A Single Page” and I will continue the lesson by teaching my most powerful methods and secrets to content tuning a single page that targets a single keyword.

Thanks again, see you next time and always remember the first rule of SEO Fight Club: Subscribe to SEO Fight Club!