How Google Attempts to Understand What a Query or Page is About Based Upon Word Relationships

A little crunched for time, and feeling both hungry and lazy, I treated myself to a meal from the local Taco Bell tonight. It’s a new store, and I like that when you place an order at the drive-through, it shows what you ordered on a screen, and the person taking the order asks if your order is right before charging you for it and processing the order. The voice coming out of the speakers asked me if my order looked correct. I took a look, and responded, “I guess it does.” I had to guess, because the list didn’t look very legible:

1 Smthr Burr SC
1 Chalupa SPR Stk
1 Sft Taco Bf Spr
1 Lrg Root Beer

Of course, the screen shows abbreviations for the order, because the words need to be abbreviated if they’re going to fit on a single line of a ticker-tape receipt. That doesn’t make my order any easier to read or understand on screen, and there’s no reason the confirmation screen has to show abbreviated words just because the receipt does. Repeating what I ordered on a screen and letting me confirm the order is a really good idea. Using the receipt’s abbreviations on that confirmation screen isn’t. The people taking orders may recognize the abbreviations, especially after a night or two of looking at them. But even though the items look similar to what I ordered, they read more like gibberish to me.

In the 2008 paper Finding Cars, Goddesses and Enzymes: Parametrizable Acquisition of Labeled Instances for Open-Domain Information Extraction, the authors describe how text on web pages might be labeled as it is crawled, to understand the concepts found in the words on those pages. The paper may be a few years old, but this past May Google was granted a patent on a similar process. If words in queries are processed the same way, to better understand the concepts in them, then search results can be returned by matching the concepts in a query to concepts found on web pages.

The patent is:

Extracting and scoring class-instance pairs
Invented by Marius Pasca
Assigned to Google
US Patent 8,452,763
Granted May 28, 2013
Filed: March 19, 2010

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for extracting and scoring class-instance pairs. One method includes applying extraction patterns to document text to derive class-instance pairs, determining a frequency score and a diversity score for each distinct class-instance pair, and determining a pair score for each class-instance pair from the frequency score and the diversity score.

Another method includes applying extraction patterns to document text to derive candidate class-instance pairs, determining, for each distinct candidate class-instance pair, a number of distinct phrases from which the distinct candidate class-instance pair was derived, and determining a pair score for each distinct candidate class-instance pair from the number of distinct phrases from which the candidate class-instance pair was extracted.
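To make the abstract a little more concrete, here is a minimal Python sketch of the first method. It rests on several assumptions that are not in the patent text quoted here: two simple “such as” and “including” extraction patterns, a frequency score equal to the number of times a pair is extracted, a diversity score equal to the number of distinct patterns that produced the pair, and a pair score that simply multiplies the two. It is an illustration of the idea, not Google’s implementation, and it skips normalization, which is why the class comes out as “foods” rather than “food.”

```python
import re
from collections import defaultdict

# Assumed Hearst-style extraction patterns (illustrative only). Each one
# captures a class name plus a list of instances joined by ", ", " and ", or " or ".
PATTERNS = {
    "such_as": re.compile(r"(\w+) such as ((?:\w+(?:, | and | or ))*\w+)"),
    "including": re.compile(r"(\w+) including ((?:\w+(?:, | and | or ))*\w+)"),
}


def split_instances(text):
    """Split 'pizza, tacos and burritos' into ['pizza', 'tacos', 'burritos']."""
    return [part for part in re.split(r", | and | or ", text) if part]


def extract_pairs(sentences):
    """Map each (class, instance) pair to its extraction count and the set
    of pattern names that produced it."""
    stats = defaultdict(lambda: {"frequency": 0, "patterns": set()})
    for sentence in sentences:
        lowered = sentence.lower()
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(lowered):
                class_name = match.group(1)
                for instance in split_instances(match.group(2)):
                    entry = stats[(class_name, instance)]
                    entry["frequency"] += 1
                    entry["patterns"].add(name)
    return stats


def score_pairs(stats):
    """Assumed combination: pair score = frequency score * diversity score,
    where diversity is the number of distinct patterns behind the pair."""
    return {
        pair: entry["frequency"] * len(entry["patterns"])
        for pair, entry in stats.items()
    }


sentences = [
    "Restaurants sell foods such as pizza and tacos.",
    "Menus list foods including pizza.",
]
scores = score_pairs(extract_pairs(sentences))
print(scores[("foods", "pizza")])  # 4: extracted twice, by two different patterns
print(scores[("foods", "tacos")])  # 1: extracted once, by one pattern
```

Multiplying the two scores rewards pairs that show up often *and* in varied wording, which is one plausible reading of why a diversity score would be useful at all.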

Google’s recent Hummingbird update is aimed at understanding long and complex queries and returning results for them that don’t necessarily depend on matching every word in the query. The focus of the paper and patent is on finding patterns in data mined from web pages by looking for relationships between words in “class-instance” pairs. As the patent tells us:

A class-instance pair is made up of a class name corresponding to a name of an entity class and an instance name corresponding to an instance of the entity class. The instance of the entity class has an “is-a” relationship with the entity class; in other words, the instance of the entity class is an example of the entity class. An example class-instance pair is the pair (food, pizza), because pizza is a food.

Understanding that “pizza is a food” makes it easier for Google to understand what is meant by “pizza” when it appears in a query or on a web page, and to match queries with web pages that share that class-instance pair. In the same way, knowing that “Smthr Burr SC” is a menu item I may or may not have ordered at Taco Bell makes it easier for me to work out that the abbreviation means “Smothered Burrito, Shredded Chicken.” Yes, that’s what I ordered.

A couple of days ago, I wrote about a patent from Google where the search engine tries to find “known for” terms of interest for entities. A restaurant might be known for a famous chef working there, or for a menu item unique to that restaurant; Gordon Ramsay’s restaurants, for example, are known for his version of Beef Wellington. The post was How Google Finds ‘Known For’ Terms for Entities. That patent is similar to this one because it focuses on a specific type of relationship, a “known for” relationship. This new patent also looks for a relationship: an “is a” relationship.

If you want to dig into the process or mathematics behind how Google might identify “is a” relationships, and extract terms and concepts that fit those patterns from data gathered on the web, the paper and the patent give a sense of both. What’s more important here is understanding that Google is building a knowledge base of concepts and relationships between words that can help it return relevant results for queries.

When Google acquired (technically, merged with) Applied Semantics in 2003, it also inherited Applied Semantics’ CIRCA technology. At the heart of that technology was the ability to learn about and understand relationships between words. I’ve mentioned “known for” relationships and “is a” relationships, but here are some other relationships mentioned in a white paper about CIRCA:

  • Synonymy/antonymy (e.g. “good” is an antonym of “bad”)
  • Similarity (“gluttonous” is similar to “greedy”)
  • Hypernymy (is a kind of / has kind) (“horse” has kind “Arabian”)
  • Membership (“commissioner” is a member of “commission”)
  • Meronymy (whole/part relations) (“motor vehicle” has part “clutch pedal”)
  • Substance (e.g. “lumber” has substance “wood”)
  • Product (e.g. “Microsoft Corporation” produces “Microsoft Access”)
  • Attribute (“past”, “preceding” are attributes of “timing”)
  • Causation (e.g. “travel” causes “displacement” or “motion”)
  • Entailment (e.g. “buying” entails “paying”)
  • Lateral bonds (concepts closely related to one another, but not in one of the other relationships, e.g. “dog” and “dog collar”)
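One simple way to picture a knowledge base built from relationships like these is as a collection of (word, relationship, word) triples. The minimal Python sketch below stores the examples from the list and looks up related concepts, optionally filtered by relationship type. The relationship labels are hypothetical shorthand invented for this example; nothing here describes how the white paper’s technology actually stored its data.

```python
from collections import defaultdict

# Example relationships from the list above, stored as
# (word, relationship, word) triples. The relationship labels are
# hypothetical shorthand chosen for this sketch.
TRIPLES = [
    ("good", "antonym_of", "bad"),
    ("gluttonous", "similar_to", "greedy"),
    ("horse", "has_kind", "arabian"),
    ("commissioner", "member_of", "commission"),
    ("motor vehicle", "has_part", "clutch pedal"),
    ("lumber", "has_substance", "wood"),
    ("microsoft corporation", "produces", "microsoft access"),
    ("timing", "has_attribute", "past"),
    ("timing", "has_attribute", "preceding"),
    ("travel", "causes", "displacement"),
    ("travel", "causes", "motion"),
    ("buying", "entails", "paying"),
    ("dog", "lateral_bond", "dog collar"),
]


def build_index(triples):
    """Group triples by their first word for quick lookup."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[subject].append((relation, obj))
    return index


def related(index, word, relation=None):
    """Return words linked to `word`, optionally only for one relationship."""
    return [obj for rel, obj in index.get(word, []) if relation in (None, rel)]


index = build_index(TRIPLES)
print(related(index, "travel", "causes"))  # ['displacement', 'motion']
print(related(index, "timing"))            # ['past', 'preceding']
```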

Future search rankings may rely on Google building a concept-based knowledge base that understands the relationships between words, as well as the probability that a certain relationship was intended when words are used on a page. For example, a page that mentions Microsoft might be about Microsoft as one of a group of technology companies, or it might be about Microsoft products. If you write a page that includes “Microsoft,” and the page also mentions Cisco, Red Hat, Apple and Sun Microsystems, there’s a decent chance the page is about technology companies. If you write a different page that includes “Microsoft” and also mentions Access, Word and Excel, then that page is more likely to be about products made by Microsoft.
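The Microsoft example can be sketched as a simple co-occurrence test. In this minimal Python illustration, each interpretation gets a hypothetical list of context words, the page is checked for which of those words appear, and the counts are normalized into rough probabilities. A real system would learn these associations from large amounts of data rather than from hand-written lists, so treat both the word lists and the scoring as assumptions.

```python
import re

# Hypothetical context words for two readings of "Microsoft". A real
# system would learn associations like these from data, not a fixed list.
SENSES = {
    "technology companies": {"cisco", "red hat", "apple", "sun microsystems"},
    "microsoft products": {"access", "word", "excel"},
}


def sense_probabilities(page_text, senses=SENSES):
    """Count which context words from each sense appear on the page and
    normalize the counts into rough probabilities."""
    text = page_text.lower()
    counts = {}
    for sense, terms in senses.items():
        counts[sense] = sum(
            1 for term in terms
            if re.search(r"\b" + re.escape(term) + r"\b", text)
        )
    total = sum(counts.values())
    if total == 0:
        # No evidence either way: treat every sense as equally likely.
        return {sense: 1 / len(senses) for sense in senses}
    return {sense: count / total for sense, count in counts.items()}


companies_page = "Microsoft, Cisco, Red Hat, Apple and Sun Microsystems reported earnings."
products_page = "Microsoft updated Access, Word and Excel this spring."
print(sense_probabilities(companies_page))
# {'technology companies': 1.0, 'microsoft products': 0.0}
print(sense_probabilities(products_page))
# {'technology companies': 0.0, 'microsoft products': 1.0}
```

A page that mixes both sets of terms would land somewhere in between, which is the “mixed signals” situation the rest of this post warns about.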

The words that you choose to use on a web page might send signals to Google about the relationships between those words, influencing Google’s interpretation of your page.

It’s possible that someone reading that last paragraph might say, “That’s obvious, and if you write naturally those relationships will appear on their own.” But writing “naturally” isn’t just something that flows from your mind to your fingers to your keyboard to your page. Knowing that Google will try to understand the relationships between words in a query or on a page makes it less likely that, when you write those queries or that content, you’ll send mixed signals caused by a lack of focus on those relationships.

About the Author

Bill Slawski is the Director of Search Marketing at Go Fish Digital and has been promoting websites since 1996. He often writes about SEO and search-related patents and white papers on his blog, SEO by the Sea. He started as an in-house SEO and went on to work at agencies and as a solo consultant, so he has worn a lot of different hats and has tested ideas from patents and papers as part of an ongoing SEO education. Connect with him on Twitter, @bill_slawski, if you’d like to stay in touch or have questions.

Photo thanks to Anders Sandberg (Random search)

Want even more SEO news? Sign up for the SEO Copywriting Buzz newsletter today!

Where have all the keywords gone? It’s your (not provided) John Wayne

I can’t tell ya where all the cowboys have gone, Paula Cole, but I’ve come to your keyword rescue with this week’s (not provided) post!

For SEOs, Google’s (not provided) keyword practice is nothing new. It all began in 2011, when the search goliath moved to protect the privacy of secure searchers.

Google’s recent shift to 100% (not provided) for more privacy protection – so it claims – has webmasters looking for new ways to analyze search traffic and SEO copywriters seeking the right keyphrases for optimized copy.

Never fear, my friends! I’ve wrangled the most helpful posts from the search arena to save you from keyphrase catastrophe!

Search Engine Land‘s Matt McGee gives us “Google’s [Not Provided] At 87% Of Google Search Traffic To Major News Sites [Report]”.

Sonia Simone and Sean Jackson write “Why You Don’t Need to Freak Out over Google’s (Not Provided)” over on Copyblogger.

Koozai‘s Emma North shares “Using Google Webmaster Tools To Reclaim Organic Keywords & Rankings”.

The fabulous Laura Crest posts “Has Google Gone Evil or Is It Just Wicked Smart?” over at Level 343.

Search Engine Watch shares “Google Keyword ‘(Not Provided)’: How to Move Forward” by Ray “Catfish” Comstock.

Crispin Sheridan writes “Google & Not Provided Keywords: Forcing Marketers to Innovate Since 2013” for ClickZ.

Jonathan Rose gives us “Google Not Provided: Goodbye Organic Keyword Data, Hello Onsite Behaviour Tracking” on Business 2 Community.

Brafton Editorial gives us “More data not provided? New Google Analytics reports” on the Brafton blog.

Carrie Hill writes “Gathering New Keyword Insights In A (Not Provided) World” for Search Engine Land.

Michael King gives us “Why I Don’t Care About (Not Provided)” on SlideShare.

iAcquire‘s Devin Asaro writes “(Not Provided) Sets You Free”.

Robert Ramirez posts “The Importance of Site Structure in the Absence of Keyword Data” on Bruce Clay Inc.

Avinash Kaushik writes “Search: Not Provided: What Remains, Keyword Data Options, the Future” on Occam’s Razor.

Search Engine Watch‘s Ben Goodsell writes “How to Use PPC Data to Guide SEO Strategy in a ‘(Not Provided)’ World”.

Higher Education Marketing gives us “3 Other Ways to Get Visitor Keyword Info, Now That It’s ‘Not Provided’ in Google Analytics” by Scott Duncan.

Rebecca Bredhold writes “Three Rules for Effective (Not Provided) Brand Content” on Vocus.

Photo thanks to CileSuns92 (Keys)

Why do some freelance copywriters rake in the bucks while others struggle to make ends meet? Hint: It’s all about tightening up the back end of your business. Learn how to make more money, faster with the Copywriting Business Bootcamp. Save 10% until 11/13/13 with coupon code SECRETS.

Updated: Google snippet trick tips for success

Google Snippet Trick

I’ve been talking about how to write a meta description and the “Google snippet trick” for a long time. In fact, this blog post originally ran in 2008…

…And now it’s time to update it.

A couple weeks ago, Bill Slawski posted about a patent Google was granted in March 2012. His article, How Google Might Generate Snippets for Search Results, is a must-read, and it gives us a clue about how we can better construct our content.

Here are some interesting tidbits from the post (italics are mine.)

If there is a page with a lengthy introduction (or an abstract) – and the words in the search query are present – Google may pull the snippet from the start of the page.

If the page has a conclusion – and the words in the search query are present – Google may pull the snippet from the end of the page.

Different paragraphs are scored differently – and where the snippet is pulled from depends on the paragraph score. According to Bill’s article, “Other signals, such as the lengths of paragraphs, amount of punctuation, bold and italics, and more can also influence the choice Google makes.”
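As a rough illustration of how paragraph scoring could work, here is a hypothetical Python sketch that scores each paragraph on query-term matches, position (introduction or conclusion), length, punctuation and emphasis, then picks the top scorer. Every weight and threshold in it is made up for this example; the post describes the kinds of signals involved, not these numbers, so this is not how Google actually weighs them.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    text: str
    position: int       # 0-based index of the paragraph on the page
    has_emphasis: bool  # True if the paragraph contains bold or italic text


def score_paragraph(p, query_terms, total_paragraphs):
    """Score one paragraph using made-up weights for query terms,
    position, length, punctuation, and emphasis."""
    words = p.text.lower().split()
    cleaned = {w.strip(".,;:!?()\"'") for w in words}
    hits = sum(1 for term in query_terms if term.lower() in cleaned)
    if hits == 0:
        return 0.0  # no query terms, so not a snippet candidate
    score = float(hits)
    if p.position in (0, total_paragraphs - 1):
        score += 1.0   # introduction or conclusion bonus (assumed)
    if 20 <= len(words) <= 60:
        score += 0.5   # preference for mid-length paragraphs (assumed)
    if sum(p.text.count(c) for c in ",;:|") > len(words) / 4:
        score -= 0.5   # penalty for punctuation-heavy text (assumed)
    if p.has_emphasis:
        score += 0.25  # small boost for bold/italics (assumed)
    return score


def pick_snippet_paragraph(paragraphs, query_terms):
    """Return the highest-scoring paragraph, or None if none match."""
    best_score, best = max(
        ((score_paragraph(p, query_terms, len(paragraphs)), p) for p in paragraphs),
        key=lambda pair: pair[0],
    )
    return best if best_score > 0 else None


page = [
    Paragraph("Welcome to our site.", 0, False),
    Paragraph("We offer fast pizza delivery across town.", 1, False),
    Paragraph("Call today for pizza delivery.", 2, False),
]
best = pick_snippet_paragraph(page, ["pizza", "delivery"])
print(best.position)  # 2: the conclusion gets the position bonus
```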

(For complete information, I encourage you to read Bill’s post. Do it now. I’ll wait.)

Below is the updated post with brand-new info. How does this change the way you’ll write content in the future (or will it change anything at all)? Post your comments below – thanks!

The meta description is like the Title’s trusted sidekick.

Batman had Robin. Sherlock Holmes had Watson. The Title has the meta description. The Title helps the page rank in the search engines (and, if it’s written correctly, it also reads like an enticing headline). Yet it’s the meta description that truly tells the story. It serves as a “tease,” giving your readers a taste of what the landing page is about.

SEO writers love to sweat over their meta descriptions. After all, it’s a great place to highlight important benefits and drive click-through. But there’s one catch: More often than not, instead of the submitted description, Google displays a “snippet” of text that appears around the search term.

That means that your carefully crafted descriptions (where you’ve painstakingly outlined your benefits and calls to action) don’t show on the search engine results page. Rather, Google takes a “snippet” of text that appears around the search term (like this):

[Screenshot: example of a Google search result snippet]

Feeling frustrated and want to curse out the Google gods? Relax. This is a situation that you can (kind of) control. You just have to know how Google works.

The key is using your keyphrases in a very specific way that increases the probability of a good description – even if it is in a snippet form.

(And by “good,” I mean the description includes a call to action, a phone number, a benefit – anything that would encourage click-through.)

Please note that these tips provide general guidelines about how to “look” at your copy a certain way, and how to tweak your writing accordingly. This isn’t meant as a “must-follow-at-any-cost” formula, nor am I advocating a certain keyphrase density.

– Review the keyphrase focus for the page

Chances are, you have two “main” keyphrases and up to three “bonus” keyphrases. Yes, you’ll want to use the exact-match keyphrase (though, according to this video, you don’t need to overdo it). Mixing and matching the individual words in the keyphrase works, too.

– Use your most important keyphrases in your headline/subheadlines

Headlines should be benefit-rich, reader-savvy and oh-so enticing. And yes, they should also include a keyphrase whenever possible (maybe even two keyphrases if you can make them flow and fit.)  Remember, people will quick-scan your headlines before diving into your content, so how you write them counts.

– Think “snippet text” as you’re writing/editing

Remember, the words around the search query appear as part of the Google snippet. Whenever possible, you’ll want a benefit statement, call-to-action or an interesting fact near the first instance of your main keyphrases. That way, there is compelling snippet text that could entice the reader to click through from the search results page and read more.
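To see why placement near the first keyphrase matters, here is a simplified Python sketch that builds a snippet from a fixed-size window of text around the first occurrence of a keyphrase. The 160-character budget and the centering rule are assumptions for illustration only; Google’s real snippet selection weighs many more factors, and its length limits vary.

```python
def snippet_around(text, keyphrase, max_chars=160):
    """Return about max_chars of text centered on the first occurrence of
    keyphrase, trimmed to whole words, with '...' marking any cuts."""
    index = text.lower().find(keyphrase.lower())
    if index == -1:
        return None  # keyphrase not on the page
    half = max(0, (max_chars - len(keyphrase)) // 2)
    start = max(0, index - half)
    end = min(len(text), index + len(keyphrase) + half)
    # If the window starts mid-word, move forward to the next word.
    if start > 0 and text[start - 1] != " ":
        space = text.find(" ", start)
        if space != -1 and space < index:
            start = space + 1
    # If the window ends mid-word, move back to the previous word break.
    if end < len(text) and text[end] != " ":
        space = text.rfind(" ", 0, end)
        if space > index + len(keyphrase):
            end = space
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


page_text = "word " * 50 + "Pizza delivery in 30 minutes, call now." + " filler" * 50
print(snippet_around(page_text, "pizza delivery"))
```

Because the window is centered on the keyphrase, whatever sits right next to that first mention (here, the “in 30 minutes, call now” benefit and call to action) is what ends up in the snippet.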

– Try to include your second most important keyphrase within your first couple paragraphs

This is typically very easy to do. If you can’t include the exact match keyphrase, try to include the individual words within the keyphrase.

– If it’s possible to include any “bonus” keyphrases in the first couple paragraphs, do it

(But not at the expense of your copy.)

The same “rules” apply – whenever possible, use the keyphrases in a way that would be compelling in a snippet.

– Include keyphrases throughout your body copy, including synonyms and keyphrase variations. 

Yes, you can include synonyms – which often makes the copy much easier to write (plus, it helps the copy read much more naturally.) Here is some additional information on why synonyms are your friends.

– Don’t forget to add keyphrases towards the end of your document

If Google doesn’t pull the snippet from the beginning, it may pull it from toward the end (especially if you have a longer conclusion).

– VERY IMPORTANT POINT

If you find that adding keyphrases (or a variation of them) makes your copy read funny – delete them. The purpose of the Google snippet trick isn’t to destroy your content in favor of (possibly) getting a great Google snippet. The purpose is to control what you can control around your meta description – and try to tilt the odds in your favor. DO NOT randomly add keyphrases “just because.”

Be warned: the Google Snippet Trick doesn’t always work, and where Google pulls the snippet from depends on many factors. But it works often enough (and is easy enough to do) that implementing it is a snap.

And heck, it lets you somewhat control a previously uncontrollable situation (what Google shows as the description). What could be better?

Love this post and want to learn more about SEO Copywriting? Looking for an up-to-date training resource? Check out the SEO Copywriting Certification training.