~ Search Engines Anti-Optimization ~
Get your own stop words!

Advanced Web searching tricks
by Nemo
Slightly edited by fravia+ and published @ searchlores.org in January 2003 ~ Version 0.3. February 2003

introduction | bookmarklets usage description | words statistics | stop words | examples | conclusion

This is a wondrous essay that will give you quite some power when filtering commercial crap and SEO spam out of the search engines' results. But not only! The bookmarklet techniques explained herein (and in Ritz's famous essay) can of course be further developed and modified for many other sound seeking purposes :-)
It's a tough world! Users simply HAVE TO defend themselves in a world where some commercial minions are allowed to write that "Twenty out of the 30 links Google is presenting on each page is not earning them money. That's an ad break of only 33%"... as if Google could keep its dominance among search engines by selling more crap, poor idiot.
seeing beyond the surface


Introduction

The reader certainly comes across search results pestered by commercial crap pages quite often. I think it is easier for the reader to build a query with the most wanted terms, those that every bona fide person would use, than to find keywords capable of effectively excluding the unwanted search results.

The objective of this essay is to give you the means to find such excluding keywords fairly easily, so that you can get rid of the spammed results on your own, without any need to trust the anti-spamming algos used by the search engines themselves.
The idea is to reverse the SEO spammers' approach and build a list of the most common terms appearing in the spammed search results, with their relative frequencies, so that you can spot at once the most spammed, and hence most unwanted, keywords for your query.
With such a list you will be able to

We could build a script, for instance in REBOL, that generates a table of the words contained in a given portion of text, sorted by frequency. However, it would be a hassle to build and maintain such a universal script, because search engines change their pages' layout on a regular basis. One way to solve this problem is to use bookmarklets. You can get some advanced bookmarklets or ideas at the authorities:


Bookmarklets usage description

WARNING: The following bookmarklets may not work with all browser configurations, and surely won't work if you have disabled javascript, duh. They have been tested, and work, with Netscape 4.7, IE 5.5/6.0 and/or Mozilla 1.2a. Should you have problems with any other browser, we believe it would be a VERY good idea to ALSO install a browser capable of using bookmarklets on your box.

Bookmarklet: nomens (NS 4+, Mozilla), nomens (IE 4+) (a separate version, because IE converts the HTML entities when bookmarking).

Select the text you want to analyze and click the bookmarklet nomens. This bookmarklet will open a new window with the selected text. If that window is already open, the bookmarklet will append the selected text to it. This way you can create a page with as many search results as you want.

Code:
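javascript:(
    function(){
        var txt='';
        /* Mozilla and newer browsers, then Netscape 4: turn the selection into a plain string */
        if(window.getSelection)txt=''+window.getSelection();
        else if(document.getSelection)txt=''+document.getSelection();
        /* older Internet Explorer */
        else if(document.selection)txt=document.selection.createRange().text;
        /* escape & first, then < and >, so the selected text is displayed literally */
        txt=txt.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;');
        /* a named window is reused on later clicks; its document is never closed, so each write is appended */
        var W=open('','nomens');
        W.document.write('<pre>'+txt+'</pre>\n');
    }
)()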
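nomens escapes the selection before writing it into a <pre> block, so that any markup in the selected text is shown instead of being interpreted. A minimal sketch of that escaping step as a standalone function (the names escapeHtml and toPreBlock are illustrative, not part of the bookmarklet) follows. It escapes & as well, and does so first, so that the & inside the entities produced by the later replacements is not escaped a second time.

```javascript
// Hypothetical helper: the escaping step of nomens as a plain function.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;') // must come first, or "&lt;" below would turn into "&amp;lt;"
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Wrap the escaped text in <pre>, as nomens does before writing to its window.
function toPreBlock(text) {
  return '<pre>' + escapeHtml(text) + '</pre>\n';
}

console.log(toPreBlock('Tom & Jerry <b>cartoons</b>'));
// prints: <pre>Tom &amp; Jerry &lt;b&gt;cartoons&lt;/b&gt;</pre>
```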

Bookmarklet: omens (NS 4+, Mozilla, IE 4+ to IE 5.5)

Select the text you want to analyze and click the bookmarklet omens. This bookmarklet will open a new window containing a table with all words of length greater than or equal to four, sorted by frequency (it's a dirty way of avoiding all non-content words; you can change 4 to 1 in the bookmarklet if you really want all the words).

Code:
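javascript:(
    function(){
        var RelArray={};    /* ' '+word -> [word, frequency]; the space prefix keeps words like "length" or "find" from clashing with built-in properties */
        var SingNames=[];   /* one [word, frequency] pair per distinct word, in order of first appearance */
        var txt='';
        if(window.getSelection)txt=''+window.getSelection();
        else if(document.getSelection)txt=''+document.getSelection();
        else if(document.selection)txt=document.selection.createRange().text;
        /* words of four or more letters or digits; change {4,} to {1,} to keep every word */
        var nomens=txt.match(/[a-zA-Z0-9]{4,}/g)||[];
        for(var i=0;i<nomens.length;i++){
            var key=' '+nomens[i];
            if(RelArray[key]==undefined){
                RelArray[key]=[nomens[i],1];
                SingNames[SingNames.length]=RelArray[key];
            }else{
                RelArray[key][1]++;
            }
        }
        /* higher frequency first, ties in character-code (roughly alphabetical) order */
        function mySort(a,b){
            if(a[1]>b[1])return -1;
            if(a[1]<b[1])return 1;
            if(a[0]>b[0])return 1;
            if(a[0]<b[0])return -1;
            return 0;
        }
        var omens=SingNames.sort(mySort);
        /* four word/frequency pairs per row; round up so the last, partial row is kept */
        var rows=Math.ceil(omens.length/4);
        var body='<table width=640 border=1 align=center>';
        for(var k=0;k<rows;k++){
            body+='<tr>';
            for(var m=0;m<4;m++){
                var p=omens[4*k+m];
                if(p)body+='<td width=140>'+p[0]+'</td><td width=20>'+p[1]+'</td>';
                else body+='<td></td><td></td>';
            }
            body+='</tr>\n';
        }
        var d=window.open().document;
        d.write('<title>Words</title><body>'+body+'</table></body>');
        d.close();
    }
)()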
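Stripped of the selection handling and the HTML output, the heart of omens is a word count followed by a sort: higher frequency first, ties broken by plain string order. The sketch below restates that as an ordinary function that can be tried outside a browser. It uses a Map (which the browsers listed above do not have) rather than a plain object, so that words such as "length" or "find" cannot collide with built-in properties. The name wordFrequencies and the minLength parameter are illustrative; minLength plays the role of the 4 in the bookmarklet's regular expression.

```javascript
// Hypothetical helper: count words of at least minLength letters/digits
// and return [word, count] pairs, most frequent first.
function wordFrequencies(text, minLength = 4) {
  const pattern = new RegExp('[a-zA-Z0-9]{' + minLength + ',}', 'g');
  const words = text.match(pattern) || []; // match() returns null when nothing matches
  const counts = new Map();
  for (const w of words) {
    counts.set(w, (counts.get(w) || 0) + 1); // case-sensitive, like the tables below
  }
  return [...counts.entries()].sort((a, b) =>
    b[1] !== a[1] ? b[1] - a[1]                // higher frequency first
      : a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0 // then plain string order
  );
}

const sample = 'Web design and web hosting. Design, hosting, design!';
console.log(wordFrequencies(sample));
// returns [['design', 2], ['hosting', 2], ['Design', 1]]
```

"Web", "web" and "and" are left out because they are shorter than four characters, which is exactly the "dirty" filtering described above.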

Internet Explorer 6 limits the number of characters a bookmarklet can contain to 508, which means this bookmarklet won't work on IE 6. You can solve this problem by downgrading to IE 5.5, if you really have to use IE, or by using the following workaround, which also works with recent versions of Netscape and Mozilla:

Bookmarklet: omens (NS 6+, Mozilla, IE 6)

Add the bookmarklet omens to your favorites, save the file omens.js to a folder on your computer, and replace path/to/ in the bookmarklet with the actual path to omens.js on your computer. The bookmarklet then works as previously explained.

Code:
javascript:
    void(
        (
            function(){
                var element=document.createElement('script');
                element.setAttribute('src','file:///C|/path/to/omens.js');
                document.body.appendChild(element)
            }
        )()
    ) 
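Before falling back on the external-file loader, one can check whether a bookmarklet fits under the 508-character IE 6 limit stated above. The listings in this essay are spread over several lines for readability; the one-line form that actually gets bookmarked is obtained by dropping each line break together with the indentation that follows it. The sketch below does that and measures the result. It assumes there are no line breaks inside string literals, and it counts characters as written, so any percent-encoding the browser applies (for example to spaces) would make the stored bookmark longer. The function names are illustrative.

```javascript
// Hypothetical helper: collapse a multi-line bookmarklet listing into the
// single line that is bookmarked, removing each line break plus the
// indentation after it.
function collapseBookmarklet(listing) {
  return listing.trim().replace(/\r?\n\s*/g, '');
}

// 508 characters is the IE 6 limit given in the essay.
function fitsInIE6(listing, limit = 508) {
  const oneLine = collapseBookmarklet(listing);
  return { length: oneLine.length, fits: oneLine.length <= limit };
}

const loader = `javascript:
    void(
        (
            function(){
                var element=document.createElement('script');
                element.setAttribute('src','file:///C|/path/to/omens.js');
                document.body.appendChild(element)
            }
        )()
    )`;
console.log(collapseBookmarklet(loader));
console.log(fitsInIE6(loader));
```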

Words statistics

Given that we have the tools, lets use them to refine our conceptual query:

(search OR searching OR searcher OR seek OR seeking OR seeker) AND (web OR internet OR document OR documents OR file OR files OR webpage OR webpages OR "web page" OR "web pages" OR tips OR hints OR strategies) --> 32262 hits

First of all, we can improve our query by replacing AND with NEAR. This means that a pair of keywords, one from each set, must appear within ten words of each other. Put another way, for altavista the NEAR operator distributes over the OR operator.

(search OR searching OR searcher OR seek OR seeking OR seeker) NEAR (web OR internet OR document OR documents OR file OR files OR webpage OR webpages OR "web page" OR "web pages" OR tips OR hints OR strategies) --> 9914 hits
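As described above, (A OR B) NEAR (C OR D) matches a page when at least one keyword from the first set and one from the second occur within ten words of each other, which is the same as OR-ing together every pairwise NEAR (A NEAR C OR A NEAR D OR B NEAR C OR B NEAR D). The sketch below checks that condition on a plain text. The tokenization (lowercased runs of letters and digits), the reading of "within ten words" as a position difference of at most 10, and the function name near are assumptions, since altavista's actual NEAR implementation is not documented here; multi-word phrases such as "web page" are not handled.

```javascript
// Hypothetical check of the NEAR condition: some word from leftSet and some
// word from rightSet at most maxDistance word positions apart.
function near(text, leftSet, rightSet, maxDistance = 10) {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  const left = new Set(leftSet.map(w => w.toLowerCase()));
  const right = new Set(rightSet.map(w => w.toLowerCase()));
  const leftPos = [];
  const rightPos = [];
  words.forEach((w, i) => {
    if (left.has(w)) leftPos.push(i);
    if (right.has(w)) rightPos.push(i);
  });
  // True as soon as any single (left, right) pair is close enough.
  return leftPos.some(i => rightPos.some(j => Math.abs(i - j) <= maxDistance));
}

const seekers = ['search', 'searching', 'seek', 'seeking'];
const targets = ['web', 'internet', 'tips', 'hints'];
console.log(near('Tips for those who search the web', seekers, targets)); // true
console.log(near('search ' + 'x '.repeat(20) + 'web', seekers, targets)); // false: 21 words apart
```

Because the result is true as soon as one pair qualifies, checking both sets at once gives the same answer as OR-ing all the pairwise checks, which is the distributivity mentioned above.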

In all queries for altavista I changed nbq=50 to nbq=100 in order to get one hundred results per page. As you can guess, there are plenty of web-commercial sargassos. Altavista is a very easily spammed search engine and we'll take advantage of that to get a list of stopwords, doing what you should actually never do: searching for only one word!
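Changing nbq=50 to nbq=100 is just editing one parameter of the results URL. A sketch using the standard URL and URLSearchParams interfaces (available in modern browsers and Node.js) is shown below. The host name and the q parameter in the example are made up for illustration, since altavista's query-string layout is not given here beyond nbq; setResultsPerPage is likewise a hypothetical name.

```javascript
// Set the nbq (results per page) parameter of a results URL, adding it if absent.
// Only the parameter name nbq comes from the essay; the example URL is hypothetical.
function setResultsPerPage(resultsUrl, perPage) {
  const url = new URL(resultsUrl);
  url.searchParams.set('nbq', String(perPage)); // replaces an existing nbq in place
  return url.toString();
}

console.log(setResultsPerPage('http://www.altavista.example/web/results?q=search&nbq=50', 100));
// prints: http://www.altavista.example/web/results?q=search&nbq=100
```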

To produce the following list, I select the portion of text containing the search results, click the bookmarklet nomens, load altavista's next search results page, and repeat the procedure until I have one thousand results. Don't worry: with the nbq trick explained above, altavista will give you ten pages of one hundred search results each. We select all the text containing the search results and don't care about the noise produced by altavista's own words (Translate, More, pages, from, Refreshed, past, hours), as they are relatively innocuous terms.

After that, we select all the text in the popped-up window and click the bookmarklet omens to get the word frequency table for our thousand results.

In the following list I provide links to the queries on altavista and the respective word frequency tables. I only show words with frequencies greater than or equal to nine.


search: see search frequency table
Contact 74, Jobs 48, Business 36, Resources 34, Shopping 34, Submit 34, jobsearch 34, business 31, Games 27, Health 26, optimization 26, Genealogy 25, Travel 25, jobs 25, Optimization 22, Submission 21, Products 20, Service 20, offers 20, genealogy 19, Marketing 18, Sports 18, Career 17, Shop 17, Product 16, hotels 16, Adult 15, Entertainment 15, submission 15, adult 14, Hosting 13, Advertise 12, Celebrities 12, Employers 12, Order 12, Positioning 12, Promotion 12, positioning 12, Placement 11, career 11, hosting 11, marketing 11, placement 11, porn 11, ranking 11, submit 11, hotel 11, reservations 10, shopping 10, Hotel 9, Hotels 9, employment 9.
seek: see seek frequency table
Business 46, Games 46, women 42, Women 33, Bondage 30, Sports 28, games 28, marriage 28, Adult 28, Health 27, adult 25, business 25, service 25, Jobs 24, Travel 22, Entertainment 20, Jesus 17, dating 17, Bingo 15, services 15, porn 14, woman 14, Christian 13, Shopping 13, sports 13, Christ 12, Game 12, health 12, young 12, Casino 11, Church 11, love 11, Advertise 10, Medical 10, Sport 10, entertainment 10, Advertising 9, Premium 9, career 9, casino 9, employment 9, mission 9.
web: see web frequency table
design 335, Design 319, Hosting 284, hosting 274, Services 132, development 127, services 99, business 84, Development 82, Marketing 66, offers 58, service 58, commerce 57, solutions 57, Business 56, Solutions 50, marketing 49, affordable 42, Contact 41, promotion 34, Commerce 32, Promotion 27, Consulting 25, shopping 24, advertising 23, ecommerce 23, businesses 22, Affordable 21, consulting 19, resources 18, Designs 14, Ecommerce 14, Shopping 12, offering 12, HOSTING 11, Advertising 10, Designing 10, Order 10, Shop 10, Sports 10, Store 10, travel 10, store 9.
internet: see internet frequency table
Services 204, design 182, Design 140, Hosting 106, hosting 105, Service 101, Solutions 101, services 93, Marketing 89, marketing 72, service 67, solutions 63, business 55, Business 46, commerce 38, Development 33, provider 32, provides 32, Webdesign 31, Promotion 26, Resources 26, businesses 26, Consulting 25, offers 24, Commerce 23, Casino 21, designers 20, providing 19, Providers 16, consulting 16, offering 15, Advertising 14, advertising 14, promotion 14, Consultants 13, Gambling 12, SERVICES 12, Shopping 12, Sports 12, ecommerce 12, Products 12, gambling 11, shop 11, Providing 10, Shop 10, affordable 10, casino 10, resources 10, solution 10, designed 9, positioning 9, resource 9.
document: see document frequency table
Management 129, management 109, Services 95, Delivery 72, Solutions 65, Service 51, services 50, products 47, provides 37, Contact 36, solutions 33, business 29, design 26, Products 25, delivery 25, Design 23, Health 15, Travel 11.
file: see file frequency table
products 21, provides 21, Products 20, solutions 20, hosting 17, Hosting 15, Resource 15, Resources 15, Solutions 15, Support 15, contact 15, Business 13, Design 15, Genealogy 12, business 11, Career 9, Medical 9, games 9.
webpage: see webpage frequency table
design 108, hosting 104, Design 84, Official 33, Hosting 29, service 29, Business 29, business 27, services 25, Contact 19, Marketing 17, support 17, Designs 15, HOSTING 15, Designers 12, Development 12, affordable 12, development 11, marketing 11, solutions 11, webdesign 11, Consulting 10, businesses 10, contact 10, designed 10, designers 10, official 10, Health 9, Promotion 9, designing 9.
tips: see tips frequency table
travel 34, Health 32, business 30, Weight 27, Gardening 26, Marketing 24, Shop 23, Training 23, Wedding 23, health 23, Loss 22, Design 21, Fitness 21, Shopping 21, marketing 20, offers 20, Casino 18, Gambling 17, Tour 17, Blackjack 16, weight 16, Racing 14, Recipes 14, promotion 14, Dating 13, Garden 13, Golf 13, services 13, Sports 12, cooking 12, fitness 12, Beauty 12, Promotion 11, betting 11, games 11, loss 11, shopping 11, Cooking 10, Game 10, Vegas 10, advertising 10, blackjack 10, love 10, Career 9, Casinos 9, Employment 9, Games 9, Motivation 9, Professional 9, Weddings 9, offer 9, shop 9.
hints: see hints frequency table
Cheats 84, cheats 71, Game 57, game 48, games 35, Games 31, Playstation 22, Cooking 20, Cheat 19, Recipes 18, Service 18, Design 16, recipes 16, Training 14, Health 11, Nintendo 11, cheat 11, resources 11, travel 11, Wedding 10, business 10, cooking 10, Dreamcast 9, Marketing 9, Pokemon 9, gardening 9, health 9.
strategies: see strategies frequency table
Business 134, business 110, Marketing 102, marketing 90, services 81, consulting 78, Management 67, management 67, Development 54, Services 53, Design 49, development 47, provides 46, training 45, Contact 44, Consulting 42, service 37, Training 36, design 36, Solutions 33, Resources 31, market 29, providing 24, Health 23, clients 23, Market 22, resources 22, sales 22, Casino 21, products 21, Optimization 20, Products 18, Sales 18, Service 18, health 18, hosting 17, Advertising 16, Career 15, blackjack 15, businesses 15, Blackjack 14, Winning 14, commerce 13, Coaching 12, Game 12, Play 12, roulette 12, Consultants 11, Employee 11, games 11, career 10, promotion 10, Sports 9, advertising 9, betting 9, coaching 9.
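Each list above keeps only the words counted at least nine times and prints them as "word count" pairs, most frequent first. Given [word, count] pairs already sorted the way omens sorts them, that last step could be sketched as follows. The function name is illustrative; the sample pairs are taken from the web table above, except the last one, which is made up to show the cut-off.

```javascript
// Hypothetical helper: keep pairs with count >= minCount and print them
// in the "Word 74, Word 48, ..." style of the lists above.
function formatFrequent(sortedPairs, minCount = 9) {
  const kept = sortedPairs
    .filter(([, count]) => count >= minCount)
    .map(([word, count]) => word + ' ' + count);
  return kept.length > 0 ? kept.join(', ') + '.' : '';
}

const pairs = [['design', 335], ['Hosting', 284], ['Sports', 10], ['store', 9], ['about', 3]];
console.log(formatFrequent(pairs)); // prints: design 335, Hosting 284, Sports 10, store 9.
```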

Let's organize these stopwords by subject to get a better understanding of our enemies:


It's interesting to compare the idiosyncrasies of each keyword... words are not made equal, even synonyms! Let's squeeze all this information and get the list of our own stopwords.
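The essay does not spell out how the per-keyword tables are squeezed into one list, so what follows is only one possible approach, not necessarily the author's: fold case (so that Design and design count together), add up the counts across tables, and keep the words that reach a minimum count in at least a given number of tables. The function name and both thresholds are illustrative assumptions; the table entries are excerpts from the web, internet and tips tables above.

```javascript
// Hypothetical merge of several per-keyword frequency tables ({word: count})
// into one list of stop-word candidates. Not the essay's own procedure.
function mergeStopWords(tables, minCount = 9, minTables = 2) {
  const total = new Map();  // lowercase word -> count summed over all tables
  const seenIn = new Map(); // lowercase word -> number of tables where it reaches minCount
  for (const table of tables) {
    // Fold case inside one table first: "Design" and "design" add up.
    const folded = new Map();
    for (const [word, count] of Object.entries(table)) {
      const w = word.toLowerCase();
      folded.set(w, (folded.get(w) || 0) + count);
    }
    for (const [w, count] of folded) {
      total.set(w, (total.get(w) || 0) + count);
      if (count >= minCount) seenIn.set(w, (seenIn.get(w) || 0) + 1);
    }
  }
  return [...seenIn]
    .filter(([, n]) => n >= minTables)
    .map(([w]) => w)
    .sort((a, b) => total.get(b) - total.get(a) || (a < b ? -1 : a > b ? 1 : 0));
}

// Excerpts from the "web", "internet" and "tips" tables above.
const web = { design: 335, Design: 319, Hosting: 284, hosting: 274, Sports: 10 };
const internet = { Services: 204, design: 182, Design: 140, Hosting: 106, hosting: 105, Sports: 12 };
const tips = { travel: 34, Health: 32, business: 30, Sports: 12 };

console.log(mergeStopWords([web, internet, tips]));
// returns ['design', 'hosting', 'sports']
```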


Stop words

With it we can do a deep cleaning of our search results:

(search OR searching OR searcher OR seek OR seeking OR seeker) NEAR (web OR internet OR document OR documents OR file OR files OR webpage OR webpages OR "web page" OR "web pages" OR tips OR hints OR strategies) AND NOT (design OR marketing OR promotion OR advertising OR optimization OR positioning OR service OR services OR business OR development OR contact OR consulting OR product OR products OR offers OR cooking OR recipes OR official OR game OR games OR casino OR gambling OR gardening OR garden OR genealogy OR health OR fitness OR hosting OR job OR jobs OR career OR jesus OR christ OR church OR adult OR porn OR sex OR sport OR sports OR coaching OR travel OR hotels OR reservation OR wedding OR dating OR marriage) --> 554 hits
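Typing a long exclusion clause like the one above by hand is error-prone; it can be generated from the stop-word list instead. The sketch below appends an AND NOT ( ... OR ... ) clause to a base query, lowercasing and de-duplicating the stop words and quoting multi-word entries the way the essay quotes phrases such as "web page". The function name and the sample words are illustrative.

```javascript
// Hypothetical helper: append "AND NOT (w1 OR w2 OR ...)" to a base query.
function excludeStopWords(baseQuery, stopWords) {
  const unique = [...new Set(stopWords.map(w => w.trim().toLowerCase()))]
    .filter(w => w.length > 0)
    .map(w => (w.includes(' ') ? '"' + w + '"' : w)); // quote phrases
  if (unique.length === 0) return baseQuery;
  return baseQuery + ' AND NOT (' + unique.join(' OR ') + ')';
}

const base = '(search OR seek) NEAR (web OR tips)';
console.log(excludeStopWords(base, ['design', 'Design', 'marketing', 'web hosting']));
// prints: (search OR seek) NEAR (web OR tips) AND NOT (design OR marketing OR "web hosting")
```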

Well, don't trust altavista... everyone knows that altavista can't count. Now we can navigate purer waters! That does not mean that all pages are relevant... in this big pond we still need to give the search engines some help. Let's see some interesting queries where this strategy was used.


Examples
Conclusion

Ironically, the SEO webmasters' obsession with optimizing their webpages, using every possible keyword associated with their content, simplifies our task: avoiding them.

(c) Nemo 2003
Note that if you did not take the time to follow the query examples listed above, you're gonna miss some serious knowledge :-) Now you have Nemo's knowledge-cleaning bookmarklets... so bookmark them! Enjoy!


(c) III Millennium: [fravia+], all rights reserved