Noise words.
That's what I'm working on right now.  Noise words.
Not words like clang or pththththt, but words that can be
ignored in Natural Language Processing.  Interesting problem.  Words like
“the” and “a” can be stripped as noise words.  But what else?
And does frequency of occurrence count?
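One way to test whether frequency counts is to look at document frequency: a word that shows up in nearly every document can't tell one document from another. The sketch below is a minimal illustration, not a finished tool. The function names `tokenize` and `noise_candidates` and the 0.8 default cutoff are hypothetical choices of mine, and the tokenizer is a deliberately crude regular expression.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def noise_candidates(docs, min_doc_fraction=0.8):
    """Return, sorted, the words found in at least min_doc_fraction of docs.

    Each document counts a word at most once (document frequency, not raw
    frequency), so one very repetitive document can't dominate the result.
    The 0.8 default is an arbitrary, hypothetical cutoff.
    """
    doc_counts = Counter()
    for doc in docs:
        doc_counts.update(set(tokenize(doc)))
    cutoff = min_doc_fraction * len(docs)
    return sorted(word for word, n in doc_counts.items() if n >= cutoff)


docs = [
    "the cat sat on the mat",
    "a dog chased the cat",
    "the bird sang a song",
]
print(noise_candidates(docs))       # ['the']
print(noise_candidates(docs, 0.6))  # ['a', 'cat', 'the']
```

Lowering the cutoff catches "a", but it also flags "cat", which is hardly noise.  So frequency is a useful hint, not the whole answer.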
Conjunctions, interjections, and maybe prepositions can be cut.
Maybe.
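Cutting by word class can be sketched with a small set per class, so each class can be switched on or off (which handles the "maybe").  The word lists here are tiny, hand-picked, and hypothetical, for illustration only.  A real stop list would be much longer, and a serious version would tag parts of speech rather than rely on fixed lists.

```python
import re

# Tiny, hypothetical word lists, one per word class; illustration only.
ARTICLES = {"a", "an", "the"}
CONJUNCTIONS = {"and", "but", "or", "nor", "so", "yet"}
INTERJECTIONS = {"oh", "wow", "hey", "ouch"}
PREPOSITIONS = {"in", "on", "at", "of", "to", "with", "from"}


def strip_noise(text, noise_classes):
    """Lowercase and tokenize text, dropping words in any of noise_classes."""
    noise = set().union(*noise_classes)
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in noise]


sentence = "Oh, the cat sat on the mat and purred"
print(strip_noise(sentence, [ARTICLES, CONJUNCTIONS, INTERJECTIONS]))
# ['cat', 'sat', 'on', 'mat', 'purred']
print(strip_noise(sentence, [ARTICLES, CONJUNCTIONS, INTERJECTIONS, PREPOSITIONS]))
# ['cat', 'sat', 'mat', 'purred']
```

Whether dropping "on" loses anything depends on the task.  For keyword matching it probably doesn't matter, but for anything that cares about where the cat sat, it might.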
Doing a quick search for precompiled word lists, I came across the
Language Technology Group Helpdesk FAQ, which is incredible if you're into
this type of thing.
 
        You have my permission to link freely to any entry here.  Go
        ahead, I won't bite.  I promise.
        
        The dates are the permanent links to that day's entries (or
        entry, if there is only one entry).  The titles are the permanent
        links to that entry only.  The format for the links is
        simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are
        interested in, say 2000/08/01,
        so that would make the final URL:
        
        https://boston.conman.org/2000/08/01
        
        You can also specify the entire month by leaving off the day
        portion.  You can even select an arbitrary portion of time.
        
        You may also note subtle shading of the links, and that's
        intentional: the “closer” the link is (relative to the
        page) the “brighter” it appears.  It's an experiment in
        using color shading to denote the distance a link is from here.  If
        you don't notice it, don't worry; it's not all that
        important.
        
        It is assumed that every brand name, slogan, corporate name,
        symbol, design element, et cetera mentioned in these pages is a
        protected and/or trademarked entity, the sole property of its
        owner(s), and acknowledgement of this status is implied.