Noise words.
That's what I'm working on right now. Noise words.
Not words like clang or pththththt, but words that can be
ignored in Natural Language Processing. Interesting problem. Words like
“the” and “a” can be stripped as noise words. But what else?
And does frequency of occurrence count?
Conjunctions, interjections, and maybe prepositions can be cut.
Maybe.
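Both ideas (a fixed list of noise words, and frequency as a hint for what else belongs on it) can be tried in a few lines. This is a minimal sketch under loud assumptions. The word list is a tiny hand-picked sample, not any precompiled list. The names `NOISE_WORDS`, `tokenize`, `strip_noise`, and `frequent_words` are hypothetical and made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, hand-picked noise-word list (illustrative only, not a standard list):
# articles, a few conjunctions, prepositions, and interjections.
NOISE_WORDS = {
    "the", "a", "an",                # articles
    "and", "or", "but", "nor",       # conjunctions
    "of", "in", "on", "to", "with",  # prepositions
    "oh", "hey", "wow",              # interjections
}

def tokenize(text):
    """Lowercase the text and split it into words, keeping inner apostrophes (don't)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def strip_noise(text, noise=NOISE_WORDS):
    """Return the words of text with every word in the noise list removed."""
    return [w for w in tokenize(text) if w not in noise]

def frequent_words(texts, top_n=5):
    """Count words across a collection of texts and return the top_n most common.

    Words that show up everywhere are candidates for the noise list, though
    a frequent word is not automatically a meaningless one.
    """
    counts = Counter()
    for t in texts:
        counts.update(tokenize(t))
    return [w for w, _ in counts.most_common(top_n)]
```

For example, `strip_noise("The cat sat on the mat and purred")` leaves `["cat", "sat", "mat", "purred"]`. Frequency alone can't settle the question: a word that is common in one collection of texts may be exactly what that collection is about. So the counts are better treated as suggestions for the list than as the list itself.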
Doing a quick search for precompiled word lists, I came across the
Language Technology Group Helpdesk FAQ, which is incredible if you're into this type of
thing.
You have my permission to link freely to any entry here. Go
ahead, I won't bite. I promise.
The dates are the permanent links to that day's entries (or
entry, if there is only one entry). The titles are the permanent
links to that entry only. The format for the links is
simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are
interested in, say 2000/08/01,
so that would make the final URL:
https://boston.conman.org/2000/08/01
You can also specify the entire month by leaving off the day
portion. You can even select an arbitrary portion of time.
You may also notice subtle shading of the links; that's
intentional: the “closer” the link is (relative to the
page) the “brighter” it appears. It's an experiment in
using color shading to denote the distance a link is from here. If
you don't notice it, don't worry; it's not all that
important.
It is assumed that every brand name, slogan, corporate name,
symbol, design element, et cetera mentioned in these pages is a
protected and/or trademarked entity, the sole property of its
owner(s), and acknowledgement of this status is implied.