You're reading the public-facing archive of the Category Theory Zulip server.
$$W$$ denotes a non-void finite set of "words." A sentence is an assignment of words to positions, $$\tilde{w}: S \to W$$. If $$\tilde{w}$$ is thought of as a word-valued random variable defined at positions (like a "field" of words), then $$P(\tilde{w} = w)$$ is the probability of word $$w$$ occurring at some position of the sentence.
The triple $$(S, \tilde{w}, W)$$ with $$\operatorname{dom}\tilde{w}=S$$ and $$\operatorname{cod}\tilde{w}=W$$ is exactly an object of the slice category $$\mathbf{Set}/W$$. Call it the category of sentences over $$W$$.
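As a rough illustration, a sentence $$\tilde{w}$$ with domain $$S$$ and codomain $$W$$ can be modelled as a finite map from positions to words. The sketch below assumes that positions are equally likely, which the text does not state explicitly; the function name `word_probability` and the sample sentence are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def word_probability(sentence):
    """Return {word: probability} for a sentence given as {position: word}.

    Assumption (not from the text): every position is equally likely,
    so the probability of a word is its relative frequency.
    """
    counts = Counter(sentence.values())
    total = len(sentence)
    return {word: Fraction(n, total) for word, n in counts.items()}


# Hypothetical sentence: positions 0..4 mapped to words.
sentence = {0: "the", 1: "cat", 2: "saw", 3: "the", 4: "dog"}
probs = word_probability(sentence)
```

Exact fractions are used instead of floats so that the probabilities visibly sum to one.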
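A document is a "field" of sentences over a discrete category $$L$$ of locations, as in $$D: L \to \mathbf{Set}/W$$. The sentence at location $$\ell$$ in the document is denoted by $$\tilde{w}_\ell: S_\ell \to W$$. By the universal property of the coproduct in $$\mathbf{Set}$$, there is a random variable $$\tilde{w}_D: \coprod_{\ell \in L} S_\ell \to W$$ such that if $$s \in S_\ell$$ for $$\ell \in L$$, then $$\tilde{w}_D(s) = \tilde{w}_\ell(s)$$ is the word at position $$s$$ in the sentence at location $$\ell$$. And $$P(\tilde{w}_D = w)$$ is the probability of word $$w$$ occurring at some position of some location in the document.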
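The coproduct construction can be sketched concretely: the disjoint union of the position sets is modelled by tagging each position with its location, and the induced random variable reads off the word of the corresponding sentence. This is only a minimal sketch; the names `copair` and `word_probability`, the location labels, and the uniform weighting of positions are assumptions, not taken from the text.

```python
from collections import Counter
from fractions import Fraction


def copair(document):
    """Glue the sentences of a document into one map on the disjoint union.

    `document` is {location: {position: word}}. Positions of different
    sentences are kept apart by tagging them as (location, position).
    """
    return {
        (loc, pos): word
        for loc, sentence in document.items()
        for pos, word in sentence.items()
    }


def word_probability(word_map):
    """Relative frequency of each word, assuming equally likely positions."""
    counts = Counter(word_map.values())
    total = len(word_map)
    return {word: Fraction(n, total) for word, n in counts.items()}


# Hypothetical document with two locations.
document = {
    "l1": {0: "the", 1: "cat", 2: "sat"},
    "l2": {0: "the", 1: "dog"},
}
glued = copair(document)
doc_probs = word_probability(glued)
```

By construction, restricting `glued` to the positions tagged with one location recovers that location's sentence, which is the commuting condition of the universal property.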
For natural language understanding it is of interest to determine, for a sample of sentences (such as the consecutive sentences in a paragraph), the "density of highest-probability words" in the sample. The density of a word in a sample is
In applications a document is subdivided into consecutive intervals of fixed length, so the denominator is equal to that length.
Define the prevalence relative to a set of words of highest rank (according to some threshold) by
The question is whether prevalence is a weighted colimit. Evidence is a detailed formal correspondence:
The statement and proof of a theorem substantiating this claim and evidence might involve "enriched category theory." Of course, references to the literature would be best.