Web apps

NLP tools
ThomAIs

Natural Language Processing tools

I'm currently using this Heroku site for running web apps involving Natural Language Processing data analysis. I'm aiming to convert research software, usually sitting in repositories, into more usable versions this way. The site can take a few seconds to start up.

The idea is that the apps are small discrete tools that could be used in combination with each other, so the output of some apps can be copy-pasted as input for others.

Currently, the apps include:

A set of Natural Language Processing scripts/apps online, all based on word embedding vectors.
- "SemSim" is a kind of semantic filter, calculating the similarity of each word in a list of target words to one set of "attribute" words versus another.
- "SemTag" assigns tags to the content of a text, as a whole and per paragraph (by default, emotion-words are used as tags).
- "SemNull" gives a null hypothesis distribution for similarity scores over randomly selected words, given a specified role in a template sentence.
- "SemCluster" finds clusters of words with similar meanings (a kind of automated affinity mapping).
A conceptual explanation of these "Semantic Vector Tools" is available here.
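
As a rough illustration of the SemSim idea (scoring target words by their similarity to one set of attribute words versus another), here is a minimal sketch. The toy three-dimensional vectors, the function names, the use of cosine similarity, and the "mean similarity to set A minus mean similarity to set B" score are all assumptions made for the example, not necessarily how the app itself computes things.

```python
import math

# Toy 3-dimensional "embeddings", invented purely for illustration.
# Real word embedding vectors typically have hundreds of dimensions.
TOY_VECTORS = {
    "puppy":   [0.9, 0.1, 0.2],
    "storm":   [0.1, 0.9, 0.3],
    "happy":   [0.8, 0.2, 0.1],
    "joyful":  [0.9, 0.3, 0.1],
    "scary":   [0.2, 0.8, 0.2],
    "violent": [0.1, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(word, attribute_words, vectors):
    """Average cosine similarity of one word to a set of attribute words."""
    sims = [cosine(vectors[word], vectors[a]) for a in attribute_words]
    return sum(sims) / len(sims)


def relative_similarity(targets, attributes_a, attributes_b, vectors):
    """Score each target: positive leans towards set A, negative towards set B."""
    return {
        t: mean_similarity(t, attributes_a, vectors)
        - mean_similarity(t, attributes_b, vectors)
        for t in targets
    }


scores = relative_similarity(
    targets=["puppy", "storm"],
    attributes_a=["happy", "joyful"],
    attributes_b=["scary", "violent"],
    vectors=TOY_VECTORS,
)
for word, score in scores.items():
    print(f"{word}: {score:+.3f}")
```

With these made-up vectors, "puppy" gets a positive score and "storm" a negative one. Swapping the toy dictionary for real pretrained embeddings would be the obvious next step.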
A very simple text analyzer, TAR (Topic-Attribute Relationships), that tries to extract topics and related attributes based on paragraphs of text. The number of times an attribute is assigned to a topic is counted, at most once per paragraph. (Very much something I'd currently use a generative AI model for in practice, but I think simpler attempts are worth making and keeping in mind, if only because they could potentially usefully be combined with generative AI.)
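
The "at most once per paragraph" counting rule can be sketched as below. This is only an illustration under simplifying assumptions: the topic and attribute word lists are supplied by hand rather than extracted from the text, an attribute is "assigned" to a topic simply when both words occur in the same paragraph, and the function name is hypothetical.

```python
import re
from collections import Counter


def count_topic_attributes(text, topics, attributes):
    """Count (topic, attribute) pairs, at most once per paragraph.

    Simplifying assumption: a pair counts for a paragraph when both
    words occur anywhere in that paragraph.
    """
    counts = Counter()
    # Paragraphs are taken to be separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for paragraph in paragraphs:
        words = set(re.findall(r"[a-z']+", paragraph.lower()))
        # Building a set of pairs means repeats within one paragraph
        # are only counted once.
        pairs = {
            (topic, attribute)
            for topic in topics
            if topic in words
            for attribute in attributes
            if attribute in words
        }
        counts.update(pairs)
    return counts


example_text = (
    "The staff were friendly. Really friendly staff.\n\n"
    "The food was cold.\n\n"
    "Again the staff were friendly, though the food was cold."
)

counts = count_topic_attributes(
    example_text,
    topics={"staff", "food"},
    attributes={"friendly", "cold"},
)
for (topic, attribute), n in sorted(counts.items()):
    print(f"{topic} / {attribute}: {n}")
```

In the first paragraph "friendly" is mentioned twice alongside "staff" but only counts once. The third paragraph also shows the weakness of plain co-occurrence: it pairs "staff" with "cold", which the text never actually says.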

ThomAIs

ThomAIs is a Gen-AI interface designed primarily with automated response coding and thematic analysis for semi-structured interviews in mind. A toy example can be loaded on the page. It has an optional "Thomas mode" if you want to deal with that.

Some thoughts on the technology in general. People can certainly abuse generative AI in all sorts of ways, but I think there are appropriate use cases where it's authentically a valuable tool. These revolve around what it really can do, i.e., manipulate the meaning of texts, rather than more magical functionality.

If you have a set of unstructured and maybe complex texts, say, and want to extract well-defined lists of information about their content, that's the kind of thing where it seems to shine. Maybe interview transcripts or open text responses on a survey. But note, you have to have some actual data and an analysis objective - it's not just asking the thing random questions and going ooh or baah at what it plucks out of its memory. Another use I think might be valid, under one critical condition, is as a complementary source of criticism for your own work - you don't have to believe it, and will likely tend towards skepticism if anything, but it's a useful exercise to test yourself and maybe anticipate objections you could get, rightly or wrongly. The critical condition, of course, is that you don't start and end with using AI but get as far as you can using your own brain first. After that, all sources of potential falsification or theoretical critique should be welcome, in principle, and getting AI to check for gaps is one source it seems almost remiss not to at least try. If it's garbage you should be able to easily articulate why, but maybe it picks up something you missed.

So, in a sense, funnily, perhaps the "generative" aspect isn't what's most valuable (and maybe that focus on "creation" is where things might start going wrong in public perception and usage?), but rather the ability to process semantic patterns, with enough generative functionality to communicate the findings.