Organizing knowledge with LLM-augmented tags

Get an LLM to manage your tags

My conclusion is that you want some kind of tag-like system, but one that is better than plain tags. Each topic, claim, or piece of information could be labelled with tags marking it as relevant to certain areas of inquiry. You could imagine tags such as “historical”, “demographic”, “public perception”, “discrimination”, etc., which would make these diverse knowledge elements more readily navigable. This could work, or at least it could get you some of the way.

My main concern is that the tags are not general enough. What if you were looking for something related to public perception but didn’t think of phrasing it that way? This generality problem is exactly what an LLM would be well suited to solving. It could maintain tags completely hidden from the user and, in response to a user query, look through the list of available tags and pull up the ones that seem relevant!

You could even imagine managing hundreds or thousands of tags with a multi-layer search process, where the LLM first opens the broadest “folders” of tags that seem relevant, then searches within those for more specific tag folders, and so on. The system could also track metrics for how difficult it was to find certain pieces of information and refactor its tagging structure accordingly.
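As a rough illustration of the multi-layer search, here is a minimal sketch. Everything in it is an assumption made for illustration: the `TagFolder` class, the `search` function, and especially `keyword_relevance`, which is a crude word-overlap stand-in for the LLM call that would actually judge relevance. The returned count of relevance checks is one possible "how hard was this to find" metric.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TagFolder:
    """A tag that can hold items directly and also more specific sub-tags."""
    name: str
    children: list["TagFolder"] = field(default_factory=list)
    items: list[str] = field(default_factory=list)


def keyword_relevance(query: str, tag: str) -> bool:
    """Placeholder for an LLM judgment: true if query and tag share a word."""
    return bool(set(query.lower().split()) & set(tag.lower().split()))


def search(
    folder: TagFolder,
    query: str,
    is_relevant: Callable[[str, str], bool],
    max_depth: int = 5,
) -> tuple[list[str], int]:
    """Collect items from `folder` and from every relevant sub-folder.

    Returns the items found and the number of relevance checks made.
    """
    found = list(folder.items)
    checks = 0
    if max_depth > 0:
        for child in folder.children:
            checks += 1
            if is_relevant(query, child.name):
                sub_found, sub_checks = search(child, query, is_relevant, max_depth - 1)
                found.extend(sub_found)
                checks += sub_checks
    return found, checks


# Hypothetical tag tree with placeholder items.
root = TagFolder("root", children=[
    TagFolder(
        "public perception",
        items=["note-1"],
        children=[TagFolder("media perception", items=["note-2"])],
    ),
    TagFolder("historical", items=["note-3"]),
])

items, checks = search(root, "perception of media", keyword_relevance)
```

A real version would replace `keyword_relevance` with a model call, which is the whole point: the model can match a query to “public perception” even when the user never uses those words.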

Tag generation algorithm idea

For each node, you could ask an LLM: “Give me the top 3 tags you might use to organize this node within the scope of general public debate.” Then, compare each suggested tag to each existing top-level tag and ask whether to a) add it as a new top-level tag, b) merge it with an existing tag, or c) nest it under an existing tag. Once you know which top-level tags it belongs under, repeat the process with the scope narrowed to the children of those tags. Repeat recursively until you are 5 or 10 layers deep. This is just the naive implementation: it may involve a lot of comparisons, so you may need a way to cull some of them. You could later refine the tags by measuring how salient they are to users, using A/B testing and a feedback mechanism.
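The naive placement loop could be sketched roughly as follows. The names `Tag`, `place_tag`, and `toy_decide` are hypothetical, and `toy_decide` is only a word-set rule standing in for the LLM's add/merge/nest answer. The sketch makes one `decide` call per existing sibling per layer, which is where the large number of comparisons comes from.

```python
from dataclasses import dataclass, field
from typing import Callable, Literal

Decision = Literal["add", "merge", "nest"]


@dataclass
class Tag:
    name: str
    children: list["Tag"] = field(default_factory=list)
    nodes: set[str] = field(default_factory=set)  # ids of nodes carrying this tag


def toy_decide(candidate: str, existing: str) -> Decision:
    """Placeholder for the LLM: compare the tags' word sets."""
    c, e = set(candidate.lower().split()), set(existing.lower().split())
    if c == e:
        return "merge"  # same tag, possibly different casing
    if e < c:
        return "nest"   # candidate is a more specific version of existing
    return "add"


def place_tag(
    candidate: str,
    node_id: str,
    siblings: list[Tag],
    decide: Callable[[str, str], Decision],
    depth: int = 0,
    max_depth: int = 5,
) -> None:
    """Place `candidate` among `siblings`, recursing into tags it nests under."""
    placed = False
    for existing in siblings:
        decision = decide(candidate, existing.name)
        if decision == "merge":
            existing.nodes.add(node_id)
            placed = True
        elif decision == "nest":
            if depth + 1 < max_depth:
                place_tag(candidate, node_id, existing.children, decide, depth + 1, max_depth)
            else:
                existing.nodes.add(node_id)  # depth limit reached: attach here
            placed = True
    if not placed:
        siblings.append(Tag(candidate, nodes={node_id}))


top_level: list[Tag] = []
for tag_name, node in [("perception", "n1"), ("public perception", "n2"),
                       ("Perception", "n3"), ("historical", "n4")]:
    place_tag(tag_name, node, top_level, toy_decide)
```

After this runs, `top_level` holds “perception” and “historical”, with “public perception” nested under “perception”. One obvious way to cull comparisons would be to pre-filter siblings before calling the LLM, but that is left out here.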

This is actually not an easy problem. There is an almost unlimited number of tags that you could assign to a given node. Identifying the appropriate ones requires a lot of knowledge about the existing debate. The whole point is to make the content easier for people to navigate, which requires knowing a lot about the people doing the navigating and about how they think.

Limiting the number of node subheadings

I believe it would be desirable to constrain the number of subheadings that a node can have because doing so would make for a cleaner presentation in the interface. A super long list of too-detailed topics is never good. Part of what makes for a good hierarchical organization scheme is a relatively balanced breakdown of items — no folder gets too many, no folder gets too few. So imposing constraints on the min and max subheading counts could be an important component of a system that automatically generates these things.
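One way to enforce this would be a check that flags any node whose subheading count falls outside the allowed range. The sketch below assumes the hierarchy is stored as nested dicts and uses arbitrary default bounds of 2 and 7; both the representation and the numbers are my own placeholders, not settled choices. Leaves (nodes with no subheadings) are exempt.

```python
def find_unbalanced(
    tree: dict[str, dict],
    min_children: int = 2,
    max_children: int = 7,
    path: tuple[str, ...] = (),
) -> list[tuple[str, int]]:
    """Return (path, child_count) for every non-leaf node outside the bounds."""
    problems = []
    for name, children in tree.items():
        here = path + (name,)
        count = len(children)
        if count and not (min_children <= count <= max_children):
            problems.append(("/".join(here), count))
        problems.extend(find_unbalanced(children, min_children, max_children, here))
    return problems


# Hypothetical hierarchy: "economic" has only one subheading.
example = {
    "causes": {
        "economic": {"subtopic A": {}},
        "political": {},
        "cultural": {},
    }
}
flagged = find_unbalanced(example, min_children=2, max_children=3)
```

A system that generates headings automatically could run a check like this after each restructuring pass and ask the LLM to split or merge whatever gets flagged.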

Other kinds of semantic organization/search

I originally thought you might want to do something at a deeper level of the ML stack, perhaps something adjacent to the embedding-based semantic search commonly used in RAG (retrieval-augmented generation). But honestly, having LLM agents manage and navigate a traditional tag-based setup seems like a pretty good place to start. At least, I don’t see anything obviously wrong with that.

For reference, below are my notes from Miro, where I was attempting to map out the landscape of topics relating to homelessness.

Challenges to mapping

No single cause

Homelessness has no single cause. It is best understood as a confluence of overlapping factors. Therefore, a visual diagram is likely to have many overlapping edges.

No single way to model; all ways somewhat wrong

One way to understand the factors that lead to homelessness is by thinking about a combination of 1) underlying conditions and 2) precipitating events (triggers). So you could break down the root-level nodes into those two categories. However, you could also break down the root-level categories in terms of 1) economic factors, 2) political factors, and 3) cultural factors. In fact, you would expect that even researchers and authors on homelessness — among the most educated on the subject — would have disagreements about how best to break down the situation. Indeed, within the umbrella of psychology, there are entire fields dedicated to different ways of modelling the key factors that influence human thought and behaviour.

Additional lenses

You might also be interested in exploring the moral narratives around homelessness, the historical or geographical contexts, or the seasonal or demographic patterns.

Ideal view depends on the use case

You might want to interact with a map of homelessness for different reasons, and the reason influences what the ideal map would look like for you.

To develop an understanding of the shape of the issue:
1. Priority on readability of breakdown

To examine different points of leverage by which the issue might be ameliorated:
1. Requires asking questions “what if X were different?” and modelling outcomes
2. More important to model all relationships with weights; propagated scores
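To make “weights” and “propagated scores” a bit more concrete, here is one possible reading, sketched under strong assumptions: factors form a directed acyclic graph, each edge has a numeric weight, and a factor's score is its own base value plus the weighted sum of the scores feeding into it. The linear model, the `propagate` function, and the X/Y/Z factors are all illustrative placeholders, not a claim about how homelessness actually works. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+), which raises `CycleError` if the graph has a cycle.

```python
from graphlib import TopologicalSorter


def propagate(
    base: dict[str, float],
    edges: dict[tuple[str, str], float],
) -> dict[str, float]:
    """Compute scores where each node adds weight * score of its sources.

    `edges` maps (source, target) to a weight. Nodes missing from `base`
    start at 0.0.
    """
    incoming: dict[str, list[tuple[str, float]]] = {node: [] for node in base}
    for (src, dst), weight in edges.items():
        incoming.setdefault(src, [])
        incoming.setdefault(dst, []).append((src, weight))

    # Visit sources before the nodes that depend on them.
    order = TopologicalSorter(
        {node: [src for src, _ in inc] for node, inc in incoming.items()}
    ).static_order()

    scores: dict[str, float] = {}
    for node in order:
        scores[node] = base.get(node, 0.0) + sum(
            weight * scores[src] for src, weight in incoming[node]
        )
    return scores


# "What if X were different?": rerun with a changed base value and compare.
edges = {("X", "Y"): 0.5, ("Y", "Z"): 2.0}
baseline = propagate({"X": 1.0}, edges)
what_if = propagate({"X": 2.0}, edges)
change_in_z = what_if["Z"] - baseline["Z"]
```

Comparing `baseline` and `what_if` shows how far a change in one factor travels through the map, which is the kind of leverage question this use case cares about.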

The mind map & tags

Using a sort of “mind-map” view like this actually seems to improve readability a lot: the nodes are not claims but a set of branching categories or subtopics, perhaps clusters of claims. Maybe some kind of tag-based system could make this happen?