About In Case You Missed It (Law)
Algos and "AI": A look under the hood

This site exists because of a bot—@ICYMI_Law@esq.social—created to help members of the legal community discover content on Mastodon. In the early days when I was experimenting, the ICYMI (Law) family included the bot, this website, a podcast, and a newsletter. FWIW, I retired the podcast and newsletter (descriptions remain for posterity). This page explains in plain language how each of these work(ed) so readers/listeners can/could decide how to fit them into their media diet.

Jump to a description of the algorithms/workflows behind each of the following:

  1. The Bot
  2. The Website
  3. The Podcast (RETIRED)
  4. The Newsletter (RETIRED)

Before we talk automation, it's worth noting that ICYMI is at its heart a community aggregator. It helps surface content from a specific community, namely the folks followed by the bot. I've leaned heavily on the wisdom of crowds to help with this discovery, but who the bot "listens" to / follows and what rules are used to surface content are editorial choices. Other folks might make different choices. Actually, I hope several someones will do just that, launching their own curated feeds. I think there's a big future for such feeds esp. if there's transparency into how they work. "They're like a news editor choosing what to put in the paper. The trick is valuing their contribution while not mistaking them for the whole conversation."

1. The Bot

A conscious choice was made to pace the bot's boosts. The goal was to make them frequent enough that they could flesh out a sparse timeline, but not so frequent that you couldn't catch up on a day's activity in one sitting. Consequently, the bot aims to post somewhere between one and two hundred times a day. Below is how I've tried to make that happen while keeping the content relevant and interesting.

  1. Boosts. Every 30 minutes, the bot does the following:
    1. It reads its timeline for the past 24 hours.
    2. It reads in a list of all the posts (including boosts) it has made.
    3. It removes all posts from the timeline:
      • authored by anyone with the #nobot or #noindex tag in their bio;
      • originally posted more than 24 hours ago (i.e., boosts of content authored more than 24 hours ago);
      • made by folks it doesn't follow (i.e., boosts made by folks it follows of folks it doesn't); OR
      • it has already boosted;
    4. It looks at how many reblogs and favorites each of the remaining posts has received and calculates a score based on their geometric mean. Note: these are only a subset of the true counts, as the bot primarily knows what its home server (esq.social) knows. Consequently, esq.social's interactions with a post hold special sway. This is why I decided to base a legal content aggregator on a legal-focused server: if we assume its users will more frequently interact with the target content, it ups the chances that the counts will be current. The bot also knows the counts as they appear on mastodon.social for folks followed by @colarusso, since it shares an infrastructure with @colarusso_algo. So, four communities strongly influence what the bot sees: (1) the folks it follows; (2) the folks who interact with their posts; (3) the members of esq.social, who can give more insight into the actions of 2; and (4) the folks followed by @colarusso, mediated by @colarusso_algo, who can also give more insight into the actions of 2.
    5. It divides the score above by a number that increases with the author's follower count. That is, as the author's follower count goes up, their score goes down. As of this writing, this denominator is a sigmoid with values between 0.5 and 1, maxing out at a few thousand followers. However, I'm always fiddling with this.
    6. It sorts the timeline by this new score, from highest to lowest.
    7. It finds all the posts in the timeline that look like they came from Twitter (i.e., they include a link to twitter.com). If one of the last n boosts it has made looked like it was from Twitter, it removes all of the suspected Twitter posts from the timeline. Otherwise, it gets rid of all of the Twitter posts but the one with the highest score. As of this writing, n is around 20, but I'm always playing with this value.
    8. It removes from the timeline posts from any author it has boosted in the last n boosts, where n again is a number subject to change but on the order of 10s.
    9. It makes sure it hasn't posted more than 200 times already. If it has, it stops. You may be wondering why this or some of the following tests don't come earlier, and the answer has to do with the fact that while examining and constructing the timeline the bot is collecting info it will use elsewhere regardless of whether or not it reblogs anything.
    10. It makes sure it's between 6 AM and 11:30 PM US/Eastern; if it isn't, it stops.
    11. It looks to see how many posts were made in the original timeline over the last 24 hours and the last 20 minutes. It uses these two numbers to estimate the frequency of posts it would need to make to hit a target of roughly 150 posts a day. The assumptions are conservative, so the estimate tends to err on the low side.
    12. It removes from the timeline all posts with a score below some multiple of the median score for available posts. Note: this can result in there not being enough posts to hit the target. Also, the multiple is always being fiddled with. See next.
    13. Based on the frequency calculated above, it figures out how many boosts it should make over the next 30 minutes. It chooses that number of posts from the top of the timeline, if available, and tries to boost them out over the next 30 minutes. If there's an error, it tries to boost a post from lower in the timeline.
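For the curious, the scoring in steps 4 and 5 can be sketched in a few lines of Python. This is a toy illustration, not the bot's actual code: the +1 smoothing and the sigmoid's midpoint and steepness are stand-in assumptions on my part.

```python
import math

def reach_penalty(followers, midpoint=2000, steepness=500):
    """A sigmoid between 0.5 and 1: near 0.5 for small accounts,
    approaching 1 at a few thousand followers (illustrative values)."""
    return 0.5 + 0.5 / (1 + math.exp(-(followers - midpoint) / steepness))

def score(reblogs, favourites, followers):
    """Geometric mean of engagement, divided by the reach penalty, so the
    same engagement counts for more on a smaller account. The +1 smoothing
    is my addition, to avoid zeroing out posts missing one kind of count."""
    engagement = math.sqrt((reblogs + 1) * (favourites + 1))
    return engagement / reach_penalty(followers)

# Rank a timeline of (reblogs, favourites, author_follower_count) tuples.
timeline = [(12, 30, 250_000), (12, 30, 150)]
ranked = sorted(timeline, key=lambda p: score(*p), reverse=True)
```

With identical engagement, the account with 150 followers outranks the one with 250,000, which is the whole point of the penalty.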

  2. Scheduled Posts.
    1. Most-Shared. Every day Monday through Saturday at around 9:05 AM (US/Eastern), the bot looks back at its timeline for the past 24 hours and collects all the links folks have shared. It filters out links to twitter.com, ssrn.com, and a few other sites. It counts how many times these links were shared and posts links to the five (5) most-shared after attempting to get the name of the page. On Sundays the bot goes through the same process except that instead of looking back over the last 24 hours, it looks back over the last seven (7) days. Note: if the bot couldn't grab the title, I may edit the post to include some of the title. Also, given space constraints, full titles aren't included in these posts.
    2. Hashtags. Every day between 12:30 AM and 1 AM (US/Eastern), the bot looks back on the last 24 hours of its timeline, collecting and counting the hashtags it saw. It produces a post listing each hashtag along with a count of how many times it was used.
    3. Traffic. Every day between 12:30 AM and 1 AM (US/Eastern), the bot looks back on the last 24 hours of its timeline and counts how many original posts it has seen. It produces a graph showing the number of posts per hour and posts this.
    4. That's All Folks. Every day after posting its traffic post (see above), the bot closes with a good night post saying how many posts it saw and made over the day.
    5. Follow Fridays. Every Friday after 10 AM (US/Eastern), the bot recommends the eight (8) most-reblogged accounts using the #followfriday hashtag. It excludes accounts it has recommended in recent weeks.
    6. The Weekend! Every Friday soon after 5 PM (US/Eastern), the bot posts an animated GIF of Daniel Craig introducing "The Weekend."
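The daily Most-Shared post boils down to filtered counting. Here's a minimal sketch, with an assumed (incomplete) exclusion list and without the title-fetching step:

```python
from collections import Counter
from urllib.parse import urlparse

# Illustrative; the real list includes "a few other sites."
EXCLUDED = {"twitter.com", "ssrn.com"}

def most_shared(links, n=5):
    """Count links harvested from the timeline, drop excluded domains,
    and return the n most-shared as (url, count) pairs."""
    counts = Counter(
        link for link in links
        if urlparse(link).netloc.removeprefix("www.") not in EXCLUDED
    )
    return counts.most_common(n)
```

On Sundays the same function would simply be fed seven days' worth of links instead of one.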

  3. Triggered Posts.
    1. SSRN Bundles. Every time the bot runs, it keeps an eye out for links to SSRN. Once it collects five, it shares links to them in a post, assuming it is between 10 AM and 5 PM (US/Eastern). If it isn't, it waits until 10 AM the following day.
    2. LOL SCOTUS. Every time the bot runs, it checks to see if @LOLSCOTUS has made a post. If it has made more than three posts in the last 30 minutes, the bot makes a post noting that its sibling @LOLSCOTUS seems to be active and suggests that folks check out its account to see what's going on.
    3. Replies. At this time all replies are manual. So if you see the bot replying to a post, it was done by me (@Colarusso).
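The SSRN bundle trigger above is essentially a size check plus a time-window check. A rough sketch (the function names and exact window handling are my own stand-ins):

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

BUNDLE_SIZE = 5
WINDOW = (time(10, 0), time(17, 0))  # 10 AM to 5 PM US/Eastern

def ready_to_post(pending_links, now=None):
    """True once five SSRN links have accumulated AND we're inside the
    posting window; otherwise the bundle waits for the next window."""
    now = now or datetime.now(ZoneInfo("America/New_York"))  # US/Eastern
    start, end = WINDOW
    return len(pending_links) >= BUNDLE_SIZE and start <= now.time() <= end
```

Each run, the bot would append any new SSRN links to `pending_links` and call something like `ready_to_post` before firing off the bundle.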

  4. Intermittent Posts.
    1. Links to Digest. If you see a snappy post summarizing the content of the digest and providing a link to check it out, that was written by a Large Language Model (LLM) and cleared by me. I have a little script that points at the digest and produces the summary post with link. But as I discuss more below, I'm not comfortable having the LLM posting on its own.
    2. Manual Posts. If you see a post that doesn't fit nicely into one of the above categories, it is safe to assume that it was made by a person.

2. The Website

This website provides a daily digest of the bot's timeline, including a list of top posts, most-shared links, SSRN papers shared, a list of hashtags along with their usage, and an hourly breakdown of the original posts it has seen. The current digest lives on the main page, with each digest archived at a URL of the following form: https://icymilaw.org/archive/YYYY-MM-DD.html

The list of Top Posts follows the same scoring as the bot (described above). It does not, however, care whether the bot has boosted something before. Consequently, it is a list of the top fifteen posts from the prior 24 hours as of when it ran, which is noted at the bottom of the page.

The AI Summaries / Podcast Transcript section includes a collection of AI-generated "summaries." They are a byproduct of the podcast's production. A description of how they are generated can be found in The Podcast below.

The Most-Shared Links section displays the ten (10) "most-shared" links from the bot's timeline. It builds on the bot's Most-Shared Scheduled Posts to collect and display the links for the prior day. Its list may differ from the bot's for three reasons: (1) it runs the night before the page goes live to accommodate a human-in-the-loop prior to publication; (2) it ignores articles that have already shown up in recent daily digests; and (3) I occasionally remove items if they are hard to read—e.g., paywalls, site formatting—or if the lack of context provided in a post makes their presence hard to understand. As for the human-in-the-loop, this is necessitated by the fact that I review the Podcast before hitting publish. See The Podcast.

The SSRN Roundup section displays the most-recent bundle of SSRN papers shared on the bot's timeline, just like the SSRN Scheduled Posts. Depending on how much folks are sharing, there could be more or fewer than one bundle per day; this is just the most recent one as of when the page was built.

The Hashtag section presents a list of up to 50 hashtags, representing the most-used hashtags found in the bot's timeline for the last 24 hours. They are presented along with the number of times they were seen and a link to the hashtag on esq.social.
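Generating that hashtag listing is a small counting-and-formatting job. A sketch, assuming Mastodon's standard /tags/<tag> web route on the instance:

```python
from collections import Counter

def hashtag_listing(tags, limit=50, instance="esq.social"):
    """Normalize hashtags from the last 24 hours, count them, and render
    up to `limit` lines with usage counts and links on the home instance."""
    counts = Counter(tag.lower().lstrip("#") for tag in tags)
    return [
        f"#{tag} ({n}) https://{instance}/tags/{tag}"
        for tag, n in counts.most_common(limit)
    ]
```

Lowercasing folds case variants like #Law and #law into a single entry before counting.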

The Traffic section presents a graph of yesterday's posting counts, taken from the Traffic flavored Scheduled Post.


3. The Podcast (RETIRED)

Note: the podcast has been retired. This explanation remains for posterity. ;)

The podcast (available on digest pages) is a daily summary of two articles from the Website's Most-Shared Links section and one paper from the SSRN Roundup. It is a combination of scripted and AI-generated text. The intro and outro are scripted, everything between them is the output of a Large Language Model (LLM). First, let's be clear. LLMs are bullshitters in that they produce plausible sounding assertions with no regard for accuracy, filtered through a set of problematic societal biases. This is why, as mentioned above, I review the podcast before hitting publish. But I could miss something. The podcast is as much an exploration of LLMs as it is a summary of daily activity, and I want to see what I can learn from the daily process of editing an LLM in a low-stakes environment like a small one-person podcast and website. For more on the limits and drawbacks of algos as well as my practice of learning/teaching by building, see Portland’s Precrime Experiment and the Limits of Algorithms.

I pick which of the top articles to summarize based upon my tastes and the available text (e.g., if an article is behind a paywall or on a site that populates the page via javascript it's "too hard" to get the text so I skip them). I also occasionally edit text for readability, and if I don't like a summary, I'll either re-run the LLM or pick a different article. I don't check everything the LLM says, but I try to keep my BS detector on high sensitivity. See e.g., the time it didn't summarize the article and just made up a click-bait hellscape.

You can find the podcast on Apple Podcasts, via its RSS feed, or by searching for it in your podcatcher of choice.


4. The Newsletter (RETIRED)

Note: the newsletter has been retired. This explanation remains for posterity. ;)

The Newsletter is just an email repackaging of the podcast transcript with a slightly different lead in and outro. See The Podcast.

You can sign up for the newsletter here.
