You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For anything related to this archive, contact the same person.
This site looks like it has great content, but the web design is kind of old school. https://golem.ph.utexas.edu/category/ Does anyone want to build a new front end for it? I also didn't see any way to subscribe by email, only RSS. I'd be interested in trying to recreate the site as is, just slightly more modern.
This site was set up by Jacques Distler and it was really cool when it first appeared, because there weren't many blogs that did LaTeX - and there still aren't many that do it so nicely. But I'm very worried about what will happen when he dies, because it's on a server he runs, and I don't know if anyone has access to the back end, and knows how the software works, except him!
Since there's a lot of really good material on the n-Category Cafe, I think there should be a systematic attempt by the n-Category Cafe hosts (like @Mike Shulman and @David Corfield and @Emily Riehl and me) to ensure that the n-Category Cafe has a survival plan.
+1 for preservation, -1 for 'renovation' (unless it's done intelligently). Old-school, reliable HTML websites are great.
John Baez said:
Since there's a lot of really good material on the n-Category Cafe, I think there should be a systematic attempt by the n-Category Cafe hosts (like Mike Shulman and David Corfield and Emily Riehl and me) to ensure that the n-Category Cafe has a survival plan.
If y'all decide what you'd like to do, and there is any way that those of us who work in tech can help ensure its longevity, please do reach out. It's a wonderful resource and benefits us all.
FWIW It sounds like the critical goals are:
(1) archiving/backing-up the content in a human readable way accessible to the community
(2) displaying it online using something like MathJax
with two stretch goals:
(3) documenting what was done so it can be done easily by others
(4) moving to an nLab-like strategy where serving is distributed; no single points of failure.
... is that the direction you're thinking of?
What about a Jupyter Book? https://jupyterbook.org/en/stable/start/your-first-book.html One can publish it online; it is very flexible, and you can also include blocks of code, images, and so on (as in a standard notebook).
I'm fine with whatever format the contributors would like, within reason.
Is there code on the nCafe? I love notebooks but generally think of Jupyter as being for code you actually want to run intermixed with graphs and LaTeX or text. Otherwise it’s a lot of overhead.
I think markdown with a KaTeX plugin (like Zulip uses) is probably the most minimal we could go and still have nice, easy display options. There are lots of libraries that can handle that.
The interlinking of comments etc. is what I'm more worried about translating, if we had to move to something with better forward support. The context of who is replying to whom about what is often critical for understanding.
the source of each post, including comments, is really nicely structured html, so it would be "trivial" to port over everything to pretty much any static site generator if the need ever arises in the future. if somebody with access sets up a backup (with moderators' permissions, it would be nice to maybe automatically mirror these to a github repo)
Eric M Downes said:
John Baez said:
Since there's a lot of really good material on the n-Category Cafe, I think there should be a systematic attempt by the n-Category Cafe hosts (like Mike Shulman and David Corfield and Emily Riehl and me) to ensure that the n-Category Cafe has a survival plan.
If y'all decide what you'd like to do, and there is any way that those of us who work in tech can help ensure its longevity, please do reach out. It's a wonderful resource and benefits us all.
Thanks! So far we've been putting off doing anything, because it's a bit of a touchy subject ("Hey, we'd like to make sure your blog keeps working when you die") and also none of the people blogging on the n-Category Cafe has any special interest in, or knowledge of, the software challenges needed to solve the problem.
So for now you could help just by talking about this stuff... as you are now:
FWIW It sounds like the critical goals are:
(1) archiving/backing-up the content in a human readable way accessible to the community
(2) displaying it online using something like MathJax
with two stretch goals:
(3) documenting what was done so it can be done easily by others
(4) moving to an nLab-like strategy where serving is distributed; no single points of failure.
... is that the direction you're thinking of?
I think what we'd really like is to give the blog a longer life, moving the old content to a new platform that will be just as good as the existing one but not rely on the expertise of one particular person to survive, while also allowing new blog articles. This goes beyond merely "archiving" it as in (1) and (2).
However, archiving the old stuff would still be of value, if for some reason the ideal is too hard.
Tim Hosgood said:
the source of each post, including comments, is really nicely structured html, so it would be "trivial" to port over everything to pretty much any static site generator if the need ever arises in the future. if somebody with access sets up a backup (with moderators' permissions, it would be nice to maybe automatically mirror these to a github repo)
I'm poking around behind the scenes of the n-Category Cafe trying to download all the html, but I don't see how. Maybe I could get it from Jacques Distler.
oh you shouldn't need to do anything like that: wget can do this (probably
wget -r --no-parent https://golem.ph.utexas.edu/category/
)
edit: yeah, this seems to work just fine
Okay, excellent! If this is a "trivial" way to back up the n-Cafe - i.e., not too much work for people who actually know what they're doing - I guess it's worth doing. It doesn't solve the harder problem of moving the blog to a better long-term location, and it doesn't preserve the source data used to generate the html. But it's something.
maybe whoever backs up this zulip (which i think somebody does?) could also keep a copy of the n-Cafe? it took ~15 minutes to download the whole site (though i would say don't all start doing this and overload the server)
happy to chat about moving to a better long-term location, but i'm sure that lots of other people would also be keen to volunteer :)
I’d like to volunteer once the admins reach a consensus about what they would like for the site.
If we can put the source code on GitHub that is a tiny thing with a lot of benefit.
@Tim Hosgood - how big is the backup file for the n-Category Cafe?
having a proper look, i just learnt that most of the images in posts aren't actually hosted on the web server, but instead are just links to other images on the web. i guess a "good" backup should also scrape copies of these, but that does make things a bit more fiddly
this is definitely the first thing that i would address if i were trying to preserve the ncafe: making sure that the images aren't just links to other websites, but instead self contained
John Baez said:
Tim Hosgood - how big is the backup file for the n-Category Cafe?
(so basically i can't give a real answer for this right now because my copy includes essentially none of the images)
e.g. somewhere there is a link to some file at http://tolman.physics.brown.edu/ , but this is now a dead link
Tim Hosgood said:
having a proper look, i just learnt that most of the images in posts aren't actually hosted on the web server, but instead are just links to other images on the web. i guess a "good" backup should also scrape copies of these, but that does make things a bit more fiddly
Thankfully there are wget options for this! If you haven't already solved it, this answer collects the options I was thinking of using and some I didn't know about. :)
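Roughly the kind of thing I had in mind, just as a sketch (I haven't run this against the actual site, and the second domain in the list is a made-up placeholder):

wget -r --no-parent --page-requisites --convert-links --span-hosts \
     --domains=golem.ph.utexas.edu,example-image-host.org \
     --wait=1 https://golem.ph.utexas.edu/category/

--page-requisites pulls in the images and stylesheets each page needs, --convert-links makes the mirror browsable offline, --span-hosts allows fetching from the external image hosts, --domains keeps the recursion from wandering across the whole web, and --wait=1 is just to be polite to the server.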
John Baez said:
Thanks! So far we've been putting off doing anything, because it's a bit of a touchy subject ("Hey, we'd like to make sure your blog keeps working when you die")
...
I think what we'd really like is to give the blog a longer life, moving the old content to a new platform that will be just as good as the existing one but not rely on the expertise of one particular person to survive, while also allowing new blog articles.
100%.
I hope he is well of course, but if you know otherwise, that could be a welcome and very soulful conversation for him. A kind of baton-passing and an opportunity for him to state any unrealized aspirations he had for the n-Cafe. It's meaningful to everyone to see things they valued carry on.
So we can put together technical options for now, and once there is something like consensus, perhaps it will be easier to propose something to Jacques.
As the contributors are happy with the existing interface, probably the most conservative thing to do would be to price out and experiment with Movable Type running inside Docker on an EC2 / gcloud instance not hosted at U-Texas (Movable Type is what the n-Cafe currently uses). This doesn't look to be too hard: https://movabletype.org/start/.
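Very roughly, the shape of what I mean, purely as an illustrative sketch: the MySQL image is a real public one, but the Movable Type image is a placeholder that would have to be built locally from the MT sources following the install docs (I don't know of an official prebuilt image), and the passwords are obviously placeholders too.

docker network create mtnet
# database container with a named volume so the data survives container restarts
docker run -d --name mt-db --network mtnet \
    -e MYSQL_DATABASE=mt -e MYSQL_USER=mt -e MYSQL_PASSWORD=change-me \
    -e MYSQL_ROOT_PASSWORD=change-me-too \
    -v mt-db-data:/var/lib/mysql mysql:8.0
# "movabletype-local" is a hypothetical image built from the MT sources
docker run -d --name mt-app --network mtnet -p 8080:80 movabletype-local

The appeal is that the whole thing is a couple of commands plus one image build, i.e. the kind of recipe that can live in a README and be reproduced by whoever comes after us.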
So how about this as a survival plan:
- set up a backup strategy more robust, and with better alerting, than "cronjob wget on Tim's desktop" :) (a sketch follows below)
- run an experiment restoring a live blog from said backup
- document what needed to be done and how to check that everything is working, ideally dummy-proofing everything into a Docker script
- periodically check that everything still works and is being backed up.
At that point, there would seem to be nothing further to do and we can all move on to getting distracted by something else. :)
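For the first bullet, something along these lines is what I'm picturing, purely as a sketch: the directory and the git remote are placeholders, and it assumes a repo has already been initialized there. Run it from cron on some shared machine and each snapshot gets pushed off-site.

#!/bin/sh
# illustrative only: path and remote are placeholders, not an actual setup
set -e
cd /srv/ncafe-backup
wget --mirror --no-parent --page-requisites --convert-links \
     https://golem.ph.utexas.edu/category/
git add -A
git commit -m "n-Cafe snapshot $(date -u +%Y-%m-%d)" || true   # no-op when nothing changed
git push origin main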
BTW it seems the Wayback Machine is mostly doing that for us already: https://web.archive.org/web/20230501000000*/https://golem.ph.utexas.edu/category/
Probably archive.org would be fine in a pinch; I'm certainly glad it's there, but it often doesn't save images and certainly doesn't address (2)-(4) above.
Specifically it doesn’t help us solve this:
John Baez said:
what we'd really like is to give the blog a longer life, moving the old content to a new platform that will be just as good as the existing one but not rely on the expertise of one particular person to survive, while also allowing new blog articles.
Eric M Downes said:
Tim Hosgood said:
having a proper look, i just learnt that most of the images in posts aren't actually hosted on the web server, but instead are just links to other images on the web. i guess a "good" backup should also scrape copies of these, but that does make things a bit more fiddly
Thankfully there are wget options for this! If you haven't already solved it, this answer collects the options I was thinking of using and some I didn't know about. :)
yeah, I tried this, but because I don't have a complete list of the domains on which all the images are found (because they're very scattered), I can't use this flag, which means it tries to get everything that's linked — I ran the command overnight and woke up to over 2500 folders from different domains :upside_down:
99.999% of them are just robots.txt files, but i'm not good enough at wget magic to figure out how to only download files of a certain format (i.e. html + images)
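For what it's worth, one untested workaround is a two-pass approach: mirror only the blog host first, then pull the externally hosted images from a URL list extracted from the saved HTML, so the recursion never leaves golem.ph.utexas.edu. The grep one-liner below is rough and will miss images referenced from CSS or with single-quoted attributes.

# pass 1: mirror only the blog host
wget -r --no-parent --page-requisites --convert-links https://golem.ph.utexas.edu/category/
# pass 2: collect external <img src="http..."> URLs from the saved pages and fetch them flat
grep -rhoE '<img[^>]+src="http[^"]+"' golem.ph.utexas.edu | \
    grep -oE 'http[^"]+' | sort -u > image-urls.txt
wget --input-file=image-urls.txt --directory-prefix=external-images --wait=1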
Tim Hosgood said:
oh you shouldn't need to do anything like that: wget can do this (probably
wget -r --no-parent https://golem.ph.utexas.edu/category/
)
edit: yeah, this seems to work just fine
A little wget magic worked for Mark Zuckerberg also.
Eric M Downes said:
As the contributors are happy with the existing interface, probably the most conservative thing to do, would be to price out and experiment with movable type running inside docker on an EC2 / gcloud instance not hosted at U-Texas (movable type is what n-Cafe currently uses). This doesn't look to be too hard: https://movabletype.org/start/.
That sounds great. Not hard for you, perhaps, and probably not hard for Jacques Distler, who might help. But essentially impossible for the other n-Cafe hosts!
So how about this as a survival plan:
- set up a backup strategy more robust, and with better alerting, than "cronjob wget on Tim's desktop" :)
- run an experiment restoring a live blog from said backup
- document what needed to be done and how to check that everything is working, ideally dummy-proofing everything into a Docker script
- periodically check that everything still works and is being backed up.
So the idea is that a new version of the blog would only get activated when the old one died, or was about to die - and all the old articles would get copied to the new one?
It might actually be easier to transfer to a new version 'now', while the old one is still running and Jacques is still peppy
(I'm using 'now' in a loose sense: as opposed to postponing until some crisis occurs.)
John Baez said:
That sounds great. Not hard for you, perhaps, and probably not hard for Jacques Distler, who might help. But essentially impossible for the other n-Cafe hosts!
Ok!
It will still be frustrating, to be clear; I reserve the right to curse at my screen, for instance… Nothing with computers is ever as easy as it should be. :)
I'll look into setting up a dockerized Movable Type instance and whether that does indeed look simple, affordable, and reproducible, and I'll report back here. I might not actually get to this until late August due to family stuff; I assume there is no pressing deadline. If others want to get started experimenting, by all means, please do.
So once I make a “do-ability” report y’all can make the call about how Jacques should be contacted, and fill in the rest of the strategy.
Thanks, @Eric M Downes. This sounds great. Indeed, there's no pressing deadline - this is one of those things like getting a lawyer to write a document authorizing your spouse to make emergency medical decisions, which is very easy to put off until all of a sudden it's too late.
I think they are using a LAMP stack. The output of this command:
curl -I https://golem.ph.utexas.edu/category/
includes this header:
server: Apache/2.4.62 (Unix) OpenSSL/3.0.14 PHP/8.3.9 mod_fcgid/2.3.9 Phusion_Passenger/6.0.22
The documentation says that there are "Amazon Machine Images" preinstalled with Movable Type and its dependencies, for convenient deployment: https://movabletype.org/documentation/installation/aws/
According to https://similarweb.com, the site had an average of 75,988 visitors per month, from April 2024 until now.
Screenshot-2024-07-18-at-11.45.25AM.png
81% of these visits were from a mobile device, which I find interesting, because the page not fitting properly on my phone's screen was one of the things that made me wonder whether the site UI could be updated.
Screenshot-2024-07-18-at-11.45.12AM.png
This provides a cost estimator, which I assume would depend on traffic: https://aws.amazon.com/marketplace/pp/prodview-rgdxnjtyky4r4?sr=0-1&ref_=beagle&applicationId=AWSMPContessa
One estimate I read for a site with this level of traffic was $50 to $100 per month.
I think this provides some reference on migrating a website: https://en.wikipedia.org/wiki/Content_migration
I think we would need to ask Jacques Distler for a database dump.
I think it will be more straightforward to ask him for the application source code than trying to scrape the website.
My two cents :+1:
Must it be Amazon web services, though?
I'd prefer to avoid Amazon for political reasons.
It's nice to hear how much traffic we're getting - I had no idea!
Distler is good with software and I should simply ask him what he thinks about all this.
Nice.
No, AWS isn't necessary at all - I prefer Google Cloud, personally.
Movable Type has some recommendations on hosting services, based on their own testing and criteria like ease and cost:
https://www.moveabletype.org/best-movable-type-hosting/
They do provide much more documentation for AWS than for gcloud, as Julian linked to. But perhaps one of the options above is even better suited.
Just leaving this here in case we do go with gcloud, and can find an MT docker for prod (I could only easily find dev and test):
https://medium.com/@taylorhughes/how-to-deploy-an-existing-docker-container-project-to-google-cloud-run-with-the-minimum-amount-of-daca0b5978d8
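Very roughly, the Cloud Run flow would look something like the following; this is a sketch only (project, region, and service names are placeholders, and I haven't checked it against that article), and since Cloud Run is stateless the MySQL side would still need to live somewhere persistent like Cloud SQL:

# build the container image and push it to Artifact Registry (names are placeholders)
gcloud builds submit --tag us-central1-docker.pkg.dev/MY_PROJECT/blog/movabletype
# deploy the pushed image as a public Cloud Run service
gcloud run deploy ncafe-test \
    --image us-central1-docker.pkg.dev/MY_PROJECT/blog/movabletype \
    --region us-central1 --allow-unauthenticated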
Also, final thought for today — maybe we’re entirely overthinking this and just paying Squarespace is the way to go!
https://5help.squarespace.com/hc/en-us/articles/205632287-Importing-Movable-Type-or-TypePad-blog-posts
Should be included in the spreadsheet of options.
Whatever we do, let's continue to avoid Microsoft products :facepalm:!!
(I agree for migration it would be silly to scrape the site. Whatever we go with, I just would like to have some kind of failover/backup strategy that is independent of our hosting service.)
I don't see why Microsoft should be held responsible for the Crowdstrike outage.
Some version of the Crowdstrike security software is available for Windows, macOS, UNIX, and Linux.
On Windows, the Crowdstrike code must run at a very low level, in the sense that it breaks a lot of abstractions that other programs respect. (It is also designed to do a lot more, perhaps so big companies can hire cheaper sysadmins…)
macOS, UNIX, and (much of) Linux have different security models. For example, on Macs, the Crowdstrike updates are loaded first in user space, so if something goes wrong the problem is much more likely to be recoverable, because the lowest level of the OS kernel is not (yet) affected.
Why doesn't Windows do the same? My understanding is that it's not possible: at a low level there isn't enough designed separation between core functions managed by root/admin and those of the user; those ideas were bolted on afterwards. (The code base of Windows is also literally orders of magnitude larger in lines of code, so it is harder to manage, update, and understand; there's a reason something like 85% of internet servers run some variety of Unix.)
So, when you have third party code affecting your kernel, and auto-updating daily, and running at the lowest level it possibly can… you get exactly this kind of issue. :(
Now Red Hat Enterprise Linux and Ubuntu have been moving in the direction of Windows, and Crowdstrike did cause kernel panics on Red Hat Enterprise Linux a few years back. So even if you start from a better-designed security model, it's still possible to screw things up! I'm not making some kind of "inherent superiority" argument.
Crowdstrike is the proximate cause of this meltdown. Poor security models and bloatware are IMO the fundamental causes. You could say for historical reasons Windows is much more strongly affected by these problems than *nix systems.
Specifically… if your “security software” is auto-updating to kernel ring 0… you have already lost.
But in a sense I do agree with you. The "endpoint protection" Crowdstrike sells is just the same idea as Norton Antivirus etc. (third-party rootkits) on steroids, repackaged as a solution to sell to people with big wallets and no interest in or understanding of security, but a need to CYA and check boxes. Crowdstrike is crap! OK, I'll stop my ranting now. :)
Is anyone else having technical issues with the n-Category Café again? I've tried to add a comment to the post Introduction to Categorical Probability, and there are two things that go wrong:
Upon clicking on "Preview", my browser (Vivaldi, a fork of Chromium) replaces the text in the comment box by this:
image.png
Nevertheless, the preview displays correctly.
In order to be able to post, I need to paste the same text into the comment box again, since otherwise the system complains that I have "edited" my comment because of the previous issue. After pasting the text again, the webpage of the post with the discussion below appears as it usually does, but my comment is still missing. I've just tried this twice and it doesn't appear.
I just posted a comment using Firefox on a Windows machine and experienced no such problems. In fact I got so caught up in writing my comment that I completely forgot I was supposed to be helping you diagnose your problem!
Glad to know that it works for you! I'll keep trying a bit, and if the problem persists will try to get in touch with Jacques.
Okay, I've figured it out: the recently installed browser extension Proton Pass was trying to hijack the comment text box on the n-café.
I have had the problem Tobias is having for ages now. I copy the text of the comment before hitting preview, and then once the preview is shown, paste in the text again and hit submit, and it works ok 99% of the time. I'm using Firefox, with uBlock origin installed and a few other extensions. I'll have to experiment to see which if any of these are causing trouble.
How odd that it works 99% of the time, but the comment fails to appear the other 1%. I have uBlock Origin as well, and it has not caused me any trouble before.
The only problem would be if the comment processing modifies the raw text input, so that the re-pasted text in the box somehow mismatches the "cleaned" input (at least, this is my guess as to what is happening).