You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For anything related to this archive, contact the same person.
This site looks like it has great content, but the web design is kind of old school. https://golem.ph.utexas.edu/category/ Does anyone want to build a new front end for it? I also didn't see any way to subscribe by email, only RSS. I'd be interested in trying to recreate the site as is, just slightly more modern.
This site was set up by Jacques Distler and it was really cool when it first appeared, because there weren't many blogs that did LaTeX - and there still aren't many that do it so nicely. But I'm very worried about what will happen when he dies, because it's on a server he runs, and I don't know if anyone has access to the back end, and knows how the software works, except him!
Since there's a lot of really good material on the n-Category Cafe, I think there should be a systematic attempt by the n-Category Cafe hosts (like @Mike Shulman and @David Corfield and @Emily Riehl and me) to ensure that the n-Category Cafe has a survival plan.
+1 for preservation, -1 for 'renovation' (unless it's done intelligently). Old-school, reliable HTML websites are great.
John Baez said:
Since there's a lot of really good material on the n-Category Cafe, I think there should be a systematic attempt by the n-Category Cafe hosts (like Mike Shulman and David Corfield and Emily Riehl and me) to ensure that the n-Category Cafe has a survival plan.
If y'all decide what you'd like to do, and there is any way that those of us who work in tech can help ensure its longevity, please do reach out. It's a wonderful resource and benefits us all.
FWIW It sounds like the critical goals are:
(1) archiving/backing-up the content in a human readable way accessible to the community
(2) displaying it online using something like MathJax
with two stretch goals:
(3) documenting what was done so it can be done easily by others
(4) moving to an nLab-like strategy where serving is distributed; no single points of failure.
... is that the direction you're thinking of?
What about a Jupyter Book? https://jupyterbook.org/en/stable/start/your-first-book.html One can publish it online; it is very flexible, and you can also include blocks of code, images, and so on (as in a standard notebook).
I'm fine with whatever format the contributors would like, within reason.
Is there code on the nCafe? I love notebooks but generally think of Jupyter as being for code you actually want to run intermixed with graphs and LaTeX or text. Otherwise it’s a lot of overhead.
I think markdown with a KaTeX plugin (like Zulip uses) is probably the most minimal we could go and still have nice, easy display options. There are lots of libraries that can handle that.
The interlinking of comments etc. is what I'm more worried about translating, if we had to move to something with better forward support. The context of who is replying to whom about what is often critical for understanding.
the source of each post, including comments, is really nicely structured html, so it would be "trivial" to port over everything to pretty much any static site generator if the need ever arises in the future. if somebody with access sets up a backup (with moderators' permissions, it would be nice to maybe automatically mirror these to a github repo)
Eric M Downes said:
John Baez said:
Since there's a lot of really good material on the n-Category Cafe, I think there should be a systematic attempt by the n-Category Cafe hosts (like Mike Shulman and David Corfield and Emily Riehl and me) to ensure that the n-Category Cafe has a survival plan.
If y'all decide what you'd like to do, and there is any way that those of us who work in tech can help ensure its longevity, please do reach out. It's a wonderful resource and benefits us all.
Thanks! So far we've been putting off doing anything, because it's a bit of a touchy subject ("Hey, we'd like to make sure your blog keeps working when you die") and also none of the people blogging on the n-Category Cafe has any special interest in, or knowledge of, the software challenges needed to solve the problem.
So for now you could help just by talking about this stuff... as you are now:
FWIW It sounds like the critical goals are:
(1) archiving/backing-up the content in a human readable way accessible to the community
(2) displaying it online using something like MathJax
with two stretch goals:
(3) documenting what was done so it can be done easily by others
(4) moving to an nLab-like strategy where serving is distributed; no single points of failure.
... is that the direction you're thinking of?
I think what we'd really like is to give the blog a longer life, moving the old content to a new platform that will be just as good as the existing one but not rely on the expertise of one particular person to survive, while also allowing new blog articles. This goes beyond merely "archiving" it as in (1) and (2).
However, archiving the old stuff would still be of value, if for some reason the ideal is too hard.
Tim Hosgood said:
the source of each post, including comments, is really nicely structured html, so it would be "trivial" to port over everything to pretty much any static site generator if the need ever arises in the future. if somebody with access sets up a backup (with moderators' permissions, it would be nice to maybe automatically mirror these to a github repo)
I'm poking around behind the scenes of the n-Category Cafe trying to download all the html, but I don't see how. Maybe I could get it from Jacques Distler.
oh you shouldn't need to do anything like that: wget can do this (probably
wget -r --no-parent https://golem.ph.utexas.edu/category/
)
edit: yeah, this seems to work just fine
Okay, excellent! If this is a "trivial" way to back up the n-Cafe - i.e., not too much work for people who actually know what they're doing - I guess it's worth doing. It doesn't solve the harder problem of moving the blog to a better long-term location, and it doesn't preserve the source data used to generate the html. But it's something.
maybe whoever backs up this zulip (which i think somebody does?) could also keep a copy of the n-Cafe? it took ~15 minutes to download the whole site (though i would say don't all start doing this and overload the server)
happy to chat about moving to a better long-term location, but i'm sure that lots of other people would also be keen to volunteer :)
I’d like to volunteer once the admins reach a consensus about what they would like for the site.
If we can put the source code on GitHub that is a tiny thing with a lot of benefit.
@Tim Hosgood - how big is the backup file for the n-Category Cafe?
having a proper look, i just learnt that most of the images in posts aren't actually hosted on the web server, but instead are just links to other images on the web. i guess a "good" backup should also scrape copies of these, but that does make things a bit more fiddly
this is definitely the first thing that i would address if i were trying to preserve the ncafe: making sure that the images aren't just links to other websites, but instead self contained
John Baez said:
Tim Hosgood - how big is the backup file for the n-Category Cafe?
(so basically i can't give a real answer for this right now because my copy includes essentially none of the images)
e.g. somewhere there is a link to some file at http://tolman.physics.brown.edu/ , but this is now a dead link
Tim Hosgood said:
having a proper look, i just learnt that most of the images in posts aren't actually hosted on the web server, but instead are just links to other images on the web. i guess a "good" backup should also scrape copies of these, but that does make things a bit more fiddly
Thankfully there are wget options for this! If you haven't already solved it, this answer collects the options I was thinking of using and some I didn't know about. :)
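Roughly the kind of thing I had in mind, just as a sketch (I haven't run this against the actual site, and the second domain in the list is a made-up placeholder):

wget -r --no-parent --page-requisites --convert-links --span-hosts \
     --domains=golem.ph.utexas.edu,example-image-host.org \
     --wait=1 https://golem.ph.utexas.edu/category/

--page-requisites pulls in the images and stylesheets each page needs, --convert-links makes the mirror browsable offline, --span-hosts allows fetching from the external image hosts, --domains keeps the recursion from wandering across the whole web, and --wait=1 is just to be polite to the server.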
John Baez said:
Thanks! So far we've been putting off doing anything, because it's a bit of a touchy subject ("Hey, we'd like to make sure your blog keeps working when you die")
...
I think what we'd really like is to give the blog a longer life, moving the old content to a new platform that will be just as good as the existing one but not rely on the expertise of one particular person to survive, while also allowing new blog articles.
100%.
I hope he is well of course, but if you know otherwise, that could be a welcome and very soulful conversation for him. A kind of baton-passing and an opportunity for him to state any unrealized aspirations he had for the n-Cafe. It's meaningful to everyone to see things they valued carry on.
So we can put together technical options for now, and once there is something like consensus, perhaps it will be easier to propose something to Jacques.
As the contributors are happy with the existing interface, probably the most conservative thing to do would be to price out and experiment with Movable Type running inside Docker on an EC2 / gcloud instance not hosted at U-Texas (Movable Type is what the n-Cafe currently uses). This doesn't look to be too hard: https://movabletype.org/start/.
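Very roughly, the shape of what I mean, purely as an illustrative sketch: the MySQL image is a real public one, but the Movable Type image is a placeholder that would have to be built locally from the MT sources following the install docs (I don't know of an official prebuilt image), and the passwords are obviously placeholders too.

docker network create mtnet
# database container with a named volume so the data survives container restarts
docker run -d --name mt-db --network mtnet \
    -e MYSQL_DATABASE=mt -e MYSQL_USER=mt -e MYSQL_PASSWORD=change-me \
    -e MYSQL_ROOT_PASSWORD=change-me-too \
    -v mt-db-data:/var/lib/mysql mysql:8.0
# "movabletype-local" is a hypothetical image built from the MT sources
docker run -d --name mt-app --network mtnet -p 8080:80 movabletype-local

The appeal is that the whole thing is a couple of commands plus one image build, i.e. the kind of recipe that can live in a README and be reproduced by whoever comes after us.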
So how about this as a survival plan:
- set up a backup strategy more robust, and with better alerting, than "cronjob wget on Tim's desktop" :) (a sketch follows below)
- run an experiment restoring a live blog from said backup
- document what needed to be done and how to check that everything is working, ideally dummy-proofing everything into a Docker script
- periodically check that everything still works and is being backed up.
At that point, there would seem to be nothing further to do and we can all move on to getting distracted by something else. :)
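For the first bullet, something along these lines is what I'm picturing, purely as a sketch: the directory and the git remote are placeholders, and it assumes a repo has already been initialized there. Run it from cron on some shared machine and each snapshot gets pushed off-site.

#!/bin/sh
# illustrative only: path and remote are placeholders, not an actual setup
set -e
cd /srv/ncafe-backup
wget --mirror --no-parent --page-requisites --convert-links \
     https://golem.ph.utexas.edu/category/
git add -A
git commit -m "n-Cafe snapshot $(date -u +%Y-%m-%d)" || true   # no-op when nothing changed
git push origin main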
BTW it seems the Wayback Machine is mostly doing that for us already: https://web.archive.org/web/20230501000000*/https://golem.ph.utexas.edu/category/
Probably archive.org would be fine in a pinch; I'm certainly glad it's there, but it often doesn't save images and certainly doesn't address (2)-(4) above.
Specifically it doesn’t help us solve this:
John Baez said:
what we'd really like is to give the blog a longer life, moving the old content to a new platform that will be just as good as the existing one but not rely on the expertise of one particular person to survive, while also allowing new blog articles.
Eric M Downes said:
Tim Hosgood said:
having a proper look, i just learnt that most of the images in posts aren't actually hosted on the web server, but instead are just links to other images on the web. i guess a "good" backup should also scrape copies of these, but that does make things a bit more fiddly
Thankfully there are wget options for this! If you haven't already solved it, this answer collects the options I was thinking of using and some I didn't know about. :)
yeah, I tried this, but because I don't have a complete list of the domains on which all the images are found (because they're very scattered), I can't use this flag, which means it tries to get everything that's linked — I ran the command overnight and woke up to over 2500 folders from different domains :upside_down:
99.999% of them are just robots.txt files, but i'm not good enough at wget magic to figure out how to only download files of a certain format (i.e. html + images)
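For what it's worth, one untested workaround is a two-pass approach: mirror only the blog host first, then pull the externally hosted images from a URL list extracted from the saved HTML, so the recursion never leaves golem.ph.utexas.edu. The grep one-liner below is rough and will miss images referenced from CSS or with single-quoted attributes.

# pass 1: mirror only the blog host
wget -r --no-parent --page-requisites --convert-links https://golem.ph.utexas.edu/category/
# pass 2: collect external <img src="http..."> URLs from the saved pages and fetch them flat
grep -rhoE '<img[^>]+src="http[^"]+"' golem.ph.utexas.edu | \
    grep -oE 'http[^"]+' | sort -u > image-urls.txt
wget --input-file=image-urls.txt --directory-prefix=external-images --wait=1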
Tim Hosgood said:
oh you shouldn't need to do anything like that: wget can do this (probably
wget -r --no-parent https://golem.ph.utexas.edu/category/
)
edit: yeah, this seems to work just fine
A little wget magic worked for Mark Zuckerberg also.
Eric M Downes said:
As the contributors are happy with the existing interface, probably the most conservative thing to do, would be to price out and experiment with movable type running inside docker on an EC2 / gcloud instance not hosted at U-Texas (movable type is what n-Cafe currently uses). This doesn't look to be too hard: https://movabletype.org/start/.
That sounds great. Not hard for you, perhaps, and probably not hard for Jacques Distler, who might help. But essentially impossible for the other n-Cafe hosts!
So how about this as a survival plan:
- set up a backup strategy more robust, and with better alerting, than "cronjob wget on Tim's desktop" :)
- run an experiment restoring a live blog from said backup
- document what needed to be done and how to check that everything is working, ideally dummy-proofing everything into a Docker script
- periodically check that everything still works and is being backed up.
So the idea is that a new version of the blog would only get activated when the old one died, or was about to die - and all the old articles would get copied to the new one?
It might actually be easier to transfer to a new version 'now', while the old one is still running and Jacques is still peppy
(I'm using 'now' in a loose sense: as opposed to postponing until some crisis occurs.)
John Baez said:
That sounds great. Not hard for you, perhaps, and probably not hard for Jacques Distler, who might help. But essentially impossible for the other n-Cafe hosts!
Ok!
It will still be frustrating, to be clear; I reserve the right to curse at my screen, for instance… Nothing with computers is ever as easy as it should be. :)
I'll look into setting up a dockerized Movable Type instance and whether that does indeed look simple, affordable, and reproducible, and I'll report back here. I might not actually get to this until late August due to family stuff; I assume there is no pressing deadline. If others want to get started experimenting, by all means, please do.
So once I make a “do-ability” report y’all can make the call about how Jacques should be contacted, and fill in the rest of the strategy.
Thanks, @Eric M Downes. This sounds great. Indeed, there's no pressing deadline - this is one of those things like getting a lawyer to write a document authorizing your spouse to make emergency medical decisions, which is very easy to put off until all of a sudden it's too late.
I think they are using a LAMP stack. The output of this command:
curl -I https://golem.ph.utexas.edu/category/
includes this header:
server: Apache/2.4.62 (Unix) OpenSSL/3.0.14 PHP/8.3.9 mod_fcgid/2.3.9 Phusion_Passenger/6.0.22
The documentation says that there are "Amazon Machine Images" preinstalled with Movable Type and its dependencies, for convenient deployment: https://movabletype.org/documentation/installation/aws/
According to https://similarweb.com, the site had an average of 75,988 visitors per month, from April 2024 until now.
Screenshot-2024-07-18-at-11.45.25AM.png
81% of these visits were from a mobile device, which I find interesting, because the page not fitting properly on my phone's screen was one of the things that made me wonder whether the site UI could be updated.
Screenshot-2024-07-18-at-11.45.12AM.png
This provides a cost estimator, which I assume would depend on traffic: https://aws.amazon.com/marketplace/pp/prodview-rgdxnjtyky4r4?sr=0-1&ref_=beagle&applicationId=AWSMPContessa
One estimate I read for a site with this level of traffic was $50 to $100 per month.
I think this provides some reference on migrating a website: https://en.wikipedia.org/wiki/Content_migration
I think we would need to ask Jacques Distler for a database dump.
I think it will be more straightforward to ask him for the application source code than trying to scrape the website.
My two cents :+1:
Must it be Amazon web services, though?
I'd prefer to avoid Amazon for political reasons.
It's nice to hear how much traffic we're getting - I had no idea!
Distler is good with software and I should simply ask him what he thinks about all this.
Nice.
No, AWS isn't necessary at all - I prefer Google Cloud, personally.
Movable Type has some recommendations on hosting services, based on their own testing and criteria like ease and cost:
https://www.moveabletype.org/best-movable-type-hosting/
They do provide much more documentation for AWS than for gcloud, as Julian linked to. But perhaps one of the options above is even better suited.
Just leaving this here in case we do go with gcloud, and can find an MT docker for prod (I could only easily find dev and test):
https://medium.com/@taylorhughes/how-to-deploy-an-existing-docker-container-project-to-google-cloud-run-with-the-minimum-amount-of-daca0b5978d8
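Very roughly, the Cloud Run flow would look something like the following; this is a sketch only (project, region, and service names are placeholders, and I haven't checked it against that article), and since Cloud Run is stateless the MySQL side would still need to live somewhere persistent like Cloud SQL:

# build the container image and push it to Artifact Registry (names are placeholders)
gcloud builds submit --tag us-central1-docker.pkg.dev/MY_PROJECT/blog/movabletype
# deploy the pushed image as a public Cloud Run service
gcloud run deploy ncafe-test \
    --image us-central1-docker.pkg.dev/MY_PROJECT/blog/movabletype \
    --region us-central1 --allow-unauthenticated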
Also, final thought for today — maybe we’re entirely overthinking this and just paying Squarespace is the way to go!
https://5help.squarespace.com/hc/en-us/articles/205632287-Importing-Movable-Type-or-TypePad-blog-posts
Should be included in the spreadsheet of options.
Whatever we do, let's continue to avoid Microsoft products :facepalm:!!
(I agree for migration it would be silly to scrape the site. Whatever we go with, I just would like to have some kind of failover/backup strategy that is independent of our hosting service.)
I don't see why Microsoft should be held responsible for the Crowdstrike outage.
Some version of the Crowdstrike security software is available for Windows, macOS, UNIX, and Linux.
On Windows, the Crowdstrike code must run at a very low level, in the sense that it breaks a lot of abstractions that other programs respect. (It is also designed to do a lot more, perhaps so big companies can hire cheaper sysadmins…)
macOS, UNIX, and (much of) Linux have different security models. For example, on Macs, the Crowdstrike updates are loaded first in user space, so if something goes wrong the problem is much more likely to be recoverable, because the lowest level of the OS kernel is not (yet) affected.
Why doesn't Windows do the same? My understanding is that it's not possible: at a low level there isn't enough designed separation between core functions managed by root/admin and those of the user; those ideas were bolted on afterwards. (The code base of Windows is also literally orders of magnitude larger in lines of code, so it is harder to manage, update, and understand; there's a reason something like 85% of internet servers run some variety of Unix.)
So, when you have third party code affecting your kernel, and auto-updating daily, and running at the lowest level it possibly can… you get exactly this kind of issue. :(
Now Red Hat Enterprise Linux and Ubuntu have been moving in the direction of Windows, and Crowdstrike did cause kernel panics on Red Hat Enterprise Linux a few years back. So even if you start from a better-designed security model, it's still possible to screw things up! I'm not making some kind of "inherent superiority" argument.
Crowdstrike is the proximate cause of this meltdown. Poor security models and bloatware are IMO the fundamental causes. You could say for historical reasons Windows is much more strongly affected by these problems than *nix systems.
Specifically… if your “security software” is auto-updating to kernel ring 0… you have already lost.
But in a sense I do agree with you. The "endpoint protection" Crowdstrike sells is just the same idea as Norton Antivirus etc. (third-party rootkits) on steroids, repackaged as a solution to sell to people with big wallets and no interest in or understanding of security, but a need to CYA and check boxes. Crowdstrike is crap! OK, I'll stop my ranting now. :)
Is anyone else having technical issues with the n-Category Café again? I've tried to add a comment to the post Introduction to Categorical Probability, and there are two things that go wrong:
Upon clicking on "Preview", my browser (Vivaldi, a fork of Chromium) replaces the text in the comment box by this:
image.png
Nevertheless, the preview displays correctly.
In order to be able to post, I need to paste the same text into the comment box again, since otherwise the system complains that I have "edited" my comment because of the previous issue. After pasting the text again, the webpage of the post with the discussion below appears as it usually does, but my comment is still missing. I've just tried this twice and it doesn't appear.
I just posted a comment using Firefox on a Windows machine and experienced no such problems. In fact I got so caught up in writing my comment that I completely forgot I was supposed to be helping you diagnose your problem!
Glad to know that it works for you! I'll keep trying a bit, and if the problem persists will try to get in touch with Jacques.
Okay, I've figured it out: the recently installed browser extension Proton Pass was trying to hijack the comment text box on the n-café.
I have had the problem Tobias is having for ages now. I copy the text of the comment before hitting preview, and then once the preview is shown, paste in the text again and hit submit, and it works ok 99% of the time. I'm using Firefox, with uBlock origin installed and a few other extensions. I'll have to experiment to see which if any of these are causing trouble.
How odd that it works 99% of the time, but the comment fails to appear the other 1%. I have uBlock Origin as well, and it has not caused me any trouble before.
The only problem would be if the comment processing modifies the raw text input, so that the re-pasted text in the box somehow mismatches the "cleaned" input (at least, this is my guess as to what is happening).