You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.

Stream: community: general

Topic: Disappearing archives

David Michael Roberts (May 27 2020 at 14:59):

@Valeria de Paiva I mean the archive of the categories mailing list. Already the post-2009 mails are nearly lost: some of them are preserved in the Internet Archive's sporadic scrapes of the Gmane servers, but with no consistent url choice. It's not clear if all of them survived. The early ones are neatly curated, and everything from 1990–2009 is stored in big text files complete with email headers, one file per month.

David Michael Roberts (May 27 2020 at 15:02):

There was concern at MathOverflow meta over old links to the Gmane domain, since that broke (and now seems only accessible using nntp protocol, and via a different TLD address). I easily found four published books on the first page of Google search with Gmane urls in references, pointing to specific emails on the list, including one published this year.

David Michael Roberts (May 27 2020 at 15:04):

There is a lot of good historical and mathematical information from pioneers in the field on there, including observations not recorded anywhere else. So I hope someone with the skills can hook up to the Gmane servers and extract the last 10 or so years worth of posts (though of late, this list is functionally dead), if they are there, and then host the whole history somewhere accessible and searchable.

Valeria de Paiva (May 27 2020 at 18:08):

David Michael Roberts said:

Valeria de Paiva I mean the archive of the categories mailing list. Already the post-2009 mails are nearly lost: some of them are preserved in the Internet Archive's sporadic scrapes of the Gmane servers, but with no consistent url choice. It's not clear if all of them survived. The early ones are neatly curated, and everything from 1990–2009 is stored in big text files complete with email headers, one file per month.

Thanks for the explanation. I thought the backups were ok before and never really looked. I was furious when Hypatia went offline some 20 years ago. It isn't even on Wikipedia, a shame.

I wish I knew someone with the ability to extract those emails from the Gmane servers; no one comes to mind at the moment. I totally agree that the work there should be preserved.

John Baez (May 27 2020 at 18:47):

Ugh, this is a tragedy that could easily have been prevented.

Valeria de Paiva (May 27 2020 at 19:20):

John Baez said:

Ugh, this is a tragedy that could easily have been prevented.

Agreed!!!
I could only find a picture of Hypatia's opening page, attached here. I hope we can find someone to help with the categories mailing list from 2009, it's totally absurd to lose this stuff!
Hypatia-openingpage.tiff

David Michael Roberts (May 28 2020 at 02:21):

Those with know-how should grab the files from here: https://www.mta.ca/~cat-dist/#archives and store them somewhere public as a backup.
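
A minimal sketch of the first step of such a backup: collecting the archive file links from the page so they can be downloaded and re-hosted. It is not the method anyone in this thread actually used. The sample HTML, the file names, and the suffix list are assumptions, since the real page layout is not quoted here; only the base URL comes from the message above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def archive_links(page_html, base_url, suffixes=(".txt", ".gz", ".zip")):
    """Return absolute URLs of links whose path ends in one of `suffixes`."""
    collector = LinkCollector()
    collector.feed(page_html)
    links = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)
        path = urlsplit(absolute).path.lower()
        if path.endswith(suffixes) and absolute not in links:
            links.append(absolute)
    return links


# Hypothetical page fragment: the real file names and layout are assumptions.
SAMPLE = """
<a href="#archives">Archives</a>
<a href="archive/1990-01.txt">January 1990</a>
<a href="archive/1990-02.txt">February 1990</a>
<a href="mailto:someone@example.org">contact</a>
"""

if __name__ == "__main__":
    for url in archive_links(SAMPLE, "https://www.mta.ca/~cat-dist/"):
        print(url)
```

Each collected URL could then be fetched with `urllib.request.urlretrieve` or an equivalent tool and stored alongside a checksum.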

David Michael Roberts (May 28 2020 at 02:31):

Hypatia's long term archival will protect your work for future generations, (https://web.archive.org/web/19990222165443/http://hypatia.dcs.qmw.ac.uk/html/faq-mirror.html)

Oh the irony...

David Michael Roberts (May 28 2020 at 02:36):

Sadly, because of the conservative robots.txt file, all the actual material from Hypatia has evaporated, leaving the Internet Archive with just front-facing material that doesn't link anywhere: https://web.archive.org/web/19990208004719/http://hypatia.dcs.qmw.ac.uk/
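
To illustrate how a robots.txt rule can leave a crawler with only the front page, here is a small sketch using Python's standard `urllib.robotparser`. The rule, the `/papers/` path, and the crawler name are hypothetical: the file Hypatia actually served is not quoted in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, not the real Hypatia file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /papers/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    base = "http://hypatia.dcs.qmw.ac.uk"
    # "example-crawler" is a made-up user agent name.
    print(allowed(ROBOTS_TXT, "example-crawler", base + "/"))
    print(allowed(ROBOTS_TXT, "example-crawler", base + "/papers/example.ps"))
```

Under these assumed rules a well-behaved crawler may fetch the front page but nothing below `/papers/`, which would match the pattern described above.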

Simon Burton (May 28 2020 at 13:35):

David Michael Roberts said:

Those with know-how should grab the files from here: https://www.mta.ca/~cat-dist/#archives and store them somewhere public as a backup.

I am working on it... will report back if I succeed...

Simon Burton (May 28 2020 at 13:41):

On a related note, someone (me?) should make an index for the TWF's... At one point I ended up downloading all of them so I could search through them for a specific thing I was looking for.

Daniel Geisler (May 28 2020 at 15:18):

Let me know if you need technical assistance or a place to store the documents. Big thanks for your service @Simon Burton

Gershom (May 28 2020 at 15:26):

I have reached out to the owner of gmane and asked if I can get a copy of the archives it hosts as well. If there's no response, a newsreader can always be configured to just suck everything down, I suppose...

David Michael Roberts (May 28 2020 at 15:40):

Simon Burton said:

On a related note, someone (me?) should make an index for the TWF's... At one point I ended up downloading all of them so I could search through them for a specific thing I was looking for.

I thought there was one already. But I can't remember where. It was some (?random) mathematical internet citizen.

Simon Burton (May 28 2020 at 17:37):

David Michael Roberts said:

Those with know-how should grab the files from here: https://www.mta.ca/~cat-dist/#archives and store them somewhere public as a backup.

I got this to work: https://arrowtheory.com/mirror/www.mta.ca/cat-dist/

Simon Burton (May 28 2020 at 17:38):

There's also a tarball if anyone else wants to host it: http://arrowtheory.com/mirror.tbz2

John Baez (May 28 2020 at 18:00):

Simon Burton said:

On a related note, someone (me?) should make an index for the TWF's... At one point I ended up downloading all of them so I could search through them for a specific thing I was looking for.

It would be great having an index of This Week's Finds. I'm going to start writing more polished articles on some of the recurrent themes, but that's a bit different. I also want to put This Week's Finds into a PDF file on the arXiv. Jason Erbele was starting to LaTeX them, but it's a big job and he soon gave up. I'll probably do something less difficult.

I guess you know there's a table of contents of the first 239 issues. I got tired at that point and quit.

Daniel Geisler (May 28 2020 at 19:04):

I recommend that we host a CT torrent for publicly available publications. While I work at providing long term storage of text from wisdom traditions, I have the mind set and technology to assist any interested parties.

John Baez (May 28 2020 at 19:57):

It might be easy to get the files at the top of this page:

https://www.mta.ca/~cat-dist/#archives

It says

There is an archive of postings at nntp://news.gmane.org/gmane.science.mathematics.categories maintained by Gmane. You may need a news-reader client to access it.

I'm too busy to install a news-reader and download these, but if someone does, they could put these files in a location that's 1) stable, 2) easier to access.

Fabrizio Genovese (May 28 2020 at 20:57):

Daniel Geisler said:

I recommend that we host a CT torrent for publicly available publications. While I work at providing long term storage of text from wisdom traditions, I have the mind set and technology to assist any interested parties.

Well, actually there are "public" repositories that have pretty much any book/paper in maths you can think of. I won't name them explicitly here because I guess it's illegal, but suffice it to say that there is this library named after the first book of the Bible... :slight_smile:

John Baez (May 28 2020 at 21:08):

Maybe someone should assemble the old category theory mailing list postings and put them on libgen.

(I don't think there's a law against saying libgen.)

Fabrizio Genovese (May 28 2020 at 21:20):

John Baez said:

Maybe someone should assemble the old category theory mailing list postings and put them on libgen.

(I don't think there's a law against saying libgen.)

I think this is the best option for making them universally available!

Simon Burton (May 28 2020 at 22:49):

From what I can see, gmane is not working. I'm not sure if this is a temporary situation, or what...

John Baez (May 28 2020 at 22:56):

Oh, so you tried accessing it with a newsreader?

Cole Comfort (May 28 2020 at 23:01):

John Baez said:

Maybe someone should assemble the old category theory mailing list postings and put them on libgen.

(I don't think there's a law against saying libgen.)

If you say libgen in the mirror in the dark 3 times, Elsevier will come knocking on your door.

John Baez (May 28 2020 at 23:02):

:ogre:

Valeria de Paiva (May 28 2020 at 23:08):

Fabrizio Genovese said:

Daniel Geisler said:

I recommend that we host a CT torrent for publicly available publications. While I work at providing long term storage of text from wisdom traditions, I have the mind set and technology to assist any interested parties.

Well, actually there are "public" repositories that have pretty much any book/paper in maths you can think of. I won't name them explicitly here because I guess it's illegal, but suffice it to say that there is this library named after the first book of the Bible... :)

well, they don't have what was in the old Hypatia, as they didn't exist then, I'm afraid.

David Michael Roberts (May 29 2020 at 03:36):

@Simon Burton are you using the new domain? The one listed at the categories home page is outdated. See https://lars.ingebrigtsen.no/2020/01/15/news-gmane-org-is-now-news-gmane-io/

David Michael Roberts (May 29 2020 at 03:37):

It should be nntp://news.gmane.io, not gmane.org
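
A small sketch of the bookkeeping a bulk download would need: rewriting the outdated URL to the new host, splitting it into server and group name, and reading the article range from the server's reply to the NNTP `GROUP` command, which has the form `211 count low high group`. The helper names are illustrative and the sample reply is made up; nothing here contacts the server.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "news.gmane.org"  # host still listed on the categories home page
NEW_HOST = "news.gmane.io"   # host named in this thread


def update_gmane_url(url):
    """Swap the old Gmane host for the new one, leaving other URLs alone."""
    parts = urlsplit(url)
    if parts.hostname == OLD_HOST:
        # Replacing netloc also drops any explicit port, fine for this sketch.
        parts = parts._replace(netloc=NEW_HOST)
    return urlunsplit(parts)


def server_and_group(url):
    """Return (host, newsgroup) for an nntp:// URL, after the host update."""
    parts = urlsplit(update_gmane_url(url))
    return parts.hostname, parts.path.lstrip("/")


def parse_group_reply(line):
    """Parse a GROUP reply into (count, low, high, group)."""
    fields = line.split()
    if len(fields) < 5 or fields[0] != "211":
        raise ValueError(f"unexpected GROUP reply: {line!r}")
    return int(fields[1]), int(fields[2]), int(fields[3]), fields[4]


if __name__ == "__main__":
    old = "nntp://news.gmane.org/gmane.science.mathematics.categories"
    print(update_gmane_url(old))
    print(server_and_group(old))
    # Made-up reply; the real article numbers are not known here.
    print(parse_group_reply("211 100 1 100 gmane.science.mathematics.categories"))
```

With the low and high article numbers in hand, a client could request articles one at a time and append each to a local file.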

Fabrizio Genovese (May 29 2020 at 11:49):

Valeria de Paiva said:

Fabrizio Genovese said:

Daniel Geisler said:

I recommend that we host a CT torrent for publicly available publications. While I work at providing long term storage of text from wisdom traditions, I have the mind set and technology to assist any interested parties.

Well, actually there are "public" repositories that have pretty much any book/paper in maths you can think of. I won't name them explicitly here because I guess it's illegal, but suffice it to say that there is this library named after the first book of the Bible... :slight_smile:

well, they don't have what was in the old Hypatia, as they didn't exist then, I'm afraid.

...Which means that we should upload this stuff there! :D

dusko (May 31 2020 at 05:26):

Fabrizio Genovese said:

Valeria de Paiva said:

Fabrizio Genovese said:

Daniel Geisler said:

I recommend that we host a CT torrent for publicly available publications. While I work at providing long term storage of text from wisdom traditions, I have the mind set and technology to assist any interested parties.

Well, actually there are "public" repositories that have pretty much any book/paper in maths you can think of. I won't name them explicitly here because I guess it's illegal, but suffice it to say that there is this library named after the first book of the Bible... :)

well, they don't have what was in the old Hypatia, as they didn't exist then, I'm afraid.

...Which means that we should upload this stuff there! :D

this thread is interesting at many levels:

** how to preserve network content. categories mailing list is an easy early web community product: you just need to find it and save it. but some people maybe talk about things worth remembering here on zulip. there are in the meantime many versions of "proprietary email" (as whit diffie calls them). some platforms for social, scientific, dating interactions are provided for free, and they "maintain free services" by collecting data for advertising, campaigning, credit rating. (can a theorem that you mentioned on facebook be used to predict your political affiliation?)

** note the name of hypatia. libraries are sometimes murdered for a purpose. we know 5 tragedies by aeschylus, and i think about 80 titles, but there were allegedly 200 of them in the catalog. one more godless than the other. destroying the web as the tower of babel will undoubtedly become an attractive proposition. eg to disrupt a global conspiracy of category theorists. and it might be not as hard as it used to be. the original internet was designed to survive the fragmentation in case of a nuclear war. but a network resilient to fragmentation is suboptimal for monetizing...

** memory is based on forgetting (cf funes the memorious). if you record all that happens, it all becomes noise. how should the web select what to remember? ants amplify shorter paths using pheromones. dropbox can drop unused links...

QUESTIONS:

1) what might be a good architecture for a community to store old archives. should everyone donate a bit of memory, and someone writes a simple private cloud module? also multiparty?

2) how should a network version of long term memory be managed?

thoughts?

David Michael Roberts (May 31 2020 at 06:07):

how to preserve network content.

See for instance the evaporation of Google+. For example: I'm glad Lieven le Bruyn saved his posts working through the complete details of a Frobenioid and re-hosted them on his blog, but this is merely one example among many excellent mathematical discussions that are now either gone, or saved in a zip file on someone's personal computer, if they were diligent in saving their timeline before the end.

Daniel Geisler (May 31 2020 at 10:14):

I recommend we focus on mathematics and not reinvent technology. I'm looking at creating a CT torrent. 'Nuff said?

Pastel Raschke (May 31 2020 at 10:42):

archive.org, arxiv, libgen, ipfs, dat, bittorrent, upspin and perkeep (not sure how good these are at distributed access)

torrents have a huge problem in that they are fixed file trees, and making a torrent for every paper would be heavy on metadata and likelihood of availability, usually stopgapped by periodically forming aggregated chunks. ipfs/ipns and dat both have granular addressing and versioning.

programmers have been working on the necessary technology for a while. the other problem, which comes down to resources, is robust hosting, both the content itself and whatever indexes are useful to organize search and access.

Fabrizio Genovese (May 31 2020 at 11:37):

IPFS is probably the most resilient way to store content right now, but I don't know how practical it is for a paper/math repo

Grant B (May 31 2020 at 16:44):

What was Jason Erbele's progress on the LaTeX document? I would be willing to contribute to this effort as I had recently discovered TWF's.

Grant B (May 31 2020 at 16:46):

have*

John Baez (May 31 2020 at 18:22):

@Grant B - umm, he did the first 5 or so. You can contact him at

erbele@math.ucr.edu

When he did this I was swamped with other work and not able to give him much help.

dusko (May 31 2020 at 18:46):

Fabrizio Genovese said:

IPFS is probably the most resilient way to store content right now, but I don't know how practical it is for a paper/math repo

great, yes, the functions of IPFS are definitely needed. but as far as i can tell, IPFS seems distributed and anonymous, but is it really resilient? could i not seed it with malware from within that would, say, flood and overload all nodes? can it be resilient without any form of reputation or authentication? but yes, a persistent memory will have to be some sort of ledger. is that an interesting question to pursue? seems like a question that naturally leads into categorical crypto :) as a security proof would need to be at 3 levels at least

dusko (May 31 2020 at 19:05):

Pastel Raschke said:

archive.org, arxiv, libgen, ipfs, dat, bittorrent, upspin and perkeep (not sure how good these are at distributed access)

torrents have a huge problem in that they are fixed file trees, and making a torrent for every paper would be heavy on metadata and likelihood of availability, usually stopgapped by periodically forming aggregated chunks. ipfs/ipns and dat both have granular addressing and versioning.

programmers have been working on the necessary technology for a while. the other problem, which comes down to resources, is robust hosting, both the content itself and whatever indexes are useful to organize search and access.

very good! so i ask myself: is the problem not already solved by archive.org, arxiv and libgen? i use all 3 every day. why do i then ask this question? well, in theory at least, each of them could be sold to elsevier. they are not distributed. but it also gives rise to a more serious question. is there a solution that will not take into account the incentives? who has an incentive to maintain public goods. ((BTW, for that reason the projects that are concerned with privacy seem to be going in a different direction.))

Daniel Geisler (May 31 2020 at 20:11):

@dusko said:

who has an incentive to maintain public goods.

That is the social service project I've taken upon myself, although from the comments on this thread I see I need to up my game. I help several different groups of yogis preserve their work. I live in Eugene, Oregon in the North West US. This is one of the richest bioregions in the world, and as a result it supports more endangered native languages than anywhere in the world. Lots of potential good technical service projects.

John Baez (Jun 01 2020 at 00:08):

By the way, I hope everyone here knows this github site:

Matt Earnshaw, The collected works of F. W. Lawvere.

It's very good but it's not complete - it lists more works than it actually has. So, anyone who has access to these works should contribute them!

Fabrizio Genovese (Jun 01 2020 at 00:27):

dusko said:

Fabrizio Genovese said:

IPFS is probably the most resilient way to store content right now, but I don't know how practical it is for a paper/math repo

great, yes, the functions of IPFS are definitely needed. but as far as i can tell, IPFS seems distributed and anonymous, but is it really resilient? could i not seed it with malware from within that would, say, flood and overload all nodes? can it be resilient without any form of reputation or authentication? but yes, a persistent memory will have to be some sort of ledger. is that an interesting question to pursue? seems like a question that naturally leads into categorical crypto :slight_smile: as a security proof would need to be at 3 levels at least

For sure you can host malware-laced files on IPFS, exactly as you can do on bittorrent or with the PDFs you host on github. For me "resilient", in this particular case, means "decentralized". Which means that as long as at least one person seeds the files, it cannot be shut down. The issue with hosting community, "public" files on platforms such as arXiv and github is that they are owned by someone, be it a university or a private company. And this someone can decide to shut everything down and leave everyone hanging.

Fabrizio Genovese (Jun 01 2020 at 00:29):

You may say "well, arXiv won't shut down all of a sudden destroying all of its logs forever" and I agree, but still, I prefer to use something that can mathematically guarantee me that this does not happen instead of having to trust some institution. So for the real "public" stuff (as in "owned by the community") p2p file networks such as IPFS are the only reasonable and "ethical" way to go.
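
As a rough illustration of the idea behind content addressing, here is a toy sketch in which a file's identifier is the SHA-256 hash of its bytes, so a copy obtained from any peer can be checked against the identifier. This is only the general principle: it is not IPFS's actual identifier format or protocol, and `ContentStore` and `fetch_from_any` are made-up names.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Identifier derived from the content itself (hex SHA-256 digest)."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy stand-in for one peer holding some content-addressed blobs."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        self._blobs[cid] = data
        return cid

    def get(self, cid: str) -> bytes:
        data = self._blobs[cid]  # KeyError if this peer lacks the blob
        if content_id(data) != cid:
            raise ValueError("stored bytes do not match their identifier")
        return data


def fetch_from_any(peers, cid):
    """Return the first copy that verifies against `cid`."""
    for peer in peers:
        try:
            return peer.get(cid)
        except (KeyError, ValueError):
            continue  # peer lacks the blob or holds a corrupted copy
    raise LookupError(f"no peer holds a valid copy of {cid}")


if __name__ == "__main__":
    honest, corrupted = ContentStore(), ContentStore()
    cid = honest.put(b"categories list, January 1990")
    corrupted._blobs[cid] = b"something else"  # simulate a bad copy
    print(fetch_from_any([corrupted, honest], cid))
```

In this toy model the hash guarantees that a retrieved copy is the right one; it does not by itself guarantee that some peer keeps holding a copy, which is the "at least one person seeds the files" condition.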

dusko (Jun 01 2020 at 08:02):

Fabrizio Genovese said:

You may say "well, arXiv won't shut down all of a sudden destroying all of its logs forever" and I agree, but still, I prefer to use something that can mathematically guarantee me that this does not happen instead of having to trust some institution. So for the real "public" stuff (as in "owned by the community") p2p file networks such as IPFS are the only reasonable and "ethical" way to go.

i just said above that arxiv and archive are not a solution because they can be sold to elsevier. and i also appreciate the information that you can always store PDFs on github. but the IPFS design assumption, that there is no need for trust because there is no trusted 3rd party, is very naive. resilient could only mean decentralized if attacks had to be centralized. are you in contact with the IPFS designers?

Fabrizio Genovese (Jun 01 2020 at 08:49):

Well, what I mean is that, for instance, GitHub does not really test for malware in the repos you host. So from the point of view of using a repo as a possible attack vector, I do not see IPFS as less secure than other solutions.
Can you give me an example of an attack using the decentralized nature of IPFS to succeed? I'm not sure I'm really understanding what you mean by "attack" here

Fabrizio Genovese (Jun 01 2020 at 08:50):

Yes, we vaguely know them, but there are other people, such as @davidad (David Dalrymple), who know them way better

Grant B (Jun 01 2020 at 11:22):

John Baez said:

Grant B - umm, he did the first 5 or so. You can contact him at

erbele@math.ucr.edu

When he did this I was swamped with other work and not able to give him much help.

Thank you, I reached out to him. I will keep this thread posted on any progress I make.

Jason Erbele (Jun 05 2020 at 13:21):

Grant B said:

John Baez said:

Grant B - umm, he did the first 5 or so. You can contact him at

erbele@math.ucr.edu

When he did this I was swamped with other work and not able to give him much help.

Thank you, I reached out to him. I will keep this thread posted on any progress I make.

The immediate progress is that Grant's reaching out to me finally spurred me to figuratively get off my butt and get on zulip. I'd say the first three TWFs are done, and two more are "basically" done, only needing ASCII graphics to be redrawn in TikZ. So "the first 5 or so" was actually remarkably accurate. I did a bit of Spring cleaning, so the Overleaf project I started for TWF is a little bit more organized than it was a week ago, but it'll probably be later this month when I'll have time to make another appreciable dent.

Simon Burton (Jun 05 2020 at 13:46):

@Jason Erbele Would you like to put your work on github ? It seems like this project could be worked on collaboratively...

Jason Erbele (Jun 05 2020 at 14:04):

@Simon Burton
I don't have any objections, per se, but I don't see what the advantage of github would be. And a major obstacle would be that I don't know how to use github. The project is on Overleaf, which is a collaborative platform with version control already, plus it has the LaTeX compiler built in. I can supply a link that anyone can use to join and edit.
But first, I need to get some sleep – it's already 7am here. :grimacing:

John Baez (Jun 05 2020 at 16:45):

Hi, Jason! It's great to see you here and I hope you and your family are doing okay.

If you remind me of the Overleaf link I can put the first few remastered TWFs on my website!

Matteo Capucci (he/him) (Jun 05 2020 at 17:37):

(Overleaf can be used as a git repo, for the record)

Jason Erbele (Jun 05 2020 at 18:38):

Link to TWF on Overleaf, read-only: https://www.overleaf.com/read/sspmkpykyvhr
Editable: https://www.overleaf.com/5857655535xtmhnqbvvkjr

John Baez (Jun 05 2020 at 19:01):

Thanks!

Robert Furber (Jun 07 2020 at 13:56):

John Baez said:

I'm too busy to install a news-reader and download these, but if someone does, they could put these files in a location that's 1) stable, 2) easier to access.

I put an explanation of how to access it with Emacs here: https://meta.mathoverflow.net/a/4583

I think someone who knows how to program in Emacs Lisp could download a full copy, but I think it's illegal by the letter of the law (unless you asked each and every poster individually for their copyright). So it would have to "fall off a lorry/truck".

Robert Furber (Jun 07 2020 at 14:41):

I think storing these things legally in the US would involve one of these "safe harbour" provisions: https://en.wikipedia.org/w/index.php?title=Online_Copyright_Infringement_Liability_Limitation_Act&oldid=954682139#Safe_harbor_provision_for_online_storage_-_%C2%A7_512(c)

Mike Stay (Sep 09 2020 at 16:18):

In my gmail account, I have the categories mailing list archive since 1/1/06. I can't think of any reason I would have deleted any particular messages, but I guess it's possible I'm missing a few. I can export the whole thing via IMAP. Does anyone have a place to host it?

Mike Stay (Sep 09 2020 at 16:20):

Oh, looks like someone else did the same thing and put it here: http://arrowtheory.com/mirror/categories.tgz

Valeria de Paiva (Sep 09 2020 at 23:19):

Mike Stay said:

Oh, looks like someone else did the same thing and put it here: http://arrowtheory.com/mirror/categories.tgz

Yes @Mike, @Simon Burton has gotten both the archives online on GitHub https://github.com/punkdit/categories yay!!!

John Baez (Sep 10 2020 at 23:33):

Yay, @Simon Burton!

David Michael Roberts (Sep 11 2020 at 07:24):

The trick is now trying to extract the headers of each mail and make an index, maybe even a threading system, like at fom (eg https://cs.nyu.edu/pipermail/fom/2020-August/)
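
A minimal sketch of the header-extraction step, using the `mailbox` and `email.utils` modules from Python's standard library. It assumes the archive files are in mbox layout (messages separated by `From ` lines), which has not been checked against the real files; the two sample messages and the file name are made up.

```python
import mailbox
import os
import tempfile
from email.utils import parsedate_to_datetime


def header_index(mbox_path):
    """Return one dict of selected headers per message, in file order."""
    rows = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            try:
                when = parsedate_to_datetime(msg["Date"])
            except (TypeError, ValueError):
                when = None  # missing or unparseable Date header
            rows.append({
                "date": when,
                "from": msg["From"],
                "subject": msg["Subject"],
                "message_id": msg["Message-ID"],
            })
    finally:
        box.close()
    return rows


# Made-up two-message sample in mbox layout.
SAMPLE_MBOX = (
    "From alice@example.org Mon Jan  1 00:00:00 1990\n"
    "From: alice@example.org\n"
    "Date: Mon, 1 Jan 1990 00:00:00 +0000\n"
    "Subject: first post\n"
    "Message-ID: <1@example.org>\n"
    "\n"
    "Hello.\n"
    "\n"
    "From bob@example.org Tue Jan  2 00:00:00 1990\n"
    "From: bob@example.org\n"
    "Date: Tue, 2 Jan 1990 00:00:00 +0000\n"
    "Subject: Re: first post\n"
    "Message-ID: <2@example.org>\n"
    "In-Reply-To: <1@example.org>\n"
    "\n"
    "Hi.\n"
)


def demo():
    """Write the sample to a temporary file and index it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "1990-01.mbox")  # hypothetical file name
        with open(path, "w", newline="\n") as handle:
            handle.write(SAMPLE_MBOX)
        return header_index(path)


if __name__ == "__main__":
    for row in demo():
        print(row["date"], row["from"], row["subject"])
```

The resulting rows could be written out as one HTML or text index page per month, similar in spirit to the fom listing linked above.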

Simon Burton (Sep 11 2020 at 11:34):

@David Michael Roberts Right... The github repo is pretty much unusable (not user friendly) at the moment... I wonder if there is some kind of python library for doing something like this (parsing headers & generating a threaded index). Also, I'm wondering if/when google will see these messages..
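
On the question of a Python library: the standard library's `email` package parses headers, and a simple threaded index can be built from the `In-Reply-To` and `References` headers. The sketch below is a simplified approach under the assumption that those headers are present in the archived mails; messages without them become thread roots. The sample messages are made up.

```python
from collections import defaultdict
from email import message_from_string


def parent_id(msg):
    """Best guess at the parent Message-ID, or None for a thread root."""
    in_reply_to = (msg["In-Reply-To"] or "").split()
    if in_reply_to:
        return in_reply_to[0]
    references = (msg["References"] or "").split()
    return references[-1] if references else None


def build_threads(messages):
    """Return (by_id, roots, children) describing the reply structure."""
    by_id = {m["Message-ID"].strip(): m for m in messages if m["Message-ID"]}
    children = defaultdict(list)
    roots = []
    for mid, msg in by_id.items():
        parent = parent_id(msg)
        if parent in by_id and parent != mid:
            children[parent].append(mid)
        else:
            roots.append(mid)  # no known parent: start of a thread
    return by_id, roots, children


def render(by_id, roots, children):
    """Return the index as indented subject lines, one per message."""
    lines = []

    def walk(mid, depth):
        lines.append("  " * depth + (by_id[mid]["Subject"] or "(no subject)"))
        for child in children[mid]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


# Made-up sample messages.
SAMPLES = [
    "Message-ID: <1@example.org>\nSubject: Topic A\n\nbody\n",
    "Message-ID: <2@example.org>\nIn-Reply-To: <1@example.org>\n"
    "Subject: Re: Topic A\n\nbody\n",
    "Message-ID: <3@example.org>\nReferences: <1@example.org> <2@example.org>\n"
    "Subject: Re: Re: Topic A\n\nbody\n",
    "Message-ID: <4@example.org>\nSubject: Topic B\n\nbody\n",
]

if __name__ == "__main__":
    parsed = [message_from_string(text) for text in SAMPLES]
    print("\n".join(render(*build_threads(parsed))))
```

Mails whose clients did not set reply headers would need a fallback, for example grouping by normalized subject line.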

Avi Levy (Sep 16 2020 at 04:24):

This open-access book has some python scripts that might be a useful starting point: http://www.opentextbooks.org.hk/zh-hant/ditatopic/6826