Category Theory
Zulip Server
Archive

You're reading the public-facing archive of the Category Theory Zulip server.
To join the server you need an invite. Anybody can get an invite by contacting Matteo Capucci at name dot surname at gmail dot com.
For all things related to this archive refer to the same person.


Stream: practice: software

Topic: Future of Software Development


view this post on Zulip Uroš Nedić (Oct 11 2020 at 22:49):

I hope it is not against the rules of this community to start one topic about future of software development. It will be one very broad discussion where I plan to put list of questions I collected over some period of time and I feel strong temptation to do so because there are very deep knowledge in this community so some answers might occur. In this post I ask for approval whether I have green light to do it?

view this post on Zulip John Baez (Oct 11 2020 at 22:54):

Starting topics is something everyone should do freely. Starting streams should be done more cautiously.

You might start your topic in the stream #practice: software.

view this post on Zulip Notification Bot (Oct 11 2020 at 22:55):

This topic was moved here from #learning: questions > Topos theory by John Baez

view this post on Zulip John Baez (Oct 11 2020 at 22:56):

In fact I just started a topic "software development" in the stream #practice: software, and moved your question there!

So, fire away!

view this post on Zulip Uroš Nedić (Oct 12 2020 at 00:34):

As someone who is practicing software development and reading papers in this field (from conferences and journals but never wrote one) and recently started to read papers in mathematics (CT mostly and know less than 10% of whole field but hopefully plan to improve it to 30% at the end of next year if I learn hard) I started to think about some broad questions mainly to be able to coordinate my learning and do my own knowledge management. Answers to these questions will put some structure in my learning, I hope, but also might help someone else, who knows.

As we all know we are now at the beginning of one very big transition in software development. Reason are well known - number of bugs in software synthesis is growing exponentially from number of lines of code. Put simply, code bases are (or soon will be) unmaintainable using current techniques and languages to write them.

Current paradigm of design programs incorporates three phases:

  1. Modelling software (or writing informal specification)
  2. Writing code
  3. Testing code

After all phases are completed we can conclude that software works as expected and it can be deployed (or delivered).

All three phases in this process have big pitfalls:

i. Modelling phase. Here we use real world language to describe what we want and then transform it to many different theories either from mathematics (like graph theory, matrix, number theory, etc.) either we invent them ad-hoc during this phase. All these theories together inconsistently describe our future program. Upon these theories we write algorithms that make our future program. These algorithms are also written in so-called pseudo-code which is also very unreliable syntax and semantics.

ii. Writing code phase. Cocktail of theories and pseudo-codes then we transform into one of known programming languages. These languages are also lacking exact theory because many of them started one paradigm then incorporate many others to be more attractive and usable. In this phase we encounter inconsistency in ability to express our already inconsistent models from former phase.

iii. Testing phase. At the end of the whole process we cannot do anything else but to end our journey to this phase i.e. to test our software and to check whether our fragile construction works.

After we finish it we cannot tell much more than our program runs on inputs we tested and that we have some vision how much memory (space) our software consumes and how fast (complex) it is. Answering how secure is even worse. Many developers even do not think about it!

To be able to address all these issues we have to bring consistency in all these phases. This is place where my questions start:

  1. If mathematics model our real world and CT model mathematics itself, does it mean that CT is a good candidate to reason in it about any task in software development we have.

  2. Are there other theories? Are they better or worse than CT for reasoning about software design? How many of them up to isomorphism are there?

  3. After we have one such theory is it possible to design our new programming language as some sort of internal logic of that theory but also satisfying following requirements:

    a. High level programming capability

    b. Low level programming capability

    c. Efficient (it makes sense to develop programs in it - counterexample is machine language)

    d. Fast (compiled version of code has to be comparable in speed as compiled version of corresponding C code)

    e. Specification capability (ability to write specifications i.e. phase i. models)

    f. Proof capability - Here we have to be able to write several types of proofs

    I. Space (how much MEM/HDD/CPU program consumes)

    II. Time (how long computation lasts, i.e. how complex it is)

    III. Correctness - Here we have two cases
    i. Algorithms - to prove correctness of algorithms.
    ii. Heuristics - to prove how heuristics are precise depending on input data, available memory, CPU power, and execution time.

    IV. Security (prove that system is secure and in case of breaking security how much of data are affected)

    Each proof has to deliver proofs about space/time/correctness/security of particular program AS IS and to provide additional proofs that these space/time/correctness (in case of heuristics)/security are the best possible for given specification.

    g. General purpose - ability to write any program for any CPU (standard, FPGA, Quantum, some not existing yet, etc).

  4. Can we synthesise programs from Specifications automatically using AI?

  5. Can we prove f.I f.II f.II and f.IV requirements automatically using AI?

  6. Does it make sense to design CPU based on CT primitives (CT-based CPU/ASIC)?

view this post on Zulip Jacques Carette (Oct 12 2020 at 14:18):

The fundamental problem here is that you are missing the most important phase [of course, you're not alone in this, way too much of software development misses it too]:

  1. Modeling the Problem

As long as the knowledge about the problem space, its specification and constraints, is informal, all the rest of the steps are built on quicksand. For some software, once you've modeled the problem adequately, a series of refinement steps will much more clearly give you a viable solution. Doug Smith's team at the Kestrel Institute already used a lot of CT starting in the mid-80s with exactly that in mind. In my opinion, they were too far ahead of their time, which is why the ideas did not catch on.

Put another way: as long as you focus on software as being about code, you're doomed. Software is a tool for obtain solutions to problems. You need to refocus on the problem, and its solution space as abstract entities first, then the realization of that as software.

Note that a number of people in software engineering are using different words to say the same thing. Some people in the Models community are slowly abstracting themselves from the mess that UML created (because it modeled software, not the problem). A lot of people who do DSLs understand this. Look up "domain modeling" and "domain engineering"; Dines Bjorner is a good one to look up there.

view this post on Zulip Uroš Nedić (Oct 12 2020 at 15:12):

Thank you for your post but I somehow merged your 0. phase and my i.phase.

I also was wondering whether Topoi are good candidates for this?

view this post on Zulip Uroš Nedić (Oct 12 2020 at 15:14):

Also UML is not my cup of tea.

view this post on Zulip Jacques Carette (Oct 12 2020 at 15:34):

Asking "is X a good solution for Y" only makes sense to me once the structure of Y has really been sufficiently investigated. So I have no idea if Topoi are good/bad/irrelevant here.

[Also, note that I disagree with your statement that "number of bugs are growing exponentially from number of lines of code". Yes, it grows. What the rate of growth is, AFAIK, is unknown. It is definitely not exponential. It may not even be as bad as linear! The reason for that is code bloat, not anything smart. ]

Category Theory and software development do relate to each other. CT, at its very foundations, is about compositionality. Scalable software development needs a way to break things up into reusable, well-understood pieces that can be assembled into larger pieces.

Note that we have far from explored all the meanings of 'pieces' and 'assembled' in software. In my personal opinion, we have been very narrow-minded about all of the different ways that one can break things up and assemble software. [That's why I've spent a lot of time doing meta-programming!] I don't actually think that meta-programming is a good solution. It's just a way to get outside the current constraints of the narrow modes of 'piece' and 'composition' offered by current languages.

view this post on Zulip Uroš Nedić (Oct 12 2020 at 16:13):

One of my friends works in Microsoft - Windows team. He says it is almost unmaintainable.

Linux already has ~30M LoC.

Regarding bugs, my estimation is not about discovered ones but for total number of them.

view this post on Zulip Jacques Carette (Oct 12 2020 at 16:24):

Ah, Microsoft, neat place. I worked there, a long time ago (1988-89). I have friends who have retired from there, rich.

Much has been written about the un-maintainability of Windows. One huge part: backwards compatibility. It kills all good designs. [I can't quite find the exact blog posts, but they were on MSDN.] Entropy is a big deal in software. Apple manages it differently than Microsoft, and that's a big deal.

I am 100% convinced that, from a pure information-theoretic point of view, there is much much less than 30M LoC's worth of information in the code base of Linux. I think Alan Kay's work at the Viewpoints Institute is a good entry point for thinking along the lines of needing 4 orders of magnitude less code to get the job done.

view this post on Zulip Fintan Halpenny (Oct 12 2020 at 20:26):

I'm also starting to see the value of that 0th step of Modeling The Problem. I'm starting to see it crop its head more and more. Conal Elliot tackles it when he talks about denotational design. Sandy Maguire is talking about it in Algebra Driven Design book.

I'm finding it hard to get into these better practices of think about the problem and design first.

  1. I think one issue is because I don't have tools to help me work through the solution. I can use pen and paper but I dunno when I make a mistake.
  2. I think the second issue is that I need to have a notation or way of communicating the design back to my colleagues.
  3. I have to convince others that THIS IS FEATURE WORK. :sweat_smile:

view this post on Zulip Jacques Carette (Oct 13 2020 at 12:06):

I've started to use a theorem prover (Agda in my case) for modeling. But you can use TLA+, Epsilon or Alloy too. I think Jay Parlar has some nice videos of what he was able to do with Alloy (at a company that first didn't believe in doing that).

view this post on Zulip Fabrizio Genovese (Oct 13 2020 at 12:44):

Fintan Halpenny said:

I'm also starting to see the value of that 0th step of Modeling The Problem. I'm starting to see it crop its head more and more. Conal Elliot tackles it when he talks about denotational design. Sandy Maguire is talking about it in Algebra Driven Design book.

I'm finding it hard to get into these better practices of think about the problem and design first.

  1. I think one issue is because I don't have tools to help me work through the solution. I can use pen and paper but I dunno when I make a mistake.
  2. I think the second issue is that I need to have a notation or way of communicating the design back to my colleagues.
  3. I have to convince others that THIS IS FEATURE WORK. :sweat_smile:

Actually, the whole idea of categorical graphical languages is exactly Problem Modelling

view this post on Zulip Fabrizio Genovese (Oct 13 2020 at 12:45):

There are also implementations. At Statebox the idea is to use Petri nets and string diagrams as modelling tools, and then having a whole infrastructure that turns these models into code

view this post on Zulip Fabrizio Genovese (Oct 13 2020 at 12:45):

So this should, in the long run, solve the problem of having only pen and paper to model a problem

view this post on Zulip Fintan Halpenny (Oct 13 2020 at 17:52):

I'll have to look into these more. I know of a few listed above but haven't dug in. The other thing I suppose is that modelling something in a different language and then having to translate it to Rust seems error prone

view this post on Zulip Uroš Nedić (Oct 15 2020 at 12:45):

Jacques Carette said:

I've started to use a theorem prover (Agda in my case) for modeling. But you can use TLA+, Epsilon or Alloy too. I think Jay Parlar has some nice videos of what he was able to do with Alloy (at a company that first didn't believe in doing that).

  1. Do you mean Epsilon calculus?

  2. It would be nice to lift Alloy mathematical foundations into Topoi and TLA+ and to see how they relate and does it make sense to combine them.

view this post on Zulip Jacques Carette (Oct 17 2020 at 15:58):

  1. No, I meant http://www.eclipse.org/epsilon/ for model-based engineering.
  2. There was a paper recently on the arxiv doing something quite similar (i.e. CT and TLA+). Sorry, can't find it right now.

view this post on Zulip Uroš Nedić (Oct 17 2020 at 22:19):

This one - http://eptcs.web.cse.unsw.edu.au/paper.cgi?ACT2019.11 ?

view this post on Zulip Jacques Carette (Oct 19 2020 at 14:37):

Yes!