PONT Data&Privacy

The link between artificial intelligence (AI) and software engineering

Developments around data, algorithms, machine learning and artificial intelligence (AI) are following one another rapidly, especially since the launch of ChatGPT in late 2022. The field of software engineering is important here, as AI systems are made up of software at their core, and software engineering and AI also influence each other. A conversation about the relationship between AI and software with Geert-Jan Houben, pro-vice rector of AI, Data and Digitalization, leader of the TU Delft AI Initiative and professor of Web Information Systems, and Arie van Deursen, professor of Software Engineering at TU Delft's Faculty of Electrical Engineering, Mathematics and Computer Science (EWI). The latter focuses on software testing, reliable AI and the human aspects of software engineering.

TUDelft 24 May 2023

Interviews

Data, algorithms, machine learning, deep learning, ChatGPT, artificial intelligence; for starters, can you tell us what software has to do with this?

Arie van Deursen: Classic software development consisted of prescribing to a machine step by step what it should do. With the availability of large amounts of data, machine learning has taken off: we can learn a lot from data, such as patterns and behavior.

Geert-Jan Houben: You hear a lot about learning from data, and about algorithms that require data. But those ultimately run in software systems that have to be built for that purpose. The fact that data and machine learning are now used in software on a large scale does not change that. Ultimately, you have to make good software systems. Those software systems may take a different form than they used to, but you can still apply the concepts of software engineering when engineering that new AI- or machine learning-based software. After all, it's still a process of designing, building, testing and releasing software for end users and clients, and we've been doing that at TU Delft for years.

How has software engineering changed due to availability of data and AI?

Arie van Deursen: Originally, the word algorithm meant: I know exactly how to solve a problem and I write it out step by step in a series of instructions. In other words, a kind of recipe. Nowadays, the word algorithm often refers to a machine-learning algorithm, which you train with all kinds of data. The result is what we call a model, which can then make certain recommendations algorithmically.

So an algorithm is a recipe, written in advance, for solving a problem step by step. And with AI and machine learning, we can now create self-learning algorithms. Do you have an example of this?

Arie van Deursen: Think about product recommendations in an online bookstore. Before, you wrote out an algorithm, for example: if someone is under 25, recommend books in the young adult category. And if someone is over 25, recommend books in the novels category. You then translated that algorithm into a programming language to build software that followed exactly this recipe. Nowadays, with the availability of a lot of data, there is a self-learning approach. With that, you can look at customers' past behavior, such as which books they look at, which ones they mark as favorites and which ones they bought, and make recommendations based on that.
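The contrast in the bookstore example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample orders and the simple "bought together" counting are hypothetical stand-ins, not the method of any real bookstore, and the source does not say which category applies at exactly age 25.

```python
from collections import Counter
from itertools import combinations


def recommend_by_rule(age: int) -> str:
    """Classic recipe: the rule is written out in advance by a programmer."""
    # Assumption: a customer of exactly 25 falls in the "novels" branch.
    return "young adult" if age < 25 else "novels"


def learn_co_purchases(orders: list[list[str]]) -> dict[str, Counter]:
    """Self-learning variant: count which books were bought together."""
    together: dict[str, Counter] = {}
    for order in orders:
        for a, b in combinations(sorted(set(order)), 2):
            together.setdefault(a, Counter())[b] += 1
            together.setdefault(b, Counter())[a] += 1
    return together


def recommend_from_data(model: dict[str, Counter], book: str, k: int = 2) -> list[str]:
    """Recommend the k books most often bought together with `book`."""
    return [title for title, _ in model.get(book, Counter()).most_common(k)]


# Hypothetical purchase history.
orders = [
    ["Dune", "Neuromancer"],
    ["Dune", "Neuromancer", "Emma"],
    ["Emma", "Persuasion"],
]
model = learn_co_purchases(orders)
print(recommend_by_rule(20))
print(recommend_from_data(model, "Dune", k=1))
```

In the first function the behavior changes only when a programmer edits the rule; in the second it changes whenever the purchase data changes.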

In the past, once software had been designed, changes followed that had to be rolled out in new releases. With this self-learning approach, is that no longer necessary?

Arie van Deursen: It is still necessary, because existing software still has to be renewed. In software engineering, you can apply learning algorithms to some extent. But not all software has something to learn. Sometimes there is simply legislation that prescribes how something should be done, and then we have to program it exactly that way. So you still keep that more classical, static part in software engineering.

Geert-Jan Houben: It's not a question of classical software versus machine-learned software, or even software versus AI. In software there are pieces that are pre-programmed and there are pieces that are machine-learned. That brings a design challenge: how do you make sure that the whole thing still functions properly and can be understood? Take a self-driving car. Part of the self-driving car's software, for example how the steering wheel should respond, you can program in advance. But there is also a part that might be too complicated to program in advance. Then you can choose to let the car's software actually collect data and start learning. So both are in there.

Is ChatGPT, and more generally: generative AI, hype or is there really something revolutionary going on?

Arie van Deursen: GPT-3 and now GPT-4, the large language models on which ChatGPT is based, emerged from the so-called Transformer paper from 2017. You could say that's when the revolution of generative AI started, especially in the world of data science. Since November last year, there has been ChatGPT, which is very widely accessible to the general public and works with a very pleasant "chat interface". As a result, everyone can see how big the revolution is.

Geert-Jan Houben: There are two aspects that now come together: on the one hand, the large language models and, on the other hand, the interface, which makes the application attractive. Instead of getting a list of 10 or 20 answers, you now get one, which is also written in a quasi-human form and can build on previous answers. Great strides are now being made in both aspects. What exactly they will come to mean, we have yet to discover.

And in relation to software engineering?

Arie van Deursen: Programming languages turn out to be very similar to ordinary languages. They satisfy the same statistical properties as natural language. So all the techniques that are in language models can also be applied to source code. In writing software, in fact, there is also a pattern, and that pattern, too, can be learned by a machine. That's what we're doing research into at TU Delft. We look at how large language models help software developers to be more productive. For the tasks that come up 90% of the time, these kinds of language models work well enough. Think of AI programming assistants like GitHub Copilot and ChatGPT. There is, however, the problem of hallucination, as we call it with this type of language model. Answers given by AI tools sound very sensible, but are sometimes not correct or complete. The future lies in combining these types of language models with ways to verify the outcomes.

Geert-Jan Houben: This reminds me of the development that Internet search has gone through. First we searched by words or terms to find information. Then we discovered that using words alone did not always lead us to the right thing. For example, a search for "apple" returned both fruit and computers: the meaning behind the word was missing. The same seems to be happening now with large language models: for now they are based on text representation. We will have to discover where that works and where it doesn't.

TU Delft works a lot with organizations to test in practice what is going on and conduct research on it. What kind of research do you do?

Arie van Deursen: With the research project AI for FinTech, we are looking at explainability, integration of different data sources and software engineering at ING. There are 50 thousand people working there, 15 thousand of whom are software developers. Among other things, we are looking at using AI to test the software systems that ING builds. How can we ensure that testing takes less time and energy?

Software from large companies, government agencies and implementing organizations consists of millions of lines of code. If a developer needs to change a piece of code, it can help to find the person who has worked on it in the past, for example, to do a code review. What we did with organizations was to feed their data about who worked on what and when, and who is available when, into a model. That model then learned who is best to bring in for a review. We know from previous research that it helps to give that person a "nudge" reminding them to review the piece of code. This speeds up the modification process. In addition to doing research with organizations like ING and Microsoft on improving software development processes, we also look at keeping software systems up and running in those processes. In doing so, we draw on run-time data and incident data.

Geert-Jan Houben: The beauty of this kind of collaboration is that you can test real-world problems and learn from them. In this way we advance both science and impact in practice. That is the modern way of doing science. It provides insights and recommendations that improve and accelerate software and the software development process, and make them easier to understand.

A word about that "nudge": so software engineering is about more than technology?

Arie van Deursen: Definitely. At TU Delft, we do empirical software development. That is about building software on the one hand, but on the other hand about understanding why that is sometimes difficult. We investigate that by analyzing data, but also by interviewing people. So although it's about something technical like software, it's mostly about people, and how to improve processes at scale.

What do developments such as generative AI mean for software engineering education at TU Delft?

Arie van Deursen: Programming will continue to exist, only in the future you will be helped more and more by AI programming assistants. We need to train students to work wisely with those kinds of tools. In programming education, students start with simple programming tasks, which they have to do without help. After all, if they complete them right away with help from GitHub Copilot or ChatGPT, they won't learn how the basics of programming work. But eventually they should be able to work with these tools and judge the value of their answers.

Geert-Jan Houben: To enable students and professionals to deal with this wisely, we need to research it. By looking at how ChatGPT makes suggestions, we can investigate what exactly is happening and what such a large language model bases its responses on. From our research, tips and tricks can then follow on how to use this type of tool.

In which direction do you see AI and software engineering evolving?

Arie van Deursen: Software development is people work. Developers' work revolves a lot around communication and natural language. Because of AI, the more predictable tasks are becoming increasingly automated. The more human-oriented tasks are becoming more intensive. Ultimately, you want to build the software so that it fits the user's and the client's needs, with pre-programmed software and machine learning software working well together. I think it makes the field of software development more interesting. The development of simple systems becomes easier and easier. In that sense, it is even democratizing: soon anyone will be able to create software with natural language as the interface. That is a higher goal we have always pursued, so I like that.
