Summary: Researchers have developed natural language embedded programs (NLEP), enabling AI models to solve complex tasks by generating and executing Python programs.
This method increases accuracy in reasoning tasks and improves transparency by allowing users to inspect and debug the code. NLEPs also improve data privacy by processing information locally.
Key facts:
- NLEPs encourage AI to create Python programs to solve complex tasks.
- The method improves accuracy and transparency, allowing code inspection.
- NLEPs improve data privacy by processing information locally.
Source: MIT
Large language models such as those powering ChatGPT have shown impressive performance in tasks such as drafting legal briefs, analyzing the sentiment of customer reviews, or translating documents into different languages.
These machine learning models typically use only natural language to process information and answer questions, which can make it difficult to perform tasks that require numerical or symbolic reasoning.
For example, a large language model may be able to memorize and recite a list of recent US presidents and their birthdays, but the same model may fail if asked the question “Which US presidents elected after 1950 were born on a Wednesday?” (The answer is Jimmy Carter.)
Researchers from MIT and elsewhere have proposed a new technique that enables large language models to solve natural language, math and data analysis, and symbolic reasoning tasks by generating programs.
Their approach, called natural language embedded programs (NLEPs), involves prompting a language model to create and execute a Python program to solve a user’s question, and then output the solution as natural language.
They found that NLEPs enabled large language models to achieve higher accuracy on a wide range of reasoning tasks. The approach is also generalizable, meaning that a single NLEP prompt can be reused for multiple tasks.
NLEPs also improve transparency, as a user can check the program to see exactly how the model reasoned about the question and adjust the program if the model gives an incorrect answer.
“We want AI to perform complex reasoning in a transparent and reliable way. There is still a long way to go, but we have shown that combining programming and natural language skills in large language models is a very good potential first step towards a future where people can fully understand and trust what’s going on inside their AI model,” says Hongyin Luo PhD ’22, a postdoc at MIT and co-lead author of a paper on NLEPs.
Luo is joined on the paper by co-lead authors Tianhua Zhang, a graduate student at the Chinese University of Hong Kong, and Jiaxin Ge, an undergraduate at Peking University; Yoon Kim, an assistant professor in the Department of Electrical Engineering and Computer Science at MIT and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); senior author James Glass, senior research scientist and head of the Spoken Language Systems Group at CSAIL; and others. The research will be presented at the Annual Conference of the North American Chapter of the Association for Computational Linguistics.
Solving problems with programs
Many popular large language models work by predicting the next word or token, given some natural language input. While models like GPT-4 can be used to write programs, they embed those programs within natural language, which can lead to errors in the program’s reasoning or results.
With NLEPs, the MIT researchers took the opposite approach. They prompt the model to generate a step-by-step program entirely in Python code, and then embed the necessary natural language within the program.
An NLEP is a four-step problem-solving template. First, the model calls the necessary packages or functions it will need to solve the task. The second step involves importing natural language representations of the knowledge the task requires (such as a list of US presidents’ birthdays).
For the third step, the model implements a function that calculates the response. And for the last step, the model outputs the result as a line of natural language with an automatic visualization of the data, if necessary.
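To make the template concrete, here is a minimal sketch of what a generated NLEP might look like for the presidents-born-on-a-Wednesday question above. It follows the four steps, but the program structure and the (deliberately partial) list of birthdays are illustrative assumptions, not the researchers’ actual output:

```python
# Step 1: import the packages the program will need.
from datetime import date

# Step 2: embed the task's knowledge as a structured representation
# (a hypothetical, partial list of presidents elected after 1950).
presidents = {
    "Dwight D. Eisenhower": date(1890, 10, 14),
    "John F. Kennedy": date(1917, 5, 29),
    "Jimmy Carter": date(1924, 10, 1),
    "Ronald Reagan": date(1911, 2, 6),
}

# Step 3: implement a function that calculates the answer.
def born_on_wednesday(birthdays):
    # date.weekday() numbers Monday as 0, so Wednesday is 2.
    return [name for name, bday in birthdays.items() if bday.weekday() == 2]

# Step 4: output the result as a line of natural language.
matches = born_on_wednesday(presidents)
print(f"Presidents elected after 1950 who were born on a Wednesday: {', '.join(matches)}")
```

Because the reasoning lives in ordinary Python, executing the program with a standard interpreter yields the answer deterministically, which is what makes the calculator-like behavior Luo describes next possible.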
“It’s like a digital calculator that always gives you the correct calculation result as long as the program is correct,” says Luo.
The user can easily probe the program and fix any errors in the code directly instead of having to re-run the entire model to troubleshoot.
The approach also offers greater efficiency than some other methods. If a user has many similar questions, they can generate one core program and then replace certain variables without having to run the model repeatedly.
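To illustrate that kind of reuse, under the same assumptions as the sketch above, swapping a single weekday variable answers a related question without calling the model again:

```python
from datetime import date

presidents = {
    "Jimmy Carter": date(1924, 10, 1),   # born on a Wednesday
    "Ronald Reagan": date(1911, 2, 6),   # born on a Monday
}

def born_on(birthdays, weekday):
    # weekday follows date.weekday(): 0 = Monday ... 6 = Sunday.
    return [name for name, bday in birthdays.items() if bday.weekday() == weekday]

print(born_on(presidents, 2))  # Wednesday query -> ['Jimmy Carter']
print(born_on(presidents, 0))  # Monday query -> ['Ronald Reagan']
```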
To prompt the model to generate an NLEP, the researchers give it a general instruction to write a Python program, provide two NLEP examples (one with math and one with natural language), and a test question.
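A hypothetical rendering of that prompt layout appears below; the instruction wording and the two worked examples are placeholders rather than the paper’s actual prompts:

```python
# A sketch of the few-shot NLEP prompt structure described above.
nlep_prompt = """Write a step-by-step Python program that solves the task,
then prints the answer as a line of natural language.

### Example 1 (math)
Task: What is the sum of the first 10 positive integers?
Program:
print(f"The sum of the first 10 positive integers is {sum(range(1, 11))}.")

### Example 2 (natural language)
Task: Is the word 'level' a palindrome?
Program:
word = "level"
print(f"'{word}' {'is' if word == word[::-1] else 'is not'} a palindrome.")

### Test question
Task: Which US presidents elected after 1950 were born on a Wednesday?
Program:
"""
```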
“Typically, when people use this kind of few-shot prompting, they still have to design prompts for every task. We found that we can have one prompt for many tasks, because it is not a prompt that teaches LLMs to solve one problem, but a prompt that teaches LLMs to solve many problems by writing a program,” Luo says.
“Having language models reason with code unlocks many opportunities for tool use, output validation, more structured understanding of the model’s capabilities and way of thinking, and more,” says Leonid Karlinsky, principal scientist at the MIT-IBM Watson AI Lab.
“There is no magic here”
NLEPs achieved greater than 90 percent accuracy when prompting GPT-4 to solve a variety of symbolic reasoning tasks, such as tracking shuffled objects or playing a game of 24, as well as instruction-following and text classification tasks.
The researchers found that NLEPs even exhibited 30 percent greater accuracy than task-specific prompting methods. The method also showed improvements over open source LLMs.
Along with increasing the accuracy of large language models, NLEPs can also improve data privacy. Since NLEP programs run locally, sensitive user data does not need to be sent to a company like OpenAI or Google to be processed by a model.
In addition, NLEPs can enable small language models to perform better without having to retrain a model for a given task, which can be a costly process.
“There is no magic here. We don’t have a more expensive or fancy language model. All we do is use program generation instead of natural language generation, and we can make it perform significantly better,” says Luo.
However, an NLEP relies on the model’s program generation capability, so the technique does not work as well for smaller models that are trained on limited data sets.
In the future, the researchers plan to study methods that can make smaller language models generate more effective NLEPs. In addition, they want to investigate how prompt variations affect NLEPs, to increase the robustness of the model’s reasoning processes.
Funding: This research was supported, in part, by the Hong Kong Center for Perceptual and Interactive Intelligence.
About this artificial intelligence research news
Author: Adam Zewe
Source: MIT
Contact: Adam Zewe – MIT
Image: The image is credited to Neuroscience News
Original research: The findings were presented at the Annual Conference of the North American Chapter of the Association for Computational Linguistics