AI trained on Hispanic data


BUENOS AIRES (AP) - Yesterday Chile launched the first major artificial intelligence language model trained on Latin American cultures and dialects, with the aim of reflecting regional realities and strengthening the Hispanic presence in the global race for AI.

The open-source project, built on a vast body of data from the region that had previously gone unnoticed, seeks to make artificial intelligence models more accessible and ensure that they better reflect Latin American realities in a world where the United States, China and the European Union remain the biggest beneficiaries of the technological race.

Latam-GPT was developed by Chile’s National Center for Artificial Intelligence (Cenia) over the past two years, in collaboration with 30 institutions from countries throughout the region, including Mexico, Argentina, Brazil, Colombia, Ecuador, Peru and Uruguay.

“Artificial intelligence is the greatest technological revolution of recent times and in Latin America and the Caribbean it is strategic and urgent that we play a role,” declared the president of Chile, Gabriel Boric, at the launch of the project.

He added that the project will be key to incorporating Latin American data and identity into AI, describing the goal as “technological and cultural sovereignty so that our region can be part of this global conversation.”

The project, announced at the Artificial Intelligence Action Summit in Paris last year, began in January 2023 with the goal of addressing inaccuracies in AI models trained largely on English-language data. Latam-GPT is intended as a foundation for developing future applications rather than as a direct competitor to existing consumer-oriented products such as ChatGPT and Google’s Gemini.

“Latam-GPT is trained with a proportion of Latin American data that did not exist on the Internet and that was not included in previous models. This allows for more precise, correct and efficient performance when it comes to Latin America and the Caribbean,” said Rodrigo Durán, executive director of Cenia.

Data sources

Latam-GPT uses data from private sources obtained through strategic alliances throughout the region, as well as synthetic data to address underrepresented areas, explained Gabriela Arriagada, Cenia researcher and head of the project’s ethics team.

The development of Latam-GPT required the collection of more than eight terabytes of information, which is equivalent to millions of books.

“When we talk about incorporating Latin American culture, we are talking about a training vision that allows us to take charge of data that represents cultural realities, understand where the gaps are in other models, where they fail, and gain knowledge to improve that representation,” added Arriagada.

For now, the project will operate mainly in Spanish and Portuguese, with plans to incorporate indigenous languages later.

The development of Latam-GPT means that the region now has the technical capacity to develop AI models, according to Rodrigo Durán. “The fact that Latin America has come together to form a collaborative group is a very positive sign,” he said.

“It shows that Latin America can develop and understand how to create this technology, which also has important implications for regulation, because you cannot regulate something that is not understood.”

The race for leadership in AI has led countries to rethink their policies and initiatives to develop technologies in the field. The United States, China and the European Union concentrate more than half of the world’s most important data centers for developing AI systems, according to data published by the University of Oxford.

Africa and South America have almost no AI centers, according to the report.

Danger of falling behind

In recent years, Chile has accelerated its efforts to expand its role in the AI boom, attracting new talent and building data centers. In June last year, President Boric said the country must start adopting AI, adding that “a country that does not invest in artificial intelligence today risks being left behind on the global stage of tomorrow.”

The creation of Latam-GPT “is a very important milestone for Latin America” because it incorporates data from all countries in the region, according to Luis Chiruzzo, professor at the Faculty of Engineering at the University of the Republic in Uruguay.

However, the academic warned that it will be difficult for the model to compete directly with large technology corporations, which have many more resources. “Even so, it is a significant advance and will allow the region to begin to position itself in the development of language models with its own voice,” he added.

Latam-GPT was developed with just $550,000 in funding from Cenia and the Development Bank of Latin America (CAF). The team used the Amazon Web Services cloud to develop its first version, which will be released later this month.

Starting in the first half of this year, later versions will be trained on a supercomputer at the University of Tarapacá in northern Chile. The machine costs approximately $4.5 million.


