Tuesday, February 4, 2025

After creating 230 million GPT-2 tokens, this UNILAG student has built an AI text-to-speech model with a Nigerian accent

In November 2024, when I asked Saheed Azeez how difficult it was to create Naijaweb — a dataset of 230 million GPT-2 tokens based on Nairaland — he brushed it off as something simple. "It's just web scraping," he said.

However, in my latest conversation with him, his new passion project seems to have pushed him further. He calls it YarnGPT, a text-to-speech AI model that can read text aloud in a Nigerian accent.

In a world where AI can generate lifelike voices in seconds, a text-to-speech model with a Nigerian accent might not seem groundbreaking at first. But when you consider two things, it becomes a big deal.

First, Azeez is a Nigerian university student with limited resources. Second, developing a model that accurately captures the nuances of a Nigerian accent is technically challenging.

From tokenising audio to the many mathematical concepts Azeez referenced while explaining the process, it was clear that this wasn’t a simple task. Even Azeez, in his usual fashion, didn’t downplay the effort involved.

"It was quite tasking, especially gathering the data needed to make this happen."


How YarnGPT was created

Inspired by the success of Naijaweb, Azeez was eager to build something new. "The amount of conversations and interest people had in Naijaweb was a great motivation. Imagine getting featured on Techpoint Africa; it motivated me to do this."

He was also motivated by failure. Before starting YarnGPT, he had applied for a job at a Nigerian AI company but didn’t perform as well in the interview as he had expected.

YarnGPT became the project that would help him improve his skills and increase his chances of securing such roles in the future.

Building an AI model that sounds Nigerian required gathering a vast amount of Nigerian voices.

"I used some movies that were available online. I extracted their audio and subtitles."

Nollywood produces over 2,500 movies a year, and with many filmmakers uploading their work to YouTube, it seemed like Azeez had plenty of data to work with. But in reality, he had almost none.

"The problem with building in Nigeria is data. Replicating what has been built overseas isn’t that hard, but data always gets in the way."

While there were thousands of movies for him to choose from, the audio wasn't up to the standard he wanted, and the subtitles were inaccurate. To compensate, Azeez turned to Hugging Face, an open-source platform for machine learning and data science. He combined the audio from Nigerian movies with high-quality datasets from Hugging Face to train his model.

The next step was training the AI model, but without access to his own GPU, he had to rely on cloud computing services like Google Colab. This cost him $50 (₦80,000) — a significant amount for a university student. Unfortunately, it was a waste.

"The model I built wasn’t working well, and the $50 cloud credit was burnt just like that. It was painful for me."

Determined to find another way, he discovered Oute AI, a platform that had developed an autoregressive text-to-speech model.

"The way the model works is, you give it a piece of text, and it predicts one word at a time. It takes that word, adds it back to the text, then predicts the next one — kind of like how ChatGPT completes sentences. That’s what makes it autoregressive."

While I found the autoregressive framework difficult to understand, Azeez pointed out that it simply gave him better results.
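For readers who find the idea easier to follow in code, the loop Azeez describes can be sketched in a few lines of Python. This is only a toy illustration, not YarnGPT's or Oute AI's code: the lookup table `NEXT_WORD` and the helper names `predict_next` and `generate` are made up here, and a real model would score every token in its vocabulary instead of reading from a table.

```python
# Toy stand-in for a trained model: maps the last word to a predicted next word.
# The entries are invented for illustration only.
NEXT_WORD = {
    "how": "far",
    "far": "my",
    "my": "guy",
}

def predict_next(words):
    """Return the predicted next word given the words so far, or None to stop."""
    return NEXT_WORD.get(words[-1])

def generate(prompt, max_new_words=10):
    """Autoregressive loop: predict one word, append it, feed the longer text back in."""
    words = prompt.split()
    if not words:
        return prompt
    for _ in range(max_new_words):
        next_word = predict_next(words)
        if next_word is None:  # the toy model has nothing more to say
            break
        words.append(next_word)
    return " ".join(words)

print(generate("how"))
```

Each pass through the loop sees everything generated so far, which is the sense in which the output is fed back in as input.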


Maths, tokenisation, and the hard part of YarnGPT

Oute AI provided a structure, but Azeez still had to build his own model. He took a language model called SmolLM2-360M from Hugging Face and added speech functionality to it, a process that involved major algorithmic changes.

After this, the final-year Mechanical Engineering student at the University of Lagos had to spend another $50 to train the model. The training took three days.

As he pointed out when he created Naijaweb, AI models need their data to be tokenised. Large language models (LLMs) understand numbers, not words, so tokenisation converts words into numerical representations.

"If we were to tokenise the word CALCULATED, for example, we could split it into four tokens: CAL-CU-LA-TED. A number is assigned to each token."
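A rough Python sketch of the CALCULATED example follows. It assumes a tiny hand-written vocabulary and a greedy longest-match rule; the ID numbers are invented, and production tokenisers such as the one behind GPT-2 learn their vocabulary from data and may split the word differently.

```python
# Hypothetical vocabulary: each known piece of text gets an arbitrary ID number.
VOCAB = {"CAL": 101, "CU": 57, "LA": 9, "TED": 230}

def tokenise(word, vocab=VOCAB):
    """Split a word into known pieces (longest match first) and look up their IDs."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining candidate first, then shorter ones.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                pieces.append(piece)
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i:]!r}")
    return pieces, [vocab[p] for p in pieces]

pieces, ids = tokenise("CALCULATED")
print(pieces, ids)
```

The model only ever sees the list of numbers; the pieces are kept here to make the mapping visible.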

Tokenising audio, however, is different.

"Tokenizing audio is basically breaking down continuous sound waves into smaller, manageable pieces that a model can understand and process. Unlike text, which has clear breaks between words, audio is continuous—there are no natural pauses in a raw waveform.

"So, the model needs to convert the sound into a sequence of discrete values, kind of like turning a long speech into tiny puzzle pieces. These smaller audio tokens can then be used to train the AI, and later, the model can reassemble them to generate speech that sounds natural."
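The idea of turning a continuous wave into discrete values and back can be sketched with plain Python. This is a deliberately simplified assumption: it rounds each sample to one of 16 levels, whereas a neural wave tokenizer of the kind used for speech models learns its codes from data. The names `make_wave`, `encode`, `decode` and the value of `LEVELS` are illustrative only.

```python
import math

LEVELS = 16  # size of the toy "audio vocabulary" (an arbitrary choice)

def make_wave(n_samples=32, cycles=2):
    """Sample a sine wave, standing in for a short stretch of recorded speech."""
    return [math.sin(2 * math.pi * cycles * i / n_samples) for i in range(n_samples)]

def encode(samples, levels=LEVELS):
    """Map each amplitude in [-1, 1] to an integer token in [0, levels - 1]."""
    tokens = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip anything outside the expected range
        tokens.append(round((s + 1) / 2 * (levels - 1)))
    return tokens

def decode(tokens, levels=LEVELS):
    """Turn tokens back into approximate amplitudes."""
    return [t / (levels - 1) * 2 - 1 for t in tokens]

wave = make_wave()
tokens = encode(wave)
rebuilt = decode(tokens)
print(tokens)
```

The rebuilt wave is close to the original but not identical, since some detail is lost whenever a continuous signal is forced into a fixed set of values.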

This entire process was made possible by a wave tokenizer. Using resources from Hugging Face, Oute AI, and other Nigerian repositories, Azeez was able to create YarnGPT.


Publicising YarnGPT

Azeez might be a nerd, but he isn’t afraid to put himself in front of a camera to showcase his work. In a two-minute video, he explained YarnGPT and caught the attention of 138,000 people on X (formerly Twitter), including Timi Ajiboye, co-founder of Helicarrier (formerly BuyCoins).

Creating YarnGPT was difficult, but making the video was another hurdle.

"I called my friend and logistics manager, Aremu, and told him I wanted to make a video. We reached out to another friend who had a camera he wasn’t even using, and then we went to yet another friend’s house to record.

"We rearranged the whole house and used their TV as the background. His mum wasn’t too pleased when she returned."

The results were worth it. The video got thousands of views across social media, and people began testing YarnGPT. The model could not only pronounce English in a Nigerian accent but could also read Nigerian languages—Hausa, Igbo, and Yoruba.

It has various applications. Content creators can use it for voice-overs in Nigerian accents, Google Maps could provide directions in Nigerian languages, and it could even enhance accessibility for non-English speakers.


Nigeria and the AI race

While innovators like Azeez and American-born Ijemma Onwuzulike (creator of Igbo Speech) are developing exciting AI models, Nigeria remains far behind in the AI race. The industry has evolved beyond a hobbyist’s playground into a battleground for global superpowers, with the U.S. government committing $500 billion to AI development.

Meanwhile, AI breakthroughs like DeepSeek have shaken up Wall Street, causing giants like Nvidia to lose billions in market value due to new competition.

Even Azeez acknowledges Nigeria’s position.

"Honestly, we’re way off. We’re not even in the race. The big AI models today — like OpenAI’s or the ones from China — are trained on massive datasets with huge computational resources, things we don’t have here."

But he remains optimistic.

"I think there’s a way forward. Instead of trying to build from scratch, we can focus on localising AI for our own needs. We can take what’s already been built and adapt it for Nigerian languages and accents. That’s how we can start catching up."

Nigeria’s Minister of Communications and Digital Economy, Bosun Tijani, has been vocal about positioning the country as a key player in AI development. Perhaps, with talents like Azeez, there is hope.

By Bolu Abiodun, Techpoint Africa

Nigeria moves to restart oil production in vulnerable region after Shell sells much of its business

The Nigerian government is in talks with local communities to restart oil production in a region that’s previously suffered environmental damage after oil giant Shell’s sale of its onshore business in the country.

Shell’s $2.4 billion sale of its onshore business to a group of local companies was confirmed last week by Nigeria’s special advisor to the president on energy, Olu Verheijen. It marks the end of the London-based energy giant’s nearly century-long operations in the onshore Niger Delta region, where it faces long-running complaints of environmental pollution.

Government officials have now earmarked a restart of oil production in the Ogoniland region of southern Nigeria, where Shell halted its operations in 1993 following violent protests over allegations of widespread environmental damage and human rights abuses, as a potential way of increasing the country’s foreign exchange earnings.

“The broad consensus in Ogoni is in favor of restarting production,” said Ledum Mitee, a veteran environmental activist and former president of the Movement for the Survival of Ogoni People.


Western oil companies are retreating from Nigeria

A number of Western oil companies, including ExxonMobil, Eni, Equinor, and TotalEnergies — and now Shell — are retreating from Nigeria.

They are mostly moving offshore and limiting their exposure in the West African nation’s Delta region where oil spills have fouled rivers and farms and exacerbated tensions in a region that has faced years of militant violence.

Shell’s sale was delayed following protests by communities and activist groups, including Amnesty International and the Dutch non-profit Centre for Research on Multinational Corporations (SOMO), demanding that Shell clean up first.

The terms of the deal on addressing the environmental damage left by Shell are not publicly available. Isaac Botti of Social Action, a Nigerian group that organized protests against Shell’s sale, said his organization had requested the terms of the agreement that the Nigerian Upstream Petroleum Regulatory Commission signed with Shell and the new owners, Renaissance Africa Energy Company. The regulator did not respond to The Associated Press’ request for comment.

Shell previously told AP that the transaction was designed to preserve the company’s role to “conduct any remediation as operator of the joint venture where spills may have occurred in the past from the joint venture’s operations.”


Environmental damage is still a concern

Scientific studies have found high levels of chemical compounds from crude oil, as well as heavy metals, in the delta, where the industry largely drives Nigeria’s economy but can leave communities’ water sources slick with contaminants.

A cleanup exercise in Ogoniland, advised by the United Nations Environment Programme and funded mostly by Shell, has been largely mismanaged, according to U.N. documents obtained by AP.

Activists say they want to see more dialog before any oil production in the region resumes. “I think the president got it right in not imposing solutions but insisting on” consultations on local terms and conditions to resume production, said Mitee, the environmental activist.

By Taiwo Adebayo, AP

Nigeria to block oil export permits for producers who do not fill refinery quotas

Nigeria's upstream oil regulator said on Monday it would deny export permits for oil cargoes from producers who fail to meet their stipulated supply quota to local refineries, including the Dangote Refinery, Africa's largest.

Nigeria's oil industry law, the Petroleum Industry Act, mandates oil producers, including international oil companies, to dedicate specific volumes of crude for domestic refineries before exporting, a requirement called the domestic crude supply obligation.

However, oil producers say they have not complied with this stipulation because refiners are not offering competitive prices. This has prompted the Dangote Refinery to call on the regulator to enforce the law.

A statement on Monday from the Nigerian Upstream Petroleum Regulatory Commission said Gbenga Komolafe, its head, wrote to oil exploration and production companies to remind them of their obligations and penalties for default.

The commission said it met last week with producers and refiners. In the meeting, refiners blamed producers for not honouring their obligations under the supply obligation, while the producers said refiners are offering insufficient prices, forcing them to explore other markets, Komolafe said in the statement.

Komolafe warned that "the diversion of crude cargo designated for domestic refineries is a contravention of the law and the Commission will henceforth disallow export permits for designated crude cargos for domestic refining."

For the first half of 2025, Nigerian refineries say they will require 770,500 barrels of crude per day, with the Dangote Refinery forecast to require 550,000 bpd, according to a schedule published by the oil regulator.

By Camillus Eboh, Reuters

Monday, February 3, 2025

Video - Nigeria launches road safety review following oil tanker accidents



Thousands have died in oil tanker accidents in Nigeria over the years, with many of the victims being individuals who rush to the crash sites to scoop up spilled oil. In response, the Nigerian government has allocated 500 million U.S. dollars to improve the country’s road infrastructure.


THAI WOMAN’S ROLE IN NIGERIA FRAUD RING ENDS IN HAT YAI AIRPORT ARREST

Thai police have apprehended a Thai woman at Hat Yai International Airport for her involvement in a massive romance scam operation run by her Nigerian husband, with fraudulent transactions totaling over 6.2 billion baht ($175.3 million). The arrest comes five years after the initial warrant was issued.

Crime Suppression Division officers arrested Ms. Orathai, 52, at the international arrival terminal of Hat Yai International Airport in Khlong Hoi Khong district, Songkhla province. She was wanted on an arrest warrant issued by the Criminal Court on May 7, 2020, on charges of criminal association, participation in transnational organized crime, conspiracy to commit fraud by impersonation, and money laundering.

The investigation revealed that in 2017, Orathai worked at a massage parlor in Malaysia, where she met her Nigerian husband through a colleague’s introduction. After their marriage, she lived with him while he covered all her expenses, allowing her to stop working.

Two years into their relationship, her husband claimed he wanted to establish a business in Thailand and requested her help in opening bank accounts for financial transactions, explaining that as a foreigner, he couldn’t do so himself. He offered 6,500 baht per account opened. The suspect proceeded to open multiple accounts and handed over complete control of deposits and withdrawals to her Nigerian husband.

Police investigators discovered that these bank accounts were linked to a major fraud case involving Ms. Chamanant and associates, who had illegally transferred money from a company through multiple transactions, causing damages totaling 6,223,872,674.31 baht.

Authorities arrested Orathai when they learned she was traveling from Malaysia to visit her children in Thailand. During questioning, the suspect made a full confession to all charges. The case has been transferred to investigators for further legal proceedings.




Huntsville man admits to laundering money for Nigerian sextortionists