I’m not sure if this has been making headlines in the U.S. (though I read it’s being discussed on Reddit), but in Japan, the news about Cleveland-Cliffs’ CEO criticizing Japan has certainly caught attention. I’d like to share my personal thoughts on the matter.
The acquisition of a major domestic company by a foreign firm often becomes a political issue in Japan as well, so I can easily imagine what’s happening in the U.S. right now.
In Japan, there was public shock when Sharp was acquired by the Taiwanese company Foxconn. More recently, there was a case where Nissan was nearly acquired by Foxconn, but when Nissan rejected the deal, Honda stepped in as a potential buyer. While the outcomes in these cases may differ, I think the situation with U.S. Steel, Nippon Steel, and Cleveland-Cliffs has some similarities.
I feel that the fact both companies have their respective countries’ names—Nippon (Japan) and U.S.—in their names makes this issue even more sensitive.
This issue has already sparked emotional debates that feel somewhat out of place in the business world, such as Nippon Steel’s chairman openly criticizing President Biden by name. Now, with recent statements from Cleveland-Cliffs’ CEO, it seems the situation has escalated even further.
In Japan, there’s a phrase used to describe the weak who survive by clinging to the powerful—”goldfish poop.” I believe Japan is essentially the “goldfish poop” of the United States.
Nowadays, even individuals in Japan are using the country’s tax-free investment program (NISA) to scrape together funds from their living expenses and invest in U.S. stocks and mutual funds. This same outlook is likely why Nippon Steel chose to invest not in East Asian companies, where extreme aging and declining birthrates make the future uncertain, but in reliable American businesses instead.
After losing the war, the Germans may have reflected on the fact that they started the war. However, I think the Japanese reflected instead on the fact that they defied the United States.
Perhaps Lourenco, being from Brazil, doesn’t fully understand that this mindset in Japan has remained unchanged throughout history. Or maybe he does understand but deliberately chose to express himself that way. Either way, I think it was a poor move. Sure, acquiring rival companies at a low cost would be ideal, but judging from the reactions within the U.S., it seems to have backfired.
Cleveland-Cliffs’ CEO Lourenco said, “You did not learn anything since 1945,” but the reality is exactly the opposite. Even 80 years later, Japanese people are taught from elementary and middle school that, no matter what happens, America is the center of the world and that Japan cannot survive without following it. Being the nation America defeated has become part of Japan’s identity, and that is precisely why Japanese people keep shifting their investments toward the U.S. and away from a domestic economy that is certain to keep shrinking.
This article is an English translation of an article written by the author in Japanese at pickerlab.net.
The rapid pace at which large language models (LLMs) and generative AI tools such as ChatGPT, Gemini, and Claude are being released has created what can only be described as a competitive arms race in AI development. As a Japanese professional working in data science and natural language processing (NLP), I often find myself discussing these advancements with clients during casual conversations. Inevitably, the question arises: “How do domestically developed Japanese LLMs compare?”
To put it bluntly, my perspective—and one that I believe is shared by many others deeply involved in AI and NLP—is that it is virtually impossible for Japanese-developed models to compete on a global stage. At least, that’s my honest assessment.
To provide some context, back in the era when BERT was the dominant language model, international models often struggled with Japanese language comprehension. This left room for Japanese developers to create language models specifically tailored for Japanese, which had practical value within the domestic market. However, with the release of models like GPT-3.5, the situation changed dramatically. These international models demonstrated an astonishingly high level of Japanese language understanding, leading many experts, myself included, to feel that the role of Japanese developers in LLM development has significantly diminished.
The underlying reason for this shift lies in the way LLMs process language. These models convert natural language—whether it’s Japanese, English, or any other language—into numerical vector representations. Once the text has been accurately encoded as vectors, the processing that follows is largely language-agnostic. In other words, the distinction between languages becomes irrelevant at the computational level.
As a Japanese professional observing these developments, it’s clear to me that the technological gap between domestic and international players in AI has widened.
The Reality of Domestic LLMs in Japan
Developing large language models (LLMs) requires an immense investment, with costs reportedly ranging from tens to hundreds of billions of yen, and daily operational costs nearing 100 million yen. Frankly, no Japanese IT company has the financial capacity to sustain this level of investment. Even the largest IT firms in Japan would quickly fall into the red if they attempted it (and it’s worth noting that even OpenAI operates at a staggering deficit).
As a result, the general consensus among professionals in the AI and data science industries is to avoid engaging with domestic LLMs and not to place any expectations on them.
That said, for users who have recently taken an interest in AI due to the rise of generative AI, it might seem natural to wonder if Japanese-made AI would be better suited for use in Japan. Personally, however, I find it troublesome when such expectations are placed on me. When asked, “What about domestic LLMs?” during work discussions, I usually respond immediately with, “There’s no need to consider them.”
At the same time, I’ve started to feel that dismissing domestic LLMs outright without even trying them is somewhat unprofessional. As someone who is paid for their expertise, it’s not entirely fair to judge without direct experience.
With that in mind, I decided to conduct a very simple test of “Tsuzumi,” a domestically developed LLM by NTT Data, which is accessible through Azure OpenAI Service.
Testing Tsuzumi 7B, GPT-3.5 Turbo, GPT-4, and GPT-4o with 13 Questions
I conducted a simple accuracy test using 13 questions I selected as a benchmark from various sources, including Japan’s university entrance exam (Center Test), an employment aptitude test (SPI), and general knowledge questions on economics and law.
Using a scoring system where a correct answer earns 1 point, a partially correct answer earns 0.5 points, and an incorrect answer earns 0 points, I calculated the percentage of correct answers for each model. A perfect score of 13/13 would correspond to 100%.
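The scoring rule above is simple enough to express as a small helper. The per-question scores below are a purely hypothetical example, not the actual results of any model tested:

```python
def accuracy(scores, total_questions=13):
    """Percent correct given per-question scores:
    1 point for correct, 0.5 for partially correct, 0 for incorrect."""
    return 100 * sum(scores) / total_questions

# Hypothetical split: 9 correct, 2 partial, 2 wrong out of 13
example_scores = [1] * 9 + [0.5] * 2 + [0] * 2
print(f"{accuracy(example_scores):.0f}%")  # -> 77%
```

A perfect score of thirteen 1-point answers yields exactly 100%.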
Question Selection Criteria
The questions were chosen entirely at my discretion and designed to be challenging, particularly for LLMs. They included:
4 general knowledge questions (economics and law)
3 math questions from Japan’s university entrance exams (Center Test/University Common Test)
6 reading comprehension questions from employment aptitude tests (SPI)
The difficulty level was intentionally set so that GPT-3.5 Turbo would struggle, while GPT-4 might have a chance of achieving full marks.
Example Question
To give an idea of the type of problems used, here’s an example math question:
A theater group’s total number of members decreased by 40% from last year, leaving 480 members this year. By gender, the number of women decreased by 25%, while the number of men decreased by 62.5%. Calculate the number of women in the theater group this year. (Round to the nearest whole number if necessary.)
The correct answer is 360 women.
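The arithmetic behind that answer can be checked with a short script. Since 480 members represent a 40% decrease, last year’s total was 800; setting up the two-equation system and substituting gives last year’s number of women, from which this year’s figure follows:

```python
# Let w = women last year, m = men last year:
#   w + m = 800                    (total last year, since 480 is a 40% drop)
#   0.75*w + 0.375*m = 480         (women fell 25%, men fell 62.5%)
total_this_year = 480
total_last_year = total_this_year / (1 - 0.40)  # 800

# Substitute m = 800 - w into the second equation:
#   0.75*w + 0.375*(800 - w) = 480  ->  0.375*w = 180  ->  w = 480
w_last_year = (total_this_year - 0.375 * total_last_year) / (0.75 - 0.375)
women_this_year = 0.75 * w_last_year

print(round(women_this_year))  # -> 360
```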
This kind of calculation question is representative of the math problems included in the test. Results for each model will be discussed in the next section.
As for the reading comprehension questions, they included typical problems like selecting the correct conjunction to fill in a blank or choosing a sentence that does not contradict the target passage. These are common in the Japanese Center Test and similar exams.
Tsuzumi performed worse than GPT-3.5 Turbo.
The results showed GPT-4o achieving a 77% accuracy rate, GPT-4 at 53%, GPT-3.5 Turbo at 12%, and Tsuzumi at just 4%. This means Tsuzumi’s performance was below even that of GPT-3.5 Turbo.
Tsuzumi managed to score only 0.5 points by partially answering just one knowledge-based question. It struggled with reading comprehension questions, often failing to understand the instructions, making it impractical for use.
Additionally, Tsuzumi’s performance worsened as input prompts grew longer, suggesting that it’s not suitable for handling large volumes of text, such as in RAG (Retrieval-Augmented Generation) systems. Since my purpose was to evaluate whether Tsuzumi could be integrated into a RAG system, I concluded that it’s unlikely to be a viable option.
Even with input lengths of about 1,000 characters, Tsuzumi felt inadequate compared to GPT-4o, which can handle and understand texts as long as 10,000 characters with much greater accuracy.
The results are understandable given the model’s scale.
To be honest, the performance of LLMs (Large Language Models) largely depends on two factors: the size of the training dataset and the number of parameters in the model. The parameter count corresponds to the number of weights in the model’s neural network, and the dataset represents the amount of information the model has “studied.” Simply put, a model with more parameters and a larger dataset will naturally perform better.
Of course, training methods and model architecture also matter. However, the major overseas models are developed by highly skilled engineers earning salaries in the millions of dollars, so I assume their design and training processes are top-notch.
Creating datasets is also costly, and the computational resources required to train a model increase exponentially with the number of parameters. This significantly drives up development costs.
Tsuzumi, with its 7 billion parameters, is modest compared to recent models with over a trillion parameters. It seems to have been developed with a more constrained approach, likely avoiding the “arms race” of massive budgets. In that sense, achieving this level of performance with 7 billion parameters is impressive. For reference, GPT-3.5 Turbo is rumored to have hundreds of billions of parameters.
From a parameter perspective, Tsuzumi might be performing well. However, based on my experience using it, it doesn’t seem suitable for practical use cases.
In IT business applications, Japanese companies should focus on steady adoption of best practices rather than rushing to catch up.
This might be a side note, but Japan’s IT industry, which is several years behind global trends, doesn’t need to compete with overseas players. I believe domestic AI initiatives like Tsuzumi are not aiming to “win” but rather to gain insights from global leaders or use their development as a marketing tool.
For those of us in the IT field, this perspective may seem obvious, but I suspect many users might have a different view. There’s no need to be at the cutting edge. Instead, we can calmly observe overseas technologies and case studies, identify what needs to be done, and walk the well-paved paths that global pioneers have already struggled to create.
At this point, Japan is many laps behind, so there’s no point in trying to catch up. It’s enough to simply move forward at our own pace from where we currently stand.
That said, IT professionals are partly to blame for creating unrealistic expectations by using buzzwords like “cutting-edge” or “latest technology” as part of sales pitches. This is particularly evident in some of the commentary surrounding Tsuzumi. It’s important for Japan’s IT industry to maintain a humble attitude, appreciating the hard work and expertise of overseas leaders whose technologies and methods we are fortunate to utilize.
Still, thinking about the people involved in developing Tsuzumi leaves me with mixed feelings. It must have been a difficult challenge—delivering results within a limited budget in what felt like a losing battle. Perhaps the developers were purely driven by technical curiosity, which kept them motivated despite the odds.
AI projects often involve an overwhelming amount of uncertainty, requiring teams to push forward without clear answers about what’s meaningful or where the true value lies. It’s a mentally taxing process, almost like a form of spiritual training. I don’t know the details of how Tsuzumi was developed, but I wonder what the atmosphere was like in the development team.