Gemini vs ChatGPT

Which AI Wins the Cultural Understanding Challenge?

Gemini and ChatGPT can mimic human conversation with impressive accuracy. Yet, when it comes to the uniquely human subtleties of culture, can these AI models truly grasp the nuances? This comparison dives into the cultural understanding of these language models, revealing strengths, weaknesses, and the ongoing challenges of teaching AI to navigate cultural complexities. Our findings offer insights for anyone interested in AI's potential, or limitations, when it comes to cultural competence.

Gemini vs ChatGPT: Quick comparison

At first glance, a side-by-side comparison might suggest one AI has a clear edge in cultural understanding, as you can see below.

 
| Model | Overall Accuracy | Nuance | Consistency | Depth | Interactivity | Ethical Considerations |
|---|---|---|---|---|---|---|
| ChatGPT 4 | - | - | Winner | - | Tie | Tie |
| Gemini Advanced | Winner | Winner | - | Winner | Tie | Tie |

 

The table above summarizes how Gemini and ChatGPT fare against each other across several critical dimensions of cultural understanding. Gemini appears to outperform ChatGPT, with wins in overall accuracy, nuance, and depth. In practical terms, however, both models performed extremely poorly, which underscores the importance of continued development in this area. Read on to understand just how lackluster the performance of AI is in cultural competence.

[Editor's Note: We completed a limited comparison between ChatGPT 4 and ChatGPT 4o in September 2024. The overall accuracy remained extremely poor.]

Gemini and ChatGPT: Overview

In the realm of artificial intelligence, two conversational language models stand out: Gemini, developed by Google AI, and ChatGPT, created by OpenAI. Both models draw upon massive datasets of text and code, enabling them to interact with humans in surprisingly fluid and informative ways. Gemini excels in its ability to access and process real-time information, while ChatGPT is particularly skilled at diverse forms of creative text generation. These AI models are increasingly integrated into various applications, from search engines and customer support chatbots to creative writing tools.

While both Gemini and ChatGPT excel in many tasks, cultural understanding poses a unique challenge. Culture shapes not only how we speak but also the values, histories, and social norms that influence what we say and how we interpret the world around us. To be truly effective communicators, AI models must not only process language but also grasp the rich and nuanced context that culture provides. Imagine an AI-powered travel guide – an accurate description of a local landmark is useful, but to truly connect with a traveler, the AI would need to understand the site's deeper significance within the culture. For customer service, an AI insensitive to cultural norms may inadvertently cause offense rather than provide helpful resolutions.

How did we test the models?

Our comparison study commenced prior to the release of Google Gemini and initially involved only ChatGPT, which influenced some of our methodological choices, including the selection of countries. For this experiment, both AI models were tasked with simulating a typical person from specific countries, engaging them in a series of culturally nuanced interactions.

Country Selection: The countries chosen for this experiment were the United States, India, Finland, Austria, Canada, Germany, Nigeria, and the United Arab Emirates. Each was selected to showcase a broad range of cultural dynamics and values. Austria, for example, was included because its notably low Power Distance score contrasts with the higher scores previously observed in ChatGPT’s responses, a pattern that also appeared in Gemini’s results. The United Arab Emirates presented a unique case: ChatGPT refused to assume this role, whereas Gemini completed the task without complications.

Experimental Procedure: Each AI was asked to complete the Culture Compass, part of our Culture Portal platform. The Culture Compass is based on the 6-D Model of National Culture and includes 42 questions designed to measure an individual’s cultural preferences.

Following the assessment, users receive personal scores on each of the six dimensions of National Culture, based on their responses. These scores illustrate how closely an individual's personal values align with those of the country they are representing, as well as the countries they are interested in. For a more nuanced assessment, we conducted a separate test in which the AIs were provided with detailed background information on cultural dimensions before they answered the Culture Compass questions.

This methodology was designed to evaluate not only the general cultural knowledge of each AI but also its ability to apply this knowledge dynamically in culturally relevant scenarios. The results provide insight into the relative strengths and weaknesses of ChatGPT and Gemini in understanding and adapting to diverse cultural landscapes.
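
The role-play procedure described above can be pictured as a simple loop: build a persona prompt for a country, send each questionnaire item to the model, and collect the replies. The sketch below is only an illustration of that idea. The prompt wording, the function names, the placeholder questions, and the stub model are all assumptions made for this example; they are not the prompts or questions used in the study.

```python
from typing import Callable, List


def build_persona_prompt(country: str, question: str) -> str:
    """Compose one role-play prompt (wording is illustrative, not the study's)."""
    return (
        f"Answer as a typical person from {country}.\n"
        f"Question: {question}"
    )


def run_questionnaire(
    ask_model: Callable[[str], str],
    country: str,
    questions: List[str],
) -> List[str]:
    """Send each question to the model in persona and collect the replies."""
    return [ask_model(build_persona_prompt(country, q)) for q in questions]


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real chat model call; always gives one reply."""
    return "Agree"


# Placeholder items only; the real Culture Compass has 42 questions.
placeholder_questions = ["Placeholder question 1", "Placeholder question 2"]
answers = run_questionnaire(stub_model, "Finland", placeholder_questions)
```

In a real run, `stub_model` would be replaced by a call to the chat model under test, and the collected answers would be entered into the questionnaire to obtain the six dimension scores.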

Detailed Comparison

1. Overall accuracy

The overall accuracy was assessed using Mean Absolute Error (MAE), which measures how close predictions or estimates are to actual values. Here, MAE is the average absolute difference between the Culture Compass scores a model produced and the actual Country Scores across the dimensions, without considering the direction of the discrepancy. For ChatGPT, the most accurate scores were for the United States, with an average distance of 15.67. Austria was the furthest off at 41.5, and the overall average distance across all countries tested was 30.69. Gemini performed slightly better: its closest result was for Austria at 17, its furthest was for the United States at 32, and its overall average was 25.24.

Winner - Gemini

However, it's essential to stress that a 25-point discrepancy is not only significant; it's alarming. An error of this size reveals a serious failure in the cultural understanding of both AI models.
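
To make the MAE figures concrete, here is a minimal sketch of the calculation for a single country. All numbers are invented for illustration and belong to a fictional country; they are not results from this study or actual Country Scores. The dimension labels other than Power Distance are also assumptions, taken from the usual naming of the 6-D Model rather than from this article.

```python
def mean_absolute_error(model_scores: dict, country_scores: dict) -> float:
    """Average absolute gap between two score sets, ignoring direction."""
    if model_scores.keys() != country_scores.keys():
        raise ValueError("Both score sets must cover the same dimensions")
    gaps = [abs(model_scores[d] - country_scores[d]) for d in country_scores]
    return sum(gaps) / len(gaps)


# Invented Country Scores for a fictional country (illustration only).
country_scores = {
    "power_distance": 30,
    "individualism": 60,
    "motivation": 25,
    "uncertainty_avoidance": 60,
    "long_term_orientation": 40,
    "indulgence": 55,
}

# Invented scores a model might obtain when role-playing that country.
model_scores = {
    "power_distance": 50,
    "individualism": 45,
    "motivation": 40,
    "uncertainty_avoidance": 70,
    "long_term_orientation": 65,
    "indulgence": 50,
}

# Gaps are 20, 15, 15, 10, 25, 5, so the MAE is 90 / 6 = 15.0.
example_mae = mean_absolute_error(model_scores, country_scores)
```

Averaging the per-country MAE values in the same way would give an overall figure comparable to the averages reported above.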

2. Nuance

Nuance was assessed by examining the depth provided in the answers to identical questions posed to both ChatGPT and Gemini. While ChatGPT typically provided straightforward answers, Gemini often included explanations, offering insights into its reasoning. [link to example]

Winner - Gemini

3. Consistency

We evaluated consistency based on how uniform the responses were across various queries. The simpler approach of ChatGPT, which lacks nuanced explanations, paradoxically served as a strength, maintaining consistency. In contrast, Gemini's responses, though more detailed, sometimes relied on stereotypes or took on a satirical tone, which could compromise the perceived accuracy of the information. [link to example]

Winner - ChatGPT

4. Depth

ChatGPT refused to provide answers for the United Arab Emirates, showing a limitation in handling all assigned cultural roles. Gemini, however, did not exhibit such restrictions.

Winner - Gemini

5. Interactivity

Both models showed challenges in maintaining the context of interactions. They responded adequately to direct questions but occasionally lost track of the initial instructions, leading to confusion. This issue manifested in both models, albeit in slightly different ways.

Winner - Tie

6. Learning

To assess whether deeper contextual understanding would improve cultural accuracy, we conducted a specific test using a Finnish persona. For this experiment, both ChatGPT and Gemini were provided with extensive background information on the 6-D Model of National Culture, as well as example cases for country scores. The intent was to see if enriching the AI with detailed cultural dimensions and examples from various countries would result in more accurate responses when assuming the role of a Finnish individual.

The results were disappointing. Despite the additional information, neither model's performance improved. In fact, their scores on the Finnish cultural accuracy test were slightly worse than in tests without this enriched context.

Winner - Tie

7. Ethical considerations

As AI technologies engage with diverse cultures, avoiding bias and ensuring respectful interactions is paramount. Neither AI exhibited significant ethical missteps in this study. Although Gemini's sometimes satirical approach to U.S. and Canadian contexts could raise concerns, the satire was clear enough not to be mistaken for genuine bias.

Winner - Tie

Conclusion

Our comparative analysis found that while both Gemini and ChatGPT can mimic human conversation accurately, their ability to grasp the subtle intricacies of culture is severely lacking. Our detailed evaluations, which spanned dimensions such as accuracy, nuance, and ethical considerations, revealed significant shortcomings in each model's approach to cultural competence. Although Gemini showed slight advantages in some areas, both models exhibited profound deficiencies in cultural understanding, emphasizing the need for further refinement and development in AI technologies.

To read further, see our individual analysis for ChatGPT and Gemini.

Editor's Note: This post was originally published in May 2024, and last updated in September 2024.