ChatGPT and the Dangers of Overconfidence as a Leader

By: Patrick McKendry (and not by ChatGPT)

Last month, OpenAI - an artificial intelligence research and deployment company backed by a number of big names in Silicon Valley - released its latest product.  ChatGPT, the prompt-based generative AI text bot, takes a request from a user and generates a novel text response in seconds.  So far, users have leveraged its capabilities to make memes, produce lecture notes and technical documentation, write movie plots, and even create software code.  Within five days, more than one million people had created accounts to use the service.  Over the following weeks, ChatGPT went viral and became - per Google Trends data - as popular as Taylor Swift.  For OpenAI, ChatGPT is mostly a public relations play to generate interest in the underlying language models that it sells as a service - a service it forecasts to surpass $1 billion in revenue in the next few years.

When reviewing reactions from users of the product, the overwhelming sense is one of awe at the astounding capability of ChatGPT to provide coherent, quality answers to questions.  There’s just one problem - a lot of those answers are wrong.  Per Sam Altman, CEO of OpenAI, “ChatGPT is incredibly limited, but good enough at some things to create a misleading impression of greatness.”  Overall, the model is designed to give high quality answers, be receptive to feedback on incorrect responses, and refuse prompts of malicious intent.  However, it is not designed to caveat its thinking, identify where its data sources may have gaps, or highlight ideas where its “comprehension” is incomplete.  At least, that’s my understanding.  Noted Min Chen, VP at legal research firm LexisNexis, “In some cases, ChatGPT will give a very verbose answer that seems to make sense, but the answer is not getting the facts right.”

Conflating Confidence with Correctness

This may sound like a familiar problem.  It is, as this phenomenon is not limited to the world of AI-driven chat bots.  Most people I talk to can easily recall a team experience where they thought “Why is this person in charge of this?” or “What are they even talking about?”  The world of organizations is rife with situations where we mistake the confidence a person displays for correctness on the topic of discussion.  We assume that a person is speaking up because they have deep expertise in a functional area or deep context on a problem set.  This is often not the case.  Further, we assume that leaders have greater access to knowledge, a more robust sense of strategy, or better intuition and experience in a certain area, and we expect that this will make them the most effective decision makers.  This is not a given outcome.

Organizations are human creations, built through human decision making.  Unfortunately, human decision making is fraught with biases and heuristics.  These “adaptive tools” have developed in our cognitive systems over millions of years.  Their purpose is to help us make decisions more quickly and with less effort.  This fast thinking system, described as System 1 thinking by Daniel Kahneman, is intended to conserve mental resources and allow focus to shift towards more involved problems.

Significant research has gone into the understanding of human bias.  As a result, we can describe fairly effectively the causal factors that lead to this potentially misplaced decision making authority.

The Babble Hypothesis

In the context of ChatGPT, we can see how a ready response can trick our brains into assuming confidence.  If you ask a question about particle physics, you have to assume that the text response is pulling from relevant and accurate data sources.  The same goes for a question about the best flour to use for baking a cake.  However, we need to separate the presence of an answer from its quality.  Remember that ChatGPT is trained to be a language processing model.  That means it can string words together with clarity, but that says nothing about its ability to comprehend its output or validate its accuracy.

When we consider the process of leader emergence and assignment, we see similar forces at play.  There are many factors that go into the selection of a team member for leadership positions, ranging from academic background to historical performance impacts.  However, a more dubious predictor is simply “how often do they speak up?”

The babble hypothesis describes a large positive correlation, repeatedly found in studies, between the quantity of time spent speaking and future emergence as a leader.  Across various test designs, and when accounting for other factors - including intelligence, personality, gender, and the endogeneity of speaking time - the effects of the babble hypothesis hold firm [1].  In short, the more you speak up, the more likely you are to be assigned a leadership role.

This particular bias seems so silly that it is almost hard to believe our thinking could fall prey to it.  But similar to ChatGPT, when team members have something to contribute across discussions, we kindly assume some basis of knowledge.  The more topics someone is able to speak to, and the greater their ability to present a thought with clarity, the higher the competence we assume.  More speaking drives a further assumption of competence, which leads to the assignment of leadership.  Unfortunately, those assumptions are just not necessarily true.

Overconfidence Bias

ChatGPT never caveats its responses.  It provides its answer, flatly stated and without hesitation, even when it is wrong.  Obviously, the technology has no feelings, but it comes across as overly confident regardless of the topic.

This is another all too human bias.  Overconfidence bias describes a set of behaviors where people tend to overestimate their actual performance, place themselves further up on performance distributions relative to others, and ascribe greater accuracy to their own views.  One classic example of overconfidence bias is the finding by Ola Svenson that 93% of American drivers rate themselves as better than the average driver [2].  Similarly, studies in which respondents were asked their confidence on general knowledge questions found that when people were 100% certain of their answer, they were actually wrong 20% of the time [3].

This bias occurs at all levels in an organization, but it is particularly problematic when displayed by leaders.  The least damaging version is seen in the planning fallacy, where a project is forecast to take less time and require fewer resources than it actually does.  On the other end of the range, we can see the after effects of overconfidence in the tech sector as CEOs lay off tens of thousands of employees after being wrong in their conviction that the COVID-19 pandemic was accelerating and cementing the shift towards a more digitally-centric world.  In nearly every case, overconfidence leads decision making to become centralized among a few key owners who overestimate their ability to make the right call and underestimate the worth of having their team contribute to the discussion.

Extraversion

Without a doubt, extraverts can make effective leaders, and their leadership style pairs well with their inherent personality traits.  Extraverted leaders tend to be more comfortable making spot decisions and often think out loud, which can give others insight into their thought process.  Extraverted leaders also appear more confident due to their outgoing nature, while introverted leaders demonstrate confidence in other ways [4].

Introverts also make great leaders but tend to use different tactics to achieve successful outcomes, such as better active listening and more deliberate decision making processes.  The problem is that in Western, individualistic societies, the impact of the overconfidence and babble biases is often compounded by out-of-date leadership frameworks that promote the importance of extraversion in effective leadership.  Because extraverted leaders have a higher tendency to think out loud, their behavior receives outsized rewards relative to its impact as a result of the babble hypothesis.  Similarly, because extraverted leaders tend to be more comfortable making snap decisions, their teams are at even greater risk of missing effective contributors at key moments as a result of the overconfidence bias.  Leadership capability is fairly evenly distributed across extraverted and introverted personality types, meaning that organizations miss out on its benefits when authority and promotions are granted in a skewed framework.

How Can We Be Better?

Despite its technological brilliance, ChatGPT fairly quickly taught its testers to be skeptical.  User feedback has highlighted the benefit of using Google search to fact-check responses, or of asking questions like “what other options are there?” in reply to responses that seem incomplete.  These verification practices seem to have become a nearly second-nature response to what we read, as misinformation - or at least the accessibility of it - has grown in recent years [5].

This is a healthy response and highlights what Daniel Kahneman identified as System 2 thinking - cognitive processes that are slower, more deliberate, and logical.  System 2 serves as a necessary counter-balance to the fast reacting System 1 thinking that relies on biases and heuristics.  Our brains rely on both, and we have the power to decide how much we lean on each system.  As leaders, it is our responsibility to minimize the negative impacts of bias on our teams.  The following tactics can help engage our System 2 thinking and lead our teams to make better decisions.

Invite Others to the Conversation

One of the best things we can do to fight the babble hypothesis is re-balance the voices in the room.  When quantity of speaking is correlated with leader emergence, we can shift the distribution of speaking time back to a more even place.  As we do this, we increase the importance of the quality of each contribution by diminishing the difference in quantity between the most and least frequent speakers.

Many people struggle to create this space for themselves.  As a presenter or leader of a discussion, you have an outsized role to play in the value of a session.  In addition to reacting to the content being presented, you should also react to the room and how everyone is engaging with the topic.  As the presenter, make sure that your flow has pauses that give the audience a chance to process and react.  A good practice is to pause for 7 seconds, which feels longer than it actually is.  You can go further in creating that opportunity for others to speak up by asking for questions on your thought process.  By posing your question 30 seconds or more in advance, you can also increase the probability of getting quality feedback.  For example, start a new slide by stating “The question we are going to need to answer is whether our current go-to-market strategy actually reaches our target audience.  Let me outline our approach and I will then want feedback specific to that question.”  As you go through the slide content, this framing creates an anchor point for the audience.

Bringing myself into conversations is an area where I have struggled in the past.  I still recall a meeting as a senior analyst where I watched the senior leadership team go back-and-forth on content I had prepared.  I had reached similar conclusions and felt I could articulate them more precisely because of my preparation, but I never found a moment to step in and say my piece.  Afterwards, when discussing how the meeting went with a senior mentor, I mentioned this struggle.  She asked if, in the future, I would want her to speak up and invite me into the conversation when it seemed like I had something to say.  This was amazing mentorship, and I watched and learned from how she found ways to invite me to speak up.  I now pay that forward for others today.  Bonus points if you do this for others and have given them fair warning that you will be looking to invite them into the conversation in an upcoming meeting.

Trust, But Verify

The risk of overconfidence bias is that we never see it in our decisions and evaluations until it is too late.  There are a few heuristics we can use to counter it.  Time-and-a-half planning is one: add 50% to any forecast timeline to get closer to the likely eventual reality.  However, fighting System 1 thinking with more System 1 thinking can only be so effective.  It is better to find ways to engage System 2.
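The time-and-a-half heuristic is simple enough to sketch in a few lines of code.  The task names and estimates below are purely illustrative:

```python
# Illustrative sketch of time-and-a-half planning: pad each
# forecast by 50% to counter the planning fallacy.

def time_and_a_half(estimated_days: float) -> float:
    """Return the original estimate plus 50%."""
    return estimated_days * 1.5

# Hypothetical task estimates, in days.
estimates = {"design": 4, "build": 10, "test": 6}

padded = {task: time_and_a_half(days) for task, days in estimates.items()}
print(padded)                # {'design': 6.0, 'build': 15.0, 'test': 9.0}
print(sum(padded.values()))  # 30.0
```

Twenty days of forecast work becomes thirty - an uncomfortable number to present, which is exactly the point of the heuristic.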

Overconfidence is a bias of perception, so one of the most effective tools against it is to bring data to the core of any decision process.  Beyond providing useful insights, data brings a dose of realism to our biased perceptions.  For example, if your business general manager is forecasting conversion rate growth of 5x for next year on a product, you could use data from historical product launches to show that previous annual growth rates were only 1.5x to 2x.  Data helps ground perceptions that otherwise risk running off course by verifying what a realistic future may look like.
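That kind of base-rate check is easy to make concrete.  A minimal sketch, using hypothetical numbers that mirror the example above:

```python
# Compare a forecast against the base rate from past launches.
# All numbers here are hypothetical, for illustration only.

forecast_growth = 5.0                     # GM's forecast: 5x conversion growth
historical_growth = [1.5, 1.8, 2.0, 1.6]  # growth from past product launches

base_rate = sum(historical_growth) / len(historical_growth)
print(round(base_rate, 3))                    # 1.725
print(round(forecast_growth / base_rate, 1))  # 2.9 - forecast is ~3x the base rate
```

A forecast sitting nearly three times above the historical base rate is not automatically wrong, but it is a perception that now has to argue with the data.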

Still, in some cases data may not be available.  The problem may be new, or the necessary measurement tools may not exist.  It can be tempting to trust your instincts without hesitation when there is nothing else to evaluate against.  Here, it can make sense to seek out and involve at least one other subject matter expert in the decision process.  By looping in a new perspective, you are highlighting for yourself and others that you do not have perfect information and could be at risk of being too confident in your assumptions.  This encourages people to speak up with context they may have to share.  Further, outlining the solution requires you to work out your logic, helping to fire up System 2 thinking and surfacing potential miscalculations.  Bonus points if you can leverage a subject matter expert on your own team.  Just make sure you have done the needed culture building so that they are comfortable potentially speaking truth to power.

Test Decision Processes

Different leaders have different personality types, and different personality types perform best in different operating models.  Testing a variety of decision processes can help you to find what works best for your specific team.  In one case, try a team debate.  In another, use a voting system.  In another, replace a presentation with a pre-read write-up to set context.  Your mileage will vary.

A post-mortem is a common tool that teams use to review a project for ways to improve.  In most cases, teams reserve this review for when something did not work.  That is a missed opportunity.  Instead, use a post-mortem for every decision and project.  Rather than pointing the focus only at what delivered the outcome, look at how the decisions were made and whether changes could have improved engagement and decision quality.

I find that the best decision processes are those which cater to multiple personality types and decision making styles.  For example, if you have a mix of introverts and extraverts, try a flow that splits the process into two parts.  First, hold a ‘context meeting’ where information is shared, options are outlined, and their merits are debated.  This is highly effective for extraverts and spot decision makers.  Second, hold a follow-up ‘decision meeting’ where the team reaches consensus on the best option.  This is highly effective for introverts and deliberate decision makers who process options more slowly.  Bonus points if you put a gap of 24 hours or so between the meetings.

In Conclusion

For better or worse, technological change happens much faster than human evolution.  ChatGPT has its flaws, but it has advanced dramatically over the language processing tools of only a few years ago.  While it is highly confident in its highly wrong answers today, we can expect that it will learn to - or be paired with a model that can - fact-check responses and correct for accuracy.  Critical thinking will remain as important as ever, but the frequency with which you will need to harness that skepticism will likely diminish.

For our part, our cognitive software bugs will likely persist long into the future.  The biases built from the perception of confidence will be ever present in our lives as individuals, teammates, and leaders.  But that does not mean these biases need to have the final say.  Stay aware of the environment you and your team operate in and its potential sources of bias.  Stay diligent about what you can do to create a more equitable operating model for everyone.

And, of course, stay humble.

References

  1. https://www.sciencedirect.com/science/article/pii/S1048984320300369
  2. https://www.sciencedirect.com/science/article/abs/pii/0001691881900056?via%3Dihub
  3. https://pubmed.ncbi.nlm.nih.gov/13681411/
  4. https://www.psychologytoday.com/us/blog/lifetime-connections/202210/effective-leaders-arent-just-the-extraverts-among-us
  5. https://www.nytimes.com/2020/02/20/education/learning/news-literacy-2016-election.html

Struggling with a personal development challenge?  Looking for management insights on a certain topic?
Share your work-related questions and dilemmas with us for upcoming blog post consideration.