

Prabhanshu Mishra
AI Consultant
Prabhanshu Mishra is a senior leader in the tech industry with extensive experience in AI systems, architecture, and strategic innovation. He brings a practical lens to how emerging technologies are reshaping enterprise and consumer applications.

Anmol Satija
Host
Anmol Satija is driven by curiosity and a deep interest in how tech impacts our lives. As the host of The Unthinkable Tech Podcast, she breaks down big tech trends with industry leaders in a way that’s thoughtful, clear, and engaging.
Episode Overview
In this episode of The Unthinkable Tech Podcast, we explore the rise of Small Language Models (SLMs), why they matter, how they differ from Large Language Models (LLMs), and the industries they’re revolutionizing. From cost efficiency to data privacy and edge deployment, SLMs are emerging as a transformative force in AI. Tune in as we demystify this shift with deep insights from a seasoned tech expert.
Chapters Covered:
- Understanding large vs. small language models
- The evolution of language models: From N-grams to transformers
- Transformers explained: The core of modern language models
- Algorithmic bias in AI and language models
- Industry Use Cases: BFSI, Healthcare, Robotics, and More
- Key challenges in developing small language models
- Build vs. Buy: Choosing the right SLM strategy
- The future of small language models and their potential
Transcript
Anmol: Hi everyone, welcome to a new episode of the Unthinkable Tech Podcast. I am your host, Anmol Satija, and as always, I am thrilled to have you join us on this exciting journey where we unravel technologies that are shaping our future.
Today’s episode is particularly exciting as we are going to cover another buzzword in the tech industry, one that is as intriguing as it is important. I am talking about small language models. You know, it was only recently that LLMs, that is, large language models, captured headlines with their impressive capabilities. However, their sheer size came with heavy cost, storage, and resource requirements, which limited their accessibility and applicability.
You know, there is a saying: “Less is more.” And that minimalistic approach has made its way into the tech industry. Small language models are an effective, compact alternative to LLMs that can democratize AI for diverse needs, promising efficiency, accessibility, and customization. Even big players like Apple and Meta see them as an integral part of the future ahead.
So in today’s episode, we are going to unpack the nuts and bolts of small language models, what they are, how they work, and why they might just be the next big thing in tech. So whether you are a tech leader, tech enthusiast, or just someone who’s curious about the future of AI, stick around; it’s going to be an enlightening conversation.
And now, to get started on this conversation, I would like to invite Prabhanshu Mishra, who is a senior leader in the tech space and has a fair understanding of AI. Welcome, Prabhanshu. How are you doing today?
What are Small Language Models and why are they gaining attention?
Prabhanshu: All right. Small language models and large language models are topics of real debate right now, and from the beginning the distinction has come down to parameters: even today, what makes a language model “small” is its parameter count. But before diving deep into small language models, let’s try to understand what language models are in the first place.

In artificial intelligence and machine learning, language models are designed to understand, generate, and manipulate human language. These models range from small ones, like small language models, to complex architectures like the GPTs. Now, let’s look at the history behind the evolution of these models. Generally, I divide this history into three stages.
Understanding the evolution of language models in AI
The first stage of this evolution is the statistical language model. Early artificial intelligence and machine learning were built on statistical models: models such as N-gram models and hidden Markov models were based on word probabilities, and they tried to predict the probability of the next word.
However, the major problem with these models was their inability to capture longer dependencies or the context of a whole sentence. Consider a scenario where I have a huge paragraph: these models probably won’t be able to predict the next word accurately, because they only see a tiny local window of context.
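To make that limitation concrete, here is a minimal bigram model in Python. The toy corpus and function names are purely illustrative; the point is that the model conditions on a single preceding word, so anything said earlier in a paragraph is invisible to it:

```python
# A minimal bigram language model: predict the next word from raw counts.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each preceding word.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def predict_next(word):
    """Return the most probable next word, given only one word of context."""
    followers = bigram_counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("the"))  # 'cat' -- with one word of context,
                            # long-range dependencies are simply invisible
```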
The next stage is where machine learning algorithms come into the picture. When I say machine learning algorithms here, I’m really talking about neural networks, or specifically the recurrent neural network. Because a recurrent neural network takes account of the previous inputs in a sequence, these networks are better at understanding sequences and can remember information for a longer period of time.

One classic use case for these networks is time-series prediction, such as stock analysis. Stocks are a time series, right? So if I want to predict the price at the next time step, these neural networks are a natural fit. And it is precisely this property that led people to apply them to language modeling itself.
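As a minimal sketch of that idea, here is a next-step predictor built on a recurrent network, assuming PyTorch; the layer sizes and the toy “price” sequence are illustrative only:

```python
# A minimal next-step sequence predictor using a recurrent neural network.
import torch
import torch.nn as nn

class NextStepRNN(nn.Module):
    def __init__(self, hidden_size=16):
        super().__init__()
        # The RNN's hidden state carries information forward across time steps.
        self.rnn = nn.RNN(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.rnn(x)           # out: one hidden vector per time step
        return self.head(out[:, -1])   # predict the value at the next step

model = NextStepRNN()
prices = torch.randn(1, 30, 1)         # a toy 30-step "price" sequence
next_price = model(prices)             # shape (1, 1): forecast for step 31
```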
And then there are what we call the transformer models. The transformer architecture is based on the self-attention mechanism, and these models perform remarkably well when it comes to predicting the next word, and in language modeling overall.
Breaking down transformers: The heart of modern NLP
Anmol: That actually sounds interesting, Prabhanshu. Can you help our audience understand the transformer model? Just a brief explanation.
Prabhanshu: Yeah, sure, Anmol. Transformers are a type of model widely used in processing language, images, and other types of data. The key feature that makes transformers effective is their ability to attend to all parts of the input at once rather than one step at a time. At the heart of the transformer is the self-attention mechanism, introduced in a 2017 paper called “Attention Is All You Need.”

To take an example, imagine you are in a room with a group of people discussing a complex topic. Instead of listening to the whole group at once, you focus on one particular person, or even one particular word that person is saying. That is essentially how self-attention works: it helps the model decide which parts of the input data are most important, enabling it to understand and generate information more effectively.
Now, let me briefly walk through the steps of a transformer without going into too much detail. The first step is input representation: the input data, say text, is first converted into a numerical form called embeddings, which are vectors. The second step is the self-attention mechanism, which allows every part of the input to interact with every other part, and that is very, very important. It determines how much focus each part of the data should give to the other parts.

This is done with matrix calculations: query, key, and value projections. By comparing each query against every key, the model calculates the self-attention scores.

Then there is positional encoding. Since transformers don’t process data in sequence like older models, they need another way to understand the order of the words in a sentence. That is provided by positional encodings, which are added to the input embeddings to supply that ordering information.

Next, the layers of the transformer. As I was mentioning, the model is composed of multiple layers, and each layer contains two sub-layers: the self-attention layer we already talked about, and a feed-forward neural network layer that processes the output of the self-attention layer.

Each of these sub-layers is followed by normalization. After the data passes through several layers of attention and feed-forward networks, the transformer produces its output. These outputs are vectors used to predict the next word. In simple words, all of this calculation serves to predict the next word while taking the entire input into consideration, which is the very important part.
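To make the query-key-value mechanics concrete, here is a minimal sketch of scaled dot-product self-attention in NumPy. The dimensions, random weight matrices, and toy inputs are illustrative assumptions, not any real model’s values:

```python
# Scaled dot-product self-attention -- the core computation described above.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # query, key, value projections
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # compare every query with every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V            # each position: a weighted mix of all positions

rng = np.random.default_rng(0)
d = 8                                 # toy embedding dimension
X = rng.normal(size=(5, d))           # 5 token embeddings (positions already encoded)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (5, 8): one output vector per token
```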
LLMs vs. SLMs: A game of Parameters
Now, since we have been talking about large language models and small language models, let me get to the difference you asked about earlier; I’ve spent a lot of time on the basics, but the single most important differentiating factor is the parameters.

So how do you define the parameters of a model? Basically, they are the variables the model uses to make predictions. In simple words, take an equation like y = mx + c: m and c are nothing but the parameters. Parameters represent the concrete part of the model that changes or adapts based on the data it has been trained on.
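As a quick illustration, here is what parameter counting looks like in code, assuming PyTorch; the y = mx + c example is literally a one-in, one-out linear layer, and even a toy two-layer network already has tens of thousands of parameters:

```python
# Counting parameters, from y = m*x + c up to a small neural network.
import torch.nn as nn

line = nn.Linear(1, 1)   # exactly y = m*x + c: one weight (m), one bias (c)
tiny_net = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 128),
)

def count_params(model):
    return sum(p.numel() for p in model.parameters())

print(count_params(line))      # 2 -- just m and c
print(count_params(tiny_net))  # 65,920 -- tens of thousands in a toy net;
                               # LLMs scale this into the trillions
```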
Now, to compare the parameter counts that large and small language models use: consider GPT-4, which we are all using today; it is reported to use on the order of 1.8 trillion parameters. Gemini Ultra, from Google, is estimated at around 1.5 trillion. As a counterpart on the SLM side, consider Phi-2, which uses 2.7 billion parameters. So consider the drop from 1.8 trillion down to 2.7 billion.
Anmol: That’s a huge difference.
Prabhanshu: And specifically, Llama from Meta comes in a 7-billion-parameter variant. Now, you might be wondering what that reduction in parameters actually changes in the life of a business user.

The first important change this decrease in parameters brings to the table is in the computational resources required for training as well as for generating the desired output, which we call inference.

Since fewer computational resources are required, two benefits follow. One is on the cost front, and the second is deployment on CPUs as well. Consider edge devices: suppose I want to deploy a small model on a smartwatch, or use a model in a remote area where cloud or internet connectivity is poor. These models are going to be very useful there.
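As a rough sketch of what on-device deployment can look like, assuming the Hugging Face transformers library (and a recent version that supports Phi-2 out of the box), the whole inference loop can run on a CPU with no server round-trips:

```python
# A sketch of running a small language model fully on-device.
# Phi-2 is used purely as an illustrative SLM -- any similarly sized model works.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/phi-2",  # 2.7B parameters, small enough for CPU inference
    device=-1,                # -1 selects the CPU: no GPU, no cloud endpoint
)

# After the one-time model download, generation runs entirely locally,
# which is the property that matters for edge and low-connectivity use.
out = generator("The main benefit of on-device AI is", max_new_tokens=40)
print(out[0]["generated_text"])
```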
Now, we will definitely talk a lot more about these models. But one very interesting topic with all such large language models is the risk of algorithmic bias. You might have heard this term; the bias is introduced via datasets that are not sufficiently diverse. It’s a very interesting thing to understand, because the output these models generate is sometimes not what you expect.
The role of algorithmic bias in language models
Anmol: Yeah, this is actually a common concern. So maybe you can help us understand what algorithmic biases are and what their impact is.
Prabhanshu: All right, Anmol. Let’s dive into algorithmic bias briefly. Bias can manifest in machine learning models, like those used in AI, and it is often a result of the data used to train those models. In very simple terms, algorithmic bias occurs when the model outputs something unintended or something that is simply not true.
Now, to explain algorithmic biases, first, consider that the data on which the model is being trained is not representative of the real-world distribution or is skewed toward certain groups.
Let’s consider an example. A company built a facial recognition system to identify customers in a store and offer personalized deals. Now, if the training data predominantly consists of images of people from a certain ethnic group, the system might be less accurate for people who are not represented in the dataset.
This is the problem. That is why, whenever we start training these models, we always take care of the input data. If the input data is representative and properly normalized, the system is far more likely to give the desired output.
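A simple way to catch this in practice, sketched below with purely hypothetical numbers, is to measure accuracy separately per group instead of reporting one overall figure:

```python
# A minimal per-group accuracy check for the facial-recognition scenario above.
# The (group, correct) pairs are hypothetical, not real evaluation results.
from collections import defaultdict

results = [("group_a", True), ("group_a", True), ("group_a", True),
           ("group_b", True), ("group_b", False), ("group_b", False)]

hits, totals = defaultdict(int), defaultdict(int)
for group, correct in results:
    totals[group] += 1
    hits[group] += correct

for group in totals:
    print(group, f"accuracy = {hits[group] / totals[group]:.0%}")
# group_a: 100%, group_b: 33% -- yet the single overall number (67%)
# would hide the disparity completely.
```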
A second possible fault is on the data labeling side. The process of labeling the data can introduce human biases or errors, which are then learned by the model. That can also be a possible problem.
The third, and most prominent, problem is the feedback loop. Continuous learning from user interactions means we keep giving feedback to the model: consider a scenario where we repeatedly tell the model, “This is not supposed to look like this,” or keep reinforcing that a particular output should be generated in a specific tone.

After a lot of such input or reinforcement, we sometimes don’t even recognize how the model has been transformed, and it starts producing output that probably isn’t what we wanted. That, I would say, is what algorithmic bias is.
Definitely, there’s a lot more that goes into fine-tuning those models and generating the desired output. We take care of those algorithmic biases by using various techniques and strategies. Probably we’ll discuss this in our later podcast as well.
Use cases: How SLMs are transforming key industries
Anmol: Right. That was quite an explanation, Prabhanshu. Thank you for that. Moving further, I want to know how these SLMs are impacting certain industries. Maybe you can explain with some examples.
Prabhanshu: Absolutely, Anmol. The ways developers are utilizing Small Language Models (SLMs) frankly blow my mind. Developers are using SLMs to perform tasks that were once unimaginable.

For instance, some developers are using SLMs to play video games that demand a detailed level of human-like cognitive ability. And moving to specific industries, let’s start with the BFSI domain (banking, financial services, and insurance), where on-demand, on-device processing is required for mobile banking applications.
The challenges with Large Language Models (LLMs) include significant processing power, bandwidth, and constant connection with cloud servers. But when it comes to Small Language Models, they are lightweight, can be deployed directly on mobile devices, and allow for real-time processing without needing constant server communication.
This improves both the user experience and data security, as sensitive financial information does not need to be transmitted frequently to generate inferences. For example, such models can offer advice on spending or answer transactional queries.
Anmol: Okay. It seems like SLMs can act as a catalyst for customers in areas with limited connectivity or those concerned with data privacy.
Prabhanshu: Yeah, absolutely. And I’m even more excited about their application in healthcare. Personal health assistants are booming right now. Most health tech companies are integrating large language models into their products for chatbots, personalized recommendations, and treatment plans.

However, implementing such chatbots or personal health assistants in remote areas can be very difficult because of the need for cloud communication. These problems can be solved with Small Language Models, because they can be integrated into local systems or devices with limited connectivity and still perform tasks like symptom checking or basic health monitoring effectively.
In one pilot study, a health device using a Small Language Model handled daily patient queries and provided basic diagnostic support with over 90% accuracy using only the data available on the device, without any internet access.
Anmol: Okay. The fact that these models can operate independently of a high-speed internet connection means that personal health assistants can become accessible to those who might otherwise be left out due to infrastructure challenges.
Prabhanshu: Correct. Absolutely. And it’s not only about models exposed directly to consumers; consider the example of robotics. In robotics, operating robots in isolated or extreme environments, such as underwater conditions or very confined spaces, often runs into trouble with cloud-based LLMs, because they depend on reliable high-speed internet connectivity that simply isn’t there.

The advantage of an SLM in robotics is that it can be embedded directly into the robotic system, allowing autonomous operation even in environments with limited or no connectivity. That lets robots perform complex tasks in confined environments as well.
Anmol: Yeah, right. It means that organizations using robotics in remote areas, like you mentioned, deep-sea exploration or space missions, won’t have to worry about the connectivity issues that typically plague cloud-dependent technologies.
Prabhanshu: Correct. Absolutely.
Challenges in building and deploying Small Language Models
Anmol: Now that we have talked a lot about what SLMs are and where they can be used, there must be some hurdles associated with them, right? Could you share some insights into the main challenges that companies face during the development and deployment of SLMs?
Prabhanshu: Yeah, absolutely. There are many, but generally I divide those challenges into three major categories. The first is technical challenges related to development. The second is ethical challenges. And the third relates to business challenges.

If I talk about technical challenges, one that is very crucial is data quality and availability. SLMs, as we discussed, require high-quality data, so poor data quality, such as biased or noisy data, can lead to models that perform sub-optimally.

The other side of that is availability: access to diverse and representative datasets is crucial, because without it the model’s behavior will be skewed.

The second technical challenge is model generalization. We might have trained our small language model on one set of data, and it may not perform well on generalized tasks, meaning queries about data it hasn’t seen.

The third technical challenge is resource constraints. As we all know, deploying an SLM in a resource-constrained environment, such as a mobile device or an embedded computing platform, requires an efficient model that uses little computational power and memory, which is very difficult to attain. There is also a growing demand for energy-efficient models that reduce the overall carbon footprint.
Those are the technical challenges that come to mind. Now, the second category is the ethical challenges, which we already discussed at some length: bias and fairness.

An SLM can inherit, or even amplify, the biases present in its training data. Consider a scenario where I’ve trained my SLM to answer customer queries using data from customers of one particular community. It will give answers biased toward that community. That, I would say, is the ethical challenge.

The third category is business challenges. One that is crucial for any business is scalability: balancing the cost of scaling small language models while maintaining performance and speed directly impacts profitability and user satisfaction.

So we need to figure out: if I want to scale a particular small language model, what ROI will I get out of it? Any investment decision has to be backed by ROI calculations. That is the business challenge, I would say.
Anmol: Right. Given these obstacles, I’m really interested to know what strategies or approaches you recommend to effectively address these challenges.
Build vs. Buy: How to choose the right SLM strategy
Prabhanshu: I mean, again, there are many. But one of the key factors I generally rely on is data curation and enrichment. As we always say in AI, the better the data, the better the output. A data curation process that ensures the quality and diversity of the datasets used for training can genuinely enhance the performance of any SLM.

Anmol: I would agree with that, Prabhanshu. And you know, there is one more challenge that companies usually face: the dilemma of choosing an off-the-shelf small language model versus investing in custom development. Could you walk us through a framework of the key factors one should weigh when deciding between adapting an existing model and building one from scratch?
Prabhanshu: Absolutely. So, generally, we follow a five-step approach.
Let’s take it step by step. The first step is defining the business and technical requirements, which we generally do in any development project.

The first aspect to consider is the business need: what is the scope and scale of the business problem? We need to define the specific problem the SLM is intended to solve and the scale it will need to reach, say four to five years down the line.
Second is the budget constraint, because that is very, very important. We need to factor in the budget that higher management has allocated for the project.

And third, which is very important, is the timeline. We need to identify whether it’s an urgent need or something that can be rolled out in the near future, say eight months, nine months, or a year down the line. That covers the business need, and it’s the first thing to assess.
The second part is the technical specification. Here we need to identify the performance expectations for the model: set clear performance metrics such as accuracy and response time, so we can evaluate whether an existing model will do the job.

Second is integration complexity, because most of these models need to be integrated with your existing systems, or with a system you will build later. What flexibility does the model provide for integration?

And third is data sensitivity and security. Determining how sensitive the data involved is, along with the associated security requirements and risks, is crucial to this decision. So that is the first step.
The second step is evaluating existing SLM frameworks. The developer community keeps building these models and publishing fine-tuned variants, and you can utilize them; some are open source, some are not. So how should you evaluate them?

The first thing is feature matching, and it is as simple as it sounds: check whether an existing SLM already solves your particular problem. Suppose I want an SLM that can summarize an image. Is there a specific SLM already trained for that task which I can fine-tune with far less effort?
The second is customizability. Many models and frameworks are available, but we need to find out whether a particular framework actually allows customization.

Another important factor is cost and resources: as I mentioned, we need to figure out the development cost and the operational cost associated with a particular model choice.
The third step is weighing the possibility of custom development, since there will certainly be cases where you need to build a model from scratch, or heavily rework an existing one, to get a unique capability no off-the-shelf model provides. How do I go about that? I need to figure out what unique features I will need in that model, perhaps not right now but in the future.

The fourth step, which is a very important part, and one we do very rigorously with many of our clients, is performing the cost-benefit analysis. We split that analysis into two segments. One is quantitative analysis, where we use data-driven models to project the ROI for both options, factoring in development cost, operational cost, and potential revenue increases, and then analyze the break-even point. The second segment, qualitative factor analysis, is equally important, because sometimes the quantitative case doesn’t add up yet the overall solution still creates future value, such as brand value, customer satisfaction, or alignment with a strategic goal.

And the fifth step: considering all these factors, we define a highly customizable decision framework that helps our client decide how to go about building such a model. A toy sketch of the quantitative break-even calculation follows below.
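To give a feel for the quantitative segment, here is a toy break-even calculation in Python. Every figure is an illustrative assumption, not a benchmark or a real client number:

```python
# A toy break-even model for the build-vs-buy decision described above.
# All monetary figures are hypothetical placeholders.
build_cost = 250_000         # one-time custom SLM development cost
monthly_opex_build = 8_000   # hosting + maintenance for the custom model
monthly_cost_buy = 20_000    # licence + usage fees for an off-the-shelf model

monthly_saving = monthly_cost_buy - monthly_opex_build
break_even_months = build_cost / monthly_saving
print(f"Custom build pays for itself after ~{break_even_months:.0f} months")
# ~21 months: if the strategic horizon is shorter than that, buying wins;
# the qualitative factors (brand, strategy) can still tip the decision.
```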
Anmol: Right, that was a really comprehensive breakdown, Prabhanshu, and I think now, hearing this, companies might be able to understand and decide in a better way.
Prabhanshu: Yeah.
The future of Small Language Models in AI
Anmol: Now, as we wrap up our discussion, I would like to know your opinion on the horizon for small language models and where you see them heading in the next few years. And what kind of impact do you think they will have on AI and on our daily lives?
Prabhanshu: Yeah, absolutely, Anmol. As we all know, small language models are gaining a lot of traction as a valuable alternative to their large model counterparts because of their efficiency, scalability, and lower resource demand.
Over the next few years, we can expect these models to significantly impact AI. The first factor I’m thinking about is enhanced accessibility and efficiency for small and medium-sized organizations: because these models require less computational power, they can be operated far more cost-effectively.

So making advanced AI capabilities accessible to small and medium-sized enterprises is one factor. The second, which is very important, is specialization and customization. One of the major advantages we discussed is the ability to customize these models for specific tasks without the intensive resource investment large models require.

Recent advancements in AI have shown that, on specific tasks, small language models can match, and sometimes even beat, their large language model counterparts. So that is another thing I see coming for small language models.
And the third part is definitely sustainability, because these models will help us reduce the carbon footprint, especially compared with the training and deployment of large language models.

Another important thing is industry-specific applications. Since these models can be trained to perform very specific tasks, we can see significant implications for industries such as education, healthcare, and customer service, where SLMs can be integrated into existing systems to provide enhanced functionality without the overhead of large language models. As for future trends, the field’s focus will likely shift toward optimizing these models for even better performance and lower resource use, and this is what we are doing here at Unthinkable.

Techniques like model quantization, which cuts memory requirements, and innovative training approaches will help us optimize and improve these models. I think the future holds a lot of promise for the development and use of small language models, and we keep experimenting and innovating here at Unthinkable Solutions, specifically on small language models and other AI models as well.
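For readers curious what quantization looks like in code, here is a minimal sketch using PyTorch’s post-training dynamic quantization; the toy model and layer sizes are illustrative, and this is just one quantization technique among several:

```python
# Post-training dynamic quantization with PyTorch: Linear-layer weights are
# stored as int8, shrinking memory for CPU inference at little accuracy cost.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 128))

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8  # quantize only the Linear layers
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 128]): same interface, ~4x smaller weights
```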
Anmol: Right, great insights, Prabhanshu. Thank you for that. And that, dear listeners, brings us to the end of a truly engaging episode. Thank you, Prabhanshu, for joining us today and shedding light on such a new, unexplored, and captivating topic. It’s conversations like these that remind us how technology continues to shape our future in the most unexpected ways.
Prabhanshu: Yeah, thank you so much, and thanks for having me here on the podcast.
Anmol: Now, I want to turn the mic over to you, our amazing audience. What are your thoughts on SLMs? Are you as intrigued by the potential as we are? Maybe you’ve got a perspective we haven’t considered, or you’re skeptical about something we discussed. Whatever it is, we’re all ears, and we would love to hear from you through the comment section below. And if you enjoyed today’s episode, don’t forget to follow the Unthinkable Tech Podcast. We’ve got a lot more trending tech topics lined up, and trust me, you won’t want to miss what’s next. Catch you on the next episode; till then, take care and keep listening.