This article is part of our Hugging Face series.
Academia is all about downloading the PDFs you never read.
That joke never gets old because it's true, and there's a certain tragedy to it. Given the ever-growing volume of text data (even the blog post I'm writing will only add to the snowball), it isn't feasible to sift through the vast number of documents to gather the information we need. There must be a way to process this information.
Question-answering systems are really helpful in this domain, as they answer queries on the basis of contextual information. So, in this article, I'm going to show you how to use Hugging Face's question-answering pipelines.
Question answering pipelines with Hugging Face
We can load the default question-answering pipeline from Hugging Face simply as:

from transformers import pipeline

qa = pipeline("question-answering")
Let’s try it with some text. Since question answering is a cognitive task (no wonder university entry tests often include this part), it's challenging for an AI model, and it's something that always excites me.
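Here's a minimal sketch of the call; the context and question below are my own stand-ins (the post's original passage isn't shown), chosen so that the answer span "146 days" appears in the text:

```python
from transformers import pipeline

qa = pipeline("question-answering")

# Hypothetical context and question, not the post's original text
context = "After 146 days at sea, the expedition finally reached the southern coast."
question = "How long was the expedition at sea?"

result = qa(question=question, context=context)
# The result is a dict with the answer span, its character offsets
# in the context, and a confidence score
print(result)
```

The `start` and `end` fields are character offsets into the context, so `context[start:end]` always reproduces the answer string.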
{'score': 0.9731054902076721, 'start': 6, 'end': 14, 'answer': '146 days'}
Let's try a slightly longer text and a few less direct questions:
We need to push the model to its limits, so let's give it a convoluted context and some cross-questions.
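As a sketch, a longer made-up context (loosely borrowing from The Count of Monte Cristo, not the post's actual excerpt) and a couple of indirect questions might look like this:

```python
from transformers import pipeline

qa = pipeline("question-answering")

# A longer, more convoluted context (my own example, not the post's original text)
context = (
    "Edmond Dantès was arrested on the day of his betrothal feast and taken to the "
    "Château d'If, where he spent fourteen years in prison. His jailers believed him "
    "to be a dangerous Bonapartist, although the letter that condemned him had been "
    "written by men who envied his success. Years later, having escaped and found "
    "the treasure of Monte Cristo, he returned to Paris under a new name."
)

questions = [
    "How long was Dantès in prison?",
    "Why did his jailers think he was dangerous?",
]
answers = {q: qa(question=q, context=context)["answer"] for q in questions}
for q, a in answers.items():
    print(q, "->", a)
```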
As we can see, it tries hard to retrieve the required information. Although it can't match human precision, it surfaces enough of the right information to support further reasoning.
Other models
All models are wrong, but some are useful.
- George E. P. Box
There isn’t any model that suits every type of problem, so we have to get familiar with as many models as we can. Hugging Face's default model for QA tasks, RoBERTa (you'll keep finding these BERT-derived acronyms), is good enough, but there are a number of other models too - some of them trained on languages other than English. For example, XLM-RoBERTa is a model trained on 100+ languages. Naturally, I'm curious to try it.
Since I am not a polyglot, I'm trying to be smart here and using a translation model to get some foreign phrases. Oh, and by the way, we're not going to use the first model anymore, so let's free some RAM by deleting it.
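Assuming the earlier pipeline was bound to a name like `qa` (the name is my assumption), dropping the reference and collecting garbage frees the memory:

```python
import gc

# Drop the reference to the old pipeline so its weights can be reclaimed
# (the variable name `qa` is an assumption from the earlier sketch)
qa = None
gc.collect()
```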
Since the context is in French, I'll use the English-to-French model (more details on the translation model can be found in the respective post) to translate my queries into French.
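A sketch of the translation step, assuming the Helsinki-NLP English-to-French checkpoint (the post doesn't name the exact model) and a couple of hypothetical queries:

```python
from transformers import pipeline

# Assumed checkpoint: the Helsinki-NLP English-to-French model
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

# Hypothetical English queries to ask against the French context
queries_en = ["What did Villefort do with the letter?", "Who kept the gold?"]
queries_fr = [t["translation_text"] for t in translator(queries_en)]
print(queries_fr)
```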
Awesome. Let’s start trying the XLM model.
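A sketch of loading an XLM-RoBERTa QA checkpoint - the model name below is my assumption, and the short French context is my own toy example, not the post's excerpt:

```python
from transformers import pipeline

# Assumed checkpoint: an XLM-RoBERTa model fine-tuned on SQuAD2-style data
xlm_qa = pipeline("question-answering", model="deepset/xlm-roberta-base-squad2")

# My own toy French context (roughly: "Villefort took the letter, read it
# carefully, then threw it into the fire. He kept the gold for himself.")
context_fr = (
    "Villefort prit la lettre, la lut attentivement, puis la jeta au feu. "
    "Il garda l'or pour lui-même."
)
# Question: "What did Villefort do with the letter?"
result = xlm_qa(question="Qu'a fait Villefort de la lettre ?", context=context_fr)
print(result["answer"])
```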
That’s correct. Let’s try another query:
That’s also correct. Villefort, in this scene, is like a man who triumphed over an inner struggle. There are countless examples like this.
We've checked it for just a single language, but there are plenty of other languages too. Please feel free to play around with them.
Now, we'll leave Villefort and his gold alone and switch to another related but even more interesting task: conversation. But before that, don’t forget to clean up the memory.
Conversational models
A similar but even more challenging task is designing dynamic conversational models, which keep adapting to the context as the conversation proceeds.
I still remember my first-ever computer back in 2002. It wasn’t computer games that appealed to me (the console was there for that purpose) but programming silly stuff (like a circle that changes its color and radius, thanks to the for loop) and especially the chatbot, ELIZA.
But this chatbot had its limitations and very soon began to lose the context. Twenty years later, in the era of ChatGPT, LLaMA, and other chatbots, it's hard to believe how far these conversational models and voice assistants have come.
The Hugging Face pipeline
There are a number of conversational models available on Hugging Face. We can call the default model (BlenderBot by Facebook) with pipeline("conversational"). The pipeline takes the dialogue as text wrapped in a Conversation object, which we have to import first. Finally, we can kick off the conversation:
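The steps above can be sketched as follows (note: the conversational pipeline and the Conversation class live in older transformers releases, before v4.42, so treat this as a sketch for those versions):

```python
from transformers import Conversation, pipeline

# The default conversational checkpoint is Facebook's BlenderBot
chatbot = pipeline("conversational")

# Wrap the opening line in a Conversation object, then pass it to the pipeline
conversation = Conversation("Hello, do you read much philosophy?")
conversation = chatbot(conversation)

# The model's reply is appended to the conversation's generated responses
print(conversation.generated_responses[-1])
```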
Let's keep the ball rolling.
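Continuing the dialogue means adding the next user turn to the same Conversation object; here is a sketch with a hypothetical follow-up question (again on the older Conversation API):

```python
from transformers import Conversation, pipeline

chatbot = pipeline("conversational")
conversation = chatbot(Conversation("Hello!"))

# Add a follow-up user turn (hypothetical wording) and run the pipeline again
conversation.add_user_input("Is a hot dog a sandwich?")
conversation = chatbot(conversation)
print(conversation.generated_responses[-1])
```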
The chatbot fell into the trap:
With a decline in quality philosophers since the early 20th century, we're in dire need of a few. And apparently, we've found one.
Let's start another conversation and, to ensure it begins fresh, use another model:
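A sketch of starting fresh with a different checkpoint - DialoGPT here is my own pick, since the post doesn't name the second model and any conversational checkpoint from the Hub would do:

```python
from transformers import Conversation, pipeline

# Assumed alternative checkpoint - the post doesn't name the second model
chatbot = pipeline("conversational", model="microsoft/DialoGPT-medium")

conversation = chatbot(Conversation("What is your favourite book?"))
print(conversation.generated_responses[-1])
```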
Oops! But to be fair, before hastily passing judgment, we need to realize that conversational systems are challenging, and apart from ChatGPT, the majority don’t have enough training or reinforcement feedback data.
Let’s put the chatbot out of its misery and round up the article with another interesting (and quite relevant) pipeline. These examples so far highlight a couple of facts:
- Chatbots are still challenging
- This is an open area inviting us to contribute
Document question answering
The academic anecdote I shared at the beginning may have made you wonder whether it was just a joke or whether we have ways to address the problem. Luckily, document question answering is available, thanks to the exceptional performance of Vision Transformers and Hugging Face's ever-growing set of tasks.
Since we need an image instead of text, we'll import the respective library (and pipeline).
Having imported the model and image, we can now fetch some information for the respective queries on the docImage shown above. For example:
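A sketch of the whole flow, with a small synthetic image standing in for docImage so the example is self-contained (the default checkpoint and its OCR dependency are noted in the comments):

```python
from PIL import Image, ImageDraw
from transformers import pipeline

# The default checkpoint (impira/layoutlm-document-qa) relies on the
# pytesseract OCR backend, which must be installed separately
doc_qa = pipeline("document-question-answering")

# Synthetic stand-in for the post's docImage
docImage = Image.new("RGB", (600, 120), "white")
ImageDraw.Draw(docImage).text((20, 50), "Invoice number: 12345", fill="black")

# The pipeline OCRs the image and answers the query from the extracted text
result = doc_qa(image=docImage, question="What is the invoice number?")
print(result)
```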
Accurate. Let’s try another one.
As we saw earlier for the QA models, even if it’s not entirely accurate, it gives good enough pointers to draw further information from. Since the image I used had bigger fonts and was easier to read, let me try a more practical example by pulling an excerpt from a research paper.

Nice to see it works across different fonts (and especially on Comic Sans).
It’s neither a failure of the OCR nor of the model's QA capabilities. Since the sentence “34th Conference on…..” was just a statement/footer, it was hard for the model to pick up the context there - as we can confirm by making the query a bit more contextual.
What I can infer from this is that the OCR capabilities of these models are good enough, but their semantic inference and reasoning power are still an active area of research.
And with that, I'd like to conclude. It was quite fun to explore these useful models, both their power and their limitations. We covered traditional QA models, followed by conversational ones, and in the end we touched a bit on the visual QA models too.
See you in the final installment of this Hugging Face series: computer vision and image classification.