Assessor bot
- Electron
- LangChain
- Ollama
- shadcn/ui
Large Language Models (LLMs) in education
Ever since the release of OpenAI's ChatGPT, there has been a lot of controversy surrounding the use of Large Language Models (LLMs) in education.
At the Master Digital Design we ask students to reflect on their work based on a set of competencies and indicators. Since the introduction of these LLMs (ChatGPT in particular) we have seen an increase in the use of these models in the documents submitted by students.
From an assessor's perspective, we could spot the generic (and mostly mediocre) generated text that was submitted, and we held an intervention to have an open conversation about the use of these models.
Guidance instead of banning
As we do see the value of generative technologies in the creative field and are sure they are here to stay, we prefer to guide students in the use of these technologies and allow them to develop a critical view, instead of banning them altogether1.
By hosting a full week dedicated to experimenting with the use of AI in the creative field, asking the students to create a prompt-based product that uses LLM prompting techniques (the product must do only one thing, but do it well), we explored and reflected upon the field of generative technologies together.
Portfolio checker
One of the projects which came out of this week was the Portfolio checker by Jaap Hulst, Niloo Zabardast and Elena Mihai.
Using a prompt, their project tried to:
- Get designers to reflect, gain insight into their competencies, and give them direction
- Take pressure off reviewing design work
- Make the feedback loop easier and faster
The concept and design of this project were made by Jaap Hulst, Niloo Zabardast and Elena Mihai.
This article will go into the technical details of turning such a design into a working product.
For the non-technical aspects, I would like to refer you to the students themselves.
Feedback takes about a minute to be generated
The game plan
As this was my first time incorporating a Large Language Model (LLM) into a product, I only had a rough idea of how to approach this.
The idea: have the students upload their documents, use Retrieval-Augmented Generation (RAG) to find relevant information, and combine the relevant data with a custom prompt to generate the feedback.
After some research I found this post by LangChain, which made me confident enough that I could build something similar and give the portfolio checker a go.

I should be less confident
While most of the application was built rather quickly, “surprisingly” enough I struggled to get the RAG properly set up and running to provide valuable feedback.
As I do not really have a clue how to make a proper RAG implementation, I followed the documentation and came up with something like this:
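A minimal sketch of that setup, assuming LangChain's in-memory vector store and an Ollama embedding model (the model name and chunk sizes below are placeholders, not the actual values):

```ts
import { OllamaEmbeddings } from "@langchain/ollama";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import type { Document } from "@langchain/core/documents";

// Embed the uploaded documents with a local Ollama embedding model
// (placeholder model name) and keep the vectors in memory.
const embeddings = new OllamaEmbeddings({ model: "nomic-embed-text" });
const vectorStore = new MemoryVectorStore(embeddings);

// Split the student's documents into smaller chunks before indexing them.
export async function indexDocuments(documents: Document[]) {
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 200,
  });
  const chunks = await splitter.splitDocuments(documents);
  await vectorStore.addDocuments(chunks);
}
```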
This vectorstore would then be populated with documents uploaded by the student.
When the student would ask for feedback, it would create a Runnable for each of the indicators using a FEEDBACK_TEMPLATE and the following prompt:
what grade ('novice', 'competent', 'proficient', or 'visionary') and feedback would you give the student for given the competency ${competency} and indicator ${indicator.name}?
The secret sauce of the product, the prompt
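Reconstructed as a sketch rather than the actual code (the template wording, model name, and retrieval settings are assumptions), the per-indicator chain looked roughly like this:

```ts
import { ChatOllama } from "@langchain/ollama";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { RunnableSequence } from "@langchain/core/runnables";

// Hypothetical wording; the real FEEDBACK_TEMPLATE differs.
const FEEDBACK_TEMPLATE = ChatPromptTemplate.fromTemplate(
  `You are an assessor for the Master Digital Design.
Use the following excerpts from the student's documents to assess them.

{context}

Question: {question}`
);

const model = new ChatOllama({ model: "llama3.1" });
const retriever = vectorStore.asRetriever(); // vectorStore from the sketch above

// One Runnable per indicator: retrieve relevant chunks, fill the template,
// call the model, and return the raw text.
const feedbackChain = RunnableSequence.from([
  {
    context: retriever.pipe((docs) => docs.map((d) => d.pageContent).join("\n\n")),
    question: (input: string) => input,
  },
  FEEDBACK_TEMPLATE,
  model,
  new StringOutputParser(),
]);

async function feedbackFor(competency: string, indicator: { name: string }) {
  return feedbackChain.invoke(
    `what grade ('novice', 'competent', 'proficient', or 'visionary') and feedback ` +
      `would you give the student for given the competency ${competency} and indicator ${indicator.name}?`
  );
}
```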
This all generated quite some reasonable-sounding feedback 🥳!
However, when digging deeper this system would either:
- not get the correct information from the competencies and indicators, because the RAG was not working as expected, and therefore give completely wrong/nonsensical feedback, or
- hallucinate so badly that it would make up content that was not provided by the student and then give feedback on that.
Use the large context windows
I tried to over-engineer the system where it was not needed.
As we ask students to reflect upon their work within a set word limit, and the current models have context windows well beyond 1024 tokens, there was no need to split the documents into smaller chunks.
By removing the splitting of the documents and using the full documents as context, most of the hallucinations were suppressed and nearly no content was being made up anymore!
Most (modern) Large Language Models are capable of handling all of the documents' content in their context window.
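In code the change is small; a sketch of the simplified approach, reusing the hypothetical FEEDBACK_TEMPLATE and model from the earlier sketch and skipping the retriever entirely:

```ts
import { StringOutputParser } from "@langchain/core/output_parsers";
import type { Document } from "@langchain/core/documents";

// No splitting and no retrieval: concatenate the full uploaded documents
// and pass them to the prompt as one big context string.
// FEEDBACK_TEMPLATE and model are the (hypothetical) ones from the sketch above.
async function feedbackFromFullContext(
  documents: Document[],
  competency: string,
  indicator: { name: string }
) {
  const context = documents.map((doc) => doc.pageContent).join("\n\n");

  const chain = FEEDBACK_TEMPLATE.pipe(model).pipe(new StringOutputParser());

  return chain.invoke({
    context,
    question:
      `what grade ('novice', 'competent', 'proficient', or 'visionary') and feedback ` +
      `would you give the student for given the competency ${competency} and indicator ${indicator.name}?`,
  });
}
```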
It is your data
Two things were important to me: 1) the students' data is not stored on any server, but only on the student's own device, and 2) I do not want to force students into a $200-per-month plan to get feedback on their work.
The assessor bot uses Ollama at its core to interact with the Large Language Model. Even though it is not as plug and play and requires the student to have a local installation of Ollama, it does give me more peace of mind.
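Wiring LangChain up to that local installation is only a few lines (a sketch; the model name is a placeholder, and 11434 is Ollama's default port):

```ts
import { ChatOllama } from "@langchain/ollama";

// All inference happens against the student's own Ollama installation;
// nothing is sent to a remote server.
const model = new ChatOllama({
  baseUrl: "http://localhost:11434",
  model: "llama3.1",
  temperature: 0.2,
});
```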
Structured output
While most models are capable of generating structured output, and according to the documentation of LangChain it should be possible to generate structured output using Ollama, this interface was not available in @langchain/ollama 0.1.0, the version available when building this tool.
I could, however, make the models give me back a JSON response and do some rudimentary post-processing to get the feedback into the format required for the student.
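A sketch of that workaround, using ChatOllama's JSON output mode (the shape of the parsed feedback object is an assumption, not the tool's actual schema):

```ts
import { ChatOllama } from "@langchain/ollama";

// Ask Ollama to constrain its output to valid JSON and parse it ourselves.
const jsonModel = new ChatOllama({ model: "llama3.1", format: "json" });

// FEEDBACK_TEMPLATE is the (hypothetical) prompt template from the earlier sketch.
async function structuredFeedback(context: string, question: string) {
  const response = await FEEDBACK_TEMPLATE.pipe(jsonModel).invoke({ context, question });

  try {
    return JSON.parse(response.content as string) as { grade: string; feedback: string };
  } catch {
    // Every now and then the output is not valid JSON; let the caller
    // retry or fall back to showing the raw text.
    return undefined;
  }
}
```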
This would give me the required output in about 80% of the cases, which is more than enough for an experimental tool.
Over-rule the design(ers)
As a small nod to the research paper “On the Dangers of Stochastic Parrots”, I designed the entity you get feedback from to be a parrot. Jaap Hulst made another iteration to get the style more in line with the Ollama llama.