Module I: Selecting an LLM for a Task

Introduction

With the widespread public adoption of large language models, the number of available models has ballooned, making it increasingly difficult to select an ideal one and often leading to decision paralysis. The choice matters, however: different LLMs are better suited to different tasks.

Large language models (LLMs) are artificial intelligence (AI) systems trained on textual data that can generate human-like text. LLMs can support educational work through content creation, personalization, feedback, and administrative assistance.

Selection Criteria for Educational LLMs

When selecting an LLM for educational purposes, you should analyze your tasks across the following dimensions:

Complexity

How complex is the subject matter? Does it require specialized knowledge?

Audience

What is the age/educational level of the students?

Purpose

Is this for content creation, student interaction, or administrative work?

Subject Area

Which disciplines are involved? Are they factual or interpretive in nature?

Customization Needed

How much adaptation to your specific context is required?

Privacy

What data or information will you provide to the model? How sensitive is that data?

Modern Relevance

How up-to-date must the information be? Is the answer to your question likely to have changed in the past several months, or is it relatively static?

Available LLM Options

GPT-4o (OpenAI)

OpenAI's flagship model continues to lead with exceptional language and visual processing capabilities. Its strengths include reasoning across academic subjects, generating code, and handling complex prompts with nuanced instructions.

Educational Applications:

Excels at creating differentiated learning materials, providing detailed feedback on student work, and generating complex explanations that require multistep reasoning. Its multimodal capabilities also allow it to create unique, niche visualizations.

Ideal for: Advanced subject matter, multidisciplinary content, and visual-textual processing tasks.
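
To make the multimodal capability concrete, the short sketch below shows one way to send both text and an image to GPT-4o through OpenAI's Python SDK. The image URL and prompt wording are illustrative placeholders rather than part of this module, and the snippet assumes an OPENAI_API_KEY is available in the environment.

```python
# Minimal sketch: ask GPT-4o to explain a diagram at a target reading level.
# The example.com image URL is a placeholder; substitute your own material.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Explain this diagram at a 9th-grade reading level."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/water-cycle-diagram.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```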

Claude 3.7 Sonnet (Anthropic)

Emphasizes helpfulness, harmlessness, and honesty. It provides high accuracy in factual knowledge and mathematical reasoning. Claude models are notable for handling extremely long contexts (up to 200,000 tokens), allowing educators to provide substantial background information, student work, or curriculum documents in prompts.

Educational Applications:

This extended context window makes Claude particularly valuable for creating materials that align with lengthy curriculum documents. Claude also demonstrates strong capabilities in the nuanced discussion of complex concepts.

Ideal for: Tasks requiring lengthy documents, ethics-sensitive content, and situations where factual reliability is paramount.
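
Because the context window is so large, an entire curriculum document can often be placed directly in a single prompt. The sketch below, using Anthropic's Python SDK, is one hedged illustration of that workflow; the file name and model alias are assumptions, so check Anthropic's current model list before use.

```python
# Minimal sketch: pass a long curriculum document to Claude in one prompt.
# Assumes the `anthropic` package is installed and ANTHROPIC_API_KEY is set;
# "curriculum_guide.txt" is a hypothetical local file.
import anthropic

client = anthropic.Anthropic()

with open("curriculum_guide.txt", encoding="utf-8") as f:
    curriculum = f.read()

response = client.messages.create(
    model="claude-3-7-sonnet-latest",  # verify the alias against Anthropic's docs
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": ("Here is our curriculum document:\n\n" + curriculum +
                    "\n\nDraft a one-week lesson plan aligned with the standards above."),
    }],
)
print(response.content[0].text)
```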

Gemini 1.5 Pro (Google)

Represents Google's advanced multimodal model, with exceptional integration of text, image, and audio understanding. Technologically, Gemini draws on Google's vast knowledge graph.

Educational Applications:

Excels at creating multimedia learning experiences, analyzing audiovisual recordings of student performances, and generating visually rich educational content. Its integration with Google products makes it useful for educators working within the Google ecosystem.

Ideal for: Multimedia-based learning activities and scenarios requiring strong integration with Google Workspace tools.

Perplexity

Distinguishes itself through real-time internet access that combines search with generative AI. Unlike models limited to their training data, it can retrieve up-to-date information, and it cites internet sources for major statements in its responses, reducing hallucination risk.

Educational Applications:

Exceptional for research tasks that require current information. Its embedded source citations also support the development of students' information literacy.

Ideal for: Inquiry-based learning, literature reviews, and teaching source evaluation.
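
For educators comfortable with a little scripting, Perplexity also exposes an API that its documentation describes as OpenAI-compatible. The sketch below is a hedged illustration of a current-events query; the base URL and the "sonar" model name reflect Perplexity's documentation at the time of writing and may change, and the API key is a placeholder.

```python
# Minimal sketch: ask Perplexity for a current-events summary with sources.
# Base URL and model name are taken from Perplexity's docs; verify before use.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PERPLEXITY_API_KEY",      # placeholder credential
    base_url="https://api.perplexity.ai",
)

response = client.chat.completions.create(
    model="sonar",
    messages=[{
        "role": "user",
        "content": "Summarize this month's developments in renewable energy policy, with sources.",
    }],
)
print(response.choices[0].message.content)  # answer text; source links arrive alongside it
```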

Grok (xAI)

Developed by Elon Musk's xAI, Grok emphasizes personality, humor, and conversational intelligence. Technologically, it can access real-time information (including both webpages and X posts) and excels at explaining complex technical concepts in accessible language.

Educational Applications:

Grok's conversational personality makes it useful for motivating learners by making traditionally dry material more engaging. Its explanatory capabilities are strong in STEM subjects, where it can break down complex concepts into understandable components.

Ideal for: Student engagement, promoting accessibility of concepts, and informal contexts where personality enhances learning.

Command R+ (Cohere)

Specializes in enterprise-grade language processing with strong retrieval augmentation. Technologically, it excels at processing specialized documents and domain-specific knowledge, grounding its output in material supplied at query time rather than in training data alone.

Educational Applications:

Its retrieval-augmentation capabilities make it excellent for building custom knowledge bases from textbooks, research papers, or institutional documents, keeping generated content grounded in your own curriculum.

Ideal for: Professional education, specialized academic programs, and situations requiring integration with proprietary educational content.
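
The retrieval idea behind a "custom knowledge base" can be illustrated without any external service: rank locally stored passages by relevance to a question, then hand the best matches to the model as grounding context. The toy scorer below is a conceptual sketch only, not Cohere's API or Command R+'s actual retrieval mechanism, and the passages and question are invented examples.

```python
# Toy retrieval sketch: pick the passages most relevant to a question and build
# a grounded prompt from them. A real deployment would use Command R+'s own
# retrieval-augmentation features instead of this word-overlap scorer.

def score(passage: str, question: str) -> int:
    """Count words shared between a passage and the question (crude relevance)."""
    question_words = set(question.lower().split())
    return sum(1 for word in passage.lower().split() if word in question_words)

passages = [
    "Photosynthesis converts light energy into chemical energy stored in glucose.",
    "The syllabus requires two lab reports and one group presentation per term.",
    "Cellular respiration releases energy from glucose in the mitochondria.",
]

question = "How does photosynthesis store energy?"
top_passages = sorted(passages, key=lambda p: score(p, question), reverse=True)[:2]

prompt = ("Answer using only these excerpts:\n"
          + "\n".join(top_passages)
          + f"\n\nQuestion: {question}")
print(prompt)  # this grounded prompt would then be sent to the model
```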

Le Chat (Mistral)

Combines efficiency with strong performance, with the latest models offering solid reasoning capabilities at low compute cost. Mistral's open-weight models can also be deployed locally, easing access in regions with weak internet connectivity and giving students in rural and remote areas comparable access to AI.

Educational Applications:

Despite its focus on efficiency and a minimal footprint, Le Chat remains a capable model with strong reasoning abilities.

Ideal for: Institutions with limited infrastructure, local deployment needs, and speed-critical scenarios.
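
One hedged illustration of local deployment: Mistral's open-weight models can be run fully offline through a local runner such as Ollama. The snippet below assumes Ollama and its Python client are already installed and that the model was pulled while connectivity was available; the "mistral" tag and the prompt are assumptions for illustration.

```python
# Minimal sketch: query a locally stored open-weight Mistral model via Ollama,
# with no internet connection required at inference time.
import ollama

response = ollama.chat(
    model="mistral",  # a previously pulled local model tag
    messages=[{
        "role": "user",
        "content": "Draft three discussion questions about the water cycle for 7th graders.",
    }],
)
print(response["message"]["content"])
```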

Qwen (Alibaba)

Alibaba's open-source LLM family features controllable "thinking modes" that allow users to toggle between fast responses and step-by-step reasoning on a sliding scale. Released under the Apache 2.0 license, it enables local deployment and customization without proprietary constraints.

Educational Applications:

Useful for developing metacognitive skills, since students can observe the model's explicit reasoning process through its controllable thinking modes.

Ideal for: Metacognitive skill development, differentiated instruction, and institutions requiring customizable open-source solutions.
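
The thinking-mode toggle can be seen directly in how prompts are built. The sketch below follows the usage described in Qwen's model cards for the Qwen3 family via Hugging Face transformers; the specific checkpoint name and the enable_thinking flag are assumptions here and should be verified against the card for whichever model you deploy.

```python
# Minimal sketch: build Qwen3 prompts with and without step-by-step "thinking".
# Only the tokenizer is downloaded here; generation would require the full model.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")  # assumed checkpoint
messages = [{"role": "user", "content": "Explain why the sky is blue."}]

# enable_thinking=True asks for visible step-by-step reasoning students can inspect;
# False requests a direct answer without the reasoning trace.
prompt_with_reasoning = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
prompt_direct = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
print(prompt_with_reasoning != prompt_direct)  # True: the mode changes the prompt
```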

DeepSeek

An innovative LLM backed by the Chinese hedge fund High-Flyer, DeepSeek aims at advancing artificial general intelligence (AGI) with groundbreaking cost-efficiency and performance. It stands out technologically by using a mixture-of-experts (MoE) architecture, which activates only a subset of specialized expert subnetworks for each input, delivering strong performance at lower compute cost.
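
To give a feel for the mixture-of-experts idea, the toy routine below routes a request to a small set of "experts" based on its content instead of consulting every expert at once. This is a conceptual sketch only; DeepSeek's actual router operates on learned token representations inside the network, not on keywords.

```python
# Toy illustration of MoE routing: activate only the experts relevant to an input.
# Not DeepSeek's architecture or code; purely a conceptual analogy.

def route(text: str) -> list[str]:
    """Choose which 'experts' to consult, using crude keyword matching."""
    experts = {
        "math": ["equation", "solve", "integral"],
        "code": ["python", "function", "bug"],
        "writing": ["essay", "paragraph", "thesis"],
    }
    words = set(text.lower().split())
    chosen = [name for name, keywords in experts.items()
              if any(keyword in words for keyword in keywords)]
    return chosen or ["general"]

print(route("Help me solve this integral"))   # ['math']
print(route("Revise my essay paragraph"))     # ['writing']
```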

Privacy Considerations:

There are notable privacy concerns around data use because DeepSeek's servers are located in mainland China.

Ideal for: Less privacy-sensitive tasks, deep explanations of STEM topics, and detailed feedback on assignments.

Test Your Understanding

Question 1

When selecting an LLM for educational purposes, what are three key dimensions you should analyze your tasks across? Explain why each dimension is important for making an effective model selection.

Question 2

Compare and contrast Claude and Perplexity in terms of their strengths and ideal educational use cases. Provide specific examples of when you would choose one over the other.

Question 3

A high school teacher wants to create differentiated learning materials for a unit on climate change that includes visual components. They need the most current information available and want to integrate the materials with existing Google Classroom resources. Which LLM would you recommend they use? Justify your recommendation based on specific features mentioned in the module. What limitations might they encounter with your recommended LLM?

Question 4

You are a professor at a rural college with unreliable internet access. You need to develop a comprehensive literature review for a research paper on quantum physics, using a large corpus of academic papers you have stored locally. The review must incorporate complex mathematical thinking and maintain high factual accuracy. Which LLM would be most appropriate for this task? Explain your choice. What workarounds might you need to implement to address any limitations of your chosen model?

Question 5

A middle school English teacher is developing an engaging creative writing unit. They want students to interact with an AI that can make traditionally dry grammar and storytelling concepts more appealing through humor and personality. The teacher also wants to use this interaction to teach students about evaluating sources critically. Recommend an LLM combination that would serve these dual purposes. Justify why you would combine these specific LLMs based on their strengths. How would this combination address the dimensions of "Audience" and "Purpose" mentioned in this module?

Continue to Module II

Prompt Engineering →