Developer Trains 235 Million Parameter LLM From Scratch

An anonymous developer has trained a 235 million parameter Large Language Model (LLM) from scratch. The model is far smaller than the industry's flagship systems, but the work could help make AI development more accessible.

A developer, identified only by the GitHub handle "FareedKhan-dev", claims to have trained a 235-million-parameter Large Language Model (LLM) from scratch. The details, published in repositories titled "train-llm-from-scratch" and "create-million-parameter-llm-from-scratch," suggest a focus on fundamental model architecture and training processes. If substantiated, the work could represent a notable stride toward making language model development more accessible.

The developer's work explores architectural components fundamental to modern LLMs. One repository details a Transformer model class, outlining core elements such as embedding layers and a training-loop structure. The other, "create-million-parameter-llm-from-scratch," delves into specifics of model construction, including RMSNorm, Rotary Positional Embeddings (RoPE), and masked multi-head attention. That exploration produced a model of approximately 60,000 parameters, suggesting an iterative, small-scale-first approach.
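Of the components named above, RMSNorm is the simplest to illustrate. The following is a minimal sketch in plain Python, not the developer's actual code; the function name and shapes are assumptions for illustration.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Scale a vector by its root-mean-square, then apply learned gains.

    Unlike LayerNorm, RMSNorm does not subtract the mean; it only
    rescales, which is cheaper and works well in practice.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]

# With unit gains, the normalized vector has a root-mean-square of ~1.
y = rms_norm([1.0, 2.0, 3.0, 4.0], [1.0] * 4)
```

In a real Transformer block the same operation is applied across the hidden dimension of every token, typically before the attention and feed-forward sublayers.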

Parameter Count and Model Scale

The claimed 235-million-parameter count places this model in a distinct category. For comparison, Alibaba's Qwen3‑235B‑A22B‑Instruct‑2507, released in July 2025, is an instruction-tuned LLM with 235 billion total parameters, of which only 22 billion are activated per token; the anonymous developer's model is a thousand times smaller.
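As a rough illustration of how such counts arise, the sketch below estimates the parameters of a hypothetical bias-free, decoder-only Transformer from its configuration. Both the formula and the toy configuration are assumptions for illustration, not the developer's published setup.

```python
def count_params(vocab_size, d_model, n_layers, d_ff):
    """Approximate parameter count for a bias-free decoder-only Transformer."""
    embedding = vocab_size * d_model      # token embedding table
    attention = 4 * d_model * d_model     # Q, K, V and output projections
    mlp = 2 * d_model * d_ff              # up- and down-projections
    norms = 2 * d_model                   # two RMSNorm gain vectors per block
    per_layer = attention + mlp + norms
    final_norm = d_model
    return embedding + n_layers * per_layer + final_norm

# A toy configuration, purely for illustration:
total = count_params(vocab_size=100, d_model=8, n_layers=2, d_ff=32)
```

With realistic values (a vocabulary around 50,000, a hidden size around a thousand, and a dozen or more layers) the same formula lands in the hundreds of millions, which is the regime the developer claims.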


Large Language Models, as described in a recent survey, are neural network-based statistical models trained on vast text corpora. They are categorized as 'Foundation', 'Instruction', or 'Chat' models depending on their fine-tuning. Parameters, the learned values within a neural network, are central to a model's capacity. A publication from January 2026 notes that a smaller model trained on substantial data can potentially outperform a larger one trained on less. The anonymous developer's claim implies a potential pathway to solid performance without colossal parameter counts.
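The data-versus-size trade-off can be made concrete with a common rule of thumb from the compute-optimal scaling literature: roughly 20 training tokens per parameter. The sketch below applies that heuristic to a 235-million-parameter model; the figure is an illustration of the rule of thumb, not a statement about this developer's actual training data.

```python
def compute_optimal_tokens(n_params, tokens_per_param=20):
    """Rule-of-thumb training-token budget (Chinchilla-style heuristic)."""
    return n_params * tokens_per_param

# For a 235M-parameter model, the heuristic suggests ~4.7 billion tokens.
tokens = compute_optimal_tokens(235_000_000)
```

Under this heuristic, a well-trained small model can close much of the gap to an undertrained larger one, which is the intuition behind the cited claim.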

The practical implications of the work hinge on the actual performance of the trained model, which remains undisclosed. While the code repositories offer a glimpse into the technical underpinnings, the 235-million-parameter model has yet to be evaluated against established benchmarks. The pipeline outlined in the developer's work runs from data preparation through training to text generation, built on fundamental neural network operations.
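To make that prepare-train-generate pipeline concrete at the smallest possible scale, the sketch below trains a character-level bigram model and samples from it. It stands in for the same three stages, not for the developer's Transformer; all names here are illustrative.

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    """Data preparation + training: count which character follows which."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length, seed=0):
    """Text generation: sample one character at a time from the counts."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break
        chars, weights = zip(*followers.items())
        out.append(rng.choices(chars, weights=weights)[0])
    return "".join(out)

model = train_bigram("abababab")
sample = generate(model, "a", 5)  # deterministic here: "ababab"
```

An LLM replaces the count table with a neural network and single characters with learned token contexts, but the overall loop (fit a next-token distribution, then sample from it repeatedly) is the same.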


Frequently Asked Questions

Q: Who trained a 235 million parameter LLM from scratch?
An anonymous developer, known as 'FareedKhan-dev' on GitHub, claims to have trained a 235 million parameter Large Language Model (LLM) from scratch, sharing the code in repositories such as 'train-llm-from-scratch'.
Q: What is special about this 235 million parameter LLM training?
This development could make advanced language model training more accessible. The developer focused on fundamental architecture and training processes, showing how to build models without massive parameter counts.
Q: How does the 235 million parameter LLM compare to others?
The 235 million parameter model is significantly smaller than major industry LLMs, which can have billions of parameters. However, smaller models trained on good data can still perform well, according to recent research.
Q: What technical details were shared about the LLM training?
The developer shared code for a Transformer model with embedding layers and a training loop, including details on RMSNorm, RoPE, and masked multi-head attention. A smaller model of roughly 60,000 parameters was built first.
Q: What are the next steps for this 235 million parameter LLM?
The actual performance and capabilities of the trained 235 million parameter model are not yet known. Its effectiveness against standard benchmarks needs to be verified before its practical value can be judged.