
LLMs, Vibe Coding, and software development
May 28, 2025
The rise of LLMs is redefining entire industries, and software development is no exception. Their ability to generate code, support self-directed learning or assist in debugging has opened up a range of possibilities we are only beginning to explore. While their enormous potential is evident, this new era also presents challenges and risks that are worth reflecting on.
The idea for this text arose after reading the article "How to stop your human from hallucinating", by Shrivu Shankar. The parallel he draws between LLM hallucinations and human errors struck me as extremely interesting. In it, Shrivu argues that competent professionals can make mistakes (hallucinations) when processes or communication are imprecise, much as an LLM fails due to lack of context or ambiguous data. The key is not to blame the individual (or the AI), but to improve the systems, processes and information they operate with.
In this article, starting from one of the trendy concepts (the famous Vibe Coding), we will address challenges, risks and best practices that we consider crucial for the responsible and effective use of LLMs in the context of software development.
The goal is to prevent our tools, and especially ourselves, from freaking out in the process.
Vibe Coding
Concept and origin
The term was popularized in February 2025 by Andrej Karpathy, one of the most influential figures in the field of artificial intelligence. In a series of tweets, in addition to introducing the term, he emphasized how fun the practice is and how useful he finds it for "throwaway weekend projects."
The technique consists of providing the AI with natural language prompts and accepting the resulting code based more on a "feel" or "vibe" that it might work than on rigorous analysis or knowledge of its inner workings. Accept the LLM's response, apply the diff without reviewing or testing it, and keep throwing prompts until the target feature works: "Just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works".
The psychological appeal is tremendous, based on two very powerful factors: the illusion of accelerated productivity, and instant gratification. The promise of "being able to create" without requiring learning or effort is hard to ignore. Especially when combined with the current fascination with AI capabilities.
However, all of this hides important trade-offs, which we analyze below, and which Karpathy himself already hinted at in his famous tweet of February 2025.
Some advantages, but many limitations
Let's start by highlighting its main virtue and, in my opinion, its real utility: the quick creation of POCs and the near-instant exploration of ideas and concepts, even by profiles without a strong technical background. It is no longer necessary to dive deep into languages or frameworks to validate "that thing that has crossed your mind". In this sense, it can act as a catalyst for creativity, allowing ideas to be materialized in a preliminary way with previously unthinkable speed.
But everything changes when it comes to transferring this work style to a production environment. The promise of speed quickly risks becoming an obstacle. Without an overall strategy and oversight, it is easy to end up in "technical chaos": code that is difficult to maintain, extend and even understand, compounded by the inherent difficulty models have in grasping very broad contexts and the interdependencies of large-scale projects and systems.
Some specific problems:
- Code quality: At a low level, inconsistency, lack of solid design patterns, and perpetuation of erroneous solutions¹ (learned by the LLM from its training data) make the software difficult to maintain and scale. But what worries me most is the high level: the lack of a coherent architecture, derived from multiple isolated AI interventions, seriously compromises the integrity of the system.
- Vulnerabilities: They arise from the incorrect or unsupervised use of libraries and functionality, as well as from the LLMs' own hallucinations, and are already an identified and significant risk vector². For example, the exploitation of hallucinated imports has given rise to a new form of typosquatting³, dubbed "slopsquatting"⁴ (see the sketch after this list).
- Inefficient or suboptimal solutions: Caused by the natural tendency of LLMs toward brute-force solutions and inefficient algorithms that negatively impact performance and maintainability. This effect is accentuated in large contexts and in real projects⁵.
- Excessive dependency on the AI: This situation can even lead to a form of vendor lock-in with the AI itself. Because of the issues listed above, it is easy to slide into a dependency model where only an AI seems able to understand the code, creating a snowball effect and making the solution unsustainable in the long term.
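Since hallucinated dependencies only become dangerous when they are installed unchecked, a cheap guard is to verify declared packages against the registry before any install step. Below is a minimal sketch assuming a plain requirements.txt and the public PyPI JSON API; the function names and exit behavior are illustrative, not a specific tool.

```python
# Minimal sketch: flag dependencies that do not exist on PyPI before installing.
# Assumes a plain requirements.txt; names and thresholds are illustrative.
import sys
import urllib.error
import urllib.request


def package_exists(name: str) -> bool:
    """Return True if the package is published on PyPI."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # 404 means the package is unknown to the registry.
        return False


def check_requirements(path: str = "requirements.txt") -> list[str]:
    """Return the declared dependencies that could not be found on PyPI."""
    missing = []
    with open(path) as fh:
        for line in fh:
            name = line.split("==")[0].split(">=")[0].strip()
            if name and not name.startswith("#") and not package_exists(name):
                missing.append(name)
    return missing


if __name__ == "__main__":
    unknown = check_requirements()
    if unknown:
        print("Possibly hallucinated dependencies:", ", ".join(unknown))
        sys.exit(1)
```

A check like this does not replace proper dependency review, but it catches the most blatant "slopsquatting" bait before it ever reaches a build.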
After the negative part, it is time to emphasize that these problems are not an inevitable doom; they stem directly from the absence of a framework for responsible use and constant critical oversight. Vibe Coding, as Karpathy said, is fun and useful, but adoption in production environments requires a different approach. An approach that is often compromised by urgency and market pressures.
The mirage of FOMO and competitive pressure
Nowadays FOMO (Fear of Missing Out) is well known all over the world. Industries such as entertainment (video games, movies and series, music...) revolve around this concept: if you are not there here and now, you are missing out.
But this fear is not exclusive to them; it also exists in software development, and combined with the competitive pressure of the current context, it is very difficult to ignore. The problem is that it often leads to hasty decisions, taken without due reflection or solid strategic planning.
There are questions and doubts that echo in anyone's mind: "If the whole team used AI, would some people become redundant?", "If you didn't use it, would you still be competitive?", "Can you even be competitive in the long run without using it?"
In my opinion, it is not so much a matter of "too many people"; rather, team productivity would skyrocket if AI were adopted with the right approach. On the other hand, I firmly believe a talented and well-organized team can remain competitive without adopting AI, no doubt, but it would be giving up a significant multiplier of productivity and innovation, with all the impact that can have in the long term.
The general feeling is that there is no option to "get off this boat". The real question is not whether or not to adopt LLMs, but how to responsibly and strategically integrate a tool of this caliber. Let's not get carried away by FOMO, but do it with good timing.
The paradigm shift
The process, by its disruptive nature, goes beyond a standard adoption of a new tool. There must be a fundamental paradigm shift. Not only will it change our processes and ways of working, but it will require an evolution in how we, as humans, interact with and make use of these tools. To navigate this transformation successfully, it is crucial to address both the human and psychological dimension as well as the technical and organizational aspects involved.
The human and psychological approach
Three key concepts, which any LLM user should keep in mind.
Epistemic humility: The basis for critical judgment
The first pillar for responsible interaction with virtually any source of information, but especially with LLMs, is epistemic humility. The concept invites us to acknowledge the limitations and biases of any knowledge (even our own). Applied to LLMs, it implies assimilating that, despite their impressive ability to generate coherent text and functional code, they do not "understand" in the human sense. I sincerely believe that calling it Artificial Intelligence works against us here. It is important to understand that it is nothing more than a statistical generation process: terribly complex and advanced, but not intelligent in the strict sense of the word.
Therefore, we should systematically question the answers provided by AI instead of blindly accepting them⁶ (Hello again, Vibe Coding!). Verifying, validating, contrasting... is key in a context where generating content is so easy. This humility must also extend to us: we must be aware that judging the quality and correctness of LLM answers requires real cognitive effort and often expert domain knowledge.
Only from this recognition of mutual limitations (AI's, and our own) can we exercise truly effective critical judgment.
Overcoming indifference: the commitment of the technical person
Directly linked to the previous concept is the need to overcome what is called epistemological indifference ("Epistemic Insouciance")⁷. This term refers to an attitude of disinterest or disregard for the truth or accuracy of knowledge. It is a relatively common attitude in the current social context, where the explosion of fake news and false content floods networks and media, generating in users an attitude of rejection or indifference in which veracity loses importance in the face of content that confirms their expectations.
The key here is that a "citizen" may be able to afford such an attitude on a day-to-day basis, but a "technical" person has an active responsibility towards knowledge. An indifferent approach is not only unsustainable but dangerous for a professional in any field, and software development is no different. A developer cannot afford to be indifferent to the quality, correctness or implications of the code he or she writes, integrates or reviews.
It's not just about "making it work"; it's about "knowing how it works", and above all ensuring that it works safely and efficiently. Epistemological indifference, in modern AI-assisted software development, is not an option. It is the most direct path to the "technical chaos" discussed above.
Strategic cognitive offloading
The concept of cognitive offloading⁸ basically refers to "reducing the mental effort required to execute a task through the use of external resources". In the context we are dealing with, the relationship is obvious: we delegate code generation, information search and other tasks, freeing our mental capacity for other objectives. And you can imagine where the following reasoning is going: this apparent comfort carries the risk of atrophy of our own cognitive abilities⁹ ¹⁰ if we fall into indiscriminate and, above all, passive delegation.
It is precisely at this point where strategic cognitive offloading becomes key. It does not mean giving up delegation, but being careful and selective about which tasks we delegate and which we reserve for our "mental muscle". The most logical tendency, a priori, is to reserve our energy for activities that require strong critical judgment, creativity, or deep knowledge of the domain and context. Some of these activities might include:
- Architectural design.
- Code review, especially AI-generated code.
- Complex problem solving, requiring a holistic understanding of the system.
- Strategic decision making on technologies to be used or strategies to be applied.
This has an important positive side: by focusing our efforts on higher-level aspects, we not only mitigate unlearning, but can even refine and deepen new or more valuable competencies.
The technical and organizational approach
Such crucial issues as security, privacy and model selection have already been covered in many articles, so here we will focus on other aspects that we consider equally relevant.
Prompts as Code (PaC): Traceability and Reproducibility
As LLMs become integrated into the development process and code generation becomes more prevalent, the management of the prompts that feed these models becomes critically important. A best practice is to adopt a "prompts-as-code" approach¹¹, treating the instructions we feed into the models with the same rigor and best practices that we apply to traditional source code.
This paradigm includes versioning and documentation of each prompt. But it goes beyond simply saving different versions. It involves full traceability of changes (what changed and why), the ability to revert to previous versions, testing of prompts before deployment, A/B variation management or version tracking by environment. The goal is to make any changes related to prompts or models auditable and reproducible, allowing you to track how and why the output of an LLM has changed over time. This is crucial in environments with high automatic code generation. A small change to a prompt, or the underlying model, can have a significant impact on the functionality, consistency or safety of the final product.
Beyond versioning, prompts should also be framed within broader prompt management: consider them part of the application infrastructure. This can include everything from the ability to perform hot upgrades, to monitoring performance and outputs, managing access controls or coordinating changes between different services.
The key differentiator with traditional version control is the non-deterministic nature of LLMs. Unit testing on the prompt is not enough; continuous monitoring of the models' output is required to manage this variability. There are a multitude of specialized tools to facilitate implementation, although the direct use of Git and its integration in pipelines is always a valid option.
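As an illustration of the prompts-as-code idea with nothing more than Git and plain files, the sketch below keeps a prompt as a versioned JSON artifact with a pinned model and version, loads it like any other source file, and runs a cheap deterministic check of the kind a pipeline could execute before deployment. The file layout, field names and the summarize_ticket prompt are illustrative assumptions, not a specific tool's format.

```python
# Minimal sketch of "prompts as code": the prompt is a versioned artifact
# with metadata, committed to Git, loaded by the application and exercised
# by a deterministic check before deployment. Field names are illustrative.
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PromptVersion:
    name: str
    version: str   # bumped (and committed) on every change
    model: str     # pinned model identifier
    template: str  # prompt text with placeholders


def load_prompt(path: str) -> PromptVersion:
    """Load a prompt definition tracked in Git like any other source file."""
    return PromptVersion(**json.loads(Path(path).read_text()))


def render(prompt: PromptVersion, **variables: str) -> str:
    """Fill the template; missing variables fail here, in CI, not in production."""
    return prompt.template.format(**variables)


if __name__ == "__main__":
    # Demo: write the example definition the repo would already contain,
    # then load it and run the kind of check a pipeline would execute.
    Path("prompts").mkdir(exist_ok=True)
    Path("prompts/summarize_ticket.json").write_text(json.dumps({
        "name": "summarize_ticket",
        "version": "1.2.0",
        "model": "example-model-2025-05",
        "template": "Summarize the following support ticket in one sentence:\n{ticket}",
    }))

    prompt = load_prompt("prompts/summarize_ticket.json")
    rendered = render(prompt, ticket="The login page returns a 500 error.")
    assert prompt.version == "1.2.0" and "500 error" in rendered
    print(rendered)
```

Checks like this only validate the deterministic half of the problem; the monitoring of actual model outputs mentioned above still has to happen after deployment.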
Governance and shared responsibility
The process of integrating such tools requires the establishment of clear and, above all, robust governance frameworks. As highlighted in "Responsible artificial intelligence governance: A review and research framework"¹², they should be based on a set of structural, procedural and relational practices designed to ensure that the systems operate ethically, transparently and in line with human values. These aspects are essential to ensure not only legal, but also safe and efficient use of this type of technology.
These frameworks should explicitly define roles and responsibilities within the organization. This topic is discussed in detail in the article "Governance of Artificial Intelligence"¹³. Although management sets the vision and legal ensures compliance, governance is a shared responsibility involving multiple levels and specialized areas. This could even include the appointment of key figures such as AI Stewards. What is clear is that the assignment of responsibilities is critical at each stage of the lifecycle, and internal audits should focus on helping to verify compliance with the established policies.
Although it may sound like just another bureaucratic process, well-implemented governance acts as a catalyst and should enhance innovation precisely by ensuring that it happens in a safe and sustainable manner. Cases such as the AI Center of Excellence powered by Sngular and Fakeeh Care Group¹⁴ are a clear example: it enables corporations to exploit and maximize the potential of AI with greater confidence, minimizing risks and ensuring that the benefits are realized in line with the organization's goals and values, achieving a responsible innovation model.
Quality Engineering: a strategic pillar in the era of LLMs
Based on the above points, it is clear that Quality Engineering, already a key discipline, will evolve into an even more strategic role.
New challenges are emerging that demand an evolution of traditional QA/QE practices, the clearest being the non-deterministic nature of LLM-based systems. It is no longer enough to test functionality with predictable inputs and outputs; it is necessary to adapt to this variability and to possible unexpected behaviors. The new scope is not limited to validating the generated code; it should also cover aspects such as:
- Validation of the generation process: Quality and robustness of prompts and configurations that generate code. Evaluate reactions to unexpected or malicious inputs.
- Detection of hallucinations and biases: Identify when the LLM invents or reproduces unwanted biases.
- Regression testing on models: Ensure that new versions of LLMs or retrainings do not degrade or break previous functionalities.
The relevance of these practices and challenges is evident. Our colleague Fran Moreno had already commented on similar challenges¹⁵ that he faced in real project implementations. In his case, the regression strategy for trained models was to use another model as a validator, solving the problem of non-determinism in the responses (a sketch of this idea follows).
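As a rough illustration of that validator idea (not Fran's actual implementation), the sketch below runs a golden set of question/reference pairs through the system under test and asks a second model to score semantic agreement instead of comparing exact strings. The generate and judge callables are placeholders for whatever clients the project really uses, and the threshold is an arbitrary example.

```python
# Minimal sketch of a regression check for a non-deterministic LLM feature:
# instead of asserting exact strings, a second model acts as a judge and
# scores whether the new answer still satisfies the expected behaviour.
# `generate` and `judge` are placeholders for the project's real clients.
from typing import Callable

PASS_THRESHOLD = 0.8  # illustrative acceptance score


def judge_answer(question: str, reference: str, candidate: str,
                 judge: Callable[[str], str]) -> float:
    """Ask the judge model for a 0-1 score of agreement with the reference."""
    prompt = (
        "Rate from 0 to 1 how well the candidate answer matches the reference.\n"
        f"Question: {question}\nReference: {reference}\nCandidate: {candidate}\n"
        "Reply with the number only."
    )
    return float(judge(prompt))


def run_regression(generate: Callable[[str], str],
                   judge: Callable[[str], str]) -> None:
    """Check a golden set of question/reference pairs against the new model."""
    golden_set = [
        ("What does HTTP 404 mean?", "The requested resource was not found."),
        ("What does HTTP 500 mean?", "The server hit an unexpected error."),
    ]
    for question, reference in golden_set:
        candidate = generate(question)
        score = judge_answer(question, reference, candidate, judge)
        assert score >= PASS_THRESHOLD, f"Regression on {question!r} (score={score})"


if __name__ == "__main__":
    # Stub clients so the sketch runs standalone; real code would call the
    # system under test and a separate judge model here.
    run_regression(generate=lambda q: "Resource not found / server error.",
                   judge=lambda p: "0.9")
    print("Golden set passed.")
```

The judge model introduces its own variability, so in practice this kind of check is combined with score thresholds, repeated runs, and human review of borderline cases.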
Continuing education: Upskilling & Reskilling
Finally, no technical or organizational strategy is effective unless teams go through a process of improving and acquiring these new skills. This training should be aligned with what has been discussed so far: it is not only about operating AI, it is also about assimilating effective prompt engineering concepts and developing the critical review capabilities discussed above.
In essence, it's the same as always, a constant in the industry: fostering a culture of learning and adaptation. Something more complex in this case, due to the nature of the embedded tool, but with a greater potential for reward if that sweet spot of efficiency, accountability and innovation in the use of AI is reached.
Conclusion
The LLM era is one of promise and peril (like almost all of them, anyway...). If Vibe Coding works perfectly as a warning of what can happen with blind adoption, the real opportunity lies in integrating these tools with human judgment, accountability and a clear strategy.
Use AI as a "co-pilot", with the human always driving. Balance speed and robustness, encourage skepticism, and master the ability to use (and not use) LLMs effectively.
Epistemic humility, technical commitment, a prompts-as-code strategy, the definition of solid governance and an adequate quality strategy are the ingredients to achieve the objective we pointed out at the beginning: successful adoption, and preventing both machines and humans from "hallucinating" in the process.
Notes:
1. https://www.gitclear.com/ai_assistant_code_quality_2025_research
2. https://www.theregister.com/2025/04/12/ai_code_suggestions_sabotage_supply_chain/
3. https://capec.mitre.org/data/definitions/630.html
4. https://socket.dev/blog/slopsquatting-how-ai-hallucinations-are-fueling-a-new-class-of-supply-chain-attacks
5. https://arxiv.org/pdf/2502.12115
6. https://www.linkedin.com/pulse/epistemic-humility-ai-fork-road-knowledge-creation-daisy-thomas-saqge/
7. https://www.pdcnet.org/8525737F00583637/file/57470D980359184D85258316006B0C15/$FILE/jpr_2018_0043_0000_0005_0024.pdf
8. https://www.sciencedirect.com/science/article/abs/pii/S1364661316300985
9. https://www.mdpi.com/2075-4698/15/1/6
10. https://drphilippahardman.substack.com/p/the-impact-of-gen-ai-on-human-learning
11. https://medium.com/data-science-collective/version-control-for-prompts-why-it-matters-and-how-to-do-it-right-af2e334dd22c
12. https://www.sciencedirect.com/science/article/pii/S0963868724000672
13. https://www.sngular.com/es/insights/356/gobernanza-de-la-inteligencia-artificial-desafios-y-perspectivas
14. https://www.sngular.com/es/insights/365/fakeeh-care-group-y-sngular-se-alian-para-impulsar-la-inteligencia-artificial-en-el-sector-sanitario-de-arabia-saudi
15. https://www.linkedin.com/posts/franciscomorenosanz_explorando-los-retos-en-el-testing-de-activity-7235214432476246018-Sea_