Finetunio - Model Optimization

Clients & Ventures

Product/systems design that turns AI tech into business results

I'm Gil, an AI Product Design Engineer. I take complex workflows and turn them into AI-native systems, from discovery through shipped product.

My north star is the complete practitioner: the idea that the best AI-native products are shaped by one mind that can hold design, engineering, and AI logic simultaneously, not handed off through layers of interpretation.

When the same person who defines the workflow also builds the prototype and wires the AI, translation loss disappears. Teams move faster and the gap between what we designed and what shipped closes.

Fidelity Life

Days to ~20 minutes

Compressed a high-friction insurance workflow into a faster, clearer operating path for teams and customers.

SunPower

16+ internal tools

Carried a multi-year product ecosystem across sales, design, installation, and monitoring workflows.

Microsoft

5,000+ developers

Helped technical audiences adopt new platforms through demos, workshops, and practical product education.

What I Bring

Enterprise Systems

Enterprise End-to-End Systems Builder

I've designed complex product ecosystems at enterprise scale — multi-product platforms, multi-role workflows, and long-running transformation initiatives as a solo designer. Systems thinking, not just screen design.

AI Product Engineering

AI-Native Product Design Engineer

I design and prototype AI-native product systems — RAG pipelines, agentic workflows, human-in-the-loop interfaces, structured outputs, and context-aware products. AI product work grounded in real systems, not demos.

Education & Evangelism

Technical Educator & Evangelist

I help people understand, adopt, and build with new technologies — through demos, workshops, tutorials, and product systems thinking. Teaching as adoption; adoption as product design.

A consistent practice across technology cycles

Tech Speaker: Microsoft · 5,000+ developers

Co-founder: 8th Light · Digital transformation

Startups + Enterprise: Engine Yard · ThoughtWorks · Dev tools

Founder: Skowak · Design engineering · Teaching

Enterprise Focus: SunPower · Digital transformation · Ecosystems

AI Eng Focus: Craftal · CareDash · Multi Agent Systems

Now: Contextus · Craftal v.2 · AI-native systems

How I Work

I don’t start with AI. I start with how work actually happens.

The method works in cycles. Each one is scoped small enough to prove value fast and wide enough to compound across the whole system.

Each cycle makes the next one faster, safer, and more valuable. Real workflows before AI use cases. No baseline, no proof. No proof, no scale.

Steps 01–03: Understand

01 Discover
02 Model
03 Baseline

Know before you build

Steps 04–06: Intervene

04 Design
05 Adapt & Script
06 Measure

Build small, prove value

Steps 07–08: Promote

07 Promote
08 Orchestrate

Make it durable and connected

Steps 09–10: Scale

09 Govern
10 Reuse & Expand

Govern and compound

Where most AI projects break down

Most AI failures aren't technical. They're structural: siloed teams, AI built without real workflows, ideas that can't survive contact with production. The method closes those gaps.

Design, engineering, and AI in separate rooms

Work lands in three different hands. Every handoff is a translation, and translation is where ideas lose fidelity.

One practitioner across all three layers

I lead design, write code, and wire the AI as one strategic partner across all three disciplines. No handoffs, no interpretation layers, no gap between what was designed and what shipped.

AI built without a workflow to plug into

Teams rush to launch LLM features that return little ROI. Without mapping real workflows or solving real problems, the result is unused chatbots and shallow automation.

Bridge between LLMs and real work

I map how work actually happens first, then wire the AI stack into those real decisions. The result is automation that runs, not a chatbot no one uses.

Technical Toolkit

The full stack, from workflow model to deployed AI system.

My stack spans the full system:

AI engineering: OpenAI & Anthropic APIs, LangChain, LangGraph, DSPy, RAG pipelines, vector stores, structured outputs, agent orchestration
Frontend and prototyping: React, Angular, Vue, TypeScript, API-connected interfaces
Backend and tooling: Python, FastAPI, Rails, PostgreSQL/pgvector, Docker
Product and design: systems modeling, workflow mapping, information architecture, interaction design, design engineering

As a product generalist, I choose the right tool for each layer (design, front end, data, or model) so ideas travel from workflow model to shipped system without unnecessary handoffs.

Getting personal

Things I do when I'm not building

Still-life sketching, coastal hiking, PC teardowns, and slow travel through cities with good architecture.

I draw to stay sharp. I travel to stay curious. I take things apart to stay honest about how they work.

The Narwhale Pod

Welcome to the Narwhale Pod! In the tech world, designers who code are often called "unicorns": mythical creatures of legend. But I believe these skills are very real and increasingly essential. The ability to bridge design and development creates more cohesive and innovative digital experiences.

Narwhals, the unicorns of the sea, are a perfect metaphor: they're real, they're remarkable, and they navigate their world with unique capabilities. This space is dedicated to exploring the intersection of design, code, and AI, sharing insights, projects, and practical magic.

Designing the App Layer for Agents

A post-mortem from the Berkeley Agentic AI MOOC, written for hybrid designer-developers working in the application layer. It reframes agents as long-running, distributed systems rather than chat UIs, and shows how evaluation design, test-time strategies, memory, and visible state and recovery flows are the real product-design problems behind agentic products.

Read on Medium

Why Governance Is the Missing Layer in AI Product Design

A post-mortem from the Harvard Data Science Initiative Agentic AI Intensive. It argues governance is not a compliance afterthought but a core design layer: as agentic systems break the deterministic software model, designers must own accountability, guardrails, and human-in-the-loop validation.

Read on Medium

Revolutionizing Medical Intake with AI Agents

Explores CareDash, an AI-driven chatbot built at a Berkeley hackathon to simplify medical intake for clinics and patients. It uses a RAG pipeline over questionnaires and policy documents, with patient-friendly touches like medical-term lookup and secure e-signatures.

Read on Medium

Unlock the Power of AI — No Internet Required

Introduces the benefits of an offline AI assistant: remote work with no internet, productivity on long flights, or keeping sensitive data private. It sets up a hands-on series on building your own local, RAG-powered assistant that runs entirely on your laptop.

Read on Medium

Coming Articles

Drafts in the pipeline, distilled from my current build notes and design narratives across Craftal and Contextus OS.

CWM → CIR: two artifacts, “what” vs “how”
Design-first: earn the DSL against real domains
Two natures: domain “meat” vs instrumentation “wiring”
Dual-runtime: human/CWM vs machine/CIR
How we know the notation is right: convergent evidence
The Neufert vision: a shared repository of business domains
Why user flows weren’t enough: adding the system substrate
Blueprint vs Specification: a human source that compiles to a machine-ready artifact
What we added to ATOS, in plain terms
Auditing AI builders: read the repo, not the report
The boundary held: what design-first bought us
Carrying a project across AI threads: a context-continuity system
Earn the shape before you build it
Every AI-builder handoff should be a committed file
Decide, distill, explain: three genres for project knowledge
Rationale docs: a thinking workspace before the ticket
Let each slice tell you where the next risk is
A stage is green, or it’s stopped: nothing in between
The classifier seam: two provider modes, one pure core
Two layers of undo: portable history vs the personal timeline
Loop engineering: gates between phases, autonomy within them
Determinism is the moat: narrow the LLM to judgment
One artifact, three faces: layering sophistication over one core
What an inbox pass actually delivers vs. the vision
Not everything is an atomic note: genres, homes, and schemas
Is it real, or did we dream it? Telling a valuable system from a green one

Dive deeper into the pod and explore all articles on Medium at The Narwhale Pod.


Finetunio - Model Optimization

An intent-first workbench for planning, executing, evaluating, and deploying model improvements.

Skills applied:

Interaction Design · Frontend Dev · Backend Dev

Improving an AI model often means navigating disconnected scripts, datasets, training frameworks, evaluation tools, and deployment systems without a reliable way to decide what should happen first.

I designed Finetunio as a human-centered optimization workbench that turns behavioral intent into a connected path through strategy, data preparation, execution, evaluation, and deployment.

Outcome

A complete optimization story instead of a collection of technical steps.

Finetunio turned a fragmented model-improvement process into a coherent product lifecycle.

A team can now begin with a behavioral goal, compare plausible interventions, prepare and inspect its data, construct a workflow, preserve every experiment, evaluate outputs against the base model, register the result, and launch it through a guided deployment process.

Diagram (finetunio 03): One connected lifecycle preserves the reasoning behind every experiment, model decision, and production release.

The resulting MVP demonstrates several practical shifts:

Optimization strategy becomes an explicit decision rather than a hidden assumption.
Cost, risk, time, and infrastructure tradeoffs are visible before execution.
Dataset quality becomes a first-class stage instead of an afterthought.
Successful and failed runs remain reproducible and comparable.
Evaluation connects metrics with observable model behavior.
Deployment preserves lineage and configuration through production launch.
Technical concepts remain accessible without removing expert control.

These are product-level outcomes demonstrated by the working prototype, not measured customer results. The interface includes representative project data and illustrative evaluation scores to show how the system would operate.

The deeper outcome was a clearer product category. Finetunio began as a guided fine-tuning workbench, but the design exposed a more valuable opportunity: an intelligence layer that helps teams determine how model behavior should change, not merely how a training job should run.

That direction connects directly to my broader work in workflow architecture, orchestration, and AI-native product design. The interface is not decoration around the model pipeline. It is where intent becomes an inspectable, executable system.

The Problem

Teams could run training jobs, but choosing the right intervention was still guesswork.

The difficult part of model optimization is rarely starting a training process. It is deciding whether training is the right response to the problem in the first place.

A model may be failing because it lacks current knowledge, follows instructions inconsistently, retrieves the wrong context, produces an unreliable format, or has learned the wrong behavioral pattern. Each diagnosis points toward a different intervention. Fine-tuning, RAG, prompt optimization, preference alignment, tools, and guardrails are not interchangeable.

Yet most systems begin by exposing technical mechanisms. Users are asked to select a framework, configure a learning rate, choose a LoRA rank, and set epochs before the product has established what behavior is actually broken.

This creates an expensive form of trial and error. Teams can consume compute, prepare the wrong dataset, or interpret a higher benchmark score as success while the model continues failing in the situations that matter.

The surrounding workflow compounds the problem. Strategy decisions live in one place, datasets in another, training jobs in scripts or notebooks, evaluation in separate frameworks, and deployment in another operational layer. Experiment lineage and decision context are easily lost between them.

Diagram (finetunio 01): Behavioral goals became fragmented across technical choices before teams could determine what problem they were actually solving.

The product therefore had to solve two related problems: translate human intent into a defensible optimization decision, then preserve that reasoning throughout execution.

Solution

Designing the missing decision layer between intent and execution.

I structured Finetunio around a seven-stage lifecycle:

Intent → Strategy → Datasets → Workflow → Runs → Evaluation → Deployment

This lifecycle became the product’s primary mental model. It prevents optimization from being treated as an isolated training job and keeps each decision connected to the goal that produced it. The project state is driven by actual progress data rather than by whichever screen the user happens to be viewing.
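As a rough illustration of state being derived from progress rather than from the current screen, here is a minimal Python sketch. It is not Finetunio's actual code: the `Stage` enum and the `current_stage` helper are hypothetical names, and "progress" is assumed to be a plain set of completed stages.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The seven lifecycle stages, in the order listed above."""
    INTENT = 1
    STRATEGY = 2
    DATASETS = 3
    WORKFLOW = 4
    RUNS = 5
    EVALUATION = 6
    DEPLOYMENT = 7


def current_stage(completed: set[Stage]) -> Stage:
    """Return the first stage not yet completed, in lifecycle order.

    The result depends only on recorded progress, never on which
    screen is open. If every stage is done, stay on DEPLOYMENT.
    """
    for stage in Stage:
        if stage not in completed:
            return stage
    return Stage.DEPLOYMENT


# Hypothetical project: intent and strategy are done, datasets are not.
progress = {Stage.INTENT, Stage.STRATEGY}
print(current_stage(progress).name)  # DATASETS
```

Because the function takes only the progress record, a user who jumps ahead to a later screen does not change what the project considers its active stage.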

Starting with the outcome, not the parameters.

The Intent Builder asks users what they need the model to do better, how much risk they can tolerate, and what infrastructure and time they have available.

Instead of forcing users to begin with “LoRA rank” or “learning rate,” the experience begins with goals such as reducing hallucinations, improving reasoning, increasing domain knowledge, or producing more reliable structured outputs. This reflects the broader UX principle behind the product: declare the desired behavioral change first, disclose implementation complexity only when it becomes useful.

Optimization begins with the intended behavior, acceptable risk, and real operational constraints.

The right-side summary keeps the emerging intent visible while the system identifies candidate strategies. That persistent context is important because the strategy should remain traceable to the goal, not become a detached configuration exercise.

Converting technical alternatives into decision support.

The Strategy Explorer compares optimization approaches through expected fit, cost, risk, duration, and infrastructure requirements.

I used a cost-versus-performance visualization to make tradeoffs legible before asking users to commit. The accompanying table preserves the detail needed by technical users, while recommendations, scoring, and contextual explanations keep the experience accessible to product leads and team members who do not specialize in model training.

Cost, risk, duration, infrastructure, and expected fit turn strategy selection into an explicit decision.

The current MVP recommends individual strategies including prompt optimization, RAG, QLoRA, full fine-tuning, and DPO. The larger product direction extends this into an Optimization Decision Engine that diagnoses the failure pattern and composes several interventions when necessary.

For example, reducing hallucinations may require RAG first, prompt optimization as a quick secondary layer, and light QLoRA only if a deeper behavioral change remains necessary. The system should also be able to explain why fine-tuning is not recommended.
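One way to picture the comparison is as a filter-then-rank step over a strategy catalog. The sketch below is an assumption about how such decision support could work, not the product's scoring logic: the `Strategy` and `Constraints` types, the `needs_gpu` stand-in for infrastructure requirements, and every number in `CATALOG` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    expected_fit: float    # 0..1, higher is better (illustrative scale)
    cost_usd: float
    risk: float            # 0..1, higher is riskier (illustrative scale)
    duration_hours: float
    needs_gpu: bool        # crude stand-in for infrastructure requirements


@dataclass(frozen=True)
class Constraints:
    budget_usd: float
    max_risk: float
    max_hours: float
    has_gpu: bool


def feasible(s: Strategy, c: Constraints) -> bool:
    """A strategy is feasible only if it satisfies every hard constraint."""
    return (
        s.cost_usd <= c.budget_usd
        and s.risk <= c.max_risk
        and s.duration_hours <= c.max_hours
        and (c.has_gpu or not s.needs_gpu)
    )


def rank(strategies: list[Strategy], c: Constraints) -> list[Strategy]:
    """Drop infeasible strategies, then sort by fit (best first), cheapest as tie-break."""
    candidates = [s for s in strategies if feasible(s, c)]
    return sorted(candidates, key=lambda s: (-s.expected_fit, s.cost_usd))


# Made-up numbers, for illustration only.
CATALOG = [
    Strategy("prompt optimization", 0.55, 20, 0.10, 2, False),
    Strategy("RAG", 0.80, 300, 0.20, 16, False),
    Strategy("QLoRA", 0.75, 900, 0.40, 24, True),
    Strategy("full fine-tuning", 0.85, 8000, 0.70, 72, True),
    Strategy("DPO", 0.70, 1500, 0.50, 36, True),
]

limits = Constraints(budget_usd=1000, max_risk=0.45, max_hours=48, has_gpu=False)
for s in rank(CATALOG, limits):
    print(f"{s.name}: fit={s.expected_fit}, cost=${s.cost_usd}")
```

Keeping the hard constraints separate from the ranking also makes it possible to explain why a strategy such as full fine-tuning was excluded, instead of silently scoring it low.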

Diagram (finetunio 02): Finetunio separates the behavioral goal from the replaceable technical interventions used to achieve it.

Making the optimization plan visible and executable.

Once a strategy is selected, the Workflow Canvas converts it into a visual pipeline.

I designed the canvas to sit between an abstract recommendation and the underlying execution system. Users can inspect data preparation, training, evaluation, and model registration as connected states rather than treating each as an unrelated tool or script.

The visual pipeline exposes how data, training, evaluation, and registration behave as one system.

The canvas maintains technical depth without presenting all of it at once. Each node can expose configuration when selected, while the main view remains focused on sequence, status, dependencies, and execution progress.
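A pipeline with sequence, status, and dependencies can be modeled as a small dependency graph. The following is a sketch under stated assumptions, not the canvas implementation: the node names mirror the stages named above, while the `PIPELINE` mapping and both helper functions are hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes it depends on.
# Node names follow the stages named above; the structure is an assumption.
PIPELINE: dict[str, set[str]] = {
    "data_preparation": set(),
    "training": {"data_preparation"},
    "evaluation": {"training"},
    "model_registration": {"evaluation"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return node names in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


def runnable(graph: dict[str, set[str]], done: set[str]) -> list[str]:
    """Nodes not yet done whose dependencies have all finished."""
    return [n for n, deps in graph.items() if n not in done and deps <= done]


print(execution_order(PIPELINE))
print(runnable(PIPELINE, done={"data_preparation"}))  # ['training']
```

The main view only needs `execution_order` and `runnable` to show sequence and progress; per-node configuration can live elsewhere and be loaded when a node is selected.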

The important decision was not to hide complexity completely. Model optimization is too consequential for a deceptive one-click abstraction. Instead, I used progressive disclosure to make the default experience understandable while preserving the parameters, validation, and execution state experts need.

Treating experiments as evidence, not disposable runs.

The Runs experience records the strategy, model, dataset, duration, status, cost, evaluation score, configuration, logs, and output artifact associated with every experiment.

Completed and failed experiments remain part of the evidence behind each model decision.

Users can inspect individual runs and compare two experiments directly. This changes the workflow from “try another configuration” into a reproducible process where teams can identify whether an improvement came from the dataset, training configuration, intervention type, or model choice.
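To make the comparison idea concrete, here is a minimal sketch of a run record and a two-run diff. The `Run` fields are a subset of those listed above; the field names, the `INPUT_FIELDS` grouping, and the sample values are assumptions for illustration rather than the product's schema.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class Run:
    run_id: str
    strategy: str
    model: str
    dataset: str
    status: str                  # e.g. "completed" or "failed"
    cost_usd: float
    eval_score: Optional[float]  # None when a run failed before evaluation
    config: dict


# Fields that describe what was tried, as opposed to what came out.
INPUT_FIELDS = ("strategy", "model", "dataset", "config")


def changed_inputs(a: Run, b: Run) -> dict[str, tuple]:
    """Return the input fields that differ between two runs, with both values."""
    da, db = asdict(a), asdict(b)
    return {f: (da[f], db[f]) for f in INPUT_FIELDS if da[f] != db[f]}


# Hypothetical runs: same strategy, model, and config, different dataset.
baseline = Run("run-001", "QLoRA", "base-model", "support-v1",
               "completed", 40.0, 0.71, {"lora_rank": 8, "epochs": 3})
candidate = Run("run-002", "QLoRA", "base-model", "support-v2",
                "completed", 42.0, 0.78, {"lora_rank": 8, "epochs": 3})

print(changed_inputs(baseline, candidate))
# {'dataset': ('support-v1', 'support-v2')}
```

When exactly one input differs, as here, the score change can be attributed to that input with more confidence than when several things changed at once.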

Evaluating behavior where humans can understand it.

A higher aggregate score does not guarantee that the model changed in the desired way.

The Evaluation Arena therefore compares the base model and optimized model side by side using real prompts. It combines relevance, accuracy, completeness, citations, human voting, flags, comments, prompt history, and win rates.

Behavioral comparison connects quantitative scores with the outputs people actually need to trust.

This makes evaluation more interpretable and creates an auditable record of why one model was preferred. It also provides the foundation for future automated behavioral evaluation, regression detection, boundary testing, and feedback loops that revise the optimization plan.
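Win rates from human voting reduce to simple counting over side-by-side judgments. This sketch assumes each vote records a prompt and a winner label; the `Vote` type, the three labels, and the `regressions` helper are hypothetical and only illustrate the idea.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    prompt_id: str
    winner: str  # "base", "optimized", or "tie"


def win_rates(votes: list[Vote]) -> dict[str, float]:
    """Share of votes for each outcome; ties are counted as their own bucket."""
    if not votes:
        return {"base": 0.0, "optimized": 0.0, "tie": 0.0}
    counts = Counter(v.winner for v in votes)
    total = len(votes)
    return {k: counts[k] / total for k in ("base", "optimized", "tie")}


def regressions(votes: list[Vote]) -> list[str]:
    """Prompts where the base model was preferred, worth reviewing by hand."""
    return sorted({v.prompt_id for v in votes if v.winner == "base"})


# Hypothetical votes on four prompts.
votes = [
    Vote("p1", "optimized"),
    Vote("p2", "optimized"),
    Vote("p3", "base"),
    Vote("p4", "tie"),
]
print(win_rates(votes))    # {'base': 0.25, 'optimized': 0.5, 'tie': 0.25}
print(regressions(votes))  # ['p3']
```

Listing the prompts where the base model won keeps the aggregate number tied to specific outputs a reviewer can open and read.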

Carrying the decision through deployment.

I designed deployment as the final stage of the same lifecycle rather than a separate administrative task.

The workflow supports Hugging Face, local inference, managed REST endpoints, SageMaker, Azure ML, and Vertex AI. Before launch, the user can review model lineage, target configuration, quantization, precision, concurrency, authentication, monitoring, latency, throughput, cost assumptions, and preflight checks.

The final review keeps model lineage, serving constraints, safeguards, and estimates visible before launch.

The underlying execution backend can change, but the human-facing decision path remains consistent. That separation between intent and execution is central to the product architecture and its long-term ability to support multiple optimization and serving systems.
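The separation between a stable decision path and a swappable backend can be sketched with a small interface. Everything here is an assumption made for illustration: the `DeploymentPlan` fields, the three preflight rules, the `ServingBackend` protocol, and the `LocalInference` class do not describe the product's real checks or any vendor API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class DeploymentPlan:
    model_id: str
    parent_run_id: str    # lineage: the run that produced this model
    quantization: str     # e.g. "int8" or "none"
    max_concurrency: int
    auth_enabled: bool


class ServingBackend(Protocol):
    """Anything that can launch a plan; targets are interchangeable behind this."""
    name: str

    def launch(self, plan: DeploymentPlan) -> str: ...


def preflight(plan: DeploymentPlan) -> list[str]:
    """Return human-readable problems; an empty list means ready to launch."""
    problems = []
    if not plan.parent_run_id:
        problems.append("missing lineage: no parent run recorded")
    if plan.max_concurrency < 1:
        problems.append("max_concurrency must be at least 1")
    if not plan.auth_enabled:
        problems.append("authentication is disabled")
    return problems


class LocalInference:
    """Toy backend that only returns a made-up endpoint string."""
    name = "local"

    def launch(self, plan: DeploymentPlan) -> str:
        return f"local://{plan.model_id}"


def deploy(plan: DeploymentPlan, backend: ServingBackend) -> str:
    """Run the same review for every backend, then hand off to it."""
    problems = preflight(plan)
    if problems:
        raise ValueError("; ".join(problems))
    return backend.launch(plan)


plan = DeploymentPlan("m1", "run-002", "int8", 4, True)
print(deploy(plan, LocalInference()))  # local://m1
```

Adding another target means writing one more class with a `launch` method; `preflight` and `deploy` stay unchanged, which is the point of keeping the review path independent of the execution system.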

Related Projects

Contextus

A context-native workspace that turns scattered AI interactions into persistent, inspectable know...

Health

Craftal

An AI-native workspace for generating, inspecting, and validating complex product workflows.

Infra & DevOps

