Metadata-Version: 2.4
Name: p53-nightdesk
Version: 0.1.0
Summary: Point 53 Nightdesk — iterative metric optimization with AI-assisted hypothesis generation.
Author: Point 53 LLC
License-Expression: MPL-2.0
License-File: LICENSE
License-File: NOTICE
License-File: THIRD_PARTY_LICENSES.md
Keywords: anthropic,autoresearch,local-llm,ollama,optimization,point53
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: httpx>=0.27.0
Requires-Dist: mcp>=1.0.0
Requires-Dist: platformdirs>=4.3.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: rich>=13.7.0
Requires-Dist: tomli>=2.0.0; python_version < '3.11'
Provides-Extra: all
Requires-Dist: anthropic>=0.40.0; extra == 'all'
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.40.0; extra == 'anthropic'
Description-Content-Type: text/markdown

# Point 53 Nightdesk

Licensed under the Mozilla Public License, v. 2.0 — see `LICENSE`.

> *"Point 53" and "Point 53 Nightdesk" are trademarks of Point 53 LLC. MPL-2.0 does not grant rights to use these marks. The bare `nightdesk` CLI name is a functional identifier, not a trademark claim on that word alone.*

A flag-based CLI for AI-driven iterative metric optimization. Describe what
you want to optimize. Nightdesk generates the project, runs experiments,
tracks scores, keeps improvements, reverts regressions, and accumulates
knowledge — across as many projects as you want.

Works for anything with a measurable number: ML training, trading strategies,
system hardening, prompt engineering, game balance, compiler flags, API
latency, infrastructure cost, scientific parameter tuning, and more.

## Seven Commands, Zero Memorization

```
nightdesk new                    Create a project from a description (AI-scaffolded)
nightdesk list                   Show all projects
nightdesk edit <name>            Chat with AI, configure, or manage lifecycle
nightdesk test <name>            Verify the project's scoring command works
nightdesk loop <name>            Run the autonomous optimization loop
nightdesk split <src> <new>      Fork a project as a new variant
nightdesk help                   Show the menu
```

That's the whole thing. No "active project" to track, no chat mode to enter,
no subcommand hierarchy. Every command takes an explicit project name.
Scriptable, agent-friendly, overnight-safe.

## Install

```bash
# Prerequisites: python >= 3.11, uv, git

# From the repo
cd point53-nightdesk
uv tool install -e ".[all]"       # Installs `nightdesk` globally
# Or editable in local venv:
uv pip install -e ".[all]"
```

### AI Setup

Nightdesk uses three independently configurable LLM roles, defined in
`~/.config/point53/nightdesk/research.toml`:

- **`new`** — `nightdesk new` scaffolding (file generation + Courier
  web research). Frontier-tier capability is helpful here.
- **`edit`** — `nightdesk edit` chat (assess issues, propose file
  edits). Capability matters here too.
- **`loop`** — `nightdesk loop` autonomous iteration. Runs many times;
  cheap/local is usually the right call.

A typical setup pairs Anthropic for `new` and `edit` with Ollama for
`loop`. The default config is Ollama for all three so a fresh install
runs fully local.

### Web research (Courier)

If `p53-courier` is installed and `[courier].enabled = true` in
`research.toml`, the LLM running in any stage can emit
`GATHER: <query>` lines mid-response to fetch live web context. The
system pauses, runs Courier, and resumes the conversation with the
results appended. Capped at `[courier].max_rounds` per turn. Per-stage
control:

```toml
[courier]
enabled = true                # master kill switch — overrides everything below
new  = "on_demand"            # GATHER: marker available during scaffolding
edit = "on_demand"            # GATHER: marker available in nightdesk edit chat
loop = "on_demand"            # GATHER: marker available per loop iteration
# Set any stage to "off" to disable Courier there without affecting other stages.
```

Defaults are on-demand for all three. The model decides per turn whether
it needs to gather; most loop iterations skip it entirely.

**Ollama (local or network, no API key):**
```bash
# Local
systemctl enable --now ollama
ollama pull gemma4:31b

# Network — set [llm.<role>].base_url in research.toml, or per-project:
nightdesk edit <project> --role loop --url http://10.0.0.3:11434 --model gemma4:31b
```

**Anthropic (cloud):**
```bash
export ANTHROPIC_API_KEY=sk-ant-...
# Or, suite-scoped:
export P53_ANTHROPIC_API_KEY=sk-ant-...
```

API keys live in environment variables only — never written to config
files. Configure roles by editing `research.toml` (the file is created
on first run with sensible defaults), or per-project via `nightdesk edit
<name> --role {edit|loop} --provider X --model Y`. The `new` role is
global-only; edit `[llm.new]` in `research.toml` directly.

## Quick Start

```bash
# 1. Create a project (interactive: gather details, AI generates files, you iterate)
nightdesk new "optimize Python web API response latency with locust"

# 2. Verify it runs
nightdesk test my-api-project

# 3. Let the AI drive the loop
nightdesk loop my-api-project --max-iterations 20
```

You'll see each iteration stream by: the AI's hypothesis, the file edit,
the experiment run, the score, keep or revert. Ctrl+C to stop early, or:

```bash
nightdesk loop my-api-project --stop     # graceful stop after current experiment
```

For overnight runs, detach it:

```bash
nightdesk loop my-api-project --detach --outfile ~/api.log
```

Wake up to a log of experiments and a better score.

## Managing Projects

```bash
# List all active projects (stacked format, full info)
nightdesk list

# Detailed view of one project (experiments, knowledge, loop history)
nightdesk list --name my-api-project

# Include hidden/archived projects
nightdesk list --all
```

**Lifecycle** (all via `edit`):

```bash
nightdesk edit my-project --hide          # soft-delete (keep files)
nightdesk edit my-project --show          # restore from hidden
nightdesk edit my-project --delete        # hard-delete (wipe files)
```

**Reconfigure LLM** (takes effect immediately). `--role` selects which
role you're updating; defaults to `loop` if omitted:

```bash
# Loop role (default — what you change most often)
nightdesk edit my-project --model qwen3.5:122b
nightdesk edit my-project --role loop --provider ollama --model qwen3.5:122b

# Edit-chat role independently
nightdesk edit my-project --role edit --provider anthropic --model claude-opus-4-7

# Different Ollama server for the loop
nightdesk edit my-project --role loop --url http://10.0.0.3:11434

# To change the `new` role, edit ~/.config/point53/nightdesk/research.toml
# directly — no project exists at scaffold time, so it's global-only.
```

**Fork** to try a different approach without losing your current progress:

```bash
nightdesk split trading-v1 trading-v2-aggressive
```

## Chatting With the AI About a Project

```bash
nightdesk edit my-project
```

Drops you into a chat session with the project's configured AI. The AI has
full context: program.md, tunable files, results history, knowledge base.
You can:

- Ask questions about results, patterns, what to try next
- Request file edits — the AI proposes, you approve
- Record insights or dead-ends to guide future iterations
- Reason about strategy, debug failures, plan experiments

The AI uses structured markers in its responses:
- `EDIT: <filename>` + code block → wrapper detects and asks to apply
- `INSIGHT: <text>` → auto-recorded in knowledge.json
- `DEAD-END: <text>` → auto-recorded in knowledge.json

Type `done`, `quit`, or press **Ctrl+D** to exit.

## Use Cases

**Machine Learning** — Tune hyperparameters, architecture choices, training
schedules. Score: validation loss, accuracy, F1.

**Trading Strategies** — Optimize entry/exit logic, position sizing, risk
parameters. Score: PnL, Sharpe ratio, win rate.

**System Hardening** — Tighten OS configs, firewall rules, kernel parameters.
Score: Lynis hardening index, CIS benchmark score.

**Prompt Engineering** — Iterate on system prompts, few-shot examples,
output formats. Score: task accuracy, response quality metric.

**Game Balance** — Tune difficulty curves, economy parameters, spawn rates.
Score: simulated completion rate, player satisfaction proxy.

**Infrastructure** — Optimize cloud configs, caching strategies, database
tuning. Score: latency p99, cost per request, throughput.

**Scientific Research** — Optimize reaction parameters, simulation configs,
experimental conditions. Score: yield, efficiency, convergence rate.

**Compiler/Build Optimization** — Tune compiler flags, build configurations,
link-time optimizations. Score: binary size, execution time, compile time.

If there's a number you want to maximize or minimize and a way to measure
it from a shell command, Nightdesk can loop on it.

## Command Reference

### `nightdesk new`
Interactive: gather details (scoring method, constraints, acceptance
criteria), AI generates all files (program.md, scoring scripts, configs,
requirements.txt), you iterate until satisfied. Auto-installs Python deps
into a project-local `.venv`.

### `nightdesk list [flags]`
| Flag | Effect |
|---|---|
| *(none)* | Active projects, stacked format |
| `--all` | Include hidden/archived (adds Status row) |
| `--name <name>` | Single project, detailed view (experiments, knowledge, loop runs) |

### `nightdesk edit <name> [flags]`
| Flag | Effect |
|---|---|
| *(none)* | Enter AI chat session, uses the project's `edit` role (Ctrl+D to exit) |
| `--hide` | Soft-delete: status → archived, files kept |
| `--show` | Un-hide a hidden project |
| `--delete` | Hard-delete: unregister + wipe files |
| `--role <edit\|loop>` | Which LLM role the next provider/model/url flags apply to (default: `loop`) |
| `--provider <ollama\|anthropic\|openai-compatible>` | Change LLM provider for the chosen role |
| `--model <name>` | Change LLM model for the chosen role |
| `--url <url>` | Change provider endpoint URL for the chosen role |

The `new` role (used by `nightdesk new`) is global; edit `[llm.new]` in
`~/.config/point53/nightdesk/research.toml` directly.

### `nightdesk test <name>`
Runs the project's scoring command, verifies the score parses correctly,
shows diagnostic suggestions on failure. Does **not** record an experiment —
this is a sanity check, not a commit.

### `nightdesk loop <name> [flags]`
| Flag | Effect |
|---|---|
| *(none)* | Foreground streaming until done or Ctrl+C |
| `--detach` | Run in background, log to `.nightdesk/loop.log` |
| `--outfile <path>` | Override log destination |
| `--stop` | Graceful halt (sentinel file) |
| `--max-iterations N` | Default 50 |
| `--stop-plateau N` | Default 5 — stop if last N experiments within 1% of best |

Auto-baselines on first run if no experiments exist yet.

### `nightdesk split <source> <new-name>`
Fork a project: copies files (skipping `.git`, `.venv`, `results.tsv`),
creates a fresh git repo, registers as a new project. Use for parallel
experiment tracks without losing the original.

## Concepts

**The Karpathy Loop:** Hypothesis → edit → commit → run → measure → keep
or revert → record → repeat. One variable at a time. No grid searches.

**Scaffolding:** AI generates everything needed for a loop from a plain
English description. Iterative: review, give feedback, regenerate until
right. Context is saved for later use.

**Registry:** SQLite database at `~/.nightdesk/registry.db` tracks all
projects and loop run history. It's an index — project directories are
always the source of truth.

**Knowledge:** Dead ends and insights accumulate in each project's
`knowledge.json`. The AI reads them before suggesting or generating
hypotheses. Never re-explores a dead end.

**Guardrails:** Bounds on metrics that prevent degenerate optimization.
Score explosion detection (>100x previous best) plus configurable min/max
metric limits. The autonomous loop auto-reverts on guardrail violations.

**Two-Pane Workflow:** Control plane (Claude Code or human, thinking side)
+ Execution plane (Nightdesk, bookkeeping side). For autonomous loops,
both merge into one process.

## Project Layout

```
your-project/
├── .nightdesk/
│   ├── config.json           # Run command, score field, LLM config
│   ├── knowledge.json        # Dead ends + insights
│   ├── scaffold_context.txt  # Saved scaffold context
│   ├── loop.log              # Background loop output (--detach only)
│   └── loop.stop             # Sentinel file (created by loop --stop)
├── .venv/                    # Auto-created for Python projects
├── program.md                # Research directions (AI-generated)
├── results.tsv               # Experiment log
├── requirements.txt          # Python deps (auto-installed)
├── <tunable files>           # The knobs — edited by loop or chat
└── .gitignore
```

## Using With Claude Code

Nightdesk and Claude Code complement each other:

- **Nightdesk** tracks experiments, enforces guardrails, manages projects,
  handles the bookkeeping.
- **Claude Code** edits files, reasons about hypotheses, runs commands,
  does the thinking.

In a split terminal, Claude Code can read project state and drive Nightdesk:

```bash
# In the Claude Code pane:
nightdesk list
nightdesk list --name my-project
nightdesk test my-project
# (reason about results, propose changes)
nightdesk edit my-project               # drop into AI chat for deep refinement
nightdesk loop my-project --detach      # hand off to autonomous mode
```

See CLAUDE.md for architecture details, the Ollama API reference, and full
integration documentation.

## License and Attribution

- **Nightdesk source code:** Mozilla Public License, v. 2.0 — see `LICENSE`.
- **Third-party runtime dependencies and external tools:** see `NOTICE` for a summary and `THIRD_PARTY_LICENSES.md` for full attribution.
- **Ollama models and Anthropic models:** not distributed with Nightdesk; each carries its own license from its respective publisher.
- **Trademarks:** "Point 53" and "Point 53 Nightdesk" are trademarks of Point 53 LLC. MPL-2.0 section 10.4 excludes trademark rights from the copyright/patent grant — nothing in the license authorizes use of these marks. The CLI command `nightdesk` is a functional identifier, not a trademark claim on the generic English word.
