Metadata-Version: 2.4
Name: p53-collector
Version: 0.1.0
Summary: Structured retrieval, summarization, and classification of data through RSS feeds and web pages
Author: Point 53 LLC
License-Expression: MPL-2.0
License-File: LICENSE
License-File: NOTICE
License-File: THIRD_PARTY_LICENSES.md
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Text Processing :: Markup :: Markdown
Requires-Python: >=3.10
Requires-Dist: anthropic>=0.75.0
Requires-Dist: click>=8.3.1
Requires-Dist: dateparser>=1.2.2
Requires-Dist: feedparser>=6.0.12
Requires-Dist: httpx>=0.27
Requires-Dist: mcp>=1.0.0
Requires-Dist: ollama>=0.6.1
Requires-Dist: pdfplumber>=0.11
Requires-Dist: platformdirs>=4.0
Requires-Dist: protobuf>=6.33.2
Requires-Dist: pydantic>=2.11.0
Requires-Dist: selenium>=4.39.0
Requires-Dist: tomli>=2.0; python_version < '3.11'
Requires-Dist: webdriver-manager>=4.0.2
Description-Content-Type: text/markdown

# Collector

**Your sources. Your summaries. Your machine.**

Collector is a CLI tool that pulls articles from RSS feeds and web pages, summarizes them using local AI, stores everything in a local SQLite database, and outputs curated Markdown digests on demand. No cloud accounts required. No data leaves your network unless you choose to let it.

It is part of the Point 53 Suite — open source software built on the belief that the tools of intelligence work belong in the hands of individuals, not institutions.

> **License:** [Mozilla Public License 2.0](https://mozilla.org/MPL/2.0/) — see `LICENSE`, `NOTICE`, and `THIRD_PARTY_LICENSES.md` for full details.

## Why Collector

The volume of published information is enormous and still growing. Staying informed across dozens of sources takes time that most people don't have. Collector narrows the firehose into a structured, searchable, AI-summarized feed that you control end to end.

- **Local-first AI**: Summaries are generated on your hardware via [Ollama](https://ollama.com). Your reading habits, your interests, and the content itself never touch a third-party server by default.
- **Vendor flexibility**: Swap between Ollama models freely, or switch to Anthropic's API when it makes sense. The choice is always yours.
- **Structured output**: Every article is categorized, timestamped, and queryable. Filter by date, category, or keyword. Output to Markdown or pipe to stdout.
- **Chat with your news**: After generating a digest, open an interactive chat session to ask questions about the articles using the same local models.
- **PDF-aware**: Collector can download, scan, and summarize linked PDFs directly from feeds like arXiv.
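
To make the "structured, queryable" idea concrete, here is a minimal sketch of storing summarized articles in SQLite and rendering a filtered Markdown digest. The table layout, column names, and the functions `store_article` and `render_digest` are all hypothetical; Collector's actual schema and internals may look quite different.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema for illustration only; not Collector's real tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    url TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    category TEXT NOT NULL,
    summary TEXT NOT NULL,
    fetched_at TEXT NOT NULL
)
"""

def store_article(conn, url, title, category, summary):
    """Insert one summarized article, skipping URLs that are already stored."""
    conn.execute(
        "INSERT OR IGNORE INTO articles VALUES (?, ?, ?, ?, ?)",
        (url, title, category, summary, datetime.now(timezone.utc).isoformat()),
    )

def render_digest(conn, category=None, keyword=None):
    """Build a Markdown digest, optionally filtered by category and keyword."""
    query = "SELECT title, url, category, summary FROM articles WHERE 1=1"
    params = []
    if category:
        query += " AND category = ?"
        params.append(category)
    if keyword:
        # SQLite's LIKE is case-insensitive for ASCII text.
        query += " AND (title LIKE ? OR summary LIKE ?)"
        params += [f"%{keyword}%"] * 2
    query += " ORDER BY category, title"
    lines = ["# Digest"]
    for title, url, cat, summary in conn.execute(query, params):
        lines.append(f"\n## [{title}]({url})\n\n*{cat}*\n\n{summary}")
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
store_article(conn, "https://example.com/a", "Patch Tuesday",
              "Cyber Security", "A zero day was fixed.")
store_article(conn, "https://example.com/b", "New telescope",
              "Science", "First light images released.")
digest = render_digest(conn, category="Cyber Security", keyword="zero day")
print(digest)
```

Keying the table on the URL is one simple way to make repeated updates idempotent; whether Collector deduplicates this way is an assumption.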

## Install

Collector uses [uv](https://docs.astral.sh/uv/) for dependency management and distribution.

### Linux / macOS

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install Collector as a global CLI tool
uv tool install git+https://github.com/p53systems/p53s_collector

# Or clone and run directly
git clone https://github.com/p53systems/p53s_collector.git
cd p53s_collector
uv run p53s-collector --help
```

### Windows

```powershell
# Install uv
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Install Collector as a global CLI tool
uv tool install git+https://github.com/p53systems/p53s_collector

# Or clone and run directly
git clone https://github.com/p53systems/p53s_collector.git
cd p53s_collector
uv run p53s-collector --help
```

### Prerequisites

- **Python 3.10+** (uv will manage this for you)
- **Ollama** running on your network with at least one model pulled (e.g. `ollama pull cogito:14b`)
- **Chrome or Firefox** installed (Selenium drives the browser to scrape article content)
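
Before the first run, it can help to confirm that Ollama is reachable and has a model pulled. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models. The base URL `http://localhost:11434` is Ollama's default and is an assumption here; substitute the address of your own instance. The function names are illustrative and not part of Collector.

```python
import json
import urllib.request

def parse_model_names(payload: bytes) -> list[str]:
    """Extract model names from an Ollama /api/tags JSON response."""
    data = json.loads(payload)
    return [model["name"] for model in data.get("models", [])]

def list_ollama_models(base_url: str = "http://localhost:11434",
                       timeout: float = 5.0) -> list[str]:
    """Return the models pulled on an Ollama server.

    Raises OSError (including urllib.error.URLError) if the server
    cannot be reached within the timeout.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(resp.read())
```

Calling `list_ollama_models()` should return a list that includes the model you pulled, such as `cogito:14b`; an empty list suggests no model has been pulled yet.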

## Quick Start

1. Edit `config.json` to point `ollama-location` at your Ollama instance and configure your feeds.

2. Fetch and summarize new articles:
   
   ```bash
   p53s-collector --update
   ```

3. Generate a Markdown digest of unread articles:
   
   ```bash
   p53s-collector --dstl
   ```

4. Search and filter:
   
   ```bash
   p53s-collector --dstl --category "Cyber Security" --string_search "zero day"
   ```

5. Chat about what you've read:
   
   ```bash
   p53s-collector --dstl --chat "What are the most critical vulnerabilities this week?"
   ```
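
The steps above can also be scripted, for example from a scheduled job. This is a rough sketch that assumes `p53s-collector` is on your `PATH` and exits with a non-zero status on failure. It uses only the flags shown in the steps above; whether `--category`, `--string_search`, and `--chat` can all be combined in one call is an assumption, and the helper names are hypothetical.

```python
import subprocess

def build_digest_command(category=None, search=None, chat=None):
    """Assemble an argument list from the flags shown in the Quick Start."""
    cmd = ["p53s-collector", "--dstl"]
    if category:
        cmd += ["--category", category]
    if search:
        cmd += ["--string_search", search]
    if chat:
        cmd += ["--chat", chat]
    return cmd

def update_then_digest(**filters):
    """Fetch new articles, then generate a digest; raises if a step fails."""
    subprocess.run(["p53s-collector", "--update"], check=True)
    subprocess.run(build_digest_command(**filters), check=True)

# Show the command that would run, without executing it.
print(build_digest_command(category="Cyber Security", search="zero day"))
```

Passing the arguments as a list avoids shell quoting problems with values that contain spaces, such as `"Cyber Security"`.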

## Configuration

All settings live in `config.json`. Key sections:

| Section            | Purpose                                                                                   |
| ------------------ | ----------------------------------------------------------------------------------------- |
| `webdriver`        | Browser choice (`chrome`, `firefox`, `undetected-chrome`), timeouts, certificate handling |
| `update.feeds`     | List of RSS and SEARCH sources with categories and scraping hints                         |
| `update.blocklist` | URLs to skip during updates                                                               |
| `generative-ai`    | Ollama server location, model selections for summarization / search / chat, context sizes |
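
A quick sanity check of `config.json` can catch a missing section before an update run. The sketch below only verifies that the sections named in the table exist. It assumes that `update.feeds` and `update.blocklist` are keys nested under a top-level `update` object, which the dotted names suggest but the table does not confirm. The helper names are illustrative, not part of Collector.

```python
import json
from pathlib import Path

# Section names come from the table above; the nesting is an assumption.
REQUIRED_PATHS = [
    ("webdriver",),
    ("update", "feeds"),
    ("update", "blocklist"),
    ("generative-ai",),
]

def missing_sections(config: dict) -> list[str]:
    """Return the dotted names of required sections absent from config."""
    missing = []
    for path in REQUIRED_PATHS:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing

def check_config_file(path="config.json") -> list[str]:
    """Load a config file and report which required sections are missing."""
    return missing_sections(json.loads(Path(path).read_text(encoding="utf-8")))

# A sample config that lacks update.blocklist.
sample = {"webdriver": {}, "update": {"feeds": []}, "generative-ai": {}}
print(missing_sections(sample))  # ['update.blocklist']
```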

## The Suite

Collector is one of six tools in the Point 53 Suite:

| Tool          | Purpose                                                              |
| ------------- | -------------------------------------------------------------------- |
| **Collector** | RSS/web aggregation, AI-summarized Markdown briefings                |
| **Courier**   | Headless/non-headless search agent with configurable depth and scope |
| **Intercept** | Desktop + mic audio capture, transcription, and structured notes     |
| **Handler**   | Orchestrator + REPL/TUI + optional web UI; aggregates MCP servers    |
| **Nightdesk** | Iterative overnight autoresearch engine with pluggable backends      |
| **Monitor**   | iOS/Android call screener (coming soon)                              |

All projects are written in Python and released under the Mozilla Public License 2.0.

## License and Attribution

Point 53 Collector is licensed under the [Mozilla Public License 2.0](https://mozilla.org/MPL/2.0/).

- `LICENSE` — full MPL-2.0 text
- `NOTICE` — copyright and trademark notice, third-party summary
- `THIRD_PARTY_LICENSES.md` — per-dependency attribution and obligations

"Point 53" and "Point 53 Collector" are trademarks of Point 53 LLC. The bare
word "collector" is a functional identifier, not a trademark claim.
