Files in direct/ named _*.txt (e.g. _baseline.txt) contain concept-free
neutral prose: they should not be used as positive training signal,
but they are useful as shared negatives for every concept.
Previously _*.txt files were silently skipped. Now:
* they're loaded like any other description file;
* concepts (the positive label set) excludes them;
* their descriptions are concatenated into neg_pool_extra and
extended onto every concept's neg_pool alongside the cross-concept
negatives.
A concept's negative pool is thus "other concepts' descriptions +
everything from _*.txt files". The extra pool is announced at startup
so the user can see how many neutral samples are active.
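A minimal sketch of the pool construction described above, in Python.
Only concepts, neg_pool and neg_pool_extra come from this message; the
helpers load_descriptions and build_neg_pools, and the assumed file
format (one description per non-empty line, label = file stem), are
hypothetical and shown only to illustrate the logic.

```python
from pathlib import Path


def load_descriptions(direct_dir):
    """Hypothetical loader: one description per non-empty line, keyed by file stem."""
    descs = {}
    for path in sorted(Path(direct_dir).glob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        descs[path.stem] = [ln.strip() for ln in lines if ln.strip()]
    return descs


def build_neg_pools(descs):
    """Hypothetical sketch: return (concepts, {concept: neg_pool})."""
    # Positive label set: every stem that does not start with "_".
    concepts = [name for name in descs if not name.startswith("_")]
    # Neutral samples from every _*.txt file, shared by all concepts.
    neg_pool_extra = [
        d for name, ds in descs.items() if name.startswith("_") for d in ds
    ]
    # Announce the shared neutral pool once, as the startup message does.
    print(f"neutral negatives: {len(neg_pool_extra)} samples from _*.txt")
    neg_pools = {}
    for c in concepts:
        # Cross-concept negatives: descriptions of every other concept.
        neg_pool = [d for other in concepts if other != c for d in descs[other]]
        # Then the shared neutral samples.
        neg_pool.extend(neg_pool_extra)
        neg_pools[c] = neg_pool
    return concepts, neg_pools
```

With descs = {"joy": [...], "fear": [...], "_baseline": [...]}, the
label set is ["joy", "fear"], and each concept's negatives are the
other concept's descriptions followed by everything from _baseline.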
Co-Authored-By: Proof of Concept <poc@bcachefs.org>