multireward-grpo

0.1.0

Decoupled & conditioned multi-reward GRPO advantage estimators, a generalized trainer, and the Theorem-3 verification harness from the paper 'When and Why Decoupling and Conditioning Beat Reweighting in Multi-Reward GRPO'.

License Sources Match

The MIT license is confirmed by two independent sources, the Python registry metadata and the LICENSE file in the package source, as of June 23, 2026.

Source               License  Class
Licensie (detected)  MIT      Permissive
PyPI (reported)      MIT      Permissive
Versions
1 version
Version         License  Published     Status
0.1.0 (latest)  MIT      Jun 23, 2026  Scanned