This PR includes:
– A Jamba modeling file that supports Jamba variations
– Mamba cache management (which also benefits other Mamba-based models)
– Passing more request-related properties to the forward pass for model-specific implementations (including properties needed for speculative decoding)
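
For readers unfamiliar with why Mamba layers need their own cache management: each sequence carries a fixed-size recurrent state rather than a key/value cache that grows with sequence length, so one plausible approach is to give every in-flight request a slot in a preallocated pool and recycle the slot when the request finishes. The sketch below illustrates only that bookkeeping. `MambaSlotCache`, `acquire`, and `release` are hypothetical names chosen for illustration, not the classes or methods added in this PR.

```python
from typing import Dict, List


class MambaSlotCache:
    """Hypothetical fixed-size pool of per-request Mamba state slots."""

    def __init__(self, max_slots: int) -> None:
        # Slot indices that are currently unused.
        self.free_slots: List[int] = list(range(max_slots))
        # Mapping from request id to the slot holding its state.
        self.slot_of: Dict[str, int] = {}

    def acquire(self, request_id: str) -> int:
        """Return the slot for a request, allocating one on first use."""
        if request_id in self.slot_of:
            return self.slot_of[request_id]
        if not self.free_slots:
            raise RuntimeError("no free Mamba state slots")
        slot = self.free_slots.pop()
        self.slot_of[request_id] = slot
        return slot

    def release(self, request_id: str) -> None:
        """Free the slot of a finished request so it can be reused."""
        slot = self.slot_of.pop(request_id, None)
        if slot is not None:
            self.free_slots.append(slot)
```

In a real engine the slot index would address rows of preallocated state tensors, and `release` would be called when the scheduler reports a request as finished. Both details are assumptions here.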
Jamba Modeling and Mamba Cache Management Updates