Fix/improve token predictor test mocking #20415
Description
This PR improves the token counting mock in the test_token_predictor test by replacing the TODO comment with a proper mock implementation. The test now uses a predictable mock tokenizer function instead of relying on the real tokenizer, making the test more reliable and consistent.
Motivation
The previous test had a TODO comment indicating that token counting should be mocked more carefully. The test was relying on the real tokenizer, which could lead to inconsistent test results and dependencies on external tokenizer behavior. This change improves test reliability by providing a deterministic mock tokenizer.
Changes
- Added a mock_tokenizer_fn function that splits text by spaces and newlines for predictable token counting
- Patched get_tokenizer using the @patch decorator to return the mock function

Fixes # (issue) - N/A (test improvement, no issue filed)
New Package?
Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?
Version Bump?
Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)
No - the change is a test in llama-index-core, so no version bump is needed.
Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Your pull-request will likely not be merged unless it is covered by some form of impactful unit testing.
Note: This change improves an existing test (test_token_predictor) by adding proper mocking. The test itself validates that the token predictor works correctly with the mocked tokenizer. The improvement ensures the test is deterministic and no longer depends on the real tokenizer's behavior.
Suggested Checklist:
- I ran uv run make format; uv run make lint to appease the lint gods (should be run before final submission)
Additional Notes
- The mock tokenizer (mock_tokenizer_fn) splits text by spaces and newlines, providing a simple but effective tokenization strategy for testing
- The mock is applied at the get_tokenizer level, ensuring all code paths that use tokenization will use the mock
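A minimal sketch of what the notes above describe, assuming the mock is wired in via unittest.mock.patch. The patch target module path and the test body shown in comments are illustrative assumptions, not the actual PR diff:

```python
import re
from unittest.mock import patch  # how the PR applies the mock via @patch

def mock_tokenizer_fn(text: str) -> list[str]:
    # Deterministic "tokenizer": split on spaces and newlines only,
    # dropping empty strings so runs of separators don't inflate counts.
    return [token for token in re.split(r"[ \n]+", text) if token]

# Illustrative patch target -- the real dotted path comes from the PR diff:
# @patch("llama_index.core.utilities.token_counting.get_tokenizer")
# def test_token_predictor(mock_get_tokenizer):
#     mock_get_tokenizer.return_value = mock_tokenizer_fn
#     ...  # assertions against predictable token counts

print(mock_tokenizer_fn("hello world\nfoo bar"))  # ['hello', 'world', 'foo', 'bar']
```

Because the token count of any input is now simply its whitespace-separated word count, the test's expected values stay stable across tokenizer library versions.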