
Conversation

@google-labs-jules
Contributor

Implemented robustness features for DSPy inference, including retry logic with exponential backoff, fallback to a secondary model, handling of rate limits and timeouts, and configurable request timeouts. Added unit tests to verify these behaviors.


PR created automatically by Jules for task 10722760892984632640 started by @Miyamura80

…eout

- Update `LlmConfig` and `DefaultLlm` in `common/config_models.py` to support `fallback_model` and `default_request_timeout`.
- Update `common/global_config.yaml` with default values.
- Refactor `utils/llm/dspy_inference.py` (a sketch follows this list) to:
    - Implement retry logic for `ServiceUnavailableError`, `RateLimitError`, and `Timeout` using `tenacity` with exponential backoff.
    - Implement fallback to a secondary model if the primary fails.
    - Pass `timeout` to the LLM provider.
- Add `tests/utils/llm/test_dspy_inference_robustness.py` to verify retry and fallback mechanisms.
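A minimal sketch of the retry-and-fallback flow described above; the method names (`_run_inference`, `inference_module_async`) and attributes (`lm`, `fallback_lm`) are taken from the review discussion below and may not match the actual code exactly:

```python
# Sketch only, not the actual implementation in utils/llm/dspy_inference.py.
import logging

import dspy
from litellm.exceptions import RateLimitError, ServiceUnavailableError, Timeout
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)

log = logging.getLogger(__name__)

TRANSIENT_ERRORS = (ServiceUnavailableError, RateLimitError, Timeout)


class DSPYInference:
    @retry(
        retry=retry_if_exception_type(TRANSIENT_ERRORS),
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=1, max=5),
        reraise=True,
    )
    async def _run_inference(self, lm, **kwargs):
        # Both the primary and the fallback LM go through this method,
        # so both benefit from the retry decorator.
        with dspy.context(lm=lm):
            return await self.inference_module_async(**kwargs)

    async def run(self, **kwargs):
        try:
            return await self._run_inference(lm=self.lm, **kwargs)
        except TRANSIENT_ERRORS:
            if getattr(self, "fallback_lm", None) is None:
                log.error("Error in run: primary model failed and no fallback is configured")
                raise
            log.warning("Primary model exhausted retries; switching to fallback")
            return await self._run_inference(lm=self.fallback_lm, **kwargs)
```

With `reraise=True`, the original provider error propagates once the attempts are exhausted, which is what lets `run()` decide whether to switch to the fallback model.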
@google-labs-jules
Contributor Author

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@Miyamura80 marked this pull request as ready for review January 17, 2026 19:57
@Miyamura80
Owner

@greptileai

@greptile-apps

greptile-apps bot commented Jan 17, 2026

Greptile Summary

This PR adds robustness features to DSPy inference: retry logic with exponential backoff for RateLimitError, Timeout, and ServiceUnavailableError; fallback model support; and configurable request timeouts.

  • Extended DefaultLlm config with fallback_model (optional) and default_request_timeout (300s default); a config sketch follows this list
  • Modified DSPYInference to initialize fallback LM and pass timeout to dspy.LM instances
  • Implemented _run_inference helper with @retry decorator supporting 3 retry attempts with exponential backoff (1-5s)
  • Added fallback logic in run() method that switches to fallback model if primary exhausts retries
  • Fallback model also benefits from retry decorator since it goes through _run_inference
  • Added comprehensive unit tests for retry, timeout, and fallback scenarios
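The first two bullets translate into a config surface roughly like the following sketch; the pydantic base class and the assumption that `dspy.LM` forwards `timeout` to the underlying provider are not confirmed by this summary:

```python
# Sketch of the extended config model and LM construction; the 300s default
# matches the summary above, everything else is assumed.
from typing import Optional

import dspy
from pydantic import BaseModel


class DefaultLlm(BaseModel):
    model: str
    fallback_model: Optional[str] = None  # optional secondary model
    default_request_timeout: int = 300    # seconds


def build_lms(cfg: DefaultLlm):
    # The primary LM always gets the configured request timeout; a fallback
    # LM is only constructed when a fallback model is configured.
    primary = dspy.LM(cfg.model, timeout=cfg.default_request_timeout)
    fallback = (
        dspy.LM(cfg.fallback_model, timeout=cfg.default_request_timeout)
        if cfg.fallback_model
        else None
    )
    return primary, fallback
```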

Critical Issue: Tests mock inference_module_async directly, bypassing the actual retry decorator on _run_inference. This means the tests don't verify the exponential backoff, wait times, or actual tenacity retry behavior; they only verify that the mocks work. Tests should mock _run_inference instead to test the real retry logic.
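For the fallback path specifically, a test along the lines the review recommends could look like the sketch below. The constructor call and the litellm exception signature are assumptions, and mocking `_run_inference` exercises `run()`'s fallback switch rather than the tenacity decorator itself:

```python
import asyncio
from unittest.mock import AsyncMock

import dspy
from litellm.exceptions import ServiceUnavailableError
from utils.llm.dspy_inference import DSPYInference


def test_falls_back_after_primary_exhausts_retries():
    inference = DSPYInference()  # assumed default construction; real signature may differ
    # First call (primary) raises as if retries were exhausted; the second
    # call (fallback) succeeds.
    inference._run_inference = AsyncMock(side_effect=[
        ServiceUnavailableError(message="unavailable", llm_provider="openai", model="primary"),
        dspy.Prediction(output="Fallback success"),
    ])
    result = asyncio.run(inference.run(prompt="hi"))
    assert result.output == "Fallback success"
    assert inference._run_inference.call_count == 2
```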

Missing: __init__.py files in tests/utils/ and tests/utils/llm/ directories per project test standards.

Confidence Score: 3/5

  • Safe to merge once tests are improved: the implementation is solid, but the tests don't validate actual retry behavior
  • Core implementation correctly adds retry and fallback logic with proper exception handling. However, the tests bypass the retry decorator by mocking at the wrong level, the missing __init__.py files violate project standards, and the exponential backoff timing is never verified. The implementation works, but the test coverage is superficial.
  • Pay close attention to tests/utils/llm/test_dspy_inference_robustness.py - tests need to mock _run_inference instead of inference_module_async to properly verify retry behavior

Important Files Changed

| Filename | Overview |
| --- | --- |
| `utils/llm/dspy_inference.py` | Added retry logic for RateLimitError and Timeout, fallback model support, and request timeout configuration. Logic issue found with retry decorator placement. |
| `tests/utils/llm/test_dspy_inference_robustness.py` | New test file for robustness features. Missing `__init__.py` files in the test directory structure, and tests bypass the retry decorator. |
| `common/config_models.py` | Added fallback_model and default_request_timeout fields to the DefaultLlm model. Clean implementation. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant DSPYInference
    participant _run_inference
    participant RetryDecorator
    participant PrimaryLM
    participant FallbackLM

    Client->>DSPYInference: run(**kwargs)
    DSPYInference->>_run_inference: _run_inference(lm=primary, **kwargs)
    _run_inference->>RetryDecorator: Enter retry logic
    
    alt Primary Success
        RetryDecorator->>PrimaryLM: inference_module_async()
        PrimaryLM-->>RetryDecorator: Success
        RetryDecorator-->>_run_inference: Result
        _run_inference-->>DSPYInference: Result
        DSPYInference-->>Client: Result
    else Primary Fails (RateLimitError/Timeout/ServiceUnavailable)
        loop Retry max_attempts (3) times
            RetryDecorator->>PrimaryLM: inference_module_async()
            PrimaryLM-->>RetryDecorator: Error
            RetryDecorator->>RetryDecorator: Wait exponentially (1s, 2s, 4s...)
        end
        RetryDecorator-->>_run_inference: Exception after retries
        _run_inference-->>DSPYInference: Exception
        
        alt Fallback Model Available
            DSPYInference->>DSPYInference: log.warning("Switching to fallback")
            DSPYInference->>_run_inference: _run_inference(lm=fallback, **kwargs)
            _run_inference->>RetryDecorator: Enter retry logic
            
            alt Fallback Success
                RetryDecorator->>FallbackLM: inference_module_async()
                FallbackLM-->>RetryDecorator: Success
                RetryDecorator-->>_run_inference: Result
                _run_inference-->>DSPYInference: Result
                DSPYInference-->>Client: Result
            else Fallback Also Fails
                loop Retry max_attempts (3) times
                    RetryDecorator->>FallbackLM: inference_module_async()
                    FallbackLM-->>RetryDecorator: Error
                    RetryDecorator->>RetryDecorator: Wait exponentially
                end
                RetryDecorator-->>_run_inference: Exception
                _run_inference-->>DSPYInference: Exception
                DSPYInference->>DSPYInference: log.error("Fallback failed")
                DSPYInference-->>Client: Raise Exception
            end
        else No Fallback Model
            DSPYInference->>DSPYInference: log.error("Error in run")
            DSPYInference-->>Client: Raise Exception
        end
    end
```


greptile-apps bot left a comment


7 files reviewed, 5 comments


Comment on lines +1 to +7 of tests/utils/llm/test_dspy_inference_robustness.py
```python
import asyncio
from unittest.mock import AsyncMock, patch
from tests.test_template import TestTemplate
from utils.llm.dspy_inference import DSPYInference
from litellm.exceptions import RateLimitError, ServiceUnavailableError, Timeout
import dspy
```

syntax: Missing __init__.py files in tests/utils/ and tests/utils/llm/ directories. Per .cursor/rules/tests.mdc, new test directories require __init__.py files so Python can recognize them.

Create:

  • tests/utils/__init__.py
  • tests/utils/llm/__init__.py

Context Used: Context from dashboard - On how to write tests (source)


Comment on line 33 of tests/utils/llm/test_dspy_inference_robustness.py

```python
    error,
    dspy.Prediction(output="Success")
])
inference.inference_module_async = mock_method
```

logic: Test bypasses the actual retry logic by mocking inference_module_async directly. The @retry decorator on _run_inference is never executed since you're replacing the method it calls.

Mock _run_inference instead to test the actual retry behavior:

```python
mock_method = AsyncMock(side_effect=[
    error,
    dspy.Prediction(output="Success")
])
inference._run_inference = mock_method
```

Context Used: Context from dashboard - On how to write tests (source)


Comment on line 59 of tests/utils/llm/test_dspy_inference_robustness.py

```python
    error,
    dspy.Prediction(output="Success")
])
inference.inference_module_async = mock_method
```

logic: Same issue - test bypasses retry logic. Mock _run_inference instead of inference_module_async to test actual retry behavior with exponential backoff.

Context Used: Context from dashboard - On how to write tests (source)


Comment on line 92 of tests/utils/llm/test_dspy_inference_robustness.py

```python
else:
    raise ValueError(f"Unknown model: {lm.model}")

inference.inference_module_async = AsyncMock(side_effect=side_effect)
```

logic: Mocking inference_module_async bypasses both retry and fallback logic. Mock _run_inference to properly test the retry-then-fallback flow.

Context Used: Context from dashboard - On how to write tests (source)


Comment on line 129 of tests/utils/llm/test_dspy_inference_robustness.py

```python
else:
    raise ValueError(f"Unknown model: {lm.model}")

inference.inference_module_async = AsyncMock(side_effect=side_effect)
```

logic: Same mocking issue. Test should mock _run_inference to verify the actual retry and fallback behavior.

Context Used: Context from dashboard - On how to write tests (source)

