
Conversation

@google-labs-jules
Contributor

Implemented robustness features for DSPy inference, including retry logic with exponential backoff, fallback to a secondary model, handling of rate limits and timeouts, and configurable request timeouts. Added unit tests to verify these behaviors.


PR created automatically by Jules for task 10722760892984632640 started by @Miyamura80

…eout

- Update `LlmConfig` and `DefaultLlm` in `common/config_models.py` to support `fallback_model` and `default_request_timeout`.
- Update `common/global_config.yaml` with default values.
- Refactor `utils/llm/dspy_inference.py` (a sketch follows this list) to:
    - Implement retry logic for `ServiceUnavailableError`, `RateLimitError`, and `Timeout` using `tenacity` with exponential backoff.
    - Implement fallback to a secondary model if the primary fails.
    - Pass `timeout` to the LLM provider.
- Add `tests/utils/llm/test_dspy_inference_robustness.py` to verify retry and fallback mechanisms.
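A minimal sketch of the retry-and-fallback flow described above; the method names (`_run_inference`, `inference_module_async`) and attributes (`lm`, `fallback_lm`) are taken from the review discussion below and may not match the actual code exactly:

```python
# Sketch only, not the actual implementation in utils/llm/dspy_inference.py.
import logging

import dspy
from litellm.exceptions import RateLimitError, ServiceUnavailableError, Timeout
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)

log = logging.getLogger(__name__)

TRANSIENT_ERRORS = (ServiceUnavailableError, RateLimitError, Timeout)


class DSPYInference:
    @retry(
        retry=retry_if_exception_type(TRANSIENT_ERRORS),
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=1, max=5),
        reraise=True,
    )
    async def _run_inference(self, lm, **kwargs):
        # Both the primary and the fallback LM go through this method,
        # so both benefit from the retry decorator.
        with dspy.context(lm=lm):
            return await self.inference_module_async(**kwargs)

    async def run(self, **kwargs):
        try:
            return await self._run_inference(lm=self.lm, **kwargs)
        except TRANSIENT_ERRORS:
            if getattr(self, "fallback_lm", None) is None:
                log.error("Error in run: primary model failed and no fallback is configured")
                raise
            log.warning("Primary model exhausted retries; switching to fallback")
            return await self._run_inference(lm=self.fallback_lm, **kwargs)
```

With `reraise=True`, the original provider error propagates once the attempts are exhausted, which is what lets `run()` decide whether to switch to the fallback model.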
@google-labs-jules
Contributor Author

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@Miyamura80 marked this pull request as ready for review January 17, 2026 19:57
@Miyamura80
Owner

@greptileai

@greptile-apps

greptile-apps bot commented Jan 17, 2026

Greptile Summary

This PR adds robustness features to DSPy inference: retry logic with exponential backoff for RateLimitError, Timeout, and ServiceUnavailableError; fallback model support; and configurable request timeouts.

  • Extended DefaultLlm config with fallback_model (optional) and default_request_timeout (300s default); a config sketch follows this list
  • Modified DSPYInference to initialize fallback LM and pass timeout to dspy.LM instances
  • Implemented _run_inference helper with @retry decorator supporting 3 retry attempts with exponential backoff (1-5s)
  • Added fallback logic in run() method that switches to fallback model if primary exhausts retries
  • Fallback model also benefits from retry decorator since it goes through _run_inference
  • Added comprehensive unit tests for retry, timeout, and fallback scenarios
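The first two bullets translate into a config surface roughly like the following sketch; the pydantic base class and the assumption that `dspy.LM` forwards `timeout` to the underlying provider are not confirmed by this summary:

```python
# Sketch of the extended config model and LM construction; the 300s default
# matches the summary above, everything else is assumed.
from typing import Optional

import dspy
from pydantic import BaseModel


class DefaultLlm(BaseModel):
    model: str
    fallback_model: Optional[str] = None  # optional secondary model
    default_request_timeout: int = 300    # seconds


def build_lms(cfg: DefaultLlm):
    # The primary LM always gets the configured request timeout; a fallback
    # LM is only constructed when a fallback model is configured.
    primary = dspy.LM(cfg.model, timeout=cfg.default_request_timeout)
    fallback = (
        dspy.LM(cfg.fallback_model, timeout=cfg.default_request_timeout)
        if cfg.fallback_model
        else None
    )
    return primary, fallback
```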

Critical Issue: Tests mock inference_module_async directly, bypassing the actual retry decorator on _run_inference. This means the tests don't verify the exponential backoff, wait times, or actual tenacity retry behavior; they only verify that the mocks work. Tests should mock _run_inference instead to test the real retry logic.
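For the fallback path specifically, a test along the lines the review recommends could look like the sketch below. The constructor call and the litellm exception signature are assumptions, and mocking `_run_inference` exercises `run()`'s fallback switch rather than the tenacity decorator itself:

```python
import asyncio
from unittest.mock import AsyncMock

import dspy
from litellm.exceptions import ServiceUnavailableError
from utils.llm.dspy_inference import DSPYInference


def test_falls_back_after_primary_exhausts_retries():
    inference = DSPYInference()  # assumed default construction; real signature may differ
    # First call (primary) raises as if retries were exhausted; the second
    # call (fallback) succeeds.
    inference._run_inference = AsyncMock(side_effect=[
        ServiceUnavailableError(message="unavailable", llm_provider="openai", model="primary"),
        dspy.Prediction(output="Fallback success"),
    ])
    result = asyncio.run(inference.run(prompt="hi"))
    assert result.output == "Fallback success"
    assert inference._run_inference.call_count == 2
```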

Missing: __init__.py files in tests/utils/ and tests/utils/llm/ directories per project test standards.

Confidence Score: 3/5

  • Safe to merge once tests are improved: the implementation is solid, but the tests don't validate actual retry behavior
  • Core implementation correctly adds retry and fallback logic with proper exception handling. However, the tests bypass the retry decorator by mocking at the wrong level, the missing __init__.py files violate project standards, and the exponential backoff timing is never verified. The implementation works, but the test coverage is superficial.
  • Pay close attention to tests/utils/llm/test_dspy_inference_robustness.py - tests need to mock _run_inference instead of inference_module_async to properly verify retry behavior

Important Files Changed

| Filename | Overview |
| --- | --- |
| `utils/llm/dspy_inference.py` | Added retry logic for RateLimitError and Timeout, fallback model support, and request timeout configuration. Logic issue found with retry decorator placement. |
| `tests/utils/llm/test_dspy_inference_robustness.py` | New test file for robustness features. Missing `__init__.py` files in the test directory structure, and tests bypass the retry decorator. |
| `common/config_models.py` | Added fallback_model and default_request_timeout fields to the DefaultLlm model. Clean implementation. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant DSPYInference
    participant _run_inference
    participant RetryDecorator
    participant PrimaryLM
    participant FallbackLM

    Client->>DSPYInference: run(**kwargs)
    DSPYInference->>_run_inference: _run_inference(lm=primary, **kwargs)
    _run_inference->>RetryDecorator: Enter retry logic
    
    alt Primary Success
        RetryDecorator->>PrimaryLM: inference_module_async()
        PrimaryLM-->>RetryDecorator: Success
        RetryDecorator-->>_run_inference: Result
        _run_inference-->>DSPYInference: Result
        DSPYInference-->>Client: Result
    else Primary Fails (RateLimitError/Timeout/ServiceUnavailable)
        loop Retry max_attempts (3) times
            RetryDecorator->>PrimaryLM: inference_module_async()
            PrimaryLM-->>RetryDecorator: Error
            RetryDecorator->>RetryDecorator: Wait exponentially (1s, 2s, 4s...)
        end
        RetryDecorator-->>_run_inference: Exception after retries
        _run_inference-->>DSPYInference: Exception
        
        alt Fallback Model Available
            DSPYInference->>DSPYInference: log.warning("Switching to fallback")
            DSPYInference->>_run_inference: _run_inference(lm=fallback, **kwargs)
            _run_inference->>RetryDecorator: Enter retry logic
            
            alt Fallback Success
                RetryDecorator->>FallbackLM: inference_module_async()
                FallbackLM-->>RetryDecorator: Success
                RetryDecorator-->>_run_inference: Result
                _run_inference-->>DSPYInference: Result
                DSPYInference-->>Client: Result
            else Fallback Also Fails
                loop Retry max_attempts (3) times
                    RetryDecorator->>FallbackLM: inference_module_async()
                    FallbackLM-->>RetryDecorator: Error
                    RetryDecorator->>RetryDecorator: Wait exponentially
                end
                RetryDecorator-->>_run_inference: Exception
                _run_inference-->>DSPYInference: Exception
                DSPYInference->>DSPYInference: log.error("Fallback failed")
                DSPYInference-->>Client: Raise Exception
            end
        else No Fallback Model
            DSPYInference->>DSPYInference: log.error("Error in run")
            DSPYInference-->>Client: Raise Exception
        end
    end
```


greptile-apps bot left a comment


7 files reviewed, 5 comments


Comment on lines +1 to +7 of tests/utils/llm/test_dspy_inference_robustness.py
```python
import asyncio
from unittest.mock import AsyncMock, patch
from tests.test_template import TestTemplate
from utils.llm.dspy_inference import DSPYInference
from litellm.exceptions import RateLimitError, ServiceUnavailableError, Timeout
import dspy
```

syntax: Missing __init__.py files in tests/utils/ and tests/utils/llm/ directories. Per .cursor/rules/tests.mdc, new test directories require __init__.py files so Python can recognize them.

Create:

  • tests/utils/__init__.py
  • tests/utils/llm/__init__.py

Context Used: Context from dashboard - On how to write tests (source)


Comment on line 33 of tests/utils/llm/test_dspy_inference_robustness.py

```python
    error,
    dspy.Prediction(output="Success")
])
inference.inference_module_async = mock_method
```

logic: Test bypasses the actual retry logic by mocking inference_module_async directly. The @retry decorator on _run_inference is never executed since you're replacing the method it calls.

Mock _run_inference instead to test the actual retry behavior:

```python
mock_method = AsyncMock(side_effect=[
    error,
    dspy.Prediction(output="Success")
])
inference._run_inference = mock_method
```

Context Used: Context from dashboard - On how to write tests (source)


Comment on line 59 of tests/utils/llm/test_dspy_inference_robustness.py

```python
    error,
    dspy.Prediction(output="Success")
])
inference.inference_module_async = mock_method
```

logic: Same issue - test bypasses retry logic. Mock _run_inference instead of inference_module_async to test actual retry behavior with exponential backoff.

Context Used: Context from dashboard - On how to write tests (source)


Comment on line 92 of tests/utils/llm/test_dspy_inference_robustness.py

```python
else:
    raise ValueError(f"Unknown model: {lm.model}")

inference.inference_module_async = AsyncMock(side_effect=side_effect)
```

logic: Mocking inference_module_async bypasses both retry and fallback logic. Mock _run_inference to properly test the retry-then-fallback flow.

Context Used: Context from dashboard - On how to write tests (source)


Comment on line 129 of tests/utils/llm/test_dspy_inference_robustness.py

```python
else:
    raise ValueError(f"Unknown model: {lm.model}")

inference.inference_module_async = AsyncMock(side_effect=side_effect)
```

logic: Same mocking issue. Test should mock _run_inference to verify the actual retry and fallback behavior.

Context Used: Context from dashboard - On how to write tests (source)

