Add batch size parameter to prevent context leaking of translator notes/hints #1733

@thomasaull

Description

Problem

This is a follow-up to #1728, since hints are currently not included in the request to a model at all (not when using the lingo.dev platform, I think).

Given the following example:

{
  // Short form for "Year", used to display a date format, example in `en`: "DD-MM-YYYY". Don’t translate for CJK (Chinese, Japan, Korean) languages
  "DATE_FORMAT_SPECIFIER_YEAR": "Y",
  // Short form for "Month", used to display a date format, example in `en`: "DD-MM-YYYY"
  "DATE_FORMAT_SPECIFIER_MONTH": "M"
}

In my tests using openai/gpt-oss-120b (OpenAI-compatible provider, see #1729), context leaks between hints, even when using a prompt like:

Translate the provided text from source: {source} to target: {target}. For each translation key, use only the hint with identical key. If no hint exists for a key, do not reuse any other hints

When translating to Korean, the result I get is:

{
  "DATE_FORMAT_SPECIFIER_YEAR": "Y",
  "DATE_FORMAT_SPECIFIER_MONTH": "M"
}

Even though only DATE_FORMAT_SPECIFIER_YEAR has the hint not to translate for CJK languages, the hint gets applied to DATE_FORMAT_SPECIFIER_MONTH as well.

Looking at the model's reasoning, it becomes clear that it is not able to follow the instruction to apply each hint only to its own translation key. I tried many things but could not find a reliable solution.

The way out is probably to translate the strings one by one, each in its own request to the LLM, which prevents context leaking between translations completely. This of course has other trade-offs (increased API cost, slower, …), but it reliably translates to:

{
  "DATE_FORMAT_SPECIFIER_YEAR": "Y",
  "DATE_FORMAT_SPECIFIER_MONTH": "월"
}

Solution

Add a batch size parameter to the CLI to control how many translation strings are sent to an LLM at once. A batch size of 1 would send each translation individually.
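
To illustrate the idea, here is a minimal sketch of how such batching could work. It is not taken from the lingo.dev codebase: `Entry`, `TranslateBatch`, `chunk` and `translateInBatches` are hypothetical names, and `translateBatch` stands in for whatever function sends one request to the model.

```typescript
// Hypothetical types for illustration only; not the lingo.dev API.
type Entry = { key: string; value: string; hint?: string };
type TranslateBatch = (batch: Entry[]) => Promise<Record<string, string>>;

// Split items into consecutive chunks of at most `batchSize` elements.
function chunk<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    chunks.push(items.slice(i, i + batchSize));
  }
  return chunks;
}

// Send each chunk in its own request, so a hint is only visible to the
// entries that share a request with it. With batchSize = 1 every key is
// translated in isolation.
async function translateInBatches(
  entries: Entry[],
  batchSize: number,
  translateBatch: TranslateBatch,
): Promise<Record<string, string>> {
  const result: Record<string, string> = {};
  for (const batch of chunk(entries, batchSize)) {
    Object.assign(result, await translateBatch(batch));
  }
  return result;
}
```

The requests are made sequentially here to keep the sketch simple; running them in parallel would reduce the slowdown but not the extra API cost.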

Visuals

Image

Workarounds

No response
