Possible tokenization error: bpe_get_score_for_pair sometimes picks wrong merge #10

@ianand

The current formula for the named function bpe_get_score_for_pair is case-insensitive, so it may pick the wrong merge during tokenization. This happens because matching with FILTER() in Excel is case-insensitive. The implementation should be changed to use EXACT() together with FILTER() to enforce case sensitivity.
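As a rough illustration of the proposed change (not the sheet's actual formula): assume, hypothetically, that the merge pairs are in A2:A100, their scores in B2:B100, and the pair being looked up in D2. A condition written as `A2:A100 = D2` compares text without regard to case, so a pair such as "T h" would also match a row holding "t h". Wrapping the comparison in EXACT() keeps only rows whose text matches exactly, including case:

```excel
=FILTER(B2:B100, EXACT(A2:A100, D2))
```

The ranges and cell references here are placeholders; the real named function would substitute its own arguments.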

That said, I'm reconsidering the use of FILTER() because (a) it is not available in older versions of Excel and (b) it can slow the sheet down: FILTER() has to search the entire range for all matches, whereas in most places the sheet uses it we only need a single match. It does, however, make the formulas more readable.
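One possible FILTER()-free alternative, sketched under the same kind of assumptions (hypothetical layout: pairs in A2:A100, scores in B2:B100, lookup pair in D2), is the common INDEX/MATCH pattern with EXACT(). It returns only the first case-sensitive match and uses functions that exist in older versions of Excel, where it has to be entered as an array formula with Ctrl+Shift+Enter:

```excel
=INDEX(B2:B100, MATCH(TRUE, EXACT(A2:A100, D2), 0))
```

This is not what the sheet currently does, and whether it is actually faster has not been measured here, since EXACT() still evaluates every row in the range. If no row matches, the formula returns #N/A, which can be wrapped in IFERROR() if a default value is preferred.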
