Description
from sacremoses import MosesTokenizer
print(MosesTokenizer(lang='en').penn_tokenize("-LRB- This is very nice -RRB-"))
I got the following error. Changing lang='en' to lang='zh' doesn't solve the problem.
Traceback (most recent call last):
  File ".../scratches/test.py", line 3, in <module>
    print(MosesTokenizer(lang='en').penn_tokenize("-LRB- This is very nice -RRB-"))
  File ".../python3.9/site-packages/sacremoses/tokenize.py", line 423, in penn_tokenize
    text = regexp.sub(substitution, text)
AttributeError: 'str' object has no attribute 'sub'
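A minimal way to see the mechanism without sacremoses: the loop at line 423 unpacks each (regexp, substitution) pair and calls regexp.sub(...), which only works when regexp is a compiled re.Pattern. The PAIR constant and the apply_like_penn_tokenize helper below are hypothetical stand-ins for illustration, not sacremoses code.

```python
import re

# Hypothetical (pattern, replacement) pair stored as plain strings,
# mirroring the shape of INTRATOKEN_SLASHES in sacremoses.
PAIR = (r"(\w)/(\w)", r"\1 @/@ \2")


def apply_like_penn_tokenize(pair, text):
    """Mimics the failing line: calls .sub on whatever the tuple holds."""
    regexp, substitution = pair
    return regexp.sub(substitution, text)


# With a plain str as the pattern, .sub does not exist.
try:
    apply_like_penn_tokenize(PAIR, "and/or")
except AttributeError as err:
    print(err)  # 'str' object has no attribute 'sub'

# Compiling the pattern first makes the same call work.
compiled = (re.compile(PAIR[0]), PAIR[1])
print(apply_like_penn_tokenize(compiled, "and/or"))  # and @/@ or
```

Calling the module-level re.sub(regexp, substitution, text) would also accept a raw string pattern, so either compiling the constants or changing the loop would avoid the error.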
I think the problem is here: the pattern is a plain str, not a compiled pattern, so it has no .sub method.
sacremoses/sacremoses/tokenize.py, lines 93 to 96 at commit 65543c3:
INTRATOKEN_SLASHES = (
    r"([{alphanum}])\/([{alphanum}])".format(alphanum=IsAlnum),
    r"$1 \@\/\@ $2",
)
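A sketch of one possible fix, assuming the rest of penn_tokenize keeps calling regexp.sub(substitution, text): compile the pattern up front. It also looks worth replacing the Perl-style replacement $1 \@\/\@ $2 with Python backreferences, since Python's re does not treat $1 as a group reference. ALNUM is a hypothetical ASCII-only stand-in for sacremoses' IsAlnum, and this is not the upstream patch.

```python
import re

# Hypothetical stand-in for sacremoses' IsAlnum character class
# (assumption: ASCII only here; the real constant covers Unicode).
ALNUM = "a-zA-Z0-9"

# Compiled pattern plus a Python-style replacement (\1, \2 instead of
# Perl's $1/$2), so regexp.sub(substitution, text) works as expected.
INTRATOKEN_SLASHES = (
    re.compile(r"([{alphanum}])/([{alphanum}])".format(alphanum=ALNUM)),
    r"\1 @/@ \2",
)

regexp, substitution = INTRATOKEN_SLASHES
print(regexp.sub(substitution, "This is an either/or choice"))
# This is an either @/@ or choice
```

Keeping the (pattern, replacement) tuple shape means the existing loop would not need to change; only the constant's contents do.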