Regular expression error of penn_tokenize

```python3
from sacremoses import MosesTokenizer

print(MosesTokenizer(lang='en').penn_tokenize("-LRB- This is very nice -RRB-"))
```

Running the snippet above raises the error below. Changing `lang='en'` to `lang='zh'` does not solve the problem.

```
Traceback (most recent call last):
  File ".../scratches/test.py", line 3, in <module>
    print(MosesTokenizer(lang='en').penn_tokenize("-LRB- This is very nice -RRB-"))
  File ".../python3.9/site-packages/sacremoses/tokenize.py", line 423, in penn_tokenize
    text = regexp.sub(substitution, text)
AttributeError: 'str' object has no attribute 'sub'
```
I think the problem is in the rule linked here: its first element is a plain `str`, not a compiled pattern, so calling `.sub` on it fails.

https://github.com/hplt-project/sacremoses/blob/65543c34baf589f30260488d882d0060abaa4087/sacremoses/tokenize.py#L93-L96


Regular expression error of penn_tokenize #151

	INTRATOKEN_SLASHES = (
	r"([{alphanum}])\/([{alphanum}])".format(alphanum=IsAlnum),
	r"$1 \@\/\@ $2",
	)
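
Here is a minimal sketch of what I believe is happening, written without importing sacremoses. `ALNUM`, `RAW_RULE`, `FIXED_RULE` and `apply_rule` are hypothetical names of my own, and `ALNUM` is only an ASCII stand-in for the library's `IsAlnum` class. The "fixed" rule is one possible repair, not necessarily the one the maintainers would choose. It also assumes the intended output is the Moses-style ` @/@ ` marker, since Python's `re` does not understand the Perl-style `$1` backreference.

```python
import re

# Assumption: an ASCII-only stand-in for sacremoses' IsAlnum character class.
ALNUM = "a-zA-Z0-9"

# Same shape as the rule quoted above: an uncompiled pattern string
# paired with a Perl-style replacement.
RAW_RULE = (
    r"([{alphanum}])\/([{alphanum}])".format(alphanum=ALNUM),
    r"$1 \@\/\@ $2",
)

# One possible repair: compile the pattern and use Python backreferences.
FIXED_RULE = (
    re.compile(r"([{alphanum}])/([{alphanum}])".format(alphanum=ALNUM)),
    r"\1 @/@ \2",
)


def apply_rule(rule, text):
    """Apply a (regexp, substitution) pair the way the traceback shows."""
    regexp, substitution = rule
    return regexp.sub(substitution, text)


if __name__ == "__main__":
    try:
        apply_rule(RAW_RULE, "either/or")
    except AttributeError as err:
        # A str has no .sub method, which matches the reported error.
        print("raw rule failed:", err)

    print(apply_rule(FIXED_RULE, "either/or"))  # either @/@ or
```

If this reading is right, wrapping the pattern in `re.compile` only removes the `AttributeError`; the replacement string would also need `\1` and `\2` instead of `$1` and `$2` to produce the expected tokens.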
