Corpus form: updating corpus title orphans index #1824

@lukavdplas

When you set the title for a corpus in the corpus form, the name and es_index properties are derived from the title. So when a user updates the corpus title, the corpus will expect an index with a different name.

That means the corpus must be re-indexed after a name change. This is cumbersome, but not too bad - some changes to corpora require re-indexing, after all.

However, as it stands, updating the corpus definition just changes the index name in the database and "forgets" the old one. The old index is not deleted, but it is no longer linked to any corpus, so it would never get cleaned up.
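
To make the problem concrete, here is a minimal sketch in plain Python. It does not use the actual backend code: the `index_name_from_title` helper, the `corpus` dict and the `existing_indices` set are hypothetical stand-ins for the title-based derivation, the database record and the Elasticsearch cluster.

```python
import re


def index_name_from_title(title: str) -> str:
    """Hypothetical derivation: lowercase the title, replace other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


# Stand-ins (assumptions) for the Elasticsearch cluster and the corpus record.
existing_indices = set()
corpus = {"id": 7, "title": "My Letters"}

# Initial save: es_index is derived from the title and the index gets created.
corpus["es_index"] = index_name_from_title(corpus["title"])
existing_indices.add(corpus["es_index"])

# The user edits the title: es_index is overwritten, the old name is forgotten.
corpus["title"] = "Family Letters"
corpus["es_index"] = index_name_from_title(corpus["title"])

# Indices that exist but are not referenced by any corpus record.
linked_indices = {corpus["es_index"]}
orphaned_indices = existing_indices - linked_indices
print(orphaned_indices)
```

After the rename, the record points at a name for which no index exists yet, and the index that does exist is no longer referenced anywhere.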

Possible solutions:

  • Store the current index name and the "preferred" index name separately.
  • For user-created corpora, generate the index name from the corpus ID instead of the title.

I prefer the second option since it does not increase the complexity of the code. Users don't access Elasticsearch directly anyway, so there is little benefit to a more "readable" index name.
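
A rough sketch of what the second option could look like, again in plain Python rather than the real backend. The function name `index_name_from_id` and the `custom-corpus` prefix are assumptions made for illustration, not an existing naming scheme in the project.

```python
def index_name_from_id(corpus_id: int, prefix: str = "custom-corpus") -> str:
    """Hypothetical helper: build an index name from the corpus ID.

    The result is lowercase and starts with a letter, which keeps it within
    Elasticsearch's index naming rules.
    """
    return f"{prefix}-{corpus_id}"


# Stand-in (assumption) for a user-created corpus record.
corpus = {"id": 7, "title": "My Letters"}
name_before = index_name_from_id(corpus["id"])

# Renaming the corpus does not affect the index name.
corpus["title"] = "Family Letters"
name_after = index_name_from_id(corpus["id"])

print(name_before, name_after)
```

Because the ID never changes, a title edit would neither require re-indexing nor leave an index behind. Since IDs are unique, two corpora with the same title would also get distinct index names.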

Related issues: #1689 (corpora and indices are linked implicitly) and #1749 (comment) (the current naming scheme requires unique names for private corpora).

Metadata

Labels: backend (changes to the django backend), bug (something isn't working right)
