The website crawler is a great feature for ingesting webpages into a workspace. I'm wondering what the best way is to update the workspace when some of those webpages have changed since the initial crawl.
I believe we would need to crawl the website again. Would that result in duplicate documents in the workspace and the vector database?
How can we avoid duplication and refresh the workspace with the updated pages?
Should we create a new workspace and crawl the website again? That doesn't seem scalable when the site's content changes frequently.
What would be the best approach in this situation?
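For context, here is a rough sketch of the kind of upsert behavior I have in mind: derive a stable document ID from the URL and only re-embed a page when its content hash changes. All the names here (`VectorStore`, `upsertPage`, etc.) are hypothetical placeholders, not the actual API:

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes -- stand-ins for whatever the workspace/vector store exposes.
interface StoredDoc {
  id: string;          // stable ID derived from the URL
  contentHash: string; // hash of the page text at crawl time
}

interface VectorStore {
  get(id: string): Promise<StoredDoc | undefined>;
  delete(id: string): Promise<void>;
  insert(doc: StoredDoc, text: string): Promise<void>; // embeds + stores
}

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

// Re-crawl a page and upsert it: skip if unchanged, replace if changed.
async function upsertPage(store: VectorStore, url: string, pageText: string) {
  const id = sha256(url);              // same URL -> same document ID
  const contentHash = sha256(pageText);

  const existing = await store.get(id);
  if (existing?.contentHash === contentHash) {
    return "unchanged";                // nothing to re-embed
  }
  if (existing) {
    await store.delete(id);            // drop the stale vectors first
  }
  await store.insert({ id, contentHash }, pageText);
  return existing ? "updated" : "created";
}
```

With something like this, re-crawling would be idempotent: unchanged pages are skipped, and changed pages replace their old vectors instead of piling up duplicates. Is there an existing mechanism along these lines, or a recommended workflow that achieves the same effect?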