Dealing with articles where content extraction fails #205

Open
jemrobinson opened this issue May 20, 2019 · 1 comment
Labels
enhancement New feature or enhancement

Comments

@jemrobinson
Member

For some articles, content extraction fails. We should decide what to do with these:

  1. keep them in the database but with some incorrect/invalid content
  2. skip them when writing to the database

We should also log how often and where this occurs.
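A minimal sketch of what option (2) plus the logging could look like. Everything named here is hypothetical and not taken from the project's code: the `Article` class, the `extract` callable, the `is_usable` check, and the choice to group failures by URL host are all assumptions made for illustration.

```python
import logging
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple
from urllib.parse import urlparse

logger = logging.getLogger("content_extraction")


@dataclass
class Article:
    # Hypothetical record; the real schema is not shown in this issue.
    url: str
    html: str
    content: Optional[str] = None


def is_usable(content: Optional[str]) -> bool:
    # Assumption: "failed" means missing or whitespace-only content.
    return bool(content and content.strip())


def extract_all(
    articles: Iterable[Article],
    extract: Callable[[str], Optional[str]],
) -> Tuple[List[Article], Counter]:
    """Return the articles to write, plus failure counts per site."""
    kept: List[Article] = []
    failures: Counter = Counter()
    for article in articles:
        try:
            content = extract(article.html)
        except Exception:
            # Record the traceback, then treat it like any other failure.
            logger.exception("Extractor raised for %s", article.url)
            content = None
        if not is_usable(content):
            # "Where": group by host. "How often": the running count.
            failures[urlparse(article.url).netloc] += 1
            logger.warning("Skipping %s: no usable content", article.url)
            continue
        article.content = content
        kept.append(article)
    return kept, failures
```

Only the `kept` list would be written to the database, and the `failures` counter could be logged once per run to see which sites fail most often.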

@jemrobinson jemrobinson added the enhancement New feature or enhancement label May 20, 2019
@jemrobinson
Member Author

Thoughts, @martintoreilly? I am leaning towards (2). Our current solution is (1).
