2 normalize new columns #3
Merged
This commit includes new test cases for the function `_normalizar_numero_clasificacion_dewey` in the `test_limpiar_tablas.py` file. The new tests cover a range of inputs, including standard, edge, and complex cases to ensure the function can accurately normalize Dewey numbers in various formats.
The method `_normalizar_numero_clasificacion_dewey` was updated to correctly handle raw Dewey numbers that contain a semicolon. It now splits the input string at the semicolon and uses the second part for normalization, if a semicolon is present. Additionally, the argument name was changed from `dewey_number` to `raw_dewey_number` to reflect the fact that the input may not be a normalized Dewey number.
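The semicolon handling described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `seleccionar_parte_dewey` is hypothetical, and trimming whitespace and taking index 1 of the split are assumptions about how "the second part" is chosen.

```python
def seleccionar_parte_dewey(raw_dewey_number: str) -> str:
    """Return the piece of a raw Dewey string that would go on to normalization.

    Hypothetical helper: the real `_normalizar_numero_clasificacion_dewey`
    may select and clean the value differently.
    """
    if ";" in raw_dewey_number:
        # A semicolon is present: keep the second part, as the commit describes.
        return raw_dewey_number.split(";")[1].strip()
    # No semicolon: the whole input is the candidate Dewey number.
    return raw_dewey_number.strip()
```

For an input such as `"Fic; 863.64"` this sketch would pass `"863.64"` on to the rest of the normalization, while an input without a semicolon goes through unchanged apart from trimming.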
…not handled correctly, and replaced variable name `anos_encontrados` with `años_encontrados` for better readability.
This commit adds Dewey number and period normalization functions to the data cleaning script. The script checks whether each of these columns is present in the data and, if so, applies the corresponding normalization function.
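One way to express the "apply only if the column is available" check is sketched below. It assumes rows are plain dictionaries that all share the keys of the first row; the function name `aplicar_normalizaciones`, the column names, and the stand-in normalizers are hypothetical, and the actual script may use a different data structure.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def aplicar_normalizaciones(
    rows: List[Row],
    normalizadores: Dict[str, Callable[[str], str]],
) -> List[Row]:
    """Apply each normalizer only to columns that exist in the data."""
    if not rows:
        return []
    # Assumption: every row has the same columns as the first one.
    columnas = set(rows[0])
    # Keep only the normalizers whose target column is actually present.
    activos = {c: f for c, f in normalizadores.items() if c in columnas}
    return [
        {c: activos[c](v) if c in activos else v for c, v in row.items()}
        for row in rows
    ]


# Usage with stand-in normalizers (str.strip / str.upper are placeholders).
datos = [{"titulo": "Rayuela", "periodo": " XX "}]
resultado = aplicar_normalizaciones(
    datos, {"periodo": str.strip, "dewey": str.upper}
)
```

Here the `dewey` normalizer is skipped silently because the column is absent, which mirrors the availability check described in the commit message.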
…n clean tables script
…tion
This commit includes two main improvements to the `limpiar_tablas` script related to period normalization:
1. For periods that include multiple years (e.g., "1800-1900"), the script now correctly identifies the most recent year and converts it to a century in Roman numerals.
2. When periods include multiple centuries in Roman numerals (e.g., "XVIII-XXI"), the script now correctly identifies the most recent century.
The second improvement relies on a new helper function, `valor_siglo_romano`, which converts a Roman numeral century to its numeric value; that value is then used to find the maximum in the list of centuries. Both changes are intended to improve the accuracy of period normalization in the data cleaning process.
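The two rules in this commit message can be sketched as below. Only the name `valor_siglo_romano` comes from the commit; `entero_a_romano`, `siglo_mas_reciente`, the regular expressions, and the convention that a year such as 1900 belongs to the 19th century are assumptions made for illustration, not the PR's actual implementation.

```python
import re

_VALORES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
_PARES = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"), (90, "XC"),
    (50, "L"), (40, "XL"), (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def valor_siglo_romano(siglo: str) -> int:
    """Convert a Roman numeral century (e.g. "XVIII") to its numeric value."""
    total, previo = 0, 0
    # Read right to left: a smaller symbol before a larger one is subtracted.
    for simbolo in reversed(siglo.upper()):
        valor = _VALORES[simbolo]
        if valor < previo:
            total -= valor
        else:
            total += valor
            previo = valor
    return total


def entero_a_romano(numero: int) -> str:
    """Convert a positive integer to a Roman numeral (hypothetical helper)."""
    partes = []
    for valor, simbolo in _PARES:
        while numero >= valor:
            partes.append(simbolo)
            numero -= valor
    return "".join(partes)


def siglo_mas_reciente(periodo: str) -> str:
    """Return the most recent century in a period, as a Roman numeral."""
    años_encontrados = [int(a) for a in re.findall(r"\d{3,4}", periodo)]
    if años_encontrados:
        # Assumed convention: years 1801-1900 map to century XIX.
        return entero_a_romano((max(años_encontrados) - 1) // 100 + 1)
    siglos = re.findall(r"\b[IVXLCDM]+\b", periodo)
    if siglos:
        return max(siglos, key=valor_siglo_romano)
    return periodo  # Nothing recognizable: leave the value untouched.
```

With these assumptions, `siglo_mas_reciente("1800-1900")` yields `"XIX"` and `siglo_mas_reciente("XVIII-XXI")` yields `"XXI"`. Comparing by numeric value matters because comparing the numeral strings alphabetically would rank "XVIII" above "XXI".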
Normalization of bibliographic data and expansion of cleaning functionality
What?
Normalization was implemented for two new bibliographic fields, and the normalization of existing fields was improved:
How?
A TDD (test-driven development) approach was used to implement the new functionality:
Dewey number:
Chronological period:
Improvements to existing normalizations:
Why?
Data standardization:
Improved data quality:
Maintainability:
Additional notes