-
Thanks @mrekh. If the main reason is to reduce the size of the output file, then it should be easy to delete the unwanted columns right after you finish the crawl:

    import os
    import pandas as pd

    df = pd.read_json("output_file.jl", lines=True)
    # Keep only the columns you need (replace these placeholder names with real ones)
    df[["col_1", "col_3", "col_10", "col_14"]].to_csv("output_file.csv", index=False)  # or .parquet
    os.remove("output_file.jl")  # delete the original, larger file

Also:

Would that work?
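If the crawl output is too large to load in one go, the same idea can be applied in chunks. The following is only a minimal sketch, not part of any crawler's API: the helper name `keep_columns`, the default `chunksize`, and the column names in the usage note are hypothetical. It assumes the output is a JSON-lines file and uses `reindex` so that a column missing from a given chunk is written as empty rather than raising an error.

```python
import pandas as pd


def keep_columns(jl_path, csv_path, columns, chunksize=10_000):
    """Stream a JSON-lines file to CSV, keeping only the requested columns.

    `keep_columns` is a hypothetical helper written for illustration.
    """
    first_chunk = True
    # With lines=True and a chunksize, read_json yields DataFrames lazily
    # instead of loading the whole file into memory.
    with pd.read_json(jl_path, lines=True, chunksize=chunksize) as reader:
        for chunk in reader:
            # reindex keeps the requested column order and fills columns
            # that are absent from this chunk with NaN.
            subset = chunk.reindex(columns=columns)
            subset.to_csv(
                csv_path,
                mode="w" if first_chunk else "a",  # overwrite once, then append
                header=first_chunk,                # write the header only once
                index=False,
            )
            first_chunk = False
```

For example, `keep_columns("output_file.jl", "output_file.csv", ["url", "title", "status"])` would write a CSV with just those three columns; the original `.jl` file can then be removed with `os.remove` as above.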
-
It would be great if it were possible to customize the default crawler columns to reduce the size of the output JSON file.