When training an embedding model on a KG, I am getting the following error stack:
Reading with pandas.read_csv with sep ** s+ ** ...
Traceback (most recent call last):
File "/scratch/hpc-prf-dsg/sshivam/.conda/envs/dice/bin/dicee", line 33, in <module>
sys.exit(load_entry_point('dicee', 'console_scripts', 'dicee')())
File "/scratch/hpc-prf-dsg/WHALE-output/dice-embeddings/dicee/scripts/run.py", line 137, in main
Execute(get_default_arguments()).start()
File "/scratch/hpc-prf-dsg/WHALE-output/dice-embeddings/dicee/executer.py", line 218, in start
self.load_indexed_data() if self.is_continual_training else self.read_preprocess_index_serialize_data()
File "/scratch/hpc-prf-dsg/WHALE-output/dice-embeddings/dicee/executer.py", line 88, in read_preprocess_index_serialize_data
self.knowledge_graph = self.read_or_load_kg()
File "/scratch/hpc-prf-dsg/WHALE-output/dice-embeddings/dicee/executer.py", line 53, in read_or_load_kg
kg = KG(dataset_dir=self.args.dataset_dir,
File "/scratch/hpc-prf-dsg/WHALE-output/dice-embeddings/dicee/knowledge_graph.py", line 74, in __init__
ReadFromDisk(kg=self).start()
File "/scratch/hpc-prf-dsg/WHALE-output/dice-embeddings/dicee/read_preprocess_save_load_kg/read_from_disk.py", line 28, in start
self.kg.raw_train_set = read_from_disk(self.kg.path_single_kg,
File "/scratch/hpc-prf-dsg/WHALE-output/dice-embeddings/dicee/read_preprocess_save_load_kg/util.py", line 125, in read_from_disk
return read_with_pandas(data_path, read_only_few, sample_triples_ratio)
File "/scratch/hpc-prf-dsg/WHALE-output/dice-embeddings/dicee/read_preprocess_save_load_kg/util.py", line 31, in timeit_wrapper
result = func(*args, **kwargs)
File "/scratch/hpc-prf-dsg/WHALE-output/dice-embeddings/dicee/read_preprocess_save_load_kg/util.py", line 83, in read_with_pandas
df = pd.read_csv(data_path,
File "/scratch/hpc-prf-dsg/sshivam/.conda/envs/dice/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
return _read(filepath_or_buffer, kwds)
File "/scratch/hpc-prf-dsg/sshivam/.conda/envs/dice/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 626, in _read
return parser.read(nrows)
File "/scratch/hpc-prf-dsg/sshivam/.conda/envs/dice/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1923, in read
) = self._engine.read( # type: ignore[attr-defined]
File "/scratch/hpc-prf-dsg/sshivam/.conda/envs/dice/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
chunks = self._reader.read_low_memory(nrows)
File "parsers.pyx", line 838, in pandas._libs.parsers.TextReader.read_low_memory
File "parsers.pyx", line 905, in pandas._libs.parsers.TextReader._read_rows
File "parsers.pyx", line 874, in pandas._libs.parsers.TextReader._tokenize_rows
File "parsers.pyx", line 891, in pandas._libs.parsers.TextReader._check_tokenize_status
File "parsers.pyx", line 2061, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
Initially, I thought the input file was malformed. However, after passing engine='python' to the pandas.read_csv call in dicee/read_preprocess_save_load_kg/util.py, the error no longer occurs.
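For reference, here is a minimal sketch of the workaround. It is not the actual code from util.py: the helper name read_triples, the column names, and the in-memory sample data are assumptions made for illustration. The only part taken from this report is passing engine='python' to pandas.read_csv with a whitespace separator.

```python
import io

import pandas as pd

# Hypothetical stand-in for a KG file: one whitespace-separated triple per line.
sample = io.StringIO(
    "a knows b\n"
    "b knows c\n"
)


def read_triples(source, engine="python"):
    """Read whitespace-separated triples (illustrative helper, not dicee's API)."""
    return pd.read_csv(
        source,
        sep=r"\s+",  # split on runs of whitespace
        header=None,  # the file has no header row
        names=["subject", "relation", "object"],  # assumed column names
        dtype=str,
        engine=engine,  # 'python' bypasses the C tokenizer that raised the error
    )


df = read_triples(sample)
print(df)
```

The Python engine is generally slower than the default C engine, so this may noticeably increase load time on a large KG.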