Hello! I've been trying to select data from a large corpus of 21M sentences using a representative corpus of 100k sentences, and I hit a "KeyError: '@@'" exception.
Full text of the exception:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "./cynical-selection.py", line 695, in main_loop
    unadapted_squish = squish_corpus(unadapted_data, replace)
  File "./cynical-selection.py", line 297, in squish_corpus
    squished.append(' '.join([replace[token] for token in line.split()]))
  File "./cynical-selection.py", line 297, in <listcomp>
    squished.append(' '.join([replace[token] for token in line.split()]))
KeyError: '@@'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./cynical-selection.py", line 819, in <module>
    main()
  File "./cynical-selection.py", line 798, in main
    selected = threading_wrapper(task_data, unadapted_data, args)
  File "./cynical-selection.py", line 775, in threading_wrapper
    zip(repeat(task_data), parts_list, repeat(args)))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 296, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 670, in get
    raise self._value
KeyError: '@@'
I also tried running the program with a 10M-sentence general corpus, but that didn't resolve the issue. It runs fine on corpora with 2M sentences or fewer.
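For anyone trying to reproduce this, here is a minimal sketch of what line 297 in the traceback appears to do, together with one possible stopgap. It assumes `replace` is a plain dict mapping each token to its replacement label. The toy data, the `squish_corpus_safe` name, and the `fallback` parameter are hypothetical and not taken from the script.

```python
def squish_corpus(corpus, replace):
    """Mirrors the failing line: every token must be a key in `replace`."""
    squished = []
    for line in corpus:
        squished.append(' '.join([replace[token] for token in line.split()]))
    return squished


def squish_corpus_safe(corpus, replace, fallback=None):
    """Hypothetical variant: a token missing from `replace` is kept as-is,
    or mapped to `fallback` when one is given, instead of raising."""
    squished = []
    for line in corpus:
        squished.append(' '.join(
            replace.get(token, token if fallback is None else fallback)
            for token in line.split()))
    return squished


# Made-up data: the mapping has no entry for a bare '@@' token.
replace = {'hel@@': 'hel@@', 'lo': 'lo'}
corpus = ['hel@@ lo', 'hel@@ @@ lo']

try:
    squish_corpus(corpus, replace)
except KeyError as err:
    print('KeyError:', err)  # same failure mode as in the traceback

print(squish_corpus_safe(corpus, replace, fallback='<unk>'))
```

This only hides the symptom. It does not explain why '@@' is missing from the mapping in the first place, so the selection results may still be affected.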
I ran the script with the following parameters:
./cynical-selection.py --task ../ds/100k.google --unadapted ../ds/10M.os.en --no-lower --batch
10M.os.en-100k.google-20190226_2152.log.zip