Hi there,
I am running the SCASL demo to check that my installation works. However, it fails with an error at the normalization and imputation step.
(scasl) lab@server2:/media/SCASL_splicing/SCASL-main$ python main.py -y configs/srr_star_demo.yaml
=============Preprocessing=============
Loading site names: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 37/37 [00:01<00:00, 35.25it/s]
Reading and processing junction files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 37/37 [00:01<00:00, 30.40it/s]
=============Filtering=============
reading file...
done.
executing repeat and initial threshold filter...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17851/17851 [00:14<00:00, 1261.41it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17818/17818 [00:13<00:00, 1273.85it/s]
done
executing sites quality filter by threshold
the site histogram is saved at process_result/20241108114532/img/site_hist.png
the descriptions of the non-NaN data of sites are shown below
count 40297.000000
mean 7.669926
std 6.271053
min 3.000000
0% 3.000000
10% 4.000000
20% 4.000000
30% 4.000000
40% 4.000000
50% 5.000000
60% 6.000000
70% 8.000000
80% 11.000000
90% 17.000000
max 40.000000
dtype: float64
the site histogram is saved at process_result/20241108114532/img/site_hist.png
the descriptions of the non-NaN data of sites are shown below
count 40138.000000
mean 7.703000
std 6.293347
min 3.000000
0% 3.000000
10% 4.000000
20% 4.000000
30% 4.000000
40% 4.000000
50% 5.000000
60% 6.000000
70% 8.000000
80% 11.000000
90% 17.000000
max 40.000000
dtype: float64
done.
remove the duplicated site starts and ends...
done.
executing sample quality filter...
the sample histogram is saved at process_result/20241108114532/img/sample_hist.png
the descriptions of the non-NaN data of sites are shown below
count 40.000000
mean 14823.125000
std 17859.287926
min 1407.000000
0% 1407.000000
10% 3657.600000
20% 4658.000000
30% 5990.200000
40% 8024.400000
50% 9009.000000
60% 12487.600000
70% 13943.100000
80% 17898.800000
90% 22226.700000
max 73079.000000
dtype: float64
done.
saving...
done.
=============Normalization & Imputation=============
reading data from process_result/20241108114532/filtered_matrix...
Traceback (most recent call last):
File "/media/SCASL_splicing/SCASL-main/main.py", line 10, in <module>
scasl.fit()
File "/media/SCASL_splicing/SCASL-main/scasl/splice.py", line 95, in fit
run_cluster(self.cfg)
File "/media/SCASL_splicing/SCASL-main/scasl/splice.py", line 66, in run_cluster
df_final, mat = normalize(filter_path, cfg.impute.num_iteration, cfg.impute.knn)
File "/media/SCASL_splicing/SCASL-main/scasl/normalize.py", line 92, in normalize
dfs = norm_only(df_path, 'start')
File "/media/SCASL_splicing/SCASL-main/scasl/normalize.py", line 40, in norm_only
df_prob = to_prob(df, groupby=groupby)
File "/media/SCASL_splicing/SCASL-main/scasl/normalize.py", line 22, in to_prob
sums = sums.drop(columns=['start', 'end'])
File "/home/lab/miniconda3/envs/scasl/lib/python3.9/site-packages/pandas/core/frame.py", line 5581, in drop
return super().drop(
File "/home/lab/miniconda3/envs/scasl/lib/python3.9/site-packages/pandas/core/generic.py", line 4788, in drop
obj = obj._drop_axis(labels, axis, level=level, errors=errors)
File "/home/lab/miniconda3/envs/scasl/lib/python3.9/site-packages/pandas/core/generic.py", line 4830, in _drop_axis
new_axis = axis.drop(labels, errors=errors)
File "/home/lab/miniconda3/envs/scasl/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 7070, in drop
raise KeyError(f"{labels[mask].tolist()} not found in axis")
KeyError: "['end'] not found in axis"
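For what it is worth, the last pandas call in the traceback fails the same way in isolation whenever the frame has no `end` column. Below is a minimal sketch of that behaviour, assuming a toy frame: the names `sums`, `cell_a`, and `try_drop` are made up for illustration and are not the SCASL data or code. It only shows how `DataFrame.drop` reacts to a missing label; it does not explain why `end` is missing in my run.

```python
import pandas as pd

# Toy frame standing in for the grouped sums; NOT the real SCASL matrix.
# It deliberately has a 'start' column but no 'end' column.
sums = pd.DataFrame({"start": [100, 200], "cell_a": [3, 5]})


def try_drop(frame, labels):
    """Return the KeyError message raised by drop(), or None if it succeeds."""
    try:
        frame.drop(columns=labels)
    except KeyError as exc:
        return str(exc)
    return None


# Same call shape as normalize.py line 22: one of the labels is absent.
message = try_drop(sums, ["start", "end"])
print(message)

# errors="ignore" makes drop() skip absent labels instead of raising.
remaining = sums.drop(columns=["start", "end"], errors="ignore")
print(list(remaining.columns))
```

Passing `errors="ignore"` would only silence the exception; if the `end` column is expected downstream, the real question is why it is gone by the time `to_prob` runs.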
Any help would be useful.
Thanks.