I further optimized the spatial processing part; the calculation of weights is now even faster. Here is the Github Commit Link.
Current times (on my laptop, for Ohio) are: 18 s for weights calculation; 95 s (1.6 min) for timeseries extraction; <0.05 s for AMS and PDS identification; 40 s for clustering; and 2.5 s for plotting, for a total of about 2.5 minutes. That comes down to about 2 minutes when processing Trinity, with 1.5 minutes of it spent on the timeseries extraction. So this is disk bound, and we could probably speed it up by removing the disk I/O on the NetCDF files, but that would require a lot of rewriting.
I’ve looked at the performance profiles, and I don’t think we can optimize it much further in Python, or at least it would take serious effort. The code is still slow if we want to process thousands of basins (e.g. there are 2,413 HUC8 basins, and the number becomes much larger if we go even smaller).
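For reference, a minimal sketch (not the project's actual profiling setup) of the two things behind the numbers above: wall-clock timing of a stage, and a cProfile run sorted by cumulative time to see whether a stage spends its time in computation or in disk I/O. The stage function here is a stand-in, not real project code.

#+begin_src python
import cProfile
import pstats
import time

def stage(n: int = 10**6) -> int:
    """Stand-in for one pipeline stage, e.g. timeseries extraction."""
    return sum(i * i for i in range(n))

# Wall-clock timing of a single stage.
start = time.perf_counter()
stage()
print(f"stage took {time.perf_counter() - start:.1f} s")

# Detailed profile of the same stage, sorted by cumulative time.
with cProfile.Profile() as prof:
    stage()
pstats.Stats(prof).sort_stats("cumulative").print_stats(10)
#+end_src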
INTERACTIVE MAP: https://atreyagaurav.github.io/locus/index.html
I’ll run it for at least HUC01-HUC18 and put it here. Possibly HUC4s too.
I changed this part of the code to normalize each row. I removed the normality assumption (the StandardScaler step), since the precipitation data is heavily skewed (far from normal), and instead divided each cell's precipitation value by the total basin-weighted precipitation for that event (day); a standalone sketch of the new pipeline follows the diff below.
#+RESULTS[63095cb5352fd288f7fbad30f64aa361a4521cdf]:
commit 4c141e264d3ce656772411fec74000456f3ceea4
Author: Gaurav Atreya <allmanpride@gmail.com>
Date: Sun Apr 30 22:30:31 2023 -0400
Normalize by total rain volume before clustering
diff --git a/images/05/ams_1dy.png b/images/05/ams_1dy.png
index e98d9f7..a3ebca6 100644
Binary files a/images/05/ams_1dy.png and b/images/05/ams_1dy.png differ
diff --git a/images/05/pds_1dy.png b/images/05/pds_1dy.png
index ba44651..ecf3ed2 100644
Binary files a/images/05/pds_1dy.png and b/images/05/pds_1dy.png differ
diff --git a/src/cluster.py b/src/cluster.py
index 8b5731c..f9267d4 100644
--- a/src/cluster.py
+++ b/src/cluster.py
@@ -7,7 +7,6 @@ from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from kneed import KneeLocator
-from src.livneh import LivnehData
from src.huc import HUC
import src.precip as precip
@@ -21,7 +20,8 @@ def storm_centers(df: pd.DataFrame):
def dimensionality_reduction(df: pd.DataFrame):
pca = PCA(n_components=20)
- return pca.fit_transform(StandardScaler().fit_transform(df.to_numpy()))
+ df_norm = df.apply(lambda row: row / row.sum(), axis=1)
+ return pca.fit_transform(df_norm.to_numpy())
def clustering(m: np.ndarray):
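Here is a minimal standalone sketch of the new pipeline, row normalization followed by PCA and KMeans, run on dummy data. Only the normalization and PCA lines mirror the diff above; the function name, the cluster count, and the dummy rainfall matrix are illustrative.

#+begin_src python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def storm_center_labels(df: pd.DataFrame, n_components: int = 20, n_clusters: int = 4):
    # Normalize each event (row) by its total precipitation so the clustering
    # responds to the spatial pattern of rain rather than its magnitude.
    df_norm = df.apply(lambda row: row / row.sum(), axis=1)
    reduced = PCA(n_components=n_components).fit_transform(df_norm.to_numpy())
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(reduced)

# Dummy data: 100 events x 500 grid cells of non-negative rainfall.
rng = np.random.default_rng(0)
events = pd.DataFrame(rng.gamma(0.5, 2.0, size=(100, 500)))
labels = storm_center_labels(events)
#+end_src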
The plots of the clusters from before this change can be seen in Figure fig:clus-old, and the ones from the new algorithm in Figure fig:clus-new.
The old clusters show a distinct preference for higher rain magnitudes in some clusters and lower in others, while the new ones do not show that bias; the clustering is no longer overwhelmed by the magnitude of rain the way the previous one was.
Note: the new plots show a few more points. That was originally my mistake: I did not include 2011 because I forgot that Python's range excludes its stop value, i.e. I used range(1915, 2011) instead of range(1915, 2012).
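To make the off-by-one explicit:

#+begin_src python
years = list(range(1915, 2012))  # 1915 .. 2011 inclusive; the stop value 2012 is excluded
assert years[-1] == 2011 and len(years) == 97
#+end_src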
|                         | ohio-region   | trinity        | north-branch-potomac |
| huc                     | 5             | 1203           | 2070002              |
|-------------------------+---------------+----------------+----------------------|
| ams_1day_avg_silhouette | 0.12818       | 0.22129        | 0.16736              |
| ams_1day_cluster_counts | 32 21 21 16 8 | 45 23 21 9     | 38 26 22 12          |
| ams_1day_conversed      | True          | True           | True                 |
| ams_1day_lta_silhouette | 1             | 2              | 3                    |
| ams_1day_neg_silhouette | 0             | 0              | 0                    |
| ams_1day_num_cluster    | 5             | 4              | 4                    |
| pds_1day_avg_silhouette | 0.14791       | 0.15337        | 0.14963              |
| pds_1day_cluster_counts | 120 117 99 96 | 192 140 102 68 | 167 100 91 74 65     |
| pds_1day_conversed      | True          | True           | True                 |
| pds_1day_lta_silhouette | 2             | 3              | 3                    |
| pds_1day_neg_silhouette | 0             | 0              | 0                    |
| pds_1day_num_cluster    | 4             | 4              | 5                    |
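A minimal sketch of how the silhouette-based rows in the table could be computed with scikit-learn, assuming neg_silhouette counts events with a negative silhouette value, lta_silhouette counts clusters whose mean silhouette falls below the overall average, and "conversed" means the KMeans run converged before hitting max_iter; those interpretations are assumptions, not taken from the project code.

#+begin_src python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score

def cluster_metrics(m: np.ndarray, n_clusters: int) -> dict:
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(m)
    labels = km.labels_
    samples = silhouette_samples(m, labels)   # one silhouette value per event
    avg = float(silhouette_score(m, labels))  # overall average silhouette
    per_cluster = [samples[labels == c].mean() for c in range(n_clusters)]
    return {
        "avg_silhouette": round(avg, 5),
        "cluster_counts": sorted(Counter(labels).values(), reverse=True),
        "converged": km.n_iter_ < km.max_iter,                # assumed meaning of "conversed"
        "lta_silhouette": sum(c < avg for c in per_cluster),  # clusters below the overall average
        "neg_silhouette": int((samples < 0).sum()),           # events with negative silhouette
        "num_cluster": n_clusters,
    }
#+end_src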