You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Run through the process with the data subset (~6000 points)
Load the model in Deeplens
Prep ALL the data (.lst, and upload to S3)
Make SageMaker model with full data set
Deploy model to Deeplens
Trees
Identify the species for some for some life samples
Test Deeplens+Model on real leaves
plant photo data set
idea 1
idea 2
<50>
<50>
initial
20GB
44M
unzipped
22GB
350M
images
included
download
data point
jpg + xml
a line in CSV (but need to download the corresponding image)
number of data points
0.25 mil
1.1+ mil
image size
~100K
~500K
effort needed
need code to parse the data points to produce .lst file and organize corresponding images
need code to take a subset of points, download and organize corresponding images and produce a .lst file
subset size
200
# <Image># <FileName>248738.jpg</FileName># <Species>Justicia nyassana Lindau</Species># <Origin>eol</Origin># <Author>mark hyde, bart wursten and petra ballings, bart wursten, flora of zimbabwe</Author># <Content></Content># <Genus>Justicia</Genus># <Family>Acanthaceae</Family># <ObservationId>174381</ObservationId># <MediaId>248738</MediaId># <YearInCLEF>PlantCLEF2017</YearInCLEF># <LearnTag>Train</LearnTag># <ClassId>2365</ClassId># </Image># https://docs.aws.amazon.com/sagemaker/latest/dg/image-classification.html# A .lst file is a tab-separated file with three columns that contains# a list of image files. The first column specifies the image index,# the second column specifies the class label index for the image, and# the third column specifies the relative path of the image file. The# image index in the first column must be unique across all of the# images.frombs4importBeautifulSoupimportosimportos.pathlst_filename="all.lst"lst_contents= []
index=1defparseme(f):
r=open(f).read()
x=BeautifulSoup(r, features="xml")
fn=x.Image.FileName.contents[0]
klass=x.Image.ClassId.contents[0]
return [fn, klass]
forroot, dirs, filesinos.walk("."):
forfinfiles:
extension=os.path.splitext(f)[1]
ifextension==".xml":
fn, klass=parseme(f)
lst_contents.append(str(str(index) +"\t"+klass+"\t"+"./"+fn+"\n"))
index+=1withopen(lst_filename, "w") asf:
forlineinlst_contents:
f.write(line)