first commit
costalferz committed Nov 4, 2021
0 parents commit 82875ce
Showing 11 changed files with 7,820 additions and 0 deletions.
18 changes: 18 additions & 0 deletions .gitignore.txt
@@ -0,0 +1,18 @@
# Created by https://www.toptal.com/developers/gitignore/api/jupyternotebooks
# Edit at https://www.toptal.com/developers/gitignore?templates=jupyternotebooks

### JupyterNotebooks ###
# gitignore template for Jupyter Notebooks
# website: http://jupyter.org/

.ipynb_checkpoints
*/.ipynb_checkpoints/*

# IPython
profile_default/
ipython_config.py

# Remove previous ipynb_checkpoints
# git rm -r .ipynb_checkpoints/

# End of https://www.toptal.com/developers/gitignore/api/jupyternotebooks
188 changes: 188 additions & 0 deletions .ipynb_checkpoints/Province-checkpoint.ipynb
@@ -0,0 +1,188 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "c7d079e5",
"metadata": {},
"source": [
"# Import pandas and read all the required data files"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "566fde0d",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"Subdistrict = pd.read_excel('ข้อมูลตำบลหรือแขวง.xls')\n",
"Province = pd.read_excel('ข้อมูลจังหวัด.xls')\n",
"District = pd.read_excel('ข้อมูลอำเภอหรือเขต.xls')\n",
"ThepExcel = pd.read_excel('ThepExcel-Thailand-Tambon.xlsx',sheet_name=1)"
]
},
{
"cell_type": "markdown",
"id": "be712d63",
"metadata": {},
"source": [
"# Select the columns, sort by the values to be used, and drop duplicate rows"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2f14aa43",
"metadata": {},
"outputs": [],
"source": [
"provinces_en = ThepExcel[['ProvinceThai','ProvinceEng']].sort_values(by=['ProvinceThai']).drop_duplicates(['ProvinceEng'],ignore_index=True)\n",
"ThepExcel2 = ThepExcel[['TambonID','TambonThaiShort','TambonEngShort']].sort_values(by=['TambonID']).drop_duplicates(['TambonID'],ignore_index=True)\n",
"Subdistrict = Subdistrict.sort_values(by=['PROVINCE_ID'])"
]
},
{
"cell_type": "markdown",
"id": "ab44cac2",
"metadata": {},
"source": [
"# Select only the required columns and prepare the province, district, and subdistrict codes so they can be merged"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "ced4198b",
"metadata": {},
"outputs": [],
"source": [
"Province['PROVINCE_CODE_select'] = Province['PROVINCE_CODE'].apply(lambda x: str(x)[:2])\n",
"District['DISTRICT_CODE_select'] = District['DISTRICT_CODE'].apply(lambda x: str(x)[:2])\n",
"District['DISTRICT_CODE_select_Sub'] = District['DISTRICT_CODE'].apply(lambda x: str(x)[:4])\n",
"Subdistrict['SUBDISTRICT_CODE_select'] = Subdistrict['SUBDISTRICT_CODE'].apply(lambda x: str(x)[:4])\n",
"Subdistrict['SUBDISTRICT_CODE_select6'] = Subdistrict['SUBDISTRICT_CODE'].apply(lambda x: int(str(x)[:6]))"
]
},
{
"cell_type": "markdown",
"id": "53a49c89",
"metadata": {},
"source": [
"# Merge the datasets using the province, district, and subdistrict names and codes"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "20f217d2",
"metadata": {},
"outputs": [],
"source": [
"Province = Province.merge(provinces_en, left_on='PROVINCE_NAME',right_on='ProvinceThai')\n",
"df = Province.merge(District, left_on='PROVINCE_CODE_select',right_on='DISTRICT_CODE_select')\n",
"df1 = df.merge(Subdistrict, left_on='DISTRICT_CODE_select_Sub',right_on='SUBDISTRICT_CODE_select')\n",
"df2 = df1.merge(ThepExcel2, left_on='SUBDISTRICT_CODE_select6', right_on='TambonID' , how='left')"
]
},
{
"cell_type": "markdown",
"id": "ffa24f59",
"metadata": {},
"source": [
"# Order the columns, keeping only the required ones, and rename them for clarity"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "e33c0782",
"metadata": {},
"outputs": [],
"source": [
"df2 = df2[['PROVINCE_CODE_select','PROVINCE_NAME','ProvinceEng','DISTRICT_NAME','DISTRICT_NAME_ENG','SUBDISTRICT_NAME',\n",
"'TambonEngShort','TambonID']].rename(columns={'PROVINCE_CODE_select':'PROVINCE_CODE','ProvinceEng':'PROVINCE_NAME_ENG','TambonEngShort':'SUBDISTRICT_NAME_ENG'})"
]
},
{
"cell_type": "markdown",
"id": "082cca53",
"metadata": {},
"source": [
"# Check for nulls, i.e. find which columns contain missing values"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "12cd25cc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"PROVINCE_CODE 0\n",
"PROVINCE_NAME 0\n",
"PROVINCE_NAME_ENG 0\n",
"DISTRICT_NAME 0\n",
"DISTRICT_NAME_ENG 3\n",
"SUBDISTRICT_NAME 0\n",
"SUBDISTRICT_NAME_ENG 2\n",
"dtype: int64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2.isnull().sum()"
]
},
{
"cell_type": "markdown",
"id": "a1883112",
"metadata": {},
"source": [
"# Write a file named MasterProvince using UTF-8-SIG encoding"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "75171c18",
"metadata": {},
"outputs": [],
"source": [
"df2.to_csv('MasterProvince.csv',encoding='utf-8-sig',index=False)"
]
}
],
"metadata": {
"interpreter": {
"hash": "4eef419ba9ceb6eca7081c30889866f44cdc1b4757b2edefb98cfaa8dd738cfa"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
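The notebook joins province, district, and subdistrict tables by slicing a shared prefix off each numeric code, then left-joins a lookup of English names and counts the nulls that result. The sketch below reproduces that pattern on toy data. All codes and names in it are made up, the `prov_key` and `dist_key` column names are hypothetical, and it assumes the subdistrict code is already six digits. It also passes `indicator=True` to the final merge, which the notebook does not do, so the unmatched rows can be listed rather than only counted.

```python
import pandas as pd

# Toy stand-ins for the Excel sheets; every code and name here is invented.
province = pd.DataFrame({
    "PROVINCE_CODE": [10, 50],
    "PROVINCE_NAME": ["Province A", "Province B"],
})
district = pd.DataFrame({
    "DISTRICT_CODE": [1001, 1002, 5001],
    "DISTRICT_NAME": ["District A1", "District A2", "District B1"],
})
subdistrict = pd.DataFrame({
    "SUBDISTRICT_CODE": [100101, 100201, 500101, 500102],
    "SUBDISTRICT_NAME": ["Sub A1-1", "Sub A2-1", "Sub B1-1", "Sub B1-2"],
})
# Hypothetical English-name lookup; code 500102 is deliberately absent.
tambon_en = pd.DataFrame({
    "TambonID": [100101, 100201, 500101],
    "TambonEngShort": ["A1-1", "A2-1", "B1-1"],
})

# Derive string prefix keys: 2 digits identify the province, 4 the district.
province["prov_key"] = province["PROVINCE_CODE"].astype(str).str[:2]
district["prov_key"] = district["DISTRICT_CODE"].astype(str).str[:2]
district["dist_key"] = district["DISTRICT_CODE"].astype(str).str[:4]
subdistrict["dist_key"] = subdistrict["SUBDISTRICT_CODE"].astype(str).str[:4]

# Chain the joins, then left-join the lookup so unmatched rows are kept.
merged = (
    province.merge(district, on="prov_key")
    .merge(subdistrict, on="dist_key")
    .merge(
        tambon_en,
        left_on="SUBDISTRICT_CODE",
        right_on="TambonID",
        how="left",
        indicator=True,  # adds a "_merge" column: "both" or "left_only"
    )
)

# Rows that found no English name in the lookup.
unmatched = merged.loc[merged["_merge"] == "left_only", "SUBDISTRICT_NAME"].tolist()
print(len(merged), unmatched)
```

Listing the `left_only` rows this way is one option for tracing the handful of missing English names that `isnull().sum()` reports back to specific subdistricts.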