A practice project: a web crawler and GUI application written in Python.
The first version ran into difficulties because of poor performance.
Multithreading is used to download pictures, which reduces the time spent waiting on I/O.
- program flow
- MultiThread vs MultiProcessing
- Explanation of MultiThread vs MultiProcessing
Created with HackMD
io=>inputoutput: Enter date and like count
cond2=>condition: Post likes > Like count?
cond1=>condition: Check post date
op1=>operation: Collect post
op2=>operation: Create a QThread to download the post's pictures
io->cond1(yes)->op1->cond2
cond2(yes)->op2
cond2(no)->cond1
cond1(no)->cond1
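
The flow above can be sketched in plain Python. This is only an illustration under assumptions, not the project's actual code: `threading.Thread` stands in for QThread so the example runs without Qt, the post fields `date`, `likes` and `image_urls` are hypothetical, the date check is assumed to be an exact match, and `download_images` is a placeholder that records URLs instead of fetching them.

```python
import threading
from datetime import date

_lock = threading.Lock()

def download_images(post, downloaded):
    """Placeholder download (assumption): record the URLs instead of fetching them."""
    with _lock:
        downloaded.extend(post["image_urls"])

def crawl(posts, target_date, min_likes):
    """Follow the flowchart: check date, collect post, download if likes are high enough."""
    collected, downloaded, workers = [], [], []
    for post in posts:
        if post["date"] != target_date:      # cond1(no): go on to the next post
            continue
        collected.append(post)               # op1: collect the post
        if post["likes"] > min_likes:        # cond2(yes): start a download thread
            worker = threading.Thread(target=download_images, args=(post, downloaded))
            worker.start()
            workers.append(worker)
    for worker in workers:                   # wait for all downloads to finish
        worker.join()
    return collected, downloaded

if __name__ == "__main__":
    sample = [
        {"date": date(2020, 1, 1), "likes": 10, "image_urls": ["a.jpg"]},
        {"date": date(2020, 1, 1), "likes": 1, "image_urls": ["b.jpg"]},
    ]
    print(crawl(sample, date(2020, 1, 1), 5))
```

In the GUI application itself, a QThread would play the role of the worker thread so that the window stays responsive while pictures download.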
MultiThread vs MultiProcessing (from https://medium.com/contentsquare-engineering-blog/multithreading-vs-multiprocessing-in-python-ece023ad55a)
Task | MultiThreading | MultiProcessing | Original |
---|---|---|---|
I/O-bound | Fast | Fast (slower than threads) | Slow |
CPU-bound | Slow | Fast | Slow |
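
The I/O-bound row can be illustrated with a small sketch. It is not the project's benchmark: `fake_download` is a hypothetical stand-in that simulates network waiting with `time.sleep`, and the delay and task count are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(i, delay=0.2):
    """Simulated I/O (assumption): sleeping stands in for waiting on the network."""
    time.sleep(delay)
    return i

def run_sequential(n):
    """Run the n simulated downloads one after another and time them."""
    start = time.perf_counter()
    results = [fake_download(i) for i in range(n)]
    return results, time.perf_counter() - start

def run_threaded(n):
    """Run the n simulated downloads in a thread pool with n workers and time them."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(fake_download, range(n)))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    _, t_seq = run_sequential(4)
    _, t_thr = run_threaded(4)
    print(f"sequential: {t_seq:.2f}s, threaded: {t_thr:.2f}s")
```

Because a sleeping (or network-waiting) thread releases the GIL, the threaded waits overlap, so the threaded run should finish in roughly the time of a single wait.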
Test for an I/O-bound task:
Test for a CPU-bound task:
Explanation: There are four tasks, and each task loops over a numeric addition.
Both the thread pool and the process pool have four workers. In the result, the thread pool takes nearly four times as long as the process pool, because the GIL lets only one thread execute Python bytecode at a time.
With multiprocessing, each process has its own interpreter and its own GIL, so the four workers run in parallel.
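
A minimal sketch of such a comparison follows. It is not the original test script: the function names and the iteration count are assumptions, and the timings depend on the machine and on running a standard CPython build with the GIL enabled.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def count_up(n):
    """CPU-bound task: repeat a numeric addition n times."""
    total = 0
    for _ in range(n):
        total += 1
    return total

def timed_run(executor_cls, n_tasks=4, n=2_000_000):
    """Run n_tasks copies of count_up on a pool with n_tasks workers and time them."""
    start = time.perf_counter()
    with executor_cls(max_workers=n_tasks) as pool:
        results = list(pool.map(count_up, [n] * n_tasks))
    return results, time.perf_counter() - start

if __name__ == "__main__":
    # The guard is required here: on platforms that spawn new processes,
    # each worker re-imports this file and must not start its own pool.
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        _, elapsed = timed_run(cls)
        print(f"{cls.__name__}: {elapsed:.2f}s")
```

On a machine with at least four cores, the process pool is expected to finish noticeably faster than the thread pool for this kind of task.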
Python - `if __name__ == '__main__'` -> If you only write one Python file, it doesn't matter.
# cool.py
def test_main():
    print('I\'m cool!')

test_main()
-> But if you import it into another file (cool2.py), the message is printed twice: once during the import and once from the explicit call.
from cool import test_main
print('other_program call : ')
test_main()
--> This happens because when you import another Python module, the interpreter executes the imported module (cool.py), including its top-level `test_main()` call.
--> To correct this problem, use `__name__`.
`__name__` is a built-in variable whose value differs from module to module.
When cool2.py imports cool.py, `__name__` inside cool.py is `'cool'`, but in cool2.py (the file being executed) its value is `'__main__'`.
So wrap the top-level calls in the condition `if __name__ == '__main__'` in both files (cool.py, cool2.py), and the output is fixed.
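
As a sketch of the fix, cool.py with the guard added could look like the following. The function body matches the example above; only the guard is new.

```python
# cool.py (guarded version)
def test_main():
    print('I\'m cool!')

if __name__ == '__main__':
    # Runs only when cool.py is executed directly;
    # skipped when another file does `from cool import test_main`.
    test_main()
```

With this version, running cool2.py prints the message only once, after the `other_program call : ` line.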