
minano430/web_crawler_ptt


Web Crawler Practice Program

A practice project for a web crawler and GUI application written in Python.

The original version ran into performance problems.

Multithreading is used to download pictures, which reduces the time spent waiting on I/O.
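As a rough illustration of the idea, here is a minimal sketch of downloading several pictures on a pool of threads. The program itself uses QThread (see the flow chart below); this sketch swaps in the standard library's `ThreadPoolExecutor` so it runs without a GUI. The names `fetch_bytes` and `download_all` are hypothetical and are not taken from this repository.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_bytes(url, timeout=10):
    """Download one URL and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch=fetch_bytes, max_workers=4):
    """Fetch every URL on a pool of worker threads.

    Returns a dict mapping each URL to its downloaded bytes.
    `fetch` is injectable so the function can be tried without a network.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its result.
        return dict(zip(urls, pool.map(fetch, urls)))
```

While one thread waits for a server to respond, the others can keep downloading, which is where the time saving on I/O comes from.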

Next update

  • Program flow
  • Multithreading vs. multiprocessing
  • Explanation of the multithreading vs. multiprocessing program

Program flow chart

Created with HackMD.

io=>inputoutput: Enter date and like count

cond2=>condition: Post like count > Like count?
cond1=>condition: Check post date
op1=>operation: Collect post
op2=>operation: Create a QThread to download the post's pictures

io->cond1(yes)->op1->cond2
cond2(yes)->op2
cond2(no)->cond1
cond1(no)->cond1
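
The same flow can be sketched in plain Python. This is only an illustration under stated assumptions: the `Post` record and `select_posts` function are hypothetical, the date check is assumed to be an exact match, and the QThread download step is reduced to a comment.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Post:
    # Hypothetical record; the real crawler's data model may differ.
    posted_on: date
    likes: int
    image_urls: list = field(default_factory=list)


def select_posts(posts, target_date, min_likes):
    """Apply the two checks from the flow chart to each post."""
    selected = []
    for post in posts:
        if post.posted_on != target_date:  # cond1 (no): move on to the next post
            continue
        if post.likes > min_likes:         # cond2 (yes): keep the post
            selected.append(post)          # op2 would start a download thread here
    return selected
```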


| Task | Multithreading | Multiprocessing | Original |
|---|---|---|---|
| I/O-bound | Fast | Fast (slower than threads) | Slow |
| CPU-bound | Slow | Fast | Slow |

Testing an I/O-bound task:
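
One way to reproduce this kind of test without a network is to simulate the wait with `time.sleep`. The sketch below is an assumption about how such a test could look, not the repository's own benchmark; `fake_io`, `timed`, `run_sequential`, and `run_threaded` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    """Stand-in for a network or disk wait; sleeping releases the GIL."""
    time.sleep(delay)
    return delay


def run_sequential(delays):
    """Wait for each simulated I/O call one after another."""
    return [fake_io(d) for d in delays]


def run_threaded(delays):
    """Wait for all simulated I/O calls at once, one thread per call."""
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        return list(pool.map(fake_io, delays))


def timed(run, delays):
    """Return the wall-clock seconds taken by run(delays)."""
    start = time.perf_counter()
    run(delays)
    return time.perf_counter() - start
```

With four waits of 0.2 seconds each, `timed(run_sequential, [0.2] * 4)` should take about the sum of the waits, while `timed(run_threaded, [0.2] * 4)` should take about as long as a single wait.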

Testing a CPU-bound task:

Explanation:

We have four tasks; each task runs a loop of numeric additions.

Both the process pool and the thread pool have four workers, and the results show that the thread pool takes nearly four times as long as the process pool.


Threading in Python is restricted by the GIL (Global Interpreter Lock): only one thread can execute Python bytecode at a time, so during CPU-bound work the interpreter switches between threads frequently instead of running them in parallel.

With multiprocessing, each process has its own interpreter and its own GIL, so the four workers run in parallel.
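
The comparison described above can be approximated with the sketch below. It is an assumption about the shape of the test rather than the repository's code: `add_loop` and `run_pool` are hypothetical names, and the loop size is arbitrary. Actual timings depend on the machine and the Python build.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def add_loop(n):
    """CPU-bound task: sum the integers below n with a plain loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def run_pool(executor_cls, sizes, workers=4):
    """Run add_loop over sizes on the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(add_loop, sizes))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    sizes = [2_000_000] * 4  # four identical tasks, four workers
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        _, seconds = run_pool(cls, sizes)
        print(f"{cls.__name__}: {seconds:.2f}s")
```

Both executors share the same interface, so the only thing that changes between the two runs is whether the workers are threads or processes.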

Other Information

Python - `if __name__ == '__main__'` -> If your program is a single Python file, it doesn't matter.

def test_main():
    print('I\'m cool!')
    
test_main()


-> But if you import this Python file into another file:


from cool import test_main 

print('other_program call : ')
test_main()


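--> Because when you import another Python module, the Python interpreter executes the imported module (cool.py) from top to bottom, so the module-level test_main() call in cool.py also runs and the message is printed twice.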
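
To see this behaviour without creating the files by hand, the sketch below writes the two files shown above into a temporary directory and runs the second one. The file name cool2.py for the importing script and the helper name `run_cool2` are assumptions made for this illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Contents of the two files from the example above.
COOL = "def test_main():\n    print(\"I'm cool!\")\n\ntest_main()\n"
COOL2 = "from cool import test_main\n\nprint('other_program call : ')\ntest_main()\n"


def run_cool2():
    """Write cool.py and cool2.py to a temp dir, run cool2.py, return its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "cool.py").write_text(COOL)
        Path(tmp, "cool2.py").write_text(COOL2)
        done = subprocess.run(
            [sys.executable, "cool2.py"],
            cwd=tmp, capture_output=True, text=True, check=True,
        )
    return done.stdout
```

Calling `print(run_cool2())` shows the message once from the import itself, then the 'other_program call : ' line, then the message again from the explicit call.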

--> To correct this problem, use `__name__`.

`__name__` is a built-in variable whose value differs from file to file.


cool2.py imports cool.py. Inside cool.py the built-in variable `__name__` is 'cool', but in cool2.py (the file being executed) its value is '__main__'.

So add the condition `if __name__ == '__main__':` to both files (cool.py, cool2.py) to fix the output; the guard in cool.py is what stops the extra call from running on import.
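
A minimal sketch of cool.py with the guard applied might look like this (the function body is the one from the example above; only the placement of the call changes):

```python
# cool.py, with the module-level call moved behind the guard
def test_main():
    print("I'm cool!")


if __name__ == "__main__":
    # Runs only when this file is executed directly (python cool.py),
    # not when another file does `from cool import test_main`.
    test_main()
```

With this version, importing `test_main` from another file no longer prints anything until that file calls the function itself.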

