linkedin website changed and can not read basic data #36

Open
cyanide2019 opened this issue Jan 22, 2020 · 11 comments
Labels
waiting for response waiting for issue owner response

Comments

@cyanide2019

finished scraping url: https://www.linkedin.com/in/inmudassar-iqbal-a9a9159b/
scrapedin: 2020-01-22T07:42:56.489Z error: [cleanMessageData] LinkedIn website changed and scrapedin can't read basic data. Please report this issue at https://github.com/linkedtales/scrapedin/issues
2020-01-22T07:42:56.490Z error: error on crawling profile: https://linkedin/in/mudassar-iqbal-a9a9159b/
Error: LinkedIn website changed and scrapedin can't read basic data. Please report this issue at https://github.com/linkedtales/scrapedin/issues
2020-01-22T07:42:56.830Z info: starting scraping: https://linkedin/in/nadeem-aslam-057341102/
scrapedin: 2020-01-22T07:42:56.830Z info: [profile] starting scraping url: https://www.linkedin.com/in/innadeem-aslam-057341102/
scrapedin: 2020-01-22T07:42:58.070Z info: [profile] finished scraping url: https://www.linkedin.com/in/inamjad-khan-a03634b7/
scrapedin: 2020-01-22T07:42:58.070Z error: [cleanMessageData] LinkedIn website changed and scrapedin can't read basic data. Please report this issue at https://github.com/linkedtales/scrapedin/issues
2020-01-22T07:42:58.070Z error: error on crawling profile: https://linkedin/in/amjad-khan-a03634b7/
Error: LinkedIn website changed and scrapedin can't read basic data. Please report this issue at https://github.com/linkedtales/scrapedin/issues
2020-01-22T07:42:58.832Z info: starting scraping: https://linkedin/in/baraa-faisal-0529a5a3/
scrapedin: 2020-01-22T07:42:58.833Z info: [profile] starting scraping url: https://www.linkedin.com/in/inbaraa-faisal-0529a5a3/
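
One detail worth noting in the log above: the crawler lines report URLs of the form https://linkedin/in/... (no ".com"), while the matching scrapedin lines show a doubled "/in/in" in the path. That may or may not be related to the error, but it is cheap to rule out before crawling. A minimal sketch of such a pre-flight check follows; `checkProfileUrl` is a hypothetical helper, not part of scrapedin or the crawler, and the accepted hosts are an assumption.

```javascript
// Hypothetical helper (not part of scrapedin): sanity-check a profile URL
// before handing it to the crawler.
function checkProfileUrl(raw) {
  let url;
  try {
    url = new URL(raw); // WHATWG URL, global in current Node.js versions
  } catch (err) {
    return { ok: false, reason: 'not a valid URL' };
  }

  // Assumption: only these two hosts are acceptable for profile URLs.
  const host = url.hostname.toLowerCase();
  if (host !== 'linkedin.com' && host !== 'www.linkedin.com') {
    return { ok: false, reason: `unexpected host "${host}"` };
  }

  // Expect exactly /in/<slug>, with an optional trailing slash.
  const match = /^\/in\/([^/]+)\/?$/.exec(url.pathname);
  if (!match) {
    return { ok: false, reason: `unexpected path "${url.pathname}"` };
  }
  return { ok: true, slug: match[1] };
}
```

Running each configured profile URL through a check like this would flag entries such as https://linkedin/in/mudassar-iqbal-a9a9159b/ because of the host, without touching the network.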

@cyanide2019
Author

2020-01-23T02:23:36.378Z error: error on crawling profile: https://linkedin.com/in/ahmad-abdelqader-pmp-osha-iso-70493882/
Error: EACCES: permission denied, open './crawledProfiles/ahmad-abdelqader-pmp-osha-iso-70493882.json'
scrapedin: 2020-01-23T02:23:36.555Z info: [profile] finished scraping url: https://www.linkedin.com/in/ibrahim-saadeddine-1320b8100
2020-01-23T02:23:36.556Z error: error on crawling profile: https://linkedin.com/in/ibrahim-saadeddine-1320b8100/
Error: EACCES: permission denied, open './crawledProfiles/ibrahim-saadeddine-1320b8100.json'
2020-01-23T02:23:36.959Z info: starting scraping: https://linkedin.com/in/usman-mohammed-41332845/
scrapedin: 2020-01-23T02:23:36.959Z info: [profile] starting scraping url: https://www.linkedin.com/in/usman-mohammed-41332845
2020-01-23T02:23:37.960Z info: starting scraping: https://linkedin.com/in/smfaisal29/
scrapedin: 2020-01-23T02:23:37.960Z info: [profile] starting scraping url: https://www.linkedin.com/in/smfaisal29
scrapedin: 2020-01-23T02:23:41.554Z info: [profile] scrolling page to the bottom
scrapedin: 2020-01-23T02:23:42.066Z info: [scrollToPageBottom] scrolling to page bottom (1)
scrapedin: 2020-01-23T02:23:42.624Z info: [scrollToPageBottom] scrolling to page bottom (2)
scrapedin: 2020-01-23T02:23:42.988Z info: [profile] applying 1st delay
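
The EACCES lines in this second log are a different failure from the one in the title: the scrape finishes, but the process cannot open a file under ./crawledProfiles for writing. One possible cause is that the directory is owned by another user. A minimal sketch for checking this before a long crawl, using only Node's built-in `fs` module; `checkWritableDir` is a hypothetical helper, not something the crawler provides.

```javascript
const fs = require('fs');

// Hypothetical helper: returns null when `dir` exists (or can be created)
// and is writable by the current process, otherwise the error code.
function checkWritableDir(dir) {
  try {
    // With recursive: true this does not throw if the directory already exists.
    fs.mkdirSync(dir, { recursive: true });
    // Throws (for example with code EACCES) if the process may not write there.
    fs.accessSync(dir, fs.constants.W_OK);
    return null;
  } catch (err) {
    return err.code || String(err);
  }
}

// Example use before starting the crawler:
// const problem = checkWritableDir('./crawledProfiles');
// if (problem) console.error('cannot write to ./crawledProfiles:', problem);
```

If this reports EACCES, the directory's owner and mode (for example via `ls -ld ./crawledProfiles`) are the first things to look at.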

@Zackhardtoname

Same problem

@leonardiwagner
Member

@Zackhardtoname Are you using a company/recruiter profile to log in, or just a regular employee one?

Please set isHeadless to false in config.json. This opens the browser window while crawling, so you can check whether the session is really logged in (look at the LinkedIn top bar).

Also, please confirm that the scrapedin version in your package.json is 1.0.11.

@cyanide2019 could you do the same please? I couldn't reproduce this error, it's working here, thanks.
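
The two checks requested above can be sketched as a small function over the already-parsed contents of config.json and package.json. This is only an illustration: `checkSetup` is a hypothetical helper, and the only names taken from this thread are `isHeadless`, `scrapedin`, and the version 1.0.11.

```javascript
// Hypothetical helper: takes the parsed config.json and package.json objects
// and returns a list of human-readable problems (empty when both checks pass).
function checkSetup(config, pkg) {
  const problems = [];

  // The maintainer asked for isHeadless to be false so the browser is visible.
  if (config.isHeadless !== false) {
    problems.push('config.json: isHeadless is not false, so no browser window will open');
  }

  // Look for scrapedin in either dependency section.
  const deps = Object.assign({}, pkg.dependencies, pkg.devDependencies);
  const declared = deps.scrapedin;
  if (!declared) {
    problems.push('package.json: scrapedin is not listed as a dependency');
  } else if (!declared.includes('1.0.11')) {
    // A range such as "^1.0.0" may still install 1.0.11; this only inspects
    // the declared string, not what is actually installed.
    problems.push(`package.json: scrapedin is declared as "${declared}", expected 1.0.11`);
  }
  return problems;
}

// Example use:
// const fs = require('fs');
// const config = JSON.parse(fs.readFileSync('config.json', 'utf8'));
// const pkg = JSON.parse(fs.readFileSync('package.json', 'utf8'));
// console.log(checkSetup(config, pkg));
```

Because package.json only records the declared range, `npm ls scrapedin` is the more reliable way to see which version is actually installed.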

@leonardiwagner leonardiwagner added the waiting for response waiting for issue owner response label Feb 11, 2020
@Zackhardtoname

Zackhardtoname commented Feb 11, 2020 via email

@leonardiwagner
Member

@Zackhardtoname Please apply the configuration changes mentioned above and post the results here when you can.

@cyanide2019
Author

cyanide2019 commented Feb 11, 2020 via email

@Aditya94A

@cyanide2019 Were you able to find a solution for this?

@pushparmar

pushparmar commented May 15, 2020

What is the purpose of

"rootProfiles": [
  "https://www.linkedin.com/in/place/",
  "https://www.linkedin.com/in/here/",
  "https://www.linkedin.com/in/profiles/",
  "https://www.linkedin.com/in/to-start-the-crawler/"
]

in config.json?

Also, I want to search for profiles based on particular keywords, but
"relatedProfilesKeywords": ["javascript"] does not seem to work.

@PriyaJainDev

@cyanide2019 Is there any way that I can use particular keywords and then the crawler can search through all available profiles based on those keywords only?

@ThomasProctor
Contributor

It's a little hard to follow what was happening here, but I think I had the same problem. Login from credentials doesn't work with headless, but everything works fine with the "headed" browser. Headless works fine with cookies for me though.

I suspect that they might just be checking the user-agent in the header and refusing to log you in or giving you a captcha if it says that it's headless. I might do some experimentation there if I find I need headless login.
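
For anyone who wants to test the user-agent theory: headless Chrome's default user-agent string contains the token "HeadlessChrome", so it is easy to detect and, for an experiment, to rewrite. A minimal sketch with two hypothetical helpers follows. Whether LinkedIn actually checks this is unconfirmed, and whether scrapedin exposes the underlying page object is an assumption not verified here.

```javascript
// Hypothetical helpers for experimenting with the user-agent theory.

// True if the user-agent string advertises headless Chrome.
function looksHeadless(userAgent) {
  return /HeadlessChrome/i.test(userAgent);
}

// Returns the same string with the headless marker replaced, so it reads
// like the user agent of a regular ("headed") Chrome.
function toHeadedUserAgent(userAgent) {
  return userAgent.replace(/HeadlessChrome/gi, 'Chrome');
}

// With Puppeteer, and assuming you have access to the browser and page objects:
// const ua = await browser.userAgent();
// if (looksHeadless(ua)) await page.setUserAgent(toHeadedUserAgent(ua));
```

If headless login with credentials starts working after this change, that would support the theory; if not, something other than the user agent is involved.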

@ThomasProctor
Contributor

If I get the time, I'll do some more experimentation and open a separate issue if I really have a diagnosable problem.


7 participants