Ahmad Fijr's profile

Sustainable web scraping from the Wuzzuf site

I developed a sustainable program that pulls Python job data from Wuzzuf...

Building this program was a challenge. Most scrapers written for sites like Wuzzuf work only once, or crash after a short time, because these sites use techniques to combat web scraping, such as class names that change periodically and elements that are added with JavaScript.

For this reason, scraping this site requires a dedicated technique.
Technique: it relies on the HTML structure staying stable, and locates elements by their arrangement in the parent/child hierarchy rather than by their class names.
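A minimal sketch of this structural idea, using only Python's standard-library html.parser. The page layout, the sample markup, and every name here (Node, TreeBuilder, select_by_path, extract_jobs) are hypothetical illustrations, not Wuzzuf's real markup or the author's code. Each element is found by its tag and its position under its parent, so no step depends on a class name.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not made "current".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """Minimal element node: tag, attributes, parent, and ordered content."""

    def __init__(self, tag, attrs=(), parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.parts = []  # text strings and child Nodes, in document order

    @property
    def children(self):
        return [p for p in self.parts if isinstance(p, Node)]

    def get_text(self):
        raw = " ".join(p if isinstance(p, str) else p.get_text() for p in self.parts)
        return " ".join(raw.split())  # collapse runs of whitespace


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML with the standard-library parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.parts.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_startendtag(self, tag, attrs):
        # Self-closing tag such as <br/>: record it without descending into it.
        self.current.parts.append(Node(tag, attrs, self.current))

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag (tolerates sloppy HTML).
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.parts.append(data)


def select_by_path(node, path):
    """Follow (tag, index) steps; index counts only siblings with that tag.
    Class names and other attributes are never consulted."""
    for tag, index in path:
        matches = [c for c in node.children if c.tag == tag]
        if index >= len(matches):
            return None
        node = matches[index]
    return node


def extract_jobs(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    # Hypothetical layout: html > body > first div holds one div per job card;
    # in each card the first h2 is the title and the first a is the company.
    container = select_by_path(builder.root, [("html", 0), ("body", 0), ("div", 0)])
    if container is None:
        return []
    jobs = []
    for card in container.children:
        if card.tag != "div":
            continue
        title = select_by_path(card, [("h2", 0)])
        company = select_by_path(card, [("a", 0)])
        jobs.append({
            "title": title.get_text() if title else "",
            "company": company.get_text() if company else "",
        })
    return jobs


# Hypothetical markup with obfuscated, frequently changing class names.
SAMPLE = """
<html><body>
  <div class="css-x7a1">
    <div class="css-9qk2"><h2 class="css-m3">Python Developer</h2><a class="css-p0">Acme</a></div>
    <div class="css-1zz"><h2 class="css-r8">Data Engineer</h2><a class="css-t4">Globex</a></div>
  </div>
</body></html>
"""
```

Calling extract_jobs(SAMPLE) returns the two hypothetical job rows, and extract_jobs(SAMPLE.replace("css-", "renamed-")) returns the same rows, because no lookup step reads a class attribute. The trade-off is the one the note below describes: if the site reorders or rewraps its elements, the positional paths have to be updated.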

The program is written in Python.
There is an attached file called "Scraping_Data.xlsx" that contains the data scraped while the program was running:
link: shorturl.at/oAGQT

There is an attached file called "WUZZUF Scraping.rar" that contains the source code for the program:
link: shorturl.at/FGOS8

The document file contains the preliminary raw code for the data-extraction process.

myLib file: contains the same functions as the document code, organized so they can be used as a library.
main file: the program's executable file.
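The split between a library file and an executable file can be sketched as below. This is a hedged illustration, not the author's code: the function names (normalize_row, save_rows, main), the column set, and the placeholder rows are hypothetical, and the output is a CSV file written with Python's standard library rather than the .xlsx file attached to this post. Both roles are collapsed into one file so the sketch runs on its own; in the project they would live in separate files, with main importing myLib.

```python
import csv
from pathlib import Path

# ---- Library-style helpers (the role the myLib file plays) ----

FIELDS = ["title", "company", "location"]  # hypothetical column set


def normalize_row(raw):
    """Keep only the known columns, trimming whitespace; missing fields become ''."""
    return {field: str(raw.get(field, "")).strip() for field in FIELDS}


def save_rows(rows, path):
    """Write rows to a CSV file (which Excel can open) and return the row count."""
    cleaned = [normalize_row(row) for row in rows]
    with Path(path).open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(cleaned)
    return len(cleaned)


# ---- Entry point (the role the main file plays) ----

def main(out_path="scraping_data.csv"):
    # Placeholder rows; a real run would collect these with the scraping helpers.
    rows = [
        {"title": " Python Developer ", "company": "Example Co"},
        {"title": "Backend Engineer", "company": "Sample Ltd", "location": "Remote"},
    ]
    count = save_rows(rows, out_path)
    print(f"Wrote {count} rows to {out_path}")


if __name__ == "__main__":
    main()
```

Keeping the helpers free of side effects and putting the run logic behind the `if __name__ == "__main__":` guard lets the library file be imported (for example, from the main file or from tests) without starting a scrape.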

Note: Although I describe this program as sustainable, nothing is guaranteed or permanent; the technique only keeps the program working for as long as possible.

The site may one day defeat this technique and block data extraction.