For grabbing PDFs from ICRA 2022
Noëlle de443e9bf6
Finalize main.py, Readme, and CSS/JS
2 years ago

pdf-grabber

For grabbing PDFs from ICRA 2022!

Usage

Make sure you have Python 3.6 or later, create a virtual environment if you like, then run these commands from a command line:

  • pip3 install -r requirements.txt
  • python3 main.py
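If you are unsure whether your interpreter meets the version requirement, a small check like the following can help. This is only a sketch; the helper name python_is_supported is hypothetical and is not part of main.py.

```python
import sys

# Minimum version taken from the requirement stated above.
MIN_VERSION = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of version_info meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        sys.exit("Python 3.6 or later is required.")
    print("Python version OK")
```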

This script will create a sub-directory, pdfs/, where it will store the PDFs it downloads. PDFs are named according to the presentation’s name and the PDF’s file number.
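To illustrate the naming scheme, here is a minimal sketch of how a target path could be built from a presentation name and a file number. The function name pdf_path, the underscore separator, and the character sanitising rule are all assumptions made for illustration; the actual format used by main.py may differ.

```python
import re
from pathlib import Path


def pdf_path(presentation_name, file_number, out_dir="pdfs"):
    """Build a path such as pdfs/<presentation name>_<file number>.pdf.

    Assumed scheme: runs of characters other than letters, digits,
    underscores, hyphens, and spaces are replaced with one underscore.
    """
    safe_name = re.sub(r"[^\w\- ]+", "_", presentation_name).strip()
    return Path(out_dir) / f"{safe_name}_{file_number}.pdf"


if __name__ == "__main__":
    # Hypothetical presentation title, for demonstration only.
    print(pdf_path("Grasping: A Survey", 2))
```

Before writing a file to such a path, the directory can be created with Path("pdfs").mkdir(exist_ok=True).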

You can use the -e flag (e.g. python3 main.py -e 88) to choose which event ID to scan for presentations that have PDFs. The default is event 88. (The number is unfortunate; it’s simply the ID of the event this was written for, the 39th IEEE International Conference on Robotics and Automation, and carries no other symbolism here.)

You can use the -s flag (e.g. python3 main.py -s) to save the HTML content of each page along with the PDF. This is mostly for diagnostic purposes. The CSS and JavaScript files required by the HTML files are included here, but you may have to move them somewhere else to get them to work properly (the right location depends on your system).
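For readers curious how two flags like these are typically wired up, the sketch below uses the standard argparse module. It is not copied from main.py: the attribute names event_id and save_html and the help strings are hypothetical, and only the -e and -s flags and the default of 88 come from the description above.

```python
import argparse


def parse_args(argv=None):
    """Parse the -e and -s flags described above (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Download PDFs for the presentations of an event."
    )
    # -e takes an integer event ID; 88 is the documented default.
    parser.add_argument("-e", dest="event_id", type=int, default=88,
                        help="event ID to scan (default: 88)")
    # -s is a switch: present means the HTML of each page is saved too.
    parser.add_argument("-s", dest="save_html", action="store_true",
                        help="also save the HTML content of each page")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Parse an explicit list so the demo does not depend on the real command line.
    args = parse_args(["-e", "88", "-s"])
    print(args.event_id, args.save_html)
```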