I'm trying to extract the text included in this PDF file using Python.
I'm using the PyPDF2 module, and have the following script:
    import PyPDF2

    pdf_file = open('sample.pdf')
    read_pdf = PyPDF2.PdfFileReader(pdf_file)
    number_of_pages = read_pdf.getNumPages()
    page = read_pdf.getPage(0)
    page_content = page.extractText()
    print page_content
When I run the code, I get the following output, which differs from the text in the PDF document:
!"#$%#$%&%$&'()*%+,-%./01'*23%4 5'%1$#26%3/%7/))/8%&)/26%8#3"%3"*%313/9#&) %
How can I extract the text exactly as it appears in the PDF document?