Extracting text and data from PDFs or images using optical character recognition (OCR)

Summary

Tools and resources available for getting help with OCR.

Body

Summary

Information on the software and other tools available to extract text and data from a PDF, JPG, or other document, and where to go for help.

Environment

  • Researchers
  • Paper documents that need to be converted into tables or searchable text for tagging, analysis or to be used as content in other documents.
  • Digital documents that need to be converted into tables or searchable text for tagging, analysis or to be used as content in other documents.
  • ScholarSpace

Directions

Where to go for help

  • The ScholarSpace has the most software and support for optical character recognition (OCR). CSCAR and HPC support can also help if you are using programming methods.
    • Occasionally the data is already available in digitized form. Check with the subject specialist first, since you may be able to skip extraction altogether.

Computers and Tools available

  • The ScholarSpace has the machines best set up to run this software. In addition, some of the software below can be installed on request on various U-M owned computers, including some departmental computer labs and campus computing sites.
  • LSA Technology Services also has field equipment available for you to test with.


Software

  • ABBYY PDF Transformer is available for Mac and Windows and is licensed for Windows on university-owned resources. It can be installed in various labs with university-owned Windows machines.
  • PDFpenPro is available for Mac and is licensed for university-owned resources. It can be installed in various labs with university-owned Mac hardware.
  • Adobe Acrobat is available for Mac and Windows and is licensed by LSA for university-owned machines. It can be installed in various labs on university-owned equipment.
  • ABBYY FineReader is available for Mac and Windows. It is available for use in the ScholarSpace in the library. To use it elsewhere, you must purchase a license.
  • OmniPage is available for Windows. It is available for use in the ScholarSpace in the library. To use it elsewhere, you must purchase a license.
  • Scanner Pro 7 turns your iOS phone into a scanner that performs OCR directly. This requires an app purchase.
  • Prizmo is available for Mac and iOS and scans directly to OCR. This requires purchasing a license.
  • With more programming skills:
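
As one illustration of the programming route, the sketch below assumes the open-source Tesseract engine, called through the pytesseract Python wrapper and the Pillow imaging library. This article does not name these tools or confirm that they are supported here, so treat them as assumptions chosen for the example. The function names ocr_image and tidy_ocr_text are likewise hypothetical. The first function reads the text from one scanned image; the second cleans the raw output so that it is easier to search.

```python
import re


def ocr_image(image_path):
    """Return the text recognized in one image file.

    Assumes the Tesseract engine, pytesseract, and Pillow are installed;
    none of these are named in the article.
    """
    # Imported here so tidy_ocr_text below works without these packages.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img)


def tidy_ocr_text(text):
    """Clean raw OCR output so it is easier to search."""
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim each line, then trim the text as a whole.
    return "\n".join(line.strip() for line in text.splitlines()).strip()
```

With the assumed packages installed, a call such as tidy_ocr_text(ocr_image("scan.jpg")) would return cleaned text for a hypothetical file named scan.jpg. CSCAR or HPC support may be able to advise on whether this approach suits your documents.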

Details

Article ID: 1827
Created: Wed 5/27/20 11:23 AM
Modified: Fri 3/8/24 10:17 AM