OCR for Smart Data Extraction from PDF and Images with NER

https://www.udemy.com/course/ocr-for-smart-data-extraction-from-pdf-and-images-with-ner

 

Learn Data Extraction, Labelling and Training with Spacy, and build a solution with Python, Pandas, OCR and NER concepts


 

 

What you'll learn: 

Understand data extraction from different types of documents such as PDF, Word and Scanned Images

Learn how to use Tesseract and PyTesseract for recognition of data from images

Learn how to use Spacy efficiently for labelling and for training NER models on custom data

Use Pandas to convert extracted data to a CSV format
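As a rough illustration of the final CSV step, the sketch below flattens extracted entities into one row per file and label. It is a minimal sketch under assumptions: the input shape (a mapping from file name to (label, value) pairs), the column names and the entities_to_csv helper are hypothetical, and it uses the standard-library csv module so it runs without extra installs. With Pandas, the same rows could be written with pd.DataFrame(rows).to_csv(index=False).

```python
import csv
import io


def entities_to_csv(documents):
    """Flatten per-document entity lists into CSV text (hypothetical helper).

    documents: mapping of file name -> list of (label, value) pairs.
    Each (file, label) combination becomes one row; repeated labels
    within a file are joined with "; ".
    """
    rows = []
    for filename, entities in documents.items():
        # Group values by entity label, preserving first-seen order
        grouped = {}
        for label, value in entities:
            grouped.setdefault(label, []).append(value)
        for label, values in grouped.items():
            rows.append({"file": filename, "label": label, "value": "; ".join(values)})

    # Write to an in-memory buffer; swap in open(path, "w", newline="") for a file
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["file", "label", "value"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    docs = {
        "invoice_01.pdf": [("ORG", "Acme Corp"), ("DATE", "2021-03-15")],
        "scan_02.png": [("ORG", "Beta Ltd"), ("ORG", "Gamma Inc")],
    }
    print(entities_to_csv(docs))
```

A long (file, label, value) layout keeps the sketch independent of which entity labels a given document happens to contain; pivoting to one column per label is a separate, optional step.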

Requirements:

Basic Python Programming knowledge

Description:

Gain a competitive edge in the world of Computer Vision through this course by learning how to do Smart Data Extraction from PDF and Images.

The technology landscape of the world has brought cognitive skills to the forefront, with major emphasis on intelligent data extraction. This becomes more complex because of the huge variety of input documents, such as PDF documents with structured data, scanned PDF documents and Word documents. This course aims to solve this challenging problem by helping you understand these various formats and then empowering you to do smart data extraction using Python, Pandas, OCR, Tesseract, PyTesseract, OpenCV, Spacy and NER concepts.

The course will guide you in building a common pipeline that works across multiple data formats through a structured workflow, in which you will learn Data Extraction using OCR, Data Labelling with Spacy, Training a model on custom NER data and validating the model through prediction. Towards the end, we will combine everything learned to build a Smart Text Extractor application.
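For the OCR step of such a pipeline, Tesseract's behaviour is steered mainly by two command-line options, --psm (page segmentation mode, 0 to 13 in Tesseract 4 and later) and --oem (OCR engine mode, 0 to 3). The sketch below only assembles and validates that option string; the tesseract_config helper and its extra_options parameter are hypothetical, and it assumes Tesseract 4 or later.

```python
# Page segmentation modes (PSM) accepted by Tesseract 4+ are 0-13;
# OCR engine modes (OEM) are 0-3 (0 legacy, 1 LSTM, 2 legacy + LSTM, 3 default).
PSM_RANGE = range(0, 14)
OEM_RANGE = range(0, 4)


def tesseract_config(psm=3, oem=3, extra_options=None):
    """Build a Tesseract option string (hypothetical helper).

    psm=3 is fully automatic page segmentation; psm=6 assumes a single
    uniform block of text, which often suits cropped regions of a form.
    extra_options: optional list of further raw Tesseract flags.
    """
    if psm not in PSM_RANGE:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    if oem not in OEM_RANGE:
        raise ValueError(f"oem must be between 0 and 3, got {oem}")
    parts = [f"--oem {oem}", f"--psm {psm}"]
    parts.extend(extra_options or [])
    return " ".join(parts)


if __name__ == "__main__":
    print(tesseract_config())              # default engine, automatic layout
    print(tesseract_config(psm=6, oem=1))  # LSTM engine, single text block
```

With PyTesseract and Pillow installed, the string would be passed through the config argument, for example pytesseract.image_to_string(Image.open("scan.png"), config=tesseract_config(psm=6)), where "scan.png" is a placeholder path.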

The course has been designed to explain the text data extraction workflow in depth, first explaining the technology concepts and then their implementation in code. A detailed code walkthrough is included for all the code implementations, and 12 supporting source code files are available for download. In addition, a quiz at the end of the course helps you assess your knowledge and identify areas for improvement.

Enroll in this course and enhance your cognitive capabilities. Here are just a few of the topics we will be learning:

 

· Understanding the basics of Data Conversion

· Conversion and Extraction from structured PDF documents

· Conversion of scanned PDF documents to text

· Conversion and Extraction of data from Word documents to text

· Common Pipeline Format for all types of documents

· Image Reading using PIL and OpenCV

· Tesseract for Extraction 

· Tesseract Page Segmentation Mode (PSM) and OCR Engine Mode (OEM)

· Extraction of Data from Image

· PyTesseract Operations for conversion of documents to readable text

· Named Entity Recognition (NER)

· Spacy Entity Types

· IOB Format

· Labelling with Spacy for NER

· Training Spacy model on custom data using NER

· Predicting using Trained Spacy Model

· Pandas

· Convert Data to CSV Output using DataFrame
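To make the IOB format concrete, here is a minimal sketch that converts labelled character spans, in the (start, end, label) style used by Spacy's classic NER training examples, into IOB tags: B- marks the first token of an entity, I- the tokens inside it, and O everything outside. Assumptions: tokens are split on whitespace rather than with Spacy's tokenizer, and spans_to_iob is a hypothetical helper, so a token that only partly overlaps an entity (for example one with trailing punctuation) is tagged O here. Spacy itself ships alignment helpers, such as spacy.training.offsets_to_biluo_tags in v3, which produces the related BILUO scheme.

```python
import re


def spans_to_iob(text, entities):
    """Convert character-offset entity spans into IOB tags (hypothetical helper).

    entities: list of (start, end, label) tuples with `end` exclusive.
    Tokens are whitespace-separated; a token counts as part of an entity
    only if it lies entirely inside the span.
    """
    tagged = []
    for match in re.finditer(r"\S+", text):
        start, end = match.span()
        tag = "O"  # outside any entity by default
        for ent_start, ent_end, label in entities:
            if ent_start <= start and end <= ent_end:
                # B- for the token that begins the entity, I- for the rest
                prefix = "B" if start == ent_start else "I"
                tag = f"{prefix}-{label}"
                break
        tagged.append((match.group(), tag))
    return tagged


if __name__ == "__main__":
    text = "Invoice from Acme Corp dated 2021-03-15"
    org = text.index("Acme Corp")
    date = text.index("2021-03-15")
    entities = [(org, org + len("Acme Corp"), "ORG"),
                (date, date + len("2021-03-15"), "DATE")]
    for token, tag in spans_to_iob(text, entities):
        print(token, tag)
```

Running it tags "Acme" as B-ORG, "Corp" as I-ORG, "2021-03-15" as B-DATE and the remaining tokens as O.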

Who this course is for:

Python Developers who want to learn data extraction using OCR

NLP and NER Enthusiasts who are keen to explore Text Labelling

Computer Vision professionals

OCR Engineers

 
