mattlm0831/OCR-Handwriting

OCR-Handwriting Project

1. Summary of Design Decisions

This project follows an abstraction-based design with four levels: characters, words, lines, and entire documents. Every document can be broken down into these levels:

(i) An entire document.
(ii) A collection of lines in a document.
(iii) A collection of words that are consecutively placed on each line.
(iv) Single characters that make up the words.

Each level of abstraction relies on the one below it, all the way down to the individual characters on the page. Given that structure, Dr. Johnson suggested we start from the ground up: first we will build a data set of letters and train a model to recognize other letters of a similar (1800s English) style. Our current priority is to build this large character data set for our neural network to train on. Once the set is built, we will work out the optimal design of our model and begin training it. When this stage is complete, we will have a network that can identify individual characters. From this base level we will move to the next level of abstraction, a model that can identify the words in a line. The project will continue level by level in this way until we can use every level together to read an entire document.
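As a rough illustration of the four levels described above, the hierarchy could be modeled as nested containers. This is only a sketch: the class names (`Character`, `Word`, `Line`, `Document`) and the helper `word_from_string` are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Character:
    """Lowest level: a single recognized character (hypothetical class)."""
    label: str


@dataclass
class Word:
    """A word is an ordered collection of characters."""
    characters: List[Character] = field(default_factory=list)

    def text(self) -> str:
        return "".join(c.label for c in self.characters)


@dataclass
class Line:
    """A line is an ordered collection of consecutive words."""
    words: List[Word] = field(default_factory=list)

    def text(self) -> str:
        return " ".join(w.text() for w in self.words)


@dataclass
class Document:
    """Top level: a document is an ordered collection of lines."""
    lines: List[Line] = field(default_factory=list)

    def text(self) -> str:
        return "\n".join(line.text() for line in self.lines)


def word_from_string(s: str) -> Word:
    """Illustrative helper: build a Word from already-recognized characters."""
    return Word([Character(ch) for ch in s])


doc = Document([
    Line([word_from_string("Dear"), word_from_string("Sir")]),
    Line([word_from_string("Yours")]),
])
print(doc.text())
```

Each level only needs to know about the level directly beneath it, which mirrors the bottom-up order in which the models are planned to be built.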

2. Past Progress

John Quincy Adams

Currently the project is in the data collection phase. We have scans of manuscripts from John Quincy Adams that we are imaging. The imaging process will conclude shortly, after which we will build the networks for the various phases of the model. We anticipate the following schedule:


(i) Data collection -- Complete
(ii) Model outline
(iii) Model optimization

3. Current Progress

Example image of a prediction

Example of our neural network correctly predicting an image from an alternate author.

The project is progressing well. We are now in a basic R&D phase, using our collected data set to work out an optimal model (a convolutional neural network) for classifying the characters. We will update this page as significant development milestones are reached.
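For readers unfamiliar with convolutional networks, the sketch below shows the three basic operations such a model stacks: a convolution, a ReLU activation, and max pooling. It is written in plain Python for clarity and is not the project's model. The function names, the 4x4 toy image, and the hand-picked edge kernel are all assumptions made for illustration; a real network learns its kernels from the character data set and would normally use a deep learning library.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image with no padding (cross-correlation,
    which is what CNN layers conventionally compute)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(m: Matrix) -> Matrix:
    """Replace negative responses with zero."""
    return [[max(0.0, v) for v in row] for row in m]


def max_pool_2x2(m: Matrix) -> Matrix:
    """Keep the strongest response in each non-overlapping 2x2 window;
    a trailing odd row or column is dropped."""
    return [
        [max(m[r][c], m[r][c + 1], m[r + 1][c], m[r + 1][c + 1])
         for c in range(0, len(m[0]) - 1, 2)]
        for r in range(0, len(m) - 1, 2)
    ]


# Toy 4x4 "image": dark on the left, bright on the right (a vertical stroke edge).
image = [[0, 0, 1, 1] for _ in range(4)]
# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1], [-1, 1]]

features = conv2d_valid(image, kernel)
pooled = max_pool_2x2(relu(features))
```

Here the feature map responds only in the column where the stroke edge sits, and pooling keeps that strongest response while shrinking the map.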

About

OCR project using the library's digital repositories.
