OCR-Handwriting Project

1. Summary of Design Decisions

This project will follow an abstraction-based design with four levels: characters, words, lines, and entire documents. Every document can be broken down into these levels of abstraction:

(i) An entire document.
(ii) A collection of lines in a document.
(iii) A collection of words that are consecutively placed on each line.
(iv) Single characters that make up the words.

Each level of abstraction relies on the one below it, all the way down to the individual letters on the document. Given the nature of that abstraction, Dr. Johnson suggested we start from the ground up: first we will build the data set of letters and train a model to recognize other letters of a similar (1800s English) style. Our current priority is to build this large data set of characters for our neural network to train on. Once the set is built, we will work out the optimal design of our model and start to train it. When that stage is complete, we will have a network that can identify individual characters. From this base level we will move to the next level of abstraction, which will identify the words in a line. The project will continue level by level in this way until we can use every level to read an entire document.
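The four levels described above can be sketched as nested types, where each level is built from the one below it. This is only an illustrative sketch: the class names (`Character`, `Word`, `Line`, `Document`), the `text()` methods, and the `word_from` helper are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Character:
    # Hypothetical: the letter a character-level model would predict.
    label: str


@dataclass
class Word:
    characters: List[Character] = field(default_factory=list)

    def text(self) -> str:
        # A word is the concatenation of its characters.
        return "".join(c.label for c in self.characters)


@dataclass
class Line:
    words: List[Word] = field(default_factory=list)

    def text(self) -> str:
        # A line is its words separated by single spaces.
        return " ".join(w.text() for w in self.words)


@dataclass
class Document:
    lines: List[Line] = field(default_factory=list)

    def text(self) -> str:
        # A document is its lines separated by newlines.
        return "\n".join(ln.text() for ln in self.lines)


def word_from(s: str) -> Word:
    """Hypothetical helper: build a Word from already-recognized letters."""
    return Word([Character(ch) for ch in s])


doc = Document([
    Line([word_from("Dear"), word_from("Sir")]),
    Line([word_from("Yours")]),
])
print(doc.text())
```

Reading a document then amounts to recognizing characters first and letting each higher level assemble the results of the level beneath it, which mirrors the bottom-up order of work described above.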

2. Past Progress

Currently the project is in the data collection phase. We have scans of manuscripts from John Quincy Adams that we are imaging. The imaging process will conclude shortly, after which we will work on building the networks for the various phases of the model. We anticipate the following schedule:

(i) Data collection -- Complete
(ii) Model outline
(iii) Model optimization

3. Current Progress

Example of our neural network correctly predicting an image from an alternate author.

The project is now in a phase of basic R&D, where we are using our collected data set to find an optimal model (a convolutional neural network) for classifying the characters. We will update this page as significant pieces of development are completed.
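For readers unfamiliar with convolutional networks, the core operation is sliding a small kernel over the character image to produce a feature map, usually followed by a nonlinearity such as ReLU. The sketch below shows that single step in plain Python. It is not the project's model: the function names, the toy 3x4 "image", and the hand-picked kernel are all assumptions made for illustration, and a real network learns its kernels during training.

```python
from typing import List

Grid = List[List[float]]


def conv2d_valid(image: Grid, kernel: Grid) -> Grid:
    """Slide `kernel` over `image` with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap: Grid) -> Grid:
    """Replace negative responses with zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


# Toy image: a vertical pen stroke in the second column (1.0 = ink).
stroke = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]

# Hand-picked kernel that responds where ink begins, moving left to right.
left_edge = [
    [-1.0, 1.0],
    [-1.0, 1.0],
]

features = relu(conv2d_valid(stroke, left_edge))
print(features)
```

The feature map is strongest in the column where the stroke starts, which is the kind of local evidence a character classifier combines across many kernels and layers.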
