Optical Character Recognition Project Melanie Krug

Background of OCR

How did hospitals store all the information they get from patients?

Pennsylvania Hospital was founded in 1751 by Dr. Thomas Bond and Benjamin Franklin. According to the University of Pennsylvania, their goal in building the first hospital was "To care for the sick-poor and insane who were wandering the streets of Philadelphia." Ever since that day, hospitals have been keeping medical records for every patient that walks in the door. Most medical records and cases are in paper form and are sloppily kept in file cabinets in the 5,627 hospitals that, according to the American Hospital Association, are scattered across the United States.

The problem with the written system

Because most of the files are still in paper form, if someone has to go to a different hospital than the one they have gone to before, the new hospital doesn’t know anything about the patient. Also, if a patient has a rare disease or a series of symptoms that do not match any disease the doctors have seen before, it is incredibly hard to find out whether anyone has had that disease before or how to treat it, because the medical records are stuffed in a file cabinet.

Where are hospital records kept now?

In 1968, American physician and researcher Lawrence Weed introduced the idea of problem-oriented medical records, and his ideas eventually led to Electronic Medical Records, also referred to as EMRs. In 1972, the Regenstrief Institute developed the first medical records system. For the past thirty years, computerized provider order entry, or CPOE, has been in use for keeping track of current medical records, and very few older records.

However, even with this change, problems remain. CRICO, the Controlled Risk Insurance Company, looked at over 23,000 medical malpractice lawsuits in which patients suffered harm. CRICO discovered that in over 7,000 cases the problem was directly caused by miscommunication of certain facts, figures, and findings. These errors cost the healthcare system $1.7 billion that could have been used for medicine and medical research grants instead of lawsuits. Of all the high-severity injury cases, 37% involved some sort of communication failure. This problem could be reduced if every hospital had access to every person's medical records.

The stereotype that most doctors have terrible handwriting is true because they are constantly in a rush and handwriting is not something they care about. Thus, the handwriting on medical files is more difficult to read. A lot of doctors also write in cursive, and transferring files from cursive to text is significantly harder.

How do post offices organize the mail so that most of the time you get all of your mail?

The answer is OCR. OCR is short for optical character recognition, and it is used in today's society every single day. In 2006, the average amount of mail delivered was 660 million pieces nationwide per day. All the mail has to be organized into sections to make sure that it is sent to the right person. The United States Postal Service uses OCR technology to sort through mail so individuals do not have to. However, the Postal Service does not type out exactly what is written on the envelope. Instead, it matches the name on the envelope or other documentation to a name in its database, so most of the work is already done for it. This process is OCR's most used form.

Did you know that with the USPS mail sorter, if an envelope or legal document is written in cursive, it takes three weeks to leave the sorter rather than the typical two days?

OCR in popular society

Samsung and Microsoft have both been working tremendously hard on translating text from images, and most Samsung phones have the option of a writing pad to write on instead of using a keyboard. However, you can only write one word at a time, which ruins the fluidity that would be needed for text recognition. The technology for translating writing into text is already out there; the only obstacle now is putting the two concepts together. If there were a way to combine the reading technology that Samsung has for handwriting with pictures that are already taken with someone’s phone, then transferring files from paper to computer would be significantly easier than doing it by hand.

What society finds wrong about having open access to medical files

The most prominent ethical issue that people have with this concept is privacy and personal security. If this innovation were to take off and prosper and all 5,627 hospitals were using OCR, everyone's data would be in the same place. Although that is great for the doctors and ideally the patient, having this much access to every person’s medical records can be a danger to society. Any doctor could look up any person’s medical information for clinical or personal use. The risk of personal use is what puts fear in society.

Another issue is that a lot of people feel that this will break the bond of doctor-patient confidentiality. Although doctor-patient confidentiality is used mainly for mental health and not clinical health, the people of the United States do not feel very comfortable with their information being out there for any medical professional to see.

Original plan

For my project, I planned to use Adobe Acrobat DC and Microsoft OneNote to try to figure out the most efficient way of transferring paper files to computerized documentation. In addition to Adobe Acrobat DC and Microsoft OneNote, I was also going to use onlineocr.net, Google Drive, and ocrconvert.com.

Coming up with a hypothetical design

The majority of my project was hypothetical. Unfortunately, I could not make my own OCR system, and I was not very successful with altering the ones that already existed. But if I were to create my own OCR system, I would design it one step at a time. First, I would program it to read all of the letters of the alphabet in Arial font. Gradually, I would adjust the font style so the system would adapt to more complex fonts, and I would not move on to the next font until the one before it had been successfully read at least 5 times in a row. Once the computer fonts were successfully readable by the OCR software, I would start to work on handwriting. I would not design my program to read script, though, because it is nearly impossible. I would need the software to be able to read print handwriting, because that is the ultimate thing I am trying to achieve.
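The "one font at a time" rule described above can be illustrated with a minimal Python sketch. It is only an illustration, not a working OCR system: the function name train_through_fonts, the read_succeeds callback (which stands in for "try to read a sample in this font and report whether it was read correctly"), and the attempt limit are all hypothetical names chosen for this example.

```python
from typing import Callable, Iterable, List


def train_through_fonts(
    fonts: Iterable[str],
    read_succeeds: Callable[[str], bool],
    required_streak: int = 5,
    max_attempts_per_font: int = 1000,
) -> List[str]:
    """Return the fonts that were mastered, in order.

    A font counts as mastered once it has been read correctly
    `required_streak` times in a row. The loop stops at the first
    font that is not mastered within `max_attempts_per_font` tries.
    `read_succeeds` is a hypothetical stand-in for a real OCR attempt.
    """
    mastered: List[str] = []
    for font in fonts:
        streak = 0
        for _ in range(max_attempts_per_font):
            if read_succeeds(font):
                streak += 1
                if streak >= required_streak:
                    break
            else:
                streak = 0  # one failed read resets the run
        if streak < required_streak:
            break  # do not move on to harder fonts
        mastered.append(font)
    return mastered
```

The fonts would be passed in order of difficulty, for example starting with "Arial" and ending with samples of print handwriting. Resetting the streak after a miss is what enforces "5 times in a row" rather than "5 times in total".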

Another obstacle I have when it comes to creating OCR software is that I barely know anything about coding. Over the last few weeks of my project I learned the basics of HTML coding on Codecademy, but I did not learn nearly enough to create software. Coding would be the number one most important thing in creating and editing OCR technology, and it is the only portion of my project that relates to STEM. Granted, coding is the entire project in a sense.

Where would my design go?

Ideally, to connect this to the main branches of engineering, I would want my product to be something used in hospitals. However, a more realistic target audience would be students. Students can use this OCR system to transfer their handwritten notes to computerized documents for easy storage, sharing, and editing. This would also make everything more organized and less cluttered. It can be used by students of all ages, from kindergarteners to doctoral students.

My progress

My process did not allow me to adequately test a prototype. However, when I did do some testing with the OCR software in Adobe Acrobat, Microsoft OneNote, and Google Drive, the number of words and characters that transferred over correctly was slim to none. This meant that these current models of OCR are not suitable for the goal that I was trying to achieve. Because of these low accuracy rates, I decided to learn what it would take for me to make my own software. I used codecademy.com to learn the basics of HTML. As of right now I still do not know how to create my own software, but I am more knowledgeable about some of the basic coding ideas. This could help me create better software at a later date.
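One way to put a number on "how many words transferred over correctly" is to compare what was actually written with what the OCR tool produced. The sketch below is an assumption about how such a check could be done, not the method used in this project: the function name word_accuracy and the sample sentences are hypothetical, and the comparison relies on Python's standard difflib module.

```python
from difflib import SequenceMatcher


def word_accuracy(expected: str, recognized: str) -> float:
    """Fraction of the expected words that the OCR output reproduced in order.

    Both strings are lowercased and split on whitespace, so this is a
    rough word-level score and ignores capitalization.
    """
    expected_words = expected.lower().split()
    recognized_words = recognized.lower().split()
    if not expected_words:
        return 0.0
    matcher = SequenceMatcher(None, expected_words, recognized_words, autojunk=False)
    # Sum the sizes of all runs of words that match in both lists.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(expected_words)


# Hypothetical example: two of the four words were read correctly.
print(word_accuracy("the quick brown fox", "the qvick brown fax"))  # 0.5
```

A score near 1.0 would mean almost every handwritten word came through, while "slim to none" would correspond to a score near 0.0.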


As a whole, my project did not pan out as I had intended at the beginning of the semester. None of what I had expected to happen actually happened. None of the three programs had recognition ability good enough to read written text. The programs could not recognize text that didn't already come from the internet and didn't have previous alternative text. For me personally, this project was, for the most part, a failure. I learned quite a bit about different aspects of different programs, but the goal of the project was not met by any means.

In hindsight, I think this project could be done, but not by someone who knows so little about coding and existing Microsoft and Adobe programs. Anyone who wants to do a project pertaining to OCR should have a copious amount of background knowledge on OCR, coding, Adobe, and Microsoft. Also, knowing how to convert files to .jpg was very important in this project, because the programs could only read from .jpg files. Finally, anyone creating OCR software should make sure that written text is their top priority, because the most common everyday use of OCR is transferring written text to computerized text, not computerized text to more computerized text.
