The text of the First Folio with Quarto variants and a selection of modern readings: edited by Herbert Farjeon

"The finest of all editions of our greatest poet"

Limited edition, one of only 1600 copies, of one of the most famous and desirable editions of Shakespeare's works. Very handsomely bound and elegantly printed.

The Nonesuch Shakespeare is "the chef d'oeuvre of the Nonesuch Press, and... a model of careful proof reading and imaginative setting. The best of ancient and modern conjectural emendations are unobtrusively set in the margin for the benefit of a glancing eye. This is the finest of all editions of our greatest poet" (Nonesuch Century, 58).

London and New York: The Nonesuch Press; Random House, Inc., 1929-33. Quarto, original publisher's full tan morocco gilt.

Scattered foxing (as usual); volumes 1 and 4 with leather a little darker than the others; usual offsetting on preliminaries from turn-ins. A most handsome set.

Disclaimer! Work In Progress! See source code.

I recently read this wonderful blog post about using 17th Century Dutch fonts on the web. And, because I'm an idiot, I decided to try and build something similar using Shakespeare's First Folio as a template.

Now, before setting off on a journey, it is worth seeing if anyone else has tried this before. I found David Pustansky's First Folio Font. There's not much info about it, other than it's based on the 1623 folio. It's a nice font, but it is missing brackets and a few other pieces of punctuation.

You can read how it works, or skip straight to the demo.

There are various scans of the First Folio. I picked the Bodleian's scan as it seemed the highest resolution. I plucked a couple of pages at random to see what I could find.

Of course, a modern font can't replicate the vagaries of hot metal printing. In the scans, each letter "y" is substantially different. You can also see just how poor quality some of the letters are. There are also plenty of ligatures to choose from. Within the plays, there are some italic characters - which could be used to make a variant font.

The script reads a scanned page with OpenCV. It then extracts every distinct letter, number, and punctuation mark. It then detects which character it is and saves each glyph to disk with a filename based on that character. The text detection is good, but the letter recognition is poor, so the OCR lines are commented out below.

```python
import os
import cv2

def preprocess_image(image_path):
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, binary_image = cv2.threshold(image, 128, 255, cv2.THRESH_BINARY_INV)  # Thresholding to convert to binary image
    # Find contours to isolate individual letters
    contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return image, contours

def extract_and_save_letters(image, contours, output_directory):
    # Create output directory if it doesn't exist
    os.makedirs(output_directory, exist_ok=True)
    # Crop and save each letter as a separate image
    for i, contour in enumerate(contours):
        x, y, w, h = cv2.boundingRect(contour)
        letter_image = image[y:y + h, x:x + w]
        # (Don't) Perform OCR to extract the text (letter) from the contour
        #letter_text = pytesseract.image_to_string(letter_image, config='--psm 10')
        #letter_text = letter_text.strip()  # Remove leading/trailing whitespace
        # Create a filename with the detected letter (OCR is off, so the index is used; an assumption)
        letter_filename = f"letter_{i}.png"
        letter_path = os.path.join(output_directory, letter_filename)
        cv2.imwrite(letter_path, letter_image)

# Placeholder paths; the original values are not shown
input_image_path = "page.png"
output_directory = "letters"
image, contours = preprocess_image(input_image_path)
# Perform OCR and save individual letters
extract_and_save_letters(image, contours, output_directory)
```

Something to note - `cv2.findContours` (called here with `RETR_EXTERNAL` and `CHAIN_APPROX_SIMPLE`) is looking for contiguous shapes, so a glyph made of vertically separate parts, such as the dot and stem of an "i", comes out as separate pieces. In order to get those glyphs in one piece, we need to vertically erode the image. On a scan with dark ink on light paper, erosion spreads the dark pixels, so the parts of a glyph merge:

```python
# kernel is the structuring element; its definition is not shown here
erode = cv2.erode(image, kernel, iterations=6)
```

We use this eroded image for contiguous detection - but we do the actual cropping on the original image. The erosion does make some characters touch each other - which means you get occasional crops containing more than one glyph. They can either be manually split, or ignored.

Sorting the extracted glyphs is manual. It's hard to tell the difference between o and O, or commas and apostrophes. It's just a lot of tedious dragging and dropping.

Ideally we want several of each glyph because we're about to...

I copied all the extracted letter "e"s into a folder. I didn't want to make rather arbitrary decisions about which letters I liked best, so I used Python to create the average letter based on the two-dozen or so that I'd extracted.
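The averaging step mentioned in the post can be sketched roughly as follows. This is a minimal illustration, not the author's script: it assumes each glyph crop has already been loaded as a 2-D grayscale NumPy array (for example with `cv2.imread(path, cv2.IMREAD_GRAYSCALE)`). The helper names `pad_to_canvas` and `average_glyph`, the white padding, and the centring of each crop are all assumptions made for the example.

```python
import numpy as np


def pad_to_canvas(glyph, height, width, background=255):
    """Centre a 2-D grayscale glyph on a blank canvas (hypothetical helper)."""
    canvas = np.full((height, width), background, dtype=np.float64)
    top = (height - glyph.shape[0]) // 2
    left = (width - glyph.shape[1]) // 2
    canvas[top:top + glyph.shape[0], left:left + glyph.shape[1]] = glyph
    return canvas


def average_glyph(glyphs, background=255):
    """Return the pixel-wise mean of several glyph crops as a uint8 image."""
    if not glyphs:
        raise ValueError("need at least one glyph to average")
    # Crops differ in size, so pad them all to the largest height and width
    height = max(g.shape[0] for g in glyphs)
    width = max(g.shape[1] for g in glyphs)
    padded = [pad_to_canvas(g, height, width, background) for g in glyphs]
    mean = np.mean(padded, axis=0)
    return np.round(mean).astype(np.uint8)


# Tiny synthetic example: two 2x2 "glyphs" of different darkness
a = np.array([[0, 0], [0, 0]], dtype=np.uint8)
b = np.array([[100, 100], [100, 100]], dtype=np.uint8)
result = average_glyph([a, b])
```

The averaged image is soft wherever the crops disagree. Thresholding it afterwards is one way to get a single crisp shape, though the post does not say whether that was done.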