Custom scanner guide

The Genius Scan SDK lets application developers add a scanning module built on the same technology that powers the Genius Scan app.

The Core and OCR modules of the SDK let you add a fully customizable scan flow to any native app.

The Core module provides:

  • Document processing algorithms that detect, correct, and enhance scanned images.
  • A PDF generator to create a PDF document from the processed images.
  • UI elements that help build the main screens of a scanning application.

Prerequisites

This guide assumes that you have followed the Getting Started guide:

  • You have integrated the GSKCore.xcframework (and, optionally, GSKOCR.xcframework) in your app.
  • You have initialized the SDK with the license key.
  • You have configured your app to request proper user permissions.

Core document processing

We’ve split the document processing operations into two classes: GSKDocumentDetector, which mainly handles the real-time document detection on the camera preview; and GSKDocumentProcessor, which applies various image processing algorithms such as perspective correction and filters to a single scan.

Document detection

Document detection takes an image and returns a quadrangle representing the four corners of the detected document.

GSKDocumentDetector *documentDetector = [GSKDocumentDetector new];

NSError *error = nil;
// Use the detector created above (it is a local variable, not a property).
GSKQuadrangleDetectionResult *result = [documentDetector detectQuadrangleFromImage:self.image options:GSKDetectQuadrangleOptionsNone error:&error];
if (!result) {
  NSLog(@"Error while detecting document frame: %@", error);
  return;
}
GSKQuadrangle *quadrangle = result.quadrangle;

Document Processing

The GSKDocumentProcessor class takes an image and a list of operations to apply to it. It returns a result object containing the processed image and the enhancements that were applied.

The SDK can apply the following operations to an image:

  • Perspective correction
  • Curvature correction
  • Document enhancement

UIImage *originalImage = 

// If previous code already applied the detection, we have a quadrangle:
GSKQuadrangle *quadrangle = 

GSKPerspectiveCorrectionConfiguration *perspectiveCorrectionConfiguration = [GSKPerspectiveCorrectionConfiguration perspectiveCorrectionConfigurationWithQuadrangle:quadrangle];
GSKCurvatureCorrectionConfiguration *curvatureCorrectionConfiguration = [GSKCurvatureCorrectionConfiguration curvatureCorrectionConfigurationWithCurvatureCorrection:YES];
GSKEnhancementConfiguration *enhancementConfiguration = [GSKEnhancementConfiguration enhancementConfigurationWithFilter:GSKFilterBlackAndWhite];

NSError *error = nil;
GSKProcessingResult *result = [[GSKDocumentProcessor new] processImage:originalImage
                                    perspectiveCorrectionConfiguration:perspectiveCorrectionConfiguration
                                      curvatureCorrectionConfiguration:curvatureCorrectionConfiguration
                                              enhancementConfiguration:enhancementConfiguration
                                                 rotationConfiguration:nil
                                                   outputConfiguration:[GSKOutputConfiguration defaultConfiguration]
                                                                 error:&error];
if (!result) {
    NSLog(@"Error while processing scan: %@", error);
    return;
}

// Load the processed image from the path returned in the result.
UIImage *processedImage = [UIImage imageWithContentsOfFile:result.processedImagePath];

PDF generation

The PDF generation module provides three objects for generating a PDF file: a page, a document, and a generator.

PDF Page

GSKPDFPage wraps the information needed to create a PDF page: the path of the image file and the page size in inches.

Objective-C
GSKPDFPage *page = [[GSKPDFPage alloc] initWithFilePath:imageFilePath inchesSize:[[GSKPDFSize alloc] initWithWidth:8.27 height:11.69] /* size in inches for an A4 sheet */];
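The page size is expressed in inches, while paper formats such as A4 are usually specified in millimeters. As a minimal sketch, the hypothetical helper below (ExampleInchesFromMillimeters is not part of the SDK) shows where the 8.27 and 11.69 values above come from:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of the SDK.
// One inch is exactly 25.4 millimeters.
static const double ExampleMillimetersPerInch = 25.4;

static double ExampleInchesFromMillimeters(double millimeters) {
    return millimeters / ExampleMillimetersPerInch;
}

// An A4 sheet measures 210 mm x 297 mm:
//   ExampleInchesFromMillimeters(210.0) is approximately 8.27
//   ExampleInchesFromMillimeters(297.0) is approximately 11.69
```

The results can then be passed as the width and height of the GSKPDFSize shown above.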

PDF Document

GSKPDFDocument represents a collection of PDF pages, together with a title and an optional password and keywords.

Objective-C
GSKPDFPage *page1 = 
GSKPDFPage *page2 = 

GSKPDFDocument *document = [[GSKPDFDocument alloc] initWithTitle:title password:nil keywords:nil pages:@[ page1, page2 ]];

PDF Generator

The generator takes a PDF document and writes the PDF file to the path you provide.

Objective-C
GSKPDFGenerator *generator = [GSKPDFGenerator createWithDocument:document];
[generator generatePDF:outputFilePath];

UI components

Live Capture Screen

The “capture” view displays a camera preview. It takes care of setting up the entire camera stack, including automatic capture, for you. The view contains no buttons or other UI elements, so you can design the screen as you want, but you must add any controls yourself.

Subclass GSKCameraViewController and customize it as desired. See the CameraViewController class in GSSDKDemo for an example.

Note that GSKCameraViewController sets up a cameraView that fills the entire screen. To add toolbars or other buttons, take control of the layout in your subclass’s viewDidLoad by disabling the translation of the cameraView’s autoresizing mask (cameraView.translatesAutoresizingMaskIntoConstraints = false), then add your own constraints.
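A minimal sketch of such a subclass follows. It assumes that the SDK headers are already imported, that cameraView is exposed as a property, and that it is already a subview of the controller’s view once [super viewDidLoad] returns. ExampleCameraViewController and the toolbar are illustrative names, not part of the SDK.

```objc
#import <UIKit/UIKit.h>
// Assumption: the SDK's umbrella header is already imported (for example in a
// prefix header), so GSKCameraViewController is visible here.

// Hypothetical subclass name, for illustration only.
@interface ExampleCameraViewController : GSKCameraViewController
@end

@implementation ExampleCameraViewController

- (void)viewDidLoad {
    [super viewDidLoad];

    // A toolbar pinned to the bottom of the screen; add your own items to it.
    UIToolbar *toolbar = [UIToolbar new];
    toolbar.translatesAutoresizingMaskIntoConstraints = NO;
    [self.view addSubview:toolbar];

    // Take control of the camera preview's layout.
    // Assumption: cameraView is already in self.view's hierarchy at this point.
    self.cameraView.translatesAutoresizingMaskIntoConstraints = NO;

    [NSLayoutConstraint activateConstraints:@[
        // The preview fills the screen down to the top of the toolbar.
        [self.cameraView.topAnchor constraintEqualToAnchor:self.view.topAnchor],
        [self.cameraView.leadingAnchor constraintEqualToAnchor:self.view.leadingAnchor],
        [self.cameraView.trailingAnchor constraintEqualToAnchor:self.view.trailingAnchor],
        [self.cameraView.bottomAnchor constraintEqualToAnchor:toolbar.topAnchor],
        // The toolbar spans the width and sits above the bottom safe area.
        [toolbar.leadingAnchor constraintEqualToAnchor:self.view.leadingAnchor],
        [toolbar.trailingAnchor constraintEqualToAnchor:self.view.trailingAnchor],
        [toolbar.bottomAnchor constraintEqualToAnchor:self.view.safeAreaLayoutGuide.bottomAnchor]
    ]];
}

@end
```

The CameraViewController class in GSSDKDemo remains the reference for how the SDK authors wire up capture controls.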

Edit Frame Screen

The edit frame screen lets the user adjust the auto-detected edges of a document.

Subclass GSKEditFrameViewController and customize it as desired. See the EditFrameViewController class in GSSDKDemo for an example.

Text Recognition (OCR)

The OCR module provides a way to extract text and its layout from scanned images. The extraction outputs the text in two different formats: raw text and XML containing both the text and its layout (also called hOCR). It’s then possible to generate a PDF document using this information to make it searchable and selectable.

Extract text from images

Text recognition relies on the Tesseract library and needs training data files on the mobile device. Training data files are specific to each language to be recognized and are available on this page. Place them in a directory named tessdata on the device.
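Because recognition depends on these files being present, it can help to verify them before starting OCR. The helper below is a hypothetical sketch, not an SDK function. It assumes Tesseract’s usual file naming (for example eng.traineddata for the language code eng) and uses only Foundation:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of the SDK.
// Returns the language codes whose "<code>.traineddata" file is missing
// from the given tessdata directory.
static NSArray<NSString *> *ExampleMissingTrainedData(NSString *tessdataDirectory,
                                                      NSArray<NSString *> *languageCodes) {
    NSFileManager *fileManager = [NSFileManager defaultManager];
    NSMutableArray<NSString *> *missing = [NSMutableArray array];
    for (NSString *code in languageCodes) {
        // Build "<tessdataDirectory>/<code>.traineddata".
        NSString *fileName = [code stringByAppendingPathExtension:@"traineddata"];
        NSString *path = [tessdataDirectory stringByAppendingPathComponent:fileName];
        if (![fileManager fileExistsAtPath:path]) {
            [missing addObject:code];
        }
    }
    return [missing copy];
}
```

An empty result means every requested language has its training data file in place.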

Objective-C
GSKOCRConfiguration *ocrConfiguration = [GSKOCRConfiguration new];
ocrConfiguration.trainedDataPath = 
ocrConfiguration.languageCodes = @[@"eng"];

NSError *error = nil;
GSKOCRResult *result = [GSKOCR recognizeTextForImageAtPath:filePath ocrConfiguration:ocrConfiguration onProgress:&progress error:&error];
if (!result) {
    NSLog(@"Error while recognizing text: %@", error);
    return;
}

GSKTextLayout *textLayout = result.textLayout;

Generate PDF document with text

PDF generation allows a text layout to be provided for each page of the document.

Objective-C
GSKPDFPage *page = [[GSKPDFPage alloc] initWithFilePath:imageFilePath inchesSize:A4Size textLayout:textLayout];
// Generate PDF
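Putting the pieces of this guide together, a searchable PDF could be written as sketched below. The function name ExampleWriteSearchablePDF and the document title are illustrative. The SDK calls are the ones shown earlier in this guide, and their exact signatures should be checked against the SDK headers.

```objc
#import <Foundation/Foundation.h>
// Assumption: the SDK's umbrella header is already imported, so the GSK
// classes below are visible here.

// Hypothetical helper, not part of the SDK: writes a one-page A4 PDF whose
// text layer comes from a previous OCR pass.
static void ExampleWriteSearchablePDF(NSString *imageFilePath,
                                      GSKTextLayout *textLayout,
                                      NSString *outputFilePath) {
    // A4 sheet size in inches, as in the PDF Page example.
    GSKPDFSize *a4Size = [[GSKPDFSize alloc] initWithWidth:8.27 height:11.69];

    // Attach the recognized text layout to the page.
    GSKPDFPage *page = [[GSKPDFPage alloc] initWithFilePath:imageFilePath
                                                 inchesSize:a4Size
                                                 textLayout:textLayout];

    GSKPDFDocument *document = [[GSKPDFDocument alloc] initWithTitle:@"Scan"
                                                            password:nil
                                                            keywords:nil
                                                               pages:@[ page ]];

    // Write the PDF file to disk.
    GSKPDFGenerator *generator = [GSKPDFGenerator createWithDocument:document];
    [generator generatePDF:outputFilePath];
}
```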

Handle characters from various languages

By default, PDF generation uses a standard font that supports the characters used in English and Western European languages. If you perform text recognition for another language, you need to specify a font that supports this language’s characters when generating the PDF document.

© 2024 The Grizzly Labs. All rights reserved.