Custom scanner guide

Introduction

The Genius Scan SDK lets application developers add a scanning module built on the same technology that powers the Genius Scan app.

The Core and OCR modules of the SDK let you build a fully customizable scan flow that can be added to a native app.

The Core module provides:

  • Core document processing, which applies image processing routines to images,
  • A PDF generator, which creates a PDF document from the processed images,
  • UI elements that help you build the main screens of a scanning application.

The SDK includes a licensing system and must be initialized with a license key. Without it, all the methods included in the SDK will fail.

Please refer to the platform-specific API documentation for more details about each method.

Licensing

A license key is needed to initialize the SDK. The key contains the Application ID / Bundle ID of your application and an expiration date. Please contact us if you don’t have a license key yet. If the key is invalid or your license has expired, initialization will fail and the SDK will not work. It is good practice to check that initialization succeeds and, if it does not, to fall back gracefully (e.g. disable the feature or prompt the user to update the application).

iOS

if (![GSK initWithLicenseKey:@"<YOUR LICENSE KEY>"]) {
   // The license is expired or invalid
}

Android

The best place to initialize the SDK is in the main Activity of your application:

try {
   GeniusScanLibrary.init(getApplicationContext(), "<YOUR LICENSE KEY>");
} catch (RuntimeException e) {
   // The license is expired or invalid
}
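
The fallback described in this section can be sketched in plain Java as follows. ScannerFeature and its Initializer parameter are hypothetical names used for illustration and are not part of the SDK; on Android, the initializer would wrap the GeniusScanLibrary.init call.

```java
// Hypothetical helper, not part of the Genius Scan SDK: it records whether
// initialization succeeded so the rest of the app can hide or disable scanning.
final class ScannerFeature {
    /** Stand-in for the real initialization call, e.g. GeniusScanLibrary.init(...). */
    interface Initializer {
        void init();
    }

    private final boolean available;
    private final String failureReason;

    private ScannerFeature(boolean available, String failureReason) {
        this.available = available;
        this.failureReason = failureReason;
    }

    /** Runs the initializer and converts a RuntimeException into a disabled feature. */
    static ScannerFeature initialize(Initializer initializer) {
        try {
            initializer.init();
            return new ScannerFeature(true, null);
        } catch (RuntimeException e) {
            return new ScannerFeature(false, e.getMessage());
        }
    }

    boolean isAvailable() {
        return available;
    }

    /** Null when initialization succeeded. */
    String failureReason() {
        return failureReason;
    }
}
```

The application can then check isAvailable() before showing any scanning entry point, instead of letting later SDK calls fail.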

Core document processing

At the core of the SDK are the image processing methods that perform all the transformations on images. There are four main routines, listed below. They are generally used one after another, with the output of each routine piped into the next.

Document detection

Document detection (also called edge detection) takes in an image and returns a quadrangle representing the four corners of the detected document.

iOS

GSKQuadrangle *quadrangle;
quadrangle = [GSK detectQuadrangleFromImage:imageOutOfCamera options:0];

Android

Quadrangle quadrangle = GeniusScanLibrary.detectFrame(imageToAnalyzePath);

Document perspective correction

Also referred to as image warping, the perspective correction takes in the original image and a quadrangle (typically the one returned by the document detection) and returns a new, warped image.

iOS

UIImage *warpedImage = [GSK warpImage:imageOutOfCamera withQuadrangle:quadrangle];

Android

GeniusScanLibrary.warpImage(imageToWarpPath, warpedImagePath, quadrangle);

Document type detection

The document type detection estimates the best filter that you can apply to the given image. Typically, you apply this routine to the warped image.

iOS

GSKPostProcessingType type = [GSK bestPostProcessingForImage:warpedImage];

Android

ImageType type = GeniusScanLibrary.detectImageType(warpedImagePath);

Document enhancement

Document enhancement applies a filter (i.e. a set of image processing routines) to the image passed as a parameter. The output image is generally considered the final document.

iOS

UIImage *enhancedImage = [self enhanceImage:warpedImage withPostProcessing:type];

Android

GeniusScanLibrary.enhanceImage(warpedImagePath, enhancedImagePath, type);
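
The four routines above are typically chained, each consuming the output of the previous one. The sketch below shows that ordering in plain Java. ScanSteps and ScanPipeline are hypothetical names and not part of the SDK; in a real Android app, each ScanSteps method would delegate to the matching call shown above (detectFrame, warpImage, detectImageType, enhanceImage).

```java
// Hypothetical abstraction over the four routines. Q stands for the quadrangle
// type and T for the image type returned by the SDK.
interface ScanSteps<Q, T> {
    Q detect(String originalPath);
    void warp(String originalPath, String warpedPath, Q quadrangle);
    T detectType(String warpedPath);
    void enhance(String warpedPath, String enhancedPath, T type);
}

final class ScanPipeline {
    private ScanPipeline() {}

    /** Runs the four steps in order and returns the path of the final document image. */
    static <Q, T> String process(ScanSteps<Q, T> steps, String originalPath,
                                 String warpedPath, String enhancedPath) {
        // 1. Find the document corners in the camera image.
        Q quadrangle = steps.detect(originalPath);
        // 2. Correct the perspective using those corners.
        steps.warp(originalPath, warpedPath, quadrangle);
        // 3. Estimate the best filter for the warped image.
        T type = steps.detectType(warpedPath);
        // 4. Apply that filter to produce the final document.
        steps.enhance(warpedPath, enhancedPath, type);
        return enhancedPath;
    }
}
```

Keeping the steps behind an interface also makes the flow testable without a device, since a fake implementation can record the calls.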

PDF generation

The PDF generation module provides a couple of objects to generate a PDF file.

PDF Page

An object wrapping the information needed to create a PDF page.

iOS

GSKPDFPage *page = [[GSKPDFPage alloc] initWithFilePath:imageFilePath inchesSize:[[GSKPDFSize alloc] initWithWidth:8.27 height:11.69] /* size in inches for an A4 sheet */];

Android

PDFPage page = new PDFPage(imageFilePath, new PDFSize(8.27f, 11.69f));
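
The A4 dimensions used above come from converting millimetres to inches (1 inch = 25.4 mm, and an A4 sheet is 210 × 297 mm). The helper below is a small sketch of that arithmetic; PaperSize is a hypothetical class and not part of the SDK.

```java
// Hypothetical helper, not part of the SDK: converts metric paper sizes to
// the inch values expected by the page-size constructors shown above.
final class PaperSize {
    static final double MM_PER_INCH = 25.4;

    private PaperSize() {}

    /** Converts a length in millimetres to inches. */
    static float mmToInches(double millimetres) {
        return (float) (millimetres / MM_PER_INCH);
    }

    /** Rounds to two decimals, the precision used for the A4 values in this guide. */
    static float roundTo2(float inches) {
        return Math.round(inches * 100f) / 100f;
    }
}
```

For other formats, the same conversion applies; for example, the values for an A5 sheet (148 × 210 mm) can be passed to the PDFSize constructor in the same way.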

PDF Document

A PDF document representing a collection of PDF pages.

iOS

GSKPDFPage *page1 = 
GSKPDFPage *page2 = 

GSKPDFDocument *document = [[GSKPDFDocument alloc] initWithTitle:title password:nil keywords:nil pages:@[ page1, page2 ]];

Android

PDFPage page1 = 
PDFPage page2 = 

PDFDocument document = new PDFDocument(title, password, keywords, Arrays.asList(page1, page2));

PDF Generator

The generator takes in a PDF document and offers the ability to write the PDF file.

iOS

GSKPDFGenerator *generator = [GSKPDFGenerator createWithDocument:document];
[generator generatePDF:outputFilePath];

Android

PDFGenerator generator = PDFGenerator.createWithDocument(document, null, null);
generator.generatePDF(outputFilePath);

UI components

Capture Screen

The “capture” view displays a camera preview. It takes care of setting up the entire camera stack for you. The view comes free of buttons or any other UI element, so that you can design it as you want; you are responsible for adding these controls in your implementation.

iOS

On iOS, you subclass GSKCameraViewController, and you can customize it as desired. You can refer to the CameraViewController class in the GSSDKDemo for an example.

Android

On Android, you will need to include the ScanFragment in an Activity that implements ScanFragment.CameraCallbackProvider. You can refer to the ScanActivity class in the CustomDemo application for an example.

To enable or disable live document detection, call scanFragment.setRealTimeDetectionEnabled. To implement an auto trigger or react to document detection events, register a listener with scanFragment.setBorderDetectorListener and implement its callbacks.
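
One possible auto trigger strategy is to capture once the detected quadrangle has stopped moving for a few frames. The sketch below shows that logic in plain Java. StabilityTrigger is a hypothetical class, independent of the SDK; since the listener callbacks are not detailed in this guide, it assumes the corners are passed in as a flat array of eight coordinates.

```java
// Hypothetical auto-trigger logic, independent of the SDK: fires once the
// detected corners have stayed within a tolerance of the previous frame
// for a given number of frames in a row.
final class StabilityTrigger {
    private final int requiredStableFrames;
    private final double tolerance;
    private double[] previous;   // last corners as x0,y0,...,x3,y3
    private int stableFrames;

    StabilityTrigger(int requiredStableFrames, double tolerance) {
        this.requiredStableFrames = requiredStableFrames;
        this.tolerance = tolerance;
    }

    /** Call when no document is detected in the current frame. */
    void onDetectionLost() {
        previous = null;
        stableFrames = 0;
    }

    /** Feeds one detection; returns true when the capture should be triggered. */
    boolean onDetection(double[] corners) {
        if (corners.length != 8) {
            throw new IllegalArgumentException("expected 8 coordinates");
        }
        if (previous != null && maxDelta(previous, corners) <= tolerance) {
            stableFrames++;
        } else {
            stableFrames = 0;
        }
        previous = corners.clone();
        if (stableFrames >= requiredStableFrames) {
            // Reset so the trigger does not fire again on the very next frame.
            stableFrames = 0;
            previous = null;
            return true;
        }
        return false;
    }

    /** Largest absolute coordinate difference between two corner sets. */
    private static double maxDelta(double[] a, double[] b) {
        double max = 0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }
}
```

The tolerance and frame count are tuning parameters: a larger tolerance triggers sooner with a shaky hand, while a higher frame count reduces accidental captures.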

Edit Frame Screen

The edit frame screen lets the user adjust the auto-detected edges of a document.

iOS

On iOS, you subclass GSKEditFrameViewController and you can customize it as desired. You can refer to the EditFrameViewController class in the GSSDKDemo for an example.

Android

On Android, you subclass BorderDetectionImageView and include it in the layout of an Activity. You can refer to the BorderDetectionActivity class in the CustomDemo application for an example.

Text Recognition (OCR)

The OCR module provides a way to extract text and its layout from scanned images. The extraction outputs the text in two formats: raw text, and XML containing both the text and its layout (also called hOCR). This information can then be used to generate a PDF document whose text is searchable and selectable.

Extract text from images

Text Recognition relies on the Tesseract library and needs training data files to be present on the mobile device. Training data files are specific to each language in which text needs to be recognized and are available on this page. They need to be stored in a directory called tessdata on the device.
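
Because recognition depends on these files being present, it can help to verify them before starting OCR. The sketch below is a plain Java pre-flight check. TessdataCheck is a hypothetical class, not part of the SDK, and it assumes the standard Tesseract naming scheme of one "<language code>.traineddata" file per language (e.g. eng.traineddata).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check, not part of the SDK. It assumes the standard
// Tesseract naming scheme: one "<language code>.traineddata" file per language.
final class TessdataCheck {
    private TessdataCheck() {}

    /** Returns the language codes whose training file is missing from tessdataDir. */
    static List<String> missingLanguages(File tessdataDir, List<String> languageCodes) {
        List<String> missing = new ArrayList<>();
        for (String code : languageCodes) {
            File trainedData = new File(tessdataDir, code + ".traineddata");
            if (!trainedData.isFile()) {
                missing.add(code);
            }
        }
        return missing;
    }
}
```

If the returned list is not empty, the application can copy or download the missing files before calling the OCR methods.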

iOS

GSKOCRConfiguration *ocrConfiguration = [GSKOCRConfiguration new];
ocrConfiguration.trainedDataPath = 
ocrConfiguration.languageCodes = @[@"eng"];

GSKOCRResult *result = [GSKOCR recognizeTextForImageAtPath:filePath ocrConfiguration:ocrConfiguration onProgress:&progress error:&error];

GSKTextLayout* textLayout = result.textLayout;

Android

OcrConfiguration ocrConfiguration = new OcrConfiguration(Arrays.asList("eng"), tessdataDirectory, false);

OcrResult result = ocrProcessor.processImage(image, ocrConfiguration, progressListener);

String xmlTextLayout = result.textLayout;
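
If the layout string needs to be inspected directly, the words and their bounding boxes can be pulled out of the hOCR markup. The sketch below is a rough illustration only. HocrWords is a hypothetical class, not part of the SDK, and it assumes Tesseract-style hOCR in which each word is a span element with class ocrx_word and a title attribute starting with "bbox x0 y0 x1 y1"; the exact markup produced by the SDK may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the SDK. Assumes Tesseract-style hOCR where
// each word is a <span class='ocrx_word' title='bbox x0 y0 x1 y1; ...'> element
// with the class attribute before the title and no nested markup.
final class HocrWords {
    /** One recognized word and its bounding box in image pixels. */
    static final class Word {
        final String text;
        final int left, top, right, bottom;

        Word(String text, int left, int top, int right, int bottom) {
            this.text = text;
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }
    }

    private static final Pattern WORD = Pattern.compile(
            "<span[^>]*class=['\"]ocrx_word['\"][^>]*title=['\"]bbox (\\d+) (\\d+) (\\d+) (\\d+)"
            + "[^'\"]*['\"][^>]*>([^<]*)</span>");

    private HocrWords() {}

    /** Extracts every matching word with its bounding box, in document order. */
    static List<Word> parse(String hocr) {
        List<Word> words = new ArrayList<>();
        Matcher m = WORD.matcher(hocr);
        while (m.find()) {
            words.add(new Word(m.group(5),
                    Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                    Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4))));
        }
        return words;
    }
}
```

A regular expression keeps the example short, but it does not decode character entities or handle nested tags; an XML parser is the more robust choice for production use.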

Generate PDF document with text

PDF generation allows a text layout to be provided for each page of the document that is converted to PDF.

iOS

GSKPDFPage *page = [[GSKPDFPage alloc] initWithFilePath:imageFilePath inchesSize:A4Size textLayout:textLayout];
// Generate PDF

Android

PDFPage page = new PDFPage(image.getAbsolutePath(), A4_SIZE, xmlTextLayout);
// Generate PDF

© 2023 The Grizzly Labs. All rights reserved.