The Genius Scan SDK lets application developers add a scanning module built on the same technology that powers the Genius Scan app.
The Core and OCR modules of the SDK let you build a fully customizable scan flow that can be added to a native app.
The Core module provides:
The SDK includes a licensing system and needs to be initialized with a key to work. Without it, all the methods included in the SDK will fail.
Please refer to the platform specific API documentation for more details about each method.
A license key is needed to initialize the SDK. The key contains the Application ID / Bundle ID of your application and an expiration date. Please contact us if you don’t have a license key yet. If the key is invalid or your license has expired, initialization fails and the SDK will not work. It is good practice to check whether initialization succeeded and, if it did not, to fall back appropriately (e.g. disable the feature or prompt the user to update the application).
if (![GSK initWithLicenseKey:@"<YOUR LICENSE KEY>"]) {
    // The license is expired or invalid
}
The best place to initialize the SDK is in the main Activity of your application:
try {
    GeniusScanLibrary.init(getApplicationContext(), "<YOUR LICENSE KEY>");
} catch (RuntimeException e) {
    // The license is expired or invalid
}
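The fallback described above can be sketched as a small guard that records whether initialization succeeded, so the rest of the app can hide the scanning feature when it did not. This is a minimal sketch, not part of the SDK: `ScannerAvailability` and its `Initializer` interface are hypothetical names, and the callback passed in stands in for the real `GeniusScanLibrary.init(...)` call.

```java
/** Hypothetical helper (not part of the SDK): remembers whether SDK initialization succeeded. */
final class ScannerAvailability {

    /** Stand-in for the real initialization call, e.g. GeniusScanLibrary.init(context, key). */
    interface Initializer {
        void init();
    }

    private boolean available = false;

    /** Runs the initializer and records the outcome instead of letting the exception escape. */
    boolean initialize(Initializer initializer) {
        try {
            initializer.init();
            available = true;
        } catch (RuntimeException e) {
            // The license is expired or invalid: keep the scanning feature disabled.
            available = false;
        }
        return available;
    }

    /** UI code can call this to decide whether to show the scan entry point. */
    boolean isAvailable() {
        return available;
    }
}
```

In an Activity, this might be used as `availability.initialize(() -> GeniusScanLibrary.init(getApplicationContext(), "<YOUR LICENSE KEY>"))`, followed by hiding the scan button when `isAvailable()` returns false.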
At the core of the SDK are the image processing methods that perform all transformations on the images. The four main routines, listed below, are generally used one after another, each taking the output of the previous one as its input.
The edge detection takes in an image and returns a quadrangle representing the four corners of the detected document.
GSKQuadrangle *quadrangle;
quadrangle = [GSK detectQuadrangleFromImage:imageOutOfCamera options:0];
Quadrangle quadrangle = GeniusScanLibrary.detectFrame(imageToAnalyzePath);
Also referred to as image warping, the perspective correction takes in the original image and a quadrangle (typically the one returned by the document detection) and returns a new, warped image.
UIImage *warpedImage = [GSK warpImage:imageOutOfCamera withQuadrangle:quadrangle];
GeniusScanLibrary.warpImage(imageToWarpPath, warpedImagePath, quadrangle);
The document type detection estimates the best filter that you can apply to the given image. Typically, you apply this routine to the warped image.
GSKPostProcessingType type = [GSK bestPostProcessingForImage:warpedImage];
ImageType type = GeniusScanLibrary.detectImageType(warpedImagePath);
Document enhancement applies a filter (i.e. a set of image processing routines) to the image given in the parameters. The output image of this method is generally considered the final document.
UIImage *enhancedImage = [self enhanceImage:warpedImage withPostProcessing:type];
GeniusScanLibrary.enhanceImage(warpedImagePath, enhancedImagePath, type);
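The order in which the four routines feed into each other can be sketched as follows. This is an illustrative outline rather than SDK code: `ScanPipeline` and `Steps` are hypothetical names, the type parameters `Q` and `T` stand in for the SDK's quadrangle and image type classes, and each `Steps` method is assumed to delegate to the corresponding Android call shown above.

```java
/** Hypothetical sketch (not SDK code): chains the four routines in the order described above. */
final class ScanPipeline<Q, T> {

    /** Stand-ins for the four SDK routines; an app would delegate each one to the SDK. */
    interface Steps<Q, T> {
        Q detectFrame(String imagePath);                                   // 1. edge detection
        void warpImage(String inputPath, String outputPath, Q quadrangle); // 2. perspective correction
        T detectImageType(String warpedPath);                              // 3. document type detection
        void enhanceImage(String inputPath, String outputPath, T type);    // 4. document enhancement
    }

    private final Steps<Q, T> steps;

    ScanPipeline(Steps<Q, T> steps) {
        this.steps = steps;
    }

    /** Processes one photo and returns the path of the enhanced image. */
    String process(String originalPath, String warpedPath, String enhancedPath) {
        Q quadrangle = steps.detectFrame(originalPath);
        steps.warpImage(originalPath, warpedPath, quadrangle);
        // Type detection and enhancement both work on the warped image, not the original.
        T type = steps.detectImageType(warpedPath);
        steps.enhanceImage(warpedPath, enhancedPath, type);
        return enhancedPath;
    }
}
```

Keeping the steps behind an interface also makes it possible to skip or replace one of them, for example to let the user adjust the quadrangle before warping.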
The PDF generation module provides a couple of objects to generate a PDF file.
An object wrapping the information to create a PDF page.
GSKPDFPage *page = [[GSKPDFPage alloc] initWithFilePath:imageFilePath inchesSize:[[GSKPDFSize alloc] initWithWidth:8.27 height:11.69] /* size in inches for an A4 sheet */];
PDFPage page = new PDFPage(imageFilePath, new PDFSize(8.27f, 11.69f));
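The page size is given in inches, and the values 8.27 and 11.69 correspond to the 210 × 297 mm of an A4 sheet. For paper formats specified in millimetres, a small conversion helper could look like the sketch below; `PageSizes` is a hypothetical name and not part of the SDK.

```java
/** Hypothetical helper (not part of the SDK): converts page dimensions from millimetres to inches. */
final class PageSizes {
    private static final float MM_PER_INCH = 25.4f;

    /** Converts a length in millimetres to inches. */
    static float mmToInches(float millimetres) {
        return millimetres / MM_PER_INCH;
    }

    /** Rounds to two decimals, the precision used for the A4 size above. */
    static float roundToHundredths(float value) {
        return Math.round(value * 100f) / 100f;
    }
}
```

For A4, `roundToHundredths(mmToInches(210f))` and `roundToHundredths(mmToInches(297f))` give the width and height that would be passed to the page size constructor.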
A PDF document representing a collection of PDF pages.
GSKPDFPage *page1 = …
GSKPDFPage *page2 = …
GSKPDFDocument *document = [[GSKPDFDocument alloc] initWithTitle:title password:nil keywords:nil pages:@[ page1, page2 ]];
PDFPage page1 = …
PDFPage page2 = …
PDFDocument document = new PDFDocument(title, password, keywords, Arrays.asList(page1, page2));
The generator takes in a PDF document and offers the ability to write the PDF file.
GSKPDFGenerator *generator = [GSKPDFGenerator createWithDocument:document];
[generator generatePDF:outputFilePath];
PDFGenerator generator = PDFGenerator.createWithDocument(document, null, null);
generator.generatePDF(outputFilePath);
The “capture” view displays a camera preview and takes care of setting up the entire camera stack for you. The view comes free of buttons or any other UI element so that you can design it as you want; your implementation needs to provide these controls.
On iOS, subclass GSKCameraViewController and customize it as desired. You can refer to the CameraViewController class in the GSSDKDemo for an example.
On Android, include the ScanFragment in an Activity that implements ScanFragment.CameraCallbackProvider. You can refer to the ScanActivity class in the CustomDemo application for an example.
To enable or disable live document detection, use the method scanFragment.setRealTimeDetectionEnabled. To implement auto-trigger or react to document detection events, set a listener with scanFragment.setBorderDetectorListener and implement its callbacks.
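One way to build an auto-trigger on top of detection events is to capture only after a document has been detected in several consecutive frames, which avoids firing on a single spurious detection. The sketch below shows that counting logic only. `AutoTrigger` and `onFrame` are hypothetical names; the actual listener callbacks and their parameters are not shown in this document, so the boolean argument is an assumption about what the app derives from each detection event.

```java
/** Hypothetical auto-trigger logic (not SDK code): fires once after N consecutive detections. */
final class AutoTrigger {
    private final int requiredConsecutiveDetections;
    private int consecutiveDetections = 0;
    private boolean fired = false;

    AutoTrigger(int requiredConsecutiveDetections) {
        if (requiredConsecutiveDetections < 1) {
            throw new IllegalArgumentException("requiredConsecutiveDetections must be >= 1");
        }
        this.requiredConsecutiveDetections = requiredConsecutiveDetections;
    }

    /**
     * Call from the detection callback for each analyzed frame.
     * Returns true exactly once, on the frame where the capture should be triggered.
     */
    boolean onFrame(boolean documentDetected) {
        if (!documentDetected) {
            consecutiveDetections = 0; // a miss resets the streak
            return false;
        }
        consecutiveDetections++;
        if (!fired && consecutiveDetections >= requiredConsecutiveDetections) {
            fired = true;
            return true;
        }
        return false;
    }

    /** Re-arms the trigger, e.g. after the photo has been taken and the preview resumes. */
    void reset() {
        consecutiveDetections = 0;
        fired = false;
    }
}
```

The threshold is a trade-off: a higher value reduces accidental captures but makes the user hold the device still for longer.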
The edit frame screen lets the user adjust the auto-detected edges of a document.
On iOS, subclass GSKEditFrameViewController and customize it as desired. You can refer to the EditFrameViewController class in the GSSDKDemo for an example.
On Android, subclass the BorderDetectionImageView and include it in the layout of an Activity. You can refer to the BorderDetectionActivity class in the CustomDemo application for an example.
The OCR module extracts text and its layout from scanned images. The extraction outputs the text in two formats: raw text, and XML containing both the text and its layout (also called hOCR). This information can then be used to generate a PDF document whose text is searchable and selectable.
Text recognition relies on the Tesseract library and needs training data files to be present on the mobile device. Training data files are specific to each language in which text needs to be recognized and are available on this page. They need to be stored in a directory called tessdata on the device.
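Because recognition depends on these files being present, it can help to verify them before starting OCR. The sketch below assumes Tesseract's usual naming convention, one `<language code>.traineddata` file per language (for example `eng.traineddata`); `TessdataCheck` is a hypothetical helper and not part of the SDK.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the SDK): checks that training data exists for each language. */
final class TessdataCheck {

    /** Returns the language codes whose "<code>.traineddata" file is missing from tessdataDir. */
    static List<String> missingLanguages(File tessdataDir, List<String> languageCodes) {
        List<String> missing = new ArrayList<>();
        for (String code : languageCodes) {
            File trainedData = new File(tessdataDir, code + ".traineddata");
            if (!trainedData.isFile()) {
                missing.add(code);
            }
        }
        return missing;
    }
}
```

An empty result means every requested language has its data file; otherwise the app can download or copy the missing files before running recognition.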
GSKOCRConfiguration *ocrConfiguration = [GSKOCRConfiguration new];
ocrConfiguration.trainedDataPath = …
ocrConfiguration.languageCodes = @[@"eng"];
GSKOCRResult *result = [GSKOCR recognizeTextForImageAtPath:filePath ocrConfiguration:ocrConfiguration onProgress:&progress error:&error];
GSKTextLayout* textLayout = result.textLayout;
OcrConfiguration ocrConfiguration = new OcrConfiguration(Arrays.asList("eng"), tessdataDirectory, false);
OcrResult result = ocrProcessor.processImage(image, ocrConfiguration, progressListener);
String xmlTextLayout = result.textLayout;
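To give an idea of what the text layout contains, the sketch below pulls words and their bounding boxes out of an hOCR string. It assumes the layout follows the markup Tesseract commonly emits, where each word is a `span` with class `ocrx_word` and a `title` attribute starting with `bbox x0 y0 x1 y1`; whether the SDK's output matches this exactly is an assumption. `HocrWords` is a hypothetical class, and the regular expression is a simplification that ignores words containing nested markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical reader for Tesseract-style hOCR word spans (an assumption about the layout format). */
final class HocrWords {

    /** A recognized word and its bounding box in image pixels. */
    static final class Word {
        final String text;
        final int left, top, right, bottom;

        Word(String text, int left, int top, int right, int bottom) {
            this.text = text;
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }
    }

    // Matches <span class='ocrx_word' ... title='bbox x0 y0 x1 y1; ...'>text</span>
    private static final Pattern WORD = Pattern.compile(
            "<span[^>]*class=['\"]ocrx_word['\"][^>]*title=['\"]bbox (\\d+) (\\d+) (\\d+) (\\d+)"
                    + "[^'\"]*['\"][^>]*>([^<]*)</span>");

    /** Returns the words found in the given hOCR string, in document order. */
    static List<Word> parse(String hocr) {
        List<Word> words = new ArrayList<>();
        Matcher m = WORD.matcher(hocr);
        while (m.find()) {
            words.add(new Word(m.group(5).trim(),
                    Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                    Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4))));
        }
        return words;
    }
}
```

For anything beyond a quick inspection, a proper XML or HTML parser is a more robust choice than a regular expression.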
PDF generation allows a text layout to be provided for each page of the document that is converted to PDF.
GSKPDFPage *page = [[GSKPDFPage alloc] initWithFilePath:imageFilePath inchesSize:A4Size textLayout:textLayout];
// Generate PDF
PDFPage page = new PDFPage(image.getAbsolutePath(), A4_SIZE, xmlTextLayout);
// Generate PDF
© 2023 The Grizzly Labs. All rights reserved.