Getting Annotations off the OnyxBook 60 (aka. BeBook Neo)



It is fairly annoying not to be able to get your annotations and comments off an otherwise excellent ebook reader like the OnyxBook 60; notes you cannot export are close to useless.

I started this back-of-the-napkin project to learn a bit of Python and to be able to populate my book reviews with some relevant quotes.

Prerequisites


The development is being done using:


Findings


The comments are stored in a folder that sits next to the ebook. The folder is called .onyx, and inside it there is a file called [Name of the Book file].sketch. This is an sqlite3 database file.
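As a rough sketch of that lookup in Python 3: the helper names below are made up for illustration, and "Confessions.pdf" is a hypothetical file name. It is also an assumption that the database name may or may not keep the book's extension, so the sketch simply tries both.

```python
import os


def sketch_candidates(book_path):
    """Return the places where the annotation database might live."""
    folder, filename = os.path.split(book_path)
    stem, _ext = os.path.splitext(filename)
    onyx_dir = os.path.join(folder, ".onyx")
    # Assumption: it is unclear whether the reader keeps the book's
    # extension in the database name, so both variants are offered.
    return [
        os.path.join(onyx_dir, filename + ".sketch"),
        os.path.join(onyx_dir, stem + ".sketch"),
    ]


def find_sketch(book_path):
    """Return the first candidate that exists on disk, or None."""
    for candidate in sketch_candidates(book_path):
        if os.path.isfile(candidate):
            return candidate
    return None


# Hypothetical book location, used only to show the generated paths.
for path in sketch_candidates(os.path.join("books", "Confessions.pdf")):
    print(path)
```

find_sketch returns None when neither file is present, which is also what to expect for a book that has never been annotated.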

The database has the following schema:

CREATE TABLE annotation (id integer PRIMARY KEY, page_position integer,DATA blob);
CREATE TABLE sketch (id integer PRIMARY KEY, page_id text,DATA blob,background_id text);
CREATE INDEX id_index ON sketch (id);


The annotation table contains the annotations.
The sketch table presumably contains your hand drawings on top of PDFs, if you are so inclined.
I will focus on the annotations only for now.
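To check which tables a given file really has, something along these lines should do (Python 3). This is only a sketch: it uses an in-memory database built from the schema above as a stand-in for a real .sketch file, and list_tables is a name invented here.

```python
import sqlite3

# Schema copied from the dump above; the in-memory database stands in
# for a real .sketch file so the example runs anywhere.
SCHEMA = """
CREATE TABLE annotation (id integer PRIMARY KEY, page_position integer, DATA blob);
CREATE TABLE sketch (id integer PRIMARY KEY, page_id text, DATA blob, background_id text);
CREATE INDEX id_index ON sketch (id);
"""


def list_tables(con):
    """Return a dict mapping each table name to its row count."""
    names = [row[0] for row in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    counts = {}
    for name in names:
        counts[name] = con.execute(
            'SELECT COUNT(*) FROM "%s"' % name).fetchone()[0]
    return counts


con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
# One made-up row, so that the annotation count is not zero.
con.execute("INSERT INTO annotation (page_position, DATA) VALUES (?, ?)",
            (12, b"\x00\x00"))
print(list_tables(con))
con.close()
```

Pointing sqlite3.connect at an actual .sketch file instead of ":memory:" (and dropping the executescript and INSERT lines) gives the counts for a real book.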

Obviously, the comments are stored in the BLOB, apparently in UTF-8 format with some additional structure around them. I have some annotations from an older firmware version and apparently they were ASCII back then (?!)

The following python code dumps the database from Confessions of an IT Manager:

#import the necessary module
from sqlite3 import dbapi2 as sqlite


#connect to the db and create a cursor
con=sqlite.connect("Confessions.sketch")
cur=con.cursor()


#return all rows
cur.execute("select * from annotation")
#iterate through the result rows
for row in cur:
    print "| ID: %s | Page position: %s |" % (row[0], row[1])
    print " Data: %s" % (row[2])
    print "==================="
   

#close cursor and connection
cur.close()
con.close()



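The output is, predictably, somewhat garbled, but we can identify the source text, only d o u b l e s p a c e d. That is probably because we are not handling the Unicode correctly: each character appears to take two bytes, and for plain English text one of the two is zero.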
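The double spacing can be reproduced in a few lines of Python 3. This is a guess at what is going on rather than a confirmed fact about the file format: it assumes the text is UTF-16 with the high byte first, and the function names are invented for the demonstration.

```python
def as_two_byte_text(text):
    """Encode text the way the blob seems to hold it (assumed UTF-16, big-endian)."""
    return text.encode("utf-16-be")


def naive_view(raw):
    """Show the bytes one by one, with the zero bytes rendered as spaces."""
    return raw.decode("latin-1").replace("\x00", " ")


raw = as_two_byte_text("doubled")
print(len(raw))                 # twice the number of characters
print(naive_view(raw))          # the spaced-out look from the dump
print(raw.decode("utf-16-be"))  # the original text again
```

If the real blobs turn out to be little-endian, "utf-16-le" would be the codec to try instead.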

After trials and tribulations and the use of a spreadsheet I came to the conclusion that this is the structure of the record:

No   Byte           Meaning
1    0000           Beginning, we skip
2    XXYY           Number of annotations in this record
3    0000           Marker, we skip
4    000B 0000 00   Mysterious sequence of 5 bytes
5    ZZWW           Number of records to follow
6    AABB           Length of record
7    Actual Data    Actual data, in PDFs it starts with #pdfloc
8    0000           Marker
9    AABB           Length of record
10   Actual Data    Actual data, in PDFs it starts with #pdfloc
11   0000           Marker
12   AABB           Length of comments record
13   Actual Data    This is our comment!!!
14   0000           Marker
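Rows 6 to 8 of the table (length, data, marker) repeat three times, so they can be read with one small helper. The Python 3 sketch below rests on several assumptions that the table does not spell out: the length is a big-endian 16-bit byte count, the data is UTF-16 with the high byte first, and the marker is always 0000. The blob is synthetic, built by the hypothetical make_field helper, and the "#pdfloc" strings are placeholders rather than real location data.

```python
import struct


def read_field(buf, offset):
    """Read one length / data / marker group starting at offset.

    Returns the decoded text and the offset of whatever follows.
    """
    (length,) = struct.unpack_from(">H", buf, offset)  # rows 6, 9, 12
    start = offset + 2
    end = start + length
    data = bytes(buf[start:end])                       # rows 7, 10, 13
    if len(data) != length:
        raise ValueError("field runs past the end of the blob")
    if bytes(buf[end:end + 2]) != b"\x00\x00":         # rows 8, 11, 14
        raise ValueError("expected a 0000 marker at offset %d" % end)
    return data.decode("utf-16-be"), end + 2


def make_field(text):
    """Build a synthetic field in the same layout, for testing only."""
    payload = text.encode("utf-16-be")
    return struct.pack(">H", len(payload)) + payload + b"\x00\x00"


blob = (make_field("#pdfloc start placeholder")
        + make_field("#pdfloc end placeholder")
        + make_field("This is our comment"))

offset = 0
fields = []
while offset < len(blob):
    text, offset = read_field(blob, offset)
    fields.append(text)
print(fields)
```

Raising on a missing marker is deliberate: if a real blob trips it, that points at one of the assumptions above being wrong rather than at a damaged file.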

However there are some inconsistencies:

Switching to Java, I created the following piece of code (not too proud of it) that handles the annotation part only (rows 5-14 of the table). This class has to be instantiated for each and every comment in the record.

public class Annotation {
public int NumberOfAnnotations;
public String[] AnnotationText;

    public Annotation(byte[] data) throws Exception{
      int groupSize= BytesToInt(data[0], data[1]);
      int seek=2; //we just used the first two bytes for reading the size

      AnnotationText = new String[groupSize+1];
     
      System.err.println("Established group size: " + groupSize);


       for(int i=0;i<=groupSize;i++){
           System.err.println("Group " + i);

           seek=seek+2; //jump over the two zero bytes.
           int datalength=BytesToInt(data[seek],data[seek+1]);
           seek=seek+2;

           System.err.println("Data length: " + datalength);

           byte[] holder = new byte[datalength]; //this is our actual data.
           System.arraycopy(data, seek, holder, 0, datalength);

           AnnotationText[i]=new String(holder, "UTF-16"); //it's stored as two-byte Unicode (UTF-16)
           seek=seek+datalength; //move to next record.
           System.err.println("text: " + AnnotationText[i]);
       }


    }

    private int BytesToInt(byte b1, byte b2){
       //Java bytes are signed, so mask each one to 0..255 before combining (big-endian).
       return ((b1 & 0xFF) << 8) | (b2 & 0xFF);
    }

}
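One thing to watch in a helper like BytesToInt: Java's byte type is signed, so any byte of 0x80 or above comes out negative unless it is masked with 0xFF first. The Python 3 snippet below only illustrates the arithmetic; the function names are invented, and the length of 200 is an arbitrary example value.

```python
import struct


def u16_unsigned(b1, b2):
    """Combine two byte values into one big-endian integer, masking each."""
    return ((b1 & 0xFF) << 8) | (b2 & 0xFF)


def u16_signed_like_java(raw):
    """Mimic b1*256+b2 on Java's signed bytes, without any masking."""
    hi, lo = struct.unpack(">bb", raw)  # 'b' reads a signed 8-bit value
    return hi * 256 + lo


raw = b"\x00\xC8"                    # a length field holding 200
print(u16_unsigned(raw[0], raw[1]))  # 200
print(u16_signed_like_java(raw))     # -56
```

In Java a negative length would make new byte[datalength] throw a NegativeArraySizeException, so the problem only shows up once an annotation is long enough to put a byte of 0x80 or more into the length field.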



