Getting Annotations off the OnyxBook 60 (aka. BeBook Neo)
It is fairly annoying not to be able to get your annotations and comments off an otherwise excellent ebook reader like the OnyxBook 60. Notes that stay locked on the device are close to useless.
I started this back-of-the-napkin project to learn a bit of Python and to be able to populate my book reviews with some relevant quotes.
Prerequisites
The development is being done using:
- OnyxBook 60 (white, branded BeBook Neo)
- Firmware version 1.4 (20100811) unofficial, Onyx branded
- Python 2.7 (decided that I don't have time to learn a new language, came back to Java)
- SQLite3 module
Findings
The comments are stored in a folder next to the ebook. The folder is called .onyx, and inside it there is a file called [Name of the Book file].sketch. This is an sqlite3 database file.
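As a small illustration of that layout, the sketch below lists every .sketch database found in the .onyx folder of a book directory. The function name `find_sketch_files` and the example path are hypothetical; only the `.onyx` folder name and the `.sketch` suffix come from what I saw on the reader.

```python
import glob
import os


def find_sketch_files(book_dir):
    """Return the .sketch databases in book_dir/.onyx, sorted by name."""
    pattern = os.path.join(book_dir, ".onyx", "*.sketch")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Hypothetical mount point of the reader's storage; adjust to your system.
    for path in find_sketch_files("/media/onyx/books"):
        print(path)
```

If the folder does not exist, the function simply returns an empty list.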
The database uses the following schema:
CREATE TABLE annotation (id integer PRIMARY KEY, page_position integer,DATA blob);
CREATE TABLE sketch (id integer PRIMARY KEY, page_id text,DATA blob,background_id text);
CREATE INDEX id_index ON sketch (id);
The annotation table contains the annotations.
The sketch table presumably contains your hand drawings on top of PDFs, if you are so inclined.
I will be focusing on the annotations for now only.
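To check whether another .sketch file has the same layout, one option is to ask SQLite for its table names. This is only a sketch: the helper name `list_tables` is made up here, and `sqlite_master` is the standard SQLite catalog table.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of the tables defined in a SQLite file, sorted."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]
```

For a file with the schema above, `list_tables("Confessions.sketch")` should return the annotation and sketch tables.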
The comments themselves are stored in the BLOB, apparently as UTF-8 text wrapped in some additional structure. I have some annotations from an older firmware version, and apparently they were ASCII back then (?!)
The following Python code dumps the annotation table from the database for Confessions of an IT Manager:
# import the necessary module
from sqlite3 import dbapi2 as sqlite
# connect to the db and create a cursor
con = sqlite.connect("Confessions.sketch")
cur = con.cursor()
# return all rows
cur.execute("select * from annotation")
# iterate through the response
for row in cur:
    print("| ID: %s | Page position: %s |" % (row[0], row[1]))
    print(" Data: %s" % (row[2]))
    print("===================")
# close cursor and connection
cur.close()
con.close()
The output is pretty garbled, but we can make out the source text, only d o u b l e s p a c e d. That is most likely because each character is stored as two bytes and we are printing the raw bytes instead of decoding them.
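The double spacing is what two-byte text looks like when its bytes are printed one by one: for plain Latin characters, every second byte is zero. A minimal illustration, assuming big-endian UTF-16 with no byte order mark (an assumption that fits the Java decoding further down this page, but one I have not verified against every firmware version):

```python
# -*- coding: utf-8 -*-
sample = u"Hi!"

raw = sample.encode("utf-16-be")   # two bytes per character, high byte first
print(repr(raw))                   # every other byte is a zero byte

decoded = raw.decode("utf-16-be")  # decoding restores the original text
print(decoded)
```

Printing `raw` directly to a terminal shows the zero bytes as gaps, which matches the garbled dump.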
After trials and tribulations and the use of a spreadsheet I came to the conclusion that this is the structure of the record:
| No | Byte | Meaning |
|---|---|---|
| 1 | 0000 | Beginning, we skip |
| 2 | XXYY | Number of annotations in this record |
| 3 | 0000 | Marker, we skip |
| 4 | 000B 0000 00 | Mysterious sequence of 5 bytes |
| 5 | ZZWW | Number of records to follow |
| 6 | AABB | Length of record |
| 7 | Actual Data | Actual data, in PDFs it starts with #pdfloc |
| 8 | 0000 | Marker |
| 9 | AABB | Length of record |
| 10 | Actual Data | Actual data, in PDFs it starts with #pdfloc |
| 11 | 0000 | Marker |
| 12 | AABB | Length of comments record |
| 13 | Actual Data | This is our comment!!! |
| 14 | 0000 | Marker |
However there are some inconsistencies:
- If the comment (13) is the last one in the record, it does not end in a 0000 marker (14)
- Number of records to follow is usually 2, but there are 3 records (pdfloc start, pdfloc end and the actual comment). If we start counting from 0, then this is OK
- The mysterious 5 bytes do not make sense and they are identically repeated between records
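Inconsistencies like these are easier to chase with the raw bytes in view. The helper below (hypothetical name `hex_words`) renders the start of a blob in the same 2-byte groups used in the table. It is a rough inspection aid, not a parser.

```python
import binascii


def hex_words(blob, limit=32):
    """Return the first `limit` bytes of blob as space-separated 2-byte hex groups."""
    hexed = binascii.hexlify(bytes(blob[:limit])).decode("ascii")
    # Four hex digits make one 2-byte group; a trailing odd byte stays on its own.
    return " ".join(hexed[i:i + 4] for i in range(0, len(hexed), 4))
```

Calling it on `row[2]` from the dump loop above puts the header bytes of each record side by side, so repeated sequences stand out.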
Switching to Java, I created the following piece of code (not too proud of it) that handles the annotation part only (rows 5-14). This class has to be instantiated for each and every comment in the record.
public class Annotation {
    // Not populated by this constructor.
    public int NumberOfAnnotations;
    public String[] AnnotationText;

    public Annotation(byte[] data) throws Exception {
        int groupSize = BytesToInt(data[0], data[1]);
        int seek = 2; // we just used the first two bytes for reading the size
        AnnotationText = new String[groupSize + 1];
        System.err.println("Established group size: " + groupSize);
        for (int i = 0; i <= groupSize; i++) {
            System.err.println("Group " + i);
            seek = seek + 2; // jump over the two zero bytes
            int datalength = BytesToInt(data[seek], data[seek + 1]);
            seek = seek + 2;
            System.err.println("Data length: " + datalength);
            byte[] holder = new byte[datalength]; // this is our actual data
            System.arraycopy(data, seek, holder, 0, datalength);
            AnnotationText[i] = new String(holder, "UNICODE"); // it's stored as Unicode
            seek = seek + datalength; // move to the next record
            System.err.println("text: " + AnnotationText[i]);
        }
    }

    // Java bytes are signed, so mask each one before combining;
    // otherwise any byte of 0x80 or above gives a wrong (negative) result.
    private int BytesToInt(byte b1, byte b2) {
        return ((b1 & 0xFF) << 8) | (b2 & 0xFF);
    }
}