- Segmenting the original image instead of the extracted slides solves the problems of blurry and incomplete images (see Fig. 1).
- Found a program, pdf2Text, that extracts all the text from a PDF slide set into text files. With that in place, the next step is to decide which keywords should be linked to a particular visual element.
- Started reading up on how Google carries out queries and on the data structures used to store the index.
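
As a note to self on the index data structure: search engines are generally described as storing an inverted index, which maps each word to the documents that contain it. Below is a minimal sketch of that idea in Python, assuming each slide's text is already available as a string. The slide texts and the function names (`tokenize`, `build_inverted_index`, `search`) are made up for illustration and are not taken from any existing tool.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(slides):
    """Map each word to the set of slide ids whose text contains it."""
    index = defaultdict(set)
    for slide_id, text in slides.items():
        for word in tokenize(text):
            index[word].add(slide_id)
    return index


def search(index, query):
    """Return the ids of slides that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical slide texts standing in for the extracted text files.
slides = {
    1: "Image segmentation of lecture slides",
    2: "Extracting text from PDF slides",
    3: "Inverted index for text search",
}
index = build_inverted_index(slides)
print(sorted(search(index, "text slides")))
```

The same structure could later map keywords to segmented visual elements instead of whole slides, once it is decided which keywords belong to which element.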
TO DO:
- Decide which keywords should be linked to a visual element in a slide.
- Identify what kind of image each segmented image is: text, table, chart, picture, etc.

Fig. 1: The segmented visual elements (example) cropped from the original slide to prevent any loss of quality.