Document
Details for the DOCUMENT link type
Type Value: DOCUMENT
Status: PRODUCTION
A DOCUMENT link type represents common document formats, including:
- PDF files (.pdf)
- Microsoft Word documents (.doc, .docx)
- Rich Text Format files (.rtf)
- OpenDocument Text files (.odt)
- Microsoft Excel spreadsheets (.xls, .xlsx)
- Microsoft PowerPoint presentations (.ppt, .pptx)
- XML documents (.xml)
In addition to the common attributes (title, description, image), DOCUMENT links provide detailed information such as page count, author, and optional raw text access. When available, responsive images give you visual previews, and page-level data lets you inspect individual pages of the document.
What It Includes
When type is DOCUMENT, the response includes a document object with fields like:
- title: Title of the document.
- type: The document format (e.g., pdf, docx, xlsx, etc.).
- description: A brief summary if available or inferred.
- estimatedReadingTime: Approximate reading time in minutes based on the document’s length.
- rawTextUrl: A URL to fetch the document’s raw text.
- image: A responsive image object providing different resolutions of a representative page image (e.g., cover page).
- pages: An array of page-level metadata (if available).
- pageCount: Total number of pages in the document.
- author: The author or creator’s name.
- isEncrypted: Indicates whether the document is encrypted or password-protected.
- lastModified: Timestamp of the document’s last modification.
- language: The primary language of the document’s content.
These details let you build rich document previews: for instance, displaying the page count next to the title, showing a cover image thumbnail, or offering a “Read More” button linked to the raw text or a viewer.
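As a rough illustration of the preview idea, the sketch below turns a document object into a one-line caption. It is only a sketch under assumptions: the function name build_preview_line and the sample dictionary are hypothetical, the sample is not a real API response, and the field types (an integer pageCount, a numeric estimatedReadingTime in minutes) are inferred from the field descriptions above rather than taken from the schema.

```python
from typing import Any, Dict


def build_preview_line(document: Dict[str, Any]) -> str:
    """Compose a one-line caption from a DOCUMENT link's `document` object.

    Every field is treated as optional, because some metadata may be
    unavailable for a given document.
    """
    parts = [document.get("title") or "Untitled document"]

    doc_type = document.get("type")
    if doc_type:
        parts.append(str(doc_type).upper())

    page_count = document.get("pageCount")
    if isinstance(page_count, int) and page_count > 0:
        parts.append(f"{page_count} page" + ("" if page_count == 1 else "s"))

    minutes = document.get("estimatedReadingTime")
    if isinstance(minutes, (int, float)) and minutes > 0:
        # Assumed unit: minutes. Show at least "1 min" for very short texts.
        parts.append(f"{max(1, round(minutes))} min read")

    return " | ".join(parts)


# Hypothetical sample data for illustration; not an actual API response.
sample = {
    "title": "Annual Report",
    "type": "pdf",
    "pageCount": 12,
    "estimatedReadingTime": 9,
}
print(build_preview_line(sample))
```

Treating each field as optional keeps the caption usable when only part of the metadata is present; a document with no recognizable fields falls back to a generic title.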
Example Request
Example Response
Special Notes
- Fallback Strategies: As with PAGE links, if the document’s metadata is limited, Peekalink uses AI-driven techniques to infer a missing title or description.
- Estimated Reading Time: Calculated based on the extracted text’s length.
- Page-Level Images & Data: Each page may have its own responsive image set, letting you show previews of individual pages.
- Format & Encryption: The type field helps identify the file format (e.g., pdf), and isEncrypted warns if the document is protected.
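One way an application might act on the isEncrypted flag is to skip the raw text link for protected documents. The sketch below assumes that policy; it is an application-level choice, not documented API behavior, and the helper name raw_text_url_if_readable is hypothetical.

```python
from typing import Any, Dict, Optional


def raw_text_url_if_readable(document: Dict[str, Any]) -> Optional[str]:
    """Return rawTextUrl only when the document is not flagged as encrypted.

    Assumption: a protected document should not get a "Read More" link.
    """
    if document.get("isEncrypted"):
        return None
    url = document.get("rawTextUrl")
    # Guard against a missing or empty URL.
    return url if isinstance(url, str) and url else None
```

A caller can then show the “Read More” button only when the function returns a URL and fall back to the cover image or title otherwise.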
Schema reference
For a full technical breakdown of all fields and validation rules, including detailed JSON schemas, please refer to the API Reference.