Implementation

Installation


To install Carbon Connect as a pre-built React component, use npm as follows:

npm install carbon-connect

Prerequisites


The following packages will be added as peer dependencies:

  • @radix-ui/react-checkbox
  • @radix-ui/react-dialog
  • @radix-ui/react-dropdown-menu
  • @radix-ui/react-popover
  • @radix-ui/react-slot
  • class-variance-authority
  • clsx
  • next-themes
  • react
  • react-dom
  • react-dropzone
  • react-infinite-scroll-component
  • react-loader-spinner
  • tailwind-merge

If you encounter a version mismatch error, check the required versions in the package.json file.

Component Properties


The CarbonConnect component accepts the following properties:

| Property | Type | Required? | Description |
| --- | --- | --- | --- |
| brandIcon | String | Yes | A URL or a local path to your organization's brand icon. |
| orgName | String | Yes | The name of your organization. This is displayed in the initial announcement modal view. |
| tokenFetcher | Function | Yes | A function that returns a promise which resolves with the access and refresh tokens. |
| onSuccess | Function | No | A callback function that is called after a file upload succeeds. |
| onError | Function | No | A callback function that is called if there is any error in the file upload. |
| children | React Node (JSX) | No | Any valid React node, used as the trigger that opens the component. |
| entryPoint | String | No | The initial active step when the component loads. Defaults to LOCAL_FILES. |
| maxFileSize | Number | No | Maximum file size, in bytes, allowed for upload. Defaults to 10 MB. |
| tags | Object | No | Any additional data you want to associate with the component's state, such as an app ID. |
| enabledIntegrations | Array | No | Lets you choose which third-party integrations to show. See below for more details about this prop. |
| primaryBackgroundColor | String | No | The primary background color of the component. Defaults to #000000. |
| primaryTextColor | String | No | The primary text color of the component. Defaults to #FFFFFF. |
| secondaryBackgroundColor | String | No | The secondary background color of the component. Defaults to #FFFFFF. |
| secondaryTextColor | String | No | The secondary text color of the component. Defaults to #000000. |
| allowMultipleFiles | Boolean | No | Whether or not to allow multiple files to be uploaded at once. Defaults to false. |
| chunkSize | Number | No | The number of tokens per chunk. Defaults to 1500. |
| overlapSize | Number | No | The number of tokens to overlap between chunks. Defaults to 20. |
| open | Boolean | No | Whether or not to open the component. Defaults to false. |
| setOpen | Function | No | A function that will be called to set the open state of the component. Defaults to None. |
| alwaysOpen | Boolean | No | Whether or not to always keep the component open. Defaults to false. |
| tosURL | String | No | A URL to your organization's terms of service. Defaults to https://carbon.ai/terms. |
| privacyPolicyURL | String | No | A URL to your organization's privacy policy. Defaults to https://carbon.ai/privacy. |
| navigateBackURL | String | No | A URL to your intended destination. Defaults to None. |
| backButtonText | String | No | The label that you want to show on the back button. Defaults to Go back. |
| zIndex | Number | No | Updates the z-index of the Carbon Connect modal. |
| embeddingModel | String | No | Specifies the embedding model used for the integration. The options are OPENAI, AZURE_OPENAI, or COHERE_MULTILINGUAL_V3 for text and audio files, and VERTEX_MULTIMODAL for image files. |
| filePickerMode | String | No | Specifies whether users can locally upload files, folders, or both. The options are FILES, FOLDERS, or BOTH. |
| prependFilenameToChunks | Boolean | No | Adds the file title to each chunk. Defaults to false. |
| showFilesTab | Boolean | No | Shows the synced files tab in Carbon Connect 2.0. Defaults to true. |
| useRequestIds | Boolean | No | When enabled, a request_id is assigned to the files uploaded in that session. |
| loadingIconColor | String | No | Defines the color of the loader icon. This can be specified using standard CSS color names, or directly as either a Hexadecimal (Hex) code or RGB color values. |
| sendDeletionWebhooks | Boolean | No | When set to true, enables triggering a FILE_DELETED webhook event whenever a user deletes files within Carbon Connect. If set to false, deleting files will not generate any webhook notifications. |
| fileSyncConfig | Array | No | Includes data source and file specific configurations. |
| splitRows | Boolean | No | If splitRows is set to true, CSV rows will be automatically split if they exceed either the specified chunk size or the maximum token limit of the embedding model. For LOCAL_FILES, splitRows can be set on the integration or extension level. For third-party connectors, this value can be set under the fileSyncConfig as split_rows. Defaults to false. |
| incrementalSync | Boolean | No | When incrementalSync is set to true, only new or updated files since the last sync will be re-synced. Defaults to false. |
| filesTabColumns | Array | No | Specifies which columns are displayed in the file list view. Accepts an array of strings which can have values "name", "status", "created_at", "external_url". |
| theme | String | No | Specifies whether dark or light mode is enabled. The prop can have values "light", "dark", and "auto". |
| dataSourcePollingInterval | Number | No | Specifies how frequently data sources are polled for any updates and events. The value is specified in milliseconds (ms) and the minimum allowed value for this property is 3000 ms. Defaults to 8000 ms. |
| openFilesTabTo | String | No | Specifies which tab (FILE_PICKER or FILES_LIST) the user is taken to by default when they select an integration. Only applies when the customer has enabled Carbon’s in-house file picker. |
| apiURL | String | No | Defaults to https://api.carbon.ai but can be set to another URL value. For self-hosting customers, this URL value then acts as the base path for all of the requests made through Carbon Connect. |
| dataSourceTags | String | No | Key-value pairs that will be added to all data sources connected through Carbon Connect as custom metadata. Example: `{{"userId": "swapnil@carbon.ai"}}` |
| dataSourceTagsFilterQuery | String | No | This parameter filters for tags when querying data sources. It functions similarly to our documented file filters. If not provided, all data sources will be returned. Example: `{{"key": "userId", "value": "swapnil@carbon.ai"}}` |

When you do not pass open or setOpen, Carbon Connect will manage the open state internally. If you pass open and setOpen, you will have to manage the open state yourself.
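
To make the chunkSize and overlapSize properties listed above more concrete, the sketch below splits a list of tokens into overlapping windows. It only illustrates the general sliding-window idea and is not Carbon's actual chunker; the function name `chunkTokens` and the use of a plain array as "tokens" are assumptions made for this example.

```javascript
// Illustrative only: a generic sliding-window chunker, not Carbon's implementation.
// `tokens` is any array; each chunk holds at most `chunkSize` items and
// starts with the last `overlapSize` items of the previous chunk.
function chunkTokens(tokens, chunkSize = 1500, overlapSize = 20) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  if (!Number.isInteger(overlapSize) || overlapSize < 0 || overlapSize >= chunkSize) {
    throw new RangeError('overlapSize must be a non-negative integer smaller than chunkSize');
  }
  const step = chunkSize - overlapSize; // how far each window advances
  const chunks = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize));
    if (start + chunkSize >= tokens.length) break; // this window reached the end
  }
  return chunks;
}

// Seven tokens, chunks of four, one token of overlap.
console.log(chunkTokens(['a', 'b', 'c', 'd', 'e', 'f', 'g'], 4, 1));
// [ [ 'a', 'b', 'c', 'd' ], [ 'd', 'e', 'f', 'g' ] ]
```

A larger overlapSize repeats more context between neighbouring chunks at the cost of producing more chunks for the same input.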

Usage


This section demonstrates how to integrate the Carbon Connect component within a Next.js project.

Client Side Configuration

1. Import Libraries and Components:

import { CarbonConnect } from 'carbon-connect';
import axios from 'axios';

2. Token Retrieval:

The tokenFetcher function requests access tokens from Carbon through your backend:

const tokenFetcher = async () => {
  const response = await axios.get('/api/auth/fetchCarbonTokens', {
    params: { customer_id: 'your_customer_id' },
  });
  return response.data; // Must return data containing access_token
};

In the example above, tokenFetcher is a helper function that retrieves the necessary tokens for authentication. This function should be implemented in your client-side code and is designed to make a request to an API on your backend server. The API then requests tokens from the Carbon token creation endpoint. The Carbon token creation endpoint is a secure endpoint that requires a valid API key and customer ID. The customer ID is a unique identifier for your end-user, and you can pass any string as the customer ID. The API key is a secret key provided to you by Carbon. Please contact us to obtain your API key.

3. Implement Carbon Connect Component:

Here's a concise usage example. Customize according to your requirements:

<CarbonConnect
  orgName="Your Organization"
  brandIcon="path/to/your/brand/icon"
  embeddingModel={EmbeddingGenerators.OPENAI_ADA_LARGE_1024}
  tokenFetcher={tokenFetcher}
  tags={{
    tag1: 'tag1_value',
    tag2: 'tag2_value',
    tag3: 'tag3_value',
  }}
  maxFileSize={10000000}
  enabledIntegrations={[
    {
      id: 'LOCAL_FILES',
      chunkSize: 100,
      overlapSize: 10,
      maxFileSize: 20000000,
      allowMultipleFiles: true,
      maxFilesCount: 5,
      allowedFileTypes: [
        {
          extension: 'csv',
          chunkSize: 1200,
          overlapSize: 120,
          embeddingModel: 'OPENAI',
        },
        {
          extension: 'txt',
          chunkSize: 1599,
          overlapSize: 210,
          embeddingModel: 'AZURE_OPENAI',
        },
        {
          extension: 'pdf',
        },
      ],
    },
    {
      id: 'NOTION',
      chunkSize: 1500,
      overlapSize: 20,
      embeddingModel: 'OPENAI',
    },
    {
      id: 'WEB_SCRAPER',
      chunkSize: 1500,
      overlapSize: 20,
    },
    {
      id: 'GOOGLE_DRIVE',
      chunkSize: 1000,
      overlapSize: 20,
      fileSyncConfig: {
        detect_audio_language: true,
        split_rows: true,
        generate_chunks_only: true,
      },
    },
    {
      id: 'INTERCOM',
      chunkSize: 1000,
      overlapSize: 20,
      fileSyncConfig: {
        auto_synced_source_types: AutoSyncedSourceTypes.TICKET,
      },
    },
  ]}
  onSuccess={(data) => console.log('Data on Success: ', data)}
  onError={(error) => console.log('Data on Error: ', error)}
  primaryBackgroundColor="#F2F2F2"
  primaryTextColor="#555555"
  secondaryBackgroundColor="#f2f2f2"
  secondaryTextColor="#000000"
  allowMultipleFiles={true}
  open={true}
  chunkSize={1500}
  overlapSize={20}
  // entryPoint="LOCAL_FILES"
></CarbonConnect>

4. Specify an Embedding Model (Optional)

If you use Carbon to generate embeddings, you can specify an embedding model (view available models) at different levels in the Carbon Connect component:

Global Level: The embeddingModel prop set on the component applies to every connector and file type. It serves as the default unless overridden at another level.

Per Connector Level: This setting applies to a specific connector (e.g., Google Drive) and takes precedence over the global setting for that connector.

Per File Type Level: The most specific setting. It applies to an individual file type and supersedes both the connector and global settings. It is available for local file uploads only.

The order of precedence is: File Type Level > Connector Level > Global Level. That is, a model defined at the file type level takes priority over the connector-level setting, and the connector-level setting takes priority over the global setting. If no value is provided at any level, the default is OPENAI.
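
The precedence rule above can be sketched as a small lookup. This is a hypothetical helper for reasoning about your own configuration, not code from carbon-connect; the function names `resolveEmbeddingModel` and `modelForExtension` are assumptions, and the props object simply mirrors the shape of the usage example.

```javascript
// Hypothetical helper: mirrors the documented precedence
// (file type > connector > global) and falls back to 'OPENAI'.
function resolveEmbeddingModel({ globalModel, connectorModel, fileTypeModel } = {}) {
  return fileTypeModel ?? connectorModel ?? globalModel ?? 'OPENAI';
}

// Look up the effective model for one file extension of one integration,
// given props shaped like the ones passed to the component.
function modelForExtension(props, integrationId, extension) {
  const integration = (props.enabledIntegrations ?? []).find((i) => i.id === integrationId);
  const fileType = (integration?.allowedFileTypes ?? []).find((t) => t.extension === extension);
  return resolveEmbeddingModel({
    globalModel: props.embeddingModel,
    connectorModel: integration?.embeddingModel,
    fileTypeModel: fileType?.embeddingModel,
  });
}

const props = {
  embeddingModel: 'COHERE_MULTILINGUAL_V3',
  enabledIntegrations: [
    {
      id: 'LOCAL_FILES',
      allowedFileTypes: [{ extension: 'txt', embeddingModel: 'AZURE_OPENAI' }, { extension: 'pdf' }],
    },
    { id: 'NOTION', embeddingModel: 'OPENAI' },
  ],
};

console.log(modelForExtension(props, 'LOCAL_FILES', 'txt')); // AZURE_OPENAI (file type level)
console.log(modelForExtension(props, 'LOCAL_FILES', 'pdf')); // COHERE_MULTILINGUAL_V3 (global level)
console.log(modelForExtension(props, 'NOTION', 'pdf')); // OPENAI (connector level)
console.log(modelForExtension({}, 'NOTION', 'pdf')); // OPENAI (default)
```

The nullish coalescing operator (`??`) only skips `undefined` and `null`, so an explicitly configured value at a more specific level always wins.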

Server Side Configuration

Your backend should handle token requests like this:

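import axios from 'axios';

// Next.js API route, e.g. pages/api/auth/fetchCarbonTokens.js
export default async function handler(req, res) {
  try {
    const response = await axios.get('https://api.carbon.ai/auth/v1/access_token', {
      headers: {
        'Content-Type': 'application/json',
        // Unique identifier of your end user, sent by the client as customer_id
        'customer-id': req.query.customer_id,
        authorization: 'Bearer <YOUR_API_KEY>', // keep the API key on the server
      },
    });
    res.status(200).json(response.data);
  } catch (error) {
    // axios rejects on non-2xx responses; forward the upstream status when available
    res.status(error.response?.status ?? 500).json({ error: 'Failed to fetch Carbon tokens' });
  }
}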

Return Value

Ensure that your tokenFetcher returns an object structured as:

{
  access_token: string;
}
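
One way to catch a malformed backend response early is to check this shape before handing the data to the component. The sketch below is a hypothetical guard, not part of carbon-connect; the name `assertTokenResponse` and the error message are assumptions.

```javascript
// Hypothetical guard: throws if `data` is not an object with a
// non-empty string `access_token`, otherwise returns `data` unchanged.
function assertTokenResponse(data) {
  const valid =
    data !== null &&
    typeof data === 'object' &&
    typeof data.access_token === 'string' &&
    data.access_token.length > 0;
  if (!valid) {
    throw new TypeError('tokenFetcher must resolve to an object with a non-empty access_token string');
  }
  return data;
}
```

Inside tokenFetcher, this could be used as `return assertTokenResponse(response.data);` so that a misconfigured backend surfaces as a clear error rather than a failed upload later on.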

Enabling Connectors


Use the enabledIntegrations property to choose which connectors your users can access. The property also allows additional configuration per connector.

Here's the list of connectors available for activation:

`LOCAL_FILES`: This integration lets you upload files from your local machine. You can reference the supported file formats [here](learn/files/text). You can pass the following configuration for this integration:

- `chunkSize`: Number of tokens per chunk. Default is 1500.
- `overlapSize`: Size of overlap in tokens. Default is 20.
- `maxFileSize`: Maximum file size allowed for upload in bytes. Default is 10 MB.
- `allowMultipleFiles`: Determines if multiple files can be uploaded simultaneously. Default is `false`.
- `maxFilesCount`: Maximum number of files allowed for upload at once. Default is 10.
- `skipEmbeddingGeneration`: Toggle to skip embedding generation. Default is `false`.
- `useOcr`: Toggle to enable Optical Character Recognition (OCR) for PDFs. Default is `false`.
- `parsePdfTablesWithOcr`: Enables table parsing when `useOcr` is set to `true`. Default is `false`. (Please set on file extension level.)
- `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`.
- `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models).
- `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`.
- `maxItemsPerChunk`: Specifies the number of items to include in a specific chunk. Defaults to `null`.
- `transcriptionService`: Specifies the model being used for audio transcription. Accepts an enum of `ASSEMBLYAI` or `DEEPGRAM`. Defaults to `DEEPGRAM`.
- `includeSpeakerLabels`: Specifies whether speaker diarization will be enabled for the audio transcription services. This allows us to format chunks so that the text is organized by utterances and each utterance will be labeled with the speaker. Defaults to `false`.
- `generateChunksOnly`: When this flag is set to `true`, documents will be chunked without generating embeddings, and the `/list_chunks_and_embeddings` endpoint will list chunks only. Defaults to `false`.
- `allowedFileTypes`: This is an array of objects. Each object represents a file type that is allowed to be uploaded. Each object can have the following properties:
  - `extension`: File extension of the allowed file type (required property).
  - `chunkSize`: Number of tokens per chunk for this file type. Defaults to global setting if not specified.
  - `overlapSize`: Overlap size in tokens for this file type. Defaults to global setting if not specified.
  - `skipEmbeddingGeneration`: Toggle to skip embedding generation for this file type. Defaults to global setting if not specified.
  - `embeddingModel`: Specifies the embedding model for this file type. Options same as global but specific to this file type. Defaults to global setting if not specified.
  - `useOcr`: Toggle to enable OCR for PDF files. Defaults to global setting if not specified.
  - `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search for this file type. Default is `false`.
  - `prependFilenameToChunks`: Adds the file title to each chunk for the file. Defaults to `false`.
  - `maxItemsPerChunk`: Specifies the number of items to include in a specific chunk. Defaults to `null`.
  - `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`.
  - `transcriptionService`: Specifies the model being used for audio transcription. Accepts an enum of `ASSEMBLYAI` or `DEEPGRAM`. Defaults to `DEEPGRAM`.

`NOTION`: This integration lets you upload files from your Notion account. You can pass the following configuration for this integration:

- `chunkSize`: Number of tokens per chunk. Defaults to 1500.
- `overlapSize`: Size of the overlap in tokens. Defaults to 20.
- `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`.
- `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models).
- `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`.
- `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`.
- `showFilesTab`: Shows the synced files tab in Carbon Connect for this specific integration. Defaults to `true`. (Carbon Connect 2.0 only)
- `syncSourceItems`: Controls whether items from the file directory are synced by default. It defaults to `true`.
- `useCarbonFilePicker`: Controls whether Carbon Connect defaults to Carbon’s file picker instead of the source’s file picker. Defaults to `false`. (Carbon Connect 3.0 only)
- `incrementalSync`: When `incrementalSync` is set to `true`, only new or updated files since the last sync will be re-synced. Defaults to `false`.
- `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`.

`WEB_SCRAPER`: This integration lets you scrape URLs. You can pass the following configuration for this integration:

- `chunkSize`: Number of tokens per chunk. Defaults to 1500.
- `overlapSize`: Size of the overlap in tokens. Defaults to 20.
- `sitemapEnabled`: This option enables the sitemap tab to be displayed. Defaults to `true`.
- `recursionDepth`: Depth of recursion for scraping. Defaults to 3. Use 1 to disable recursion and 0 to scrape recursively until reaching the `maxPagesToScrape` limit.
- `maxPagesToScrape`: Maximum number of pages to scrape. Defaults to 100.
- `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`.
- `enableAutoSync`: Toggle to enable scheduled syncs. Defaults to `false`.
- `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models).
- `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`.
- `htmlTagsToSkip`: Define HTML tags to exclude when converting HTML to plaintext. Defaults to `[]`, an empty list.
- `cssClassesToSkip`: Define CSS Classes to exclude when converting HTML to plaintext. Defaults to `[]`, an empty list.
- `cssSelectorsToSkip`: Define CSS Selectors to exclude when converting HTML to plaintext. Defaults to `[]`, an empty list.
- `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`.
- `generateChunksOnly`: When this flag is set to `true`, documents will be chunked without generating embeddings, and the `/list_chunks_and_embeddings` endpoint will list chunks only. Defaults to `false`.

`GOOGLE_DRIVE`: This integration lets you upload files from your Google Drive. You can reference the supported file formats [here](learn/files/text). You can pass the following configuration for this integration:

- `chunkSize`: Number of tokens per chunk. Defaults to 1500.
- `overlapSize`: Size of the overlap in tokens. Defaults to 20.
- `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`.
- `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models).
- `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`.
- `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`.
- `maxItemsPerChunk`: Specifies the number of items to include in a specific chunk. Defaults to `null`.
- `showFilesTab`: Shows the synced files tab in Carbon Connect for this specific integration. Defaults to `true`. (Carbon Connect 2.0 only)
- `useOcr`: Toggle to enable Optical Character Recognition (OCR) for PDFs. Default is `false`.
- `parsePdfTablesWithOcr`: Enables table parsing when `useOcr` is set to `true`. Default is `false`.
- `syncSourceItems`: Controls whether items from the file directory are synced by default. It defaults to `true`.
- `useCarbonFilePicker`: Controls whether Carbon Connect defaults to Carbon’s file picker instead of the source’s file picker. Defaults to `false`. (Carbon Connect 3.0 only)
- `incrementalSync`: When `incrementalSync` is set to `true`, only new or updated files since the last sync will be re-synced. Defaults to `false`.
- `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`.

`INTERCOM`: This integration lets you select pages from your Intercom. You can pass the following configuration for this integration:

- `chunkSize`: Number of tokens per chunk. Defaults to 1500.
- `overlapSize`: Size of the overlap in tokens. Defaults to 20.
- `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`.
- `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models).
- `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`.
- `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`.
- `syncFilesOnConnection`: Auto-sync all files from a user’s connected account.
- `showFilesTab`: Shows the synced files tab in Carbon Connect for this specific integration. Defaults to `true`. (Carbon Connect 2.0 only)
- `syncSourceItems`: Controls whether items from the file directory are synced by default. It defaults to `true`.
- `useCarbonFilePicker`: Controls whether Carbon Connect defaults to Carbon’s file picker instead of the source’s file picker. Defaults to `false`. (Carbon Connect 3.0 only)
- `incrementalSync`: When `incrementalSync` is set to `true`, only new or updated files since the last sync will be re-synced. Defaults to `false`.
- `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`.

`DROPBOX`: This integration lets you upload files from your Dropbox. You can reference the supported file formats [here](learn/files/text). You can pass the following configuration for this integration: - `chunkSize`: Number of tokens per chunk. Defaults to 1500. - `overlapSize`: Size of the overlap in tokens. Defaults to 20. - `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`. - `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models). - `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`. - `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`. - `maxItemsPerChunk`: Specifies the number of items to include in a specific chunk. Defaults to `null`. - `showFilesTab`: Shows the synced files tab in Carbon Connect for this specific integration. Defaults to `true`. (Carbon Connect 2.0 only) - `useOcr`: Toggle to enable Optical Character Recognition (OCR) for PDFs. Default is `false`. - `parsePdfTablesWithOcr`: Enables table parsing when `useOCR` is set to `true`. Default is `false`. - `syncSourceItems`: Controls whether items from the file directory are synced by default. It defaults to `true`. - `useCarbonFilePicker`: Controls whether Carbon Connect defaults to Carbon’s file picker instead of the source’s file picker. Defaults to `false`. (Carbon Connect 3.0 only) - `incrementalSync`: By setting `incremental_sync` to `true`, only new or updated files since the last sync will be re-synced. Defaults to `false`. - `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`. `BOX`: This integration lets you upload files from your Box. You can reference the supported file formats [here](learn/files/text). 
You can pass the following configuration for this integration: - `chunkSize`: Number of tokens per chunk. Defaults to 1500. - `overlapSize`: Size of the overlap in tokens. Defaults to 20. - `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`. - `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models). - `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`. - `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`. - `maxItemsPerChunk`: Specifies the number of items to include in a specific chunk. Defaults to `null`. - `showFilesTab`: Shows the synced files tab in Carbon Connect for this specific integration. Defaults to `true`. (Carbon Connect 2.0 only) - `useOcr`: Toggle to enable Optical Character Recognition (OCR) for PDFs. Default is `false`. - `parsePdfTablesWithOcr`: Enables table parsing when `useOCR` is set to `true`. Default is `false`. - `syncSourceItems`: Controls whether items from the file directory are synced by default. It defaults to `true`. - `useCarbonFilePicker`: Controls whether Carbon Connect defaults to Carbon’s file picker instead of the source’s file picker. Defaults to `false`. (Carbon Connect 3.0 only) - `incrementalSync`: By setting `incremental_sync` to `true`, only new or updated files since the last sync will be re-synced. Defaults to `false`. - `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`. `ONEDRIVE`: This integration lets you upload files from your OneDrive. You can reference the supported file formats [here](learn/files/text). You can pass the following configuration for this integration: - `chunkSize`: Number of tokens per chunk. Defaults to 1500. - `overlapSize`: Size of the overlap in tokens. 
Defaults to 20. - `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`. - `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models). - `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`. - `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`. - `maxItemsPerChunk`: Specifies the number of items to include in a specific chunk. Defaults to `null`. - `showFilesTab`: Shows the synced files tab in Carbon Connect for this specific integration. Defaults to `true`. (Carbon Connect 2.0 only) - `useOcr`: Toggle to enable Optical Character Recognition (OCR) for PDFs. Default is `false`. - `parsePdfTablesWithOcr`: Enables table parsing when `useOCR` is set to `true`. Default is `false`. - `syncSourceItems`: Controls whether items from the file directory are synced by default. It defaults to `true`. - `useCarbonFilePicker`: Controls whether Carbon Connect defaults to Carbon’s file picker instead of the source’s file picker. Defaults to `false`. (Carbon Connect 3.0 only) - `incrementalSync`: By setting `incremental_sync` to `true`, only new or updated files since the last sync will be re-synced. Defaults to `false`. - `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`. `ZOTERO`: This integration lets you upload files from your Zotero. You can reference the supported file formats [here](learn/files/text). You can pass the following configuration for this integration: - `chunkSize`: Number of tokens per chunk. Defaults to 1500. - `overlapSize`: Size of the overlap in tokens. Defaults to 20. - `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`. - `embeddingModel`: Specifies the embedding model used. 
You can find the model options [here](learn/models/models). - `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`. - `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`. - `maxItemsPerChunk`: Specifies the number of items to include in a specific chunk. Defaults to `null`. - `showFilesTab`: Shows the synced files tab in Carbon Connect for this specific integration. Defaults to `true`. (Carbon Connect 2.0 only) - `useOcr`: Toggle to enable Optical Character Recognition (OCR) for PDFs. Default is `false`. - `parsePdfTablesWithOcr`: Enables table parsing when `useOCR` is set to `true`. Default is `false`. - `syncSourceItems`: Controls whether items from the file directory are synced by default. It defaults to `true`. - `useCarbonFilePicker`: Controls whether Carbon Connect defaults to Carbon’s file picker instead of the source’s file picker. Defaults to `false`. (Carbon Connect 3.0 only) - `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`. `SHAREPOINT`: This integration lets you upload files from your SharePoint. You can reference the supported file formats [here](learn/files/text). You can pass the following configuration for this integration: - `chunkSize`: Number of tokens per chunk. Defaults to 1500. - `overlapSize`: Size of the overlap in tokens. Defaults to 20. - `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`. - `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models). - `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`. - `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`. 
- `maxItemsPerChunk`: Specifies the number of items to include in a specific chunk. Defaults to `null`. - `showFilesTab`: Shows the synced files tab in Carbon Connect for this specific integration. Defaults to `true`. (Carbon Connect 2.0 only) - `useOcr`: Toggle to enable Optical Character Recognition (OCR) for PDFs. Default is `false`. - `parsePdfTablesWithOcr`: Enables table parsing when `useOCR` is set to `true`. Default is `false`. - `syncSourceItems`: Controls whether items from the file directory are synced by default. It defaults to `true`. - `useCarbonFilePicker`: Controls whether Carbon Connect defaults to Carbon’s file picker instead of the source’s file picker. Defaults to `false`. (Carbon Connect 3.0 only) - `incrementalSync`: By setting `incremental_sync` to `true`, only new or updated files since the last sync will be re-synced. Defaults to `false`. - `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`. `CONFLUENCE`: This integration lets you upload files from your Confluence. You can pass the following configuration for this integration: - `chunkSize`: Number of tokens per chunk. Defaults to 1500. - `overlapSize`: Size of the overlap in tokens. Defaults to 20. - `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`. - `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models). - `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`. - `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`. - `syncFilesOnConnection`: Auto-sync all files from a user’s connected account. - `showFilesTab`: Shows the synced files tab in Carbon Connect for this specific integration. Defaults to `true`. 
(Carbon Connect 2.0 only) - `syncSourceItems`: Controls whether items from the file directory are synced by default. It defaults to `true`. - `useCarbonFilePicker`: Controls whether Carbon Connect defaults to Carbon’s file picker instead of the source’s file picker. Defaults to `false`. (Carbon Connect 3.0 only) - `incrementalSync`: By setting `incremental_sync` to `true`, only new or updated files since the last sync will be re-synced. Defaults to `false`. - `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`. `ZENDESK`: This integration lets you upload files from your Zendesk. You can pass the following configuration for this integration: - `chunkSize`: Number of tokens per chunk. Defaults to 1500. - `overlapSize`: Size of the overlap in tokens. Defaults to 20. - `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`. - `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models). - `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`. - `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`. - `syncFilesOnConnection`: Auto-sync all files from a user’s connected account. - `showFilesTab`: Shows the synced files tab in Carbon Connect for this specific integration. Defaults to `true`. (Carbon Connect 2.0 only) - `syncSourceItems`: Controls whether items from the file directory are synced by default. It defaults to `true`. - `useCarbonFilePicker`: Controls whether Carbon Connect defaults to Carbon’s file picker instead of the source’s file picker. Defaults to `false`. (Carbon Connect 3.0 only) - `incrementalSync`: By setting `incremental_sync` to `true`, only new or updated files since the last sync will be re-synced. Defaults to `false`. 
- `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`.

`FRESHDESK`: This integration lets you sync pages from your Freshdesk. You can pass the following configuration for this integration:

- `chunkSize`: Number of tokens per chunk. Defaults to 1500.
- `overlapSize`: Size of the overlap in tokens. Defaults to 20.
- `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`.
- `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models).
- `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`.
- `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`.
- `syncFilesOnConnection`: Auto-sync all files from a user’s connected account.
- `showFilesTab`: Shows the synced files tab in Carbon Connect for this specific integration. Defaults to `true`. (Carbon Connect 2.0 only)
- `syncSourceItems`: Controls whether items from the file directory are synced by default. It defaults to `true`.
- `useCarbonFilePicker`: Controls whether Carbon Connect defaults to Carbon’s file picker instead of the source’s file picker. Defaults to `false`. (Carbon Connect 3.0 only)
- `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`.

`GITBOOK`: This integration lets you sync pages from your Gitbook. You can pass the following configuration for this integration:

- `chunkSize`: Number of tokens per chunk. Defaults to 1500.
- `overlapSize`: Size of the overlap in tokens. Defaults to 20.
- `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`.
- `embeddingModel`: Specifies the embedding model used.
You can find the model options [here](learn/models/models).
- `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`.
- `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`.
- `syncFilesOnConnection`: Auto-sync all files from a user’s connected account.
- `showFilesTab`: Shows the synced files tab in Carbon Connect for this specific integration. Defaults to `true`. (Carbon Connect 2.0 only)
- `syncSourceItems`: Controls whether items from the file directory are synced by default. It defaults to `true`.
- `useCarbonFilePicker`: Controls whether Carbon Connect defaults to Carbon’s file picker instead of the source’s file picker. Defaults to `false`. (Carbon Connect 3.0 only)
- `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`.

`GITBOOK`: This integration lets you sync pages from your Gitbook. You can pass the following configuration for this integration:

- `chunkSize`: Number of tokens per chunk. Defaults to 1500.
- `overlapSize`: Size of the overlap in tokens. Defaults to 20.
- `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`.
- `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models).
- `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`.
- `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`.
- `syncFilesOnConnection`: Auto-sync all files from a user’s connected account.
- `showFilesTab`: Shows the synced files tab in Carbon Connect for this specific integration. Defaults to `true`. (Carbon Connect 2.0 only)
- `syncSourceItems`: Controls whether items from the file directory are synced by default. It defaults to `false`.
- `useCarbonFilePicker`: Controls whether Carbon Connect defaults to Carbon’s file picker instead of the source’s file picker. Defaults to `false`. (Carbon Connect 3.0 only)
- `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`.

`SALESFORCE`: This integration lets you sync pages from your Salesforce Knowledge. You can pass the following configuration for this integration:

- `chunkSize`: Number of tokens per chunk. Defaults to 1500.
- `overlapSize`: Size of the overlap in tokens. Defaults to 20.
- `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`.
- `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models).
- `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`.
- `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`.
- `syncFilesOnConnection`: Auto-sync all files from a user’s connected account.
- `showFilesTab`: Shows the synced files tab in Carbon Connect for this specific integration. Defaults to `true`. (Carbon Connect 2.0 only)
- `syncSourceItems`: Controls whether items from the file directory are synced by default. It defaults to `true`.
- `useCarbonFilePicker`: Controls whether Carbon Connect defaults to Carbon’s file picker instead of the source’s file picker. Defaults to `false`. (Carbon Connect 3.0 only)
- `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`.

`GURU`: This integration lets you sync content from your Guru workspace. You can pass the following configuration for this integration:

- `chunkSize`: Number of tokens per chunk. Defaults to 1500.
- `overlapSize`: Size of the overlap in tokens.
Defaults to 20.
- `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`.
- `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models).
- `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`.
- `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`.
- `syncFilesOnConnection`: Auto-sync all files from a user’s connected account.
- `showFilesTab`: Shows the synced files tab in Carbon Connect for this specific integration. Defaults to `true`. (Carbon Connect 2.0 only)
- `syncSourceItems`: Controls whether items from the file directory are synced by default. It defaults to `true`.
- `useCarbonFilePicker`: Controls whether Carbon Connect defaults to Carbon’s file picker instead of the source’s file picker. Defaults to `false`. (Carbon Connect 3.0 only)
- `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`.

`GMAIL`: This integration enables you to import emails from Gmail, including file attachments. You can reference the supported file formats [here](learn/files/text). Once a user has connected their Gmail account, you can select which emails to upload via the `/integrations/gmail/sync` endpoint.

You can pass the following configuration for this integration:

- `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`.
- `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`.

`OUTLOOK`: This integration enables you to import emails from Outlook, including file attachments. You can reference the supported file formats [here](learn/files/text). Once a user has connected their Outlook account, you can select which emails to upload via the `/integrations/outlook/sync` endpoint.

You can pass the following configuration for this integration:

- `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`.
- `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`.

`Slack`: This integration enables you to import conversations from Slack. Once a user has connected their Slack account, you can select which conversations to upload via the `/integrations/slack/conversations` and `/integrations/slack/sync` endpoints.

- `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`.

`RSS_FEED`: This integration lets you upload content from an RSS or Atom feed. You can pass the following configuration for this integration:

- `chunkSize`: Number of tokens per chunk. Defaults to 1500.
- `overlapSize`: Size of the overlap in tokens. Defaults to 20.
- `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`.
- `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models).
- `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`.
- `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`.

`S3`: This integration lets you upload files from your S3 buckets. You can pass the following configuration for this integration:

- `chunkSize`: Number of tokens per chunk. Defaults to 1500.
- `overlapSize`: Size of the overlap in tokens. Defaults to 20.
- `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`.
- `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models).
- `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`.
- `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`.
- `syncFilesOnConnection`: Auto-sync all files from a user’s connected account.
- `showFilesTab`: Shows the synced files tab in Carbon Connect for this specific integration. Defaults to `true`.
(Carbon Connect 2.0 only)
- `syncSourceItems`: Controls whether items from the file directory are synced by default. It defaults to `true`.
- `useCarbonFilePicker`: Controls whether Carbon Connect defaults to Carbon’s file picker instead of the source’s file picker. Defaults to `false`. (Carbon Connect 3.0 only)
- `incrementalSync`: When `incrementalSync` is set to `true`, only new or updated files since the last sync are re-synced. Defaults to `false`.
- `enableDigitalOcean`: Specifies whether files from Digital Ocean Spaces can be synced. The default value is `false`.
- `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`.

`AZURE_BLOB_STORAGE`: This integration lets you upload files from your Azure Blob Storage. You can pass the following configuration for this integration:

- `chunkSize`: Number of tokens per chunk. Defaults to 1500.
- `overlapSize`: Size of the overlap in tokens. Defaults to 20.
- `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`.
- `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models).
- `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`.
- `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`.
- `syncFilesOnConnection`: Auto-sync all files from a user’s connected account.
- `showFilesTab`: Shows the synced files tab in Carbon Connect for this specific integration. Defaults to `true`. (Carbon Connect 2.0 only)
- `syncSourceItems`: Controls whether items from the file directory are synced by default. It defaults to `true`.
- `useCarbonFilePicker`: Controls whether Carbon Connect defaults to Carbon’s file picker instead of the source’s file picker. Defaults to `false`.
(Carbon Connect 3.0 only)
- `incrementalSync`: When `incrementalSync` is set to `true`, only new or updated files since the last sync are re-synced. Defaults to `false`.
- `enableDigitalOcean`: Specifies whether files from Digital Ocean Spaces can be synced. The default value is `false`.
- `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`.

`GCS`: This integration lets you upload files from Google Cloud Storage. You can pass the following configuration for this integration:

- `chunkSize`: Number of tokens per chunk. Defaults to 1500.
- `overlapSize`: Size of the overlap in tokens. Defaults to 20.
- `skipEmbeddingGeneration`: Toggle to skip embedding generation. Defaults to `false`.
- `embeddingModel`: Specifies the embedding model used. You can find the model options [here](learn/models/models).
- `generateSparseVectors`: Toggle to `true` to generate sparse vectors for hybrid search. Default is `false`.
- `prependFilenameToChunks`: Adds the file title to each chunk for the integration. Defaults to `false`.
- `syncFilesOnConnection`: Auto-sync all files from a user’s connected account.
- `showFilesTab`: Shows the synced files tab in Carbon Connect for this specific integration. Defaults to `true`. (Carbon Connect 2.0 only)
- `syncSourceItems`: Controls whether items from the file directory are synced by default. It defaults to `true`.
- `useCarbonFilePicker`: Controls whether Carbon Connect defaults to Carbon’s file picker instead of the source’s file picker. Defaults to `false`. (Carbon Connect 3.0 only)
- `incrementalSync`: When `incrementalSync` is set to `true`, only new or updated files since the last sync are re-synced. Defaults to `false`.
- `filesTabColumns`: Specifies which columns are displayed in the file list view and accepts an array of strings which can have values `"name"`, `"status"`, `"created_at"`, `"external_url"`.
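
As a rough illustration of how the per-integration options above combine, the sketch below builds two configuration entries and fills in the defaults documented in this section. The `IntegrationConfig` type, the `id` field, and the `withDefaults` helper are hypothetical names introduced for this example; the exact shape expected by `enabledIntegrations` should be taken from the description of that prop, not from this sketch.

```typescript
// Hypothetical shape of one integration entry. Only the option names and
// default values come from the lists above; the `id` field is an assumption.
interface IntegrationConfig {
  id: string;
  chunkSize?: number;
  overlapSize?: number;
  skipEmbeddingGeneration?: boolean;
  generateSparseVectors?: boolean;
  prependFilenameToChunks?: boolean;
  incrementalSync?: boolean;
  filesTabColumns?: string[];
}

// Defaults shared by the integrations that accept chunking options.
const documentedDefaults = {
  chunkSize: 1500,
  overlapSize: 20,
  skipEmbeddingGeneration: false,
  generateSparseVectors: false,
  prependFilenameToChunks: false,
};

// Explicit values in `config` override the documented defaults.
function withDefaults(config: IntegrationConfig): IntegrationConfig {
  return { ...documentedDefaults, ...config };
}

const integrations: IntegrationConfig[] = [
  withDefaults({
    id: "CONFLUENCE",
    chunkSize: 1000,
    incrementalSync: true,
    filesTabColumns: ["name", "status"],
  }),
  withDefaults({ id: "S3", prependFilenameToChunks: true }),
];
```

Spelling out the defaults is optional, since the component applies them when an option is omitted; doing so mainly makes the effective settings visible in one place.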

Callback Function Props


onSuccess

Responds to successful events: file upload, 3rd party account connection, file selection, and web scraping initiation.

Event Types

  1. INITIATE: This event type is triggered when a user enters the integration flow (either for auth or file selection).
  2. ADD: This event type is triggered when a user authenticates an account under an integration.
  3. UPDATE: This event type is triggered when a user adds or removes files for an integration.
  4. CANCEL: This event type is triggered when a user exits the integration flow without taking any action.

Callback Response

The data passed to the onSuccess callback prop will be:

LOCAL_FILES:

{
  status: 200,
  data: {
    "data_source_external_id": null, // This field is not applicable for local files
    "sync_status": null, // This is not applicable for local files
    "files": <Array of objects corresponding to the files uploaded>, // Refer to the file object format below
  },
  action: 'UPDATE',
  event: 'UPDATE',
  integration: 'LOCAL_FILES',
}

WEB_SCRAPER:

{
  status: 200,
  data: {
    "data_source_external_id": null, // This field is not applicable for webscrapers
    "sync_status": null, // This is not applicable for webscrapers
    "files": <Array of objects corresponding to the parent URLs submitted>, // Refer to the file object format below
  },
  action: 'UPDATE',
  event: 'UPDATE',
  integration: 'WEB_SCRAPER',
}

3rd Party Connectors

{
  status: 200,
  data: {
    "data_source_external_id": <Unique ID for the data source>,
    "sync_status": <SYNC_STATUS>,
    "files_synced": <true or false>,
    "request_id": <Unique ID generated for the upload. Can be auto-generated if `useRequestIds` prop is set to `true`.>
  } or null,
  action: <ACTION_TYPE>, // `ACTION_TYPE` can be one of the following: `INITIATE`, `ADD`, `UPDATE`, `CANCEL`
  event: <EVENT_TYPE>, // `EVENT_TYPE` can be one of the following: `INITIATE`, `ADD`, `UPDATE`, `CANCEL`
  integration: <INTEGRATION_NAME>, // `INTEGRATION_NAME` is the name of the integration, for example `LOCAL_FILES`, `NOTION`, `WEB_SCRAPER`, `GOOGLE_DRIVE`, `INTERCOM`, `DROPBOX`, `ONEDRIVE`, `BOX`
}
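
The event types and payload shapes above could be handled along the following lines. This is a minimal sketch, not the library's own typings: the `SuccessPayload` interface and the `describeSuccess` function are hypothetical names that mirror the fields documented above, and a real handler would update application state instead of returning a string.

```typescript
type CarbonEvent = "INITIATE" | "ADD" | "UPDATE" | "CANCEL";

// Hypothetical typing of the onSuccess payload, based on the shapes above.
interface SuccessPayload {
  status: number;
  data: {
    data_source_external_id?: string | null;
    sync_status?: string | null;
    files?: unknown[];
    files_synced?: boolean;
    request_id?: string;
  } | null;
  action: CarbonEvent;
  event: CarbonEvent;
  integration: string;
}

// Summarises a payload by branching on the documented event types.
function describeSuccess(payload: SuccessPayload): string {
  switch (payload.event) {
    case "INITIATE":
      return `${payload.integration}: flow started`;
    case "ADD": {
      const sourceId = payload.data?.data_source_external_id ?? "unknown";
      return `${payload.integration}: account connected (${sourceId})`;
    }
    case "UPDATE": {
      // `files` is present for LOCAL_FILES and WEB_SCRAPER payloads.
      const count = payload.data?.files?.length ?? 0;
      return `${payload.integration}: ${count} file(s) updated`;
    }
    case "CANCEL":
      return `${payload.integration}: flow cancelled`;
    default:
      return `${payload.integration}: unrecognised event`;
  }
}
```

A function like this can be passed through the `onSuccess` prop, for example by wrapping it in a callback that logs or stores its return value. Because `data` may be `null`, the sketch reads it through optional chaining.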

Each object in the `files` array follows this format:

{
    "id": `Unique ID for the file, can be used for resyncing, deleting, updating tags etc.`,
    "source": `<integration_name>`, // One among `LOCAL_FILES`, `NOTION`, `WEB_SCRAPER`, `GOOGLE_DRIVE`, `INTERCOM`, `DROPBOX`, `ONEDRIVE`
    "organization_id": `<organization_id>`, // This is your unique organization id in carbon
    "organization_supplied_user_id": `<organization_supplied_user_id>`, // This is the unique user id that you pass to CC
    "organization_user_data_source_id": `<organization_user_data_source_id>`, // This is the unique user data source id that Carbon Connect creates for each user for each integration
    "external_file_id": `<external_file_id>`, // This is the unique file id in the 3rd party integration
    "external_url": `<external_url>`, // This is the unique url of the file in the 3rd party integration
    "sync_status": `<sync_status>`, // This is the sync status of the file. It can be one of the following: `READY`, `QUEUED_FOR_SYNCING`, `SYNCING`, `SYNC_ERROR`
    "last_sync": `<last_sync>`, // This is the timestamp of the last sync
    "tags": `<tags>`, // These are the tags passed in to CC
    "file_statistics": `<file_statistics>`, // This is the file statistics object
    "file_metadata": `<file_metadata>`, // This is the file metadata object
    "chunk_size":   `<chunk_size>`, // This is the chunk size used for the file
    "chunk_overlap": `<chunk_overlap>`, // This is the chunk overlap used for the file
    "name": `<name>`, // This is the name of the file
    "enable_auto_sync": `<enable_auto_sync>`, // This is the auto sync status of the file. This is a boolean flag
    "presigned_url": `<presigned_url>`, // This is the presigned url of the file
    "parsed_text_url": `<parsed_text_url>`, // This is the parsed text url of the file
    "skip_embedding_generation": `<skip_embedding_generation>`, // This is the skip embedding generation status of the file. This is a boolean flag
    "created_at": `<created_at>`, // This is the timestamp of the file creation
    "updated_at": `<updated_at>`, // This is the timestamp of the last update to the file
    "action": `<action>`, // This is the action type. It can be one of the following: `ADD`, `UPDATE`, `REMOVE`
}
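
One common use of the file objects is to find files that need attention after an upload. The sketch below groups file IDs by `sync_status`. The `CarbonFile` interface covers only a subset of the fields listed above, the `number | string` type for `id` is an assumption, and `groupIdsByStatus` is a hypothetical helper rather than part of the library.

```typescript
// Subset of the documented file object; the type of `id` is an assumption.
interface CarbonFile {
  id: number | string;
  name: string;
  sync_status: "READY" | "QUEUED_FOR_SYNCING" | "SYNCING" | "SYNC_ERROR";
  action: "ADD" | "UPDATE" | "REMOVE";
}

// Groups file IDs by their sync status, preserving input order per group.
function groupIdsByStatus(
  files: CarbonFile[]
): Record<string, Array<number | string>> {
  const groups: Record<string, Array<number | string>> = {};
  for (const file of files) {
    const ids = groups[file.sync_status] ?? [];
    ids.push(file.id);
    groups[file.sync_status] = ids;
  }
  return groups;
}
```

The IDs collected under `SYNC_ERROR` could then be used for resyncing, which the `id` field is documented to support.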

onError

Triggered during file upload errors.

Structure:

{
  status: 400,
  action: 'UPDATE',
  event: 'UPDATE',
  integration: `<INTEGRATION_NAME>`, // 'LOCAL_FILES' or 'WEB_SCRAPER',
  data: `<data_object>`, // This field will be present only if the error is related to a file or web scraper
}
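
An `onError` handler might turn this structure into a readable message. The following is a sketch under the assumption that the payload has exactly the fields shown above; `ErrorPayload` and `formatError` are hypothetical names used only for illustration.

```typescript
// Hypothetical typing of the onError payload, based on the structure above.
interface ErrorPayload {
  status: number;
  action: string;
  event: string;
  integration: string;
  data?: unknown; // Present only for file or web scraper errors.
}

// Builds a message for display or logging; includes `data` when it is present.
function formatError(payload: ErrorPayload): string {
  const base = `Upload failed for ${payload.integration} (status ${payload.status})`;
  return payload.data === undefined
    ? base
    : `${base}: ${JSON.stringify(payload.data)}`;
}
```

Checking `data` before using it matters here because the field is documented as optional.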