DocumentAnalysisClient and ClassifyDocumentOptions are inconsistent over different client SDKs #30040

Open
3 of 6 tasks
pnoyens opened this issue Jun 13, 2024 · 1 comment
Labels
Client, Cognitive - Form Recognizer, customer-reported, needs-team-attention, question, Service Attention

Comments


pnoyens commented Jun 13, 2024

  • Package Name: @azure/ai-form-recognizer
  • Package Version: 5.0.0
  • Operating system: macOS Sonoma 14.5
  • nodejs
    • version: v20.13.1
  • browser
    • name/version:
  • typescript
    • version: v4.2.3
  • Is the bug related to documentation in

Describe the bug
The documentation and implementation are unclear for DocumentAnalysisClient and ClassifyDocumentOptions.

When working in TypeScript on Node.js, it seems impossible to pass the pages query parameter to beginClassifyDocument or beginClassifyDocumentFromUrl. However, the latest docs for the Python SDK state that this is possible (https://learn.microsoft.com/en-us/python/api/azure-ai-formrecognizer/azure.ai.formrecognizer.documentanalysisclient?view=azure-python#azure-ai-formrecognizer-documentanalysisclient-begin-classify-document-from-url). At the same time, the option is not mentioned in the JS docs, or even in the REST specification (https://learn.microsoft.com/en-us/rest/api/aiservices/document-classifiers/classify-document?view=rest-aiservices-v4.0%20(2024-02-29-preview)&tabs=HTTP).

This seems odd, because Azure Document Intelligence Studio lets you define a range of pages to analyze/classify when testing models. Inspecting the browser's network tab confirmed that a pages query parameter is passed to the classify endpoint (POST /documentintelligence/documentClassifiers/{modelid}:analyze). I'm currently using this parameter in my code successfully, but it would be good to know whether it is officially supported and whether it will be deprecated or removed in the future. In our case, we classify documents based on the first page only, which works well and is fast. Our workaround is below and works (for now).
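To make the observed Studio request concrete, here is a minimal sketch of how the classify URL could be assembled with the optional pages parameter. The helper name `buildClassifyUrl`, the placeholder endpoint and classifier ID, and the page formats ("1", "1-3", "1,3") are assumptions for illustration; the parameter itself is undocumented for this endpoint, as described above. The request body would be `{ urlSource: fileUrl }`, as in the workaround.

```typescript
// Hypothetical helper (not part of any Azure SDK): builds the URL of the
// classify call observed in Document Intelligence Studio, optionally adding
// the `pages` query parameter that the JS SDK does not expose.
function buildClassifyUrl(
  endpoint: string, // resource endpoint, e.g. "https://<resource>.cognitiveservices.azure.com"
  classifierId: string,
  apiVersion: string, // e.g. "2024-02-29-preview"
  pages?: string, // assumed format: "1", "1-3", "1,3"; omitted when undefined
): string {
  const base = endpoint.replace(/\/+$/, ""); // drop trailing slashes from the endpoint
  const query = [`api-version=${encodeURIComponent(apiVersion)}`];
  if (pages !== undefined) {
    query.push(`pages=${encodeURIComponent(pages)}`);
  }
  return (
    `${base}/documentintelligence/documentClassifiers/` +
    `${encodeURIComponent(classifierId)}:analyze?${query.join("&")}`
  );
}

// Classify only the first page, as in the workaround below.
const url = buildClassifyUrl(
  "https://example.cognitiveservices.azure.com/",
  "my-classifier",
  "2024-02-29-preview",
  "1",
);
console.log(url);
// https://example.cognitiveservices.azure.com/documentintelligence/documentClassifiers/my-classifier:analyze?api-version=2024-02-29-preview&pages=1
```

Building the query string in one place keeps the undocumented parameter easy to drop if the service stops accepting it.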

To Reproduce
Steps to reproduce the behavior:

  1. Try passing a pages parameter via the JS SDK: it is not recognized by the TypeScript type ClassifyDocumentOptions and is not forwarded to the underlying call to the analyze endpoint.

Expected behavior
Consistent behaviour and implementation across SDKs and REST endpoints

Additional context
The solution we worked out:

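// Class member. Assumes this.provider.restClient is an axios-style HTTP client already
// configured with the resource endpoint and key, and this.configService returns the
// REST API version to use.
analyze = async (fileUrl: string, modelId: string): Promise<string> => {
  const apiVersion = this.configService.get<string>('DOCUMENT_CLASSIFIER_API_VERSION');
  // pages=1 restricts classification to the first page; the JS SDK does not expose this option.
  const response = await this.provider.restClient.post(
    `/documentintelligence/documentClassifiers/${modelId}:analyze?pages=1&api-version=${apiVersion}`,
    { urlSource: fileUrl },
  );
  // The request is processed asynchronously; the result is polled from this URL.
  return response.headers['operation-location'];
};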
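The workaround only returns the operation-location URL; the classification result still has to be fetched from it. Below is a minimal polling sketch, assuming the service reports "notStarted" or "running" while pending and "succeeded" when done, and that the key is sent in the Ocp-Apim-Subscription-Key header. `pollOperation`, `GetJson`, and the other type names are hypothetical; the HTTP call and the sleep function are injected so any client can be plugged in and the loop can be tested without network access.

```typescript
// Hypothetical types: the minimal response shape this sketch relies on.
// A standard fetch() Response satisfies it.
interface JsonResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type GetJson = (url: string, headers: Record<string, string>) => Promise<JsonResponse>;

interface OperationResult {
  status: string; // assumed: "notStarted" | "running" while pending, "succeeded" when done
  [key: string]: unknown;
}

// Polls the operation-location URL until the operation succeeds, reaches
// another terminal status, or maxAttempts is exhausted.
async function pollOperation(
  operationLocation: string,
  apiKey: string,
  get: GetJson,
  intervalMs = 1000,
  maxAttempts = 60,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<OperationResult> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await get(operationLocation, { "Ocp-Apim-Subscription-Key": apiKey });
    if (!res.ok) {
      throw new Error(`Polling failed with HTTP ${res.status}`);
    }
    const body = (await res.json()) as OperationResult;
    if (body.status === "succeeded") {
      return body;
    }
    if (body.status !== "notStarted" && body.status !== "running") {
      // Any other status (e.g. "failed") is treated as terminal.
      throw new Error(`Operation ended with status "${body.status}"`);
    }
    await sleep(intervalMs); // still pending: wait before the next poll
  }
  throw new Error(`Operation still pending after ${maxAttempts} attempts`);
}
```

On Node.js 18+ (the report uses v20.13.1), the built-in fetch fits the injected signature, e.g. `pollOperation(operationLocation, apiKey, (url, headers) => fetch(url, { headers }))`.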
github-actions bot added the Client, Cognitive - Form Recognizer, customer-reported, needs-team-attention, question, and Service Attention labels Jun 13, 2024

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @ctstone @vkurpad.
