Order of paragraphs and tables #118

Open
Prigin opened this issue Nov 17, 2021 · 4 comments


Prigin commented Nov 17, 2021

Problem

I need to get all paragraphs and tables in the order in which they appear in the docx file. Is there any way to do this?

Solution

Maybe a single index shared by paragraph objects and table objects would be enough.

Member

satoryu commented Nov 18, 2021

You mean that in the latest version of this gem, Document#paragraphs returns paragraphs in the wrong order, right?
Could you share a docx file that reproduces this behavior, if you have one?
The file would help us investigate what is happening.

Thanks

Author

Prigin commented Nov 18, 2021

Not exactly. :) Sorry for not being clear. Let's say I have a docx that I want to convert to txt:

image

I need to know the position of each element (paragraphs and tables). How can I get the elements in the same order they have in the DOCX? Or maybe there is already a method that returns each element's order number in the document. I can't find one. :(


aunghtain commented Mar 10, 2022

I was able to do this as follows. It relies on private instance variables and methods, but if the gem exposes more of its API in the future, that won't be necessary.

    require 'docx'

    doc = Docx::Document.open(file)
    # Walk the direct children of <w:body>; they come back in document order.
    doc.instance_variable_get("@doc").xpath('//w:document//w:body').children.each do |c|
      if c.name == 'p' # paragraph
        p = doc.send(:parse_paragraph_from, c)
      elsif c.name == 'tbl' # table
        t = doc.send(:parse_table_from, c)
      else # other node types (e.g. sectPr) are ignored
      end
    end

@aunghtain

If you just want the text, you don't need to parse them as paragraphs/tables. You can just read "c.content".
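
For the text-only case, the idea can be sketched as a small helper that numbers paragraphs and tables with one shared index, which is what the original request asks for. This is only a sketch: `ordered_body_text` is a hypothetical name, not part of the docx gem, and the stand-in nodes below exist only so the example runs without a docx file. It assumes each node responds to `name` and `content`, as the nodes in the loop above do.

```ruby
# Hypothetical helper (not part of the docx gem): given the child nodes of
# <w:body>, return [index, kind, text] triples in document order.
# Each node only needs to respond to #name and #content.
def ordered_body_text(body_children)
  kinds = { 'p' => :paragraph, 'tbl' => :table }
  body_children
    .select { |node| kinds.key?(node.name) } # skip sectPr and other node types
    .each_with_index
    .map { |node, i| [i, kinds[node.name], node.content] }
end

# Stand-in nodes (an assumption for illustration) in place of real XML nodes.
fake_node = Struct.new(:name, :content)
nodes = [
  fake_node.new('p', 'Intro'),
  fake_node.new('tbl', 'A1B1'),
  fake_node.new('sectPr', ''),
  fake_node.new('p', 'Outro')
]

ordered_body_text(nodes).each { |i, kind, text| puts "#{i} #{kind}: #{text}" }
```

With a real document, the argument would be the same node list used in the snippet above, i.e. `doc.instance_variable_get("@doc").xpath('//w:document//w:body').children`. Note that `content` on a table node most likely returns the text of all cells run together, so cell boundaries would still need separate handling.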
