-
To be honest I have never dealt with a file that big, so I am just guessing: you could either do it in SAS itself (I would expect SAS to be more efficient at handling its own files), or read it once with pyreadstat and convert it to a format that is faster to work with, probably some kind of relational database since you need to join two tables (SQLite would be a good starting point, but you may need something more powerful). You could research what people use for datasets of that size and convert your data into that format. Maybe others have other suggestions.
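A minimal sketch of that SQLite route, in case it helps. This is not tested on files of that size; the file paths, table names, and the `account_id` column are placeholders you would have to adapt. It streams the big file into SQLite with `pyreadstat.read_file_in_chunks`, loads the small table in one go, and lets SQLite do the join.

```python
import sqlite3
import pyreadstat

con = sqlite3.connect("accounts.db")

# stream the big file into SQLite in chunks; each iteration yields (DataFrame, metadata)
reader = pyreadstat.read_file_in_chunks(
    pyreadstat.read_sas7bdat, "big_table.sas7bdat", chunksize=100_000
)
for df, meta in reader:
    df.to_sql("big_table", con, if_exists="append", index=False)

# the small table fits in memory, so read it in one go
small_df, _ = pyreadstat.read_sas7bdat("accounts_small.sas7bdat")
small_df.to_sql("accounts_small", con, if_exists="replace", index=False)

# an index on the join key makes the join much faster
con.execute("CREATE INDEX IF NOT EXISTS idx_big_acct ON big_table(account_id)")
result = con.execute(
    """
    SELECT b.*
    FROM big_table b
    JOIN accounts_small s ON b.account_id = s.account_id
    """
).fetchall()
con.close()
```

The conversion still has to read the whole 150 GB once, but after that the join and any further queries run against the database instead of re-reading the sas7bdat file.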
-
I had to read >1.5 TB from network storage and never managed to read anything, even with the row limit and offset set to around 100 or 1,000 records, no more than 30 columns, and no particularly long text values. So the one purpose I needed this library for was not met.
-
Hi,
I need to read a large sas7bdat file (about 150 GB) and join it with another sas7bdat file (a very small table with just a few account IDs) on the account ID. I have tried reading in chunks and enabling multiprocessing, but it is still taking far too long to read and load the large file. Can pyreadstat handle a 150 GB sas7bdat file? If yes, what is the most efficient way to do that? A little code snippet would be helpful and appreciated! Thanks in advance.
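For reference, a rough sketch of the chunked approach described in the question: read the big file in chunks (optionally letting pyreadstat split the work across processes) and keep only the rows whose account ID appears in the small table, so only the matches are ever held in memory. The file paths and the `account_id` column name are assumptions, not anything from this thread.

```python
import pandas as pd
import pyreadstat

# the small table fits in memory; build a lookup set of the IDs we care about
small_df, _ = pyreadstat.read_sas7bdat("small_accounts.sas7bdat")
wanted_ids = set(small_df["account_id"])

matches = []
reader = pyreadstat.read_file_in_chunks(
    pyreadstat.read_sas7bdat,
    "big_table.sas7bdat",
    chunksize=200_000,
    multiprocess=True,   # available in recent pyreadstat versions
    num_processes=4,
)
for chunk, meta in reader:
    # discard everything that does not match before accumulating
    matches.append(chunk[chunk["account_id"].isin(wanted_ids)])

result = pd.concat(matches, ignore_index=True)
joined = result.merge(small_df, on="account_id", how="inner")
```

Even so, every chunk still has to be decompressed and parsed, so a single pass over 150 GB will take a while; if the file is on network storage, copying it to a local disk first usually helps more than tuning the chunk size.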