!!! abstract ""

    How to use `open_()` for reading and writing files with automatic compression detection (gzip, bzip2, lzma, zip), `atomic_write` for safe file writes that clean up on failure, `iter_splitlines` and `iter_line_blocks` for streaming large files, and `is_url`/`open_url` for working with URLs.

`open_()` detects the compression format from the file suffix and handles it automatically. Writing a gzip-compressed text file is identical to writing a plain text file: just use a `.gz` suffix.

```python { linenums="1" notest }
from scinexus import open_

with open_("data/sample.txt.gz", "wt") as f:
    f.write("Hello, compressed world!\n")
    f.write("Line two of the file.\n")
```

<!-- [[[end]]] -->
## Reading a compressed file
Reading works the same way — `open_()` detects the `.gz` suffix and decompresses transparently.
<!-- [[[cog
from cog_utils import exec_codeblock, DOCS_DIR
src = \
"""
from scinexus import open_
with open_("data/sample.txt.gz") as f:
    print(f.read())
"""
exec_codeblock(src=src)
import pathlib
pathlib.Path(DOCS_DIR / "data/sample.txt.gz").unlink(missing_ok=True)
]]] -->
```python { linenums="1" notest }
from scinexus import open_

with open_("data/sample.txt.gz") as f:
    print(f.read())
# Hello, compressed world!
# Line two of the file.
```
<!-- [[[end]]] -->

Supported compression formats are gzip (`.gz`), bzip2 (`.bz2`), lzma (`.xz`, `.lzma`), and zip (`.zip`).
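
To make suffix-based detection concrete, here is a minimal sketch that uses only the Python standard library. It is not how `scinexus` implements `open_()`; the helper name `open_by_suffix` and the `_OPENERS` table are hypothetical, and `.lzma` and `.zip` are left out for brevity.

```python
import bz2
import gzip
import lzma
import tempfile
from pathlib import Path

# Hypothetical suffix table for illustration only (not the scinexus internals).
_OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".xz": lzma.open,
}


def open_by_suffix(path, mode="rt"):
    """Open path with the stdlib opener that matches its final suffix."""
    opener = _OPENERS.get(Path(path).suffix.lower(), open)
    return opener(path, mode)


# Round-trip the same text through each supported suffix.
with tempfile.TemporaryDirectory() as tmp:
    for suffix in ("", ".gz", ".bz2", ".xz"):
        target = Path(tmp) / f"sample.txt{suffix}"
        with open_by_suffix(target, "wt") as f:
            f.write("Hello, compressed world!\n")
        with open_by_suffix(target) as f:
            assert f.read() == "Hello, compressed world!\n"
```

The calling code is the same for every format; only the file name changes, which is the behaviour described above.
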

`open_()` also handles URLs. Use `is_url()` to check whether a path is a URL before opening it.

???- example "Checking and reading a URL"

    ```python linenums="1"
    from scinexus.io_util import is_url
    from scinexus import open_

    url = "https://github.com/user-attachments/files/26728407/raw.zip"
    print(is_url(url))  # (1)!

    with open_(url, "rb") as f:  # (2)!
        header = f.read(20)
        print(header)
    ```

    1. `is_url()` returns `True` for `http`, `https`, and `file` scheme URLs.
    2. `open_()` detects the URL and delegates to `open_url()`. Only read mode is supported for URLs.
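
A scheme check of this kind can be sketched with `urllib.parse` from the standard library. The helper `looks_like_url` below is hypothetical and only mirrors the three schemes listed above; the real `is_url()` may apply different or additional rules.

```python
from urllib.parse import urlparse

# Schemes taken from the description above; the helper itself is an assumption.
URL_SCHEMES = {"http", "https", "file"}


def looks_like_url(path):
    """Return True if path parses with one of the accepted URL schemes."""
    return urlparse(str(path)).scheme.lower() in URL_SCHEMES


print(looks_like_url("https://example.org/data.txt.gz"))  # True
print(looks_like_url("file:///tmp/data.txt"))             # True
print(looks_like_url("data/sample.txt.gz"))               # False
```

Checking the parsed scheme rather than a string prefix keeps local paths, including Windows drive letters such as `C:\data`, from being mistaken for URLs.
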

Reading an entire large file into memory, or iterating line by line with Python's built-in `readline()`, can be inefficient. The built-in approach pays call overhead on every line, which becomes a bottleneck for files with millions of lines. `scinexus` provides two functions that read data in large chunks and then split them into lines, greatly reducing I/O overhead.

### `iter_splitlines`

`iter_splitlines(path, chunk_size=1_000_000)` reads a file in chunks (default 1 MB) and yields individual lines. It correctly handles lines that span chunk boundaries.

```python { notest }
from scinexus.io_util import iter_splitlines

for line in iter_splitlines("large_file.txt"):
    process(line)
```

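
The chunk-then-split idea, including a line that straddles two chunks, can be sketched with plain Python. This is an illustration under assumptions, not the `scinexus` implementation: `split_lines_chunked` is a hypothetical name, it handles uncompressed text only, and it measures `chunk_size` in characters.

```python
import tempfile
from pathlib import Path


def split_lines_chunked(path, chunk_size=1_000_000):
    """Yield lines without their line endings, reading chunk_size characters at a time."""
    leftover = ""  # tail of the previous chunk that did not end in a newline
    with open(path, "rt") as f:
        while chunk := f.read(chunk_size):
            pieces = (leftover + chunk).split("\n")
            leftover = pieces.pop()  # last piece may be an incomplete line
            yield from pieces
    if leftover:  # file did not end with a newline
        yield leftover


# A tiny chunk size forces every line to span a chunk boundary.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("alpha\nbeta\ngamma\n")
    lines = list(split_lines_chunked(sample, chunk_size=4))

print(lines)  # ['alpha', 'beta', 'gamma']
```

Carrying the unfinished tail into the next chunk is what keeps boundary-spanning lines intact.
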
### `iter_line_blocks`
`iter_line_blocks(path, num_lines=1000, chunk_size=5_000_000)` builds on `iter_splitlines` — it accumulates lines into lists of `num_lines` and yields each list. This is useful when downstream processing works on batches (e.g. FASTA records where each record spans a fixed number of lines).

```python { notest }
from scinexus.io_util import iter_line_blocks

for block in iter_line_blocks("large_file.txt", num_lines=1000):
    process_batch(block)  # block is a list of up to 1000 strings
```

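
Grouping a stream of lines into fixed-size lists can be sketched with `itertools.islice`. The helper `batched_lines` is hypothetical and works on any iterable of lines; it shows the batching step only, not how `iter_line_blocks` is written.

```python
from itertools import islice


def batched_lines(lines, num_lines=1000):
    """Yield lists of up to num_lines items taken in order from lines."""
    iterator = iter(lines)
    # islice pulls at most num_lines items; an empty list means the input is exhausted.
    while block := list(islice(iterator, num_lines)):
        yield block


blocks = list(batched_lines((f"line {i}" for i in range(5)), num_lines=2))
print(blocks)  # [['line 0', 'line 1'], ['line 2', 'line 3'], ['line 4']]
```

The final block may be shorter than `num_lines`, which matches the "up to 1000 strings" note in the example above.
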

Use `iter_splitlines` when you need one line at a time. Use `iter_line_blocks` when your processing naturally operates on batches of lines.