datatrove

PyPI

Is datatrove safe to use?

Based on the latest brin safety scan, no vulnerabilities or threats were detected for datatrove v0.8.0. Trust score: 80/100. No known CVE vulnerabilities, no detected threat patterns, and no suspicious capabilities identified. This is an automated, point-in-time assessment.
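Because the result above applies to one specific release, it can help to confirm that the installed copy is the version that was scanned. The following is a minimal sketch using only the Python standard library; the function name `matches_scanned_version` and the constant `SCANNED_VERSION` are illustrative and not part of datatrove or brin.

```python
from importlib.metadata import PackageNotFoundError, version

# Version named in the scan summary; the assessment says nothing about other releases.
SCANNED_VERSION = "0.8.0"


def matches_scanned_version(package: str = "datatrove",
                            expected: str = SCANNED_VERSION) -> bool:
    """Return True only if `package` is installed at exactly `expected`."""
    try:
        return version(package) == expected
    except PackageNotFoundError:
        # Not installed in this environment, so there is nothing to compare.
        return False


if __name__ == "__main__":
    print("installed version matches scan:", matches_scanned_version())
```

A `False` result means only that the scan does not describe the installed copy (or that the package is absent); it is not itself a safety finding.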

datatrove Passed Security Checks

No security concerns detected

Status: clean
CVEs: 0
Threats: 0
Install Scripts: 0

This is an automated, point-in-time assessment and may contain errors. Findings are risk indicators, not confirmed threats. Security posture may change over time. Maintainers can dispute findings via the brin review process.

datatrove Capabilities & Permissions

What datatrove can access when installed. Review these capabilities before using with AI agents like Cursor, Claude Code, or Codex.

Network Access

This package makes network requests.

ai2-s2-research-public.s3-us-west-2.amazonaws.com, arxiv.org, dl.fbaipublicfiles.com, en.wikipedia.org, fasttext.cc, github.com, groups.google.com, hf.co, huggingface.co, jmlr.org, +5 more
Protocols: http, https, tcp

Filesystem Access

Reads and writes to the filesystem.

.env (rw, listed six times), /usr/ (rw), /home/ (rw), +2 more

Process Spawning

This package can spawn child processes.

Environment Variables

Accesses the following environment variables.

DATATROVE_CPUS_PER_TASK, DATATROVE_EXECUTOR, DATATROVE_GPUS_ON_NODE, DATATROVE_MEM_PER_CPU, DATATROVE_NODE_IPS, DATATROVE_NODE_RANK, HF_HOME, HF_HUB_ENABLE_HF_TRANSFER, RAY_NODEID, RAY_NODELIST, +6 more
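To review this capability before running the package, one option is to check which of the listed variables are actually set in the current environment. This is a minimal sketch, assuming only the ten names shown in the report (the remaining six are truncated and deliberately left out); the helper name `set_variables` is hypothetical.

```python
import os

# The ten variable names visible in the report; the "+6 more" are not listed there.
LISTED_VARS = [
    "DATATROVE_CPUS_PER_TASK",
    "DATATROVE_EXECUTOR",
    "DATATROVE_GPUS_ON_NODE",
    "DATATROVE_MEM_PER_CPU",
    "DATATROVE_NODE_IPS",
    "DATATROVE_NODE_RANK",
    "HF_HOME",
    "HF_HUB_ENABLE_HF_TRANSFER",
    "RAY_NODEID",
    "RAY_NODELIST",
]


def set_variables(names=LISTED_VARS, environ=None):
    """Return {name: value} for each listed name present in the environment."""
    env = os.environ if environ is None else environ
    return {name: env[name] for name in names if name in env}


if __name__ == "__main__":
    # Print names only, so values are not echoed into logs.
    for name in set_variables():
        print(name)
```

Passing a plain dictionary as `environ` makes the check easy to try without touching the real environment.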

Native Modules

Contains native code that runs as compiled machine code rather than interpreted Python.

numpy

AGENTS.md for datatrove

Good instructions lead to good results. brin adds datatrove documentation to your AGENTS.md so your agent knows how to use it properly—improving both safety and performance.

brin init

Vercel's research: 100% accuracy with AGENTS.md vs 53% without

datatrove Documentation & Source Code

For the full datatrove README, API documentation, and source code, visit the official package registry.

Weekly Downloads: 17.0K
Version: 0.8.0
Last Scanned: Feb 12, 2026
Trust Score: 80/100 (legitimacy signals, not safety)
