ray.data.Dataset.randomize_block_order

Dataset.randomize_block_order(*, seed: Optional[int] = None) → ray.data.dataset.Dataset

Randomly shuffle the blocks of this Dataset.

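This method is useful if you split() your dataset into shards and want to randomize the data in each shard without performing a full random_shuffle(). Only the order of the blocks changes; rows inside each block keep their original relative order, so the result is cheaper to compute but less thoroughly mixed than a full random_shuffle().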
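The shard scenario can be pictured with a minimal pure-Python model. This is a sketch under stated assumptions, not Ray's implementation: a dataset is modeled as a plain list of blocks (each block a list of row ids), the hypothetical helper `randomize_block_order_model` reorders whole blocks with `random.Random`, and the hypothetical helper `split_into_shards` is a simplified stand-in for split() that hands contiguous runs of blocks to each shard. Because only block order is randomized, each shard ends up with a random selection of blocks while the rows inside every block stay in order.

```python
import random
from typing import List, Optional

# Hypothetical model: a block is just a list of row ids.
Block = List[int]


def randomize_block_order_model(blocks: List[Block], seed: Optional[int] = None) -> List[Block]:
    """Return the blocks in random order; rows inside each block are untouched."""
    rng = random.Random(seed)  # seed=None seeds from system randomness
    shuffled = list(blocks)  # copy so the caller's list is not mutated
    rng.shuffle(shuffled)
    return shuffled


def split_into_shards(blocks: List[Block], n: int) -> List[List[Block]]:
    """Hypothetical, simplified split(): give each of n shards a contiguous run of blocks."""
    size, extra = divmod(len(blocks), n)
    shards, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)  # first `extra` shards get one more block
        shards.append(blocks[start:end])
        start = end
    return shards


if __name__ == "__main__":
    # 100 rows stored in 10 blocks of 10 consecutive ids each.
    blocks = [list(range(i, i + 10)) for i in range(0, 100, 10)]
    shards = split_into_shards(randomize_block_order_model(blocks, seed=0), 4)
    for shard in shards:
        # Print the first id of each block to show which blocks landed in the shard.
        print([block[0] for block in shard])
```

In this model every shard still sees runs of consecutive ids (whole blocks), which is the trade-off against a full row-level shuffle.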

Examples

>>> import ray
>>> ds = ray.data.range(100)
>>> ds.take(5)
[{'id': 0}, {'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}]
>>> ds.randomize_block_order().take(5)  # output is random
[{'id': 15}, {'id': 16}, {'id': 17}, {'id': 18}, {'id': 19}]

Parameters: seed – The random seed to use. If not set, one is chosen based on system randomness.
Returns: The block-shuffled Dataset.
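The effect of `seed` can be sketched with a hypothetical helper, `block_permutation`, that returns the order in which block indices would be read. This assumes the seed behaves like seeding Python's `random.Random`; Ray's internals may differ, so treat it only as an illustration of reproducible versus system-random ordering.

```python
import random
from typing import List, Optional


def block_permutation(num_blocks: int, seed: Optional[int] = None) -> List[int]:
    """Hypothetical helper: the order in which block indices would be read."""
    order = list(range(num_blocks))
    random.Random(seed).shuffle(order)  # seed=None seeds from system randomness
    return order


if __name__ == "__main__":
    # Same seed, same order: useful for reproducible experiments.
    print(block_permutation(8, seed=42) == block_permutation(8, seed=42))
    # No seed: the order is drawn fresh on each call.
    print(block_permutation(8))
```

In Ray itself, the corresponding call is `ds.randomize_block_order(seed=42)`.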

Ray 2.7.2