
Failed to reproduce the results on RE10K #45

@fangchuan

Description


I've attempted to reproduce the results on the RealEstate-10K dataset following the [DFoT](https://github.com/kwsong0113/diffusion-forcing-transformer) protocol (**measuring the difference between the generated videos and the ground-truth dataset**), but the metrics I obtain differ significantly from those reported in the paper.

Setup and preprocessing

  • Randomly selected 150 scenes as the test set, following the DFoT protocol

  • Annotated the test set with [ViPE](https://github.com/nv-tlabs/vipe) to obtain metric depth videos and camera poses

  • Generated detailed prompts from each video using Gemini-3-pro

  • Adapted data_engine/create_input.py to match the expected input conditions
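Because a random 150-scene subset will generally not match the subset used in the paper, it helps to at least make the selection reproducible and shareable so others can rerun it. A minimal sketch, assuming scene IDs are strings; `select_test_scenes`, the seed, and the `scene_XXXXX` IDs are hypothetical and are not the DFoT protocol's actual split:

```python
import random


def select_test_scenes(scene_ids, n=150, seed=0):
    """Pick n distinct scenes reproducibly (hypothetical helper).

    Sorting first makes the result independent of the order in which
    scene IDs were listed (e.g. directory listing order).
    """
    if n > len(scene_ids):
        raise ValueError(f"requested {n} scenes but only {len(scene_ids)} available")
    rng = random.Random(seed)  # local RNG, leaves the global random state untouched
    return sorted(rng.sample(sorted(scene_ids), n))


# Hypothetical scene IDs standing in for the RE10K test split.
all_scenes = [f"scene_{i:05d}" for i in range(1000)]
test_set = select_test_scenes(all_scenes, n=150, seed=0)
```

Sorting before sampling means the split depends only on the set of IDs and the seed, not on filesystem listing order, so posting the seed and the full ID list is enough for someone else to recover the same 150 scenes.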

Evaluation metrics

  • FVD, PSNR, LPIPS, and SSIM, comparing the generations against the ground truth
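PSNR values depend on conventions that are easy to mismatch when reproducing published numbers: the pixel range ([0, 1] vs. [0, 255]) and whether the score is averaged per frame or computed over the whole video. The sketch below uses one common convention (mean of per-frame PSNR), which is an assumption and not necessarily what the paper used; `psnr_video` is a hypothetical helper and videos are assumed to be arrays of shape (T, H, W, C):

```python
import numpy as np


def psnr_video(pred, gt, data_range=1.0):
    """Mean per-frame PSNR between two videos of shape (T, H, W, C).

    Both inputs must be on the same scale; data_range is the maximum
    pixel value (1.0 for floats in [0, 1], 255.0 for uint8-style values).
    Convention assumed here: average of per-frame PSNR (not confirmed for the paper).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {gt.shape}")
    # MSE per frame, averaged over height, width and channels.
    mse = ((pred - gt) ** 2).reshape(pred.shape[0], -1).mean(axis=1)
    mse = np.maximum(mse, 1e-12)  # avoid log(0) for identical frames
    return float(np.mean(10.0 * np.log10(data_range ** 2 / mse)))
```

Passing uint8-range frames with the default `data_range=1.0` silently produces a much lower score, so checking the range of both the generated and ground-truth frames is a cheap first step when the numbers disagree.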

Results comparison

[Image]

Here are the input and output:

[Image]
video_voyager.mp4

The results show that the generated video contains severe color shift and restoration shift. Could you release the RE10K evaluation code, or help me debug this reproduction problem?
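To make the color-shift observation concrete, one option is to measure the per-channel mean offset between generated and ground-truth frames, both overall and per frame. A minimal sketch, assuming both videos are aligned NumPy arrays of shape (T, H, W, 3) on the same pixel scale; the helper names are hypothetical. A consistent offset in one channel would suggest a global color cast (an RGB/BGR or value-range mismatch in preprocessing is one guess worth ruling out, not a confirmed cause), while an offset that grows across frames would suggest drift during generation:

```python
import numpy as np


def _as_float_pair(pred, gt):
    """Convert both videos to float64 and check that their shapes match."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {gt.shape}")
    return pred, gt


def channel_mean_shift(pred, gt):
    """Per-channel mean of (pred - gt) over all frames and pixels, shape (3,)."""
    pred, gt = _as_float_pair(pred, gt)
    return (pred - gt).mean(axis=(0, 1, 2))


def per_frame_channel_shift(pred, gt):
    """Per-frame, per-channel mean of (pred - gt), shape (T, 3)."""
    pred, gt = _as_float_pair(pred, gt)
    return (pred - gt).mean(axis=(1, 2))
```

Posting these offsets alongside the metric table would help separate a global color problem from the other discrepancies.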
