An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
… QwenVL-Chat, and Video-LLaVA. We find that the attention computation over visual tokens is extremely inefficient in the deep layers of popular LVLMs, suggesting a need for a sparser approach compared to textual data handling. To this end, we introduce FastV, a versatile plug-and-play method designed …
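The fragment above describes FastV as pruning inattentive visual tokens in the deep layers of an LVLM. A minimal PyTorch sketch of that general idea follows; the function name, the attention-averaging heuristic, and the layer/keep-ratio choices are illustrative assumptions rather than the authors' released implementation.

```python
import torch

def prune_visual_tokens(hidden_states: torch.Tensor,
                        attn_weights: torch.Tensor,
                        visual_slice: slice,
                        keep_ratio: float = 0.5) -> torch.Tensor:
    """Drop the least-attended visual tokens after an early layer.

    hidden_states: (batch, seq_len, dim) activations entering the next layer.
    attn_weights:  (batch, heads, seq_len, seq_len) attention from the current layer.
    visual_slice:  positions of the visual tokens in the sequence, e.g. slice(35, 611).
    keep_ratio:    fraction of visual tokens to keep (0.5 for "1/2 tokens").
    """
    # Average attention each key position receives, over heads and query positions.
    received = attn_weights.mean(dim=1).mean(dim=1)           # (batch, seq_len)
    vis_scores = received[:, visual_slice]                    # (batch, n_vis)
    n_keep = max(1, int(vis_scores.shape[1] * keep_ratio))
    keep_idx = vis_scores.topk(n_keep, dim=1).indices.sort(dim=1).values

    batch, _, dim = hidden_states.shape
    vis_tokens = hidden_states[:, visual_slice, :]
    kept_vis = torch.gather(
        vis_tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, dim)
    )

    # Reassemble: text tokens before the visual span, kept visual tokens, text after.
    before = hidden_states[:, : visual_slice.start, :]
    after = hidden_states[:, visual_slice.stop :, :]
    return torch.cat([before, kept_vis, after], dim=1)
```

In a FastV-style setup this would be applied once after an early layer (e.g. layer 2), and every deeper layer would then run on the shortened sequence.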
Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
… a language model that interprets user prompts and a vision model that generates corresponding images. As language and vision models continue to progress in their respective domains, there is great potential in exploring the replacement of components in text-to-image diffusion models with more advanced …
Tackling Structural Hallucination in Image Translation with Local Diffusion
… images, such as unseen tumors in medical images, causing “image hallucination” and risking misdiagnosis. We hypothesize that such hallucinations result from local out-of-distribution (OOD) regions in the conditional images. We verify that partitioning the OOD region and conducting separate image generations alleviates hallucinations …
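The stated hypothesis, that hallucinations come from local OOD regions and that generating those regions separately helps, can be illustrated with a toy sketch. The `translate` callable stands in for any conditional generation pass, and the mask-and-recompose logic is an assumption for illustration, not the paper's Local Diffusion pipeline.

```python
import torch

def partitioned_translation(cond_image: torch.Tensor,
                            ood_mask: torch.Tensor,
                            translate) -> torch.Tensor:
    """Translate in-distribution and OOD regions separately, then recompose.

    cond_image: (C, H, W) conditional input image.
    ood_mask:   (1, H, W) binary mask, 1 where the region looks out-of-distribution.
    translate:  callable mapping a conditional image to a generated image
                (e.g. one reverse-diffusion pass of a pre-trained model).
    """
    in_domain = cond_image * (1 - ood_mask)   # hide the OOD region
    ood_only = cond_image * ood_mask          # hide everything else

    gen_in = translate(in_domain)             # generation conditioned on familiar content
    gen_ood = translate(ood_only)             # separate generation for the OOD patch

    # Take each generation where its own conditioning was valid.
    return gen_in * (1 - ood_mask) + gen_ood * ood_mask
```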
Hierarchical Separable Video Transformer for Snapshot Compressive Imaging
… The ill-posedness is rooted in the mixed degradation of spatial masking and temporal aliasing. However, previous Transformers lack insight into this degradation and thus have limited performance and efficiency. In this work, we tailor an efficient reconstruction architecture without temporal aggregation in …
Overcome Modal Bias in Multi-modal Federated Learning via Balanced Modality Selection
… existing client selection methods simply consider the mining of distributed uni-modal data, yet their effectiveness may diminish in multi-modal FL (MFL), as the modality imbalance problem not only impedes collaborative local training but also leads to a severe global modality-level bias. We empirically …
Comprehensive Attribution: Inherently Explainable Vision Model with Feature Detector
… attribution method aims to enhance the understanding of model behavior by identifying the important regions in images that significantly contribute to predictions. It is achieved by cooperatively training a selector (generating an attribution map to identify important features) and a predictor (making …
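A generic sketch of the cooperative selector/predictor training mentioned above is given below; the soft sigmoid mask, the sparsity regularizer, and the weighting are common assumptions for this kind of objective, not the paper's specific formulation.

```python
import torch
import torch.nn.functional as F

def attribution_step(selector, predictor, images, labels, sparsity_weight=1e-3):
    """One cooperative training step for an inherently explainable model.

    selector:  network mapping an image to a soft attribution map in [0, 1].
    predictor: classifier that only sees the selected (masked) features.
    The sparsity term discourages trivially selecting the whole image.
    """
    attribution = torch.sigmoid(selector(images))   # (B, 1, H, W) importance map
    selected = images * attribution                 # keep only attributed regions
    logits = predictor(selected)

    task_loss = F.cross_entropy(logits, labels)     # predictor must stay accurate
    sparsity_loss = attribution.mean()              # encourage compact explanations
    loss = task_loss + sparsity_weight * sparsity_loss
    loss.backward()                                 # gradients flow to both networks
    return loss.detach()
```

An outer loop would call this step and then apply an optimizer over the parameters of both networks.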
Pre-trained Visual Dynamics Representations for Efficient Policy Learning
… available and inherently contain a vast amount of prior world knowledge, the absence of action annotations and the common domain gap with downstream tasks hinder utilizing videos for RL pre-training. To address this challenge, we propose Pre-trained Visual Dynamics Representations (PVDR) to bridge the domain gap …
Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models
… little rationale behind detection, hindering public trust in real-world deployments. In this paper, we approach VAD with a reasoning framework. Although Large Language Models (LLMs) have shown revolutionary reasoning ability, we find that their direct use falls short of VAD. Specifically, the implicit …
ISSN 0302-9743
… European Conference on Computer Vision, ECCV 2024, held in Milan, Italy, during September 29–October 4, 2024. The 2387 papers presented in these proceedings were carefully reviewed and selected from a total of 8585 submissions. They deal with topics such as computer vision; machine learning; deep neural networks; … reconstruction; stereo vision; computational photography; neural networks; image coding; image reconstruction; motion estimation.
ISBN 978-3-031-73003-0, 978-3-031-73004-7. Series ISSN 0302-9743, Series E-ISSN 1611-3349
… in the mask without losing minor ones. Our approach, validated through extensive experimentation, significantly improves WSS across five benchmarks (VOC: 79.8%, COCO: 53.9%, Context: 49.0%, ADE: 32.9%, Stuff: 37.4%), reducing the gap with fully supervised methods by over 84% on the VOC validation set. Code is available at ..
Pre-trained Visual Dynamics Representations for Efficient Policy Learning (continued)
… visual dynamics prior knowledge in the videos. This abstract prior knowledge can be readily adapted to downstream tasks and aligned with executable actions through online adaptation. We conduct experiments on a series of robotics visual control tasks and verify that PVDR is an effective form of pre-training with videos to promote policy learning.
Reinforcement Learning via Auxiliary Task Distillation
… the rearrangement task from the environment reward without demonstrations, a learning curriculum, or pre-trained skills. AuxDistill achieves a higher success rate than the previous state-of-the-art baseline in the Habitat Object Rearrangement benchmark and outperforms methods that use pre-trained skills and expert demonstrations.
…isions in their corresponding reports, and in turn facilitates analysis and interpretation of intricate imaging data. However, such observation is predominantly justified on single-modality data (mostly 2D images like X-rays); adapting VLP to learning unified representations for medical images in re…
… performance significantly, we identify a vulnerability of skip connections to Model Inversion (MI) attacks, a type of privacy attack that aims to reconstruct private training data through abusive exploitation of a model. In this paper, as a pioneering work to understand how DNN architectures a…
… In the recent paradigm of adopting off-the-shelf models, however, access to their training data is often infeasible or impractical, while most such models are not originally trained with adversarial robustness in mind. In this paper, we develop a scalable and model-agnostic solution to achieve adversarial …
… long-horizon robot control problems by distilling behaviors from auxiliary RL tasks. AuxDistill achieves this by concurrently carrying out multi-task RL with auxiliary tasks, which are easier to learn and relevant to the main task. A weighted distillation loss transfers behaviors from these auxiliary tasks …
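A sketch of what a weighted distillation term of this kind could look like is shown below; the KL form, the detached auxiliary teachers, and the per-task weights are assumptions for illustration, since the fragment does not give the exact objective.

```python
import torch
import torch.nn.functional as F

def weighted_distillation_loss(main_logits, aux_logits_list, weights):
    """Distill auxiliary-task behaviors into the main-task policy.

    main_logits:     (B, n_actions) action logits of the main-task policy.
    aux_logits_list: list of (B, n_actions) logits from auxiliary-task policies.
    weights:         per-auxiliary-task relevance weights, same length as the list.
    Returns a weighted sum of KL(aux || main) terms.
    """
    log_p_main = F.log_softmax(main_logits, dim=-1)
    loss = main_logits.new_zeros(())
    for w, aux_logits in zip(weights, aux_logits_list):
        p_aux = F.softmax(aux_logits.detach(), dim=-1)   # auxiliary policy acts as teacher
        loss = loss + w * F.kl_div(log_p_main, p_aux, reduction="batchmean")
    return loss
```

This term would typically be added to the usual RL objective so that the main-task policy both maximizes reward and imitates the relevant auxiliary behaviors.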
… pixel-aware tasks. Our method enables MLLMs to learn pixel-level location information without requiring excessive modifications to the existing model architecture or adding specialized tokens. We introduce an inquiry-based approach that can effectively find prompt points for SAM to perform segmentation.
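Below is a minimal sketch of the SAM hand-off this fragment describes, assuming the standard `segment-anything` `SamPredictor` interface, a hypothetical "(x, y)" coordinate format in the MLLM's reply, and a placeholder checkpoint path; the inquiry procedure itself is not reproduced here.

```python
import re
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def point_from_answer(answer: str) -> tuple[int, int]:
    """Parse the first "(x, y)" coordinate pair out of an MLLM's text reply."""
    match = re.search(r"\((\d+)\s*,\s*(\d+)\)", answer)
    if match is None:
        raise ValueError("no prompt point found in the model answer")
    return int(match.group(1)), int(match.group(2))

# Build SAM once; the checkpoint path is a placeholder.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

def segment_from_mllm_answer(image_rgb: np.ndarray, mllm_answer: str) -> np.ndarray:
    """Feed a point suggested by a multimodal LLM to SAM as a prompt."""
    x, y = point_from_answer(mllm_answer)
    predictor.set_image(image_rgb)                      # H x W x 3 uint8 RGB array
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),                # one prompt point
        point_labels=np.array([1]),                     # 1 = positive (foreground)
        multimask_output=True,
    )
    return masks[int(scores.argmax())]                  # best-scoring candidate mask
```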
https://doi.org/10.1007/978-3-031-73004-7
artificial intelligence; computer networks; computer systems; computer vision; education; Human-Computer Interaction …
Lecture Notes in Computer Science
… incremental tasks. Extensive experiments on three benchmark datasets validate that the proposed method consistently outperforms state-of-the-art FSCIL and prompt-based CIL methods in terms of both learning new classes and mitigating forgetting. Source code is available at ..
… adversarial attacks, SOL achieved the best attack success rate across seven SOTA defense models. On GNN training, SOL-discovered optimizers can outperform Adam on three different datasets. On BERT fine-tuning, SOL also outperformed AdamW on five benchmarks. The source code is available at ..
…ble and plug-and-play approach without requiring modifications to the original weights of the language and vision models. Our pipeline is compatible with various language models and generative vision models, accommodating different structures. Within this framework, we demonstrate that incorporating …
… reducing misdiagnosis by 40% and 25% in the real-world medical and natural image datasets, respectively. It also demonstrates compatibility with various pre-trained diffusion models. Code is available at ..
… CSS-MSA introduces an inductive bias of paying more attention within frames instead of between frames while saving computational overhead. GSM-FFN further enhances locality via a gated mechanism and factorized spatial-temporal convolutions. Extensive experiments demonstrate that our method o…
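The intra-frame bias described for CSS-MSA can be illustrated with a short sketch in which the frame axis is folded into the batch axis, so attention is computed strictly within each frame; the learned projections, the gating, and the factorized convolutions of GSM-FFN are omitted, and the function name is an assumption.

```python
import torch
import torch.nn.functional as F

def within_frame_attention(x: torch.Tensor, n_frames: int, num_heads: int) -> torch.Tensor:
    """Self-attention restricted to tokens of the same frame.

    x: (batch, n_frames * tokens_per_frame, dim) flattened video tokens.
    Folding the frame axis into the batch axis makes attention strictly
    intra-frame, which is the inductive bias described above.
    Query/key/value projections are omitted for brevity.
    """
    b, n, d = x.shape
    t = n // n_frames                                    # tokens per frame
    x = x.reshape(b * n_frames, t, d)                    # each frame becomes its own sequence
    q = k = v = x.reshape(b * n_frames, t, num_heads, d // num_heads).transpose(1, 2)
    out = F.scaled_dot_product_attention(q, k, v)        # (b*frames, heads, t, head_dim)
    out = out.transpose(1, 2).reshape(b, n_frames * t, d)
    return out
```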
… representations for diverse modalities of medical images (especially 2D and 3D images). Under the text’s guidance, the proposed method effectively selects text-related 2D slices from the sophisticated 3D volume, which act as pseudo-pairs to bridge 2D and 3D data, ultimately enhancing consistency across various medical …
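The text-guided slice selection mentioned in this fragment can be sketched as a simple similarity ranking; the encoder interface, the cosine-similarity criterion, and the `top_k` parameter are assumptions for illustration, since the fragment does not specify the actual selection rule.

```python
import torch
import torch.nn.functional as F

def select_text_related_slices(volume: torch.Tensor,
                               text_emb: torch.Tensor,
                               image_encoder,
                               top_k: int = 4) -> torch.Tensor:
    """Pick the 2D slices of a 3D scan that best match the paired report text.

    volume:        (D, C, H, W) stack of 2D slices from one 3D volume.
    text_emb:      (dim,) embedding of the report text.
    image_encoder: callable mapping a batch of slices to (D, dim) embeddings.
    Returns indices of the top_k most text-similar slices, usable as pseudo 2D-text pairs.
    """
    with torch.no_grad():
        slice_emb = F.normalize(image_encoder(volume), dim=-1)   # (D, dim)
        text_dir = F.normalize(text_emb, dim=-1)                 # (dim,)
    similarity = slice_emb @ text_dir                            # (D,) cosine similarities
    k = min(top_k, similarity.numel())
    return similarity.topk(k).indices                            # indices of selected slices
```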