• Random scaling
    2019-12-19 00:43:48






    import random
    import numpy as np
    def random_crop
  • Data augmentation by randomly scaling images


    We all know the scene. Two detectives on a cop show stand in a dimly lit room filled with monitors, reviewing surveillance images. A tech guy (yes, it’s almost always a guy) queues up image after image as the detectives look on, squinting at the screen in concentration. “There’s nothing here!” one detective insists. They’re about to give up, when the other detective (our hero) shouts, “Wait!”

    Everyone stops. “Zoom in there!” the detective says. The tech guy obligingly zooms in on a grainy corner of the image. “Enhance that!” the detective intones. The tech guy taps some keys, mutters something about algorithms, and suddenly the image comes into focus, revealing some tiny, significant detail. The case is cracked wide open!

    This scene is a crime drama cliché so pervasive that it has inspired its own meme video with nearly a million views.


    Scenes like these drive real tech people bananas, because “zoom and enhance” has always seemed like an impossible fantasy. Until now. Thanks to two recent innovations, zoom and enhance is finally here. It has the potential to radically change police surveillance, often in concerning ways — or at least help you bring back your photos from the early ’00s.

    The first innovation behind real-life zoom and enhance comes from the world of photography. Until recently, photographers had two primary options for digital cameras: professional DSLRs like the Nikon D series, or cheap compact consumer cameras, like the kind you’d use for birthday or travel snapshots. DSLRs take great photos, but they’re bulky and conspicuous and can be hard to operate — not a great combo for surveillance work. Compact cameras rarely have the quality necessary for surveillance professionals.

    That all began to change around 2015, with the rise of mirrorless cameras. These cameras have the tiny form factor of a compact camera, but thanks to advances in imaging chips driven in part by smartphones, they pack in the same high-quality image sensors usually found in a DSLR. Increasingly, they also borrow complex image processing software from the smartphone world, further enhancing their capabilities. And crucially, they allow for the use of professional lenses — easily the most important factor for taking high-quality photos.

    The end result is a tiny camera that you can carry and use inconspicuously, while taking extremely detailed, high-resolution photos. The Q, a mirrorless camera from legendary German camera maker Leica, largely kicked off the trend. The latest Q model weighs just 1.4 pounds and takes 47-megapixel photos through an obscenely crisp lens that sees more detail than the human eye. With an ISO rating of 50,000 (15 times higher than that achieved by the fastest analog films), it can also essentially see in the dark.

    The Leica Q2. Photo: Leica

    Lower-priced competitors, like the Sony Alpha, have since emerged. For a few thousand dollars, a surveillance professional or police force can now purchase tiny, easy-to-use cameras that take better photos than the best professional cameras from just a few years ago. Zooming into photos taken on these cameras can sometimes feel like using zoom and enhance. The detail they capture — especially paired with modern software — is remarkable.

    But combine mirrorless camera images with compressive sensing, and zoom and enhance is truly here. Compressive sensing allows you to massively enlarge an image without a major loss in quality. The tech has been around since the early 2000s, but it gained prominence in 2010 when researchers showed how it could be used to reconstruct an image of President Barack Obama using a tiny sample of randomly distributed pixels.

    In 2017, Google showed how principles of compressive sensing could be combined with neural networks to reconstruct degraded or low-quality images in a process called A.I. super-resolution. The tech works by starting with sample images — often of faces or rooms — and deliberately messing them up by making them blurry, running them through a terrible JPEG compression system, and the like.

    A neural network then looks at the degraded images, compares them to their high-quality counterparts, and learns how the two relate. Essentially, the network teaches itself all the ways that a digital image can degrade. Once it knows this, the process is reversed. The system is handed a low-quality or degraded image, and based on its training, it constructs a high-quality, undegraded version from scratch.

    Though Google has since largely exited the field, A.I. super-resolution has taken off. Services like Big JPG allow users to upload a low-quality photograph and have it instantly upscaled 400% or more, often with minimal loss of quality. Photoshop plugins have delivered similar tech to photographers, who use it to remove blurriness and sharpen images. My A.I.-driven photography company often uses the tech to upscale digital camera photos taken in the early 2000s, allowing even these low-quality early images to meet today’s standards for use in publications.

    The tech, though, is also being used for surveillance. Quickly after its development, researchers began to show how the super-resolution could be used to upscale low-resolution surveillance photos or frames from surveillance videos. Others focused on using the tech for targeted applications, like license plate recognition. And many groups have focused on super-resolution for facial recognition images, going so far as to develop specialized algorithms for enhancing facial images.

    Several vendors have integrated these algorithms into dedicated software products. Topaz Labs, in my experience, is the most advanced. Pair its Gigapixel AI product with the output of a modern mirrorless camera, and you’ve got zoom and enhance that rivals the imagined systems on shows like CSI.

    Here, for example, is a photo of a Jamba Juice restaurant in Marin County, California, taken on my Leica Q mirrorless camera.

    Jamba Juice restaurant taken on a Leica Q mirrorless camera. Photos courtesy of the author.

    I took this from across a street, with the palm-sized camera hanging around my neck. I then ran the photo through Topaz’s Gigapixel AI software, upscaling it 400% and using the company’s proprietary face reconstruction and sharpening algorithms.

    Zooming in to full size on the enhanced image, you can see some incredible detail. Through the restaurant’s front window, you can clearly see a patron waiting in line and examining a menu.

    The red box shows the region that was zoomed and enhanced in the photo below.
    People are visible after applying zoom and enhance.

    You can even see that he’s wearing a blue surgical mask. Great job staying safe, unknown smoothie man! Flyers posted on the door are also visible, including some of the graphics on the flyer. You can see patrons inside placing their orders.

    Zooming and enhancing another part of the image, you can see the text on signs in the far background (“Jamba Curbside Pickup”) and how they’ve been attached to pillars using yellow tape. And in the far distance, you can see the mannequins in another nearby store and diners eating at outdoor tables.

    The red box shows the region that was zoomed and enhanced in the photo below.
    Text is visible on signs in the far background after applying zoom and enhance.

    With more extreme zoom and a tweak to exposure, you can clearly make out the store’s signature Blendtec blenders on the counter inside.


    The red box shows the region that was zoomed and enhanced in the photo below.
    Left: Zoomed and enhanced image of blender inside the restaurant. Right: A similar model of Blendtec blender for comparison. Photo: Blendtec via PRWeb

    Blender identification, of course, is not the most groundbreaking use of a new technology. But when you apply zoom and enhance in a surveillance context, things get scary fast.

    Here, for example, is a photo I took of a Black Lives Matter protest in Times Square in 2016.

    Black Lives Matter protest on July 7, 2016. The red box at the center left of image is zoomed and enhanced in the photo below.

    Applying zoom and enhance, you can clearly see the faces of police officers in the far back of the crowd. With facial reconstruction applied, these images would likely be good enough to find matches in a facial recognition database.

    A police officer’s face from the back of the crowd is clearly visible after applying zoom and enhance. His eyes are redacted with black bar to protect the officer’s identity.

    Combining this tech with facial recognition systems like Clearview AI would make it trivial to identify large numbers of people in a crowd of protesters. A plainclothes police officer or federal agent posing as a tourist could easily walk through a crowd of protesters while snapping photos on a tiny mirrorless camera. The photos could be run through a super-resolution system, enlarging them massively and enhancing the faces present.

    Individual faces could then be pulled out of the image and run through a system like Clearview’s to identify every individual by name. Police forces and other agencies are reportedly already using A.I. to identify different actions (like breaking into a vehicle or loitering) and to search surveillance images for people based on their physical descriptions. It’s unclear if any are using super-resolution yet, but undoubtedly that will come. Face reconstruction tech will likely improve as well — many faces today still come out distorted when enhanced, but facial reconstruction errors will likely diminish with time.

    As the tech improves, you might not even need a mirrorless camera or other high-quality cameras. Super-resolution may ultimately become good enough to perform zoom-and-enhance functions on the low-resolution output of a traditional surveillance camera, identifying every individual in a crowd using footage from traffic cams, surveillance cameras from a store or nearby home, or even a circling drone. It could also one day be applied to photos taken on a smartphone or even the low-resolution photos displayed on social media platforms like Instagram.

    As with any new surveillance technology, ensuring responsible use of zoom and enhance is a matter of establishing the right laws and policies. The Fourth Amendment of the U.S. Constitution already provides protection against searches without a warrant. Courts have weighed issues of new tech in the past — for example, looking at whether surveillance with telephoto lenses violates the Fourth Amendment. They have generally ruled that widely available tech like zoom lenses can be used in many contexts, but specialized tech like radar that sees through walls cannot.

    It’s not yet clear where zoom and enhance would fall on that spectrum. The technology might be viewed as just another version of the zoom lens on a traditional camera. But given its elements of artificial intelligence, courts might find that it’s too specialized of a technology to be mobilized without proper search warrants.

    For now, the tech is too new for these precedents to have been established. As citizens, the best thing we can do is to be aware of its existence. If you’re at a protest or another sensitive event, assume that you’re being surveilled and photographed. Even if you don’t see someone with a professional-looking camera, authorities could still be capturing your image at a high enough quality to look you up using facial recognition and identify you by name.

    We can also proactively inform lawmakers about which new technologies we’re comfortable with and which ones we’re not. Popular anger over facial recognition technologies led to a proposed bill to ban the use of this tech in policing. We need to ensure that technologies like zoom and enhance are available to law enforcement when they’re truly needed. But we also need to make sure that they’re not abused.

    Much as science fiction did a good job of preparing us for space travel and computers, shows like CSI have done a good job of introducing us to the concept of zoom and enhance before it existed. But when you move beyond the imagined world of a good-guy cop fighting evil criminals, the real-world ethics of tech like zoom and enhance get blurry fast.

    Translated from: https://onezero.medium.com/zoom-and-enhance-is-finally-here-c727b3258a11


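  • Corrections are welcome. An earlier post covered OpenCV implementations of some common data augmentation methods; see "Data Augmentation | Rotation, Translation, Scaling, Shear, HSV Augmentation", which covered rotation, translation, scaling,... 1. Random rotation, translation, scaling, shear 2. HSV augmentation.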


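    An earlier post covered OpenCV implementations of some common data augmentation methods (see "Data Augmentation | Rotation, Translation, Scaling, Shear, HSV Augmentation"): rotation, translation, scaling, shear, and HSV augmentation. That post only transformed the image, however, and did not update the labels to match.


    1. Random rotation, translation, scaling, and shear

    This section covers further augmentation methods, namely affine transforms and HSV augmentation. I tried some of them with OpenCV before (see "Data Augmentation | Rotation, Translation, Scaling, Shear, HSV Augmentation"); the notes here follow the yolov3-spp code, which transforms the labels together with the image.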


    import math
    import random

    import cv2
    import numpy as np


    # Defaults during training:
    # img: (1472, 1472, 3), targets: (k, 5)
    # rotation degrees: 0.0, translate factor: 0.0, scale factor: 0.0, shear angle: 0.0
    # border = -368
    def random_affine(img, targets=(), degrees=10, translate=.1, scale=.1, shear=10, border=0):
        """
        Affine-transform augmentation, comparable to
        torchvision.transforms.RandomAffine(degrees=(-10, 10), translate=(.1, .1), scale=(.9, 1.1), shear=(-10, 10))
        :param img: img4, shape [2 x img_size, 2 x img_size, 3] = [1472, 1472, 3]; img_size is the image size we specify
        :param targets: labels4, [:, cls + x1y1x2y2] = [7, 5], relative to img4; (x1, y1) is the top-left corner, (x2, y2) the bottom-right corner
        :param degrees: rotation angle (0 during training)
        :param translate: horizontal or vertical translation range (0 during training)
        :param scale: scale factor (0 during training)
        :param shear: shear angle (0 during training)
        :param border: -368; the width cropped from each edge of the image, i.e. the border between the cropped and the uncropped image
        :return: img: the image after the affine transform, shape [img_size, img_size, 3]
                 targets: e.g. [3, 5], relative to the transformed image; there are fewer targets than before
                 because the transform moves some of them out of the image or makes them tiny
        """
        # targets = [cls, xyxy]
        # The output image size equals img4.shape / 2
        height = img.shape[0] + border * 2
        width = img.shape[1] + border * 2
        # Rotation and scale
        R = np.eye(3)  # 3x3 identity matrix
        a = random.uniform(-degrees, degrees)  # random rotation angle
        s = random.uniform(1 - scale, 1 + scale)  # random scale factor
        R[:2] = cv2.getRotationMatrix2D(angle=a, center=(img.shape[1] / 2, img.shape[0] / 2), scale=s)
        # Translation
        # x scales with the image width (shape[1]) and y with the height (shape[0])
        T = np.eye(3)
        T[0, 2] = random.uniform(-translate, translate) * img.shape[1] + border  # x translation (pixels)
        T[1, 2] = random.uniform(-translate, translate) * img.shape[0] + border  # y translation (pixels)
        # Shear
        S = np.eye(3)
        S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # x shear (deg)
        S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # y shear (deg)
        # Combined affine matrix: the product of the three matrices above
        M = S @ T @ R  # ORDER IS IMPORTANT HERE!!
        if (border != 0) or (M != np.eye(3)).any():  # image changed
            # Apply the affine warp; the output image size is dsize=(width, height)
            img = cv2.warpAffine(img, M[:2], dsize=(width, height), flags=cv2.INTER_LINEAR, borderValue=(114, 114, 114))
        # Transform label coordinates with the same affine matrix
        n = len(targets)
        if n:
            # warp points
            xy = np.ones((n * 4, 3))
            # Collect the four corners of every box (x1y1, x2y2, x1y2, x2y1)
            # column indices: x1 -> 1, y1 -> 2, x2 -> 3, y2 -> 4
            xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
            # Multiply the corners by the affine matrix. The last row of the matrix is unused,
            # so each point gets an extra homogeneous coordinate that is dropped after the product.
            # [4*n, 3] -> [n, 8]
            xy = (xy @ M.T)[:, :2].reshape(n, 8)
            # create new boxes
            # Gather all x and all y coordinates after the transform
            x = xy[:, [0, 2, 4, 6]]  # [n, 4]
            y = xy[:, [1, 3, 5, 7]]  # [n, 4]
            # A rotated box is no longer axis-aligned (it looks like a diamond), so take the
            # minimum x/y as the new top-left corner and the maximum x/y as the new bottom-right corner
            xy = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T  # [n, 4]
            # reject warped points outside of image
            # Clip coordinates to [0, width] and [0, height]
            xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width)
            xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height)
            w = xy[:, 2] - xy[:, 0]
            h = xy[:, 3] - xy[:, 1]
            # Area of each box after the transform
            area = w * h
            # Area of each box before the transform
            area0 = (targets[:, 3] - targets[:, 1]) * (targets[:, 4] - targets[:, 2])
            # Aspect ratio of each box
            ar = np.maximum(w / (h + 1e-16), h / (w + 1e-16))  # aspect ratio
            # Keep boxes wider and taller than 4 px, with an after/before area ratio above 0.2
            # and an aspect ratio below 10
            i = (w > 4) & (h > 4) & (area / (area0 * s + 1e-16) > 0.2) & (ar < 10)
            # Filter the boxes; labels that fail the test are dropped after the transform
            targets = targets[i]
            # Write the transformed box coordinates back
            targets[:, 1:5] = xy[i]
        return img, targets
    # Apply the affine transform to the image and its labels
    img4, labels4 = random_affine(img4, labels4,                     # input image and bounding boxes
                                  degrees=self.hyp['degrees'],       # rotation angle
                                  translate=self.hyp['translate'],   # translation factor
                                  scale=self.hyp['scale'],           # scale factor
                                  shear=self.hyp['shear'],           # shear angle
                                  border=-s // 2)                    # s is the expected output image size
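
The box handling inside random_affine can be checked in isolation. Below is a minimal sketch using only the standard library. `warp_box` and `rotation_about` are hypothetical helper names, not part of the yolov3-spp code, and the rotation follows the usual mathematical sign convention, which may differ from the one used by cv2.getRotationMatrix2D. The sketch maps the four corners of a box through a 2x3 affine matrix and takes their axis-aligned envelope, which is why a rotated box grows.

```python
import math


def warp_box(box, M):
    """Map an axis-aligned box (x1, y1, x2, y2) through a 2x3 affine matrix M
    and return the axis-aligned envelope of the four warped corners."""
    x1, y1, x2, y2 = box
    corners = [(x1, y1), (x2, y2), (x1, y2), (x2, y1)]
    xs, ys = [], []
    for x, y in corners:
        # Homogeneous coordinates: [x', y'] = M @ [x, y, 1]
        xs.append(M[0][0] * x + M[0][1] * y + M[0][2])
        ys.append(M[1][0] * x + M[1][1] * y + M[1][2])
    return min(xs), min(ys), max(xs), max(ys)


def rotation_about(cx, cy, degrees):
    """2x3 matrix that rotates by `degrees` about the point (cx, cy)."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, cx - c * cx + s * cy],
            [s, c, cy - s * cx - c * cy]]


# A 20 x 20 box rotated by 45 degrees about its own center
M = rotation_about(50, 50, 45)
print(warp_box((40, 40, 60, 60), M))
```

The envelope of the rotated box is about 28.3 px wide instead of 20 px, so the label becomes looser than the object it encloses. This is one reason the function filters boxes by area ratio and aspect ratio afterwards.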



    2. HSV augmentation

    For background, see the earlier notes: "Data Augmentation | Rotation, Translation, Scaling, Shear, HSV Augmentation".


    import cv2
    import numpy as np


    def augment_hsv(img, h_gain=0.5, s_gain=0.5, v_gain=0.5):
        """
        HSV augmentation: modifies the image's HSV channels and leaves the labels untouched.
        :param img: image to process, BGR, [736, 736]
        :param h_gain: gain used to generate the new H channel
        :param s_gain: gain used to generate the new S channel
        :param v_gain: gain used to generate the new V channel
        :return: the HSV-augmented image
        """
        # Draw 3 random numbers in [-1, 1), scale them by the three gains, and add 1
        r = np.random.uniform(-1, 1, 3) * [h_gain, s_gain, v_gain] + 1  # random gains
        hue, sat, val = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2HSV))
        dtype = img.dtype  # uint8
        # Build one look-up table (LUT) each for hue, sat, and val
        x = np.arange(0, 256, dtype=np.int16)
        lut_hue = ((x * r[0]) % 180).astype(dtype)
        lut_sat = np.clip(x * r[1], 0, 255).astype(dtype)
        lut_val = np.clip(x * r[2], 0, 255).astype(dtype)
        # Apply the three look-up tables with cv2.LUT
        img_hsv = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val))).astype(dtype)
        aug_img = cv2.cvtColor(img_hsv, cv2.COLOR_HSV2BGR, dst=img)  # no return needed
        # The source code does not return anything here (img is modified in place);
        # returning the result makes the behavior easier to follow
        return aug_img
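
To see what the look-up tables contain, here is a minimal sketch that rebuilds them with plain Python lists instead of NumPy and OpenCV. `build_hsv_luts` and `apply_lut` are hypothetical names introduced for illustration, and indexing a list with the pixel value stands in for cv2.LUT on an 8-bit channel.

```python
import random


def build_hsv_luts(h_gain=0.5, s_gain=0.5, v_gain=0.5, rng=random):
    """Build one 256-entry look-up table per HSV channel."""
    # One random multiplier per channel, each within [1 - gain, 1 + gain]
    r = [rng.uniform(-1, 1) * g + 1 for g in (h_gain, s_gain, v_gain)]
    # 8-bit hue in OpenCV covers 0..179, so hue wraps around at 180
    lut_hue = [int((x * r[0]) % 180) for x in range(256)]
    # Saturation and value are clipped to the uint8 range
    lut_sat = [int(min(max(x * r[1], 0), 255)) for x in range(256)]
    lut_val = [int(min(max(x * r[2], 0), 255)) for x in range(256)]
    return lut_hue, lut_sat, lut_val


def apply_lut(channel, lut):
    """Replace every pixel value p of a flat channel by lut[p]."""
    return [lut[p] for p in channel]


lut_hue, lut_sat, lut_val = build_hsv_luts(rng=random.Random(0))
print(apply_lut([0, 90, 179], lut_hue))
```

Because each table is computed once for all 256 possible values, the per-pixel work reduces to an index lookup, regardless of image size.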

    3. Random flips (horizontal and vertical)


    # Flip augmentation: random left-right flip + random up-down flip
    if self.augment:
        # random left-right flip
        lr_flip = True
        # random.random() returns a random float in [0, 1)
        if lr_flip and random.random() < 0.5:
            img = np.fliplr(img)  # np.fliplr flips the array left to right
            if nL:
                labels[:, 1] = 1 - labels[:, 1]  # 1 - x_center: the labels must be mapped too
        # random up-down flip
        ud_flip = False
        if ud_flip and random.random() < 0.5:
            img = np.flipud(img)  # np.flipud flips the array upside down
            if nL:
                labels[:, 2] = 1 - labels[:, 2]  # 1 - y_center: the labels must be mapped too
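
The label mapping works because the boxes are stored as normalized center coordinates at this point. A minimal sketch under that assumption follows; `flip_labels` is a hypothetical helper that operates on plain lists rather than the NumPy array used above.

```python
def flip_labels(labels, horizontal=True):
    """Flip normalized [cls, x_center, y_center, w, h] labels.

    Only the center coordinate on the flipped axis changes (c -> 1 - c);
    the width and height of each box stay the same.
    """
    idx = 1 if horizontal else 2
    flipped = []
    for row in labels:
        new_row = list(row)
        new_row[idx] = 1 - new_row[idx]
        flipped.append(new_row)
    return flipped


labels = [[0, 0.25, 0.75, 0.1, 0.2]]
print(flip_labels(labels))                    # left-right flip
print(flip_labels(labels, horizontal=False))  # up-down flip
```

With pixel xyxy boxes the mapping would need the image width or height and would swap the two corners, which is why the flip is applied after the conversion to normalized xywh.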

    4. Complete data augmentation code



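    # Custom dataset
    class LoadImagesAndLabels(Dataset):  # for training/testing
        def __init__(self,
                     path,   # path to data/my_train_data.txt or data/my_val_data.txt
                     # img_size: the image size output by preprocessing.
                     # For the training set it is the maximum size used during (multi-scale) training;
                     # for the validation set it is the final network input size.
                     augment=False,  # True for the training set (enables augment_hsv), False for the validation set
                     hyp=None,  # hyperparameter dict, including the hyperparameters used by image augmentation
                     rect=False,  # whether to use rectangular training
                     cache_images=False,  # whether to cache images in memory
                     single_cls=False, pad=0.0, rank=-1):
            # Note: enabling rect disables mosaic
            self.mosaic = self.augment and not self.rect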
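        # Custom item loading
        def __getitem__(self, index):
            """
            Attributes used:
                self.img_files: path of each image
                self.label_files: path of each image's label file
                self.imgs = [None] * n: image cache (memory is probably not large enough for this)
                self.labels: label values of each image [cls + xywh], xywh are relative values; cached labels
                             (the four counters nm, nf, ne, nd are collected while caching the labels)
                self.batch: the batch each image belongs to    self.shape: the original shape of each image
                self.n: total number of images    self.hyp    self.img_size
                augmentation flags: self.augment; self.rect; self.mosaic
                rect=True: builds self.batch_shapes, the common network input shape for all images of a batch
            :param index: index of the requested image; 3 more images are drawn at random from the dataset
                          for mosaic and the other augmentations, and the labels are transformed accordingly
            :return: torch.from_numpy(img): one augmented image (tensor)
                     labels_out: the labels of this image (class, x, y, w, h) as a tensor
                     self.img_files[index]: path of this image
                     shapes: train=None, val=(original hw), (scale ratio), (pad wh); needed for COCO mAP
                     index: index of this image in the dataset
            """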
            hyp = self.hyp
            # Training uses mosaic augmentation
            if self.mosaic:
                # load mosaic
                img, labels = load_mosaic(self, index)
                shapes = None
            # Inference uses rect to speed things up
            else:
                # load image
                img, (h0, w0), (h, w) = load_image(self, index)
                # letterbox
                shape = self.batch_shapes[self.batch[index]] if self.rect else self.img_size  # final letterboxed shape
                img, ratio, pad = letterbox(img, shape, auto=False, scale_up=self.augment)
                shapes = (h0, w0), ((h / h0, w / w0), pad)  # for COCO mAP rescaling
                # load labels
                labels = []
                x = self.labels[index]
                if x.size > 0:
                    # Normalized xywh to pixel xyxy format
                    labels = x.copy()  # label: class, x, y, w, h
                    labels[:, 1] = ratio[0] * w * (x[:, 1] - x[:, 3] / 2) + pad[0]  # pad width
                    labels[:, 2] = ratio[1] * h * (x[:, 2] - x[:, 4] / 2) + pad[1]  # pad height
                    labels[:, 3] = ratio[0] * w * (x[:, 1] + x[:, 3] / 2) + pad[0]
                    labels[:, 4] = ratio[1] * h * (x[:, 2] + x[:, 4] / 2) + pad[1]
            # Apply augmentation if enabled
            if self.augment:
                # Mosaic already applies random_affine, so it is only needed when mosaic is off
                if not self.mosaic:
                    img, labels = random_affine(img, labels,
                # Augment colorspace: HSV augmentation leaves the labels unchanged, so the boxes need no processing
                img = augment_hsv(img, h_gain=hyp["hsv_h"], s_gain=hyp["hsv_s"], v_gain=hyp["hsv_v"])
            # The affine transform may drop some boxes; skip this step if none remain, otherwise process them
            nL = len(labels)  # number of labels
            if nL:
                # convert xyxy to xywh
                labels[:, 1:5] = xyxy2xywh(labels[:, 1:5])
                # Normalize coordinates to 0-1
                labels[:, [2, 4]] /= img.shape[0]  # height
                labels[:, [1, 3]] /= img.shape[1]  # width
            # Random horizontal and vertical flips
            if self.augment:
                # random left-right flip
                lr_flip = True
                if lr_flip and random.random() < 0.5:
                    img = np.fliplr(img)
                    if nL:
                        labels[:, 1] = 1 - labels[:, 1]  # 1 - x_center
                # random up-down flip
                ud_flip = False
                if ud_flip and random.random() < 0.5:
                    img = np.flipud(img)
                    if nL:
                        labels[:, 2] = 1 - labels[:, 2]  # 1 - y_center
            # Check whether any boxes remain after the flips, then convert the format
            labels_out = torch.zeros((nL, 6))  # nL: number of labels
            if nL:
                labels_out[:, 1:] = torch.from_numpy(labels)
            # Convert BGR to RGB, and HWC to CHW (3x512x512)
            img = img[:, :, ::-1].transpose(2, 0, 1)
            img = np.ascontiguousarray(img)   # make the array contiguous in memory
            return torch.from_numpy(img), labels_out, self.img_files[index], shapes, index
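
The listing converts labels twice: from normalized xywh to pixel xyxy before the geometric augmentations, and back to normalized xywh afterwards. The sketch below shows both directions for a single box. The function names are hypothetical (the listing calls an `xyxy2xywh` helper whose body is not shown), and `ratio` and `pad` are assumed to mean what the letterbox step returns above.

```python
def normalized_xywh_to_xyxy(box, img_w, img_h, ratio=(1.0, 1.0), pad=(0.0, 0.0)):
    """Normalized (x_center, y_center, w, h) -> pixel (x1, y1, x2, y2).

    ratio and pad account for letterbox resizing and padding.
    """
    xc, yc, w, h = box
    x1 = ratio[0] * img_w * (xc - w / 2) + pad[0]
    y1 = ratio[1] * img_h * (yc - h / 2) + pad[1]
    x2 = ratio[0] * img_w * (xc + w / 2) + pad[0]
    y2 = ratio[1] * img_h * (yc + h / 2) + pad[1]
    return x1, y1, x2, y2


def xyxy_to_normalized_xywh(box, img_w, img_h):
    """Pixel (x1, y1, x2, y2) -> normalized (x_center, y_center, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2 / img_w,
            (y1 + y2) / 2 / img_h,
            (x2 - x1) / img_w,
            (y2 - y1) / img_h)


box = xyxy_to_normalized_xywh((100, 50, 300, 150), img_w=400, img_h=200)
print(box)
print(normalized_xywh_to_xyxy(box, img_w=400, img_h=200))
```

Pixel corner coordinates are convenient for affine warps and clipping, while normalized center coordinates make the flip mapping a single subtraction and keep the labels independent of the final image size.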
    • Summary:



    1. https://www.bilibili.com/video/BV1t54y1C7ra?p=5
    2. https://blog.csdn.net/qq_38253797/article/details/117961285
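  • A radial basis function (RBF) neural network learns an importance threshold for the image. The threshold splits the image into protected and unprotected regions, which are assigned different scaling ratios according to the scaling requirement and are each cropped randomly with a given probability. On the MSRA image database...
  • A random rotate-and-scale plugin for 3ds Max. Drag it into 3ds Max to use it; it is compatible with multiple versions and is handy for placing trees with random rotation and scale.
  • Resource description: source code that displays an image in randomly ordered blocks.
  • YOLOv4 image preprocessing: random scaling with gray padding bars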


    # time: 2022/04/14
    # author: cong
    # theme: distort the image's width and height to rescale it, then fill the leftover area with gray bars.
    from PIL import Image
    import numpy as np


    def rand(a=0, b=1):
        """Return a uniform random float in [a, b)."""
        return np.random.rand() * (b - a) + a


    w = 416
    h = 416
    jitter = 0.3
    # Randomly distorted aspect ratio
    new_ar = w / h * rand(1 - jitter, 1 + jitter) / rand(1 - jitter, 1 + jitter)
    print('new_ar:', new_ar)
    scale = rand(.25, 2)
    print('scale:', scale)
    image = Image.open('img.png')
    # Random scaling: the longer side becomes scale * target size,
    # and the other side follows from the new aspect ratio
    if new_ar < 1:
        nh = int(scale * h)
        nw = int(nh * new_ar)
    else:
        nw = int(scale * w)
        nh = int(nw / new_ar)
    print('nw:', nw, 'nh:', nh)
    image = image.resize((nw, nh), Image.BICUBIC)
    # ------------------------------------------#
    #   Fill the leftover area with gray bars
    # ------------------------------------------#
    dx = int(rand(0, w - nw))
    dy = int(rand(0, h - nh))
    print('dx:', dx, 'dy:', dy)
    new_image = Image.new('RGB', (w, h), (128, 128, 128))
    new_image.paste(image, (dx, dy))  # paste image onto new_image with its top-left corner at (dx, dy)
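
The geometry of this step can be separated from the image handling. The following sketch reproduces only the size and offset computation with the standard library. `random_scale_geometry` is a hypothetical name, and `random.uniform` is assumed to be an acceptable stand-in for the `rand` helper above.

```python
import random


def random_scale_geometry(w=416, h=416, jitter=0.3, rng=random):
    """Return (nw, nh, dx, dy): the resized image size and its paste offset
    on a w x h canvas, following the random scaling scheme above."""
    new_ar = w / h * rng.uniform(1 - jitter, 1 + jitter) / rng.uniform(1 - jitter, 1 + jitter)
    scale = rng.uniform(0.25, 2)
    if new_ar < 1:
        # Portrait-like result: fix the height, derive the width
        nh = int(scale * h)
        nw = int(nh * new_ar)
    else:
        # Landscape-like result: fix the width, derive the height
        nw = int(scale * w)
        nh = int(nw / new_ar)
    # If the resized image is larger than the canvas, w - nw is negative and the
    # offset is negative too, so the paste crops the image instead of padding it
    dx = int(rng.uniform(0, w - nw))
    dy = int(rng.uniform(0, h - nh))
    return nw, nh, dx, dy


rng = random.Random(0)
for _ in range(3):
    print(random_scale_geometry(rng=rng))
```

With scale below 1 the image is shrunk and surrounded by gray bars; with scale above 1 it is enlarged and only a random window of it stays on the canvas, so the same code covers both zoom-out and zoom-in augmentation.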



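  • Source code that displays an image in randomly ordered blocks.
  • Scaling and standardizing data in machine learning: implementation and examples
  • import cv2 img = cv2.imread('resize_1.jpg') h, w = img.shape[:2] cv2.imshow('origin', img) # shrink to half the original size new_img = cv2.resize(img, (int(w/2),int(h/2)), interpolation=cv2.INTER_AREA) ...
  • Python machine learning: data preprocessing and scaling. Preface: when we looked at supervised learning earlier, we saw that some algorithms (such as neural networks and SVMs) are very sensitive to the scaling of the data. A common practice is therefore to adjust the dataset so that its representation suits these algorithms better. Generally,...
  • 1. Feature scaling: in stochastic gradient descent, feature scaling can sometimes speed up convergence. 1.1 What is feature scaling? Feature scaling standardizes the range of the data's features. 1.2 Why do machine learning algorithms need feature scaling? Feature scaling can also make machine...
  • Why scale features? In most cases a dataset contains features that differ widely in magnitude, unit, and range. Because most machine learning algorithms use the Euclidean distance between two data points in their computations, this is a problem. If it is not addressed, these algorithms only consider...
  • Automatic scaling of Qt QML controls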

    2020-11-29 16:43:53
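    Preface: please include a link when reposting; this post is original, so do not copy it verbatim. Result: QML part: main.qml import QtQuick 2.12 import QtQuick.Window 2.12 import QtQuick.Controls 2.5 Window { visible: true ... propert.
  • Data processing: range scaling

    2021-06-15 10:50:23
    Range scaling maps the minimum and maximum of every column of the sample matrix onto the same interval, so that all feature columns share one range. The features are usually scaled to [0, 1]. Take [17, 20, 23]. To make the minimum of this data equal to 0: [0, 3, 6]. To make the maximum of this data equal to 1...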
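
The two steps in this excerpt (subtract the minimum, then divide by the remaining maximum) amount to min-max scaling. A minimal sketch for a single column follows; `min_max_scale` is a hypothetical helper name, not taken from the excerpted post.

```python
def min_max_scale(values, lo=0.0, hi=1.0):
    """Linearly map values so that min(values) -> lo and max(values) -> hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # A constant column carries no range information; map it to lo
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


print(min_max_scale([17, 20, 23]))  # [0.0, 0.5, 1.0]
```

For a sample matrix the same function would be applied to each column separately, so that every feature ends up in the same interval.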
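  • Batch data augmentation: object detection series (1), scaling. Data augmentation for object detection must modify the annotations together with the images.
  • Preprocessing data features (data scaling)

    2020-10-27 16:55:13
    Numeric data can be scaled with standard methods: normalization, standardization, and missing-value handling. (2) Categorical data: use one-hot encoding; features are extracted from dictionaries and text data and converted to one-hot encoding. (3) Time data: split the time into components. Note...
  • Scaling and rotation with transform. Scaling: 1 is the normal size. By default the origin of the transform is the center of the element, but it can be changed with transform-origin: 50% 50% (the default position, the center). Enlarge: transform:scale(1.5); shrink: transform:scale(0.6); ...
  • Scaling. Scaling is either uniform or non-uniform. Scaling can cause lengthening, shortening, orthographic projection, and mirroring. Scaling along the coordinate axes corresponds exactly to a diagonal matrix. Scaling along an arbitrary axis: to scale v along n, first decompose v into a vector v∥ parallel to n and a vector v⟂ perpendicular to n; from the 2D scaling case,...
  • PyTorch-based data augmentation for object detection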

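    2020-08-22 20:41:10
    Image data augmentation: 3.1 random mirroring, 3.2 random scaling, 3.3 random cropping. 4. Summary. The augmentation order in SSD is as follows (steps 2 and 3 are applied with probability 0.5): data type and coordinate conversion ConvertFromInts(np.float32) ToAbsoluteCoords(bbox ...
  • First, load the image. import cv2 import numpy as np from matplotlib import pyplot as plt filename = '10.png' ## [Load an image from a file] img = cv2.imread(filename) img = cv2.cvtColor(img...1. Image scaling with the function cv2
  • OpenCV: image scaling (cv2.resize)

    2017-10-12 19:43:02
    But when the image is enlarged, it is similar to the INTER_NEAREST method. INTER_CUBIC: bicubic interpolation over a 4x4 pixel neighborhood. INTER_LANCZOS4: Lanczos interpolation over an 8x8 pixel neighborhood. Example. Original image: Scaled image: ...
  • Parkinson's disease classification task: scaling, data splitting, and dimensionality reduction with PCA, followed by classification with random forest, SVM, and KNN
  • Common data augmentation methods: Blur, VerticalFlip (vertical flip), HorizontalFlip (horizontal flip), Flip, Normalize, Transpose, RandomCrop (random crop), RandomGamma (random gamma), RandomRotate90 (random 90-degree rotation), Rotate...
  • Mosaic data augmentation picks 4 images at random and stitches them together with random scaling, random cropping, and random arrangement; it works quite well for small-object detection. 2.1.2 Adaptive anchor box computation: in the YOLO algorithms, each dataset starts from anchor boxes with preset widths and heights. In the network...
  • The script does not scale the data automatically, which lets you set the absolute height of each level relative to the screen. stereogram(A,'method') specifies an alternative viewing method. The default is the parallel-eye method. Available methods: 'parallel-eye', 'cross-eye'. See http://www.vision3d.com/ to learn about viewing...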


